Unpivoting and Pivoting your Data to make it suitable for analysis


Data preprocessing describes the concept of processing raw data to prepare it for analysis. The preprocessing will transform the data into a format that will be more easily and effectively processed for the purpose of the data analysis. There are a number of different tools and methods used for preprocessing.  Some people will perform preprocessing in a database, others will perform it in a 3rd party application, and yet another solution is to use Spotfire to both pre process your raw data and also analyze the output.

Even though Spotfire is quite good at supporting large scale enterprise-wide analytical applications, it is also useful for smaller-scale analysis.  Typically, in this scenario, the data that needs to be quickly analyzed is not in a suitable form, but there is too much overhead pre-processing them in another application.  This is where some of the built-in capabilities come in handy. Spotfire provides a variety of capabilities to pre-process data before (and after) it is loaded inside Spotfire.   One such capability we will look at in this article is unpivoting and pivoting.

 
Assume you have survey data which asks a series of questions where responders can choose one of many possible responses.  When the data is collected, it may be stored in a data table as shown below:

 

However, we want to analyze the distribution of results for each individual question. To do this, we need to transform the data.  But how? This is actually a two step process.

The first process is we need to normalize the data so that we have a single column for the questions asked in the survey and then another single column for the responses. We do this by using the 'unpivot' transformation. We select the questions we want to analyze as the 'Columns to transform' as shown in the screen shot below:


 

 

At this point we have the normalized data we need. The next step is we need to then pivot the data to have a row for each unique question which has fields showing the count for each response for the question.

You can do this using the 'pivot' transformation to pivot as shown below.

 

 

Using TIBCO Spotfire Professional, you can combine both the unpivot and pivot into the data access process so this process happens before the data is even loaded into TIBCO Spotfire (or you can also do this to the raw data set after it is loaded in TIBCO Spotfire so you can have both the raw data and the transformed data into Spotfire at once):

 

 

The final result is one row for each question and a column for each response:

 

 

If you have many different questions with different responses, then you should unpivot and pivot each grouping so they end up as a different data table in Spotfire.