Creating Multiple Columns From One


Throughout this tip of the week series, we have displayed many solutions which utilized the combination of Property Controls and Script Controls. These two features, when used together, can create endless solutions in Spotfire.  This week we will look at one related to transforming data.

Many times when a user loads their data in an exploratory fashion,  they realize it is not in the correct format required for analysis. Spotfire does have a series of ‘Data Transformations’ that can be applied when you load data, like Pivot, Unpivot, etc…but there are a few scenarios where these will not work.  One specific scenario is when data comes in from a log or similar and a single column includes a comma-separated list of values, like 342,234,324,546, which need to be  broken down into separate columns for each value (4 columns in this case).

We can use the powerful Regular Expression functions included in our Calculation Column expressions to parse the column, but that will only work  when you have a known quantity of delimiters in each column and even then you would have to write the expression for each new column desired manually.


If you wish to automatically detect the number of delimiters and then loop through to create multiple new columns at once, you can utilize the combination of a Property Control and a Script Control.

First, we createa a Drop-down list Property Control which allows you select which column to transform.

 

Then we create a Script Control to break up the selected Column into multiple Columns based off the comma delimiter.  The Script will first create a Column that counts the number of delimiters for each row. It will then loop through all rows in the dataset and create a new Column for each value before each delimiter.So, for the value 342,234,324,546, the result will be one new Column that counts the number of delimiters( 4 in this case) , and then 4 new Columns, for the specific values. For this specific row, the values would be 342, 234, 324, and 546.

The Script to accomplish this is shown below. It assumes the Property which is attached to the Drop down list Property Control is called 'myColumnSelection'

curDT = Document.ActiveDataTableReference
cols = curDT.Columns
targetCol = Document.Properties["myColumnSelection"]

#Create a new column that counts the comma delimiter
myExpression = '1+len(RXReplace(string([${myColumnSelection}]),"([A-Za-z0-9]+)","","g"))'
myNewColName = cols.CreateUniqueName("NumElements")
cols.AddCalculatedColumn(myNewColName, myExpression)

#Get max number of elements
maxElements = curDT.Columns.Item[myNewColName].RowValues.GetMaxValue().ValidValue

if maxElements > 1:
    #Generate Columns upto but not the last item
    index = 1
    while index < maxElements:
        myExpression = 'RXReplace([C],"((\\\d+)[,]){' + str(index) + '}.*","$2","")'
        newCol = targetCol + str(index)
        myNewColName = cols.CreateUniqueName(newCol)
        cols.AddCalculatedColumn(myNewColName, myExpression)
        index = index + 1

    #Generate Column for last item
    myExpression = 'RXReplace([C], "((\\\d+),){' + str(index-1) + '}(.*)", "$3", "")'
    newCol = targetCol + str(index)
    myNewColName = cols.CreateUniqueName(newCol)
    cols.AddCalculatedColumn(myNewColName, myExpression)



While this solution may work well for ad hoc analytics, the down side is that the data is loaded first in Spotfire, and then transformed. A more production ready version of this would leverage the Spotfire SDK to build a Custom Data Transformation. With this approach, the transformation happens before the data is loaded and displayed in Spotfire Professional. In addition, it  would automatically be re-applied as data is reloaded or replaced.