Performing in-database connections and calculations with TIBCO Spotfire


Included in the latest release of TIBCO Spotfire (5.0) is the ability to connect to external data sources so that all calculations occur in-database rather than in-memory. This allows users to analyze larger data volumes as these calculations can be done before the data is passed back into Spotfire.

Let’s examine a use case for this functionality.  Assume you are trying to analyze sales data for your store chain. You may want to start your analysis by showing the average sales across each region as shown below.

 

If this dataset has 1.6 million rows, is this really needed to start your analysis? How many data points are really used in visualization above? 2 columns (Avg (Total) and Business Location) and 4 rows (Boston, Los Angeles, New York, and Seattle).

Why should you bring all 1.6 million rows back into Spotfire only to aggregate it  in your starting visualization which only requires 4 rows?

What you can do in this case is have the database perform the aggregations and only return the 4 rows.  Before version 5.0 you could do this by setting up an Information Link which includes a column that specifies the avg([Total]) and the Group By is set to the Business Location Column.  In additional, the distinct conditioning is applied to the Information Link.

This works well, but when you want to update the aggregation or use more complex group by expressions, you need to update the Information Link (which not every user has access to do).

In 5.0, you can now perform the same concept directly in TIBCO Spotfire Professional.

To start, let’s create a connection to the same SQL Server database we used above. This time we will connect to it as an external datasource so we can do all the calculations in the database.

We do this by going to the File > Add Data Tables menu and then selecting the Connection To option  from the Add  button. You then select which of the supported databases you want to connect to.

 



You will then be presented with a connection dialog. You will enter the connection details to the SQL Server database and select the database you want to connect to.
 



Finally we pick which database tables we want to connect to.
 



When we are done, we have added a new External Datasource in Spotfire Professional.

We can then create the same exact Bar Chart visualization as we created earlier, but only the data that needs to be visualized is returned. Since there is an aggregation on the Y-Axis (Avg),  this means Spotfire generated and optimized a SQL statement based off the visualization’s configuration and then sent the statement over to the datasource to execute. Then , only the resulting data is returned and visualized on the Bar Chart.

If the user adds a Filter and filters some data, Spotfire will generate and optimize a new SQL statement to the datasource, which will re-excecute, perform the calculations on the datasource, and return the  resulting data to be visualized.  



The advantage of using this approach over the Information Link mentioned earlier is that the author can change the aggregations and calculations used without having to update anything in an information model.   

Let’s assume we don’t want to use a simple Bar Chart, but we want to use a Cross Table which shows  average sales across the various departments by age group and gender.

We can create this visualization in Spotfire, and using the External Datasource connection, all calculations are performed in the database directly and just the data needed to visualize the Cross Table is returned.