Predictions using Multiple Linear Regression in TIBCO Spotfire


Multiple linear regression analysis is one of the most commonly used statistical modeling techniques for making predictions in the business world. In this tip we will look at how to do multiple linear regression analysis inside TIBCO Spotfire (from version 5.0 on).

In this use case, assume a hotel chain wants to understand where to build its next hotel. The chain has identified 20 possible site locations, and its goal is to select the one that will yield the highest profit margin. Its hypothesis is that the following metrics are explanatory (or predictor) variables for a hotel’s profit margin:


  • The number of competing hotel rooms within a 5 mile area

  • Distance to the nearest hotel

  • Size of local office space in a 10 mile area

  • Number of students enrolled in colleges within a 10 mile area

  • Median household income of town locals

  • Distance to town center

To start with, we take a sample of already established hotels in the chain and load it into Spotfire. The data includes each hotel’s profit margin as well as values for the variables listed above.

Now we can apply one of the regression models from the Tools menu in Spotfire Professional (starting in version 5.0).

 


In this case, we will be doing regression modeling, so we choose that tool.  In the resulting dialog, we configure the inputs for our model. 

Our response column, or the column we wish to predict, is the profit margin column, and the predictor columns are the variables we listed earlier. We leave the regression type as linear.

After we fill out the fields in the tool’s dialog, we click OK and a model will be generated for us to use.
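To make concrete what a linear regression model computes, here is a minimal Python sketch of ordinary least squares fitted through the normal equations. It is not Spotfire's implementation, and everything in it is an assumption for illustration: the function names, the three shortened predictor names, the scaled units, and the synthetic, noise-free data generated from made-up coefficients.

```python
# Minimal sketch of multiple linear regression (ordinary least squares).
# All names and numbers are hypothetical; this is not how Spotfire fits models.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; system is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear_regression(rows, response):
    """Return [intercept, b1, ..., bk] minimizing the squared error."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, response)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up "true" relationship: intercept, then Rooms, Office, Income.
TRUE_COEFS = [40.0, -2.0, 1.5, 0.5]

# Synthetic predictor values for six existing hotels (scaled units).
rows = [
    (3.0, 5.0, 4.0),
    (1.0, 2.0, 3.5),
    (2.0, 8.0, 5.0),
    (4.0, 3.0, 3.0),
    (2.5, 6.0, 4.5),
    (1.5, 4.0, 6.0),
]

# Noise-free response, so the fit should recover TRUE_COEFS.
response = [
    TRUE_COEFS[0] + sum(b * x for b, x in zip(TRUE_COEFS[1:], r))
    for r in rows
]

coefs = fit_linear_regression(rows, response)
print([round(c, 6) for c in coefs])
```

Real hotel data would contain noise, so the fitted coefficients would only approximate the underlying relationship, which is why the evaluation output that accompanies a model matters.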

When the model is generated, we also get a few new Data Tables, model summary information, and Visualizations to help us evaluate the model. We can create additional models using different combinations of the predictor columns until we find the best one.
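One common way to compare models built from different sets of predictor columns is adjusted R-squared, which penalizes each extra predictor. The sketch below is an assumption about how such a comparison could be done by hand; it does not describe which statistics Spotfire reports, and the function names and sample values are hypothetical.

```python
# Minimal sketch: compare candidate models with R-squared and adjusted
# R-squared. Function names and sample values are hypothetical.

def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(actual, predicted, n_predictors):
    """R-squared penalized for the number of predictor columns used."""
    n = len(actual)
    r2 = r_squared(actual, predicted)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


# Made-up observed values and the fitted values from one candidate model.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
fitted = [1.5, 1.5, 3.0, 4.5, 4.5]

# The same fit looks worse when it needed more predictors to achieve it.
print(r_squared(actual, fitted))
print(adjusted_r_squared(actual, fitted, n_predictors=1))
print(adjusted_r_squared(actual, fitted, n_predictors=2))
```

Because plain R-squared never decreases when a predictor is added, the adjusted figure is usually the fairer basis for choosing between a six-variable model and a smaller one.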

Once you have the model you want to use, you can load a separate Data Table that contains the same metrics for the lots you are considering for your new hotel.

You can then predict the profit margin for these lots, using the model you previously created, by going to the Insert menu and selecting the ‘Predicted Columns…’ menu item.


In the resulting dialog, you select the model you wish to use for the prediction and the Data Table that contains the data you want to predict. You then match the columns from the model’s Data Table to the columns in the Data Table to be predicted (in our example the column names for the various metrics, like Office, are identical in both Data Tables).
 

When you complete this, a new column called ‘Predicted’ will be added to the Data Table of potential site locations. From here you can see which site is predicted to have the highest profit margin.

You can then analyze additional details about the site locations to determine the best one. This is done by mashing up the predictions with another data source specific to each site location, which includes variables such as potential taxes, the purchase price of the lot, and expected building costs.
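The prediction step amounts to applying the fitted coefficients to each candidate site and matching columns by name. The following Python sketch illustrates that idea only; the coefficients, the site values, the `predict` helper, and the `Rooms` and `Income` column names are hypothetical (only `Office` appears in the example data described here).

```python
# Minimal sketch: score candidate sites with an already fitted linear model.
# Coefficients, site values, and most column names are made up.

MODEL = {
    "intercept": 40.0,
    "coefficients": {"Rooms": -2.0, "Office": 1.5, "Income": 0.5},
}


def predict(model, row, column_map=None):
    """Apply the model to one row.

    `column_map` maps a model column name to the matching column name in
    the row, for cases where the two tables use different names.
    """
    column_map = column_map or {}
    total = model["intercept"]
    for model_col, coef in model["coefficients"].items():
        source_col = column_map.get(model_col, model_col)
        total += coef * row[source_col]
    return total


sites = [
    {"Site": "A", "Rooms": 3.0, "Office": 5.0, "Income": 4.0},
    {"Site": "B", "Rooms": 1.0, "Office": 8.0, "Income": 5.0},
    {"Site": "C", "Rooms": 4.0, "Office": 2.0, "Income": 3.0},
]

# Add a 'Predicted' value to every candidate site, then pick the highest.
for site in sites:
    site["Predicted"] = predict(MODEL, site)

best = max(sites, key=lambda s: s["Predicted"])
print(best["Site"], best["Predicted"])
```

When the column names are identical in both tables, as in this example, no mapping is needed; otherwise a dictionary such as `{"Office": "OfficeSpace"}` (a hypothetical name) plays the role of the column matching described above.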

For more information on other types of regression and classification modeling, please take our free v4.5 to v5.0 Delta Training Jumpstart at http://spottrain.tibco.com/sln/course/view.php?id=114. Click the ‘Proceed without an Account’ button to log in as a guest if you do not have an account.