Performing Logical Comparisons of Data


There are times when you want to create a visualization that will show the relationship of various sets of data. Some of this can be done using a details visualizations, but it is also nice to be able to do this in a single visualization. For this, we will use the Pie Chart.

Let's assume we are analyzing stock data.  Inside our data table, we have a column called Signal, which is an indicator from analysts whether the stock should be bought or sold and also a column called Trend, which describes the current trend of the stock, whether it is going up (designated with a + sign) or down  (designated with a ' sign).  If we would like to do a comparison of both of these columns to see their relationships, we can create a Pie chart which colors based off both the Signal and Trend columns.
 


When we do that, you can see the finished Pie Chart:


This chart shows us that 54.2% of the stocks signaling as Sell are on the down trend, and only 19% are on an upward trend. For stocks signaling as Buy, it is about even split between downward and upward trends.

What if your data is not structured properly to do this type of comparison? For example what if there was no column in your raw data that comprised what you wanted to compare? You do have a couple options.

First you can create an expression on one of the color axis. For example, let's assume we are looking at the same stock data and we want to compare the Signal (which is in our dataset) and also which stocks have the highest volume.  We do have a column in the data table which is for the stocks Avg Volume, but its not categorical data, its continuous.  We can use an expression to make it categorical based off some logic.  We can say we want all stocks with average volumes over the UOF (Upper Outer Fence) to be labeled as Yes and then all others as No.

To do this we right click on one of the axis in the Color legend, and select Custom Expression.



In the expression dialog, we want to use the following expression:

  if(([Avg Vol])>=(UOF([Avg Vol])),"Yes","No") As [Highest Volumn]

Since this color axis also has another column in it, the Signal column, we will need to use the NEST keyword.  For more information on the NEST keyword, see the link on it to the help file or consider taking our Computational Analytics training course to master the entire expression language.

The final expression will look like the following:



 
And the resulting Pie Chart looks like the following:
 

Let's show another example from another domain.  This data set contains generic sales values for 'My Product' and Total Market sales.  Customers are identified with a numeric CUSTOMER CODE, and the customers are divided into four different Class groups based on corporate opinion on how the customer should be approached.  In addition, the Potential of each customer is rated as High or Low, and information about the location (country and global region) of each customer is provided.



 
When then may want to identify 'key customers' as those who are higher than one standard deviation above the average series value.  For this , we can create an expression , either as a Calculated Column or inside the Color Axis:

if(([My Product Sales])>(Avg([My Product Sales]) + StdDev([My Product Sales])),"My Key Customers","No") As [My Product Customers]


We can then do the same thing for Total Market Sales instead of My Product Sales to get those sales which are higher than one standard deviation above the average series value for all the market sales.

if(([Total Market Sales])>(Avg([Total Market Sales]) + StdDev([Total Market Sales])),"Total Market Key Customers","No") As [Total Market Customers]

If we put these expressions on the color axis, we can compare the logical groupings of the two.



Representing a logical AND, how many customers are key in both My Product Sales and Total Market Sales?  The Answer is shown in blue and is 13.


If you want to know how many customers are key in My Product Sales but not in Total Market Sales, the answer is shown in red and is 33.

If you want to know how many customers are key in Total Market Sales but not in My Product Sales, the answer is shown in yellow and is 34.

If you want to know how many customers are not key in either My Product Sales or Total Market Sales, the answer is shown in green and is 220.

In our example above, we were able to categorize the column using an expression, but let's say the column you want to use as pie of the logical comparison cannot easily be used in an expression. Then, we can rely on the Tag Panel to create a Tag Collection for this column.   Below is a picture showing the same exact data set, but the records above one standard deviation are manually marked and tagged: