Retrieving values from one column to use in an expression in another Column


A couple weeks ago we did a post that discussed how to rank something over various groups in their dataset. As a variation of this, many situations arise where customs need to find values in a specific column inside an expression available in another column. We will explain this scenario with an example.

Let's assume we have the following data set:

 


 

As part of the analytics required for a workflow, we may need to retrieve the value from the Volume column that corresponds to the last Date entry per month.  This value will eventually be used in another expression. So , for example, using the data above, for November 2009, we want to use the Volume value from Nov 20,2009 (the last entry in November). This value is 32212.


First, we need to use what we learned in our post from a couple weeks ago to create a unique id column for each month. For this we can create a calculated column using the expression below , and call the column UID:

      integer(Year([Date])*100 + Month([date]))

 

Now, what we need to do is ideally use the expression below to get the value from the Volume column:

      Sum(if(([Date])=(Max([Date]) OVER [UID]),[Volume],0))

However, this will not work, as we cannot put the Max([Date]) aggregate function inside another expression like this.

To solve this, we need to add an intermediate step, which is to create another calculated column. This calculated column will contain the last date entry per each month:

      Max([Date]) OVER UID


 


Once we have this, assuming the newly created column is called 'Max Date per Month', we can use the following expression to retrieve the value from the Volume column for the last entry in each Month:

    if(([Date]) = ([Max Date per Month]),[Volume],0)

 


To go one step further, what if we wanted each row to have  the value corresponding to the last entry for that month, so all rows from November 2009 would have  the value 32212? For this, we would need to add another OVER statement to group by the Max Date per Month.

    Sum(if(([Date])=([Max Date per Month]),[Volume],0)) OVER ([Max Date per Month])

Since this has an OVER method in it, we need to add an aggregation method.  For this, we used Sum.