Ranking Over Groups


One common question we hear all the time in training is how a person can rank something over various groups in their dataset. The answer to this depends on your groups and how they are constructed. Ill explain by using two examples.  

First, let's assume your groups are subjects for a film.  

 


 
Since all the records in each desired group (subject) have the same values, we can use the rank function inside our expression language and build a calculated column to define the rank per subject. In this case, let's assume we want to rank the length of each film in each subject, in descending order.



 
The Rank function takes in one mandatory argument and two optional arguments.  The mandatory argument is the column you wish to rank (in this case the Length column). The next argument, which is optional, defines the sort order. The default value is ascending 'asc', and if you want to sort in descending order you use 'desc'.  The final argument, also optional, defines a column you want to group  the rankings by. In this case, we want to group based off the subject column.

This works fine and the length of the movies in each subject are ranked independently of movies in other subjects.  But what if we wanted to group the rankings based off something that does not have unique values, like the Subject column does?  Well, the best answer is to create a new column that does create a unique value for each group.  The example we will show below is using a Date column where we want to group by each month in each year.

In the data shown below, each value in the date column is unique.  

 



 
We can create an intermediary calculated column to create a value per month/year.   To do this, we can retrieve the Year from each date, multiply it by 100, and then add the numeric representation of the Month to it (1 for Jan ,2 for Feb, etc'). Finally we cast it all as an integer and we get a Column which has the same value for all rows of the same month and year.

  

 

You can then use the UID column as the grouping column inside the Rank function: