Friday 19 December 2014

Moving Summarize

This is a tool I have been meaning to write for a long time.  It was requested at Inspire this year and I have had requests for it before that.

The basic premise is the ability to create a moving average (or in fact any moving summarize method).  For background on what a moving average is and why it is useful check out the excellent Wikipedia article on the subject http://en.wikipedia.org/wiki/Moving_average.

In simple terms a moving average creates for each row in a data-set the average of the last x number of rows.  The easiest way to do this without my new macro is to use a multi-row formula and write a formula to calculate the average.  Which works fine until you want a moving average of the last 30 rows for example, where upon writing the formula becomes a little tedious.

If you want to jump straight to the macro you can download it here.  If you are interested in how it works read on.

The basic premise of my macro is to use a summarize tool to calculate the moving summaries. Obvious huh?

So if we imagine a simple dataset with 9 records and we want to do a moving average of the previous 3 rows, we attach a RecordId and use a formula to group the records into groups of 3:

Ceil([RecordID]/3)


Then we can use a summarize tool grouping on the "Grouping formula" field and we have calculated our moving averages!



Wasn't that easy?  Well... OK... Those of you who were paying attention might be saying to yourself "wait a minute we've only calculated three of the answers!" and you would be right.  To be precise we have calculated the answer for rows 3, 6, and 9.  So how are we going to calculate the others?

Well this is when the macro gets interesting. We are going to use a batch macro to do that same process again and again until we have calculated all of our answers.  But of course we can't just run the exact same process or we'd just keep calculating the exact same results!  So each time we go round we "throw away" the first n records where n is the iteration number we are on - 1.

So second time round the loop.  Iteration 2 we throw away the first record and repeat:


Now we have calculated the answer for rows 4 and 7.  We also calculated a number that we didn't need (the 29.66667) but we will discard that at the end.

Then one last time round the loop.  Iteration 3 so discard the first 2 records an repeat:


Gives us the answer for rows 5 and 8 and we have all the answers we need.  Rows 1 and 2 don't have an answer as there are not three previous rows to summarize.

Then all we need to do is get all of the answers back together which we can do using this formula to map our new "Grouping formula" field back to our original RecordIds:

[Grouping formula]*3+[Engine.IterationNumber]

The great thing about this solution is it works as well for moving summarizes of 50 records as it does for 3 and also will work for any summarize method.  Moving Concatenate? Moving Create Centroid?

The final solution is then slightly more complex as I decided to give the user the ability to choose how many rows forwards and backwards they want to include in their moving summarize.  But that is just a case of working out how big a moving summarize we need to calculate and then appropriately offsetting the solution.

This macro will be included in the 2014 Q4 release of the CReW macro pack which should be out early next year or download it here now.