Friday, 18 November 2011

Optimisation Quiz

A Friday quiz for you to test your Alteryx skills. It is based on a real-world module from a colleague at Alteryx that I was looking at earlier this week, and it is a question about optimisation.

I have isolated the particular section of the module which we were looking at and have two versions: Module A and Module B. The question is simple: which module is faster, if either, and why?

I've set up a poll on the right side of the site for you to cast your votes and will post the answer and explanation on Monday.

So the two modules look like this (I flipped a coin to decide which was A and which was B, so you can't read anything into the labelling):

Module A
Module B

The modules are identical apart from that single sort. So, effectively, the question is: does that sort make things faster, slower, or make no difference?

The input data is unsorted, with about 10M records in my example (obviously any optimisation makes more of a difference when there is more work to be done). The purpose of the module is to take a large number of records, each with a grouping field attached (in this example there are 50 groups), and run them through a process in batches.

The Summarise tool works out which groups there are and feeds that list into the control parameter of a batch macro. The batch macro then uses the GroupBy feature to read records from its input in batches based on that control parameter. My example module just writes those records to a file, but for the question it does not matter what the batch process actually does; the answer is the same.
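If it helps to picture the data flow, here is a rough Python sketch of what the two steps do: the summarise produces the distinct group values (one per batch), and the batch macro runs once per control-parameter value, receiving only the records for that group. This is only a conceptual model under my own assumptions. The function names `summarise_groups` and `run_batch_macro` are hypothetical, it is not how the Alteryx engine implements GroupBy internally, and it deliberately says nothing about which module is faster.

```python
from collections import defaultdict


def summarise_groups(records, key):
    """Summarise step (hypothetical model): distinct grouping values, one per batch."""
    return sorted({r[key] for r in records})


def run_batch_macro(records, key, control_values, process):
    """Batch macro step (hypothetical model, not Alteryx's implementation).

    Each control-parameter value triggers one iteration, and that iteration
    sees only the records whose grouping field matches the value, which is
    the effect the GroupBy option is described as having.
    """
    # Bucket records by their grouping field (a stand-in for GroupBy).
    buckets = defaultdict(list)
    for r in records:
        buckets[r[key]].append(r)
    # One "macro run" per control value; `process` stands in for whatever
    # the batch actually does (the post's example writes to a file).
    return {value: process(buckets.get(value, [])) for value in control_values}


if __name__ == "__main__":
    # Tiny unsorted input with three groups instead of 50.
    data = [{"group": g, "value": i} for i, g in enumerate("CABCAB")]
    groups = summarise_groups(data, "group")
    print(groups, run_batch_macro(data, "group", groups, len))
```

Running it prints the three group values and a record count of 2 for each batch; swapping `len` for a function that writes its batch somewhere mirrors the example module.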

So vote for your answer on the poll on the right and look out for the solution on Monday.