Monday 21 November 2011

Optimisation Quiz Answer

Well the answer is that module B is faster.  That is to say adding an extra sort actually speeds things up. So using my sample data whose counts and sizes you can see in the images:  Module A runs in about 45 seconds while Module B runs in less than 30 seconds.

Which seems kind of counterintuitive: how does adding more processing make things run faster overall?

Well a great place to start looking for optimisations in an Alteryx module is to look at where you are sorting data.  A sort is a pretty resource intensive task, so the fewer sorts you do the quicker your module will run.

In module A, behind the scenes Alteryx is sorting in two places:

  • The summarise needs to sort the data in order to do the group by on the grouping field.
  • The batch macro also needs to sort the data to be able to batch it into chunks.
So module A has two sorts.

But if sorts are bad for run time, then how can adding an extra sort speed things up, I hear you ask?

Well that's the clever bit...  When a tool in Alteryx sorts some data it tells the tools downstream of it that the data is sorted and what it is sorted by.

So back to our example:  We noted above that module A was performing two sorts, but actually it was doing the same sort (by GroupingField) twice.  In module B the sort tool we added sorts the data once, before the stream splits.  When the data gets to the summarise tool and the batch macro tool, it is already sorted by the GroupingField and does not need to be sorted again.  So rather than adding a sort, we have actually reduced the number of sorts the module needs to do from two to one, which is where the saving in run time comes from.
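To make the counting concrete, here is a minimal Python sketch of the idea.  It is not how the Alteryx engine is implemented; the `Stream`, `Engine`, `summarise` and `batch` names are all made up for illustration.  It only models a stream that carries a "sorted by" flag and tools that skip their own sort when that flag already matches.

```python
from dataclasses import dataclass
from itertools import groupby
from operator import itemgetter
from typing import List, Optional, Tuple


@dataclass
class Stream:
    """Hypothetical stand-in for a data stream plus its sort metadata."""
    records: List[Tuple[str, int]]      # (GroupingField, value) pairs
    sorted_by: Optional[str] = None     # what upstream says the data is sorted by


class Engine:
    """Counts how many real sorts are performed in one run."""

    def __init__(self) -> None:
        self.sorts = 0

    def ensure_sorted(self, stream: Stream, field: str) -> Stream:
        # If an upstream tool already sorted by this field, reuse its work.
        if stream.sorted_by == field:
            return stream
        self.sorts += 1
        ordered = sorted(stream.records, key=itemgetter(0))
        return Stream(ordered, sorted_by=field)


def summarise(engine: Engine, stream: Stream) -> List[str]:
    # Group by needs sorted input; returns the distinct groups.
    s = engine.ensure_sorted(stream, "GroupingField")
    return [key for key, _ in groupby(s.records, key=itemgetter(0))]


def batch(engine: Engine, stream: Stream) -> dict:
    # Batching also needs sorted input; returns the records per group.
    s = engine.ensure_sorted(stream, "GroupingField")
    return {key: list(rows) for key, rows in groupby(s.records, key=itemgetter(0))}


def run_module_a(records: List[Tuple[str, int]]) -> int:
    engine = Engine()
    source = Stream(records)            # unsorted input feeds both tools
    summarise(engine, source)
    batch(engine, source)
    return engine.sorts


def run_module_b(records: List[Tuple[str, int]]) -> int:
    engine = Engine()
    source = engine.ensure_sorted(Stream(records), "GroupingField")  # the extra sort
    summarise(engine, source)
    batch(engine, source)
    return engine.sorts


sample = [("b", 1), ("a", 2), ("c", 3), ("a", 4)]
sorts_a = run_module_a(sample)
sorts_b = run_module_b(sample)
```

In this toy model `sorts_a` comes out as 2 and `sorts_b` as 1, which mirrors the argument above: the explicit sort replaces two hidden ones.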

Friday 18 November 2011

Optimisation Quiz

A Friday quiz for you to test your Alteryx skills. This is based on a real world module from a colleague at Alteryx which I was looking at earlier this week, and is a question about optimisation.

I have isolated the particular section of the module which we were looking at and have two versions: Module A and Module B. The question is simple: which module is faster, if either, and why?

I've set up a poll on the right side of the site for you to cast your votes and will post the answer and explanation on Monday.

So the 2 modules look like this (I flipped a coin to decide which was A or B so you can't read anything into the labeling):

Module A
Module B

The modules are identical apart from that single sort. So effectively the question is does that sort make things faster, slower or make no difference?

The input data is unsorted with about 10M records in my example (obviously any optimisation makes more of a difference when there is more work to be done). The purpose of the module is to take a large number of records with a grouping field attached (in this example there are 50 groups) and run them through a process in batches.

The summarise works out how many groups there are and feeds that into the control parameter of a batch macro. The batch macro then uses the GroupBy feature to read records from its input in batches based on that control parameter. My example module just writes those records to a file, but for the question it does not matter what the batch process actually does, the answer is the same.
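If it helps to picture the data flow, here is a rough Python sketch of that arrangement.  It is only an illustration of the behaviour described above, not Alteryx code; `summarise_groups` and `batch_macro` are hypothetical names, and the real batch process is replaced by simply collecting the records for each batch.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Record = Tuple[str, int]  # (GroupingField, value)


def summarise_groups(records: List[Record]) -> List[str]:
    """Stand-in for the summarise: one control value per distinct group."""
    return sorted({group for group, _ in records})


def batch_macro(records: List[Record],
                control_values: List[str]) -> Iterator[Tuple[str, List[Record]]]:
    """Stand-in for the batch macro: one batch per control value,
    containing only the input records whose group matches it."""
    by_group: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        by_group[record[0]].append(record)
    for control in control_values:
        yield control, by_group[control]


records = [("north", 1), ("south", 2), ("north", 3), ("east", 4)]
controls = summarise_groups(records)
batches = dict(batch_macro(records, controls))
```

Here `controls` holds one entry per group and `batches` maps each group to the records that would be processed in that run of the macro.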

So vote for your answer on the poll on the right and look out for the solution on Monday.

Friday 11 November 2011

Alteryx 7.0 - New Features Announced To Date

As I mentioned in my last post I think Alteryx 7.0 is going to be the most exciting release that I have seen (and that's not just because I have been working on it!)

I thought it would be worth a brief re-cap of the new features announced so far (complete with links to the Alteryx Engine Works blog):

Guzzler Improvements (link)
Some improvements to the core drive time engine in Alteryx, "which allows for more accurate drive time and drive radius calculations".

XML Parsing (link)
A feature that I know users have been asking for, for a long time.  Alteryx 7.0 sees XML support in the input tool along with a new XML parsing tool.

Alteryx web and the Private Cloud (link)
Alteryx 6.2 saw the release of the Alteryx web On-Premise solution, another hugely exciting development for Alteryx. "It is an installable web application that allows end users to upload Alteryx wizards, manage users and run wizards all via a web browser." And being On-Premise, clients can keep all of their data and modules secure on their own servers within their company.  Alteryx 7.0 promises more exciting features around "Permissions, Scheduling, Active Directory and viewing yxdbs and PCXML files."

Alteryx Map Changes (link)
This one is close to my heart as it is the main feature I have been working on since I joined Alteryx last February.  It is actually the second major re-write the mapping tool has seen and hopefully fulfills many of the requests we have received for mapping.

Input/Output Enhancements (link)
A new split button and latest used files list give you faster access to your files and database connections.  Plus there is the new Alias feature to manage your data connections and passwords.

Chaos Reigns Within

Well it has been a rather long time since I last posted here.  What can I say?  I've been busy.

But I am back; and with a few changes.

First one is we have a name change.  "UK Alteryx User" just wasn't quite right: I'm not in the UK anymore and I'm not really a user anymore.  So I am proud to announce the new name "Chaos Reigns Within" and the new URL (don't worry all the old links should redirect just fine).  If you know what the new name means, then congratulations you can call yourself an advanced Alteryx user.

Other changes you will notice are a restyle of the site and a legal disclaimer, which I can't say I really like, but that's the world we live in.

Other than that I'm planning to post more regularly here and am looking forward to the release of Alteryx 7.0 next year, which I think is going to be the most exciting and feature packed release since I have been using the tool.