Tuesday, 24 January 2012

How To Generate Weighted Centroids in Alteryx

Just before Christmas I received a query from a colleague I used to work with at Experian UK who now works for Experian Norway.  He knew he could find the centroid of a polygon in Alteryx using the spatial info tool, but wanted to be able to calculate the Population Weighted Centroid of a polygon.  Off the top of my head I didn't know how to do that, but a quick ask around the office yielded a fairly simple algorithm for the calculation, and now, a few weeks later, I have packaged that algorithm up into a simple-to-use macro.  So here it is:

http://downloads.chaosreignswithin.com/macros/6.0/SpatialWeightedAverage.yxzp


The module is actually fairly simple and works by taking a weighted average of the x and y co-ordinates of the spatial points.  To test it out I took the US Census Data 2010 Population estimates at Block Group level for the states of Colorado, Wyoming and Utah and calculated the Population Weighted Centroids for those states.  You can see the results below:

The blue diamonds are the spatial centers of the states and the red stars are the population weighted centroids as generated by my macro.  I was very pleased with the results: in each case you can see how the population centroid is "pulled" towards the state capital.
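
For readers who want to see the arithmetic outside Alteryx, here is a minimal Python sketch of a weighted average on x and y co-ordinates.  It is not the macro itself: the function name, the (x, y, weight) tuple layout and the sample numbers are all made up for illustration, and it treats longitude and latitude as flat co-ordinates, which is an assumption that only holds reasonably well over small areas.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float, float]  # (x, y, weight), e.g. (lon, lat, population)


def weighted_centroid(points: Iterable[Point]) -> Tuple[float, float]:
    """Return the weighted average of the x and y co-ordinates."""
    total_weight = 0.0
    sum_x = 0.0
    sum_y = 0.0
    for x, y, weight in points:
        total_weight += weight
        sum_x += x * weight
        sum_y += y * weight
    if total_weight == 0:
        raise ValueError("total weight is zero, centroid is undefined")
    return sum_x / total_weight, sum_y / total_weight


# Hypothetical sample: two points, the first carrying three times the weight.
sample_points = [(-105.0, 39.7, 3000.0), (-104.8, 38.8, 1000.0)]
centroid = weighted_centroid(sample_points)
```

With equal weights this reduces to the plain average of the points; the heavier a point's weight, the further the result is pulled towards it.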




Saturday, 7 January 2012

Happy New Year

So 2012 is finally here and what a great year for Alteryx it is going to be.  We are now less than two months away from the annual conference: Inspire 2012 and the launch of Alteryx 7.0.

Inspire, as always, promises to be a great learning event, with the chance to see the new Alteryx 7.0 features first-hand and learn how you can leverage them in your business.  If you are looking for some reasons to justify to management why you should go, check out the excellent template letter on the Alteryx website here.  Remember that Alteryx 7.0 is going to be a huge release packed full of new tools and features, so there will be plenty to learn and take back to your company.

If you are already planning on attending, then applications are still open for the annual Alteryx Grand Prix.  For more information take a look at the recent post on the Alteryx Engine Works blog here.


Also if you haven't checked it out recently: Ron House's All About Alteryx Blog has some really interesting posts on using Alteryx 7.0 to leverage some online Location-Based APIs.  So head over there and take a look.

(First day back in the office after Christmas break, found this awesome mug waiting for me.  Great way to drink my morning coffee!)

Monday, 19 December 2011

December Enigma - Solution

and the answer is the 1st and 2nd of October '25.  You can find my solution here:


Interestingly, the two solutions I have seen from other people both look quite different to mine.  But that is one of the huge strengths of Alteryx: there is never just one way of solving a problem!

Friday, 9 December 2011

December Enigma

With the next Alteryx Grand Prix just around the corner, I thought I would offer up another enigma for you all to test your Alteryx skills on:

http://www.newscientist.com/article/mg21228341.400-enigma-number-1668.html

This particular puzzle will test your knowledge of the Alteryx date time functions.

If you enjoy a challenge and want to test your skills why not think about signing up for the Grand Prix.  Entry is currently open and full details can be found here.  Check out last year's winner Jason Dunkel's blog on his experience here for some hints, tips and inspiration.

Look out for the solution next week.

Monday, 21 November 2011

Optimisation Quiz Answer

Well the answer is that module B is faster.  That is to say adding an extra sort actually speeds things up. So using my sample data whose counts and sizes you can see in the images:  Module A runs in about 45 seconds while Module B runs in less than 30 seconds.

That seems counterintuitive: how does adding more processing make things run faster overall?

Well, a great place to start looking for optimisations in an Alteryx module is to look at where you are sorting data.  A sort is a pretty resource-intensive task, so the fewer sorts you do, the quicker your module will run.

In module A, behind the scenes Alteryx is sorting in two places:

  • The summarise needs to sort the data in order to do the group by on the grouping field.
  • The batch macro also needs to sort the data to be able to batch it into chunks.
So module A has two sorts.

But if sorts are bad for run time, then how can adding an extra sort speed things up, I hear you ask?

Well, that's the clever bit...  When a tool in Alteryx sorts some data, it tells the tools downstream of it that the data is sorted and what it is sorted by.

So back to our example:  We noted above that module A was performing two sorts, but actually it was doing the same sort (by GroupingField) twice.  In module B, the sort we added sorts the data once, up front.  When the data gets to the summarise tool and the batch macro tool, it is already sorted by the GroupingField and does not need to be sorted again.  So rather than adding a sort, we have actually reduced the number of sorts the module needs to do by one, which gives the saving in run time that we see.
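
To make the counting concrete, here is a toy Python model of the idea.  It is only a sketch: the Stream class, the sorted_by flag and the function names are invented for illustration and say nothing about how the Alteryx engine is really implemented.  Each "tool" that needs sorted input checks the flag and only sorts if it has to.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Record = Tuple[str, int]  # (GroupingField, value)


@dataclass
class Stream:
    records: List[Record]
    sorted_by: Optional[str] = None  # hypothetical "already sorted" metadata


sort_count = 0  # how many real sorts the toy module performs


def ensure_sorted(stream: Stream, field: str) -> Stream:
    """Sort by the grouping field unless upstream metadata says it is done."""
    global sort_count
    if stream.sorted_by == field:
        return stream
    sort_count += 1
    ordered = sorted(stream.records, key=lambda record: record[0])
    return Stream(ordered, sorted_by=field)


def summarise(stream: Stream) -> List[str]:
    """Group by the grouping field; needs sorted input."""
    ordered = ensure_sorted(stream, "GroupingField")
    groups: List[str] = []
    for key, _ in ordered.records:
        if not groups or groups[-1] != key:
            groups.append(key)
    return groups


def batch_macro(stream: Stream, groups: List[str]) -> dict:
    """Split the records into one batch per group; needs sorted input."""
    ordered = ensure_sorted(stream, "GroupingField")
    return {g: [r for r in ordered.records if r[0] == g] for g in groups}


def run_module(add_sort: bool) -> int:
    """Run the toy module and return the number of sorts it performed."""
    global sort_count
    sort_count = 0
    stream = Stream([("b", 1), ("a", 2), ("b", 3), ("a", 4)])
    if add_sort:  # module B: one explicit sort before the stream splits
        stream = ensure_sorted(stream, "GroupingField")
    groups = summarise(stream)
    batch_macro(stream, groups)
    return sort_count


sorts_in_module_a = run_module(add_sort=False)
sorts_in_module_b = run_module(add_sort=True)
```

In this model the version without the explicit sort performs two sorts (one inside each tool), while the version with it performs one, because both downstream tools see the metadata and skip their own sort.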

Thursday, 17 November 2011

Optimisation Quiz

A Friday quiz for you to test your Alteryx skills. This is based on a real world module from a colleague at Alteryx which I was looking at earlier this week, and is a question about optimisation.

I have isolated the particular section of the module which we were looking at and have two versions: Module A and Module B. The question is simple: which module is faster, if either, and why?

I've set up a poll on the right side of the site for you to cast your votes and will post the answer and explanation on Monday.

So the 2 modules look like this (I flipped a coin to decide which was A or B so you can't read anything into the labeling):

Module A
Module B


The modules are identical apart from that single sort. So effectively the question is: does that sort make things faster, slower, or make no difference?

The input data is unsorted, with about 10M records in my example (obviously any optimisation makes more of a difference when there is more work to be done). The module takes a large number of records, each with a grouping field attached (in this example there are 50 groups), and runs them through a process in batches.

The summarise works out how many groups there are and feeds that into the control parameter of a batch macro. The batch macro then uses the GroupBy feature to read records from its input in batches based on that control parameter. My example module just writes those records to a file, but for the question it does not matter what the batch process actually does: the answer is the same.
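
If it helps to picture the data flow, here is a rough Python sketch of the shape of the module.  It assumes the summarise passes on one row per distinct group value, and every name in it is hypothetical; it only mirrors the flow of records and is not a model of how Alteryx runs a batch macro.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, int]  # (grouping field, value)


def summarise_groups(records: List[Record]) -> List[str]:
    """Stand-in for the summarise: one entry per distinct group."""
    return sorted({group for group, _ in records})


def run_in_batches(
    control_values: List[str],
    records: List[Record],
    process: Callable[[str, List[Record]], None],
) -> None:
    """Stand-in for the batch macro: one run per control value."""
    by_group: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        by_group[record[0]].append(record)
    for value in control_values:
        process(value, by_group[value])


# Hypothetical sample data; the real example has about 10M records.
records = [("north", 1), ("south", 2), ("north", 3)]
written: Dict[str, List[Record]] = {}

# The "process" here just collects each batch instead of writing a file.
run_in_batches(summarise_groups(records), records, written.__setitem__)
```

After this runs, written holds one batch per group, each containing only that group's records.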

So vote for your answer on the poll on the right and look out for the solution on Monday.

Friday, 11 November 2011

Alteryx 7.0 - New Features Announced To Date

As I mentioned in my last post I think Alteryx 7.0 is going to be the most exciting release that I have seen (and that's not just because I have been working on it!)

I thought it would be worth a brief re-cap of the new features announced so far (complete with links to the Alteryx Engine Works blog):


Guzzler Improvements (link)
Some improvements to the core drive time engine in Alteryx, "which allows for more accurate drive time and drive radius calculations".

XML Parsing (link)
A feature that I know users have been asking for, for a long time.  Alteryx 7.0 sees XML support in the input tool along with a new XML parsing tool.

Alteryx web and the Private Cloud (link)
Alteryx 6.2 saw the release of the Alteryx web On-Premise solution, another hugely exciting development for Alteryx. "It is an installable web application that allows end users to upload Alteryx wizards, manage users and run wizards all via a web browser." And being On-Premise, clients can keep all of their data and modules secure on their own servers within their company.  Alteryx 7.0 promises more exciting features around "Permissions, Scheduling, Active Directory and viewing yxdbs and PCXML files."

Alteryx Map Changes (link)
This one is close to my heart as it is the main feature I have been working on since I joined Alteryx last February.  It is actually the second major re-write the mapping tool has seen and hopefully fulfills many of the requests we have received for mapping.

Input/Output Enhancements (link)
A new split button and a list of recently used files give you faster access to your files and database connections.  Plus there is the new Alias feature to manage your data connections and passwords.