Thursday 11 November 2010

Inspired Ingenuity or Overly Engineered? (Part 2)

So I think my solution was probably inspired somewhat by my gantt chart module.  I was trying to visualise the problem and realised I could actually turn the problem into a spatial one and then use Alteryx's spatial tools to solve it.

So here it is:

Imagine a rectangle for each task sitting on a time line (like a gantt chart but with all the tasks on one row).



If we make the length of the rectangles proportional to the length of the tasks,  then we can say the area of the rectangles represents the time taken for that task.

Now imagine another rectangle on the same time line representing the entire reporting period.  The area of this rectangle (in green below) represents the total time available.



Now if we cut the smaller task rectangles from the large reporting period rectangle, then the area remaining of the larger rectangle is the unused time.

With the spatial tools of Alteryx, cutting one set of polygons from another is easy.  So that's what I did:



You can download the module here:



I think you'll agree a rather unique way of using the spatial capabilities of Alteryx in a way they were never designed for.

So where does that leave my Occam's razor?  Well Ned has already sent me his solution and I have to say it doesn't involve any spatial tools and if I'd manage to think about the generate rows in the right way it would have been a quicker solution to implement than mine.  My module does run more quickly (0.9 seconds against 2.0 seconds on my machine) but the report only gets run once a month in the middle of the night so speed isn't really a necessity...

You can download Ned's solution here:



I think this one is a simpler solution than mine and for somebody opening the module to make changes in the future somewhat less daunting. So, without the necessity of a fast runtime, by applying Occam's razor I think Ned's is the better solution.

However having said that I hope that my solution has been interesting and might inspire some future spatial solutions to non-spatial problems.

Wednesday 10 November 2010

Inspired Ingenuity or Overly Engineered?

Inspiring Ingenuity is one Alteryx's taglines and I think the module I will cover in this post is a great example of a way to solve a problem using the tools in a way they were never designed for.  However (and hence the title of the post) I wonder whether my solution is clever for the sake of being clever and in reality there is a simpler solution which I have just overlooked.

I believe strongly in an "Occam's Razor" approach to problem solving and Alteryx modules are no exception.  Occam's Razor is stated in a number of forms and really relates to scientific hypotheses.  Wikipedia says

 The principle is popularly summarized as "the simplest explanation is usually the correct one."


I've also heard it stated as "The simplest solution is the best solution".  Now this isn't quite the version I subscribe to.  Consider the example in Alteryx of needing to match against a large lookup file of several million rows on a regular basis.  Well the simplest solution would be two input tools, a join and browse.  But is that the best solution?  Well if the time it takes for the module to run is important to you, then no.  A better solution would be to load your master file into a Calgary dataset, index the columns you need to match on and then rebuild your module with an input tool, Calgary append and browse.  More work to set up, but in the end a better solution.


So where does that leave our Occam's razor.  Well continue down the Wikipedia article and we have this version






"pluralitas non est ponenda sine necessitate" ("plurality should not be posited without necessity")


in other words complexity in a solution should only be there if it is necessary. So in our above example we introduce the extra complexity of loading our data into Calgary to make our solution run faster.


Which brings me to a question about my module.  Is the complexity of my solution necessary or can somebody think of a simpler solution?


First the problem:


We have an Alteryx server which can run up to four Alteryx modules simultaneously.  Every time a module is run the start and end time of that module is logged in a SQL server database, which is then used for looking at  how the server is being utilized (using Alteryx to generate the reports of course).  


So one of the metrics we are interested in is server utilization:  how much of the processing time of the server did we use last month?  How much more capacity have we got until we need to get a second server? 


And the difficult question: 


How much time is the server completely idle? 


If the tasks were non overlapping this would be as simple as calculating the time for each of the jobs and subtracting the total of these from the total time for the reporting period. But the tasks do overlap...


I'm going to leave you to think about possible solutions and post my proposed solution in the next post.  If you want to have a go then I've provided my example data below to see if you can match my answer.



Monday 1 November 2010

SAS Transpose

Last week I visited a client who had been trialling Alteryx.  The team who had been conducting the trial were SAS analysts and were particularly impressed by Alteryx's spatial capabilities.  At the end of the meeting I asked if there was anything which they thought they could do in SAS and not in Alteryx.  The reply was the only thing was that the transpose tool didn't really to seem to behave as they expected given their experience of SAS's Proc Transpose.

So this weekend I thought I would take a look at what the differences might be.  Now it has been a long time since I have written anything in SAS, so first step was a quick google to find out how the Proc Transpose tool worked.  I came across this excellent tutorial:

http://support.sas.com/resources/papers/proceedings09/060-2009.pdf

I decided I would recreate the lessons from this document using Alteryx.  Upon starting I realised why they had been having trouble, the Proc Transpose function of SAS is closer to an Alteryx cross tab than transpose,  but in actual fact to reproduce the results exactly Alteryx needs a transpose and cross tab tool. 

Just to be clear it is not that Alteryx tools aren't as efficient as SAS they are just different and I have added an extra example to each lesson which I have titled "The Alteryx Way" which shows what I think an Alteryx analyst would do to transform the example data.

I could have tried to turn my modules into a SAS Proc Transpose macro for Alteryx however I don't believe that would be the most efficient way of doing things in Alteryx.  I am not presenting these examples as the best way of doing things in Alteryx; more of a way for SAS analysts to get a handle on the differences between the two tools.  I think if you are going to program efficiently in any language you need to think about your solution in terms of how that language works.

My example modules can be downloaded below:



As always feel free to post any comments/questions here or e-mail me directly.