Wednesday, 10 November 2010

Inspired Ingenuity or Overly Engineered?

Inspiring Ingenuity is one Alteryx's taglines and I think the module I will cover in this post is a great example of a way to solve a problem using the tools in a way they were never designed for.  However (and hence the title of the post) I wonder whether my solution is clever for the sake of being clever and in reality there is a simpler solution which I have just overlooked.

I believe strongly in an "Occam's Razor" approach to problem solving and Alteryx modules are no exception.  Occam's Razor is stated in a number of forms and really relates to scientific hypotheses.  Wikipedia says

 The principle is popularly summarized as "the simplest explanation is usually the correct one."

I've also heard it stated as "The simplest solution is the best solution".  Now this isn't quite the version I subscribe to.  Consider the example in Alteryx of needing to match against a large lookup file of several million rows on a regular basis.  Well the simplest solution would be two input tools, a join and browse.  But is that the best solution?  Well if the time it takes for the module to run is important to you, then no.  A better solution would be to load your master file into a Calgary dataset, index the columns you need to match on and then rebuild your module with an input tool, Calgary append and browse.  More work to set up, but in the end a better solution.

So where does that leave our Occam's razor.  Well continue down the Wikipedia article and we have this version

"pluralitas non est ponenda sine necessitate" ("plurality should not be posited without necessity")

in other words complexity in a solution should only be there if it is necessary. So in our above example we introduce the extra complexity of loading our data into Calgary to make our solution run faster.

Which brings me to a question about my module.  Is the complexity of my solution necessary or can somebody think of a simpler solution?

First the problem:

We have an Alteryx server which can run up to four Alteryx modules simultaneously.  Every time a module is run the start and end time of that module is logged in a SQL server database, which is then used for looking at  how the server is being utilized (using Alteryx to generate the reports of course).  

So one of the metrics we are interested in is server utilization:  how much of the processing time of the server did we use last month?  How much more capacity have we got until we need to get a second server? 

And the difficult question: 

How much time is the server completely idle? 

If the tasks were non overlapping this would be as simple as calculating the time for each of the jobs and subtracting the total of these from the total time for the reporting period. But the tasks do overlap...

I'm going to leave you to think about possible solutions and post my proposed solution in the next post.  If you want to have a go then I've provided my example data below to see if you can match my answer.