Monday 5 November 2018

Trick or Treat:  Find Closest 

I've been tinkering with spatial requirements from clients and have found myself inside of a rabbit's hole.  The result was the creation of a new macro, Find Closest
Unlike the Distance Tool, the Find Closest macro calculates distances without the need of a spatial object.  In speed comparisons, this macro is 100 times faster than the distance tool. This macro also differs from the Alteryx Find Nearest tool as it does not use the spatial or drivetime functions for calculating distances.  Underneath the covers it uses the Haversine formula for calculation of distances on a sphere.  Although the earth is flattened, it does very well for calculation of distances.  It is virtually exact on what would be a trade area.  I've shown this macro and formula to a few ACE friends @andy_moncla and @Treyson for their opinions and have had positive feedback.  I am going to release it via the Alteryx Gallery as a CReW macro with a caveat that the distances calculated may vary from those using the Distance tool.

This macro is simple, but offers features of “Distance”, “Find Nearest” and “Create Points” tools in one macro.  While reviewing a workflow with a large set of customers and potential store locations, I was hoping to improve the performance of the functions without the use of drivetime data.  The client could then take the potential matches through an API call to gather drivetime data (if needed) for the subset of “best” records.  Starting with a latitude and longitude I was pleased to see the performance gains from a single formula tool.  After validation of the results I wanted to make a general-purpose tool for the calculation of distances.  With their needs in mind (note: The engagement had already been completed) I started to think of how other analysts may benefit from a tool like this.  Creation of the first prototype took roughly 4 hours to build, test and review.  After that, I tinkered with more user options and the creation of a macro image. 

The find closest macro calculates the distance between two sets of latitude & longitude coordinates.  Given an input stream of Target and Source locations it will calculate the distance between all combinations of Targets and Sources.  Optionally, it can accept a stream of data that contains both the source and target coordinates (for distance calculations only).  When calculating via the dual inputs, it allows the analyst to limit the distance output (matches) and can (if required downstream) create spatial objects.

  • Inputs
    • Allows for either 1 or 2 input data streams
  • Distance Configuration
    • Ignore 0 Distance Matches
  • Find Closest Matches
    • Maximum Distance (Kilometre or Mile)
    • Limit by quantity
  • Spatial Objects
    • Optional creation of spatial object(s)
  • Speed.  Tested on 1 million records this macro runs in under 3 seconds compared to over 12 minutes using spatial tools.
  • Spatial Data.  Because the distances are only "as the crow flies", no spatial data is required (Find Nearest requires TomTom data).
  • Create Points.  Create points are NOT required.  An option to create Target, Source or Both points is available if needed downstream.
  • Output Miles.  Both Kilometre and Mile distances are displayed on output.
If you take a look under the covers of this macro you will find the Haversine formula implemented using native mathematic expressions.  Detours are used to avoid unnecessary functions and as a last-minute addition, I added the use of a formula tool to create spatial objects (if someone really wants them on output).  If you haven’t discovered yet, you can perform many spatial functions from within the formula tool.  Please check out my YouTube video that demonstrates the raw power of this elegant beast in computing distances.  If you haven’t yet subscribed to my YouTube channel, please do consider it.

As always, your comments and feedback are welcomed.  You can download the macro here.

Saturday 7 July 2018

CReW Needs Your Opinion

What's your opinion man?

While conducting CReW macro training at Inspire18 with Joe Miller, we revealed several common use cases where the "Expect Equal (Beta)" macro was a solution.  In fact, Joe Miller has begun incorporating the macro into his weekly challenges as a way of confirming the challenge results.  We got a question from the class about this macro which I'm about to share:  "Can you set which fields to check between the two files (instead of checking all fields)?"

What are your thoughts?  My response was to "Select" the same fields from both sets of data for comparison prior to the macro.  Adam was in the room and while I had asked him prior to the conference about why the "CReW Test" macros were still in Beta and his response was that he hadn't received any feedback on them from the Alteryx Community.

Did you know?

  • Expect Equals requires every Field to be in the same order?
  • Expect Equals is case sensitive for Field Names?
  • Expect Equals is NOT case sensitive for data?
Are these bugs or are they features?  I don't know.  But I did have fun with some exploration.  I wanted to create a CReW Delta macro.  It would function similarly to the Expect Equal macro in that nothing happens when everything is equal.  Above that, it would output every difference found for every record compared.  It would also provide an error message explaining where the differences were found.

Error: CReW Delta (WIP) (3): There were 1 changed records found.  2 fields had differences:  {Field4, Field5}.  1 unmatched left records and 0 unmatched right records.

In this macro, we compare records and output the Sequence number of each input file along with the field names where differences are found.  The left and right value pairs are easily reviewed from the "Delta" output anchor.

This works wonderfully with 2 records.  We wanted to explore the performance for millions of records.  The number of records along with the number of fields compared will create challenges for computing resources.  We experimented with 2 approaches for comparing the data and the results may surprise you.

Approach #1:  Do it like the CReW Expect Equal Macro

Transpose the data into Name + Value pairs and JOIN the results.  When they match, do nothing and when they don't, then they are errors.

Pros & Cons:

  • You will be creating one (1) record for every field in the incoming data times the number of incoming records time two (2).  Comparing 2 million records with 20 fields will result in 80 million comparisons.  This seems to be a Con.
  • It is very easy to configure this macro.  This seems to be a Pro.
  • We don't have an interface tool to allow you to select which fields to compare.  This was a con for our CReW student.

Approach #2:  Use MD5 Hash on each record and compare the hash results

For each "selected" field on the incoming records, HASH the selected fields and compare the Hash value in a join that requires NO sort. Where the expected values differ, they are errors.

Pros & Cons:

  • You will be comparing one field for every record.  This seems to be a Pro.
  • It is complex to create the dynamic formula to create the HASH function.  This seems to be a Con (for our maintenance).
  • We included an interface tool to allow you to select which fields to compare.  This was a pro.


With 2 million input records (4 fields) approach #1 took roughly 10 seconds to complete.  With the same input, approach #2 took roughly a minute to complete.  Why did it take so long?  Because the conversion of all fields to strings and the creation of the MD5 Hash was an expensive function to call.  Sorting and handling small records was FASTER than creating the dynamic formula and hash.

Please take a look at the macro and let us know your thoughts.  I appreciate your feedback and want to know if there are features or functions that you would want in this macro if we do decide to make it into a future release.

Macro Location



Wednesday 13 June 2018

CReW - Who would cross the Bridge of Death must answer me these questions three

#Alteryx18 marked the beginning of a journey for me.  I've long been on this quest, but now take the lead as Adam focuses his efforts on family and Alteryx e2.  My first CReW action was the creation of new SWAG.  These die-cut stickers were gobbled up in Anaheim and I've reserved more for my friends at #Alteryx18 in Europe.

Second on my list of accomplishments was the assumption of the CReW Macro training (only scheduled in Anaheim for #Alteryx18).  Together with Joe Miller from Alteryx we either wowed the attendees or we gave sufficient pleas for good survey results (9.6/10.0).  Putting CReW macros to the "test" and opening eyes about how to create value from these gems really sunk home as users (beginner to expert) were enlightened by their seeming simplicity, but awesome results.  "Oooohs and Aaaahs" abounded as we delighted the students with tips and tricks.  Defensive configuration and use of CReW tools took center stage as Joe provided step by step instructions and I added colour to the session.

Where do we go from here?  I want to hear from you!  What ideas and what needs do you have?  I've created some beta macros (they are in the macro district of the Alteryx gallery), but CReW is more than just one person.  Whether your an Alteryx guru or a newbie I can use your skills.  Macros need naming, icons need designing, and we also need to manage the CReW inventory.  Please let me know your thoughts and availability to join the CReW.  

I'm very honored and excited to carry this torch.  Join me and let's make Adam proud.


Wednesday 2 May 2018

Passing The Baton

The theme of this year’s Inspire conference is “Alter Everything” and Alteryx has truly changed my life, it has brought me new friends, new skills, given me the chance to live on a different continent and through all of that the opportunity to build the product that I love.

But times change, and with that said I am sad to announce that I am stepping down as the curator of the CReW macros. Which if I am honest I think already happened a couple of years ago, it has just taken me this long to realise it and make it official. There is no big reason, my life has just changed: I have a family who I want to spend my free time with and the time I don’t spend with them I want to get out of the house/office and exercise or see friends (and there is this little project at work I am working on that really eats up all of my thinking time right now). So apologies to everyone who has sent me bug fixes or new macros over the last couple of years and joined my list of “things I need to get to” I realise now I am not going to get to that list.

But in thinking of what next for the CReW macros there really was only one person who I could think I might pass the reigns to (that pun just wrote itself). A long term contributor to the macros, a good friend and all round legend of Alteryx, Mr. Mark Frisch or as some of you may know him MarqueeCrew. I will let Mark post a welcome post here soon with his ideas for the macros, but all there is left for me to do is thank Mark for taking over and for all of you for all of your suggestions and help with the macros over the last 4 years.

Hope to see many of you at one of the Inspire conferences this year.

(Former curator of the CReW macros)