Friday, 18 October 2019

Run away!

A quick note as #Alteryx19 closed in London, and one that should make for a great weekend: I've been told by Alteryx support that the latest release of Designer 2019.3 (version 2019.3.5.17947) fixes the CReW macro issue.



Thursday, 5 September 2019

2019.3 Caution

There are reported issues with the runner macros with 2019.3. Alteryx has been alerted to the issue. At this time I recommend reverting to 2019.2 until Alteryx has an update for us. If you're not using the runner macros, enjoy the many new features.

For more information please reference the linked Alteryx Community article.



Friday, 30 August 2019

Dynamic Formula: A performance use case

In the Alteryx Community a post caught my attention.  The member wanted to perform calculations based upon the previous field (column).  For the purpose of this post, I've modified the use case as follows:

Within my Alteryx workflow I need to calculate the difference between each numeric column of data (e.g. monthly data) and create "delta" columns that hold that difference.  Success depends both on not needing to update the workflow as columns change (i.e. dynamic calculations) and on the performance of the solution (not needing to transpose the data).
My friends Thales and Esther (@EstherB47) recommended transposing the data.

When dealing with this kind of situation, the best you could do is to transpose your columns into rows.
This way you can work with Multi-Row Formula, that can be much more flexible in regards to what you're looking for.

So basically use Transpose Tool and you'll have a better situation to compare previous records against the current ones.
Not me!  I know how to avoid the Transpose tool and how to dynamically calculate formulas.  Try this yourself and don't use a Transpose tool or Multi-Field Formula.  Imagine hundreds of columns with a thousand or more rows.  Here's some simple input data:

This is the desired output:

The solution looks like this:

Step 1:  Use a SELECT tool to limit your input to just the "KEY" field(s) plus the numeric fields used in the calculation.  I assume that the key is STRING.
Step 2:  Use a FIELD INFO tool to get the metadata (names & types) of incoming data.
Step 3:  Use a FILTER tool to remove any non-numeric fields.  If the KEY is numeric, we're going to have a problem.  You'd need to use a SAMPLE tool to skip N records (N is the number of key fields).
Step 4:  Configure a FORMULA tool to create "Output Field" names for each of the new fields.
Step 5:  Configure a MULTI-ROW Formula tool to create an expression of:
"[" + ToString([Name]) + "] - [" + ToString([Row-1:Name]) + "]"
Step 6:  Use a SAMPLE tool to remove the Delta calculation from the first record.
Step 7:  Configure the CReW Dynamic Formula tool.
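For readers who think in code, the steps above can be sketched in plain Python.  This is a hypothetical, simplified stand-in for the workflow; the key and month column names are invented:

```python
# Build "delta" columns between each pair of adjacent numeric columns
# without transposing the data.  Field names are made up for illustration.
rows = [
    {"Key": "A", "Jan": 100, "Feb": 120, "Mar": 90},
    {"Key": "B", "Jan": 50,  "Feb": 40,  "Mar": 70},
]

# Steps 2-3: collect the numeric field names, in order (the STRING key drops out).
numeric_fields = [f for f in rows[0] if isinstance(rows[0][f], (int, float))]

# Steps 4-6: pair each numeric field with its predecessor (no delta for the first).
for row in rows:
    for prev, cur in zip(numeric_fields, numeric_fields[1:]):
        row["Delta_" + cur] = row[cur] - row[prev]
```

Like the workflow, nothing needs to change when columns are added or renamed: the field list is discovered at run time rather than hard-coded.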

The workflow has been saved within the Community post  (above) and a link is available to the yxmd file (here).

This might make for a clever CReW macro addition.  Is there a volunteer to prototype a generic macro for the community?



Thursday, 20 June 2019

#Alteryx19 - Nashville: We turned it up to Eleven!

Can you CReW with the best?

As part of #Alteryx19 we conducted not one, but two sold out CReW macro training classes.  With the assistance of @TessaEnns we demonstrated how to gain productivity through the use of CReW macros.  We delved into some alternative methods for getting to the same results, how to open and investigate the macros along with how to get involved with CReW.

Tessa drove and I was the navigator and tour guide for the sessions.  There were some twists and turns as we fielded questions from the class participants.  We explored existing and new use cases that included questions like:  "How can you make the DISTANCE tool execute faster when applying TomTom drive time distances?"  Our answers included these options:

  • Don't use it when you've previously calculated that distance (save your results).
  • Overlay a grid on your catchment (trade) area and calculate the distances/times from the center of each grid cell rather than from the specific household address.
  • Amp up your memory.

Upon return from Nashville, we've added 5 short YouTube videos explaining the following macros:

If you haven't yet subscribed to my (MarqueeCrew) YouTube channel, please do so.

There's been lots of discussion about incorporating CReW functionality into the Alteryx product.  It's no secret that many of the macros would benefit the product and make it easier on new and existing members of #AlterNation.  These ideas have been officially posted to Community and your vote (star) couldn't hurt:

Elevate Crew Macro Pack support from Community based to Officially Supported
Can we add some of the CReW tools and JDunkerly formulae to Alteryx

Even if Alteryx brings all of the macros to life within the product, we'll likely still be here as a proving ground for new innovation.  As a Minimum Viable Product (MVP) or better, the CReW macros assist in making Alteryx more useful and easier to configure.

Hidden in plain view (at the end of this blog) is a note of caution.  It has come to our attention that the Runner Macros do not run on computers with Windows 10 Enterprise.  If you have overcome this obstacle, please let me know.  There are several posts on community about this and we don't currently have plans to address the functionality.

At Inspire, as part of Build2, we discussed the creation of new macros.  Maybe you've already created one of these macros:

  • Summary Append:  Merge the functionality of a Summary and Join tool to apply grouping logic and put the result of the summation (e.g. Total by Group) on each incoming record.
  • Record-Row ID:  Extend the RecordID tool to allow the ID to reset to a starting value (e.g. 1) for every record group or set.
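Both ideas are easy to prototype outside Alteryx.  Here is a minimal Python sketch of what each proposed macro would do; the record layout and field names are invented:

```python
from collections import defaultdict
from itertools import count

records = [
    {"Group": "X", "Amount": 10},
    {"Group": "X", "Amount": 15},
    {"Group": "Y", "Amount": 7},
]

# Summary Append: total Amount by Group, placed on every incoming record.
totals = defaultdict(int)
for r in records:
    totals[r["Group"]] += r["Amount"]
for r in records:
    r["GroupTotal"] = totals[r["Group"]]

# Record-Row ID: a RecordID that resets to 1 for each group.
counters = defaultdict(lambda: count(1))
for r in records:
    r["RowID"] = next(counters[r["Group"]])
```

Note that the per-group counters mean the input does not even need to be sorted by group for the ID to reset correctly.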
If you have feedback on these or other ideas, please comment in the blog or reach out to me directly.



Monday, 14 January 2019

Block All Records

Hold the presses!  The first Alteryx Certified Expert, Jesse Clark, has contributed to the CReW macros.  There is simplicity here for all to learn from.  But first, why block all records?  I know that I’ve used the function many times, but how about you?  Have you ever wanted to stop all data from being output?  Essentially, you want to sample zero records.  Oh! That makes everything so clear.  Maybe if I suggest a use case, this will make more sense.

You have an input dataset and want to make sure that the output of your process (e.g. macro) contains all of the fields in the same order as the original file.  You could use the Block All Records macro and stream that into a UNION tool.  Then set the UNION to read #1 first.  It is then up to you whether you will auto-configure by Name or manually configure the data.  The bottom line is that the incoming data fields are in the same order on output.
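The same idea translates outside Alteryx too.  In this minimal Python sketch (field names invented), the input schema is captured up front and the processed records are re-emitted in that order, which is what streaming zero records into the UNION accomplishes:

```python
# Capture the field order of the original input (the "Block All Records" stream).
original_fields = ["Key", "Jan", "Feb", "Mar"]

# Downstream processing may return the same fields in a different order.
processed = [{"Feb": 120, "Key": "A", "Mar": 90, "Jan": 100}]

# Union-by-name equivalent: re-emit each record in the original field order.
output = [{f: rec[f] for f in original_fields} for rec in processed]
```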

Another example use case is where you have a list of files that you’ve already read.  If you read this list into a process looking for new files (JOIN with Left output), you might one day want to ignore all read files and rerun the whole set of data.  If you Block All Records, then you’ll end up processing all data as new.

Thanks Jesse for the contribution!




Monday, 5 November 2018

Trick or Treat:  Find Closest 

I've been tinkering with spatial requirements from clients and have found myself inside of a rabbit hole.  The result was the creation of a new macro: Find Closest.
Unlike the Distance tool, the Find Closest macro calculates distances without the need for a spatial object.  In speed comparisons, this macro is 100 times faster than the Distance tool.  This macro also differs from the Alteryx Find Nearest tool in that it does not use the spatial or drivetime functions for calculating distances.  Underneath the covers it uses the Haversine formula to calculate distances on a sphere.  Although the earth is slightly flattened rather than a perfect sphere, the formula does very well for distance calculations and is virtually exact at trade-area scale.  I've shown this macro and formula to a few ACE friends, @andy_moncla and @Treyson, for their opinions and have had positive feedback.  I am going to release it via the Alteryx Gallery as a CReW macro with the caveat that the distances calculated may vary from those using the Distance tool.

This macro is simple, but offers features of “Distance”, “Find Nearest” and “Create Points” tools in one macro.  While reviewing a workflow with a large set of customers and potential store locations, I was hoping to improve the performance of the functions without the use of drivetime data.  The client could then take the potential matches through an API call to gather drivetime data (if needed) for the subset of “best” records.  Starting with a latitude and longitude I was pleased to see the performance gains from a single formula tool.  After validation of the results I wanted to make a general-purpose tool for the calculation of distances.  With their needs in mind (note: The engagement had already been completed) I started to think of how other analysts may benefit from a tool like this.  Creation of the first prototype took roughly 4 hours to build, test and review.  After that, I tinkered with more user options and the creation of a macro image. 

The find closest macro calculates the distance between two sets of latitude & longitude coordinates.  Given an input stream of Target and Source locations it will calculate the distance between all combinations of Targets and Sources.  Optionally, it can accept a stream of data that contains both the source and target coordinates (for distance calculations only).  When calculating via the dual inputs, it allows the analyst to limit the distance output (matches) and can (if required downstream) create spatial objects.

  • Inputs
    • Allows for either 1 or 2 input data streams
  • Distance Configuration
    • Ignore 0 Distance Matches
  • Find Closest Matches
    • Maximum Distance (Kilometre or Mile)
    • Limit by quantity
  • Spatial Objects
    • Optional creation of spatial object(s)
  • Speed.  Tested on 1 million records, this macro runs in under 3 seconds, compared to over 12 minutes using spatial tools.
  • Spatial Data.  Because the distances are only "as the crow flies", no spatial data is required (Find Nearest requires TomTom data).
  • Create Points.  A Create Points tool is NOT required.  An option to create Target, Source or Both points is available if needed downstream.
  • Output Miles.  Both Kilometre and Mile distances are displayed on output.
If you take a look under the covers of this macro, you will find the Haversine formula implemented using native mathematic expressions.  Detours are used to avoid unnecessary functions, and as a last-minute addition I added a formula tool to create spatial objects (if someone really wants them on output).  If you haven’t discovered it yet, you can perform many spatial functions from within the formula tool.  Please check out my YouTube video that demonstrates the raw power of this elegant beast in computing distances.  If you haven’t yet subscribed to my YouTube channel, please do consider it.
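For the curious, here is the Haversine formula the macro is built on, sketched in Python.  The function name and radius constant are my own choices; the macro itself implements this with native Alteryx formula expressions:

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    r = 3958.8  # mean Earth radius in miles (the sphere assumption noted above)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))
```

Because the earth is modeled as a sphere, results differ from ellipsoidal distances by only a small fraction of a percent, which is negligible at trade-area scale.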

As always, your comments and feedback are welcomed.  You can download the macro here.

Saturday, 7 July 2018

CReW Needs Your Opinion

What's your opinion man?

While conducting CReW macro training at Inspire18 with Joe Miller, we revealed several common use cases where the "Expect Equal (Beta)" macro was a solution.  In fact, Joe Miller has begun incorporating the macro into his weekly challenges as a way of confirming the challenge results.  We got a question from the class about this macro which I'm about to share:  "Can you set which fields to check between the two files (instead of checking all fields)?"

What are your thoughts?  My response was to "Select" the same fields from both sets of data for comparison prior to the macro.  Adam was in the room; I had asked him prior to the conference why the "CReW Test" macros were still in Beta, and his response was that he hadn't received any feedback on them from the Alteryx Community.

Did you know?

  • Expect Equals requires every Field to be in the same order?
  • Expect Equals is case sensitive for Field Names?
  • Expect Equals is NOT case sensitive for data?
Are these bugs or are they features?  I don't know.  But I did have fun with some exploration.  I wanted to create a CReW Delta macro.  It would function similarly to the Expect Equal macro in that nothing happens when everything is equal.  Above that, it would output every difference found for every record compared.  It would also provide an error message explaining where the differences were found.

Error: CReW Delta (WIP) (3): There were 1 changed records found.  2 fields had differences:  {Field4, Field5}.  1 unmatched left records and 0 unmatched right records.

In this macro, we compare records and output the Sequence number of each input file along with the field names where differences are found.  The left and right value pairs are easily reviewed from the "Delta" output anchor.

This works wonderfully with 2 records.  We wanted to explore the performance for millions of records.  The number of records along with the number of fields compared will create challenges for computing resources.  We experimented with 2 approaches for comparing the data and the results may surprise you.

Approach #1:  Do it like the CReW Expect Equal Macro

Transpose the data into Name + Value pairs and JOIN the results.  When they match, do nothing and when they don't, then they are errors.

Pros & Cons:

  • You will be creating one (1) record for every field in the incoming data, times the number of incoming records, times two (2).  Comparing 2 million records with 20 fields will result in 80 million name/value records.  This seems to be a Con.
  • It is very easy to configure this macro.  This seems to be a Pro.
  • We don't have an interface tool to allow you to select which fields to compare.  This was a con for our CReW student.
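A minimal Python sketch of Approach #1 (the record layouts are invented): transpose each record into (row, field) → value pairs on both sides, join on (row, field), and keep only the pairs whose values differ.

```python
left  = [{"Field4": 1, "Field5": "a"}, {"Field4": 2, "Field5": "b"}]
right = [{"Field4": 1, "Field5": "a"}, {"Field4": 2, "Field5": "c"}]

# Transpose: one (row, field) -> value entry per field, per record, per side.
left_pairs  = {(i, f): v for i, rec in enumerate(left)  for f, v in rec.items()}
right_pairs = {(i, f): v for i, rec in enumerate(right) for f, v in rec.items()}

# Join on (row, field); matching values fall away, differences are the errors.
deltas = [(key, left_pairs[key], right_pairs[key])
          for key in left_pairs
          if key in right_pairs and left_pairs[key] != right_pairs[key]]
```

The pair dictionaries make the Con above concrete: records × fields × 2 entries are materialized before a single comparison happens.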

Approach #2:  Use MD5 Hash on each record and compare the hash results

HASH the "selected" fields on each incoming record and compare the hash values in a join that requires NO sort.  Where the hash values differ, the records are errors.

Pros & Cons:

  • You will be comparing one field for every record.  This seems to be a Pro.
  • It is complex to create the dynamic formula to create the HASH function.  This seems to be a Con (for our maintenance).
  • We included an interface tool to allow you to select which fields to compare.  This was a pro.
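And a minimal Python sketch of Approach #2 (Python's hashlib stands in for the MD5 function; the record layouts are invented): hash the selected fields of each record, then compare one hash per record instead of one value per field.

```python
import hashlib

def record_hash(record, fields):
    """MD5 of the selected fields, stringified and joined with a separator."""
    joined = "\x1f".join(str(record[f]) for f in fields)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

left  = [{"ID": 1, "Field4": 10, "Field5": "a"}]
right = [{"ID": 1, "Field4": 10, "Field5": "b"}]

fields = ["Field4", "Field5"]  # the analyst-selected comparison fields
mismatches = [l["ID"] for l, r in zip(left, right)
              if record_hash(l, fields) != record_hash(r, fields)]
```

The separator character guards against two different field splits producing the same concatenated string; the string conversion and hashing per record are exactly the costs discussed below.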


With 2 million input records (4 fields), approach #1 took roughly 10 seconds to complete.  With the same input, approach #2 took roughly a minute to complete.  Why did it take so long?  The conversion of all fields to strings and the creation of the MD5 hash were expensive operations.  Sorting and handling the small name/value records was FASTER than creating the dynamic formula and hash.

Please take a look at the macro and let us know your thoughts.  I appreciate your feedback and want to know if there are features or functions that you would want in this macro if we do decide to make it into a future release.

Macro Location