Thursday 17 December 2020

CReW Data Cleanse

Yet another new macro is about to be released!  Alteryx has "AMP"ed up their game and while not all tools are AMP-Compatible, surely more speed is coming.  I've long thought about taking a stab at a CReW Data Cleanse tool, but since the release of an ALTERYX macro that delivers that functionality I have stayed in the peanut gallery.  I'll be releasing a CReW macro before Christmas 2020.  On Friday, December 18th I will surprise the Dallas-Fort Worth User Group (Virtual) with a presentation on the macro instead of (or in addition to) their expected presentation on CReW Generate Dummy Variables.



When you use one (or more) Data Cleanse macros in your workflow, you do sacrifice performance.  Think of it as cleaning the house with a toothbrush and a cup of water.  I am offering you a mop and a bucket.  Make your workflows run faster and give others a chance to use the shared resources.  If you're working with large sets of data, this macro upgrade is essential.

Alteryx Community Post:  https://bit.ly/34mfA0U

Presentation (YouTube) on this macro can be found here:  https://youtu.be/kIDqnwT4g5Q  

Walk-thru (YouTube) of macro can be found here:  https://youtu.be/uA5oyfmE2uc

Cheers,

Mark






Thursday 10 December 2020

Building CReW Generate Dummy Variables

Background:

This story goes back to the 1980's when I was asked to test new bank software and to document what errors I uncovered.  I completed an Error Incident Report and walked that form to the next building for review by the development manager.  Finding errors meant that I got to leave my windowless cubicle and walk in the California sunshine (at least temporarily).  The developers needed to know what process they would need to perform to recreate the incident and what the expected results of such actions should be.  Upon return to my desk, I would have to summarize the days findings and explain back to the project management what issues were found and what their potential impacts would be on the bank's production environment if left unresolved.

The tried and true waterfall development process left testing until after the requirements and design were completed and I eagerly awaited software to find what were sure to be plenty of bugs.  The biggest contention was whether the "expected" results were well defined in the requirements because I would report "errors" that could be considered as missed requirements.

Creating a CReW macro does not include requirement documents, design specifications, use cases, or an approval process.  In my, Mark's, POV I create what I think captures the essence of a design pattern (assembly of Alteryx tools) into a single tool.  Some of these patterns are fairly simple and straight-forward and reduce the time to configure multiple-tools and some patterns are difficult for beginners to gain the concepts for.  In either case, the tool obfuscates the pattern and makes something easier for the user.

As an Alteryx ACE, I'm able to attend developer days where we have an opportunity to share feedback with the development (sometimes with product managers) team and discuss our views and needs as well as to give our feedback on their ideas.  Observant as I am to the NDA, that's where that story ends and this macro blog begins.

Let's talk Cross Tab"

Who doesn't like a Cross Tab tool?  Frankly, I don't know why it is a "Cross Tab" tool and isn't a "Crosstab" tool.  There are more than 200 ideas in the Alteryx Community returned when you query that term where 1/3 of these ideas are authored by Alteryx ACEs.  Just because there are literally hundreds of ideas posted, doesn't mean that the tool is broken.  It means that improvements are requested.  The cross tab tool is at the heart of the create dummy variables process and this process was the topic of my latest rant to Alteryx.

Cross Tab - Areas of potential improvement

Here are a few of my personal grievances:
  • Configuration is not saved as some new tools do
  • Dropdown and select are not searchable
  • Domain/headers replace non-alphanumeric characters with underscore
  • Output columns are in alphabetical order
  • You need data that supports all possible values otherwise Output columns are not generated
  • Downstream tools fail if not run with sufficient data

Cross Tab - Work-arounds

  • Use a find replace tool to prepend a sort sequence to each field value (to change the incoming data) so that the column name sorts data (e.g. 00_Sunday, 01_Monday).  Later I can dynamically rename the columns
  • Use an Ensure Fields (CReW) macro to create needed columns of data (after Cross Tab). 
  • Place a select in front of the cross tab.  Only bring a minimum amount of fields into cross tab.  Join back on either record sequence or recordid depending on need (e.g. use of AMP or if output records aren't 1:1 to input)
  • Sample the output data (0 records) and save to YXDB. This saves the metadata for a good output.  You can later Union this data within the workflow and ensure order and presence of data 

Let's talk Dummy Variables

After my short (not so) rant, I realized that I should take on the challenge to improve Alteryx myself through the creation of the Generate Dummy Variables tool.  It won't improve the Cross Tab, but does address the issues that take the greatest amount of my time when creating workflows that either model or score data.  With 30 or more community posts on the subject, maybe it will help folks in the same boat as I am in.



Defining Scope:

Since I don't apply requirement documents or use cases to my CReW activities, I need to set boundaries for functionality and time to build.  What can I create in a day?  Yes, when I build a macro I hope to have it created in a day.  That day generally spans for more than a week, but that includes blogs, builds, testing, feedback, documentation and icon selection.  

Scope for Dummy Variables:

The scope is for a single variable (despite my knowledge that you may have multiple source fields to convert).  I want to accept categorical data elements and provide the ability to rename the output columns to include the column name.  If possible, I want to add the functionality of an ENSURE FIELDS macro (CReW), but not require the user to have that macro downloaded.

Scope Creep:

That idea to include the variable name as the first part of the output column name (e.g. Color_G) created a thought that a custom prefix might be preferred to the actual field name.  Then I ran into problems with spaces (thank you Cross Tab) in the output column name field.  When I added the ensure fields capability (to pretend that data values not present were accounted for), I had space problems and NULL value issues that needed more creativity.  Finally, I was ready for testing and went to fellow ACE, Dan Languedoc for testing support.  He suggested (insisted) that I allow for INT fields to be supported as categorical variables.  "Fine!", I'll add them.  He later said that my macro was indestructible.

Development:

I was unhappy with my final product.  I felt that it wasn't easy to walk-through to explain to others and didn't feel satisfied with the macro.  I had added functionality and had to ensure compliance in multiple locations of the workflow.  It was not pretty.  So I salvaged what I could and re-wrote the macro.  It took me about an hour to come up with the current version.

Testing:

While I was building the macro, I was actively testing the macro in another workflow.  I was using different configuration settings as well as different incoming data (e.g. things with and without spaces).  When I want to see what's happening inside of the macro, I use a "debug" trick to place a Macro Output tool before and/or after a tool and look at the results in the secondary workflow.  I can see what the data looks like going into and out of a dynamic rename or other tools that are in the middle of the workflow and can quickly reconnect them to another point of interest.

Defensive Configuration or "APA Readiness":

Will this macro fail in the future?  I hope not.  I hope that it works and that I never have to maintain or support it again.  But what might be worse is to have consistently wrong results.  The biggest risk that I saw was dropping or muliplying records.  I use a TEST tool to ensure that the macro input record count exactly matches the output record count.  If you drop a record (e.g. Create Points) and don't let the user know about it, I think that it is an error.  I also added a check that if you ask to ENSURE field valuess that the macro will error if you don't actually input a field value.  Also, if you copy/paste a field value into the ensured fields, if you don't mark it as required, that we will ignore the request.  

Now if the macro fails, we can all blame Dan.  He said that it was good to go.

P.S.:  Here's a link to the macro!

Wednesday 25 November 2020

CReW For Dummies

New CReW macro is "ready". I'll be unveiling it at both the Twin Cities and Indiana user groups on December 10th and 11th (respectively).  Here are the sign-up details (click to register) admission is free, pay at the door. Take the best seat, as we haven't any:

12/10/2020 12:00 pm - 1:30 pm CST
Virtual


12/11/2020 2:30 pm - 4:00 pm EST
Virtually, On24

I've updated this blog post with more hints about the macro.  Because a picture is worth a thousand words, here's a picture of the macro:






If you'd like a further hint:  Here's a blog post from the Alteryx Community that is good reading material in preparation for the release:  https://community.alteryx.com/t5/Data-Science/Encoding-Variables-Translating-Your-Data-so-the-Computer/ba-p/445020

Maybe you can join the virtual meeting and talk to me there?



Cheers,

Mark

Sunday 23 February 2020

10 Year Anniversary


I know I've not been so active on this blog of late, but I realized the other day that it has now been 10 years since I started and I felt I needed a post to at least commenerate that occasion.

And what a decade it has been. In 2010 I was a data analyst and Alteryx user a few months away from heading to a conference which at that point was called Extend The Reach, but in fact turned out to be the first ever Inspire.

Ten years on and I am a Principal Software Engineer working for Alteryx and on my way to my 15th Inspire conference, which with the Analyticon rebranding might turn out to be the very last.

To say this software has changed my life would be an understatement. But 10 years on I still love this product as much as I did when I first started using it. Last week I requalified for my advanced certification and enjoyed the thrill of once more using the product against the clock.

I look forward to catching up with all of you who will be in Sydney this week. I'm still excited as ever about Inspire and the people and challenges I will meet in the solution centre.

Adam

Monday 10 February 2020

Building a Better Record ID tool ...


Whilst visiting the Shambles in York, I was treated to a set of ACE inspired ghosts from my friend and fellow ACE, Rafal Olbert.  We searched the community of ghosts at the Ghost Merchant for the perfect trio of goblins to complete the set.  If you notice the coloring, they closely resemble the ACE logo on Raf's polo.  Who except us would spend the time rummaging through hundreds of ghosts to complete this task?
Now comes the question that you might be asking, "What does this have to do with CReW macros?"  It has everything to do with them.  Often there will be a simple task that has to be configured by many in the community.  The time spent in configuring a task could be better spent elsewhere (e.g. visiting York with friends).

There are multiple posts in the Alteryx Community asking how to get unique record IDs on groups of records.  There are also multiple ideas in the community to modify the functionality of the RecordID tool.  James Dunkerley presented at Inspire 2020 on the magic of modifying XML.  One example that he showed was the adding of the grouping functionality to the Record ID.

I grabbed a copy of his macro and have had it waiting for an opportunity to release it to the CReW user-base.  If you think about the construction of a macro that allows for GROUPING, you'll realize that the dynamic nature of grouping requires you to build a solution that allows for multiple (any) selections of fields from the incoming data stream.  James solved the grouping by modifying the XML in a formula:

IIF(IsEmpty([#1]) OR [#1]='""', "<GroupByFields />", 
'<GroupByFields><Field field=' 
+ Replace([#1], '","', '"/><Field field="')
+ '/></GroupByFields>')+ "/*" + [#1] + "*/"

He's modifying the XML of a standard Multi-Row Formula to update the GROUP BY parameter.  When no fields are selected, the resulting xml (grouping) looks like this:

<GroupByFields />

If the incoming data has three (3) fields selected for grouping, the results need to look like this:

<GroupByFields>
   <Field field="Field1"  />
   <Field field="Field2"  />
   <Field field="Field3"  />
</GroupByFields>

The macro is configured to read the incoming metadata and will populate a LIST BOX interface tool with all of the incoming fields.  When the user selects the fields (e.g. Field1, Field2, Field3), the question results are passed to an ACTION tool as a comma delimited series of values.  James then creates a formula to modify the output to produce the XML component displayed above.

If James had attended the BUILD 2 event in Nashville, this was one of the challenges posed.  He would have won that event.

If you'd like to download a copy of the macro, it is in the Alteryx Gallery and can be found here . Any feedback is welcomed.

Cheers,

Mark

Thursday 30 January 2020

Then Is Not Now


I'm in the U.K. and visiting the Leeds Alteryx User Group.  Just prior to my journey I had generated an idea for Alteryx to improve the generate rows tool:  Please Read & Give Likes.  At the same time I suggested that if you: 1) click on a tool that is on your canvas & 2) go to the pallet and double-click a tool, that the tool will be inserted onto the canvas after the previous tool and that your cursor is moved to the configuration window Please Read & Give Likes.  But now I'm in Manchester and what have I done for CReW?

With a little help from my friends, I decided to tackle joining to date ranges (Tool Mastery: How to Join on Date Ranges).  The question comes up frequently in the community.  What I did was simplify the GENERATE ROWS tool and make it create date or datetime intervals between two end date or datetime points.

If you need to have a record for every day in 2020, you could feed it a start of 2020-01-01 and 2020-12-31 and get 366 output records.  Note:  I just tested and updated the macro.  If you set the TYPE of the output field as datetime and try this, you'll only get 365 records.  Thankfully, I caught this.

So if you want a simple way to generate date records, please look to the Alteryx Gallery for the new macro.  A link is provided below:


Cheers,

Mark