Wednesday 3 August 2022

CReW SHA256 Macro: Enter the Python

 Did CReW just freeze over?

CReW SHA256 Macro

This wasn't the first rodeo for me, but it is the first time that I've publicly created a macro that uses the python tool in Alteryx.  I pride myself in doing things with BASE-A (100% Alteryx) whenever possible.  That happens virtually 100% of the time.  But until Alteryx adds SHA256 (or other encryption) to their products (note:  a FIPS version of Alteryx is available help.alteryx.com), this macro will get SHA256 encryption on your current (2022.2) version.

Where is the macro?

This link is at the top of the article for those who don't care about the making of the macro....


How does it work?

Select a single data field (String Type) and it will be SHA256 encrypted as:  [SHA256_Output].  There are warnings if EMPTY data is present or if the incoming data is NOT lowercase (the macro will automagically lowercase the input field).  If you don't want these features turned-on, you can disable them in the macro configuration.

The entire record is read and output by the macro.  Only the SHA256Input field is passed into a python tool which uses HASHLIB and encrypts your field.  If the incoming data is Empty(), it is bypassed.  If you have lots of duplicate data, you might want to UNIQUE that data and only pass the unique values through the tool.  It runs about 2,000,000 records per minute on my computer (with AMP on).  Without AMP, it runs about 50% slower for this function.  

Memory and AMP and Python can lead to shortages of memory.  Use caution.  If you experience memory-related issues, you might want to keep things simple.  I ran 200 million records through this process in less than 1.5 hours while testing.

What's inside the box?

Input:

Your incoming data is routed through a common RecordID tool and the entire record circumvents the encryption process and awaits the results in a JOIN tool (append to end of record).

Process-Prepare for Encryption

A formula tool assigns the selected (Interface Drop-Down) string field into the __SHA256Input__ field.  Conversion warnings (MESSAGE) are issued if the input field is Empty().  Conversion warnings are also issued if the input field contains uppercase letters.  The input field is converted to lowercase() for all records.  If these function offend you, you can deselect the execution of the preparation functions in the configuration (via CheckBox that updates the DETOUR).

Python - SHA256 Encryption

Empty data is filtered and then unioned to the results of the encryption.  Data then goes into the Python tool and the following script is executed (please note that the script is saved as a comment inside of a disabled container):

#################################
from ayx import Package
from ayx import Alteryx
import hashlib
#################################
df = Alteryx.read('#1')
#################################
def text_to_sha256(text):
    h=hashlib.new('sha256')
    h.update(text.encode('ascii'))
    return h.hexdigest()
#################################
df['SHA256_Output'] = df.apply(lambda row: text_to_sha256(row['__SHA256Input__']),axis=1)
#################################
Alteryx.write(df,1)

The python tool reads #1 input and outputs via #1.  The imported library, hashlib, is used to created a function that converts text to sha256.  A dataframe for output uses a lambda function to parse the data from __SHA256Input__ and creates your SHA256_Output field.

In an effort to minimize the python use of memory (as tested python uses 50% more memory than Alteryx was consuming), only the RecordID and SHA256Input are passed through this tool.  The tool is configured in "Production Mode".  A copy of the script is saved as a comment as friends of mine tell me is a good practice (sometimes it is lost otherwise).

Error Checking:

What could go wrong?  I can't imagine records being lost in the join, but should it happen there are two (2) CReW Expect 0 macros waiting to throw error messages if any unjoined data tries to escape the macro.


In Closing:

It has been a while since I've posted.  The community gallery is where I've been stashing macros.  Every once in a while I hear questions about the use of CReW while using AMP.  On my to-do list is to look through the parallel-BUD tool and modify it to be AMP compatible.  In reading community posts it seems that it might not work without compatibility mode enabled.  If you have any CReW or AMP feedback/concerns (related to CReW), please let us know.

A special note of thanks to Alex Gross (AYX Community @grossal), ACE and friend who helped to get through some python woes.

Cheers,

Mark


No comments:

Post a Comment