Von Philipp Hirsbrunner
15 Januar 2021The two software manufacturers Alteryx and Snowflake have announced their joint partnership this week. This is reason enough for us to try out the new possibilities right away and see what advantages it brings with it.
As a test case, we want to load sales data into our Snowflake DB on Azure and then work with it there accordingly. For this purpose, we bulk load an Alteryx DB file (yxdb) with 32 million rows to Snowflake.
The ODBC connection is configured in no time and Alteryx already loads the data into our Snowflake.
Alteryx splits the huge DB file into small CSV files and loads them onto the Snowflake. This Process runs fully automatically and improves the performance significantly. This in turn integrates these files into the previously defined stage - in our case also on Snowflake (see below). The stage could also be on Amazon (currently only Amazon or Snowflake are supported by Alteryx). By the way: not to be confused with a DWH stage. With Snowflake, a stage is a storage location for files, which are then integrated into the relational DB.
The whole thing takes a while: Our 33 million lines (21 GB) took a little over 29 minutes with an XL Warehouse. It should be noted that one of the bottlenecks was certainly our Internet connection.
However, the convenience of the connection - in this case upload - already shows an advantage of how Snowflake can be seamlessly integrated into Alteryx, and thus existing workflows. Of course, the same applies when data is to be read from Snowflake.
As soon as the upload is finished, Snowflake starts integrating the corresponding packed CSV files. The workflow in Alteryx remains in the status "running" - until Snowflake reports the finished integration.
After about 45 minutes, Snowflake has integrated all 32 million lines and removed the corresponding source files from the stage.
Now we want to see how we can work with the data directly from Alteryx In-DB.
A great added value of the combination of these two tools should come to bear especially in the application of Data Science and ML workflows. Below is an example of how the data is prepared in Snowflake and then processed using Alteryx ML technology.
32 million lines were loaded from Snowflake within a few seconds and prepared in Alteryx according to my request. Processing on my local machine then took a bit longer - around 90 minutes (Alteryx had to reduce the sample a bit because after 10 GB my memory was full). In the process, the sales data was processed with the help of the Assisted Modeling analyzed and prepared for a suitable ML analysis. In the specific case, a Decision Tree Regression applied. This simplified example is only meant to show one possibility of the interaction between Snowflake and Alteryx. In order to be able to train and test the model, much less data would also be sufficient, which would of course have a positive effect on the processing. In addition, we could process even more activities already in Snowflake, thus we could additionally benefit from the performance there and at the same time from the graphical "programming" in Alteryx.
Conclusion: The integration in Alteryx is very easy and works right away. It brings enormous advantages in processing large amounts of data and enables the use of Snwoflake USPs (warehouse scaling, data integration, MPP, data sharing, etc.) directly from Alteryx. We are excited to see how the integration will continue and how the worlds will become even more connected.
Bilder-Gallery