Von Arno Bürgin
23 Oktober 2019In the third and last entry of our blog series "Using Alteryx" we would like to introduce you to a common use case in data processing: Linking multiple tables. In Excel you would probably use SVERWEIS (VLOOKUP) - but in Alteryx it's easier. By the way: Blog entry #1 and #2 can be found here:Part 1, Part 2.
Use case: SVERWEIS in Alteryx - simply link tables
The initial situation is quite simple. We have a long table called "Sales.csv" which contains all sales of the past months. Each row corresponds to a transaction and is uniquely identifiable by the variable "SalesID":
A very nice table! For each sale is indicated which product was sold and in what quantity. Unfortunately, however, the invoice amount is missing. So we have no choice but to multiply the quantity ("Quantity") by the unit price. The catch: product prices are kept in a separate table, "Product.csv". So the first thing we need to do is integrate the product prices into the source table, "Sales.csv". A closer look shows that both tables contain the column "ProductID":
In Excel you could copy the price column over using SVERWEIS (VLOOKUP). The equivalent to SVERWEIS in Alteryx is called "Find Replace". The tool is dragged and dropped onto the workspace and connected to the two tables. Then the column to be migrated can be specified:
Second possibility: For the migration of one or more columns - or even a whole table - Alteryx offers the Join tool. In the following illustration, two tables are joined together, but without any fuss, several tables can also be joined together. The only information needed is what the primary and foreign keys of the two tables to be joined are (in our case it is the "ProductID").
Compared to Excel, there are two clear advantages: A lower susceptibility to errors, since no formulas or markings of columns have to be made manually, and furthermore an incredible flexibility in determining the number of columns or tables to be integrated. The latter becomes even more important the more tables are to be processed. Because in databases it usually does not remain with two tables - several tables are related to each other by different foreign keys.
Use case 2.0: Relational data model
In an extended use case, this could look like this, for example:
As you can see, the data model consists of four tables ("Employee.csv", "Sales.csv", "Product.csv" and "ProductCategory.csv"). The primary keys are marked with a golden key symbol, the foreign keys with a silver key symbol. (By the way, we created this entity relationship model in our Innovator modeling tool).
The workflow we will create will end up looking like this:
Step 1
«Sales.csv» read in.
Step 2
Unfortunately, some records are incomplete or have incorrect formats. We exclude records without date and bring all data into the correct format:
Step 3
In the next step, we narrow down the data to January 2018. For this we use the handy filter tool from Alteryx.
Step 4
Clear the stage for the star of the day! The join tool connects two tables in no time at all. Via drag and drop you specify which two tables are to be joined, then you only have to specify which columns represent the keys. In our case this is the column "ProductID" twice, see the following visualization:
Physically, the linking of the data looks like this:
Step 5
The Formula tool can be used to perform an almost unlimited number of calculations. In this case, it remains simple arithmetic - we calculate quantity times price to calculate the revenue for each sale (each "SalesID").
Step 6
The Summarize tool is the tool of choice when it comes to aggregating tables. Two or three clicks and we have calculated the sum of sales for each "SalesPersonID":
Step 7
The result is not bad, but not really exciting. What is still missing are further details about the sales staff; these are stored in the table "Employee.csv". We integrate this table again with the join tool:
As can be seen in the figure, the join is performed using the two fields "SalesPersonID" and "EmployeeID", which have different names but contain the same values, i.e. act as foreign and primary keys respectively.
Step 8
Finally, we sort the table by sales size, so that the salesperson with the highest sales is displayed at the top. We get the following final result:
Done! Best of all, Alteryx makes it easy to automate workflows. In this example, you could run the ranking calculation every week - without any additional effort and even if the data source changes.
Got a taste for it? Here you can download a free trial version of Alteryx and test the tool.
Banian will be happy to answer any questions you may have.