Automated data retrieval from the Internet

Author Image

Von Dominik Imark

26 Juli 2021

This use case is about downloading available information from web pages, aggregating it and uploading it in this processed form to an SFTP server, saving it as an Excel or XML file. In step 1, the download of any number of the latest reports is discussed. Then, in step 2, the processing is shown and finally, in step 3, the flexible storage of the prepared report is discussed. However, this type of data retrieval is not only suitable for the financial sector. In principle, the automation of any repetitive process where data is to be obtained from the Internet or other sources is possible.

Schritt 1 - Download von SIX Trade Repository Weekly Statistics Daten

Excel files are identified from the HTML code of the web page and downloaded. The exact file to be searched for is predefined in Alteryx. In the download tool (light green field with lightning) the HTML code is downloaded. Since the Excel files are stored as links, "RegEx" is used to filter the HTML file for all Internet links that contain "weekly reports" and the Excel file format ".xlsx". In order to use only the newest files, these are additionally sorted by newness. Now a dynamically selected number of the newest Excel files is downloaded from Alteryx and stored locally, i.e. on the own system.

The entire process can also be easily packaged into an app and made available to other business users. All that needs to be done is to set the desired number of recent files and the storage path. The advantages of such an app are that the user can use an automated process without having to program anything himself and only has to adapt it minimally to his needs.

Step 2 - Processing of raw data and output as file with sending via SFTP or output as XML

The challenge in combining the files into one overview is that each SIX Excel file consists of three tables, which are placed one below the other. These tables have to be rearranged to generate an overview of all weeks. Lists with only the required information and without formatting are easier for the whole process in the further processing of the data. Basically, a list is created here as the user wants it.

To begin, the files downloaded in step 1 are opened. Alteryx's many pre-built data preparation tools make it easy to sort, filter and rearrange data. Instead of listing values such as "Commodities" or "Interest Rates" repeatedly for each category and again per week, they are aggregated and rearranged to create an overview across all weeks.

One of the great strengths of Alteryx is the convenient preparation of data in automated pipelines, such as rearranging and aggregating the various reports into a summary weekly overview. In addition, Alteryx also offers various options for accessing numerous file and storage types.

Step 3 - Deposit types

In step 3, the storage is explained on three storage types. First, the table can be created on an SFTP server. For this purpose, a specific macro was created for this use case, which takes the file as input and then stores it on the server. A macro allows pipelines or even Python /R code to be packaged and made available in a user-friendly way.

A second possibility for storage is the XML output. Again, a macro is used that wraps the underlying Alteryx workflow, thus simplifying the application or increasing clarity. This macro, like many specialized macros, has been shared by other Alteryx users in the Alteryx community for general use. The active Alteryx community offers not only specially made macros and bug-fixing discussions, but also learning paths and many other materials related to Alteryx.  

Finally, one last and also the easiest way to save the week summary table as Excel. This is done in one step with an output tool.

Data processing - dynamic automation from the source to the filing system

Alteryx offers very suitable software for process automation. This type of data collection can be applied in any area of a business activity. Manual processes that take a lot of time become minimally time-consuming tasks. This results in more efficient operation and more capacity for other tasks. The dynamics of data processing allows for the use of sources other than the Internet. Thus, for every industry and business sector, this process automation is a means of increasing efficiency.

We at Banian AG are happy to help you with an initial consultation on Alteryx or any other topic. Contact us by phone +41 (0)61 551 00 12, per Mail or with a message on LinkedIn.


Bilder-Gallery

Content


Twitter


Address

St. Jakobs-Strasse 3, 4052 Basel, CH

Phone number

+41 (0)61 551 0012

Linkedin

banian-ag

We look forward to hearing from you