By Kirsten Kleim
07 June 2021
This article describes the use of the "Analytical Process Automation" (APA) software Alteryx from the perspective of a Data Scientist: using a Python plugin in Alteryx, a simple neural network for image recognition is trained and then made available to business users as a macro.
Computer vision is a well-known subfield of artificial intelligence that relies on neural networks. It deals with tasks such as recognizing the content of images or even videos. The hypothetical image recognition use case demonstrated here comes from this field:
"A supermarket has decided to save customers from memorizing fruit and vegetable identification numbers and wants to develop a supermarket scale with integrated image recognition. This would involve holding the fruit in front of a camera after weighing it, which would then recognize the fruit and determine the correct price."
Two types of fruit were selected for an initial trial: apples and oranges. The "Fruits 360" dataset [4], which is available on Kaggle and contains images of a wide variety of fruits and vegetables, can serve as training data.
Training data is a fundamental element when training neural networks. Neural networks are often perceived as intelligent, but this perception can be quite misleading. Unlike humans, a neural network cannot learn from an explanation of what knowledge is needed, nor can it "intuitively understand" a task. Instead, it needs example data and, for each example, the exact solution (the label) it is supposed to predict.
For the intelligent supermarket scale for apples and oranges, this means the following input data set:
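How each example gets paired with its label can be sketched in a few lines of Python. The sketch assumes the Fruits 360 layout of one sub-folder per class containing .jpg files; the folder names and the mapping onto the two labels "Apple" and "Orange" are illustrative assumptions, not the author's actual data preparation.

```python
from pathlib import Path

def collect_labelled_images(root, folder_to_label):
    """Return (image_path, label) pairs for all .jpg files in the selected class folders.

    Assumes one sub-folder per class below `root`, as in the Fruits 360 dataset.
    `folder_to_label` maps a folder name to the label the network should predict;
    folders not listed in the mapping are ignored.
    """
    pairs = []
    for folder, label in folder_to_label.items():
        # sorted() keeps the order reproducible across runs and operating systems
        for image in sorted((Path(root) / folder).glob("*.jpg")):
            pairs.append((str(image), label))
    return pairs

# Hypothetical usage; adjust the path and folder names to the local copy of the dataset:
# pairs = collect_labelled_images("fruits-360/Training",
#                                 {"Apple Braeburn": "Apple", "Orange": "Orange"})
```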
The amount of training data required can also be surprisingly large.
For problems of typical complexity, several hundred to several thousand examples per class are recommended as training data.
In this highly simplified use case, where the color alone already clearly indicates the answer, good results can be achieved even with less data (here, 50 oranges and 50 apples were used). However, once shadows, varying lighting or background elements come into play, more data would yield better results here as well.
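Drawing a small, balanced sample such as 50 apples and 50 oranges needs only the standard library. The helpers below are hypothetical and assume the (path, label) pair format; a fixed seed makes the selection reproducible. The sketch also holds back part of each class as a validation set. This is common practice rather than something the article describes, but it lets "trained to satisfaction" be judged on images the network has not seen.

```python
import random
from collections import defaultdict

def group_by_label(pairs):
    """Group (path, label) pairs into a dict: label -> list of pairs."""
    groups = defaultdict(list)
    for path, label in pairs:
        groups[label].append((path, label))
    return groups

def balanced_subset(pairs, per_class, seed=42):
    """Randomly pick `per_class` examples of every label (reproducible via `seed`)."""
    rng = random.Random(seed)
    subset = []
    for label, items in sorted(group_by_label(pairs).items()):
        if len(items) < per_class:
            raise ValueError(f"only {len(items)} images for {label!r}, need {per_class}")
        subset.extend(rng.sample(items, per_class))
    return subset

def stratified_split(pairs, val_fraction=0.2, seed=42):
    """Split into (train, validation) while keeping the class proportions equal."""
    rng = random.Random(seed)
    train, val = [], []
    for _, items in sorted(group_by_label(pairs).items()):
        items = items[:]                      # copy before shuffling in place
        rng.shuffle(items)
        n_val = round(len(items) * val_fraction)
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val
```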
Training
Many Data Scientists use R or Python code to train neural networks. Alteryx provides plugins that conveniently transfer data between an Alteryx workflow and a coding environment such as a Jupyter notebook. Python code can therefore either be written directly in Alteryx in an embedded Jupyter notebook, or the neural network can be trained in the preferred environment and then loaded into the Alteryx Python or R plugin.
In this example, a convolutional neural network (CNN) was developed and trained in Python using Keras. Alternatively, R could be used, and Alteryx also offers its own tools for neural networks.
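For readers who want to see what a small CNN of this kind can look like in Keras, here is a minimal sketch. The layer sizes, the 100x100 RGB input (the image size used in Fruits 360) and the label order are assumptions for illustration; the article does not describe the author's actual architecture.

```python
from tensorflow import keras

IMG_SIZE = (100, 100)                 # assumed input size (Fruits 360 images are 100x100)
CLASS_NAMES = ["Apple", "Orange"]     # hypothetical index-to-label order

def build_fruit_cnn(num_classes=len(CLASS_NAMES)):
    """Build and compile a small illustrative CNN for fruit classification."""
    model = keras.Sequential([
        keras.Input(shape=IMG_SIZE + (3,)),               # RGB image
        keras.layers.Rescaling(1.0 / 255),                # pixel values 0..255 -> 0..1
        keras.layers.Conv2D(16, 3, activation="relu"),    # 16 learned 3x3 filters
        keras.layers.MaxPooling2D(),                      # halve the spatial resolution
        keras.layers.Conv2D(32, 3, activation="relu"),
        keras.layers.MaxPooling2D(),
        keras.layers.Flatten(),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(num_classes, activation="softmax"),  # one probability per class
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",  # labels as integer class indices
                  metrics=["accuracy"])
    return model
```

Training could then load labelled image folders with, for example, `keras.utils.image_dataset_from_directory(..., image_size=IMG_SIZE)`, which infers integer labels from the alphabetically sorted sub-folder names (so "Apple" maps to 0 and "Orange" to 1), and call `model.fit(train_ds, validation_data=val_ds, epochs=10)`; the epoch count is only a placeholder.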
Those interested in the technical details of how neural networks work can get a good insight from MIT's introductory lectures on Deep Learning: https://www.youtube.com/watch?v=5tvmMX8r_OM
Once a neural network has been trained to satisfaction, it can be stored for future use.
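Storing the network can be sketched with Keras's `model.save` and `keras.models.load_model`. Because the network itself only outputs class indices, the sketch also writes the label names to a small JSON file next to the model. The file names and this bundle layout are hypothetical, and the `.keras` format assumes a recent TensorFlow/Keras version.

```python
import json
import os
from tensorflow import keras

MODEL_FILE = "fruit_cnn.keras"        # hypothetical file names
LABELS_FILE = "class_names.json"

def save_model_bundle(model, class_names, directory):
    """Save the trained model plus the index-to-label order it was trained with."""
    os.makedirs(directory, exist_ok=True)
    model.save(os.path.join(directory, MODEL_FILE))    # architecture and weights
    with open(os.path.join(directory, LABELS_FILE), "w", encoding="utf-8") as f:
        json.dump(list(class_names), f)

def load_model_bundle(directory):
    """Reload model and label names, e.g. later inside the Alteryx Python tool."""
    model = keras.models.load_model(os.path.join(directory, MODEL_FILE))
    with open(os.path.join(directory, LABELS_FILE), encoding="utf-8") as f:
        class_names = json.load(f)
    return model, class_names
```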
Using Alteryx macros, a trained neural network can be wrapped with corresponding input and output elements and made available to business users as an application. In this example, an input anchor receives the paths of the images to be classified, and an output anchor returns those image paths together with the trained model's predictions. Because the mode is set to "APPLY", the neural network is only loaded and applied, not retrained.
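The APPLY path of such a macro can be sketched as a plain function that receives the incoming table of image paths and returns it with the predictions added. The column names `ImagePath`, `Prediction` and `Probability` and the `predict_proba` callable are hypothetical. Inside the Alteryx Python tool, the table would typically be read with `Alteryx.read("#1")` from the `ayx` package and passed on with `Alteryx.write(df, 1)`, as indicated in the comments.

```python
import pandas as pd

def apply_model(paths_df: pd.DataFrame, predict_proba, class_names,
                path_col="ImagePath") -> pd.DataFrame:
    """APPLY mode: add the predicted label and its probability to every image path.

    `predict_proba(path)` must return one probability per class, in the order of
    `class_names`; the model is only used for prediction here, never retrained.
    """
    out = paths_df.copy()                      # leave the incoming table untouched
    probabilities = [list(predict_proba(p)) for p in out[path_col]]
    best = [max(range(len(p)), key=p.__getitem__) for p in probabilities]  # argmax
    out["Prediction"] = [class_names[i] for i in best]
    out["Probability"] = [float(p[i]) for p, i in zip(probabilities, best)]
    return out

# Hypothetical wiring inside the Alteryx Python tool (not runnable outside Alteryx):
# from ayx import Alteryx
# from tensorflow import keras
# model = keras.models.load_model("fruit_cnn.keras")
# def predict_proba(path):
#     img = keras.utils.load_img(path, target_size=(100, 100))
#     batch = keras.utils.img_to_array(img)[None, ...]   # shape (1, 100, 100, 3)
#     return model.predict(batch, verbose=0)[0]
# Alteryx.write(apply_model(Alteryx.read("#1"), predict_proba, ["Apple", "Orange"]), 1)
```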
The ease of packaging code developed in Python or R as an Alteryx macro facilitates collaboration between Data Scientists and business users.
The next article explains in more detail which specifics can matter when applying neural networks.
Outlook:
Article 3: The use of the Alteryx software from the point of view of a business user. Once Data Scientists have produced a neural network, Alteryx enables other business users to apply such ready-made neural networks themselves in an elegant way, without having to delve into Python or R code. In addition, an example application illustrates the limitations of AI.
Article 4: In which areas can SMEs leverage the strengths of AI? A typical implementation process is presented.
Further reading:
If you are interested in the details of how neural networks work, the MIT introductory lectures on Deep Learning (https://www.youtube.com/watch?v=5tvmMX8r_OM) give good insight.
For a broader introduction to processing data with AI, the Alteryx Community offers a Data Science learning path.
References:
[4] Muresan, H., Oltean, M. "Fruit recognition from images using deep learning", 2018, in Acta Univ. Sapientiae, Informatica, Vol. 10, No. 1, pp. 26–42. [online] Available: https://arxiv.org/abs/1712.00580.