RPA Tools in the Data Science Arsenal
Most data scientists are familiar with the popular programming languages of the day. Some of these, like Python, are more than capable of running automated tasks such as sending emails or modifying Excel files. This raises the question of why we would bother with Robotic Process Automation (RPA) tools such as UiPath or Blue Prism.
There are two key reasons for using RPA tools like UiPath or Blue Prism over the more conventional approach of running scheduled scripts:
Ease of use: RPA tools like UiPath have a user-friendly interface that allows even non-technical users to create and deploy bots. This is in contrast to traditional programming, which requires a higher level of technical expertise and coding skills.
Scalability: RPA bots can be easily replicated and deployed to perform the same task across different systems, making them highly scalable. This is especially useful for businesses that need to automate processes across multiple departments or locations.
In this blog post I will show how to use Python scripts within a UiPath workflow that reads, writes, and creates Excel files, moves files between folders, runs a Python machine learning model, and finally sends out an email.
Background on the bot implementation
The rapid spread of the COVID-19 pandemic can be attributed, in part, to the limited availability of test kits in the early stages of the outbreak. This meant that asymptomatic patients were able to go undetected, leading to a significant increase in cases.
As testing capacity grew, test kits became more widely available and cheaper. However, the late implementation of these measures came at a high cost, with over 6 million deaths caused by the virus to date.
One potential solution that could have been implemented in the early stages of the pandemic was the use of machine learning (ML) models to detect possible COVID-19 cases through chest X-ray scans. This method is relatively fast and inexpensive compared to PCR testing for COVID-19 (Meng & Liu, 2020).
By incorporating robotic process automation (RPA), a system could be developed to quickly and efficiently detect potentially positive patients on a large scale in the event of future novel coronavirus outbreaks.
The machine learning model used for this project was developed using X-ray scans obtained from a Kaggle dataset made available by a collaborative team of researchers from Qatar, Bangladesh, Malaysia, and Pakistan whose research involves the detection of COVID-19 using Artificial Intelligence (AI) (Chowdhury et al., 2020; Rahman et al., 2021).
Moving the X-ray Images to the staging folder
The RPA system begins by reading in the list of patients whose X-ray scans require scoring. The Excel file contains the names of the patients, their details, and the file names of their corresponding scans.
The patient information is then read in as a Data Table in the UiPath sequence using the Read Range activity within the Excel Application Scope activity. The Copy File activity is then used to locate the necessary X-ray images and copy them over to the machine learning staging folder.
The For Each Row activity is used to iterate over the rows of the Data Table. For each row, the value of the image_file_name column is used to locate the patient's X-ray scan and copy it to the machine learning staging folder.
The Overwrite option was selected so that, if a patient has to redo their scans, the most recent scan overwrites the older one. Once all the files are in the staging area, we can begin the next phase of the process.
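For readers who think in Python rather than UiPath activities, the staging step above can be sketched roughly as follows. This is only an illustration of the logic, not part of the UiPath workflow: the function name stage_scans and its parameters are hypothetical, and the list of file names is assumed to have already been read from the image_file_name column.

```python
import shutil
from pathlib import Path


def stage_scans(file_names, source_dir, staging_dir):
    """Copy each listed scan into the staging folder, replacing older copies.

    Hypothetical helper: mirrors For Each Row + Copy File with Overwrite on.
    Returns the names that were copied and the names that were not found.
    """
    source_dir, staging_dir = Path(source_dir), Path(staging_dir)
    staging_dir.mkdir(parents=True, exist_ok=True)

    copied, missing = [], []
    for name in file_names:
        src = source_dir / name
        if src.is_file():
            # shutil.copy2 replaces an existing file with the same name,
            # which is the behaviour the Overwrite option gives in UiPath.
            shutil.copy2(src, staging_dir / name)
            copied.append(name)
        else:
            missing.append(name)
    return copied, missing
```

Returning the missing names separately is a design choice of this sketch, so that a scan listed in the Excel file but absent from the X-ray directory is reported instead of stopping the whole run.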
Scoring X-ray images using Python
The staging directory for the machine learning model contains the pickled file of the deep learning model used to score the X-ray images. It also contains the Python script needed to run the scoring process and write the scoring results to an Excel file.
The RPA process introduces 7 new files in the directory in the form of the patients' X-ray scans that were copied from the X-ray machine directory and 2 Excel files which contain the results of scoring those scans.
To execute the Python script, the UiPath package UiPath.Python.Activities v1.6.0 was installed. This package provides Python activities such as loading Python modules and invoking methods within those modules. Figure 5 shows the UiPath activities used for executing the Python script main.py within the machine learning directory.
The Python Scope activity is used as the container for all the activities that use the Python modules. Its properties require the path to the Python executable as well as the working directory path of the main.py module. The scope contains 3 main activities that relate to executing the Python code needed to score the X-ray images that were copied into the directory.
The first is Load Python Script, which loads the Python code into UiPath as a Python object and stores it in the variable pyScript. The second is Invoke Python Method, which takes the variable pyScript from the previous activity, invokes a function or class within the loaded module, and stores the output in the Python object variable pyOutput.
The final activity, Get Python Object, converts the Python object variable pyOutput into the UiPath string variable output. The string stored in output is then written to the UiPath Output panel using the Write Line activity to indicate that the process ran successfully.
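To make the hand-off between UiPath and Python more concrete, here is a minimal sketch of the kind of top-level function that Invoke Python Method could call. It is not the actual main.py from the project: the function name score_folder, the placeholder_predict stand-in for the pickled model, and the list of image suffixes are all assumptions made for illustration, and the sketch leaves out writing the results to Excel.

```python
from pathlib import Path

# Assumed image extensions; the real project may use others.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def placeholder_predict(image_path):
    """Stand-in for the real deep learning model; always returns 'normal'."""
    return "normal"


def score_folder(staging_dir, predict=placeholder_predict):
    """Score every image in staging_dir and return a one-line summary.

    Hypothetical entry point: UiPath would pass staging_dir as an input
    parameter and receive the returned string as the Python object.
    """
    images = sorted(
        p for p in Path(staging_dir).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    )
    labels = {p.name: predict(p) for p in images}
    flagged = sum(1 for label in labels.values() if label == "covid")
    return f"Scored {len(labels)} image(s); {flagged} flagged as covid"
```

Returning a plain string keeps the UiPath side simple, since Get Python Object can convert it directly into a String variable for Write Line.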
Outlook Email automation using UiPath
Once the scoring process has completed, the RPA system loads the scoring results from the Excel file and stores them as a data table in the variable dtScoreResults. This data table is joined with the patient information data table via the image_name column to create a single table that contains both the patient information and the scoring results.
The joined data table is stored in a new variable called dtScoredPatients, which is used in the subsequent processes downstream. Next, the RPA system asks the technical staff member whether they wish to continue with emailing the attending physician about the patients who were flagged as at risk.
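The join itself is easier to picture in code. The sketch below treats each data table as a list of row dictionaries and performs an inner join on the image file name. The function name join_on_image is hypothetical, the two key column names are taken from the text of this post, and the choice of an inner join is an assumption, since the join type used in the workflow is not stated here.

```python
def join_on_image(patients, scores,
                  patient_key="image_file_name", score_key="image_name"):
    """Inner-join two lists of row dicts on their image file name columns.

    Hypothetical helper: rows without a matching score are dropped.
    Both key columns are kept in the joined rows.
    """
    # Index the score rows by image name for constant-time lookups.
    scores_by_image = {row[score_key]: row for row in scores}

    joined = []
    for patient in patients:
        score = scores_by_image.get(patient[patient_key])
        if score is not None:
            joined.append({**patient, **score})
    return joined
```

Because both key columns survive the join, the result carries the image file name twice, which is why one of the two columns can be removed later.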
Should the user choose the option “No”, the RPA system will skip the emailing process and move on to the next phase. If the user chooses the option “Yes”, the system will filter for the patients labelled as ‘covid’ in the data table and send the attending physician an email for each of those patients along with their corresponding X-ray scan.
This allows the physician to assess the patient and gives the physician the final say on determining whether the patient has COVID or not.
Two If conditions are used. The first checks the user input from the dialog box and, if the output was “Yes”, carries out the activities inside its sequence. The second If condition is nested within a For Each Row loop over the data table and is used to filter for patients who are labelled as ‘covid’.
The filtered patient information then gets inserted into the email body within the properties of the Send Outlook Mail Message activity. The X-ray scan image associated with the patient is also attached to the email.
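The two nested conditions can be sketched in Python as follows. This is not how the workflow sends mail (that is handled by Send Outlook Mail Message); the sketch only builds the messages using the standard library so the control flow is visible. The function name build_alert_emails, the row keys name, label, and image_file_name, and the wording of the email are all assumptions for illustration.

```python
import mimetypes
from email.message import EmailMessage
from pathlib import Path


def build_alert_emails(rows, image_dir, sender, physician, user_choice):
    """Build one email per patient labelled 'covid', with the scan attached.

    Hypothetical helper: returns a list of EmailMessage objects and does
    not send anything.
    """
    # First condition: respect the Yes/No answer from the dialog box.
    if user_choice != "Yes":
        return []

    messages = []
    for row in rows:
        # Second condition: only patients flagged by the model.
        if row["label"] != "covid":
            continue

        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = physician
        msg["Subject"] = f"X-ray flagged for review: {row['name']}"
        msg.set_content(
            f"Patient {row['name']} was labelled 'covid' by the model.\n"
            "Please review the attached scan before making a diagnosis."
        )

        # Attach the scan, guessing its MIME type from the file name.
        image_path = Path(image_dir) / row["image_file_name"]
        mime_type, _ = mimetypes.guess_type(image_path.name)
        maintype, subtype = (mime_type or "application/octet-stream").split("/", 1)
        msg.add_attachment(
            image_path.read_bytes(),
            maintype=maintype,
            subtype=subtype,
            filename=image_path.name,
        )
        messages.append(msg)
    return messages
```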
Data Cleaning and Excel File Manipulation
The final phase of the RPA process involves cleaning the joined data table to remove the redundant column image_file_name that was created by joining the two data tables in the previous process.
This is done using the Remove Data Column activity, which takes in the column name and the data table variable. The data table is then written to the path in the machine learning directory that is stored in the variable strJoinTablePath.
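As a rough Python equivalent of this cleanup step, the sketch below drops one column from a list of row dictionaries and writes the result to disk. The function name drop_column_and_save is hypothetical, and the sketch writes a CSV file rather than an Excel file purely to stay within the standard library.

```python
import csv


def drop_column_and_save(rows, column, out_path):
    """Remove one column from every row and write the rows to a CSV file.

    Hypothetical helper: mirrors Remove Data Column followed by a write.
    Returns the cleaned rows so they can be reused downstream.
    """
    cleaned = [{k: v for k, v in row.items() if k != column} for row in rows]
    fieldnames = list(cleaned[0].keys()) if cleaned else []

    # newline="" stops the csv module from adding blank lines on Windows.
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(cleaned)
    return cleaned
```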
RPA provides a much-needed solution for a healthcare sector where demand keeps growing as the population rises, but it also raises ethical concerns. The main issue is the impact of RPA on the employability of human labour as more of the workforce is automated.
However, such concerns are less pressing in the healthcare sector, where most hospital staff pull double duty by attending to both patients and administrative work. Introducing an RPA system that takes on the administrative load would allow staff to focus more on providing care to patients.
RPA is not a substitute for good medical care and attention but is well suited to complement it.
Sources
GitLab - https://gitlab.com/mohdshah27/x-ray-classification-with-pytorch
Kaggle - https://www.kaggle.com/datasets/tawsifurrahman/covid19-radiography-database