Quick time-to-value data science: Delivering something functioning now over something perfect later
October 6, 2020
At a large organization in an established industry, there will always be “organizational inertia”: that is to say, it’s hard to adapt and respond to new threats (or opportunities) when the organization is busy staying focused on its day-to-day operations and delivering on expectations. The adage “if it’s not broken, don’t fix it” often reigns supreme. Change is hard, and staying focused is key, especially when people’s finances, businesses, and homes are at stake. In recent years, however, the banking industry in Canada hit an inflection point. Cloud computing has become cheap enough that it is now less expensive to rent secure space on a server than to buy and house data stores on premises, to say nothing of the reduced risk of data loss that comes with the redundancy built into cloud storage.
The cost of building artificial intelligence algorithms for generating predictions has also dropped to the point where data science enthusiasts can compete in Kaggle competitions using nothing but a laptop and some Google Cloud Platform credits. This drastic decrease in the cost of prediction has triggered a fundamental shift in the banking industry, and it is rapidly disrupting the business models of financial institutions of all sizes across Canada. Small fintechs, digital-only banks, and other new forms of banking services are popping up all across North America, eroding the margins of traditional financial institutions.
For data scientists within these larger traditional institutions, the task of transforming internal processes and deploying data science-driven products is more important than ever. But how does one convince management that data science is worth the up-front investment? How can you get executives to take risks in an inherently risk-averse and regulation-bound industry?
The answer lies in the age-old sales technique of getting a foot in the door. In other words, quick time-to-value (TTV) proofs of concept are critical for showcasing the power of artificial intelligence and machine learning-driven products. If a data science team can rapidly demonstrate the value of a new process or tool, management has little choice but to adopt and implement it to gain the upper hand over the competition. Those who turn a blind eye to the value proposition will ultimately have their lunch eaten by startups that are using the same techniques to provide the banking services of the 21st century.
At ATB, quick TTV data science projects are the lifeblood of the Enterprise Data Science (EDS) group. By working with other business units and applying data science techniques to solve banking problems faster and better than before, the EDS team is changing the way ATB makes banking work for Albertans.
One area ripe for value generation is Natural Language Processing (NLP). This is a fancy way of saying “taking what a human says, and having a machine interpret and act on it”. Since almost every stage of the banking process involves communication and interpretation, NLP can be leveraged to enhance existing processes, making them faster and more efficient without changing the basic principle of the service.
Case Study 1: Enterprise Data Science, Chatbots, and Stacking Micro-Efficiencies.
Consider the EDS Chatbot Program, code-named EDGuR1.
Team members at ATB have a number of monthly administrative tasks to complete as part of internal record-keeping: filling out timesheets, submitting vacation requests, finding required websites, resetting passwords, looking up internal acronyms, onboarding new staff; the list goes on. In any given month, these tasks can quickly add up and account for a significant percentage of a team member’s time, time that could otherwise be spent on the core work that provides value to the company and our customers. To reduce the time burden of these tasks, the EDS team leveraged the Google Natural Language and Hangouts Chat Application Programming Interfaces (APIs) to build a “helper chatbot” named EDGuR1. This chatbot is a “go-fetch” style bot developed within the Google Cloud Platform environment, with a collective memory bank of all of the easy-to-describe (but hard-to-find) links and resources that EDS team members need to access every month.
Within about two weeks, EDS had developed a fully functioning chatbot that cost only a few dollars per month to operate.
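ATB hasn’t published EDGuR1’s internals, but the “go-fetch” pattern itself is easy to sketch. The example below assumes a small Flask service registered as the bot’s Hangouts Chat webhook, and it substitutes plain keyword matching for the intent detection the Google Natural Language API would provide; the resource names and URLs are hypothetical placeholders, not EDGuR1’s actual memory bank.

```python
# A minimal "go-fetch" bot sketch: a Flask webhook for Hangouts Chat that answers
# with the best-matching link. Keyword matching stands in for the Google Natural
# Language API; RESOURCES below is a hypothetical memory bank, not EDGuR1's.
from flask import Flask, request, jsonify

app = Flask(__name__)

RESOURCES = {
    "timesheet": "https://example.internal/timesheets",        # placeholder URL
    "vacation": "https://example.internal/vacation-requests",  # placeholder URL
    "password": "https://example.internal/password-reset",     # placeholder URL
}

@app.route("/", methods=["POST"])
def on_event():
    """Handle a Hangouts Chat event and reply with a link from the memory bank."""
    event = request.get_json()
    if event.get("type") != "MESSAGE":
        return jsonify({})  # ignore added-to-room and other non-message events
    query = event["message"]["text"].lower()
    for keyword, link in RESOURCES.items():
        if keyword in query:
            return jsonify({"text": f"Here you go: {link}"})
    return jsonify({"text": "Sorry, I don't know that one yet."})

if __name__ == "__main__":
    app.run(port=8080)
```

Deployed on something like Cloud Run or App Engine, a service this small sits comfortably within the “few dollars per month” operating cost described above.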
This chatbot creates business value by stacking “micro-efficiencies.” For example, it often takes a user about a minute to find a rarely used link they’re searching for (look in history… no, wait, go to the homepage and try to search… no, wait, maybe it’s in bookmarks…). Ask the EDGuR1 chatbot in a Hangouts Chat window instead, and the link comes back almost instantaneously. Every interaction saves one full-time team member a “minute.” That isn’t too impressive by itself. But scale that micro-efficiency out to the 30 or so team members in EDS, multiply it by 3 or 4 interactions a day, and suddenly we are talking about thousands of dollars a month in savings, against an operational cost of only a few dollars a month.
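As a back-of-the-envelope check on that claim, the arithmetic looks roughly like this; the team size and interaction counts come from the paragraph above, while the loaded hourly cost is a hypothetical placeholder.

```python
# Rough micro-efficiency math; HOURLY_COST is an illustrative assumption.
TEAM_MEMBERS = 30           # EDS team members using the bot
INTERACTIONS_PER_DAY = 4    # link lookups per member per day
MINUTES_SAVED = 1           # time saved per lookup
WORK_DAYS_PER_MONTH = 21
HOURLY_COST = 60            # hypothetical fully loaded cost per team member, $/hour

hours_saved = TEAM_MEMBERS * INTERACTIONS_PER_DAY * MINUTES_SAVED * WORK_DAYS_PER_MONTH / 60
print(f"{hours_saved:.0f} hours saved, roughly ${hours_saved * HOURLY_COST:,.0f} per month")
# -> 42 hours saved, roughly $2,520 per month, against a few dollars of operating cost
```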
Of course, this isn’t a fully conversant general artificial intelligence system that can help us solve the meaning of life, but it’s not supposed to be. It’s simply an administrative assistant in a chat window, for any team member who would love an assistant but can’t afford one. It’s also quickly scalable: with the click of a button, anyone at ATB can add EDGuR1 and tap into its value-generating potential at zero added cost. The chatbot proof-of-concept program has since spawned almost half a dozen other internal chatbots in other groups, each stacking its own micro-efficiencies to create tens of thousands of dollars in cost savings for the organization at large.
The time-to-value here is also incredible: in mere weeks ATB had a fully deployed and operational “conversational AI” project. Not only is there direct, measurable business value; there is also untold value in the learnings about chatbots, Google Cloud infrastructure, and rapid, low-risk product deployment. A rapidly spun-up chatbot assistant is a quick and easy way to deploy a data science solution in a large organization without spooking anyone, creating a tool that can be scaled almost for free, with minimal effort.
Case Study 2: Reducing Human Error and Enhancing Outcomes.
The next use case for quick time-to-value data science involves leveraging open source packages for Optical Character Recognition (OCR). Like any bank, ATB has manual processes associated with loan maintenance for business customers. One of these processes involves a dozen full-time employees receiving emails of scanned financial statements from business clients, opening those files, scrolling through large documents until they find a specific element (a table, a company name, a target date), and then extracting those values and manually entering them into a spreadsheet so that the next month’s loan value can be calculated.
With thousands of loans of this nature, the workload is significant. And humans make mistakes: reading a number correctly doesn’t mean it will get entered into the spreadsheet correctly. To enhance this process and super-charge the abilities of the staff involved, the Operation Automate team joined up with the data scientists in EDS to automate the mundane and error-prone tasks in the value-generation chain. This automation frees up team members to spend more time helping our customers in person, developing deeper relationships through one-on-one interactions, something a robot cannot do.
To automate this process, EDS began by leveraging the open-source Tesseract package (https://en.wikipedia.org/wiki/Tesseract_(software)) to extract the relevant text from the scanned financial statements. Before text can be extracted, several machine-vision techniques are required to “clean” each image, such as deskewing algorithms and custom denoising methods, along with some machine-learning-based classification of page types. Once the text is extracted from the documents, the easy part is over. Now comes the messy part: what the OCR portion of the software “stack” generates is a multi-layer nested data structure of text strings, text-box coordinates, page numbers, and often a handful of misidentified characters per processed page.
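The exact preprocessing pipeline isn’t public, but the common open-source pattern looks something like the sketch below: OpenCV for denoising and a minAreaRect-based deskew, then pytesseract’s image_to_data() to produce the word-level structure described above. The tuning values and file name are illustrative.

```python
# A sketch of the OCR "cleaning" step using OpenCV and pytesseract; parameters
# and the deskew heuristic are illustrative, not ATB's production values.
import cv2
import numpy as np
import pytesseract

def deskew(gray):
    """Estimate the page's skew angle from the ink pixels and rotate to correct it."""
    coords = np.column_stack(np.where(gray < 128)).astype(np.float32)  # dark pixels
    angle = cv2.minAreaRect(coords)[-1]
    angle = -(90 + angle) if angle < -45 else -angle  # normalize minAreaRect's angle
    h, w = gray.shape
    m = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0)
    return cv2.warpAffine(gray, m, (w, h), flags=cv2.INTER_CUBIC,
                          borderMode=cv2.BORDER_REPLICATE)

def ocr_page(path):
    """Denoise and deskew one scanned page, then return Tesseract's word-level output."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    gray = cv2.fastNlMeansDenoising(gray, h=30)  # remove scanner speckle
    gray = deskew(gray)
    # A dict of parallel lists: word text, bounding boxes, page/block/line numbers,
    # and per-word confidences: the "multi-layer nested data structure" above.
    return pytesseract.image_to_data(gray, output_type=pytesseract.Output.DICT)

data = ocr_page("statement_page_1.png")  # hypothetical scanned page
print(list(zip(data["text"][:10], data["conf"][:10])))
```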
To make sense of this data landfill, the data scientists in EDS created custom algorithms for table extraction, table structure verification, company name detection, and statement date detection. Getting this software to function efficiently required the “science” part of data science: lots of research, followed by lots of data analysis and processing, plus a bit of trial and error to get the algorithms tuned right. Once the data science stack for value extraction was built and tested for accuracy, the development arm of EDS was brought in to build a fully automated system that takes care of document ingestion and data movement. After the relevant data is extracted, an Operation Automate team robot picks up the outputs from the OCR stack and automatically calculates and updates the business client’s new loan balance.
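The production extraction algorithms aren’t described in detail, but statement date detection, for example, can be sketched as reassembling Tesseract’s word-level output into lines and scanning them for a date pattern; the regular expression and sample data below are illustrative only.

```python
# Illustrative statement-date detector over pytesseract's image_to_data() dict;
# the production EDS algorithms are more involved than this sketch.
import re
from datetime import datetime

DATE_PATTERN = re.compile(
    r"(January|February|March|April|May|June|July|August|September|October|November|December)"
    r"\s+\d{1,2},?\s+\d{4}",
    re.IGNORECASE,
)

def detect_statement_date(data):
    """Rebuild each OCR line from its words, then return the first date-like value found."""
    lines = {}
    for word, block, line in zip(data["text"], data["block_num"], data["line_num"]):
        if word.strip():
            lines.setdefault((block, line), []).append(word)
    for key in sorted(lines):
        match = DATE_PATTERN.search(" ".join(lines[key]))
        if match:
            return datetime.strptime(match.group(0).replace(",", ""), "%B %d %Y").date()
    return None

sample = {  # toy stand-in for a real image_to_data() result
    "text": ["ACME", "Ltd.", "Statement", "Date:", "June", "30,", "2020"],
    "block_num": [1, 1, 2, 2, 2, 2, 2],
    "line_num": [1, 1, 1, 1, 1, 1, 1],
}
print(detect_statement_date(sample))  # 2020-06-30
```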
All told, it took about five months to develop a fully automated stack that exists entirely within the cloud. This Robotic Process Automation (RPA) stack needs only to receive a financial statement in a cloud dropbox location. From there, all of the hard work of reading the document, extracting the relevant business information, verifying the information, and updating the customer’s loan is automated.
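The article doesn’t specify how the ingestion trigger is wired up, but if the “cloud dropbox location” is a Cloud Storage bucket, the entry point can be sketched as a background Cloud Function; the downstream helpers named in the comments are hypothetical.

```python
# A sketch of the ingestion trigger, assuming a first-generation Python Cloud
# Function fired by google.storage.object.finalize; run_ocr_stack() and
# update_loan_balance() are hypothetical names for the downstream stages.
def on_statement_upload(event, context):
    """Kick off the automated pipeline when a scanned statement lands in the bucket."""
    bucket, name = event["bucket"], event["name"]
    if not name.lower().endswith((".pdf", ".png", ".tif")):
        return  # ignore anything that isn't a scanned statement
    print(f"New statement received: gs://{bucket}/{name}")
    # extracted = run_ocr_stack(bucket, name)   # OCR + table/name/date extraction
    # update_loan_balance(extracted)            # handed to the Operation Automate robot
```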
Of course, the process is still overseen by humans who validate the outputs and check for accuracy, but once it has proven trustworthy there is no reason not to give the RPA stack the reins and route only the messy cases that fail the process to a human for investigation. This type of solution would have cost over a million dollars for an external vendor to create; instead, it was built in-house for a fraction of the cost, with all of the expertise and intellectual property staying within ATB.
Closing thoughts: The Balance of Relevance and Value.
The trade-off between speed, cost, efficiency, and value is a delicate line to walk. Take too long, and the business will lose interest. Cost too much, and the business will reject it. Deliver too little value, and the business will lose faith. Fall short on efficiency, and the business will ask “why bother?” When done properly, however, quick time-to-value data science can be immensely transformational. Management and executives will see business value delivered at a fraction of the previous cost and (hopefully) say “we get it now.” By getting a foot in the door and showing other business units the power of data science, EDS is transforming the way ATB provides banking services to Albertans, using a low-risk, fail-fast process for delivering quick time-to-value solutions.