Reimagining Redundancy in a Collaborative Real-Time Machine Learning System
November 10, 2020
“Do you want cheese on your cheeseburger?” The following is a summary of a recent white paper titled “New Ideas and Emergent Results (NIER),” researched and written by Md Monir Hossain, Alireza Goli, Hamzeh Khazaei, Dmitriy Volinskiy, and Omid Ardakanian (as part of ATB’s research aimed at building a real-time machine learning system). The paper explores the concept of “redundancy” in designing this system.
Redundancy in the software and information domain used to be mostly about a system’s fault tolerance. Fast forward to the present day, however, and the increasing availability and affordability of cloud computing has changed many design paradigms and led to the proliferation of software systems running on cloud providers’ “everything-as-a-service” platforms. Such systems can no longer be vertically architected and centrally managed.
On one hand, this shift provides a blissful new sense of dependability as, arguably, design diversity becomes both deeply intrinsic and multi-faceted. On the other hand, the typically loose coupling of system components creates new challenges: the state the system is in, the total number of its states, and other metrics turn from “uncertain” quantities into “unknown” ones. This is where reimagining redundancy becomes critical. The very meaning of certain key redundancy notions begins to change as the system becomes a fluid agglomeration of modules built in-house, cloud services, APIs, distributed storage, etc.
At ATB, this shift in how we need to view and embed redundancy meant exploring the potential of incorporating machine learning to ensure redundancy in an ever-changing environment. Thus, at the center of the New Ideas and Emergent Results (NIER) paper is the design of a real-time machine learning system which we dub “collaborative.” Similar to how a social network enables its participants to interact in multiple planes, components of our system listen and/or publish to a variety of topics on a high-throughput pub/sub messaging bus. This leads to a de facto complete decoupling, as any given component has no inherent knowledge of the existence, kind, or state of any other component except the messaging bus. There is neither coordination nor any direct information flow between any set of components. Hence a cheeseburger needs to know if it already has cheese or not (or, in ATB’s sense, a __ needs to ___).
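The decoupling described above can be illustrated with a toy, in-memory stand-in for a pub/sub bus. This is only a sketch under stated assumptions: the `MessageBus` class, the "transactions" topic, and the "scorer" and "archiver" components are hypothetical names chosen for illustration, not parts of ATB’s system, which the paper describes only as a high-throughput messaging bus.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List, Tuple

Handler = Callable[[Any], None]


class MessageBus:
    """Toy in-memory pub/sub bus (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # A component registers interest in a topic, not in another component.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Any) -> None:
        # The publisher does not know who, if anyone, is listening.
        for handler in list(self._subscribers.get(topic, [])):
            handler(message)


bus = MessageBus()
received: List[Tuple[str, Any]] = []

# Two independent components; each knows only the bus and a topic name.
bus.subscribe("transactions", lambda msg: received.append(("scorer", msg)))
bus.subscribe("transactions", lambda msg: received.append(("archiver", msg)))

bus.publish("transactions", {"id": 1, "amount": 42.0})
bus.publish("unwatched-topic", {"id": 2})  # silently dropped: no subscribers
```

Neither subscriber is aware of the other, and the publisher is aware of neither. Two components doing equivalent work on the same topic is exactly the kind of redundancy the rest of this summary is concerned with.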
A key design feature this seemingly primordial architecture yields is the ease with which an author can connect their component of arbitrary design to the system, and how the system can naturally containerize its connected software. The system thus grows and optimizes itself, and develops more functionality – not according to a centrally provided blueprint – but as a result of collaboration of multiple contributing authors (hence the “collaborative” nomenclature piece). As it’s highly likely that multiple authors will supply multiple equivalent solutions, redundancy comes to the forefront.
To create the New Ideas and Emergent Results paper, ATB conducted two mini-studies: one related to software redundancy and a second discussing certain aspects of information redundancy. On the software redundancy front, the team considered a case of two databases, of very different kinds, storing and retrieving identical information from the messaging bus.
Not only do we comment on the setup needed to achieve this, given that the information requestor knows nothing about the existence, nature, or query syntax of the databases, but we also consider how one can piggyback off this redundancy to handle data requests intelligently given the requestor’s preference for either low latency or data consistency. The information redundancy addressed in the second study arises because our real-time system has no facility to synchronize or dispatch data flows in a particular way.
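To make the latency-versus-consistency idea concrete, here is a minimal sketch of how a request handler might pick between two redundant stores. Everything in it is an assumption made for illustration: the `Store` class, the `choose_store` function, the two store kinds, and the latency figures are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Store:
    """Hypothetical description of one of the redundant data stores."""
    name: str
    typical_latency_ms: float      # illustrative figure, not a measurement
    strongly_consistent: bool
    data: Dict[str, str] = field(default_factory=dict)

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)


def choose_store(stores: List[Store], preference: str) -> Store:
    """Pick a store based on the requestor's stated preference."""
    if preference == "consistency":
        # Prefer strongly consistent stores; fall back to all if none exist.
        candidates = [s for s in stores if s.strongly_consistent] or stores
    elif preference == "latency":
        candidates = stores
    else:
        raise ValueError(f"unknown preference: {preference!r}")
    # Among the candidates, take the one expected to answer fastest.
    return min(candidates, key=lambda s: s.typical_latency_ms)


# Two stores of different kinds holding the same key (assumed examples).
cache = Store("in_memory_cache", 1.0, False, {"balance:42": "100.00"})
ledger = Store("relational_db", 20.0, True, {"balance:42": "100.00"})

fast = choose_store([cache, ledger], "latency")
safe = choose_store([cache, ledger], "consistency")
```

The requestor states only a preference; the choice of store, and therefore its query syntax, stays hidden behind the handler, in keeping with the decoupling described above.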
Uncurated data gets released into the system the moment it becomes available, which may lead to it appearing with delays or in bursts. This could wreak havoc on machine learning models deployed on the orchestration level of the system, as the models use time series data summarized over various time windows and have no way of telling an artifact due to a delay or burst from a meaningful change in the data-generating process.
One technique we consider to remediate this is to employ artificial streams of data, which are regularized and blended (and de-blended, too) with the real data whenever the handler detects an irregularity in the respective data stream.
This collaborative real-time machine learning system is being built as we speak; the techniques we’re discovering and the observations we continue to make will still need to mature substantially. Nonetheless, we believe that the present paper will be instrumental to practitioners in the field, hopefully shining new light on familiar concepts as they relate to redundancy in software systems and creating conversations and momentum with a broader community of talented minds along the way.
That way, our systems will never need to ask “do you want cheese with your algo-cheeseburger?”
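One possible reading of this blending idea is sketched below. It is an assumption-laden illustration, not the paper’s method: the `Point` class, the `blend` and `deblend` functions, the choice of carrying the last value forward as the artificial stream, and the averaging of bursts are all hypothetical choices.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Point:
    t: int            # start of the time interval
    value: float
    synthetic: bool   # True if the value came from the artificial stream


def blend(real: List[Tuple[int, float]], start: int, end: int, step: int) -> List[Point]:
    """Emit one point per interval [t, t + step) between start and end.

    Bursts (several arrivals in one interval) are averaged; gaps are filled
    with the last seen value and flagged as synthetic (an assumed strategy).
    """
    ordered = sorted(real)
    out: List[Point] = []
    last_value: Optional[float] = None
    i = 0
    for t in range(start, end, step):
        bucket: List[float] = []
        while i < len(ordered) and ordered[i][0] < t + step:
            if ordered[i][0] >= t:   # ignore arrivals before the window start
                bucket.append(ordered[i][1])
            i += 1
        if bucket:
            last_value = fmean(bucket)
            out.append(Point(t, last_value, synthetic=False))
        elif last_value is not None:
            out.append(Point(t, last_value, synthetic=True))
        # With no earlier value there is nothing to synthesize from; skip.
    return out


def deblend(points: List[Point]) -> List[Point]:
    """Recover only the points backed by real data."""
    return [p for p in points if not p.synthetic]


# Readings arrive at t=0 and t=1, go silent, then burst at t=4.
arrivals = [(0, 10.0), (1, 12.0), (4, 20.0), (4, 22.0), (4, 24.0)]
blended = blend(arrivals, start=0, end=5, step=1)
real_only = deblend(blended)
```

Because every filled-in point carries the `synthetic` flag, a downstream model sees an evenly spaced series while the real observations remain recoverable.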