OSFF Insights: How can modern database software simplify reporting?

Written by Jeremy Taylor | 11/20/24 10:45 PM

Data challenges are universal and grow more complex each year, yet I’ve observed that many financial services firms still lack open source software capabilities to address basic data quality and accuracy challenges. Conversations with fellow delegates at the recent OSFF NYC further affirmed my belief that common reporting and regulatory compliance requirements are forcing software developers to re-implement the same concepts ad hoc - such as soft deletes, audit logging, and as-of queries - in almost every system they build, regardless of scale. Beyond the sheer inefficiency, this implies that firms are becoming overly reliant on large (and expensive) engineering teams to build and maintain these capabilities correctly. Let’s consider how open source software might rise to meet this challenge.

Author: Jeremy Taylor, XTDB Head of Product, JUXT (a Grid Dynamics company)

Software Waste

“Billions of dollars in wasted engineering effort is spent on needless maintenance of private forks of open source software” - this was a rather jaw-dropping estimate mentioned during a FINOS-led roundtable session hosted at BMO’s office ahead of the main OSFF conference. The estimate undoubtedly covers vast numbers of frontend libraries on the one hand, and a great many backend data infrastructure technologies on the other. From a cost-savings standpoint alone there is considerable potential, not to mention the many other upsides of reducing the amount of work happening behind the firewall and increasing the work happening in public, upstream development communities. Incidentally, Kelsey Hightower’s keynote talk illustrated this point superbly.

The FINOS researchers studying open source forks have focussed on a very narrow kind of waste, but at JUXT we often look much more broadly: How many systems needlessly exist because developers are working at the wrong level of abstraction? How much software should never have been written in the first place?

These are the kinds of questions that lead us to work on hard problems, like building a new open source SQL database. After all, databases were invented specifically to avoid the need to build new software for each kind of question someone might want to ask about their data.

As-Of 2024

“Anyone who has worked in the financial services industry can attest that one of the most herculean tasks they undertake is managing massive amounts of data and the relationships between that data.”

[The 2024 State of Open Source in Financial Services report, Page 20]

A common unsolved problem I have observed across various projects is the need for as-of reporting. Put simply, this is the ability to look back at the history of data consistently, as of a given timestamp. Whether you are a trader who wants to understand your portfolio better, a back office developer working on a post-trade reconciliation system, or a quant looking to backtest strategies against historical data, almost everyone can identify with the desire to query against consistent snapshots of evolving data. Everyone is looking for relationships in data and trying to understand how both the data and those relationships evolve.
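To make the idea concrete, here is a minimal sketch of an as-of lookup in plain Python. It assumes a hypothetical append-only history of position quantities per instrument; the names `positions`, `value_as_of` and `snapshot` are illustrative only and do not come from any particular product.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical append-only history: instrument -> list of (timestamp, quantity),
# sorted by timestamp. Old versions are never overwritten.
positions = {
    "AAPL": [(datetime(2024, 1, 2), 100), (datetime(2024, 1, 5), 250)],
    "MSFT": [(datetime(2024, 1, 3), 40), (datetime(2024, 1, 8), 0)],
}

def value_as_of(versions, ts):
    """Latest value recorded at or before `ts`, or None if none exists yet."""
    i = bisect_right([t for t, _ in versions], ts)
    return versions[i - 1][1] if i else None

def snapshot(history, ts):
    """Every instrument's value at the same instant `ts`."""
    snap = {}
    for name, versions in history.items():
        value = value_as_of(versions, ts)
        if value is not None:
            snap[name] = value
    return snap

print(snapshot(positions, datetime(2024, 1, 4)))
```

The important property is that every instrument is read at the same instant, so the result is a consistent snapshot rather than a mix of old and new values.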

Although SQL databases are the primary tool in an application developer’s tool belt, most of them still don’t make this sort of as-of querying easy, and that was really the point of my session at OSFF.

Open Source Gaps

In a previous blog post, “Why Financial Services Firms Are Choosing Open Source Databases To Drive Innovation” (September 2024, by David Stokes, Percona), the case is made for avoiding vendor lock-in at the database layer and transitioning to open source database software, based on the premise that open source databases offer superior scalability, flexibility, and cost savings.

However, the most prominent open source SQL databases are all rather long in the tooth. MySQL and PostgreSQL were originally architected to reflect assumptions about storage and scaling which haven’t been relevant for over two decades.

“The design of good storage layers in databases is deeply architectural. As a consequence, it is essentially a "forever" design decision.

“In practice, the only way to change the fundamental architecture of a database is to write a new one, with everything that entails.”

[@jandrewrogers, Hacker News, October 2024]

More important than assumptions about spinning disks vs. network flash storage arrays is the degree to which data is treated ‘mutably’ - this makes reporting very difficult without naive copying or advanced snapshotting techniques. Although the SQL:2011 standard, which introduced temporal features such as system-versioned tables, is over 13 years old, adoption of as-of reporting capabilities in database software like PostgreSQL or MySQL is almost non-existent, in part because of the underlying assumptions that UPDATE actually mutates data and that DELETE actually deletes data. Retrofitting is hard. But these ‘CRUD’ SQL semantics, which were established 50 years ago (SQL turned 50 years old in 2024!), are now preventing developers from easily building systems that reflect what regulators want to see: readily auditable systems that preserve an accurate view of history.
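As an illustration of the workaround developers typically hand-roll on top of a mutable SQL database, the sketch below uses Python’s built-in `sqlite3` module. The table name `trade_versions`, its columns, and the helper functions are all hypothetical; this is one common pattern under simplifying assumptions (a single writer, ISO-8601 timestamps supplied by the caller), not the approach of any specific product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE trade_versions (
        trade_id   TEXT NOT NULL,
        quantity   INTEGER NOT NULL,
        valid_from TEXT NOT NULL,   -- ISO-8601 timestamp
        valid_to   TEXT             -- NULL means "still current"
    )
""")

def close_current(trade_id, ts):
    """Mark the current version of a trade as ended at `ts`."""
    conn.execute(
        "UPDATE trade_versions SET valid_to = ? "
        "WHERE trade_id = ? AND valid_to IS NULL",
        (ts, trade_id),
    )

def put(trade_id, quantity, ts):
    """Logical update: end the current version, then append a new one."""
    close_current(trade_id, ts)
    conn.execute(
        "INSERT INTO trade_versions VALUES (?, ?, ?, NULL)",
        (trade_id, quantity, ts),
    )

def soft_delete(trade_id, ts):
    """Logical delete: end the current version but keep its history."""
    close_current(trade_id, ts)

def as_of(ts):
    """Return {trade_id: quantity} for every trade visible at `ts`."""
    rows = conn.execute(
        "SELECT trade_id, quantity FROM trade_versions "
        "WHERE valid_from <= ? AND (valid_to IS NULL OR valid_to > ?)",
        (ts, ts),
    )
    return dict(rows.fetchall())

put("T1", 100, "2024-01-02T09:00:00")
put("T2", 40, "2024-01-03T10:00:00")
put("T1", 250, "2024-01-05T14:30:00")
soft_delete("T2", "2024-01-08T16:00:00")

print(as_of("2024-01-04T00:00:00"))
print(as_of("2024-01-09T00:00:00"))
```

Even this small version needs every write path to go through the helpers, and every query to carry the time predicate. That discipline has to be repeated, and kept correct, in each system that adopts the pattern.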

New Assumptions, Systems, and AIs

In the 2024 State of Open Source in Financial Services report there was a survey question on “Which open source technologies are valuable to the future of the financial services industry?” - the results suggested that organizations with 10,000 or more employees felt that “Database management” was only marginally more valuable to the future than “AR / VR, 3D technologies”. Should we presume from this that the database marketplace is essentially failing to deliver new value, or to be seen as a relevant source of new solutions to data challenges?

(The 2024 State Of Open Source In Financial Services, Figure 10)

Conversely, the top spots in the results are held by “AI / ML” and “Advanced analytics”, both of which are entirely predicated on timely and accurate use of data. It’s also well known that AI / ML generally needs as-of queries too, under the guise of ‘reproducibility’.

As AI advances continue, I expect we’ll see increasing focus on designing systems to establish the provenance and temporal accuracy of data consumed by AI, in order to reduce hallucinations and improve predictions.

However, many of the systems currently being built to move and process data in support of these new AI use cases don’t even offer a SQL API. If we want to write and maintain less software as an industry then we should be pushing databases to do more, rather than unconsciously proliferating the number of database-like things in our software estate.

Various firms have recognised this imperative over the years and have attempted to build new databases in isolation, without the backing of external vendors. The 2024 State of Open Source in Financial Services report cited a couple of examples of FINOS-relevant database projects with “the greatest numbers of unique contributors”: Bloomberg’s Comdb2, an in-house relational database (Apache 2.0), and Man Group’s ArcticDB (BSL), a high-performance, serverless DataFrame database.

Building a database isn’t for the faint of heart, but many firms have taken on the challenge before, and with the rise of open source components like Iceberg, Arrow and DataFusion I expect many more will do so in future.

Our Vision

Engineers like working on hard problems, and OSFF NYC was full of them, from the schema challenges of CDM to the urgency of rolling out quantum-secure cryptographic technologies. JUXT’s own passion has been to help simplify the temporal reporting challenges that I know are so common in areas such as post-trade reconciliation and risk, which are underserved by existing open source technology. To this end, a small and highly-motivated team within JUXT has developed our own open source, cloud-native SQL database: XTDB, built for compliance reporting and for making as-of querying easy.

To demonstrate the ease of working with XTDB, we took part in the FINOS 2024 Tech Sprint to integrate XTDB into the ‘TraderX’ system.

Adding data observability and comprehensive as-of reporting to this microservice architecture, using a central XTDB database, took just a few days. You can find the code on GitHub here.

With the main TraderX integration Proof-of-Concept behind us, we now hope to move the needle on this topic by collaborating with like-minded firms in the FINOS community. The opportunities for simplification and cost savings feel abundant, and not just within financial services. The importance of accurate, auditable reporting outside of financial services is perhaps less well recognised, but XTDB users include organizations across the likes of insurance, healthcare, e-commerce and security.

Interested in talking? I’d be happy to hear from you at hello@xtdb.com or info@juxt.pro.

OSINFINANCE WEBINAR FEATURING JUXT

Explore how modern database technology is transforming financial services in our upcoming webinar: "Accelerated As-Of Reporting Without The Hassle: Modern Databases, Bitemporal Data, and XTDB"


About JUXT

For over a decade, JUXT’s technical experts have had great success in a wide variety of Fintech engineering projects, ranging from Tier 1 Investment Banks and Hedge Funds to Early-stage Startups. As of September 2024 JUXT is a Grid Dynamics (NASDAQ: GDYN) company, the partner of choice for Fortune 500 companies seeking transformative digital solutions. JUXT joined FINOS as a Silver member, announced during the OSFF, alongside G-Research, State Street, and Temporal.
