Author: Jeremy Taylor, XTDB Head of Product, JUXT (a Grid Dynamics company)
Software Waste
“Billions of dollars in wasted engineering effort is spent on needless maintenance of private forks of open source software” - this was a rather jaw-dropping estimate mentioned during a FINOS-led roundtable session hosted at BMO’s office ahead of the main OSFF conference. The estimate undoubtedly includes vast numbers of frontend libraries on the one hand, and on the other a great many backend data infrastructure technologies. Just from a cost-savings standpoint, there is considerable potential, not to mention the many other upsides to be had by reducing the amount of work happening behind the firewall, and increasing the work happening in public, upstream development communities. Incidentally, Kelsey Hightower’s keynote talk illustrated this point superbly.
The researchers at FINOS who have been studying open source forks have focussed on a very narrow kind of waste, but at JUXT we often look much more broadly: How many systems needlessly exist because developers are working at the wrong level of abstraction? How much software should never have been written in the first place?
These are the kinds of questions that lead us to work on hard problems, like building a new open source SQL database. After all, databases were invented specifically to avoid the need to build new software for each kind of question someone might want to ask about their data.
As-Of 2024
Anyone who has worked in the financial services industry can attest that one of the most herculean tasks they undertake is managing massive amounts of data and the relationships between that data
[The 2024 State of Open Source in Financial Services report, Page 20]
A common unsolved problem I have observed across various projects is the need for as-of reporting. Put simply, this is the ability to look back at the history of data consistently, given a timestamp. Whether you are a trader who wants to understand your portfolio better, a back office developer working on a post-trade reconciliation system, or a quant looking to backtest strategies against historical data, almost everyone can identify with the desire to query against consistent snapshots of evolving data. Everyone is looking for relationships in data and trying to understand how data and relationships evolve.
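To make the idea of a consistent snapshot concrete, here is a minimal Python sketch. It assumes each record keeps an append-only, time-sorted list of versions; the `portfolio` data, instrument names, and the `as_of` and `snapshot` helpers are hypothetical and purely illustrative, not part of any database API.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical append-only histories: instrument -> [(effective_from, quantity)],
# each list sorted by effective_from. Nothing is ever overwritten.
portfolio = {
    "ACME": [(datetime(2024, 1, 2), 100), (datetime(2024, 3, 15), 250)],
    "GLOBEX": [(datetime(2024, 2, 1), 40), (datetime(2024, 6, 1), 0)],
}

def as_of(history, ts):
    """Return the value in effect at ts, or None if ts predates every version."""
    times = [t for t, _ in history]
    i = bisect_right(times, ts)  # number of versions with effective_from <= ts
    return history[i - 1][1] if i else None

def snapshot(portfolio, ts):
    """Read every record at the same timestamp, giving one consistent view."""
    return {name: as_of(history, ts) for name, history in portfolio.items()}

# The same question asked at two different points in time.
march = snapshot(portfolio, datetime(2024, 3, 1))
july = snapshot(portfolio, datetime(2024, 7, 1))
```

The key property is that every record is read at the same timestamp, so relationships between records are seen as they stood at that moment rather than as they stand today.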
Although SQL databases are the primary tool in an application developer’s tool belt, most of them still don’t make this sort of as-of querying easy, and that was really the point of my session at OSFF.
Open Source Gaps
In a previous blog post, “Why Financial Services Firms Are Choosing Open Source Databases To Drive Innovation” (September 2024, by David Stokes, Percona), the case is made for avoiding vendor lock-in at the database layer and transitioning to open source database software, based on the premise that open source databases offer superior scalability, flexibility, and cost savings.
However, the most prominent open source SQL databases are all rather long in the tooth. MySQL and PostgreSQL were originally architected to reflect assumptions about storage and scaling which haven’t been relevant for over two decades.
The design of good storage layers in databases is deeply architectural. As a consequence, it is essentially a "forever" design decision.
In practice, the only way to change the fundamental architecture of a database is to write a new one, with everything that entails.
[@jandrewrogers, Hacker News, October 2024]
More important than assumptions about spinning disks vs. network flash storage arrays is the degree to which data is treated ‘mutably’, which makes reporting very difficult without naive copying or advanced snapshotting techniques. Although the SQL:2011 standard is over 13 years old, adoption of as-of reporting capabilities in database software like PostgreSQL or MySQL is almost non-existent, in part because of the underlying assumptions that UPDATE actually mutates data and that DELETE actually deletes data. Retrofitting is hard. But these ‘CRUD’ SQL semantics, which were established 50 years ago (SQL turned 50 years old in 2024!), are now preventing developers from easily building systems that reflect what regulators want to see - readily auditable systems that preserve an accurate view of history.
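The effect of mutable UPDATE semantics can be illustrated with a small Python sketch using the standard library’s `sqlite3` module. The table names, columns, and the `price_as_of` helper are hypothetical, and the append-only history table is the kind of hand-rolled workaround developers typically build themselves; it is neither SQL:2011 temporal syntax nor XTDB’s API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trade_current (id INTEGER PRIMARY KEY, price REAL)")
conn.execute("CREATE TABLE trade_history (id INTEGER, price REAL, valid_from TEXT)")

# Conventional CRUD: the UPDATE overwrites the row, so the old price is gone.
conn.execute("INSERT INTO trade_current VALUES (1, 100.0)")
conn.execute("UPDATE trade_current SET price = 101.5 WHERE id = 1")

# Hand-rolled workaround: never update, only append a new version.
conn.executemany(
    "INSERT INTO trade_history VALUES (?, ?, ?)",
    [(1, 100.0, "2024-01-02"), (1, 101.5, "2024-03-15")],
)

def price_as_of(conn, trade_id, ts):
    """Return the latest price whose valid_from is <= ts (ISO dates sort as text)."""
    row = conn.execute(
        "SELECT price FROM trade_history "
        "WHERE id = ? AND valid_from <= ? "
        "ORDER BY valid_from DESC LIMIT 1",
        (trade_id, ts),
    ).fetchone()
    return row[0] if row else None

current_rows = conn.execute("SELECT price FROM trade_current").fetchall()
february_price = price_as_of(conn, 1, "2024-02-01")
```

After the UPDATE, `trade_current` can only answer “what is the price now?”, while the history table can still answer “what was the price in February?”. The cost is that every table, write path, and query in the application has to follow the same convention by hand.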
New Assumptions, Systems, and AIs
In the 2024 State of Open Source in Financial Services report there was a survey question on “Which open source technologies are valuable to the future of the financial services industry?” - the results suggested that organizations with 10,000 or more employees felt that “Database management” was only marginally more valuable to the future than “AR / VR, 3D technologies”. Should we presume from this that the database marketplace is essentially failing to deliver new value, or failing to be seen as a relevant source of new solutions to data challenges?
(The 2024 State Of Open Source In Financial Services, Figure 10)
Conversely, in the top spots for the results sit “AI / ML” and “Advanced analytics”, both of which are entirely predicated on timely and accurate use of data. It’s also well known that AI / ML generally needs as-of queries too under the guise of ‘reproducibility’.
As AI advances continue I expect we’ll see increasing focus on designing systems to establish the provenance and temporal accuracy of data consumed by AI in order to reduce hallucinations and improve predictions.
However, many of the systems currently being built to move and process data in support of all these new AI use cases don’t even offer a SQL API. If we want to write and maintain less software as an industry then we should be pushing databases to do more, rather than unconsciously proliferating the number of database-like things in our software estate.
Various firms have recognised this imperative over the years and have attempted to build new databases in isolation, without the backing of external vendors. The 2024 State of Open Source in Financial Services report cited a couple of examples of FINOS-relevant database projects with “the greatest numbers of unique contributors”: Bloomberg’s Comdb2, an in-house relational database (Apache 2.0), and Man Group’s ArcticDB (BSL), a high-performance, serverless DataFrame database.
Building a database isn’t for the faint of heart, but many firms have taken on the challenge before, and with the rise of open source components like Iceberg, Arrow and DataFusion I expect many more will too in future.
Our Vision
Engineers like working on hard problems, and OSFF NYC was full of them - from the schema challenges of CDM to the urgency of rolling out quantum-secure cryptographic technologies. JUXT’s own passion has been to help simplify the temporal reporting challenges that I know are so common in areas such as post-trade reconciliation and risk, which are underserved by existing open source technology. To this end, a small and highly-motivated team within JUXT has developed our own open source, cloud-native SQL database: XTDB, built for compliance reporting and making as-of querying easy.
To demonstrate the ease of working with XTDB, we took part in the FINOS 2024 Tech Sprint to integrate XTDB into the ‘TraderX’ system:
Adding data observability and comprehensive as-of reporting to this microservice architecture, using a central XTDB database, took just a few days. You can find the code on GitHub here.
With the main TraderX integration Proof-of-Concept behind us, we now hope to move the needle on this topic by collaborating with like-minded firms in the FINOS community. The opportunities for simplification and cost savings feel abundant, and not just within financial services. The importance of accurate, auditable reporting outside of financial services is perhaps less well recognised, but XTDB users include organizations across the likes of insurance, healthcare, e-commerce and security.
Interested in talking? I’d be happy to hear from you at hello@xtdb.com or info@juxt.pro.
Join the Open Source in Finance Webinar FEATURING JUXT!
December 11th at 11 am ET / 4 pm GMT
Explore how modern database technology is transforming financial services in our upcoming webinar: "Accelerated As-Of Reporting Without The Hassle: Modern Databases, Bitemporal Data, and XTDB"
Register now!
About JUXT
For over a decade, JUXT’s technical experts have had great success in a wide variety of Fintech engineering projects, ranging from Tier 1 Investment Banks and Hedge Funds to Early-stage Startups. As of September 2024 JUXT is a Grid Dynamics (NASDAQ: GDYN) company, the partner of choice for Fortune 500 companies seeking transformative digital solutions. JUXT joined FINOS as a Silver member, announced during the OSFF, alongside G-Research, State Street, and Temporal.