Community Blog

OSFF Insights: Enabling Real Time Regulatory Compliance with Kafka Streams and Morphir - Anna McDonald, Confluent

Written by Ashley Tarsillo | 5/1/24 8:07 PM
Anna McDonald, drawing from her background rooted in the banking sector, reminisces about the history of collaborative efforts among banks, reminiscent of the present and future potential of platforms like FINOS. She explores the inevitability of regulations and the critical role of eventing, particularly emphasizing Kafka Streams' application in ensuring continual compliance.

Want to be part of these groundbreaking discussions in finance? Save your seat for the next Open Source in Finance Forum in London, 26 June. Don't miss out!

 

Anna McDonald of Confluent delved into the intersection of real-time regulatory compliance and technology, with a particular focus on Kafka Streams and Morphir. Leveraging her background in finance, Anna stressed the importance of collaboration and innovation in tackling regulatory challenges, aligning with the ethos of FINOS. She highlighted Kafka Streams' pivotal role in ensuring continual compliance by processing data in real time. Additionally, Anna introduced Morphir, a tool designed to encapsulate business logic and enhance developer productivity, which holds promise for the financial industry. Through Morphir's intermediate representation, developers can streamline testing and deployment processes, leading to increased confidence and efficiency in managing complex data pipelines. Looking ahead, Anna discussed future developments such as expanding support for KTables and integrating Avro, underscoring the collaborative nature of the project and inviting contributions from the FINOS community. Her passion for technology, compliance, and collaboration showcased the transformative potential of these tools within the financial industry's regulatory landscape.


Join experts from across financial services, technology, and open source to deepen collaboration and drive innovation across the industry in order to deliver better code faster at Open Source in Finance Forum - London.

 

TRANSCRIPT

Anna McDonald: [00:00:00] It's 2:05. Amen for punctuality. So my name is Anna McDonald. I am the technical voice of the customer at Confluent. What that means is I work directly with engineering, I don't get paid on commission, and my job is to make customers successful, irrespective of how much they pay us. So first, a little history.

Does anyone know what this building is? Anyone? It's a bank. There's a clue. This is Goldome Savings Bank. It was a bank, used to be Buffalo Savings Bank, back in the 80s. And my dad was the Deputy General Counsel. So I was raised by a bank general counsel. So it's in my blood. And every time we'd go anywhere, he would point out these little signs that were on all the ATMs.

And we'd go, hey, you know what that is? And I would say, what? And he would say, something we did that was a big pain in the butt, but we got it done. He would have to go down to D. C. for regulations. And really, what it taught me, by the way, that's my dad, back in the 80s, and that's me in the 80s, pretending to work at a [00:01:00] bank in computers.

It was founded by eight member banks, and you'll recognize many of these names. Today they've become part of others, Chemical, shout out JPMC, right? And this is making me mad, because of Citi. Anybody here from Citi? See, where's my Citi ladies? So there was one major bank that didn't participate in the New York Cash Exchange.

And it was Citi, which would have been hilarious if they were here, but they're not. But these eight banks worked together to provide ATM access at all the member banks, which is a huge customer outcome. I remember my mom going, hey, we don't have to drive across town anymore. We can just use whatever ATM we want.

By the way, I did print out the original New York Times Business Day article. I have it. It's a great read. And NYCE still exists today. I think Fidelity Investments manages NYCE now, or chairs the commission on it. But it was banks working together to provide an outcome that they couldn't provide alone.

If that's the past, to me, FINOS is the future of banks working together. It's why I'm so passionate [00:02:00] about it. Now, if you remember, I said that my dad had to go to D.C. a lot. And that's because with all good outcomes come regulations. You can't get away from them. And so when I think about regulations, and I think about eventing, which I'm very passionate about, we see Kafka Streams being used.

And Kafka Streams, how many people here have ever used Kafka Streams? Yeah! What's up? I love this great crowd. For those of you that haven't, KStreams is just a Java library. That's all it is. As opposed to other things, which would be like a Spark, or like a Flink, where you have a framework, you have a compute platform.

KStreams works with Kafka, that's why it's called Kafka Streams and not just Streams; it's not generic. And at the heart of the flow is a topology, which describes your processing logic. Sometimes they can get very large. So what we see in the field, when it comes to KStreams and regulations, is these types of continual compliance.
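To make the shape concrete: a topology is essentially a chain of operations (filter, mapValues, flatMap, and so on) applied to every record that flows through. As a rough, library-free sketch in plain Java, with an invented record type and an invented threshold of 100, the same shape looks like this:

```java
import java.util.List;
import java.util.stream.Collectors;

// Library-free sketch: a Kafka Streams topology is, at heart, a chain of
// processing steps applied to a stream of records. java.util.stream has
// the same shape, minus Kafka, partitions, and state.
public class TopologySketch {
    // Illustrative compliance rule: amounts under a reporting threshold pass.
    static boolean underThreshold(long amount) {
        return amount < 100;
    }

    static List<Long> process(List<Long> amounts) {
        return amounts.stream()
                .filter(TopologySketch::underThreshold) // like KStream#filter
                .map(a -> a * 2)                        // like KStream#mapValues
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(process(List.of(10L, 250L, 40L))); // [20, 80]
    }
}
```

In a real KStreams app the same chain would be built on a `StreamsBuilder`, but the processing logic, which is what Morphir wants to capture, has exactly this shape.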

I sat here last year at Stephen's talk, and I remember I asked a very loaded [00:03:00] question. I said, is there ever any time where you feel like a real-time framework might be more appropriate? And he was perfect, he was like, yes, there are, in Kafka Streams, and I was like, score.

So he was very happy. And it made me, when I was there, I realized that this was the answer to a problem that I had seen and been trying to solve. We see people now, especially, there's all this research now on guardrails. Everybody knows legislation is coming for AI. Specifically for how can we use AI in highly regulated industries.

Kafka Streams is going to play a very important part in that by implementing these lightweight guardrails. It may not be doing large language models, but it's going to make sure of and enforce appropriate use of them. And then event detection workflows, right? When do we need to signal something because it might be regulatory?

These are the types of regulations, when we look at microservices and event streaming, that you really want to target with KStreams. Remember I said the heart of KStreams is the topology. Now, if you look at an example, this is the topology that you're going to see. [00:04:00] If you look in the real world, and this is an actual topology, but the names have been changed to protect the guilty, this is really more what you end up with.

Unpacking that and trying to figure out what's going on... We had a great fireside chat with someone from JPMC, and he said one of the things that he wants the most is to free our business logic from being locked in these applications where we can't see it. And that really stuck with me, because that's something we see every day.

So the question is, how can we tell what's going on? Enter Morphir. When I was sitting there at that talk, I thought, oh my gosh, this is an actual thing that has legs and could work, because I've seen so many abstractions of business logic fail. They're too complex, or they're so tied to a specific technology that there's not going to be mass adoption.

And Morphir was really one of the first ones where the founding principles have legs. Those founding principles to me, and where it comes into play for KStreams, is the intermediate [00:05:00] representation. So there's this beautiful agnostic thing that sits there and describes not only your business logic but your structs, which is everything in KStreams. How many people know what a Serde is?

I'm sure everyone who uses KStreams does. Aren't they fun? They're so fun, right? Because it's so fun to describe your data. So we have structs. All of this gets rolled up into this beautiful intermediate representation that's agnostic of any technology. There's a built-in UI. I am a command line kiddo.
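As a quick aside on the Serdes mentioned above: a Serde is just a paired serializer and deserializer for one record type, and Kafka Streams needs one for every key and value type it moves around. A minimal library-free sketch, with an invented `Payment` record and a naive pipe-delimited encoding, might look like:

```java
import java.nio.charset.StandardCharsets;

// Sketch of what a Serde is: a serializer and deserializer pair for one
// record type. The Payment shape and the encoding are invented here;
// real apps would use Avro, JSON Schema, or Protobuf instead.
public class SerdeSketch {
    record Payment(String account, long cents) {}

    // Serializer: Payment -> bytes.
    static byte[] serialize(Payment p) {
        return (p.account() + "|" + p.cents()).getBytes(StandardCharsets.UTF_8);
    }

    // Deserializer: bytes -> Payment. Must invert serialize exactly,
    // or downstream processors see corrupt data.
    static Payment deserialize(byte[] bytes) {
        String[] parts = new String(bytes, StandardCharsets.UTF_8).split("\\|");
        return new Payment(parts[0], Long.parseLong(parts[1]));
    }

    public static void main(String[] args) {
        Payment original = new Payment("ACCT-42", 1999L);
        System.out.println(original.equals(deserialize(serialize(original)))); // true
    }
}
```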

But that does not mean that I don't appreciate that the easiest way sometimes for people to look and collaborate is through the UI. A UI, or a GUI, and test case generation. This is something that Kafka Streams does not do well. I love Kafka Streams, I'm the queen of Kafka Streams, but I will hold it accountable.

We have many KIPs in process, we have ideas, we're trying to work on this, but today, testing a Kafka Streams app is unwieldy. And that's being kind about it. And then, that extensible target framework. So that intermediate representation, like I said, it's agnostic. You can [00:06:00] target anything you want, right?

You can add whatever you want. Today, there's a Scala target in there. There's one for Spark, and I know, from totally creeping on Slack, that there are others on the way. I watch. I saw that, yeah. I do. I know I don't say anything, but I'm watching. Always watching. When I heard that talk, I thought, all right, this is the sauce.

How do we unpack this? How do we chop this tree up? I love this woman. That Bonna is fantastic. So if we blow up that horrible, complicated DAG and we look at our processing nodes... Matthias and I had a really good thought about this, and we thought, okay, what unit should we target to encapsulate? Like, how do we want to model this in a way that really focuses on that business logic and processing?

And in these kinds of huge DAGs, we've got these operations, right? Filters, flat maps. And so we said, we'll start there. Now, in Kafka Streams, there are two basic tenets. There's a KStream, which most of the time is stateless, and then there's a KTable, which is stateful. [00:07:00] So in order to prove our concept and say, does this have legs?

We decided to start with a KStream, and to start with something very simple, right? To see, would we have an aha moment? And so now I'm going to show you. By the way, this was AI. Look, this dude's got, like, half glasses. I don't know if anyone notices that. So yeah, it's getting better. I don't have access.

I don't have any credits on whatever that is, Midjourney. So it's Budget AI. Hey, did I scare you guys all with my desktop? Can you tell that I don't use UIs? I don't care what that looks like. I'm not trying to find anything in there. I use grep. So if we look at this, everything, for me at least, when I look at Morphir, starts with this Elm.

And what we wanted to do was utilize what Morphir already had, to make it feel natural for somebody who's already struggling in KStreams, but still respect the tenets. Matthias and I sat down and we said, all right, we're going to drive everything according to methods. So if you look here, that's my record, right?

And we have to-dos on this too, because how many people here who use KStreams use [00:08:00] Avro? Anybody? Yeah, what other schema types do you guys use? None? Is that none? None's the best one, if you can get away with it. I'm a huge fan. Yeah. And, if you look at the struct, right?

Eventually, we want people to think about these as schemas. And Morphir actually has a create JSON schema. Boom, shakalaka. And we want to try to extend that so we can give people visibility. Because when schemas change, things break. And part of the thing I love about Morphir is we're preventing the breakage.
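The "when schemas change, things break" point can be made concrete with a small sketch: a consumer that requires certain fields breaks the moment a new schema version drops one. The field names here are invented for illustration:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of schema-change breakage detection: compare the fields a new
// schema version actually has against the fields downstream code needs.
// Field names are illustrative, not from any real schema.
public class SchemaCheckSketch {
    // Returns the required fields missing from the new schema version.
    // A non-empty result means a downstream KStreams app would break.
    static Set<String> missingRequired(Set<String> newSchemaFields,
                                       Set<String> requiredFields) {
        Set<String> missing = new HashSet<>(requiredFields);
        missing.removeAll(newSchemaFields);
        return missing;
    }

    public static void main(String[] args) {
        Set<String> v2 = Set.of("number1", "traderId");    // "number2" was dropped
        Set<String> required = Set.of("number1", "number2");
        System.out.println(missingRequired(v2, required)); // [number2]
    }
}
```

Surfacing this at the intermediate-representation level, before any app is deployed, is the breakage-prevention Morphir enables.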

That's a goal of ours in KStreams: to build out and help enrich that part of the Morphir ecosystem. So if you look here, it's pretty simple. By the way, I love Elm. I think it's super fun, because it has a pipe. Who doesn't love a pipe? Everything should be a pipe. It's so much easier than object-oriented.

I make no bones about it, I'm a functional programmer. My brain is very functional, so I was very happy in here. So we have this. And what we did is we forked Morphir Elm. And we added specific Kafka Streams pieces, we added an AST, we added a bunch of different things that kind of make this work.

[00:09:00] Now, we haven't contributed yet because there are still some things we're embarrassed of. And so we will be pushing that, or at least speaking to the Morphir luminaries to get that in, once Matthias and I clean up a little bit more of our dirty laundry. So look for that.

I'm targeting absolutely by the end of the year. So when we look at this, the first step is that beautiful intermediate representation, right? And you get that by just doing make. Also, a nice dash-i, even though the JSON is still a little bit rough, but it's nice to have a dash-i there. It's like a little pretty print, just to get the block.

So when we make this, what that's going to do is compile that, and here it is. This is going to give us our IR. And that intermediate representation is really just describing what we just saw in the Elm. But wait, the next step is, okay, that's great. But how does this make me feel like I'm getting out of being locked in code?

Because it's still, not code, but a markup language, which is not that much better. And when we got this far, I was like, this [00:10:00] is the sauce. So imagine you're developing your Kafka Streams app and this is what's coming in. So you've got a schema, it's got two numbers, right? We all know that in the real world it's going to be more involved.

And all of a sudden, you can actually see your filter like this. Not only that, and this is amazingly saucy, I love it: you can actually add test cases here. So if I put this integer as 10, and even though we're only checking one, we know we bring the other one along for the ride, so this one's 20.

And look at this. We get our expected output value. Because for every single record that passes this filter up here, number one's less than 100, we expect it to output as is. We're not doing any transformation. Oh, look. I can also save this as a new test case. What, son? So we got to this part and I was like, this is amazing, this is pulling out exactly what I wanted, that spaghetti part of that DAG, like that incredibly complex stuff.
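The demo's filter and test case can be sketched in plain Java: a record with two integers, the predicate "number one is less than 100", and the talk's inputs of 10 and 20, which should pass through unchanged because a filter never rewrites a record:

```java
import java.util.Optional;

// Plain-Java sketch of the demo: a two-integer record, the filter
// predicate, and the test case from the talk (number1 = 10, number2 = 20)
// whose expected output equals its input.
public class FilterTestSketch {
    record Rec(int number1, int number2) {}

    // The business logic Morphir's intermediate representation would hold.
    static boolean passes(Rec r) {
        return r.number1() < 100;
    }

    // A filter emits the record as-is or nothing; no transformation happens.
    static Optional<Rec> filter(Rec r) {
        return passes(r) ? Optional.of(r) : Optional.empty();
    }

    public static void main(String[] args) {
        Rec input = new Rec(10, 20);
        // 10 < 100, so the record passes and comes out unchanged.
        System.out.println(filter(input).get().equals(input)); // true
    }
}
```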

I can just have that and be able to get in here and test it as I'm developing. So, and I know a lot about KStreams development, I've used KStreams since 2016. Matthias J. Sax, Dr. Matthias J. Sax, is my best friend in the entire world. And there are great things about KStreams and there are really hard things.

This is one of the hardest. Once we ran this, there's no doubt in my mind that this is valuable. If we look in the X Ray view, I also love this, because what if I forget what the input format is? Ooh, look, I just click on that. Now imagine this in one of our actual complex schemas that we're trying to pass around.

Imagine this if it's a customer aggregate, right? Imagine using this with a common domain model. How much fun is that? So we have all of this synergy and observability that just comes out of the box, and we haven't even created a KStreams app yet. And that, to me, is the power of Morphir. Because this will work with KStreams, could work with Flink, could work with Spark, Scala, F# I think it is that we're doing next.

Yes, thank you. I'm stalking on Slack. Now that we have all this, [00:12:00] Matthias and I sat down, and, again, behold the beauty of my desktop. We sat down and we said, okay, when we write this, when we do KStreams, how do we want to encapsulate this so people feel good about it? And we thought, all right, what we really want to do is just produce the methods, produce those KStream methods, and then let people include them.

So let me show you what this is about. So then we added a target. Look, isn't that pretty? I was so happy. I was like, look, dash-t KStreams. What, son? Love it. And when we do this, what it's going to do is create that sample app. Now this app is in Scala, because you can use KStreams in Scala.

So another additional item is to say, okay, let's do Java too, but you can mix and match them. And we actually do see a lot of Scala in Kafka Streams, which, as a functional person, kind of makes me happy. So if we go up, come on, pop in there, to our dist. In here, you can see [00:13:00] we've got our struct, which, again, we want really tight Avro integration here, because we all know that kind of thing breaks, and everyone's mapping everything to an Avro class most of the time.

And then, in here, you can see we've just got two KStreams function definitions. What I wanted to do was show you how you would integrate that in an actual KStreams app. Let me grab this one. And this is the way it would look. If you remember, this is the code that was produced by our Kafka Streams target.

And then this is probably more what people are used to when you think of a KStreams app, right? We've got all of our boilerplate code, which Morphir refers to as, what do we call it, devbots? Is that right? Like our, yeah, devbots. I like that. It's like battlebots. We should make them fight. And our boilerplate code.

And then all we do here is call that method. What this also allows you to do is, with anything that's driven by Morphir, you can do special things with it. Just include it as a library, and that's all this is. We're just importing, right? We're just importing a class. And that way, when I think about CI/CD, which is one of the other common things we [00:14:00] talk about in Kafka Streams: how do I do continuous deployment with my stateful microservices?
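The packaging idea described here, generated methods in one class and hand-written boilerplate in another, can be sketched like this. All class and method names are invented for illustration, not actual Morphir output:

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Sketch of the packaging split: Morphir emits just the processing
// methods (faked here as GeneratedLogic), and the hand-written app
// imports and calls them alongside its own boilerplate.
public class AppWiringSketch {
    // Stand-in for the class a Morphir KStreams target might emit.
    static class GeneratedLogic {
        static Predicate<Integer> complianceFilter() {
            return n -> n < 100; // the business rule lives here, not in the app
        }
    }

    // The "boilerplate" side: wiring and the call into the generated
    // method, kept separate so CI/CD can version the two independently.
    static List<Integer> run(List<Integer> inputs) {
        return inputs.stream()
                .filter(GeneratedLogic.complianceFilter())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(10, 200, 99))); // [10, 99]
    }
}
```

Because the generated logic is just an imported class, it can be rebuilt and redeployed from the Morphir model without touching the surrounding application.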

Which can be very difficult. We want this flexibility. We want the flexibility to be able to operate on things that are tied to Morphir by themselves. And so we decided this was really the best way to package this up, to give us that future flexibility. In closing, let me get back over here. Had a boop.

I had a boop.

What are our next steps? Add KTables. So Morphir has some really nice stateful app representations. We're taking a look at those now, and we're in the process of asking, all right, what's the most advantageous way to map a KTable, right? What's the most advantageous way to build up state? How many people here ever have to do reconciliation, or have to, when they're trying to push a change, make sure that... we want this to enable increased developer confidence.

So you don't take two weeks to roll out a change because you're manually pumping data back in, doing all these [00:15:00] things. And we think that, building up and using that kind of JSON of test cases that Morphir uses, we can use that to drive end-to-end testing. For schema updates, changes, logic, to automate the process of feeling good about the change I made.
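That test-case-driven gating might look roughly like this sketch: a stored set of input and expected-output pairs is replayed against the current logic, and a change only counts as safe if every saved case still passes. The structure is invented for illustration, not Morphir's actual test JSON format:

```java
import java.util.List;

// Sketch of gating a change on saved test cases: replay every stored
// input/expected pair against the current logic before deploying.
// The TestCase shape and the sample logic are illustrative only.
public class RegressionGateSketch {
    record TestCase(int input, int expected) {}

    // The business logic under test; a change here is what we're gating.
    static int doubleIfSmall(int n) {
        return n < 100 ? n * 2 : n;
    }

    // True only if every saved case still produces its expected output.
    static boolean allPass(List<TestCase> cases) {
        return cases.stream()
                .allMatch(c -> doubleIfSmall(c.input()) == c.expected());
    }

    public static void main(String[] args) {
        List<TestCase> saved = List.of(new TestCase(10, 20), new TestCase(150, 150));
        System.out.println(allPass(saved)); // true
    }
}
```

In a CI pipeline, a false result from a check like this would block the deployment instead of letting a stateful app break in production.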

It's not going to break my existing deployment. And KTables are part of that, because they're stateful. We want to expand our supported methods, obviously. Filters are great, but we also need more. I think Matthias had a flat map in yesterday, but we're going to expand the supported methods that you can export for KStreams.

Avro integration, like I talked about today. Morphir does already support JSON Schema, which is pretty awesome. And then, the last one. As I said, there's some improvement that needs to be done with our test topology. And we want to integrate that with that Morphir test JSON file that's automatically created.

So you don't have to try to figure out how to do that. Those test cases also come along for the ride in generation. So that's where we are. And now I am done, and it is question time. Thank you very much. [00:16:00] When is Avro coming up? That's... I love this.

See, if people didn't demand things immediately, it wouldn't be valuable. So that's a check. Thank you. Avro coming up. We really want to be in good shape to be able to actually contribute this to the main Morphir, if everything looks good, by the end of the year, and so Avro might be a stretch by the end of the year, but I'm open to prioritization.

Everyone here cared enough to come, so I will take that as a flag of importance in features. You didn't have to plus-one. Plus one. There you go. Look at that. You offered to help. What? Yeah. You offered to help. That's right, too. Yeah. And I was just going to say, Matthias says, we accept KIPs.

We also accept help contributing back to Morphir. Anybody who is interested in this, we have a fork. You want to take a look at it? Talk to me after. I'll be here all day. Any other questions?

You can answer KStreams questions too. I love talking about KStreams. [00:17:00] Yeah?

Oh, sorry, say again?

And so that's very important. So when we talk about that, that is why the unit that we took, the unit of measurement that we took, was just the processor. Because with that intermediate representation, we have to be very careful. So map versus mapValues, for example. As for KTables, right now Matthias and I are deciding on exactly how to elegantly represent intermediate topics.

What do we want to do with those? Excellent question. Yeah, I will. Oh, we'll figure it out, man. We're on it. And I'm just going to plug this: we have a new podcast. We're going to actually talk about a lot of this on our new podcast. It's called The Duchess and the Doctor.

And it's just me and Matthias talking about eventing and streaming. So this is gonna be a topic. That's a great question and maybe I will just add that. I will also, make sure I credit you for that question. Hit me up on LinkedIn or wherever. So that way I, yeah. Awesome. Any other questions?

Oh, [00:18:00] yes.

Large scale, like a... yeah. So that's a great question. And I think, when you talk about large-scale data, what you mean is, like, when I'm loading up a stateful KTable and it's... yeah. And so Morphir itself, right? The way that we want to work that.

And I think, Stephen, you can answer that: do you guys right now have, and I think you do in some of the examples, pretty large reference tables for some of the test cases that you do? What you're talking about, I think, maybe, is that when I'm getting ready to verify this stuff, in order for that to be solid, especially if I've got a KTable and it's a reference table, it may have 3, 4, 5 million records, it might well be over a terabyte.

Is that kind of what you mean? And so that's one of the things that Matthias and I are thinking about right now: one, what do you need in order to load up that reference data? Because, [00:19:00] again, how are you supposed to test it if you don't have the referential data to get in? So we are absolutely a hundred percent thinking about that.

I have not explored that yet to the point where I can give you a frank opinion, which is what I'd want to give you. If I can't give you one, I'll tell you that. But it's definitely on our radar, because you're 110 percent right. We have KStreams state stores that are way over a terabyte, sharded. So we are thinking about that for sure.

Any other questions?

So Matthias and I are thinking, first, what are our stateless ones, right? Off the bat. I can't wait for windows. I'm a huge fan of windowing. I think, as soon as you're in eventing, the amount of data is such that usually, when you start doing stateful processing, the meaningful value comes from a window.

So windowing is one of those. Aggregates, transform... we're picking and choosing, but if anybody has favorites and they're like, look, and they're [00:20:00] committed to helping us do this, we would love it. We are 110 percent open to participation, and to bribes.

Do you have a huge keyspace? Right. And I always make this point to people: it's not called Streams. It's called Kafka Streams because it's tied to Kafka, right? There are great things about that, and then there are not-so-great things, depending on your use case. And keyspace is one of them. My guess is that your keyspace is large enough that you need to decouple partitioning and scaling and sharding from Kafka. Is that probably why you guys use Flink, or? Yeah.

Yeah, and the great thing about Morphir is that if you want to add Flink, you just do dash-t Flink. So this is not tied, Morphir's intrinsic value is not tied, to Kafka Streams. It's tied to the fact that you can do Flink, you can do KStreams, you can do whatever you want.

So, flat map, window. [00:21:00]

So yeah, I think, obviously, flat map and windowing are the ones that I really want, and transform. Those are really the table-stakes ones. And what we really do is, we kind of have a little competition at Confluent. We get a lot of topologies, right?

And so we do have some statistics on, like, the most used. And fin serv, fintech and fin serv, they use a lot of KStreams. And so they'll probably weigh in heavily on which ones we do. If anyone here says, no, I need that one mapped first, please, we're open to it. Yeah. Yeah.

We're open to, yeah. Yeah.

Any other questions? All right. Awesome sauce. Thank you everyone for coming.