Achieve High performant API through Kafka Stream Interactive Queries

In the typical world of web applications, where you expose an endpoint to serve the data for a given input. This happens with few steps, like validating the input, apply security filters, enrich the request, call the database, and return the data. Here request has to travel from the webserver all the way to the database where it is hosted on a separate VM.

With this implementation there are so many hiccups can occur, and we can go only vertical scaling than horizontal scaling. As far as I know, most of the enterprises have now migrated to the cloud from monolith architecture. You can easily create the instance, and destroy the instance without having many dependencies on the infrastructure provisioning teams.

Instead of routing the request in all these layers, why can't we just query the application rather than making a connection to the database and querying the data. If you get more consumers, then more calls to the application and more traffic to the database. Of course, you can maintain the cache for faster response times, but unfortunately when the cache miss-hit then you have to make a call to the database.

If your application has many users and you want to serve the data with minimum latency, of course, you can also stream the data, then you may try to utilize the Kafka streams interactive query. We will see how this works, and its benefits, and its downsides.

Requirement: You have to build a Rest API which accepts the stock symbol and gives back the stock price details, which were supplied by several exchanges/sources.

Now you have ~50K+ users who can call this API and you have to keep up with the user's demand and have to maintain 5 nines availability. For simplicity, you can assume you want to serve the data for thousands of users concurrently. In this use case, if you go with a database and traditional approach, then you may end up investing your time in evaluating the database, sessions, queries, JVM health.

We will see how this can be achieved with Kafka stream interactive queries. First, feed the asset price data from the exchange to the topic which has 30 partitions, and the key should be the stock symbol. Here you have to decide the data model for your data and also it should be evenly distributed across the partitions. If you plan to deploy to multiple instances you will have the option to deploy up to 30 instances and each instance will own one partition of data.

Create the Kafka Stream with the materializing into the state stores. You need to provide the Serdes for the Key and Values, Streams will build the stores. Under the hood, it is maintained by Kafka stream threads using rocksDb. Create an endpoint that accepts the stock symbol, here is our key.

Call the Streams Metadata, and supply the key, it returns the hostname which is owning the data for the key. Basically, it's a reverse proxy. Just compare whether it is local to the instance or not, If No, then make a call to that instance, and return back. You cannot have a reverse proxy service before hitting the instances because you cannot expose the Kafka stream beans outside the container or instance.

Consumer threads will consume the data from the asset price topic and they will put into the rocksdb data store. All you need to issue get() for that key.

Graphical Representation from Kafka Documentation:

Assume the client will hit the endpoint to any one of the application instances. For example, the client requested data for key 103, and the request went to Instance 1, and Instance 1 will fetch the stream's metadata for the key, by supplying the key and serializer, it gives back, the hostname in our case instance 3 hostname. Internally this makes a call to the group coordinator and it returns back the consumer details. You will get the hostname and port for Instance 3. Make a call to that instance and get the data.

The below rough image tells you how this works.

The single instance is holding 3 partitions.

Now we will see the Good, Bad, and Ugly of this design.

Good things:

Bad things:


The code template for this design is available in github, clone it and you can use this for your use case, You just need to change the config parameters.