Achieving a Highly Performant API with Kafka Streams Interactive Queries

In the typical world of web applications, you expose an endpoint that serves data for a given input. Handling a request involves a few steps: validating the input, applying security filters, enriching the request, calling the database, and returning the data. The request has to travel from the web server all the way to the database, which is usually hosted on a separate VM.

With this implementation, many hiccups can occur, and you can only scale vertically rather than horizontally. As far as I know, most enterprises have now migrated from monolith architecture to the cloud, where you can easily create and destroy instances without depending heavily on infrastructure provisioning teams.

Instead of routing the request through all these layers, why can't we just query the application itself rather than opening a connection to the database? More consumers mean more calls to the application and more traffic to the database. Of course, you can maintain a cache for faster response times, but on a cache miss you still have to make a call to the database.

If your application has many users, you want to serve data with minimum latency, and you can stream the data, then you may want to try Kafka Streams interactive queries. We will see how this works, along with its benefits and downsides.

Requirement: build a REST API that accepts a stock symbol and returns the stock price details supplied by several exchanges/sources.

Now you have ~50K+ users who can call this API, and you have to keep up with demand while maintaining five-nines availability. For simplicity, assume you want to serve data to thousands of users concurrently. If you take the traditional database approach here, you may end up investing your time in evaluating the database, sessions, queries, and JVM health.

We will see how this can be achieved with Kafka Streams interactive queries. First, feed the asset price data from the exchange into a topic with 30 partitions, keyed by the stock symbol. You have to decide on a data model for your data, and it should be evenly distributed across the partitions. If you deploy multiple instances, you can scale up to 30 of them, at which point each instance owns one partition of data.
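To see why the key choice matters, here is a minimal sketch of how keyed records spread across partitions. Note this uses `String.hashCode()` as a simplified stand-in for Kafka's default partitioner, which actually applies murmur2 to the serialized key; the idea is the same: the same key always lands on the same partition.

```java
import java.util.Map;
import java.util.TreeMap;

public class PartitionSketch {
    static final int NUM_PARTITIONS = 30;

    // Simplified stand-in for Kafka's default partitioner (the real one
    // hashes the serialized key with murmur2): hash(key) % partitions.
    static int partitionFor(String symbol) {
        return Math.floorMod(symbol.hashCode(), NUM_PARTITIONS);
    }

    public static void main(String[] args) {
        Map<String, Integer> assignment = new TreeMap<>();
        for (String symbol : new String[] {"AAPL", "GOOG", "MSFT", "TSLA"}) {
            assignment.put(symbol, partitionFor(symbol));
        }
        // Every update for a given symbol goes to the same partition,
        // so exactly one instance owns the latest state for that symbol.
        System.out.println(assignment);
    }
}
```

Because ownership follows the key, a skewed key distribution means skewed partitions, which is why the data model deserves attention up front.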

Create the Kafka Streams topology, materializing the stream into state stores. You provide the Serdes for the key and value, and Streams builds the stores; under the hood they are maintained by Kafka Streams threads using RocksDB. Then create an endpoint that accepts the stock symbol, which is our key.
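A minimal sketch of that topology, assuming a topic named `asset-prices` and a store named `price-store` (both names are illustrative), with plain string Serdes standing in for whatever payload Serde you actually use:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Materialized;

public class PriceTopology {
    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();
        // Reading the topic as a table keeps only the latest value per key,
        // materialized into a queryable RocksDB store that is backed up
        // by a changelog topic.
        builder.table(
            "asset-prices",
            Consumed.with(Serdes.String(), Serdes.String()),
            Materialized.as("price-store"));
        return builder.build();
    }
}
```

Reading the topic with `table()` rather than `stream()` matches the use case: we only ever want the latest price per symbol.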

Call the Streams metadata API with the key; it returns the hostname of the instance that owns the data for that key. Essentially, each instance acts as a reverse proxy: check whether the key is local to the instance, and if not, make a call to the owning instance and return the result. You cannot put a reverse proxy service in front of the instances, because you cannot expose the Kafka Streams beans outside the container or instance.

Consumer threads consume the data from the asset price topic and put it into the RocksDB state store. All you need to do is issue a get() for the key.

Graphical Representation from Kafka Documentation:

Assume the client can hit the endpoint on any application instance. For example, the client requests data for key 103 and the request goes to Instance 1. Instance 1 fetches the stream's metadata for the key by supplying the key and its serializer, and it gets back the owning hostname, in our case Instance 3's. Internally this makes a call to the group coordinator, which returns the consumer details. With Instance 3's hostname and port, Instance 1 calls that instance and gets the data.
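The lookup-and-route step above can be sketched as follows. The store name `price-store` and the `fetchRemote()` helper are assumptions for illustration; the metadata call itself is the Kafka Streams `queryMetadataForKey` API:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyQueryMetadata;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.state.HostInfo;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class PriceLookup {
    public String priceFor(KafkaStreams streams, HostInfo self, String symbol) {
        // Ask Streams which instance owns this key's partition.
        KeyQueryMetadata metadata = streams.queryMetadataForKey(
            "price-store", symbol, Serdes.String().serializer());
        if (self.equals(metadata.activeHost())) {
            // The key is owned locally: read straight from the RocksDB store.
            ReadOnlyKeyValueStore<String, String> store = streams.store(
                StoreQueryParameters.fromNameAndType(
                    "price-store", QueryableStoreTypes.<String, String>keyValueStore()));
            return store.get(symbol);
        }
        // Otherwise forward the request to the owning instance over HTTP.
        return fetchRemote(metadata.activeHost(), symbol);
    }

    private String fetchRemote(HostInfo owner, String symbol) {
        // Hypothetical helper: GET http://{owner.host()}:{owner.port()}/price/{symbol}
        throw new UnsupportedOperationException("HTTP client left out of the sketch");
    }
}
```

The `self` host comes from the same host:port pair you advertise to the cluster via the `application.server` Streams config, which is how other instances learn where to forward.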

The rough image below shows how this works: a single instance holding 3 partitions.

Now we will look at the Good, the Bad, and the Ugly of this design.

Good things:

  1. Horizontal scaling: you can scale out instances up to the partition count.
  2. If you are in a PCF environment, use a common route to distribute traffic in round-robin fashion, so every instance is equally load balanced. Implement container-to-container networking for much faster instance-to-instance calls.
  3. API calls are distributed, and every instance independently serves its share of the data: data sharding.
  4. The instance disk is ephemeral, so the stream's processed data is backed up by the changelog topic. If an instance goes down, the other instances take over and serve the data from standby state stores (if you enabled standbys). There is no data loss: high availability.
  5. You have modeled the data as key-value, so RocksDB keeps the latest value for each key, and you always get the latest data once it is consumed. Spend sufficient time designing the key and payload.
  6. Data is local: you don't rely on any network between the application and the data, and it is fast, thanks to RocksDB's high-performance storage engine.
  7. Switch to Spring reactive web or a gRPC implementation of the data fetch if you hit a bottleneck on the Tomcat server, where each HTTP request requires a separate thread.
  8. Use a faster disk so you get the best I/O performance.
  9. Maintain an in-memory cache before data is flushed to disk. This boosts read requests, as the data resides in the memtable.
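The standby stores and the advertised instance address from the points above are plain Kafka Streams configuration. A minimal sketch, with the host, port, and application id as illustrative values:

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsProps {
    public static Properties build() {
        Properties props = new Properties();
        // One application id groups all instances into a single Streams app.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "stock-price-api");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        // Advertise this instance's endpoint so other instances
        // can route interactive queries to it.
        props.put(StreamsConfig.APPLICATION_SERVER_CONFIG, "instance-1.example.com:8080");
        // Keep one standby replica of each state store for fast failover.
        props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);
        return props;
    }
}
```

Standby replicas cost extra disk and network on each instance, so the value is a trade-off between failover time and footprint.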

Bad things:

  1. Restrict yourself to a plain payload object with the minimum fields. The more complex the payload, the longer it takes to store the data; remember you are dealing with Kafka topic data, which carries serialization and deserialization overhead.
  2. If you are in a cloud environment and you don't have enough instance disk storage, this will not suit your infrastructure: Streams works on local data.
  3. If you don't have a high-performance disk, expect lag in processing.
  4. Deploy more than one application instance, across multiple regions, so at least one app survives if the others go down due to an external failure.

Ugly things:

  1. In a multi-instance deployment, if one node goes down, Kafka issues a group rebalance among the consumers. While the stream is in a rebalancing state, you can neither serve nor consume data: it is a stop-the-world scenario.
  2. Be careful with the application's or instance's native memory usage, because Kafka producer threads send data to the cluster to publish to the changelog topic. You can get OOMs from native memory, and it's a real pain to figure out what is causing them.
  3. If you have an HDD, or you store the data on an NFS drive where the disk sits on a remote mount path, this design will not give you full performance.
  4. If you have a write-heavy, read-light use case, you have to tune the RocksDB configuration, and there is no foolproof configuration for RocksDB; it all depends on your infrastructure.
  5. If the disk is not optimal, RocksDB compaction is slow, which hurts the consumption of data. If a read request arrives in the meantime, you will return stale data.
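For the tuning point above, Kafka Streams exposes a `RocksDBConfigSetter` hook. A sketch for a write-heavy workload, where the specific numbers are illustrative starting points rather than recommendations:

```java
import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.Options;

public class WriteHeavyRocksDbConfig implements RocksDBConfigSetter {
    @Override
    public void setConfig(String storeName, Options options, Map<String, Object> configs) {
        options.setMaxWriteBufferNumber(4);           // more memtables to absorb write bursts
        options.setWriteBufferSize(16 * 1024 * 1024); // 16 MB memtable before flush
        options.setMaxBackgroundCompactions(2);       // let compaction keep up in the background
    }

    @Override
    public void close(String storeName, Options options) {
        // Release any cache/filter objects created in setConfig; none here.
    }
}
```

Register it via the `rocksdb.config.setter` Streams config (`StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG`), and benchmark on your actual disk before settling on values.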

The code template for this design is available on GitHub; clone it and adapt it for your use case. You just need to change the config parameters.
