Metamorphosis of an e-commerce system by Kafka

Dina Bogdan
5 min read · Jun 7, 2021

I have written in the past about some lessons learned from building microservices and reactive systems.

I have also posted about some patterns that can be leveraged for decoupling the components of a system.

Starting with this article, I will show a concrete scenario where some of the principles I have written about have been applied. I will also add some code snippets showing how to apply those patterns yourself.

Scenario

Let’s imagine that we are facing a legacy e-commerce system that is tightly coupled, in a synchronous way, to a high number of external components. Due to this high degree of coupling, our system can face multiple issues:

  • hard to scale when needed
  • low performance under high load
  • unavailability caused by the unavailability of an external service
  • hard to maintain due to coordinated deployments

One concrete example is displaying the page for a product to an authenticated user. The following diagram shows how the requests are performed in this case.

Displaying a product page for an authenticated user

Note that five requests must be performed, but only four of them are sequential. Still, because the requests are performed synchronously, the entire system cannot be easily scaled and suffers from the problems listed above.

Solution

A possible solution for the scenario described above is presented in the following sections. It was inspired by some design principles that come from building data-intensive applications.

The main idea of the presented solution is to transform and aggregate all the master data items into a single derived item, which can subsequently be stored in a datastore and queried when needed.

Since we used the Kafka ecosystem to solve this issue, the final solution is a Kafka Streams topology along with a couple of Kafka Connect components, which pull the data from the source systems via various implementation flavors of the Outbox pattern or Change Data Capture (CDC) and publish it to Kafka so that it can be ingested by the streaming application.
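As a taste of the Kafka Connect side (the concrete CDC and Outbox setups are the subject of the upcoming articles), here is roughly what registering a CDC source connector could look like. This sketch assumes Debezium 2.x against a hypothetical PostgreSQL products database; every host name, credential, and table name below is a placeholder:

```json
{
  "name": "products-source-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "products-db",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "debezium",
    "database.dbname": "products",
    "table.include.list": "public.products",
    "topic.prefix": "shop"
  }
}
```

One such connector per master data source keeps the change streams flowing into Kafka without the e-commerce system having to call anything synchronously.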

Kafka Streams Topology

The streaming topology will join the data present across all the Kafka topics, using the product id as the partitioning key. The result of all the joins will be published to a Kafka topic and can subsequently be written to a database. The data aggregate being produced will be called SaleProduct.
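A minimal sketch of such a topology is shown below. The topic names (products, prices, stock) and the plain-String values are assumptions made for brevity; a real implementation would join typed records into a proper SaleProduct type with matching serdes:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KTable;

import java.util.Properties;

public class SaleProductTopology {

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Each master-data topic becomes a KTable keyed by the product id.
        KTable<String, String> products = builder.table("products");
        KTable<String, String> prices = builder.table("prices");
        KTable<String, String> stock = builder.table("stock");

        // Join the tables on the product id; since all topics share the same
        // key, the joins are co-partitioned and need no repartitioning.
        KTable<String, String> saleProducts = products
                .join(prices, (product, price) -> product + "," + price)
                .join(stock, (productAndPrice, stockLevel) -> productAndPrice + "," + stockLevel);

        // Publish the derived SaleProduct aggregate to an output topic, from
        // which a sink connector can write it into the database.
        saleProducts.toStream().to("sale-products");

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "sale-product-topology");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        new KafkaStreams(builder.build(), props).start();
    }
}
```

Scaling this topology is just a matter of running more instances of the same application, up to the number of partitions of the input topics.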

We can build a new service (a REST or GraphQL API) on top of the SaleProduct database. The frontend application(s) can then query this new endpoint, which serves the product as a response without any intermediary processing.

SaleProduct API/Service
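The shape of this service is sketched below. Spring Boot is an assumption on my side, and SaleProduct and SaleProductRepository are hypothetical types standing in for the real aggregate and its datastore access:

```java
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

// Hypothetical aggregate produced by the streaming topology.
record SaleProduct(String productId, String payload) {}

// Hypothetical repository over the SaleProduct datastore (e.g. Cassandra).
interface SaleProductRepository {
    SaleProduct findByProductId(String productId);
}

@RestController
class SaleProductController {

    private final SaleProductRepository repository;

    SaleProductController(SaleProductRepository repository) {
        this.repository = repository;
    }

    // A single key lookup: no joins, no synchronous calls to external services.
    @GetMapping("/sale-products/{productId}")
    SaleProduct getSaleProduct(@PathVariable String productId) {
        return repository.findByProductId(productId);
    }
}
```

Because the aggregate is precomputed by the topology, the endpoint is reduced to a single key lookup; there is nothing left to fan out to at request time.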

Conclusions

In conclusion, let’s revisit the issues enumerated above that our system was facing and see whether any of them are still present:

  • “hard to scale when needed” — in the current solution, the components are no longer hard to scale: the Kafka Streams topology can be scaled easily by adding partitions, we can choose a database technology for our new SaleProduct datastore that also scales easily (e.g. Cassandra), and the SaleProduct API is a simple stateless service that can be scaled horizontally.
  • “low performance under high load” — performance under high load is drastically improved by replacing all the synchronous HTTP requests used to aggregate the data with a single query against the “local” database. Moreover, the query performs no joins, and the data can be stored directly in the desired representation (e.g. JSON, XML, etc.).
  • “unavailability caused by the unavailability of an external service” — while in the previous scenario the unavailability of an external component would have caused the unavailability of the so-called “Product Service”, in the current solution this is avoided, since the data is already stored in our “local” data source. The only component that must stay available is the database itself.
  • “hard to maintain due to coordinated deployments” — this is no longer a problem, since we are no longer spatially or temporally coupled to any external service.

Still, I must say that this is not a bulletproof solution; it has two major disadvantages:

  • Facing eventual consistency — if no one has noticed yet, throughout this article I have basically described how and why we chose A (Availability) and P (Partition tolerance) from the well-known CAP theorem. I will say it now: with this approach, we must be ready to face eventual consistency!
  • Data governance — usually, in a big and complex enterprise, it is very hard to use (retrieve and derive) master data and to build derived data silos like the one presented in the current solution. Some concepts for embracing this kind of approach can be found in Zhamak Dehghani’s Data Mesh.

Next steps

In the upcoming articles, I will show some concrete flavors of implementing the Change Data Capture and Outbox patterns using Kafka Connect.

Thank you for reading! There is more to come!

Follow me on Twitter.

