How to build a custom Kafka Producer application?

Recently I’ve published a few how-to posts related to Apache Kafka. In this post, I’ll show you how to build a custom Kafka Producer application, which gives you more control over how messages are sent to the Kafka broker. I strongly recommend reading my previous posts first, as they make this one easier to follow.

Let me start with a high-level overview of a Kafka producer application. It is simply an application that sends messages to a topic in a Kafka cluster. A message can be anything, e.g. a business event such as an end user ordering a product in an e-commerce application.

The source code is available on GitHub. Feel free to clone or contribute to the repository.

Development Environment

In order to build a Kafka producer or consumer application, it is important to set up the development environment.

An application needs a Kafka client dependency, which provides the Kafka APIs used to interact with the Kafka cluster and its broker(s). There are many Kafka client packages available; which one to choose depends entirely on the project requirements, the team’s preferred language/framework, and so on.

I’m using the Java-based Kafka client for this post. Please feel free to try any other client; the APIs are similar anyway.

Prerequisites:

  • Integrated Development Environment – IntelliJ IDEA (Community or Ultimate edition) or Eclipse. I’m using IntelliJ’s Community edition for this example.
  • Java 8 SDK
  • Maven dependency manager installed and plugged into the IDE
  • A running Kafka cluster with at least one Kafka broker and two topics – “kafka-training” and “kafka-training-2” (creation commands are sketched below). Please refer to my previous post if you would like to learn how to set up Apache Kafka locally.
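
If the two topics do not exist yet, they can be created with the kafka-topics.sh script that ships with Kafka. This sketch assumes a local single-broker setup with ZooKeeper on its default port 2181; adjust to your environment:

```sh
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
    --replication-factor 1 --partitions 1 --topic kafka-training
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
    --replication-factor 1 --partitions 1 --topic kafka-training-2
```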

Project setup in IDE

Open IntelliJ IDEA, click Create New Project, choose Maven with the correct Java SDK version (1.8 on my machine) and provide the required information: group id, artifact id, project name and location. The idea is to create a simple console-based producer application.

Add Kafka client to the project

Open the app’s pom.xml and add the Kafka client as a dependency so that Maven can import it into the project. Save the file and look at External Libraries in the Project Explorer; a new library will appear – kafka-clients:1.1.1.jar in my case.
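
The dependency entry looks like this; version 1.1.1 matches the jar mentioned above, but pick whichever client version matches your broker:

```xml
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>1.1.1</version>
</dependency>
```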

Explore the folder named org.apache.kafka that contains the client API classes that will be required to create producer or consumer applications.

Kafka Producer

With the help of the Kafka Producer API, it is really easy to create a producer instance in an application. There are many configuration properties available for customisation based on the requirements, but the following three properties (key-value pairs) are mandatory when creating an instance of the KafkaProducer class:

  • bootstrap.servers – A list of Kafka broker addresses, so that the producer can connect to the first available broker. On a successful connection, the producer discovers the partition owners and leaders it needs in order to send messages. The best practice is to provide the addresses of more than one broker.
  • key.serializer and value.serializer – Since a producer’s job is to send messages, it needs to serialise each message to optimise its size for network transmission, storage and compression. Many serializers are available out of the box, such as string and JSON, and it is also easy to write a custom one based on the requirements.

Let’s look at the producer code below:
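
The original listing is in the GitHub repository; below is a minimal sketch of it. The broker address (port 9092), the string serializer and the two topic names come from this post, while the class name, loop bound and message contents are illustrative:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KafkaProducerApp {

    public static void main(String[] args) {
        // The three mandatory properties discussed above
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = null;
        try {
            // The properties are used internally to build a ProducerConfig
            producer = new KafkaProducer<>(props);

            for (int i = 0; i < 10; i++) {
                // Alternate between the two topics created earlier
                String topic = (i % 2 == 0) ? "kafka-training" : "kafka-training-2";
                producer.send(new ProducerRecord<>(topic, "message " + i));
            }
        } catch (Exception e) {
            // Log the failure; a real application would use a proper logger
            e.printStackTrace();
        } finally {
            // Always release the producer's network resources
            if (producer != null) {
                producer.close();
            }
        }
    }
}
```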

KafkaProducer is a class provided by the Kafka client’s Producer API. As explained above, the three mandatory properties are defined first, as plain key-value pairs.

A KafkaProducer is then created from these key-value properties, which the producer uses internally to build a ProducerConfig object. In this example a string serializer is used, but many other options are available, e.g. a JSON serializer.

The main code is wrapped in a try-catch-finally block, as instantiating a Kafka producer or sending messages may throw a run-time exception. In that case, it is important to log the exception and to close the producer connection in the finally block.

The loop then sends messages to the two topics named “kafka-training” and “kafka-training-2”. Each message is wrapped in a class known as ProducerRecord, which is available in the Kafka client.

For this code to work, a Kafka server must be running (on port 9092 on my machine) with the two topics defined as mentioned above.

Read via Kafka Consumer

Refer to my previous post to start a few console-based consumers in order to see the producer in action, for example:
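
A console consumer that ships with Kafka can be started like this (flags valid for Kafka 1.x; repeat with the second topic to watch both):

```sh
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
    --topic kafka-training --from-beginning
```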

Read the rest of the article if you are interested in the internal processing details of Kafka; otherwise, run the code from GitHub to see the producer, consumer and server in action.

Processing details while sending a message

Logically, the work a producer does to send a message can be split into two parts:

  • The producer first talks to the Kafka cluster and fetches detailed information in the form of metadata – the leader and partitions of each topic, etc. – which is stored as a Metadata object for the lifetime of the producer.
  • Next, the producer uses the configured serializer to serialise the message, and the partition detail gained in the first step to route it. Behind the scenes, the producer applies a partitioning strategy to decide where each message goes: direct to a partition, round-robin (evenly distributing messages across the available partitions of a topic), key mod-hash, or any custom strategy registered with the producer (a sketch follows this list).
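
As a hedged illustration of that last option, here is a minimal custom partitioner. The class name and routing rule are hypothetical, but Partitioner is the real extension point in the Java client:

```java
import java.util.Map;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

// Hypothetical partitioner: records without a key go to partition 0,
// everything else is routed by a simple key mod-hash.
public class NullKeyToZeroPartitioner implements Partitioner {

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int partitionCount = cluster.partitionsForTopic(topic).size();
        if (keyBytes == null) {
            return 0;
        }
        // Key mod-hash, masked to stay non-negative
        return (java.util.Arrays.hashCode(keyBytes) & 0x7fffffff) % partitionCount;
    }

    @Override
    public void configure(Map<String, ?> configs) {
        // No configuration needed for this sketch
    }

    @Override
    public void close() {
        // Nothing to clean up
    }
}
```

It would be registered through the producer’s partitioner.class property, e.g. props.put("partitioner.class", "com.example.NullKeyToZeroPartitioner"), with the hypothetical class name above.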

Micro-batching process

Next, with the help of the default partitioner or a custom one, messages are handed to the record accumulator. This enables micro-batching, one of the optimisations Apache Kafka relies on.

The idea is to create small, fast batches of messages when sending (producer), writing (broker) and reading (consumer) them.

The record accumulator keeps a record batch for each topic partition the producer needs to talk to, and batches messages at a small scale. Several configurations decide when a batch is released – e.g. batch.size (bytes), buffer.memory (bytes), max.block.ms, etc. Please refer to the documentation to customise these settings.
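
Continuing the earlier sketch, these are plain producer properties set on the same props object. The values below are arbitrary illustrations, not recommendations; linger.ms, not mentioned above, is the companion setting that caps how long a batch waits:

```java
// Batching-related producer settings (illustrative values only)
props.put("batch.size", "16384");        // max bytes per record batch, per partition
props.put("linger.ms", "5");             // wait up to 5 ms for a batch to fill before sending
props.put("buffer.memory", "33554432");  // total bytes for buffering unsent records
props.put("max.block.ms", "60000");      // how long send() may block when the buffer is full
```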

Delivery Guarantees

Based on its initial configuration, a producer can decide what type of acknowledgement it wants after sending messages. The acks property accepts the following options:

  • 0 – fire and forget (no acknowledgement)
  • 1 – leader acknowledged
  • all (or -1) – replication quorum acknowledged (all in-sync replicas)

There is also retry configuration: a producer can retry sending messages when the broker responds with an error. The retries property sets how many attempts are made, and retry.backoff.ms determines the number of milliseconds to wait before the next retry.
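
These, too, are ordinary producer properties; a hedged example, continuing the same props object with illustrative values:

```java
// Delivery-guarantee and retry settings (illustrative values only)
props.put("acks", "all");               // wait for the full replication quorum
props.put("retries", "3");              // retry a failed send up to three times
props.put("retry.backoff.ms", "100");   // wait 100 ms between retries
```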

An important point to remember: message order is only maintained within a partition; there is no global order across partitions.

To play around with the global-order point above, create a topic with several partitions (and, if you have enough brokers, a replication factor greater than one). Consumers listening to the topic will see order preserved within each partition, but not across the topic as a whole. For example:
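
Using the same kafka-topics.sh script as earlier (the topic name here is just an illustration; a replication factor above 1 needs that many brokers):

```sh
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
    --replication-factor 1 --partitions 3 --topic kafka-training-ordering
```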

Although this producer application is quite simple, a real-world producer would typically fetch data from a source and then use the Kafka Producer API to send messages to the Kafka cluster(s). I hope this post makes the basic concepts easy to understand and helps you write robust producer applications.

If you would like to learn more about Apache Kafka, please subscribe to my blog as I’ll be writing more how-to articles very soon.

Siddharth Pandey

Siddharth Pandey is a Software Engineer with thorough hands-on commercial experience building enterprise applications using Agile methodologies. Siddharth specialises in building and managing on-premise and cloud-based real-time standard and single-page web applications (SPAs). He has successfully delivered applications in the health-care, finance, insurance and e-commerce sectors for major brands in the UK. Besides programming, he also has experience managing teams and training, and he actively contributes to the IT community by sharing his knowledge on Stack Overflow, his personal website and video tutorials.
