How to set up Apache Kafka locally?

Apache Kafka is a distributed streaming platform: it combines the features of a message queue or enterprise messaging system with a distributed, fault-tolerant commit log. In this post, I’ll show you how to set up Apache Kafka locally.

A streaming platform is capable of the following:

  • Publish and subscribe to streams of records, similar to a message queue or enterprise messaging system.
  • Store streams of records in a fault-tolerant durable way.
  • Process streams of records as they occur.

To get a basic understanding of Kafka, please refer to the introduction article, as this article assumes that the reader has basic theoretical knowledge about the Kafka broker, producer and consumer so that we can concentrate on the setup process.

I’m using a Mac running macOS High Sierra (Version 10.13.5) with the built-in Terminal as my development environment. Make sure you run the command in each step below in a separate Terminal/shell window and keep it running.

Step 1: Download Kafka and extract it on the local machine

Download Kafka from this link. The link points to version 1.1.0, which is the latest version at the time of writing.
Extract the tgz file either by double-clicking it or by running the following command in the terminal:

tar -xzf kafka_2.11-1.1.0.tgz

Using the terminal or Finder, navigate to /usr/local/ and create a folder called kafka.
Next, move the extracted content into the kafka folder so that Kafka is installed on the local machine and you can use it from anywhere.
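
For example, assuming the archive was extracted into a kafka_2.11-1.1.0 folder in your current directory, the setup could look like this (sudo may be required depending on your permissions for /usr/local/):

mkdir -p /usr/local/kafka
mv kafka_2.11-1.1.0/* /usr/local/kafka/
cd /usr/local/kafka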

Below is a high-level overview of the extracted Kafka package:

  • bin – This folder has shell/utility scripts that are used to start/stop servers (Kafka, ZooKeeper), work with Kafka topics (create, alter, delete), run Kafka producers/consumers, etc. There is a windows subfolder that has the same applications as .bat files for Windows users.
  • config – This folder has *.properties configuration files with default values ready to be used with the scripts in the bin folder. It is easy to customise or extend the configurations depending on your requirements.
  • libs – All the *.jar files that are required to run Kafka are available in this folder.
  • site-docs – This folder contains the Kafka documentation in zipped form; unzip it to read it.

Step 2: Start the Kafka Server

First, make sure you are in the Kafka folder in the terminal/shell.
Kafka uses ZooKeeper, which already ships with the Kafka download, ready to be used; no separate installation is needed.

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications.
– ZooKeeper documentation

Start the ZooKeeper server using the shell script and default properties:
bin/zookeeper-server-start.sh config/zookeeper.properties

Once a ZooKeeper instance is up and running, it is responsible for holding all the metadata about the Kafka cluster (brokers, topics, etc.). The instance listens on port 2181 by default.
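
To quickly verify that ZooKeeper is up, you can send it the built-in ruok (“are you OK?”) command; a healthy instance replies with imok (this assumes nc/netcat is available on your machine):

echo ruok | nc localhost 2181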

Note: If you get an error message about the Java version at this point, please ensure JAVA_HOME points to a 1.8.x JDK.

To resolve the error, look up the JDK version installed on your machine and modify the command below accordingly:
export JAVA_HOME='/Library/Java/JavaVirtualMachines/jdk1.8.0_172.jdk/Contents/Home'
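
Alternatively, on macOS you can let the built-in java_home utility locate an installed 1.8 JDK instead of hard-coding the path:

export JAVA_HOME="$(/usr/libexec/java_home -v 1.8)"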

Next, start the Kafka server using the shell script and default properties:
bin/kafka-server-start.sh config/server.properties

Once a Kafka server instance is up and running, it can talk to ZooKeeper thanks to the default ZooKeeper address (localhost:2181) present in server.properties. The default port where the Kafka server runs is 9092.
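
Both defaults are easy to confirm: the ZooKeeper address is in the config file, and the broker port can be probed (again assuming nc is available):

grep "zookeeper.connect=" config/server.properties
nc -z localhost 9092 && echo "Kafka broker is up"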

Step 3: Create a Topic

Think of a Kafka topic as a virtual container that logically groups messages. A broker can therefore have many topics.

Create a topic named test-topic-1 using the command below:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test-topic-1

kafka-topics.sh allows you to create or alter a topic using the --create and --alter flags respectively. A ZooKeeper instance reference (--zookeeper localhost:2181) is required so that the topic can be created on an available Kafka broker. The name of the topic above, “test-topic-1”, is passed with the --topic flag.

We only have one broker running, so the replication factor is set to 1 above. In a distributed environment with more than one broker, the --replication-factor flag can be set higher so that copies of each partition exist on multiple brokers. Also, a topic can have many partitions; this basically depends on how we would like to record/save messages. Think of a partition as a log directory, for example /tmp/test-topic-1-0, which follows the pattern /{log-path}/{topic-name}-{partition-id}, where the partition id is numeric, starting from 0.
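
To double-check the partition count and replication factor of the topic we just created, describe it:

bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test-topic-1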

Get a list of topics:
bin/kafka-topics.sh --list --zookeeper localhost:2181

This lists all the topics present on the broker(s); at this point it should show test-topic-1.

Step 4: Send some messages

The Kafka distribution provides a console-based producer application that can be used to send messages from the terminal. The command below just needs a reference to a running broker instance (host:port) and the name of the topic the messages will be sent to.

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test-topic-1

Next, write messages in the console and hit enter to send messages to the Kafka broker.
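
A sample session might look like this (the messages below are just examples; the console producer prints a > prompt for each line):

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test-topic-1
>hello kafka
>my first message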

Note: With older versions of Kafka, the producer takes a --zookeeper flag with the ZooKeeper address instead of the --broker-list flag.

Step 5: Start a consumer

The Kafka distribution also provides a console-based consumer application that can be used to read messages. The command below just needs a reference to a running broker instance, the name of the topic the messages will be read from, and an offset: from-beginning or a number. Think of an offset as a bookmark that a reader leaves inside a book; the bookmark marks the location of the last page that has already been read.

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test-topic-1 --from-beginning
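
To read from a specific numeric offset instead of from the beginning, pass --offset together with --partition (the partition is required when a numeric offset is used). For example, to re-read everything from offset 0 of partition 0:

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test-topic-1 --partition 0 --offset 0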

Final Result

So, we now have the following instances running – ZooKeeper, Kafka, a console producer and a console consumer – with a topic named “test-topic-1”, where the messages sent from the producer are received by the consumer application.
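
Once you are done experimenting, stop the console producer/consumer with Ctrl+C, then shut the servers down in reverse order using the stop scripts that ship in the bin folder:

bin/kafka-server-stop.sh
bin/zookeeper-server-stop.sh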

If you would like to learn more about Apache Kafka, please subscribe to my blog as I’ll be writing more how-to articles very soon.

Siddharth Pandey

Siddharth Pandey is a Software Engineer with thorough hands-on commercial experience building enterprise applications using Agile methodologies. He specializes in building and managing on-premise and cloud-based real-time standard and single-page web applications (SPAs). He has successfully delivered applications in the healthcare, finance, insurance and e-commerce sectors for major brands in the UK. Besides programming, he also has experience managing teams and training, and actively contributes to the IT community by sharing his knowledge on Stack Overflow, his personal website and video tutorials.
