Today we will look at a basic introduction to Apache Kafka and its main entities and terminology.
Apache Kafka is a distributed messaging system that follows the publisher-subscriber model.
A messaging system facilitates the transfer of data from one application to another.
This lets applications focus on their data and functionality instead of worrying about data transmission.
Generally, messaging systems use a queue data structure.
As Kafka works on the publisher-subscriber model, it has a publisher, aka Producer, that publishes records to a Kafka Topic and a subscriber, aka Consumer, that processes that data.
Before deep-diving into Apache Kafka and working on the first Kafka project, a user must have a basic understanding of Kafka entities.
If you work with databases, also check out my blog on 3 ways of database connection.
Let’s look into some of the Kafka entities below:
A Kafka broker is also known as a Kafka server or node.
These terms mean the same thing, and developers use them interchangeably.
A Kafka broker is responsible for redirecting messages into and out of the Topic.
You can think of it as a message broker that redirects the incoming messages from a producer to the correct topic and from topic to the correct consumer.
A Kafka cluster is a collection of one or more Kafka brokers.
A Topic is a logical, named entity to which messages are published.
You can also think of it as a stream of records ready to be consumed by a consumer.
Several consumers can subscribe to a single Topic by its name.
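To make the idea of several consumers reading one Topic concrete, here is a minimal in-memory sketch. The `Topic` class and its `publish`, `subscribe`, and `poll` methods are hypothetical names invented for illustration; they are not Kafka's client API. The sketch also assumes that each consumer keeps its own read position.

```python
class Topic:
    """Toy in-memory stand-in for a Kafka topic (not the real client API)."""

    def __init__(self, name):
        self.name = name
        self.records = []      # the stream of published records
        self.positions = {}    # consumer name -> index of the next record to read

    def publish(self, record):
        # A producer appends a record to the end of the stream.
        self.records.append(record)

    def subscribe(self, consumer):
        # A new subscriber starts reading from the beginning of the stream.
        self.positions.setdefault(consumer, 0)

    def poll(self, consumer):
        # Return everything this consumer has not seen yet, then advance it.
        start = self.positions[consumer]
        batch = self.records[start:]
        self.positions[consumer] = len(self.records)
        return batch


orders = Topic("orders")
orders.subscribe("billing")
orders.subscribe("shipping")
orders.publish("order-1")
orders.publish("order-2")

billing_batch = orders.poll("billing")
shipping_batch = orders.poll("shipping")
```

Both subscribers receive both records, because reading a record does not remove it from the Topic.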
A Partition is the storage unit of Kafka.
As per the official definition, it can be viewed as an ordered, immutable sequence of records that is continually appended to, in other words a structured commit log.
Kafka assigns each record in a partition a unique sequential ID called the offset.
In addition, as the image below shows, the partitions belonging to a single Topic can be distributed across different Kafka brokers.
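The append-only log and its offsets can be sketched in a few lines. This is a simplified model, not Kafka's storage code: `Partition` and `assign_partitions` are hypothetical names, and the round-robin placement is only an assumption chosen to show partitions of one Topic landing on different brokers.

```python
class Partition:
    """Toy append-only log: records are only ever added at the end."""

    def __init__(self):
        self._log = []

    def append(self, record):
        # The offset is the record's position in the log, starting at 0.
        offset = len(self._log)
        self._log.append(record)
        return offset

    def read(self, offset):
        # Records are looked up by offset and never modified in place.
        return self._log[offset]


def assign_partitions(num_partitions, brokers):
    """Spread partition ids over brokers round-robin (illustrative placement only)."""
    return {p: brokers[p % len(brokers)] for p in range(num_partitions)}


partition = Partition()
first = partition.append("a")
second = partition.append("b")
placement = assign_partitions(3, ["broker-0", "broker-1"])
```

Here `first` and `second` are consecutive offsets, and `placement` maps the three partitions of one Topic onto two brokers.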
Apache Kafka Producer
The Producer is an entity that publishes records to a Kafka Topic.
An application written in any language can generate records and use Kafka’s Producer API to publish them to a specific Topic.
For example, a Java client can publish data to a topic on a daily basis.
Apache Kafka Consumer
The Consumer is an entity that consumes records from a Kafka Topic.
Client applications use Kafka’s Consumer API to subscribe to a Topic and consume data from it.
For example, a cron script can fetch data from a topic and insert it into a DB.
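The scheduled-script example can be sketched without a running Kafka cluster. In the sketch below a plain list stands in for the topic, list indexes stand in for offsets, and an in-memory SQLite database stands in for the DB. The names `produce`, `consume_into_db`, and the `events` table are hypothetical; a real script would use a Kafka client library instead.

```python
import sqlite3

# Hypothetical stand-in for a topic: list indexes play the role of offsets.
topic = []


def produce(record):
    """Producer side: append a record to the end of the topic."""
    topic.append(record)


def consume_into_db(conn, next_offset):
    """Insert every record from next_offset onward and return the new offset."""
    new_records = topic[next_offset:]
    conn.executemany(
        "INSERT INTO events (payload) VALUES (?)",
        [(record,) for record in new_records],
    )
    conn.commit()
    return next_offset + len(new_records)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (payload TEXT)")

produce("signup:alice")
produce("signup:bob")
offset = consume_into_db(conn, 0)        # first scheduled run
produce("signup:carol")
offset = consume_into_db(conn, offset)   # second run picks up only the new record

row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Remembering the offset between runs is what keeps the second run from inserting the first two records again.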
Kafka does not delete messages from the Topic instantly after the message consumption.
The retention policy defines how long data can remain in a Kafka Topic.
This period is configurable in Kafka’s configuration files.
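A time-based retention check can be sketched as a simple filter. This is a simplification under stated assumptions: `apply_retention` is a hypothetical helper, timestamps are plain seconds, and records are dropped one by one, whereas real Kafka removes whole log segments. On a real broker the window is set through properties such as `log.retention.hours`.

```python
def apply_retention(records, now, retention_seconds):
    """Keep only the records that are still inside the retention window.

    records: list of (timestamp_seconds, payload) tuples, oldest first.
    """
    cutoff = now - retention_seconds
    return [(ts, payload) for ts, payload in records if ts >= cutoff]


log = [(0, "old"), (3_000, "recent"), (3_500, "newest")]

# With a one-hour window evaluated at t=4000, the cutoff is t=400.
retained = apply_retention(log, now=4_000, retention_seconds=3_600)
```

Whether a record was already consumed plays no part in the decision; only its age matters.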
The replication factor is the number of nodes that hold a copy of a single partition.
It therefore indicates the fault tolerance of the system.
For a topic with replication factor N, Kafka will tolerate up to N-1 server failures without losing any records committed to the log.
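The N-1 rule can be checked with a small model. The helpers `place_replicas` and `is_available` are hypothetical, and the round-robin replica placement is an assumption made for illustration, not Kafka's actual assignment algorithm.

```python
def place_replicas(partition_id, brokers, replication_factor):
    """Pick replication_factor distinct brokers for one partition (round-robin)."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    return [
        brokers[(partition_id + i) % len(brokers)]
        for i in range(replication_factor)
    ]


def is_available(replicas, failed_brokers):
    """A partition's data survives while at least one replica's broker is up."""
    return any(broker not in failed_brokers for broker in replicas)


brokers = ["broker-0", "broker-1", "broker-2", "broker-3"]
replicas = place_replicas(0, brokers, replication_factor=3)

# With N = 3, losing two of the replica brokers still leaves one copy.
survives_two_failures = is_available(replicas, {"broker-0", "broker-1"})

# Losing all three replica brokers leaves no copy.
survives_three_failures = is_available(replicas, {"broker-0", "broker-1", "broker-2"})
```

With a replication factor of 3, the partition survives any two broker failures, but not the loss of all three brokers that hold its replicas.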
In conclusion, this is just a basic introduction to Apache Kafka, the tip of the iceberg. Kafka is a huge topic, and there is a lot more to cover, such as its implementation and use cases.
This blog mainly gives you a basic idea of the Kafka components and I hope to cover other related topics in the future.