Everything I currently know about Kafka
  05-08-2020 · 6 minute read · 1216 words

These are some rough notes taken while I learn Kafka; this document will gradually grow and be refined as I add more to it.


  • Pub/sub (publish/subscribe) messaging system.
  • In pub sub you have publishers (producers) and subscribers (consumers) of messages.
  • Producers send messages to, and consumers read messages from, a specific location called a topic (a named grouping of messages).
  • Topics can be defined up front or on demand.
  • Producers know the topic name and have permission to send to it.
  • Consumers have permission to read the topics they’re interested in.
  • Messages and topics are stored on a broker.
  • The broker is a daemon process that runs on a machine or in a container.
  • The broker has access to a file system, which it uses to store topics and messages.


  • Kafka can scale horizontally by increasing the number of brokers.
  • Cluster - Grouping of Kafka brokers.

Distributed systems

  • Controller, worker, tasks.
  • Controller => Leader => Peers
  • Replication factor - Ability to protect against loss.
  • Work - Brokers receive messages, categorize them into topics, and reliably persist them for eventual retrieval.
  • The effort required to handle messages from producers is substantially less than the effort required to serve consumers.

Distributed systems - Communication and Consensus

  • Worker node membership and naming
  • Configuration management
  • Leader election
  • Health status

The Role of Zookeeper

  • Configuration information
  • Health status
  • Group membership
  • Zookeeper is itself a distributed system, consisting of multiple nodes in an “ensemble” (aka cluster).
  • Apache Zookeeper provides Kafka with the metadata it needs.
  • Broker registration, with a heartbeat mechanism to keep the list current.
  • Maintaining a list of topics alongside
    • Their configuration (partitions, replication factor, additional configurations)
    • The list of ISRs (in-sync replicas) for each partition
  • Performing leader elections in case some brokers go down.
  • Storing the Kafka cluster id (randomly generated on the first startup of the cluster)
  • Storing ACLs
    • Topics
    • Consumer Groups
    • Users
  • Quotas configuration if enabled.

What is the purpose of the controller?

  • One broker in the cluster acts as a controller
  • Monitor the liveness of brokers
  • Elect new leaders on broker failure.
  • Communicate new leaders to brokers.

I suppose this is similar to the concept of a leader with Consul?

All brokers talk to Zookeeper, which is how they learn who the current controller is.

Get the current controller with:

zookeeper-shell localhost:2181 get /controller

Ran into this issue this morning:

Kafka cached zkVersion not equal to that in zookeeper broker not recovering

Ended up just restarting the controller broker returned from get /controller and that seemed to restore order to things.

A Deep Dive into Kafka Controller

Topic - a particular stream of data.

  • Similar to a database table without all the constraints.
  • You can have as many topics as you want.
  • A topic is identified by its name.

Topics are split into partitions.

  • Each partition is ordered.
  • Each message in a partition gets an incremental id called the offset.
  • Offsets only have meaning within a specific partition. E.g. offset 3 in partition 0 doesn’t refer to the same message as offset 3 in partition 1.
  • Order is only guaranteed within a single partition not across partitions.
  • Data is kept for a limited time, default 1 week.
  • Once the data is written to a partition it cannot be changed. Immutable.
  • When you push data you don’t choose a partition; you push to the topic, and the partition the data is assigned to is effectively random (round-robin) unless you provide a key.
  • You can have as many partitions as you want for a topic. The more partitions, the more parallelism.
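
The partition/offset model above can be sketched as a toy append-only log (illustrative only, not how Kafka actually stores data):

```python
# Toy model of a topic's partitions: each partition is an append-only
# log, and a message's offset is simply its position in that log.

class Partition:
    def __init__(self):
        self.log = []

    def append(self, message):
        """Append a message and return the offset it was assigned."""
        self.log.append(message)
        return len(self.log) - 1

topic = [Partition(), Partition()]  # a topic with 2 partitions

assert topic[0].append("a") == 0
assert topic[0].append("b") == 1   # offsets are incremental within a partition
assert topic[1].append("c") == 0   # each partition has its own offset sequence
```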

Data expiration

After the retention period (1 week by default), what happens to the data? The message is deleted from its offset in the partition (e.g. offset 10) and is never revisited. Deleted offsets are never reused.
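
A toy sketch of that expiry behaviour (hypothetical code, not Kafka’s real log cleaner): old messages are trimmed from the head of the log, but the offset counter only ever goes up, so a deleted offset is never reassigned:

```python
# Hypothetical sketch of retention: expired messages disappear, but the
# next offset keeps counting up, so a deleted offset is never reused.

class PartitionLog:
    def __init__(self):
        self.messages = {}   # offset -> message
        self.next_offset = 0

    def append(self, message):
        self.messages[self.next_offset] = message
        self.next_offset += 1

    def expire_before(self, offset):
        """Delete every message with an offset below `offset`."""
        for old in [o for o in self.messages if o < offset]:
            del self.messages[old]

log = PartitionLog()
for msg in ["a", "b", "c", "d"]:
    log.append(msg)

log.expire_before(2)              # retention removes offsets 0 and 1
assert sorted(log.messages) == [2, 3]
log.append("e")                   # the new message gets offset 4,
assert 4 in log.messages          # not one of the freed offsets
```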

Brokers and data replication

  • A broker is the same as a “server”.
  • Each broker is identified by an integer id.
  • Each broker holds certain topic partitions, i.e. a subset of the data.
  • After connecting to any broker (a “bootstrap broker”) you’re connected to the entire cluster.
  • A good number of brokers to get started with is 3.

Topic replication factor.

  • Topics should have a replication factor > 1. (Usually between 2 and 3)
  • This way if a single broker is down another broker can have the data.
  • Example: Topic with 2 partitions and replication factor of 2. Visually this looks like 4 partitions in total. The two original partitions + two additional replicas.

Concept of a leader for a partition

  • At any time only one broker can be the leader for a given partition, and only that broker can receive and serve data for that partition.
  • The other brokers synchronize the data from the leader.
  • Each partition has one leader and multiple ISRs (in-sync replicas). “In sync” means a replica copies from the leader quickly enough to not fall too far behind.
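
A rough sketch of the in-sync idea (simplified: real Kafka judges replicas by how long they have lagged, via `replica.lag.time.max.ms`, not by a fixed offset gap as here):

```python
# Simplified ISR check: a follower counts as in sync if it isn't too far
# behind the leader. (Hypothetical offset threshold; Kafka actually uses
# a time-based lag check.)

MAX_LAG = 10  # assumed maximum offset gap to still count as in sync

def in_sync_replicas(leader_offset, follower_offsets):
    """Return the names of followers close enough to the leader."""
    return {name for name, offset in follower_offsets.items()
            if leader_offset - offset <= MAX_LAG}

followers = {"broker-2": 98, "broker-3": 42}
assert in_sync_replicas(100, followers) == {"broker-2"}  # broker-3 lags too far
```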


Producers can choose to receive acknowledgement of data writes.

  • acks = 0 - fire and forget, similar to UDP - potential for data loss.
  • acks = 1 - wait for the leader to acknowledge - limited data loss.
  • acks = all - wait for the leader and the in-sync replicas to acknowledge - no data loss (at the cost of latency).
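
On the producer side this is the `acks` setting; e.g. in a producer properties file (values per the list above):

```properties
# producer.properties
acks=all   # wait for the leader and the in-sync replicas (strongest durability)
# acks=1   -> leader acknowledgement only
# acks=0   -> fire and forget
```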

Producers message keys

  • Producers can choose to send a key with each message.
  • If a key is sent, the producer has a guarantee that all messages with that key will always go to the same partition.
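
The mechanism is hashing: partition = hash(key) % num_partitions. A sketch, using CRC32 as a stand-in for the murmur2 hash that Kafka’s default partitioner actually uses:

```python
import zlib

# Sketch of key-based partition assignment. CRC32 stands in for Kafka's
# murmur2; the principle is the same: hash the key bytes, then take the
# result modulo the partition count.

def partition_for(key: bytes, num_partitions: int) -> int:
    return zlib.crc32(key) % num_partitions

# The same key always hashes to the same partition...
assert partition_for(b"user-42", 3) == partition_for(b"user-42", 3)
# ...so all messages for one key stay ordered relative to each other.
```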


  • A consumer only needs the name of the topic it wishes to consume from and the address of a single broker to consume data.
  • Data is consumed in parallel across partitions, but in order within each partition.
  • Need to go into consumer and consumer groups.


  • Zookeeper manages brokers (keeps a list of them)
  • Zookeeper helps manage leader elections for partitions.
  • Zookeeper sends notifications to Kafka in case of changes (New topic, broker dies, broker comes up, delete topics etc.)
  • Kafka can’t work without Zookeeper.
  • Zookeeper usually operates with an odd number of nodes (3, 5, 7, …), since a strict majority is needed for quorum.
  • Zookeeper has the concept of 1 leader and everyone else being a follower.
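
The quorum arithmetic behind the “odd number of nodes” rule: an ensemble needs a strict majority of nodes to stay available, so even sizes buy nothing (a sketch):

```python
# Majority-quorum arithmetic: an ensemble of n nodes survives the loss of
# n - (n // 2 + 1) nodes, which is why odd ensemble sizes are preferred.

def failures_tolerated(ensemble_size: int) -> int:
    majority = ensemble_size // 2 + 1   # strict majority needed for quorum
    return ensemble_size - majority

assert failures_tolerated(3) == 1
assert failures_tolerated(4) == 1   # a 4th node adds no fault tolerance
assert failures_tolerated(5) == 2
```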

Kafka Guarantees

  • Messages are appended to topic-partition in the order they are sent.
  • Consumers read messages in the order stored in topic-partition.
  • With a replication factor of N, producers and consumers can tolerate up to N-1 brokers being down.
  • This is why a replication factor of 3 is a good idea.
    • Allows for one broker to be taken down for maintenance.
    • Allows for another broker to be taken down unexpectedly.
  • As long as the number of partitions remains constant for a topic (no new partitions), the same key will always go to the same partition.
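
That caveat falls straight out of the hash-mod formula: change the modulus and keys move (CRC32 again standing in for murmur2):

```python
import zlib

# Why adding partitions breaks key -> partition stability: the mapping is
# hash(key) % num_partitions, so changing the partition count remaps keys.

def partition_for(key: bytes, num_partitions: int) -> int:
    return zlib.crc32(key) % num_partitions

keys = [b"a", b"b", b"c", b"d", b"e"]
before = {k: partition_for(k, 3) for k in keys}   # topic with 3 partitions
after = {k: partition_for(k, 4) for k in keys}    # grown to 4 partitions
moved = [k for k in keys if before[k] != after[k]]
assert moved  # some keys now hash to a different partition
```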

Delivery semantics - At least once, at most once, exactly once.

  • Consumers choose when to commit offsets.
    • At most once - offsets are committed as soon as the message is received. If the processing goes wrong, the message will be lost. It won’t be read again.
    • At least once - offsets are committed after the message is processed. If the processing goes wrong, the message will be read again. This can result in duplicate processing of messages. Make sure your processing is idempotent - That is that re-processing the same message won’t impact your systems.
    • Exactly once - Very difficult to achieve / needs strong engineering.
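
A toy simulation of the first two modes (not Kafka client code): the only difference is whether the offset is committed before or after processing, which determines what a restart does after a crash mid-message:

```python
# Simulates a consumer that crashes once while processing "m1", then
# restarts from the last committed offset. commit_first=True models
# at-most-once; commit_first=False models at-least-once.

def deliver(log, commit_first):
    processed, committed, crashed_once = [], 0, False
    while committed < len(log):
        offset = committed                    # resume from last committed offset
        msg = log[offset]
        if commit_first:                      # at-most-once: commit, then process
            committed = offset + 1
        if msg == "m1" and not crashed_once:  # simulated crash mid-processing
            crashed_once = True
            continue                          # consumer "restarts"
        processed.append(msg)
        if not commit_first:                  # at-least-once: process, then commit
            committed = offset + 1
    return processed

log = ["m0", "m1", "m2"]
assert deliver(log, commit_first=True) == ["m0", "m2"]          # m1 lost
assert deliver(log, commit_first=False) == ["m0", "m1", "m2"]   # m1 retried
# (Had the crash landed after processing but before the commit, the
# at-least-once run would have processed m1 twice -- hence idempotency.)
```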

Creating your first topic

kafka-topics --create --topic foo --partitions 1 --replication-factor 1 --if-not-exists --zookeeper zookeeper:2181

(On newer Kafka versions the --zookeeper flag is replaced by --bootstrap-server, e.g. --bootstrap-server localhost:9092.)

Describing topic

kafka-topics --describe --topic foo --zookeeper zookeeper:2181

List all the broker ids registered in ZK.

zookeeper-shell localhost:2181 <<< "ls /brokers/ids"

Check the health of Zookeeper (a healthy node responds with imok)

echo "ruok" | nc -v localhost 2181

