ZooKeeper - Introduction


ZooKeeper is a well-known Apache project that provides centralized coordination and synchronization services for the servers in a cluster. Managing and coordinating several clusters at once is hard, but ZooKeeper solves this with a simple architecture and a capable API. With ZooKeeper handling coordination, we can focus on development rather than on managing clusters.

ZooKeeper does this by creating a file-like node on its server, known as a znode, which is held in ZooKeeper's memory. Any node in the cluster can update a znode to publish its status. Other nodes can then read that status and adjust their behavior accordingly.
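
To make the znode idea concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not the real ZooKeeper client API: the class ToyZnodeStore, its methods, and the path /services/worker-1 are all made up for illustration. One simplification to note: standard ZooKeeper watches fire once and have to be registered again, while this toy keeps them registered.

```python
# Toy in-memory model of znodes; NOT the real ZooKeeper client API.
from typing import Callable, Dict, List


class ToyZnodeStore:
    """Maps znode paths to data and notifies watchers on every update."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}
        self._watchers: Dict[str, List[Callable[[str, bytes], None]]] = {}

    def set(self, path: str, value: bytes) -> None:
        # A cluster node publishes its status by writing to a znode.
        self._data[path] = value
        for callback in self._watchers.get(path, []):
            callback(path, value)

    def get(self, path: str) -> bytes:
        # Any other node can read the latest status.
        return self._data[path]

    def watch(self, path: str, callback: Callable[[str, bytes], None]) -> None:
        # Register interest in future changes to this path.
        self._watchers.setdefault(path, []).append(callback)


store = ToyZnodeStore()
seen = []  # status changes observed by another (hypothetical) node
store.watch("/services/worker-1", lambda path, value: seen.append((path, value)))
store.set("/services/worker-1", b"healthy")
store.set("/services/worker-1", b"overloaded")
print(store.get("/services/worker-1"))  # b'overloaded'
```

The watcher plays the role of the "other nodes" above: it learns about each status change without having to poll.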

Topics and partitions in Apache Kafka [Basic Topics]

What are topics and partitions in Kafka?
A topic is a particular stream of data. It is similar to a table in a database, but without all the constraints. We can have as many topics as we want. As with a table in a database, each topic is identified by its name.
Topics are split into partitions, and each partition is ordered. Each message in a partition gets an incrementing id, which is called its offset.

Suppose we have a topic T with several partitions, and let P0 (Partition0) be one of them.

step 1: Partition0 {initially it is empty}
step 2: Partition0 0 {the first message we write gets offset 0}
step 3: Partition0 0 1 {the next message we write gets offset 1}
step 4: Partition0 0 1 2 {the third message gets offset 2}
and so on...

Note: Offsets increase from 0 to n as we write data.
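
The steps above can be sketched as an append-only list, where a message's offset is simply its position in the list. This is only a rough model of a single partition in plain Python, not Kafka code; the Partition class and its method names are invented for this example.

```python
from typing import List


class Partition:
    """Append-only list; a message's offset is its index in the list."""

    def __init__(self) -> None:
        self._messages: List[str] = []

    def append(self, message: str) -> int:
        self._messages.append(message)
        return len(self._messages) - 1  # offset assigned to this message

    def read(self, offset: int) -> str:
        return self._messages[offset]


p0 = Partition()
offsets = [p0.append(m) for m in ["first", "second", "third"]]
print(offsets)     # [0, 1, 2]
print(p0.read(1))  # second
```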

Each partition has its own offsets:

Partition0: 0 1 2 3 4 5 6 7 {here the offsets go from 0 to 7}
Partition1: 0 1 2 3 4 5 {here the offsets go from 0 to 5}
Partition2: 0 1 2 3 4 5 6 7 8 {here the offsets go from 0 to 8}

Together, these partitions [Partition0, Partition1, Partition2] make up a TOPIC.

Things to keep in mind
  1. Offsets in one partition don't mean anything to another partition, e.g. offset 2 in Partition1 is not the same message as offset 2 in any other partition.
  2. Order is guaranteed only within a partition, not across partitions.
  3. Data in a partition is kept only for a limited time. {default 1 week}
  4. Once data is written to a partition it cannot be changed, i.e. it is immutable.
  5. We push data to the topic, not to a specific partition.
  6. Unless we provide a key, the producer chooses the partition for us (for example, round-robin), so a message may land in any partition. With a key, all messages that share that key go to the same partition.
  7. We can have as many partitions in a topic as we want.
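
Points 1, 5 and 6 can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not how Kafka is implemented: the Topic class and send method are invented here, zlib.crc32 stands in for whatever hash the real producer uses, and keyless messages are handed out round-robin just to keep the example simple.

```python
import itertools
import zlib
from typing import List, Optional, Tuple


class Topic:
    """Toy topic: a fixed number of partitions, each an append-only list."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: List[List[str]] = [[] for _ in range(num_partitions)]
        self._round_robin = itertools.cycle(range(num_partitions))

    def send(self, value: str, key: Optional[str] = None) -> Tuple[int, int]:
        """Append value and return (partition index, offset)."""
        if key is None:
            # No key: assumption for this sketch, rotate through partitions.
            index = next(self._round_robin)
        else:
            # Same key -> same hash -> same partition.
            # crc32 is a stand-in, not Kafka's real hash function.
            index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append(value)
        return index, len(self.partitions[index]) - 1


topic = Topic("T", num_partitions=3)
a = topic.send("login", key="user-42")
b = topic.send("logout", key="user-42")
keyless = [topic.send(f"event-{i}")[0] for i in range(3)]

print(a[0] == b[0])  # True: both keyed messages share a partition
print(keyless)       # [0, 1, 2]
```

Note that the caller only ever talks to the topic; the partition is picked inside send, and each partition counts its offsets independently of the others.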
Happy Coding...