Kafka
Kafka
Distributed log log as DB
“Confluent” built on top of kafka -> more commercial offering
Real time event processing
One at a time
Message queue
Terms
Producer/consumer/connector/stream processors
Consumers pull messages from the partitions
A topic is split into partitions (replication/availability)
In general working with kafka, the producer and consumer API need to be aware of partitions (if you care about ordered delivery)
broker is a node
Zookeeper consensus mechanism of the distributed system
partitions
(num) partitions = num of parallel processes
The topic doesn’t exist in any one place The leader is the leader of a partition (not a topic)
partitions tend to be replicating, but it’s purely for backup. Like RAID,
Kafka does in order message delivery by partition
Stream processors give guarantees on the producer/consumers
- a stream consumer will keep state (what message I’m on) back as a kafka topic so it can be killed/replaced
Kafka Stream or Apache Storm
You can build anything on top of kakfa
- db
- message queue
- stream processing