[ANN] Bottled Water: PostgreSQL to Kafka replication

2015-04-23 Thread Martin Kleppmann
Hi Kafka users, I'd like to announce a new open source project, called "Bottled Water", for getting data from PostgreSQL into Kafka: http://blog.confluent.io/2015/04/23/bottled-water-real-time-integration-of-postgresql-and-kafka/ https://github.com/confluentinc/bottledwater-pg/ Bottled Water com…

Log segment deletion

2018-01-29 Thread Martin Kleppmann
Hi all, We are debugging an issue with a Kafka Streams application that is producing incorrect output. The application is a simple group-by on a key, and then count. As expected, the application creates a repartitioning topic for the group-by stage. The problem appears to be that messages are g…
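The application described in the thread is a group-by on a key followed by a count. As a hedged sketch (this is plain Python, not the Kafka Streams API, where the equivalent would be `groupByKey().count()` over a repartition topic), the aggregation itself looks like:

```python
from collections import Counter

def group_by_count(records):
    """Sketch of the thread's aggregation: group (key, value) records by key,
    then count records per key. In Kafka Streams, a repartition topic ensures
    all records with the same key are processed by the same task."""
    counts = Counter()
    for key, _value in records:
        counts[key] += 1
    return dict(counts)
```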

Re: Log segment deletion

2018-01-29 Thread Martin Kleppmann
…n the second stage of the streams app. :-( Is log.message.timestamp.type=LogAppendTime the best way of avoiding this problem? Thanks, Martin > On 29 Jan 2018, at 15:44, Martin Kleppmann wrote: > > Hi all, > > We are debugging an issue with a Kafka Streams application that is pr…
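The setting asked about in the message is a broker-side config. A sketch of what that would look like (the comment is my gloss, not from the thread):

```properties
# server.properties (broker) -- the setting discussed in the thread.
# Stamp messages with the broker's append time rather than the
# producer-supplied timestamp, so time-based log retention is driven
# by when the broker wrote the message.
log.message.timestamp.type=LogAppendTime
```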

Re: Log segment deletion

2018-01-30 Thread Martin Kleppmann
…to close this gap. As of now, your > workaround solution looks good to me, or you can also consider setting the > broker config "log.message.timestamp.difference.max.ms" to very long values. > > > Guozhang
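Guozhang's alternative suggestion is also a broker config. A sketch of a "very long value" for it (the specific number is illustrative only, not from the thread):

```properties
# server.properties (broker) -- alternative suggested in the thread.
# Allow producer-supplied timestamps to differ from broker time by a
# very large margin. The value below is roughly 10 years in milliseconds,
# chosen purely for illustration.
log.message.timestamp.difference.max.ms=315360000000
```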

Re: Delayed Queue

2014-02-23 Thread Martin Kleppmann
Hi Jagan, unfortunately Kafka doesn't have the same TTL feature as you find in RabbitMQ. This is because Kafka and RabbitMQ have a fundamentally different design: - RabbitMQ brokers individually track the status of each message (whether it has been acked by a consumer, when its TTL expires, etc…
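Since Kafka brokers don't track per-message TTLs, a delayed queue is typically built on the consumer side. A hedged sketch (the `deliver_at` field is a hypothetical attribute the producer would attach to each message; this is not a Kafka API):

```python
import time

def due_messages(messages, now=None):
    """Consumer-side delayed delivery: hold back each message until its
    hypothetical 'deliver_at' timestamp has passed, since the broker
    itself has no per-message delay or TTL."""
    now = time.time() if now is None else now
    ready, pending = [], []
    for msg in messages:
        (ready if msg["deliver_at"] <= now else pending).append(msg)
    return ready, pending
```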

Re: Consumer group ID for high level consumer

2014-02-26 Thread Martin Kleppmann
Hi Binita, The consumer group (group.id) is a mechanism for sharing the load of consuming a high-volume topic between multiple consumers. If you don't set a group ID, each consumer consumes all the partitions of a topic. If you set several consumers to the same group ID, the partitions of the t…
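The partition-sharing behavior described here can be sketched in miniature. This is a simplified model only: real Kafka uses pluggable assignors (range, round-robin, sticky) negotiated by the group coordinator, while this sketch models just a round-robin split:

```python
def assign_round_robin(partitions, consumers):
    """Model of a consumer group: each partition of the topic is assigned
    to exactly one consumer sharing the same group.id, spreading load."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment
```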

Re: Reg Partition and Replica?

2014-02-27 Thread Martin Kleppmann
Hi Bala, Partitions are what give Kafka parallelism and allow it to scale. Every message exists in exactly one partition. Replicas are exact copies of partitions on different machines. They allow Kafka to be reliable and not lose messages if a machine dies. So the answers are: 1. No, a messag…
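"Every message exists in exactly one partition" follows from how keyed messages are partitioned. A hedged sketch: Kafka's default partitioner hashes the key with murmur2 modulo the partition count; CRC32 stands in here purely for illustration:

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Deterministic key -> partition mapping: the same key always lands
    in the same single partition (illustrative hash, not Kafka's murmur2)."""
    return zlib.crc32(key) % num_partitions
```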

Re: Reg Partition

2014-03-05 Thread Martin Kleppmann
Hi Bala, The way Kafka works, each partition is a sequence of messages in the order that they were produced, and each message has a position (offset) in this sequence. Kafka brokers don't keep track of which consumer has seen which messages. Instead, each consumer keeps track of the latest offs…
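The offset model described here can be sketched as a toy in a few lines. This is a hedged illustration of the concept, not Kafka's implementation: the log is append-only, offsets are positions, and the read position lives in the consumer, not the broker:

```python
class PartitionLog:
    """Toy model of one partition: an ordered, append-only message sequence."""
    def __init__(self):
        self._messages = []

    def append(self, msg):
        self._messages.append(msg)
        return len(self._messages) - 1  # offset of the appended message

    def read_from(self, offset, max_records=100):
        return self._messages[offset:offset + max_records]

class SimpleConsumer:
    """The consumer, not the broker, remembers the next offset to read."""
    def __init__(self, log):
        self.log = log
        self.position = 0

    def poll(self):
        batch = self.log.read_from(self.position)
        self.position += len(batch)
        return batch
```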

Re: Reg Partition

2014-03-06 Thread Martin Kleppmann
…formance improvement. Will there > not be any harm if I have some consumers consuming from the same partitions, > if I can tolerate slowness/performance degradation? > > Regards > Bala > > -Original Message- > From: Martin Kleppmann [mailto:mkleppm...@linkedin.co…

Re: fidelity of offsets when mirroring

2014-03-06 Thread Martin Kleppmann
If you really don't mind some messages being lost during failover, your simplest option would be to just restart consumers at the latest offset in the new AZ. Or, if you don't mind messages being duplicated, rewind to an earlier time t as explained by Jun and Neha. Another thought: you might be…
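"Rewind to an earlier time t" amounts to looking up the first offset whose timestamp is at or after t. A hedged sketch of that lookup over a sorted list of per-offset timestamps (the same idea as the Java consumer's `offsetsForTimes`, but not that API):

```python
import bisect

def offset_for_time(timestamps, t):
    """Given the sorted append timestamps of a partition (index == offset),
    return the first offset whose timestamp is >= t. Restarting from this
    offset re-delivers everything since t (duplicates, but no loss)."""
    return bisect.bisect_left(timestamps, t)
```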

Re: Are offsets unique, immutable identifiers for a message in a topic?

2014-03-07 Thread Martin Kleppmann
Almost right: offsets are unique, immutable identifiers for a message within a topic-partition. Each partition has its own sequence of offsets, but a (topic, partition, offset) triple uniquely and persistently identifies a particular message. For log retention you have essentially two options: …
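Because offsets are unique only within a partition, any consumer-side bookkeeping keyed by message identity needs the full triple. A hedged sketch of using (topic, partition, offset) for duplicate detection (an illustrative use, not something the thread prescribes):

```python
seen = set()

def is_duplicate(topic, partition, offset):
    """The (topic, partition, offset) triple uniquely and persistently
    identifies a message, so it serves as a stable deduplication key."""
    mid = (topic, partition, offset)
    if mid in seen:
        return True
    seen.add(mid)
    return False
```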

Re: Are offsets unique, immutable identifiers for a message in a topic?

2014-03-07 Thread Martin Kleppmann
On 7 Mar 2014, at 14:11, "Maier, Dr. Andreas" wrote: >> In your case, it sounds like time-based retention with a fairly long >> retention period is the way to go. You could potentially store the >> offsets of messages to retry in a separate Kafka topic. > > I was also thinking about doing that. H…