Hi Kafka users,
I'd like to announce a new open source project, called "Bottled Water", for
getting data from PostgreSQL into Kafka:
http://blog.confluent.io/2015/04/23/bottled-water-real-time-integration-of-postgresql-and-kafka/
https://github.com/confluentinc/bottledwater-pg/
Bottled Water com[…]
Hi all,
We are debugging an issue with a Kafka Streams application that is producing
incorrect output. The application is a simple group-by on a key followed by a
count. As expected, the application creates a repartitioning topic for the
group-by stage. The problem appears to be that messages are g[…]n the
second stage of the streams app. :-(
Is log.message.timestamp.type=LogAppendTime the best way of avoiding this
problem?
Thanks,
Martin
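For concreteness, here is a minimal sketch of the kind of topology involved
(all topic names and serdes below are placeholders, not the real application):

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.Produced;

    public class GroupByCount {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "groupby-count-app");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG,
                      Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG,
                      Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> input = builder.stream("input-topic");

            // Grouping by a derived key forces Streams to write through an
            // internal repartitioning topic -- the second stage mentioned above.
            input.groupBy((key, value) -> value)
                 .count()
                 .toStream()
                 .to("counts-topic", Produced.with(Serdes.String(), Serdes.Long()));

            new KafkaStreams(builder.build(), props).start();
        }
    }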
> On 29 Jan 2018, at 15:44, Martin Kleppmann wrote:
>
> Hi all,
>
> We are debugging an issue with a Kafka Streams application that is pr[…]
>
> […]to close this gap. As of now, your
> workaround solution looks good to me, or you can also consider setting the
> broker config "log.message.timestamp.difference.max.ms" to very long values.
>
>
> Guozhang
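For anyone who wants to try Guozhang's suggestion, the same settings also
exist per topic, without the "log." prefix. A sketch using a recent
AdminClient (newer than this thread); the topic name is a placeholder:

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class SetTimestampType {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            try (AdminClient admin = AdminClient.create(props)) {
                // Topic-level analogue of log.message.timestamp.type: stamp
                // each message with the broker's append time instead of
                // trusting the producer-supplied timestamp.
                ConfigResource topic =
                    new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");
                AlterConfigOp op = new AlterConfigOp(
                    new ConfigEntry("message.timestamp.type", "LogAppendTime"),
                    AlterConfigOp.OpType.SET);
                admin.incrementalAlterConfigs(
                    Collections.singletonMap(topic, Collections.singletonList(op)))
                    .all().get();
            }
        }
    }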
Hi Jagan, unfortunately Kafka doesn't have the same TTL feature as you find in
RabbitMQ. This is because Kafka and RabbitMQ have fundamentally different
designs:
- RabbitMQ brokers individually track the status of each message (whether it
has been acked by a consumer, when its TTL expires, etc.)[…]
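The closest Kafka equivalent is topic-level retention, which expires whole log
segments by age rather than individual messages. A rough sketch (topic name
and settings invented for illustration):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;

    public class TopicWithRetention {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            try (AdminClient admin = AdminClient.create(props)) {
                // retention.ms works per segment, not per message: a message
                // is deleted only when its whole segment has aged out, whether
                // or not any consumer has read it.
                NewTopic topic = new NewTopic("events", 6, (short) 3)
                    .configs(Collections.singletonMap(
                        "retention.ms", String.valueOf(24L * 60 * 60 * 1000)));
                admin.createTopics(Collections.singleton(topic)).all().get();
            }
        }
    }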
Hi Binita,
The consumer group (group.id) is a mechanism for sharing the load of consuming
a high-volume topic between multiple consumers. If you don't set a group ID,
each consumer consumes all the partitions of a topic. If you set several
consumers to the same group ID, the partitions of the topic are divided among
the members of the group, so each message is delivered to just one of them.
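To make that concrete, a small sketch with the modern Java consumer (topic and
group names are invented): run two copies with the same group.id and the
topic's partitions are split between them; give each copy its own group.id and
both receive every partition.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class GroupMember {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            // All consumers sharing this group.id divide the topic's
            // partitions between them; a consumer with a different group.id
            // gets them all.
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-consumer-group");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                      StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                      StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("high-volume-topic"));
                while (true) {
                    ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("partition=%d offset=%d%n",
                                          record.partition(), record.offset());
                    }
                }
            }
        }
    }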
Hi Bala,
Partitions are what give Kafka parallelism and allow it to scale. Every message
exists in exactly one partition.
Replicas are exact copies of partitions on different machines. They allow Kafka
to be reliable and not lose messages if a machine dies.
So the answers are:
1. No, a message exists in exactly one partition. […]
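You can see both concepts by describing a topic with the AdminClient; a
sketch, assuming a placeholder topic name:

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.TopicDescription;

    public class DescribeTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            try (AdminClient admin = AdminClient.create(props)) {
                TopicDescription desc =
                    admin.describeTopics(Collections.singleton("my-topic"))
                         .all().get().get("my-topic");
                // Each partition is an independent, ordered log; its replicas
                // are byte-for-byte copies hosted on different brokers.
                desc.partitions().forEach(p ->
                    System.out.printf("partition=%d leader=%s replicas=%s%n",
                        p.partition(), p.leader(), p.replicas()));
            }
        }
    }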
Hi Bala,
The way Kafka works, each partition is a sequence of messages in the order that
they were produced, and each message has a position (offset) in this sequence.
Kafka brokers don't keep track of which consumer has seen which messages.
Instead, each consumer keeps track of the latest offset it has consumed.
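A minimal sketch of that consumer-side bookkeeping with the current Java
client (all names invented): the broker never marks a message as "seen"; the
consumer advances its own position and chooses when to record it by committing
offsets.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class OffsetTracking {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "offset-demo");
            props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                      StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                      StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("my-topic"));
                while (true) {
                    ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofMillis(500));
                    // Record our own position: commit the offsets we have
                    // processed so a restart resumes from here.
                    if (!records.isEmpty()) {
                        consumer.commitSync();
                    }
                }
            }
        }
    }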
> […]performance improvement. Will there
> not be any harm if I have some consumers consuming from the same partitions,
> if I can tolerate slowness/performance degradation?
>
> Regards
> Bala
>
> -----Original Message-----
> From: Martin Kleppmann [mailto:mkleppm...@linkedin.co
If you really don't mind some messages being lost during failover, your
simplest option would be to just restart consumers at the latest offset in the
new AZ. Or, if you don't mind messages being duplicated, rewind to an earlier
time, as explained by Jun and Neha.
Another thought: you might be[…]
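A hedged sketch of both choices with today's consumer API (offsetsForTimes
postdates this thread; the partition set and timings are made up):

    import java.time.Duration;
    import java.time.Instant;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Set;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
    import org.apache.kafka.common.TopicPartition;

    public class FailoverSeek {
        // Option 1: accept losing some messages; jump to the latest offsets.
        static void skipToLatest(KafkaConsumer<?, ?> consumer,
                                 Set<TopicPartition> assigned) {
            consumer.seekToEnd(assigned);
        }

        // Option 2: accept duplicates; rewind to a wall-clock time safely
        // before the failover happened.
        static void rewindTenMinutes(KafkaConsumer<?, ?> consumer,
                                     Set<TopicPartition> assigned) {
            long rewindTo =
                Instant.now().minus(Duration.ofMinutes(10)).toEpochMilli();
            Map<TopicPartition, Long> query = new HashMap<>();
            for (TopicPartition tp : assigned) {
                query.put(tp, rewindTo);
            }
            Map<TopicPartition, OffsetAndTimestamp> found =
                consumer.offsetsForTimes(query);
            for (Map.Entry<TopicPartition, OffsetAndTimestamp> e : found.entrySet()) {
                if (e.getValue() != null) {  // null: no message at/after that time
                    consumer.seek(e.getKey(), e.getValue().offset());
                }
            }
        }
    }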
Almost right: offsets are unique, immutable identifiers for a message within a
topic-partition. Each partition has its own sequence of offsets, but a (topic,
partition, offset) triple uniquely and persistently identifies a particular
message.
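That is exactly what lets a consumer jump straight to one message. A sketch
with the modern Java client, using an invented (topic, partition, offset)
triple:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class FetchByOffset {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                      StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                      StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // (topic, partition, offset) pins down one message, so we can
                // assign the partition directly and seek straight to it.
                TopicPartition tp = new TopicPartition("my-topic", 3);
                consumer.assign(Collections.singleton(tp));
                consumer.seek(tp, 42L);
                for (ConsumerRecord<String, String> record :
                         consumer.poll(Duration.ofSeconds(5))) {
                    System.out.println(record.value());
                    break;  // only wanted the one message at offset 42
                }
            }
        }
    }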
For log retention you have essentially two options: time-based retention (the
broker deletes log segments older than a configurable age) and size-based
retention (the broker deletes the oldest segments once a partition grows
beyond a configurable size).
On 7 Mar 2014, at 14:11, "Maier, Dr. Andreas" wrote:
>> In your case, it sounds like time-based retention with a fairly long
>> retention period is the way to go. You could potentially store the
>> offsets of messages to retry in a separate Kafka topic.
>
> I was also thinking about doing that. H[…]
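One possible shape of that retry idea (all names invented): rather than
copying the failed message, publish its (topic, partition, offset) coordinates
to a retry topic, and rely on a long time-based retention period to keep the
original message fetchable.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class RetryPointer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                      StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                      StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Record where the failed message lives instead of copying it;
                // a later consumer can re-fetch it by seeking to this offset.
                String pointer = "my-topic,3,42";  // topic,partition,offset
                producer.send(new ProducerRecord<>("retries", pointer));
            }
        }
    }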