We don't actually have a Kafka cluster set up yet at all. Right now we just
have 8 application servers. We currently sample some impressions
and then dedupe/count them at a different DC, but we are looking to
analyze all impressions for some overall analytics.
Our requests are around 1
Yes, that's the general design pattern. Another thing to look into is
compressing the data. The Kafka consumer/producer can already do it for you, but
we chose to compress in the application due to a historic issue that degraded
performance, although it has been resolved now.
Also, just kee
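To make the compression option above concrete, here is a minimal sketch of enabling client-side compression purely through producer configuration (property names are from the standard Kafka producer config; the specific values are illustrative, not a recommendation):

```properties
# producer.properties
compression.type=lz4    # or gzip, snappy, zstd (zstd requires Kafka 2.1+)
batch.size=65536        # larger batches generally compress better
linger.ms=20            # wait briefly so batches fill up before sending
```

With this approach the broker and consumers handle decompression transparently, whereas compressing in the application (as described above) means the payload stays opaque to Kafka.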
Hello, I am running MirrorMaker to mirror data from one cluster to
another.
Now I get this error in the log:
Feb 25 22:38:56 ld4-27 MirrorMaker[54827]: [2018-02-25 22:38:56,914] ERROR
Error when sending message to topic rc.exchange.jpy with key: 29 bytes,
value: 153 bytes with error:
(org.apache.kafk
Hi Oleg,
have you configured your consumer/producer with a "no data loss" configuration
like below?
For Consumer, set enable.auto.commit=false in consumer.properties
For Producer
1. max.in.flight.requests.per.connection=1
2. retries=Int.MaxValue
3. acks=-1
4. block.on.buffer.f
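The settings above can be sketched as config fragments (a minimal example; exact property names and availability depend on your Kafka version, and the last item is truncated in the original, so the `block.on.buffer.full` line below is an assumption on my part):

```properties
# consumer.properties -- commit offsets only after the record has been
# successfully produced to the target cluster
enable.auto.commit=false

# producer.properties
max.in.flight.requests.per.connection=1   # preserve ordering on retry
retries=2147483647                        # Int.MaxValue
acks=all                                  # equivalent to acks=-1
# On pre-0.10 producers: block.on.buffer.full=true
# On newer producers this was replaced by max.block.ms
```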
Thanks! For the counts, I'd need to use a global table to make sure it's
across all the data, right? Also, will having millions of distinct values per
grouped attribute scale OK?
On Mar 4, 2018 8:45 AM, "Thakrar, Jayesh"
wrote:
> Yes, that's the general design pattern. Another thing to look into
You are correct. This is not possible atm.
Note that commits happen "under the hood" and users cannot commit
explicitly. Users can only "request" a commit -- this implies that
Kafka Streams will commit as soon as possible -- but when
`context#commit()` returns, the commit is not done yet (it onl
I don’t have any experience/knowledge of the Kafka inbuilt datastore, but I
believe that for some portions of streaming, Kafka uses (used?) RocksDB to
locally store some state info in the stream-processing application instances.
Personally I would use an external datastore.
There's a wide choice out there - regular key-value stor
BTW - I did not mean to rule out Aerospike as a possible datastore.
It's just that I am not familiar with it, but it surely looks like a good candidate
to store the raw and/or aggregated data, given that it also has a Kafka Connect
module.
From: "Thakrar, Jayesh"
Date: Sunday, March 4, 2018 at 9:25
The procedure you have suggested is good for replaying everything from
the very beginning, but I would like to replay messages from an
arbitrary offset.
On the backend I have a ClickHouse table that listens to a Kafka topic with
its own group_id.
In case of problems between ClickHouse table and Kafka
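One way to replay from an arbitrary offset is the offset-reset tooling that ships with Kafka (0.11+). A minimal sketch, assuming the ClickHouse consumer group is stopped first; `clickhouse_group`, `my_topic`, and the offset `12345` are placeholders:

```shell
# Dry run: show what the new offsets would be, without changing anything
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --group clickhouse_group --topic my_topic \
  --reset-offsets --to-offset 12345 --dry-run

# Apply the reset, then restart the consumer to replay from that offset
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --group clickhouse_group --topic my_topic \
  --reset-offsets --to-offset 12345 --execute
```

`--reset-offsets` also supports `--to-datetime` and `--shift-by` if you would rather pick the replay point by time or by a relative distance than by an absolute offset.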