Re: Kafka Setup for Daily counts on wide array of keys

2018-03-04 Thread Matt Daum
We actually don't have a Kafka cluster set up yet at all; right now we just have 8 of our application servers. We currently sample some impressions and then dedupe/count them at a different DC, but we are looking to analyze all impressions for some overall analytics. Our requests are around 1…

Re: Kafka Setup for Daily counts on wide array of keys

2018-03-04 Thread Thakrar, Jayesh
Yes, that's the general design pattern. Another thing to look into is compressing the data. The Kafka consumer/producer can already do it for you, but we chose to compress in the application due to a historic issue that degraded performance, although it has since been resolved. Also, just kee…
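For reference, letting the producer do the compression is a single config setting; a minimal Java sketch, assuming a local broker and a hypothetical "impressions" topic (neither is from the thread):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class CompressedProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");  // placeholder broker
            props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
            // Let the producer compress each batch; "gzip" and "snappy" also work.
            props.put("compression.type", "lz4");

            Producer<String, String> producer = new KafkaProducer<>(props);
            producer.send(new ProducerRecord<>("impressions", "someKey", "someValue"));
            producer.close();
        }
    }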

Mirror Maker Errors

2018-03-04 Thread Oleg Danilovich
Hello, I am running MirrorMaker to mirror data from one cluster to another. Now I get this error in the log: Feb 25 22:38:56 ld4-27 MirrorMaker[54827]: [2018-02-25 22:38:56,914] ERROR Error when sending message to topic rc.exchange.jpy with key: 29 bytes, value: 153 bytes with error: (org.apache.kafk…
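For context, a MirrorMaker invocation generally looks like this (the config file names and topic whitelist here are illustrative, not taken from the message):

    bin/kafka-mirror-maker.sh \
      --consumer.config consumer.properties \
      --producer.config producer.properties \
      --whitelist "rc.exchange.*"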

RE: Mirror Maker Errors

2018-03-04 Thread adrien ruffie
Hi Oleg, have you configured your consumer/producer with a "no data loss" configuration like the one below? For the consumer, set auto.commit.enabled=false in consumer.properties. For the producer: 1. max.in.flight.requests.per.connection=1 2. retries=Int.MaxValue 3. acks=-1 4. block.on.buffer.full=true (sketched as properties files below) …
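A sketch of those settings laid out as the two properties files the message names (the config keys are as given in the message; Int.MaxValue is spelled out as its literal value):

    # consumer.properties
    auto.commit.enabled=false

    # producer.properties
    max.in.flight.requests.per.connection=1
    retries=2147483647
    acks=-1
    block.on.buffer.full=true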

Re: Kafka Setup for Daily counts on wide array of keys

2018-03-04 Thread Matt Daum
Thanks! For the counts, I'd need to use a global table to make sure it's across all the data, right? Also, will having millions of different values per grouped attribute scale OK? On Mar 4, 2018 8:45 AM, "Thakrar, Jayesh" wrote: > Yes, that's the general design pattern. Another thing to look into…
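As a sketch of the counting pattern under discussion: a Kafka Streams topology that re-keys each impression by the grouped attribute and keeps a running count. The topic names, serdes, and key-extraction lambda below are assumptions, not from the thread:

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.Produced;

    public class ImpressionCounts {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "impression-counts");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> impressions = builder.stream("impressions");
            // Re-keying by the attribute triggers a repartition, so each count
            // covers all data for that attribute value across the cluster.
            KTable<String, Long> counts = impressions
                .selectKey((k, v) -> v)   // placeholder: extract the grouping attribute
                .groupByKey()
                .count();
            counts.toStream().to("impression-counts-output",
                Produced.with(Serdes.String(), Serdes.Long()));

            new KafkaStreams(builder.build(), props).start();
        }
    }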

Re: committing offset metadata in kafka streams

2018-03-04 Thread Matthias J. Sax
You are correct; this is not possible at the moment. Note that commits happen "under the hood" and users cannot commit explicitly. Users can only "request" a commit -- this implies that Kafka Streams will commit as soon as possible -- but when `context#commit()` returns, the commit is not done yet (it onl…
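A minimal sketch of that behavior from inside a processor (the processor itself is hypothetical):

    import org.apache.kafka.streams.processor.AbstractProcessor;

    public class CommitRequestingProcessor extends AbstractProcessor<String, String> {
        @Override
        public void process(String key, String value) {
            // ... handle the record ...

            // This only *requests* a commit. When it returns, the offsets are
            // not committed yet; Kafka Streams commits as soon as possible.
            context().commit();
        }
    }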

Re: Kafka Setup for Daily counts on wide array of keys

2018-03-04 Thread Thakrar, Jayesh
I don’t have any experience/knowledge of Kafka's built-in datastore, but I believe that for some portions of streaming, Kafka uses (used?) RocksDB to locally store some state info in the application instances. Personally I would use an external datastore. There's a wide choice out there - regular key-value stor…
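For completeness, the RocksDB-backed store lives alongside the Streams application (not on the brokers) and is declared roughly as below; the store name and serdes are assumptions:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.state.KeyValueStore;
    import org.apache.kafka.streams.state.StoreBuilder;
    import org.apache.kafka.streams.state.Stores;

    public class StateStoreSketch {
        public static void main(String[] args) {
            // RocksDB-backed persistent store; fault-tolerant via a changelog topic.
            StoreBuilder<KeyValueStore<String, Long>> storeBuilder =
                Stores.keyValueStoreBuilder(
                    Stores.persistentKeyValueStore("daily-counts"),
                    Serdes.String(),
                    Serdes.Long());

            StreamsBuilder builder = new StreamsBuilder();
            builder.addStateStore(storeBuilder);
            // ... processors attached via the builder would read/write the store ...
        }
    }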

Re: Kafka Setup for Daily counts on wide array of keys

2018-03-04 Thread Thakrar, Jayesh
BTW - I did not mean to rule out Aerospike as a possible datastore. It's just that I am not familiar with it, but it surely looks like a good candidate to store the raw and/or aggregated data, given that it also has a Kafka Connect module. From: "Thakrar, Jayesh" Date: Sunday, March 4, 2018 at 9:25…

Re: Setting topic's offset from the shell

2018-03-04 Thread Zoran
The procedure you have suggested is good for replaying everything from the very beginning, but I would like to replay messages from an arbitrary offset. On the backend I have a ClickHouse table that listens to a Kafka topic with its group_id. In case of problems between the ClickHouse table and Kafka…
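For reference, since Kafka 0.11 the kafka-consumer-groups.sh tool can rewind an inactive consumer group to an arbitrary offset; the group name, topic, and offset below are placeholders:

    bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
      --group clickhouse_group --topic my_topic \
      --reset-offsets --to-offset 12345 --execute

Note that the group must not have active members when the reset runs; dropping --execute performs a dry run that only prints the planned offsets.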