Re: Kafka Setup for Daily counts on wide array of keys

2018-03-04 Thread Matt Daum
We actually don't have a Kafka cluster set up yet at all.  Right now we just
have 8 of our application servers.  We currently sample some impressions
and then dedupe/count them at a different DC, but we are looking to
analyze all impressions for some overall analytics.

Our requests are around 100-200 bytes each.  If we lost some of them due to
network jitter etc. it would be fine; we're just trying to get a rough
overall count for each attribute.  Creating batched messages definitely makes
sense and will also cut down on the network IO.
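
To make sure I'm picturing the batching right, here's a rough sketch of what I
have in mind on the app-server side (the Avro "Impression" schema, the topic
name and the class are placeholders, not anything we actually have yet):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.List;
import java.util.Properties;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;

public class ImpressionBatcher {

    private final KafkaProducer<byte[], byte[]> producer;
    private final Schema schema;  // the "Impression" Avro schema

    public ImpressionBatcher(Schema schema, String bootstrapServers) {
        this.schema = schema;
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", ByteArraySerializer.class.getName());
        props.put("value.serializer", ByteArraySerializer.class.getName());
        this.producer = new KafkaProducer<>(props);
    }

    // Packs N impressions into one Avro container (a byte buffer) and sends it
    // as a single Kafka message, as suggested below.
    public void sendBatch(List<GenericRecord> impressions) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
            writer.create(schema, out);
            for (GenericRecord impression : impressions) {
                writer.append(impression);
            }
        }
        producer.send(new ProducerRecord<>("impression-batches", out.toByteArray()));
    }
}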

We're trying to determine the required setup for Kafka to do what we're
looking to do; these are physical servers, so we'll most likely need to
buy new hardware.  For the first run I think we'll try it out on one of our
application clusters that gets a smaller amount of traffic (300-400k req/sec)
and run the Kafka cluster on the same machines as the applications.

So would the best route here be something like this: each application server
batches requests and sends them to Kafka, a streams consumer then tallies up
the totals per attribute that we want to track and outputs them to a new
topic, which then goes through a sink to either a DB or something like S3,
which we then read into our external DBs?
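
Concretely, the tally step I'm picturing looks roughly like this sketch
(1.x-era Streams API; topic names and the single example attribute are
placeholders, and the real values would come out of the Avro batches rather
than plain strings):

import java.util.Properties;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.Serialized;
import org.apache.kafka.streams.kstream.TimeWindows;

public class DailyAttributeCounts {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "daily-attribute-counts");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

        StreamsBuilder builder = new StreamsBuilder();

        // Pretend each record value is already one attribute value as a string;
        // in reality this would be unpacked from the batched Avro "Impression"s.
        KStream<String, String> impressions = builder.stream("impressions");

        impressions
            .groupBy((key, attributeValue) -> attributeValue,
                     Serialized.with(Serdes.String(), Serdes.String()))
            .windowedBy(TimeWindows.of(TimeUnit.DAYS.toMillis(1)))   // daily buckets
            .count()
            .toStream((windowedKey, count) -> windowedKey.key())     // drop the window for the output topic
            .to("daily-attribute-counts", Produced.with(Serdes.String(), Serdes.Long()));

        new KafkaStreams(builder.build(), props).start();
    }
}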

Thanks!

On Sun, Mar 4, 2018 at 12:31 AM, Thakrar, Jayesh <
jthak...@conversantmedia.com> wrote:

> Matt,
>
> If I understand correctly, you have an 8 node Kafka cluster and need to
> support  about 1 million requests/sec into the cluster from source servers
> and expect to consume that for aggregation.
>
> How big are your msgs?
>
> I would suggest looking into batching multiple requests per single Kafka
> msg to achieve desired throughput.
>
> So e.g. on the request receiving systems, I would suggest creating a
> logical avro file (byte buffer) of say N requests and then making that into
> one Kafka msg payload.
>
> We have a similar situation
> (https://www.slideshare.net/JayeshThakrar/apacheconflumekafka2016)
> and found anything from 4x to 10x better throughput with batching as
> compared to one request per msg.
> We have different kinds of msgs/topics and the individual "request" size
> varies from  about 100 bytes to 1+ KB.
>
> On 3/2/18, 8:24 AM, "Matt Daum"  wrote:
>
> I am new to Kafka but I think I have a good use case for it.  I am
> trying
> to build daily counts of requests based on a number of different
> attributes
> in a high throughput system (~1 million requests/sec. across all  8
> servers).  The different attributes are unbounded in terms of values, and
> some will spread across hundreds of millions of values.  This is my current
> thought process; let me know where I could be more efficient or if there is
> a better way to do it.
>
> I'll create an AVRO object "Impression" which has all the attributes
> of the
> inbound request.  My application servers then will on each request
> create
> and send this to a single kafka topic.
>
> I'll then have a consumer which creates a stream from the topic.  From
> there I'll use the windowed timeframes and groupBy to group by the
> attributes on each given day.  At the end of the day I'd need to read
> out
> the data store to an external system for storage.  Since I won't know
> all
> the values I'd need something similar to the KVStore.all() but for
> WindowedKV Stores.  This appears that it'd be possible in 1.1 with this
> commit:
> https://github.com/apache/kafka/commit/1d1c8575961bf6bce7decb049be7f10ca76bd0c5
>
> Is this the best approach to doing this?  Or would I be better using
> the
> stream to listen and then an external DB like Aerospike to store the
> counts
> and read out of it directly end of day.
>
> Thanks for the help!
> Daum
>
>
>


Re: Kafka Setup for Daily counts on wide array of keys

2018-03-04 Thread Thakrar, Jayesh
Yes, that's the general design pattern. Another thing to look into is to
compress the data. The Kafka consumer/producer can already do it for you, but
we chose to compress in the applications due to a historic issue that degraded
performance, although it has been resolved now.

Also, just keep in mind that while you do your batching, the Kafka producer also
tries to batch msgs to Kafka, and you will need to ensure you have enough
buffer memory. However, that's all configurable.
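
To illustrate, the producer knobs I mean are roughly these (a sketch only;
the values are placeholders to tune for your own hardware and message sizes,
not recommendations):

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.ByteArraySerializer;

public class BatchedProducerFactory {

    public static KafkaProducer<byte[], byte[]> create(String bootstrapServers) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());

        // The producer itself batches msgs per partition, up to batch.size bytes,
        // waiting at most linger.ms before sending.
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 256 * 1024);
        props.put(ProducerConfig.LINGER_MS_CONFIG, 50);

        // All unsent batches sit in this buffer - size it for your burst rate,
        // otherwise send() will block (or time out) when it fills up.
        props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 256L * 1024 * 1024);

        // Producer-level compression, on top of any app-level compression.
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");

        return new KafkaProducer<>(props);
    }
}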

Finally, ensure you have the latest Java updates and Kafka 0.10.2 or higher.

Jayesh


From: Matt Daum 
Sent: Sunday, March 4, 2018 7:06:19 AM
To: Thakrar, Jayesh
Cc: users@kafka.apache.org
Subject: Re: Kafka Setup for Daily counts on wide array of keys

We actually don't have a Kafka cluster set up yet at all.  Right now we just have 8
of our application servers.  We currently sample some impressions and then
dedupe/count them at a different DC, but we are looking to analyze all
impressions for some overall analytics.

Our requests are around 100-200 bytes each.  If we lost some of them due to
network jitter etc. it would be fine; we're just trying to get a rough
count of each attribute.  Creating batched messages definitely makes sense and
will also cut down on the network IO.

We're trying to determine the required setup for Kafka to do what we're looking
to do; these are physical servers, so we'll most likely need to buy new
hardware.  For the first run I think we'll try it out on one of our application
clusters that gets a smaller amount of traffic (300-400k req/sec) and run the Kafka
cluster on the same machines as the applications.

So would the best route here be something like this: each application server batches
requests and sends them to Kafka, a streams consumer then tallies up the
totals per attribute that we want to track and outputs them to a new topic, which
then goes through a sink to either a DB or something like S3, which we then read into
our external DBs?

Thanks!

On Sun, Mar 4, 2018 at 12:31 AM, Thakrar, Jayesh <jthak...@conversantmedia.com> wrote:
Matt,

If I understand correctly, you have an 8 node Kafka cluster and need to support 
 about 1 million requests/sec into the cluster from source servers and expect 
to consume that for aggregation.

How big are your msgs?

I would suggest looking into batching multiple requests per single Kafka msg to 
achieve desired throughput.

So e.g. on the request receiving systems, I would suggest creating a logical 
avro file (byte buffer) of say N requests and then making that into one Kafka 
msg payload.

We have a similar situation 
(https://www.slideshare.net/JayeshThakrar/apacheconflumekafka2016) and found 
anything from 4x to 10x better throughput with batching as compared to one 
request per msg.
We have different kinds of msgs/topics and the individual "request" size varies 
from  about 100 bytes to 1+ KB.

On 3/2/18, 8:24 AM, "Matt Daum" <m...@setfive.com> wrote:

I am new to Kafka but I think I have a good use case for it.  I am trying
to build daily counts of requests based on a number of different attributes
in a high throughput system (~1 million requests/sec. across all  8
servers).  The different attributes are unbounded in terms of values, and
some will spread across hundreds of millions of values.  This is my current
thought process; let me know where I could be more efficient or if there is
a better way to do it.

I'll create an AVRO object "Impression" which has all the attributes of the
inbound request.  My application servers then will on each request create
and send this to a single kafka topic.

I'll then have a consumer which creates a stream from the topic.  From
there I'll use the windowed timeframes and groupBy to group by the
attributes on each given day.  At the end of the day I'd need to read out
the data store to an external system for storage.  Since I won't know all
the values I'd need something similar to the KVStore.all() but for
WindowedKV Stores.  This appears that it'd be possible in 1.1 with this
commit:

https://github.com/apache/kafka/commit/1d1c8575961bf6bce7decb049be7f10ca76bd0c5
.

Is this the best approach to doing this?  Or would I be better using the
stream to listen and then an external DB like Aerospike to store the counts
and read out of it directly end of day.

Thanks for the help!
Daum





Mirror Maker Errors

2018-03-04 Thread Oleg Danilovich
Hello, I am running MirrorMaker to mirror data from one cluster to
another.

Now I get this error in the log:
Feb 25 22:38:56 ld4-27 MirrorMaker[54827]: [2018-02-25 22:38:56,914] ERROR
Error when sending message to topic rc.exchange.jpy with key: 29 bytes,
value: 153 bytes with error:
(org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
Feb 25 22:38:56 ld4-27 MirrorMaker[54827]:
org.apache.kafka.common.errors.TimeoutException: Expiring 82 record(s) for
rc.exchange.jpy-0: 50381 ms has passed since last append

I use Int.MaxValue retries in the producer config.
Why does this message occur? I can't detect any issues.

-- 
Best regards,
*Oleg Danilovich*


RE: Mirror Maker Errors

2018-03-04 Thread adrien ruffie
Hi Oleg,


Have you configured your consumer/producer with a "no data loss" configuration
like the one below?

For Consumer, set auto.commit.enabled=false in consumer.properties

For Producer

  1.  max.in.flight.requests.per.connection=1
  2.  retries=Int.MaxValue
  3.  acks=-1
  4.  block.on.buffer.full=true

Like advised in this topic:

https://community.hortonworks.com/articles/79891/kafka-mirror-maker-best-practices.html
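
As a sketch only, that producer side would look something like this in code
(the values are the ones from the article; note that block.on.buffer.full is
deprecated in newer clients, where max.block.ms plays that role, so adapt to
your client version):

import java.util.Properties;

public class NoDataLossProducerConfig {

    // Producer settings for the "no data loss" setup described above.
    public static Properties build() {
        Properties props = new Properties();
        props.put("max.in.flight.requests.per.connection", "1");
        props.put("retries", String.valueOf(Integer.MAX_VALUE));
        props.put("acks", "all");  // equivalent to -1
        // Deprecated in newer clients; a large max.block.ms is the modern equivalent.
        props.put("block.on.buffer.full", "true");
        return props;
    }
}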



I have the following assumption:

Your MM can't reach the target cluster (the one hosting rc.exchange.jpy, right?),
so your producer keeps retrying indefinitely. Because MirrorMaker blocks on the
producer buffer when it is full, and your buffer is now full while you combine an
unlimited retries policy + blocking + acks=all, these settings together can keep
records sitting in the producer buffer longer than the timeout, so they expire
before they are ever sent.

Have you looked into this lead?

(It's just an idea.)


I hope you solve your issue quickly.


Adrien


De : Oleg Danilovich 
Envoyé : samedi 3 mars 2018 15:42:47
À : users@kafka.apache.org
Objet : Mirror Maker Errors

Hello, I am running MirrorMaker to mirror data from one cluster to
another.

Now I get this error in the log:
Feb 25 22:38:56 ld4-27 MirrorMaker[54827]: [2018-02-25 22:38:56,914] ERROR
Error when sending message to topic rc.exchange.jpy with key: 29 bytes,
value: 153 bytes with error:
(org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
Feb 25 22:38:56 ld4-27 MirrorMaker[54827]:
org.apache.kafka.common.errors.TimeoutException: Expiring 82 record(s) for
rc.exchange.jpy-0: 50381 ms has passed since last append

I use Int.MaxValue retries in the producer config.
Why does this message occur? I can't detect any issues.

--
Best regards,
*Oleg Danilovich*


Re: Kafka Setup for Daily counts on wide array of keys

2018-03-04 Thread Matt Daum
Thanks! For the counts I'd need to use a global table to make sure it's
counting across all the data, right?   Also, will having millions of different
values per grouped attribute scale OK?

On Mar 4, 2018 8:45 AM, "Thakrar, Jayesh" 
wrote:

> Yes, that's the general design pattern. Another thing to look into is to
> compress the data. The Kafka consumer/producer can already do it for you,
> but we chose to compress in the applications due to a historic issue that
> degraded performance, although it has been resolved now.
>
> Also,  just keep in mind that while you do your batching, kafka producer
> also tries to batch msgs to Kafka, and you will need to ensure you have
> enough buffer memory. However that's all configurable.
>
> Finally ensure you have the latest java updates and have kafka 0.10.2 or
> higher.
>
> Jayesh
>
> --
> *From:* Matt Daum 
> *Sent:* Sunday, March 4, 2018 7:06:19 AM
> *To:* Thakrar, Jayesh
> *Cc:* users@kafka.apache.org
> *Subject:* Re: Kafka Setup for Daily counts on wide array of keys
>
> We actually don't have a Kafka cluster set up yet at all.  Right now we just
> have 8 of our application servers.  We currently sample some impressions
> and then dedupe/count them at a different DC, but we are looking to
> analyze all impressions for some overall analytics.
>
> Our requests are around 100-200 bytes each.  If we lost some of them due
> to network jitter etc. it would be fine; we're just trying to get a rough
> count of each attribute.  Creating batched messages definitely makes
> sense and will also cut down on the network IO.
>
> We're trying to determine the required setup for Kafka to do what we're
> looking to do; these are physical servers, so we'll most likely need to
> buy new hardware.  For the first run I think we'll try it out on one of our
> application clusters that gets a smaller amount of traffic (300-400k req/sec)
> and run the Kafka cluster on the same machines as the applications.
>
> So would the best route here be something like this: each application server
> batches requests and sends them to Kafka, a streams consumer then tallies up
> the totals per attribute that we want to track and outputs them to a new
> topic, which then goes through a sink to either a DB or something like S3,
> which we then read into our external DBs?
>
> Thanks!
>
> On Sun, Mar 4, 2018 at 12:31 AM, Thakrar, Jayesh <
> jthak...@conversantmedia.com> wrote:
>
>> Matt,
>>
>> If I understand correctly, you have an 8 node Kafka cluster and need to
>> support  about 1 million requests/sec into the cluster from source servers
>> and expect to consume that for aggregation.
>>
>> How big are your msgs?
>>
>> I would suggest looking into batching multiple requests per single Kafka
>> msg to achieve desired throughput.
>>
>> So e.g. on the request receiving systems, I would suggest creating a
>> logical avro file (byte buffer) of say N requests and then making that into
>> one Kafka msg payload.
>>
>> We have a similar situation
>> (https://www.slideshare.net/JayeshThakrar/apacheconflumekafka2016)
>> and found anything from 4x to 10x better throughput with batching as
>> compared to one request per msg.
>> We have different kinds of msgs/topics and the individual "request" size
>> varies from  about 100 bytes to 1+ KB.
>>
>> On 3/2/18, 8:24 AM, "Matt Daum"  wrote:
>>
>> I am new to Kafka but I think I have a good use case for it.  I am
>> trying
>> to build daily counts of requests based on a number of different
>> attributes
>> in a high throughput system (~1 million requests/sec. across all  8
>> servers).  The different attributes are unbounded in terms of values, and
>> some will spread across hundreds of millions of values.  This is my current
>> thought process; let me know where I could be more efficient or if there is
>> a better way to do it.
>>
>> I'll create an AVRO object "Impression" which has all the attributes
>> of the
>> inbound request.  My application servers then will on each request
>> create
>> and send this to a single kafka topic.
>>
>> I'll then have a consumer which creates a stream from the topic.  From
>> there I'll use the windowed timeframes and groupBy to group by the
>> attributes on each given day.  At the end of the day I'd need to read
>> out
>> the data store to an external system for storage.  Since I won't know
>> all
>> the values I'd need something similar to the KVStore.all() but for
>> WindowedKV Stores.  This appears that it'd be possible in 1.1 with
>> this
>> commit:
>> https://github.com/apache/kafka/commit/1d1c8575961bf6bce7decb049be7f10ca76bd0c5
>>
>> Is this the best approach to doing this?  Or would I be better using
>> the
>> stream to listen and then an external DB like Aerospike to store the
>> counts
>> and read out of it directly end of day.
>>
>> Thanks for the help!
>> Daum
>>
>>
>>
>


Re: committing offset metadata in kafka streams

2018-03-04 Thread Matthias J. Sax
You are correct. This is not possible atm.

Note that commits happen "under the hood" and users cannot commit
explicitly. Users can only "request" a commit -- this implies that
Kafka Streams will commit as soon as possible -- but when
`context#commit()` returns, the commit is not done yet (it only sets a
flag).
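
For reference, this is what the plain consumer API allows (a minimal sketch
using OffsetAndMetadata) -- it is exactly this metadata parameter that Streams
does not expose:

import java.util.Collections;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class CommitWithMetadata {

    // Commits the given offset for one partition together with a custom metadata string.
    static void commit(KafkaConsumer<?, ?> consumer, String topic, int partition,
                       long offset, String metadata) {
        TopicPartition tp = new TopicPartition(topic, partition);
        consumer.commitSync(
            Collections.singletonMap(tp, new OffsetAndMetadata(offset, metadata)));
    }
}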

What is your use case for this? How would you want to use this from an
API point of view?

Feel free to open a feature request JIRA -- we don't have any plans to
add this atm -- it's the first time anybody has asked for this feature. If
there is a JIRA, maybe somebody picks it up :)


-Matthias

On 3/3/18 6:51 AM, Stas Chizhov wrote:
> Hi,
> 
> There seems to be no way to commit custom metadata along with offsets from
> within Kafka Streams.
> Are there any plans to expose this functionality or have I missed something?
> 
> Best regards,
> Stanislav.
> 





Re: Kafka Setup for Daily counts on wide array of keys

2018-03-04 Thread Thakrar, Jayesh
I don't have any experience/knowledge of the Kafka inbuilt datastore, but I
believe that for some portions of streaming, Kafka Streams uses (used?) RocksDB
to store some state info locally on the stream-processing instances.

Personally I would use an external datastore.
There's a wide choice out there - regular key-value stores like Cassandra,
ScyllaDB, and RocksDB, time-series stores like InfluxDB, and regular RDBMSes.
If you have Hadoop in the picture, it's even possible to bypass a datastore
completely (if appropriate) and store the raw data on HDFS organized by (say)
date+hour, using periodic (minute to hourly) extract jobs and a Hive-compatible
directory structure with ORC or Parquet.

The reason for shying away from NoSQL datastores is their tendency to do
compaction on data, which leads to unnecessary reads and writes (referred to as
write amplification).
With periodic jobs in Hadoop, you (usually) write your data once only. Of course,
with that approach you lose the "random/keyed access" to the data,
but if you are only interested in the aggregations across various dimensions,
those can be stored in a SQL/NoSQL datastore.

As for "having millions of different values per grouped attribute" - not sure 
what you mean by them.
Is it that each record has some fields that represent different kinds of 
attributes and that their domain can have millions to hundreds of millions of 
values?
I don't think that should matter.

From: Matt Daum 
Date: Sunday, March 4, 2018 at 2:39 PM
To: "Thakrar, Jayesh" 
Cc: "users@kafka.apache.org" 
Subject: Re: Kafka Setup for Daily counts on wide array of keys

Thanks! For the counts I'd need to use a global table to make sure it's counting
across all the data, right?   Also, will having millions of different values per
grouped attribute scale OK?

On Mar 4, 2018 8:45 AM, "Thakrar, Jayesh" <jthak...@conversantmedia.com> wrote:
Yes, that's the general design pattern. Another thing to look into is to
compress the data. The Kafka consumer/producer can already do it for you, but
we chose to compress in the applications due to a historic issue that degraded
performance, although it has been resolved now.
Also,  just keep in mind that while you do your batching, kafka producer also 
tries to batch msgs to Kafka, and you will need to ensure you have enough 
buffer memory. However that's all configurable.
Finally ensure you have the latest java updates and have kafka 0.10.2 or higher.
Jayesh


From: Matt Daum <m...@setfive.com>
Sent: Sunday, March 4, 2018 7:06:19 AM
To: Thakrar, Jayesh
Cc: users@kafka.apache.org
Subject: Re: Kafka Setup for Daily counts on wide array of keys

We actually don't have a Kafka cluster set up yet at all.  Right now we just have 8
of our application servers.  We currently sample some impressions and then
dedupe/count them at a different DC, but we are looking to analyze all
impressions for some overall analytics.

Our requests are around 100-200 bytes each.  If we lost some of them due to
network jitter etc. it would be fine; we're just trying to get a rough
count of each attribute.  Creating batched messages definitely makes sense and
will also cut down on the network IO.

We're trying to determine the required setup for Kafka to do what we're looking
to do; these are physical servers, so we'll most likely need to buy new
hardware.  For the first run I think we'll try it out on one of our application
clusters that gets a smaller amount of traffic (300-400k req/sec) and run the Kafka
cluster on the same machines as the applications.

So would the best route here be something like this: each application server batches
requests and sends them to Kafka, a streams consumer then tallies up the
totals per attribute that we want to track and outputs them to a new topic, which
then goes through a sink to either a DB or something like S3, which we then read into
our external DBs?

Thanks!

On Sun, Mar 4, 2018 at 12:31 AM, Thakrar, Jayesh <jthak...@conversantmedia.com> wrote:
Matt,

If I understand correctly, you have an 8 node Kafka cluster and need to support 
 about 1 million requests/sec into the cluster from source servers and expect 
to consume that for aggregation.

How big are your msgs?

I would suggest looking into batching multiple requests per single Kafka msg to 
achieve desired throughput.

So e.g. on the request receiving systems, I would suggest creating a logical 
avro file (byte buffer) of say N requests and then making that into one Kafka 
msg payload.

We have a similar situation 
(https://www.slideshare.net/JayeshThakrar/apacheconflumekafka2016) and found 
anything from 4x to 10x better throughput with batching as compared to one 
request per msg.
We have different kinds of msgs/topics and the individual "request" size varies 
from  about 100 bytes to 1+ KB.

On 3/2/18, 8:24 AM, "Matt Daum" <m...@setfive.com> wrote:

I am new to Kafka but I think I

Re: Kafka Setup for Daily counts on wide array of keys

2018-03-04 Thread Thakrar, Jayesh
BTW - I did not mean to rule out Aerospike as a possible datastore.
It's just that I am not familiar with it, but it surely looks like a good candidate
to store the raw and/or aggregated data, given that it also has a Kafka Connect
module.

From: "Thakrar, Jayesh" 
Date: Sunday, March 4, 2018 at 9:25 PM
To: Matt Daum 
Cc: "users@kafka.apache.org" 
Subject: Re: Kafka Setup for Daily counts on wide array of keys

I don't have any experience/knowledge of the Kafka inbuilt datastore, but I
believe that for some portions of streaming, Kafka Streams uses (used?) RocksDB
to store some state info locally on the stream-processing instances.

Personally I would use an external datastore.
There's a wide choice out there - regular key-value stores like Cassandra,
ScyllaDB, and RocksDB, time-series stores like InfluxDB, and regular RDBMSes.
If you have Hadoop in the picture, it's even possible to bypass a datastore
completely (if appropriate) and store the raw data on HDFS organized by (say)
date+hour, using periodic (minute to hourly) extract jobs and a Hive-compatible
directory structure with ORC or Parquet.

The reason for shying away from NoSQL datastores is their tendency to do
compaction on data, which leads to unnecessary reads and writes (referred to as
write amplification).
With periodic jobs in Hadoop, you (usually) write your data once only. Of course,
with that approach you lose the "random/keyed access" to the data,
but if you are only interested in the aggregations across various dimensions,
those can be stored in a SQL/NoSQL datastore.

As for "having millions of different values per grouped attribute" - not sure 
what you mean by them.
Is it that each record has some fields that represent different kinds of 
attributes and that their domain can have millions to hundreds of millions of 
values?
I don't think that should matter.

From: Matt Daum 
Date: Sunday, March 4, 2018 at 2:39 PM
To: "Thakrar, Jayesh" 
Cc: "users@kafka.apache.org" 
Subject: Re: Kafka Setup for Daily counts on wide array of keys

Thanks! For the counts I'd need to use a global table to make sure it's counting
across all the data, right?   Also, will having millions of different values per
grouped attribute scale OK?

On Mar 4, 2018 8:45 AM, "Thakrar, Jayesh" <jthak...@conversantmedia.com> wrote:
Yes, that's the general design pattern. Another thing to look into is to
compress the data. The Kafka consumer/producer can already do it for you, but
we chose to compress in the applications due to a historic issue that degraded
performance, although it has been resolved now.
Also,  just keep in mind that while you do your batching, kafka producer also 
tries to batch msgs to Kafka, and you will need to ensure you have enough 
buffer memory. However that's all configurable.
Finally ensure you have the latest java updates and have kafka 0.10.2 or higher.
Jayesh


From: Matt Daum <m...@setfive.com>
Sent: Sunday, March 4, 2018 7:06:19 AM
To: Thakrar, Jayesh
Cc: users@kafka.apache.org
Subject: Re: Kafka Setup for Daily counts on wide array of keys

We actually don't have a Kafka cluster set up yet at all.  Right now we just have 8
of our application servers.  We currently sample some impressions and then
dedupe/count them at a different DC, but we are looking to analyze all
impressions for some overall analytics.

Our requests are around 100-200 bytes each.  If we lost some of them due to
network jitter etc. it would be fine; we're just trying to get a rough
count of each attribute.  Creating batched messages definitely makes sense and
will also cut down on the network IO.

We're trying to determine the required setup for Kafka to do what we're looking
to do; these are physical servers, so we'll most likely need to buy new
hardware.  For the first run I think we'll try it out on one of our application
clusters that gets a smaller amount of traffic (300-400k req/sec) and run the Kafka
cluster on the same machines as the applications.

So would the best route here be something like this: each application server batches
requests and sends them to Kafka, a streams consumer then tallies up the
totals per attribute that we want to track and outputs them to a new topic, which
then goes through a sink to either a DB or something like S3, which we then read into
our external DBs?

Thanks!

On Sun, Mar 4, 2018 at 12:31 AM, Thakrar, Jayesh <jthak...@conversantmedia.com> wrote:
Matt,

If I understand correctly, you have an 8 node Kafka cluster and need to support 
 about 1 million requests/sec into the cluster from source servers and expect 
to consume that for aggregation.

How big are your msgs?

I would suggest looking into batching multiple requests per single Kafka msg to 
achieve desired throughput.

So e.g. on the request receiving systems, I would suggest creating a logical 
avro file (byte buffer) of say N requests and then making that into one Kafka 
msg payload.

We 

Re: Setting topic's offset from the shell

2018-03-04 Thread Zoran
The procedure you have suggested is good for replaying everything from 
the very beginning, but I would like to replay messages from an 
arbitrary offset.


On the backend I have a ClickHouse table that listens to a Kafka topic with
its group_id.


In case of problems between ClickHouse table and Kafka, I would like to 
replay messages that are missing from the ClickHouse table.


As I can't do anything on the ClickHouse side (only the topic and group
parameters are specified there), I need a mechanism to set the offset from
the outside.


In order to minimize complexity of the operation, I would like to set 
the offset from Kafka shell scripts if possible.
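
If shell scripts turn out not to be enough, a small Java helper along these
lines would also be acceptable for us (group, topic, partition and offset are
placeholders); I believe newer Kafka versions (0.11+) also ship a
kafka-consumer-groups.sh --reset-offsets option for the same job:

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class ResetGroupOffset {

    public static void main(String[] args) {
        // Placeholders: use the real group/topic/partition/offset here.
        String groupId = "clickhouse-consumer-group";
        TopicPartition tp = new TopicPartition("my-topic", 0);
        long newOffset = 123456L;

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", groupId);
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", ByteArrayDeserializer.class.getName());
        props.put("value.deserializer", ByteArrayDeserializer.class.getName());

        // The group should be inactive (stop the ClickHouse consumer first),
        // otherwise the commit may be rejected or immediately overwritten.
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(Collections.singletonList(tp));
            consumer.commitSync(
                Collections.singletonMap(tp, new OffsetAndMetadata(newOffset)));
        }
    }
}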



On 02/28/2018 10:29 AM, UMESH CHAUDHARY wrote:

You might want to set the group.id config in kafka-console-consumer (or in any
other consumer) to a value which you haven't used before. This will
replay all available messages in the topic from the start if you use
--from-beginning in the console consumer.

On Wed, 28 Feb 2018 at 14:19 Zoran  wrote:


Hi,


If I have a topic that has been fully read by consumers, how do I set the
offset from the shell to some previous value in order to re-read
several messages?


Regards.