Identity Mirroring

2020-05-07 Thread Henry Cai
I saw this feature ("High Throughput Identity Mirroring") mentioned in the Cloudera blog post: https://blog.cloudera.com/a-look-inside-kafka-mirrormaker-2/ The batch does not need to be decompressed and recompressed, or deserialized and reserialized, if nothing has to be changed. Identity mirroring can ha…

MM2 with older Kafka version

2020-04-22 Thread Henry Cai
It looks like MM2 ships with Kafka 2.4. If our brokers are still running an older Kafka version (2.3), can MM2 running the 2.4 code work with brokers running the 2.3 code? Thanks.

No backoff on producer sending from MirrorMaker on metadata error

2019-03-10 Thread Henry Cai
Version: Kafka 2.0.0 (I believe the problem also exists on 2.2.0). The producer thread in MirrorMaker goes into a repeated sending loop, without any backoff, when it encounters metadata errors. For example, the following error messages will repeat hundreds of times at a very small interval (100 ms)…
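As a stop-gap until the MirrorMaker loop itself is fixed, the producer's retry pacing can be tuned through configuration. A minimal sketch, assuming the properties are passed via MirrorMaker's --producer.config file (the values are illustrative, not recommendations, and this only paces retries rather than fixing the underlying bug):

```properties
# Illustrative producer.properties for MirrorMaker.
# Pace retries so a metadata error does not become a tight resend loop.
retry.backoff.ms=1000
# Cap total retries instead of retrying indefinitely.
retries=10
# Fail a stuck request after two minutes rather than looping forever.
request.timeout.ms=120000
```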

Re: Kafka 2.0: High level consumer is slow to deal with SSL certification expiration

2019-03-10 Thread Henry Cai
wrote: > Hi, > > It would be great to verify that this happens with Kafka 2.2.0 RC1. If it > does, then please file a JIRA so that this doesn't get lost. > > Ismael > > On Thu, Mar 7, 2019 at 4:19 PM Henry Cai > wrote: > > > Hi, > > > > We have

Kafka 2.0: High level consumer is slow to deal with SSL certification expiration

2019-03-07 Thread Henry Cai
Hi, We have been using Kafka 2.0's MirrorMaker (which uses the high-level consumer) to do replication. The topic is SSL-enabled, and the certificate expires at a random time within 12 hours. When the certificate expires we see many SSL-related exceptions in the log: [2019-03-07 18:02:54,128]

Re: broker logs unusable after KAFKA-6150: Make Repartition Topics Transient

2018-07-06 Thread Henry Cai
, 2018 at 4:08 PM, Guozhang Wang wrote: > Hello Henry, > > What's your server-side log4j settings? Could you use WARN on these two > classes: kafka.server.epoch.LeaderEpochFileCache and kafka.log.Log. > > > > Guozhang > > > On Fri, Jul 6, 2018 at 3:08 P

broker logs unusable after KAFKA-6150: Make Repartition Topics Transient

2018-07-06 Thread Henry Cai
@guozhang After we moved to kafka-1.1.0 for our Kafka Streams application, our broker logs are polluted with log lines such as: [2018-07-06 21:59:26,170] INFO Cleared earliest 0 entries from epoch cache based on passed offset 301483601 leaving 1 in EpochFile for partition inflight_spend_unified_st
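The reply in this thread suggests raising the log level for the two noisy classes. A minimal broker-side log4j sketch of that suggestion (the exact file location depends on your deployment):

```properties
# Quiet the noisy epoch-cache and log-segment INFO lines, per the thread's suggestion.
log4j.logger.kafka.server.epoch.LeaderEpochFileCache=WARN
log4j.logger.kafka.log.Log=WARN
```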

How to fix ISR/URP when there is no new FetchRequest made?

2018-06-30 Thread Henry Cai
We have URPs reported for some empty topics, which is very annoying. The screenshots below show that both the leader (1004) and the follower (1001) have exactly the same content. They probably got into this state when some brokers were prematurely stopped and restarted. Looking at the code, it seems the ISR inf

source code location for KSQL

2018-04-27 Thread Henry Cai
I think KSQL is also open-sourced; where is the source code for KSQL on GitHub? Thanks.

clean leader election on kafka 0.10.2.1

2017-11-02 Thread Henry Cai
We are on Kafka 0.10.2.1. We tried to switch from unclean leader election to clean leader election and found it became very difficult to start up the whole cluster. The hosts seem to get into a deadlock during startup: broker A is a follower on partition 1 and waits for the lea

Re: Achieve message ordering through Async Producer

2017-03-28 Thread Henry Cai
sends, there is a risk of message re-ordering due to retries (i.e., if retries are enabled). On Tue, Mar 28, 2017 at 4:17 PM, Henry Cai wrote: > If I use kafka's AsyncProducer, would I still be able to achieve message > ordering within the same partition? When the first message failed t

Achieve message ordering through Async Producer

2017-03-28 Thread Henry Cai
If I use Kafka's AsyncProducer, would I still be able to achieve message ordering within the same partition? When the first message fails to send to the broker, will the second message (within the same Kafka partition) be sent out ahead of the first? Based on this email thread, it seems Async
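For reference, per-partition ordering under retries is usually addressed through producer configuration. A hedged sketch using the standard Java producer's config names (whether they apply depends on the producer in use, and the idempotence option only exists on 0.11+):

```properties
# Allow only one in-flight request per connection, so a retried batch
# cannot be leapfrogged by a later batch for the same partition.
max.in.flight.requests.per.connection=1
retries=2147483647
# On 0.11+ clients and brokers, idempotence preserves ordering even with
# up to 5 in-flight requests:
# enable.idempotence=true
```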

Re: [VOTE] 0.10.1.0 RC0

2016-10-06 Thread Henry Cai
Why is this feature in the release notes? KAFKA-264 - Change the consumer-side load balancing and distributed co-ordination to use a consumer co-ordinator. I thought this was already done in 2015. On Thu, Oct 6, 2016 at 4:55 PM, Vahid

Re: [DISCUSS] Java 8 as a minimum requirement

2016-06-16 Thread Henry Cai
+1 for Lambda expression. On Thu, Jun 16, 2016 at 1:48 PM, Rajiv Kurian wrote: > +1 > > On Thu, Jun 16, 2016 at 1:45 PM, Ismael Juma wrote: > > > Hi all, > > > > I would like to start a discussion on making Java 8 a minimum requirement > > for Kafka's next feature release (let's say Kafka 0.10.

Manual update offset for new consumer

2016-06-10 Thread Henry Cai
When we were using the old consumer, we could use a ZooKeeper client tool to manually set the offset for a consumer group. With the new consumer, where the offsets are stored in the broker, is there a tool to do the same thing?
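For what it's worth, later releases added a broker-stored-offsets equivalent of the old ZK edit. A sketch, assuming Kafka 0.11+ where kafka-consumer-groups.sh gained --reset-offsets (the group and topic names are placeholders, and the group must be inactive):

```
# Dry run first: prints the proposed new offsets without applying them.
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --group my-group --topic my-topic:0 --reset-offsets --to-offset 12345

# Apply the reset.
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --group my-group --topic my-topic:0 --reset-offsets --to-offset 12345 --execute
```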

Re: Tool to look at what's stored in internal topics on kafka 0.9

2016-05-09 Thread Henry Cai
> Hi! > If you produce your messages with key type (optional) and value type of > String then you can use Kafka tool: http://www.kafkatool.com/ > I hope that it helps. > Regards, > Florin

Tool to look at what's stored in internal topics on kafka 0.9

2016-05-05 Thread Henry Cai
When we were on Kafka 0.8, all the consumer offsets were stored in ZK, and we could use a ZK browser to see the contents under the different ZK paths. On Kafka 0.9, with everything moved to internal Kafka topics, do we have a tool to browse the contents of those topics?
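If it helps, the usual answer is the console consumer plus a message formatter for __consumer_offsets. A sketch, assuming a 0.9/0.10-era broker (the formatter class name is version-dependent and moved under kafka.coordinator.group.* in later releases, so treat the class path as an assumption to verify against your version):

```
bin/kafka-console-consumer.sh --zookeeper localhost:2181 \
  --topic __consumer_offsets --from-beginning \
  --formatter "kafka.coordinator.GroupMetadataManager\$OffsetsMessageFormatter"
```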

Re: How to "buffer" a stream with high churn and output only at the end of a window?

2016-04-20 Thread Henry Cai
t at the end of the window" the reality is that if late data comes you > still have to produce additional outputs. So you don't produce one output > at the end but rather potentially any number of outputs. So a simple way to > think about this is that you produce all updates but o

Re: How to "buffer" a stream with high churn and output only at the end of a window?

2016-04-20 Thread Henry Cai
.0.0. But we can push it to be in 0.10.0.1. > > Guozhang > > On Wed, Apr 20, 2016 at 4:57 PM, Henry Cai > wrote: > > > Thanks. > > > > Do you know when KAFKA-3101 will be implemented? > > > > I also add a note to that JIRA for a left outer join use case

Re: How to "buffer" a stream with high churn and output only at the end of a window?

2016-04-20 Thread Henry Cai
g, while we keep the aggregated values > in RocksDB, we do not send the updated values for each receiving record but > only do that based on some policy. More generally we can have a trigger > mechanism for user to customize when to emit. > > > Guozhang > > > On Wed, Apr 2

Re: How to "buffer" a stream with high churn and output only at the end of a window?

2016-04-20 Thread Henry Cai
at the > streams end (note that we have two buffers here, the consumer buffers raw > bytes, and the streams library take raw bytes and buffer the de-serialized > objects, and threshold on its own buffer to pause / resume the consumer). > > > Guozhang > > On Wed, Apr 20, 2016 at 3

Re: How to "buffer" a stream with high churn and output only at the end of a window?

2016-04-20 Thread Henry Cai
ommitted, > then upon failover it will simply re-consumer these records again from > Kafka. > > Guozhang > > On Tue, Apr 19, 2016 at 11:34 PM, Henry Cai > wrote: > > > For the technique of custom Processor of holding call to > context.forward(), > > if I hold

Re: Kafka Streams: finding a solution to a particular use case

2016-04-20 Thread Henry Cai
he key is actually a combo of {join window, key, > sequenceID} and hence all records are unique, we do not need log compaction > for its changelogs. > > Guozhang > > > On Tue, Apr 19, 2016 at 11:28 PM, Henry Cai > wrote: > > > In my case, the key space is u

Re: How to "buffer" a stream with high churn and output only at the end of a window?

2016-04-19 Thread Henry Cai
For the technique of a custom Processor holding the call to context.forward(): if I hold it for 10 minutes, what does that mean for the consumer acknowledgement on the source node? I guess if I hold it for 10 minutes, the consumer is not going to ack to the upstream queue; will that impact the consumer p
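The buffering idea discussed in this thread can be sketched independently of the Streams API: keep only the latest value per key and emit once per window. This is a plain-Java illustration (the class and method names are mine, not Kafka's); in a real Processor the flush would happen inside punctuate() via context.forward(), and as the replies note, records whose offsets were not yet committed are simply re-consumed after a failover.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a per-window dedup buffer: intermediate updates
// for a key overwrite each other, and only the final value is emitted.
public class WindowBuffer {
    private final Map<String, Long> latest = new LinkedHashMap<>();

    // Called for every incoming record; supersedes earlier updates for the key.
    public void process(String key, long value) {
        latest.put(key, value);
    }

    // Called once per window (in a real Processor: from punctuate());
    // returns the final value per key and resets the buffer.
    public Map<String, Long> flush() {
        Map<String, Long> out = new LinkedHashMap<>(latest);
        latest.clear();
        return out;
    }

    public static void main(String[] args) {
        WindowBuffer buf = new WindowBuffer();
        buf.process("user-1", 1);
        buf.process("user-1", 2);   // overwrites the first update
        buf.process("user-2", 7);
        System.out.println(buf.flush()); // {user-1=2, user-2=7}
    }
}
```

The trade-off raised in the thread still applies: everything held in the buffer is at-risk state until it is both emitted and its input offsets are committed.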

Re: Kafka Streams: finding a solution to a particular use case

2016-04-19 Thread Henry Cai
k this semantics can be improved to "stream > time", which is less vulnerable to early out-of-ordering records. > > > Do you want to create JIRAs for those issues I mentioned in the previous > emails to keep track? > > > Guozhang > > > On Tue, Apr 19, 2016

Re: Kafka Streams: finding a solution to a particular use case

2016-04-19 Thread Henry Cai
log compactions such as > compaction intervals and thresholds in Kafka Streams so that users can > control its behavior. Actually, Henry do you mind creating a JIRA for this > purpose and list what you would like to control log compaction? > > > Guozhang > > > > > >

Re: Kafka Streams: finding a solution to a particular use case

2016-04-19 Thread Henry Cai
Related to the log compaction question ("it will be log compacted on the key over time"): how do we control the time for log compaction? For the log compaction implementation, is the storage used to map a new value to a given key kept in memory or on disk? On Tue, Apr 19, 2016 at 8:58 AM, Guil

Re: future of Camus?

2015-10-22 Thread Henry Cai
Take a look at Secor: https://github.com/pinterest/secor Secor is a no-frills Kafka->HDFS ingestion tool; it doesn't depend on any underlying systems such as Hadoop, and it only uses the Kafka high-level consumer to balance the workload. Very easy to understand and manage. It's probably the 2nd most popu

Re: InvalidMessageException: Message is corrupt, Consumer stuck

2015-08-05 Thread Henry Cai
ble to go into ZooKeeper, find the consumer-group, topic and > partition, and increment the offset past the "corrupt" point. > > On Tue, Aug 4, 2015 at 10:23 PM, Henry Cai > wrote: > > > Hi, > > > > We are using the Kafka high-level consumer 8.1.1, somehow

InvalidMessageException: Message is corrupt, Consumer stuck

2015-08-04 Thread Henry Cai
Hi, We are using the Kafka high-level consumer 8.1.1, and somehow we got a corrupted message in the topic. We are not sure of the root cause, but the problem we are having now is that the HL consumer is stuck at that position: kafka.message.InvalidMessageException: Message is corrupt (stored crc = 5
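The workaround suggested in the reply thread (the 0.8-era high-level consumer keeps offsets in ZooKeeper, so you can edit them to skip past the corrupt message) looks roughly like this. Group, topic, partition, and offset are placeholders, and all consumers in the group should be stopped first:

```
bin/zookeeper-shell.sh localhost:2181
# Inspect the current offset for the stuck partition.
get /consumers/my-group/offsets/my-topic/0
# Set it to one past the corrupt message's offset.
set /consumers/my-group/offsets/my-topic/0 123457
```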

Re: HDD or SSD or EBS for kafka brokers in Amazon EC2

2015-06-16 Thread Henry Cai
shows it satisfy our workload. SSD is better in latency >> curve but pretty comparable in terms of throughput. we can use the extra >> space from HDD for longer retention period. >> >> On Tue, Jun 2, 2015 at 9:37 AM, Henry Cai >> >> >> Henry Cai >>

Re: HDD or SSD or EBS for kafka brokers in Amazon EC2

2015-06-02 Thread Henry Cai
be network >> bottlenecked. >> Wes >> Steven Wu >> June 2, 2015 at 1:07 PM >> EBS (network attached storage) has got a lot better over the last few >> years. we don't quite trust it for kafka workload. >> At Netflix, we were going with the new d2 instance type (HDD). our >> perf/load testing shows it satisfy our workload. SSD is better in latency >> curve but pretty comparable in terms of throughput. we can use the extra >> space from HDD for longer retention period. >> On Tue, Jun 2, 2015 at 9:37 AM, Henry Cai

HDD or SSD or EBS for kafka brokers in Amazon EC2

2015-06-02 Thread Henry Cai
We have been hosting Kafka brokers in Amazon EC2 using EBS disks, but periodically we are hit by long I/O wait times on EBS in some Availability Zones. We are thinking of changing the instance types to local HDD or local SSD. HDD is cheaper and bigger and seems a good fit for the Kafka u

Re: Replication tools to move topics/partitions gradually

2015-05-25 Thread Henry Cai
is, but we can probably put > out the script we have in the next couple weeks. Then we can just iterate > on it. > > -Todd > > > On May 24, 2015, at 8:46 PM, Henry Cai > wrote: > > > > Todd, > > > > This is very promising. Do you know when will we be abl

Re: Replication tools to move topics/partitions gradually

2015-05-24 Thread Henry Cai
e") and break it down into a > configurable number of discrete moves so it doesn't tank the cluster. > > And yes, I've finally started the process of departing them from the > LinkedIn-specific tooling so we can release them to everyone else :) > > -Todd >

Replication tools to move topics/partitions gradually

2015-05-24 Thread Henry Cai
We have a Kafka cluster with 10 brokers, and we use the Kafka replication tool (kafka-reassign-partitions.sh) when we need to add more brokers to the cluster. But this tool tends to move too many topics/partitions at the same time, which causes instability. Do we have an option to do it
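One common mitigation (also discussed in the replies above) is to feed kafka-reassign-partitions.sh small hand-built batches instead of the full generated plan. A sketch of a one-partition batch in the standard reassignment JSON format (topic name and broker IDs are placeholders):

```json
{
  "version": 1,
  "partitions": [
    { "topic": "my-topic", "partition": 0, "replicas": [1001, 1005] }
  ]
}
```

Run bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file batch-1.json --execute, wait until --verify reports completion, then submit the next small batch.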