Kafka Streams: a back-pressure question for windowed streams

2017-04-20 Thread Tianji Li
Hi there, I have a question about how to implement 'back-pressure' for windowed streams. Say I have a pipeline that consumes from a topic on a windowed basis, then does some processing (whenever punctuate is called), and produces into another topic. If the incoming rate from all consumers is 10M/s

Re: Kafka Streams: Is it possible to set heap size for the JVMs running Kafka Streams?

2017-04-06 Thread Tianji Li
ze$20streams|sort: > relevance/confluent-platform/jazg8f-qn0M/hulJwk60AAAJ>. > > One caveat is that RocksDb uses off-heap memory, but that can also be > tuned. > > Thanks > Eno > > > On 6 Apr 2017, at 14:23, Tianji Li wrote: > > > > If so, how to? > > > > Thanks > > Tianji > >

Kafka Streams: Is it possible to set heap size for the JVMs running Kafka Streams?

2017-04-06 Thread Tianji Li
If so, how? Thanks Tianji

Re: Kafka Streams: Is it possible to pause/resume consuming a topic?

2017-04-01 Thread Tianji Li
215 <https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamTask.java#L215> > > Cheers > Eno > > On 1 Apr 2017, at 17:14, Tianji Li wrote: > > > > Hi Eno, > > > > Could you point to

Re: Kafka Streams: Is it possible to pause/resume consuming a topic?

2017-04-01 Thread Tianji Li
Eno > > > On Apr 1, 2017, at 3:26 PM, Tianji Li wrote: > > > > Hi there, > > > > Say a processor that is consuming topic A and producing into topic B, and > > somehow the processing takes long time, is it possible to pause the > > consuming from topic A,

Kafka Streams: Is it possible to pause/resume consuming a topic?

2017-04-01 Thread Tianji Li
Hi there, Say a processor is consuming topic A and producing into topic B, and the processing takes a long time. Is it possible to pause consuming from topic A and resume later on? Or does it make sense to do so? If not, what are the options to resolve this issue? Thanks Tianji

Re: Kafka Streams: lockException

2017-03-20 Thread Tianji Li
evicted because max.poll.interval exceeds > > the > > > set limit. > > > > > > Try running rocksdb in memory https://github.com/facebook/rocksdb/wiki/How-to-persist-in-memory-RocksDB-database%3F. > > > > > > Thanks > >

Re: Kafka Streams: lockException

2017-03-17 Thread Tianji Li
to increase > max.poll.interval config value. > If that doesn't work, could you revert to using one thread for now. Also > let us know either way since we might need to open a bug report. > > Thanks > Eno > > > On 16 Mar 2017, at 20:47, Tianji Li wrote: > >
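The two suggestions in the reply above (a larger max.poll.interval and a single stream thread) can be expressed as configuration overrides. The following is only a minimal sketch, not the configuration used in the thread: the class name StreamsTuning, the method tunedProperties, and the 10-minute value are assumptions for illustration, and the keys are written as plain strings so the snippet compiles without the Kafka client library.

```java
import java.util.Properties;

public class StreamsTuning {
    // Hypothetical helper; the class and method names are illustrative only.
    public static Properties tunedProperties() {
        Properties props = new Properties();
        // Consumer setting: maximum allowed delay between poll() calls
        // before the member is removed from the consumer group.
        // The value (10 minutes) is an assumed example, not a recommendation.
        props.put("max.poll.interval.ms", "600000");
        // Streams setting: fall back to one processing thread per instance.
        props.put("num.stream.threads", "1");
        return props;
    }

    public static void main(String[] args) {
        // Print the overrides so they can be checked before use.
        tunedProperties().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

These overrides would be merged into the properties the application already passes when it constructs its KafkaStreams instance; a suitable interval depends on how long a single processing step can take.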

Kafka Streams: lockException

2017-03-16 Thread Tianji Li
Hi there, I keep getting these crashes and wonder if anyone knows why. Please let me know what information I should provide to help with troubleshooting. I am using 0.10.2.0. My application reads one topic and then calls groupBy().aggregate() 50 times on different keys. I use a memory store, without

Re: Kafka Stream: RocksDBKeyValueStoreSupplier performance

2017-03-16 Thread Tianji Li
ummary, I'd recommend you use RocksDb as is since 7 vs 5 is a > reasonable difference. > > However, the real performance will be when you actually enable logging, > right? You might want RocksDb to be backed to Kafka for fault tolerance. > > Finally, make sure to use 0.10.2, the l

Re: restart Kafka Streams application takes around 5 minutes

2017-03-16 Thread Tianji Li
k that > has largely reduced the rebalance latency so I'd recommend try using a > build from Kafka trunk for testing if possible. > > > Guozhang > > On Wed, Mar 15, 2017 at 10:46 AM, Tianji Li wrote: > > > It seems independent to the rocksdb sizes. It also took 5

Re: Kafka Stream: RocksDBKeyValueStoreSupplier performance

2017-03-15 Thread Tianji Li
ores.create(storeName) > >.withKeys(stringSerde) > >.withValues(avroSerde) > >.persistent() > >.disableLogging() > >.build(); > > > Thanks > Eno > > > > On 15 Mar 2017, at 13:02, Tianji Li wrote: > > >

Re: restart Kafka Streams application takes around 5 minutes

2017-03-15 Thread Tianji Li
Maybe partitioning the > source topic, increasing the number of threads/instances processing the > source and reducing the time window of aggregation can help in reducing the > startup time. > > > > On Wed, Mar 15, 2017 at 6:36 PM, Tianji Li wrote: > > > Hi there, > >

restart Kafka Streams application takes around 5 minutes

2017-03-15 Thread Tianji Li
Hi there, In the experiments I am doing now, if I restart the streams application, I have to wait for around 5 minutes for some reason. I can see something in the Kafka logs: [2017-03-15 08:36:18,118] INFO [GroupCoordinator 0]: Preparing to restabilize group xxx-test25 with old generation 2 (kaf

Kafka Stream: RocksDBKeyValueStoreSupplier performance

2017-03-15 Thread Tianji Li
Hi there, It seems that the RocksDB state store is quite slow in my case and I wonder if I did anything wrong. I have a topic that I groupBy() and then aggregate() 50 times. That is, I will create 50 result topics and a lot more changelog and repartition topics. There are a few things that are

Re: Kafka Streams vs Spark Streaming

2017-02-28 Thread Tianji Li
Hi Guys, Thanks very much for your help. A final question: is it possible to use different commit intervals for state-store changelog topics and for sink topics? Thanks Tianji

Re: Kafka Streams vs Spark Streaming

2017-02-27 Thread Tianji Li
, are you doing 100 > > different aggregation against record key ? Then you should have a single > > data object for those 100 values, anyway it sounds like we have similar > > problem .. > > > > -Kohki > > > > > > > > > > >

Re: Kafka Streams vs Spark Streaming

2017-02-25 Thread Tianji Li
a lots of tiny> > > information and I have a big concern about the size of the state store /> > > topic, so my decision is that I'm going with my own handling of Kafka API> > > ..> > >> > > If you do stateless operation and don't have a spark cluster, yea

Kafka Streams vs Spark Streaming

2017-02-24 Thread Tianji Li
Hi there, Can anyone give a good explanation of the cases in which Kafka Streams is preferred, and the cases in which Spark Streaming is better? Thanks Tianji

Is there a list of companies using Kafka Streams?

2017-02-24 Thread Tianji Li
Just curious... Thanks Tianji