Re: Performance of producer sending to many vs to few topics

2016-07-20 Thread Krzysztof Nawara
Hello, No, in all tests I used two partitions per topic, one per broker. The only variable in the test was the number of topics. From what I've read, what really affects performance is the total number of partitions per broker (probably including replicas), so 1 topic with 1000 partitions should pretty much

Re: Questions relating KStream-KTable join with Kafka-Streams

2016-07-20 Thread Nicolas PHUNG
Hi, Thank you for your answer @Matthias. Indeed, I need a kind of symmetric join. However, KStream-KStream join doesn't match my use case: I will need to generate the events in the changelog (e.g. a marketing campaign with a certain Id/Key) because they live only for the join in a defined window

Re: Questions relating KStream-KTable join with Kafka-Streams

2016-07-20 Thread Nicolas PHUNG
@Guozhang Ok, I've tried it and it doesn't have the expected behavior. For the KStream-KStream join, there's the issue of having to produce the same changelog record again to be able to join within the window. And for KStream-KTable, an update/insert in the changelog record doesn't trigger a missed join that was in

Kafka Streaming Question : reset offset

2016-07-20 Thread Pariksheet Barapatre
Hi Experts, I have written a Kafka Streams app that just filters rows based on some condition and loads them into MongoDB. The streaming process is working fine, but due to some flaw in my code, I want to reprocess the whole data again. One way is to do this: kill the streaming app, change the consumer group id

Re: Kafka Streaming Question : reset offset

2016-07-20 Thread Matthias J. Sax
Hi Pari, currently, changing the application ID is the best way to follow. Cleaning up the application state correctly is a little bit tricky. We are currently working on an improvement for this -- it should be available soon. See https://issues.apache.org/jira/browse/KAFKA-3185 -Matthias On 07/
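
A minimal sketch of this application-ID workaround, assuming a 0.10-era Streams app; the topic name, the predicate, and the println sink are illustrative stand-ins for the real filter and MongoDB write:

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStreamBuilder;

    Properties props = new Properties();
    // A *new* application ID gives the app a fresh consumer group with no
    // committed offsets, so it no longer resumes from the old position.
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "filter-app-v2"); // was "filter-app-v1"
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    // With no committed offsets, this makes the fresh group start from the
    // beginning of the input topic, i.e. a full reprocess.
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

    KStreamBuilder builder = new KStreamBuilder();
    builder.stream("input-topic")
           .filter((key, value) -> value != null)          // stand-in for the real condition
           .foreach((key, value) -> System.out.println(value)); // stand-in for the MongoDB write
    new KafkaStreams(builder, props).start();

Note that the old application's state directories and internal topics are left behind; that cleanup is what KAFKA-3185 is about.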

Re: Kafka Streaming Question : reset offset

2016-07-20 Thread Pariksheet Barapatre
Many thanks, Matthias, for the update. Regards Pari On 20 July 2016 at 17:48, Matthias J. Sax wrote: > Hi Pari, > > currently, changing the application ID is the best way to follow. > Cleaning up the application state correctly, is a little bit tricky. We > are currently working on an improvement for

Kafka Streams: KTable join + filter + foreach

2016-07-20 Thread Mathieu Fenniak
Hello Kafka users, I'm seeing some unexpected results when using Kafka Streams, and I was hoping someone could explain them to me. I have two streams, which I've converted KStream->KTable, and then I am joining them together with a "join" (not an outer join, not a full join). With the resulting
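
For reference, a minimal sketch of the kind of topology being described, using the 0.10.0-era DSL; the topic names, value types, and join logic are illustrative:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.kstream.KStreamBuilder;
    import org.apache.kafka.streams.kstream.KTable;

    KStreamBuilder builder = new KStreamBuilder();
    KTable<String, Long> left  = builder.table(Serdes.String(), Serdes.Long(), "left-topic");
    KTable<String, Long> right = builder.table(Serdes.String(), Serdes.Long(), "right-topic");

    // Inner KTable-KTable join: an update is emitted when a key has a value on
    // both sides; a tombstone (null value) is emitted when a match goes away.
    left.join(right, (l, r) -> l + r)
        .filter((key, sum) -> sum != null && sum > 0)  // intended to drop unwanted results
        .foreach((key, sum) -> System.out.println(key + " -> " + sum));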

release of 0.10.1

2016-07-20 Thread David Garcia
Does anyone know when this release will be cut? -David

Re: Kafka Streams: KTable join + filter + foreach

2016-07-20 Thread Matthias J. Sax
Hi Mathieu, join semantics are tricky. We are still working on better documentation for it... For the current state and your question: each time a record is processed, it looks up the other KTable to see if there is a matching record. If none is found, the join result is empty and a tombstone record

Re: Kafka Streams: KTable join + filter + foreach

2016-07-20 Thread Mathieu Fenniak
Hm... OK, I think that makes sense. It seems like I can't filter out those tombstone records; is that expected as well? If I throw in a .filter operation before my foreach, its Predicate is not invoked, and the foreach's ForeachAction is still invoked with a null value. Mathieu On Wed, Jul 20,

Re: __consumer_offsets rebalance

2016-07-20 Thread Anderson Goulart
Hi, How can I see if log compaction is enabled? And how can I enable it? I didn't find it in the Kafka docs. Thanks, Anderson On 14/07/2016 13:37, Todd Palino wrote: It's safe to move the partitions of the offsets topic around. You'll move the consumer coordinators as you do, however, so the o
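
For reference (hedged, since defaults depend on broker version): compaction needs the broker-side log cleaner to be running (log.cleaner.enable, which defaults to true since 0.9.0.1) plus cleanup.policy=compact on the topic; __consumer_offsets gets the compact policy automatically. Roughly, with an illustrative topic name:

    # check a topic's configuration (look for cleanup.policy=compact)
    bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic my-topic

    # enable compaction for a topic
    bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type topics \
        --entity-name my-topic --alter --add-config cleanup.policy=compact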

Re: Regarding kafka partition and replication

2016-07-20 Thread Amit K
Thanks for the reply. The infrastructure I mentioned is already in place and is being used, so I cannot alter it. But given that, if I have 9 partitions with a replication factor of 2, will that give me a system that is fault tolerant and also makes optimal use of the hardware? Thanks, Amit

RE: Regarding kafka partition and replication

2016-07-20 Thread Tauzell, Dave
For fault-tolerance you want the replicated partitions to be on different physical servers. I think a better setup is to have 3 brokers, 9 partitions and a replication factor of 2. This will ensure that the replicated data is always on a different physical host. -Dave
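
As a concrete illustration of that layout (the topic name is made up):

    bin/kafka-topics.sh --create --zookeeper localhost:2181 \
        --topic my-events --partitions 9 --replication-factor 2

With 3 brokers, the 18 partition replicas spread out as 6 per broker, and Kafka never places two replicas of the same partition on the same broker, so losing any single host still leaves every partition with a live copy.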

Re: Questions relating KStream-KTable join with Kafka-Streams

2016-07-20 Thread Guozhang Wang
Hi Nicolas, For KStream-KTable join, if a record coming from the KStream did not find the matching record in the other, materialized KTable, then the join result is lost since, as you noted, even if the changelog record arrives at the KTable later, it will not trigger a join, as the KStream is not materialized

Re: Kafka Streams: KTable join + filter + foreach

2016-07-20 Thread Guozhang Wang
Hi Mathieu, As Matthias said, we are working on improving the current join semantics: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=63407287 and will keep you updated. As for KTable.filter(), I think it can actually achieve what you want: not forwarding nulls to the downstream

Re: Kafka Streams: KTable join + filter + foreach

2016-07-20 Thread Mathieu Fenniak
Hi Guozhang, Yes, I tried to apply the filter on the KTable that came from the join, and then the foreach on the KTable that came from the filter. I was still getting the nulls through to my foreach. It is easy to work around, but the behaviour was especially surprising when the filter didn't prevent it

Re: Kafka Streams: KTable join + filter + foreach

2016-07-20 Thread Guozhang Wang
Are you using the 0.10.0.0 release, or running from trunk? On Wed, Jul 20, 2016 at 10:58 AM, Mathieu Fenniak < mathieu.fenn...@replicon.com> wrote: > Hi Guozhang, > > Yes, I tried to apply the filter on the KTable that came from join, and > then the foreach on the KTable that came from filter. I was stil

Re: Kafka Streams: KTable join + filter + foreach

2016-07-20 Thread Matthias J. Sax
Try to do a .toStream().filter(...).foreach(...) -Matthias On 07/20/2016 08:11 PM, Guozhang Wang wrote: > Are you using the 0.10.0.0 release or from trunk? > > On Wed, Jul 20, 2016 at 10:58 AM, Mathieu Fenniak < > mathieu.fenn...@replicon.com> wrote: > >> Hi Guozhang, >> >> Yes, I tried to a
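
Spelled out (continuing the earlier sketch, where 'joined' stands for the KTable produced by the join), the suggestion is:

    // toStream() turns the KTable's changelog into a KStream; on that stream
    // the filter runs before the foreach, so tombstones can be dropped.
    joined.toStream()
          .filter((key, value) -> value != null)  // drop tombstone records
          .foreach((key, value) -> System.out.println(key + " -> " + value));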

Re: Kafka Streams: KTable join + filter + foreach

2016-07-20 Thread Mathieu Fenniak
I'm using the 0.10.0.0 release. Matthias's suggestion of using .toStream().filter(...).foreach(...) does prevent the nulls from reaching the foreach. But .filter(...).foreach(...) does not; the filter's predicate is not even executed before the ForeachAction receives the null records. On Wed,

Re: Questions about Kafka Streams Partitioning & Deployment

2016-07-20 Thread Eno Thereska
Hi Michael, These are good questions, and I can confirm that the system works the way you hope it does, if you use the DSL and don't make up keys arbitrarily. In other words, there is currently nothing that prevents you from shooting yourself in the foot, e.g. by making up keys and using the

Re: Kafka Streams: KTable join + filter + foreach

2016-07-20 Thread Guozhang Wang
I think this is a bug in 0.10.0 that we already fixed in trunk some time ago. Guozhang On Wed, Jul 20, 2016 at 12:25 PM, Mathieu Fenniak < mathieu.fenn...@replicon.com> wrote: > I'm using the 0.10.0.0 release. > > Matthias's suggestion of using .toStream().filter(...).foreach(...) does > pr

Re: Questions about Kafka Streams Partitioning & Deployment

2016-07-20 Thread Guozhang Wang
Hi Michael, 1. Kafka Streams always tries to colocate the local stores with the processing nodes based on the partition key. For example, say you want to do an aggregation based on key K1, but the input topic is not keyed on K1 and hence not partitioned by it. The library will then auto-repartition
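
A sketch of that re-keying case; note the groupBy-style methods here are from the later (1.0+) Streams DSL, used for brevity, and the topic and field are illustrative:

    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;

    StreamsBuilder builder = new StreamsBuilder();
    // key = order id, value = customer id (illustrative)
    KStream<String, String> orders = builder.stream("orders");

    // Re-keying by the value means the stream is no longer partitioned by its
    // key, so the library inserts an internal repartition topic before the
    // aggregation; each count's state store is then co-located with its keys.
    KTable<String, Long> ordersPerCustomer =
        orders.groupBy((orderId, customerId) -> customerId)
              .count();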

Re: Questions about Kafka Streams Partitioning & Deployment

2016-07-20 Thread Michael Ruberry
Thank you both for your replies. This is incredibly helpful. Guozhang, in (2) above did you mean "some keys *may be* hashed to different partitions and the existing local state stores will not be valid"? That fits with our understanding. As to your caveats in (3) and (4), we are trying to be sure

Re: kafka-streams depends upon slf4j-log4j12

2016-07-20 Thread Guozhang Wang
Thanks! On Tue, Jul 19, 2016 at 3:14 PM, Mathieu Fenniak < mathieu.fenn...@replicon.com> wrote: > Thanks for the confirmation Guozhang. I've submitted a PR along these > lines. https://github.com/apache/kafka/pull/1639 > > > On Tue, Jul 19, 2016 at 3:50 PM, Guozhang Wang wrote: > > > This is a

KafkaConsumer position block

2016-07-20 Thread yuanjia8...@163.com
Hi, With kafka-clients-0.10.0.0, I use KafkaConsumer.position() to get the offset, and the process blocks in ConsumerNetworkClient.awaitMetadataUpdate ("Block until the metadata has been refreshed"). My questions are: 1. Why is the metadata not refreshed? 2. Could it use a timeout or throw an exception
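
A sketch of the blocking call, assuming 0.10.0 clients (topic and configs are illustrative); the only built-in escape hatch at this version is consumer.wakeup() from another thread, which makes the blocked call throw WakeupException:

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("group.id", "position-check");
    props.put("key.deserializer",
              "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer",
              "org.apache.kafka.common.serialization.StringDeserializer");

    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
    TopicPartition tp = new TopicPartition("my-topic", 0);
    consumer.assign(Collections.singletonList(tp));

    // position() has no timeout parameter in 0.10.0: if no position is known
    // yet, it has to fetch committed offsets / metadata and blocks until the
    // brokers respond -- the awaitMetadataUpdate wait described above.
    long offset = consumer.position(tp);
    System.out.println("position of " + tp + " = " + offset);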