Re: Kafka Streams Failed to rebalance error

2017-06-09 Thread João Peixoto
u > describe and that is confirmed. The reason we increased " > max.poll.interval.ms" to basically infinite is to just avoid this problem. > > Eno > > On 9 Jun 2017, at 07:40, João Peixoto wrote: > > > > I am now able to consistently reproduce this issue w

Re: Kafka Streams Failed to rebalance error

2017-06-09 Thread João Peixoto
To help out I made the project that reproduces this issue publicly available at https://github.com/Hartimer/kafka-stream-issue On Thu, Jun 8, 2017 at 11:40 PM João Peixoto wrote: > I am now able to consistently reproduce this issue with a dummy project. > > 1. Set "max.poll.int

Re: Kafka Streams Failed to rebalance error

2017-06-08 Thread João Peixoto
y IDE on a sink "foreach" step and then proceeding after the above interval had elapsed. Any advice on how to work around this using 0.10.2.1 would be greatly appreciated. Hope it helps On Wed, Jun 7, 2017 at 10:19 PM João Peixoto wrote: > But my stream definition does not ha

Re: Kafka Streams Failed to rebalance error

2017-06-07 Thread João Peixoto
having one thread per > partition is ideal. > > Thanks > Sachin > > > On Thu, Jun 8, 2017 at 9:58 AM, João Peixoto > wrote: > > > There is one instance with 10 threads. > > > > On Wed, Jun 7, 2017 at 9:07 PM Guozhang Wang wrote: > > >

Re: Kafka Streams Failed to rebalance error

2017-06-07 Thread João Peixoto
There is one instance with 10 threads. On Wed, Jun 7, 2017 at 9:07 PM Guozhang Wang wrote: > João, > > Do you also have multiple running instances in parallel, and how many > threads are your running within each instance? > > Guozhang > > > On Wed, Jun 7, 2017 at 3:1

Re: Kafka Streams Failed to rebalance error

2017-06-07 Thread João Peixoto
progress On Wed, Jun 7, 2017 at 2:24 PM Eno Thereska wrote: > Hi there, > > This might be a bug, would you mind opening a JIRA (copy-pasting below is > sufficient). > > Thanks > Eno > > On 7 Jun 2017, at 21:38, João Peixoto wrote: > > > > I'm using Kafka

Re: Kafka Streams Failed to rebalance error

2017-06-07 Thread João Peixoto
I'm using Kafka Streams 0.10.2.1 and I still see this error 2017-06-07 20:28:37.211 WARN 73 --- [ StreamThread-1] o.a.k.s.p.internals.StreamThread : Could not create task 0_31. Will retry. org.apache.kafka.streams.errors.LockException: task [0_31] Failed to lock the state directory for t

Re: Partitions as mechanism to keep multitenant segregated data

2017-05-23 Thread João Peixoto
It seems like you're trying to use the partitioning mechanism as a routing mechanism, which afaik is not really its objective. It may work but it is definitely not the best approach imo. 1. You're throwing away the parallelism capabilities of Kafka. You'll have a single "queue" per customer. By th

Re: Can't re-process topic

2017-05-17 Thread João Peixoto
I'm not too familiar with Spark but the "earliest"/"latest" configuration is only relevant if your consumer does not hold a valid offset. If you read up to offset N, when you restart you'll start from N. If you start a new consumer then it has no offset, that's when the above configuration takes e

Re: What purpose serves the repartition topic?

2017-05-17 Thread João Peixoto
`groupByKey()` knows that data is already partitioned correctly. > > > > -Matthias > > On 5/16/17 5:40 PM, João Peixoto wrote: > > Your explanation makes sense. If I understood correctly it means that one > > stream thread can actually generate records that will be aggregated

Re: What purpose serves the repartition topic?

2017-05-16 Thread João Peixoto
y must > be modified before a key-based operation and user wants to suppress > re-partitioning as she knows that partitioning is preserved (cf. > https://issues.apache.org/jira/browse/KAFKA-4835). This is currently not > supported at DSL level. However, you could fall back to Processor API if

What purpose serves the repartition topic?

2017-05-16 Thread João Peixoto
Certain operations require a repartition topic, such as "selectKey" or "map". What purpose serves this repartition topic? Sample record: {"key": "a", ...} Stream: source.selectKey((k, v) -> KeyValue.pair(k.toUpperCase(), v)).groupByKey() //... >From my understanding, the repartition topic will g

Re: Can state stores function as a caching layer for persistent storage

2017-05-16 Thread João Peixoto
gt; Yes, there could be room for optimizations, e.g., see this: > http://mail-archives.apache.org/mod_mbox/kafka-users/201705.mbox/%3cCAJikTEUHR=r0ika6vlf_y+qajxg8f_q19og_-s+q-gozpqb...@mail.gmail.com%3e > < > http://mail-archives.apache.org/mod_mbox/kafka-users/201705.mbox/%3CCAJikTEUH

Re: Can state stores function as a caching layer for persistent storage

2017-05-16 Thread João Peixoto
after such change we'd need to join on some other stream/table or cases where we sink to a topic. However, in cases where none of these things, the repartition topic does nothing? If this is true can we somehow not create it? On Sun, May 14, 2017 at 7:58 PM João Peixoto wrote: > Very useful

Re: Strict ordering of messages in Kafka

2017-05-15 Thread João Peixoto
Afaik that is not possible out of the box as that would require synchronization across multiple threads/instances. The throughput of such an approach would be terrible as the parallelism of KStreams is tightly coupled with the number of partitions. I'd say if you need such ordering you should reco

Re: Can state stores function as a caching layer for persistent storage

2017-05-14 Thread João Peixoto
treamsInternalDataManagement-Commits > > This might also help in case you want to dig into the code: > > https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Architecture > > > -Matthias > > On 5/14/17 4:07 PM, João Peixoto wrote: > > I think I now underst

Re: Can state stores function as a caching layer for persistent storage

2017-05-14 Thread João Peixoto
27;d appreciate if someone could point me to the documentation or code that backs up the above. On Sat, May 13, 2017 at 3:11 PM João Peixoto wrote: > Replies in line as well > > > On Sat, May 13, 2017 at 3:25 AM Eno Thereska > wrote: > >> Hi João, >> >> Som

Re: Can state stores function as a caching layer for persistent storage

2017-05-13 Thread João Peixoto
Replies in line as well On Sat, May 13, 2017 at 3:25 AM Eno Thereska wrote: > Hi João, > > Some answers inline: > > > On 12 May 2017, at 18:27, João Peixoto wrote: > > > > Thanks for the comments, here are some clarifications: > > > > I did look at

Re: Can state stores function as a caching layer for persistent storage

2017-05-12 Thread João Peixoto
es sequentially. > (3) Yes. There will be a store for each partition. (If store is local.) > (4) Yes. The overall processing loop is sequential (cf. > > https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Architecture > ) > Also, the next commit point is computed after a su

Can state stores function as a caching layer for persistent storage

2017-05-12 Thread João Peixoto
On a stream definition I perform an "aggregate" which is configured with a state store. *Goal*: Persist the aggregation results into a database, e.g. MySQL or MongoDB *Working approach*: I have created a custom StateStore backed by a changelog topic like the builtin state stores. Whenever the sto

Re: Large Kafka Streams deployment takes a long time to bootstrap

2017-05-09 Thread João Peixoto
his works as a workaround till they fix this in next release. > > > On Wed, May 10, 2017 at 12:05 AM, João Peixoto > wrote: > > > Guozhang thanks a lot for that info, that is exactly what I'm observing > it > > seems. > > > > I'll keep an eye out. >

Re: Large Kafka Streams deployment takes a long time to bootstrap

2017-05-09 Thread João Peixoto
e. So I'd suggest taking a look at the app's logs > and see if there are multiple rebalances triggered during the starting up, > and if yes the above fix may help the most. > > > Guozhang > > > On Mon, May 8, 2017 at 7:41 AM, João Peixoto > wrote: > > >

Re: Kafka Stream stops polling new messages

2017-05-09 Thread João Peixoto
val.ms` that we just > > increased to guard against failure for long stat recreation phases. > > > > Any thoughts? > > > > > > -Matthias > > > > > > On 5/3/17 8:48 AM, João Peixoto wrote: > > > That'd be great as I'm not fami

Re: Shouldn't the initializer of a stream aggregate accept the key?

2017-05-08 Thread João Peixoto
corrupt data > partitioning and thus would lead to wrong result. > > It's not possible to modify the key via return -- the returned value > will only be the initial value for the aggregation. > > > -Matthias > > On 5/4/17 6:00 PM, João Peixoto wrote: > >

Re: Large Kafka Streams deployment takes a long time to bootstrap

2017-05-08 Thread João Peixoto
show consumers and > >> offsets but after a short time will go back to rebalancing. > >> > >> How much storage does your Kafka-streams use? > >> Also, what is your k8s configuration? > >> Deployment? Deployment with emptyDir, hostPath or EBS? Statefu

Re: Large Kafka Streams deployment takes a long time to bootstrap

2017-05-05 Thread João Peixoto
] o.a.k.s.p.internals.StreamThread : stream-thread [StreamThread-4] Committing all tasks because the commit interval 1ms has elapsed On Fri, May 5, 2017 at 3:48 PM João Peixoto wrote: > Warning, long message > > *Problem*: Initializing a Kafka Stream is taking a loo

Large Kafka Streams deployment takes a long time to bootstrap

2017-05-05 Thread João Peixoto
Warning, long message *Problem*: Initializing a Kafka Stream is taking a lng time. Currently at the 40 minute mark *Setup*: 2 co-partition topics with 100 partitions. First topic contains a lot of messages in the order of hundreds of millions Second topic is a KTable and contains ~30k records

Re: Shouldn't the initializer of a stream aggregate accept the key?

2017-05-04 Thread João Peixoto
confluence/display/KAFKA/KIP-149%3A+Enabling+key+access+in+ValueTransformer%2C+ValueMapper%2C+and+ValueJoiner > > It might make sense to include the `Initializer` interface into KIP-149, > too. > > Atm, you would need to write some custom code to tackle this use case. > > -

Shouldn't the initializer of a stream aggregate accept the key?

2017-05-03 Thread João Peixoto
Looking at the aggregate documentation one of the required items is an "initializer", no arguments and returns a value. Shouldn't this initializer follow a similar approach of Java's computIfAbsent

Re: Kafka Stream stops polling new messages

2017-05-03 Thread João Peixoto
That'd be great as I'm not familiar with the protocol there On Wed, May 3, 2017 at 8:41 AM Eno Thereska wrote: > Cool, thanks, shall we open a JIRA? > > Eno > > On 3 May 2017, at 16:16, João Peixoto wrote: > > > > Actually I need to apologize, I pas

Re: Kafka Stream stops polling new messages

2017-05-03 Thread João Peixoto
t; below should suffice)? Alternatively let us know and we’ll open it. Sounds > like we should handle this better. > > Thanks, > Eno > > > > On May 3, 2017, at 5:49 AM, João Peixoto > wrote: > > > > I believe I found the root cause of my problem. I seem to h

Windowed aggregations memory requirements

2017-05-03 Thread João Peixoto
The base question I'm trying to answer is "how much memory does my instance need". Considering a use case where I want to keep a rolling average on a tumbling window of 1 minute size allowing for late arrivals up to 5 minutes (lower bound) we would have something like this: TimeWindows.of(TimeUni

Re: Kafka Stream stops polling new messages

2017-05-02 Thread João Peixoto
sted for production usage" so that's on me On Tue, May 2, 2017 at 11:20 AM Matthias J. Sax wrote: > Did you check the logs? Maybe you need to increase log level to DEBUG to > get some more information. > > Did you double check committed offsets via bin/kafka-consumer-groups

Re: Setting up Kafka & Kafka Streams for loading real-time and 'older' data concurrently

2017-05-02 Thread João Peixoto
Out of curiosity, would this mean that a state store for such a window could hold 90 days worth of data in memory? Or filesystem if we're talking about Rocksdb On Tue, May 2, 2017 at 10:08 AM Damian Guy wrote: > Hi Garret, > > No, log.retention.hours doesn't impact compacted topics. > > Thanks,

Kafka Stream stops polling new messages

2017-04-28 Thread João Peixoto
My stream gets stale after a while and it simply does not receive any new messages, aka does not poll. I'm using Kafka Streams 0.10.2.1 (same happens with 0.10.2.0) and the brokers are running 0.10.1.1. The stream state is RUNNING and there are no exceptions in the logs. Looking at the JMX metri