Re: Kafka Streams: got bit by WindowedSerializer (only window.start is serialized)

2017-01-19 Thread Guozhang Wang
Regarding KAFKA-4468: as discussed on the JIRA, we intentionally did not write the end timestamp to RocksDB, as a storage optimization. That is, we still write only the combination of window start time and key, because for TimeWindow the window length is fixed and accessible in the Windows object, so we

Re: Kafka Streams: got bit by WindowedSerializer (only window.start is serialized)

2017-01-17 Thread Matthias J. Sax
With regard to the JIRA, I guess we do not want to put the end timestamp into the key. For general usage, windows of different types are written into different topics. Thus, Nicolas' use case is quite special, and using a custom Serde is a better way to handle it than changing Kafka Stre

Re: Kafka Streams: got bit by WindowedSerializer (only window.start is serialized)

2017-01-17 Thread Eno Thereska
For changes that may be backwards incompatible or change the APIs we usually do a short KIP first (e.g., I just did one yesterday: https://cwiki.apache.org/confluence/display/KAFKA/KIP-114%3A+KTable+materialization+and+improved+semantics

Re: Kafka Streams: got bit by WindowedSerializer (only window.start is serialized)

2017-01-16 Thread Nicolas Fouché
In the case of KAFKA-4468, it's more about state stores. But still, keys would not be backward compatible. What is the "official" policy about this kind of change?

Re: Kafka Streams: got bit by WindowedSerializer (only window.start is serialized)

2017-01-16 Thread Nicolas Fouché
Hi Eno, I thought it would be impossible to put this in Kafka because of backward incompatibility with the existing windowed keys, no? In my case, I had to create a new output topic, reset the topology, and reprocess all my data.

Re: Kafka Streams: got bit by WindowedSerializer (only window.start is serialized)

2017-01-16 Thread Eno Thereska
Nicolas, I'm checking with Bill, who originally was interested in KAFKA-4468. If he isn't actively working on it, why don't you give it a go and create a pull request (PR) for it? That way your contribution is properly acknowledged, etc. We can help you through that. Thanks, Eno

Re: Kafka Streams: got bit by WindowedSerializer (only window.start is serialized)

2017-01-16 Thread Nicolas Fouché
My current implementation: https://gist.github.com/nfo/eaf350afb5667a3516593da4d48e757a . I just appended the window `end` at the end of the byte array. Comments and suggestions are welcome!
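To illustrate the idea of appending the window `end` to the byte array, here is a minimal sketch. It is not the code from the gist above; the class name `WindowedKeyWithEnd`, the string key, and the `[key][start][end]` layout with two 8-byte timestamps are assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Kafka Streams: encodes a windowed key
// as [key bytes][8-byte window start][8-byte window end].
class WindowedKeyWithEnd {
    static final int TS = Long.BYTES;
    static final long DAY_MS = 24L * 60 * 60 * 1000;
    static final long WEEK_MS = 7 * DAY_MS;

    static byte[] encode(String key, long start, long end) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(k.length + 2 * TS)
                .put(k)
                .putLong(start)
                .putLong(end) // the appended window end
                .array();
    }

    // The two timestamps occupy the last 16 bytes; the rest is the key.
    static String keyOf(byte[] bytes) {
        return new String(bytes, 0, bytes.length - 2 * TS, StandardCharsets.UTF_8);
    }

    static long startOf(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong(bytes.length - 2 * TS);
    }

    static long endOf(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong(bytes.length - TS);
    }

    public static void main(String[] args) {
        long start = 1484524800000L; // arbitrary start shared by both windows
        byte[] daily = encode("user-42", start, start + DAY_MS);
        byte[] weekly = encode("user-42", start, start + WEEK_MS);
        System.out.println(Arrays.equals(daily, weekly));             // false
        System.out.println(endOf(weekly) - startOf(weekly) == WEEK_MS); // true
    }
}
```

With the end timestamp in the key, a 1-day and a 1-week aggregate that share a start time no longer map to the same bytes. The cost is that keys written this way are not compatible with keys written in a start-only layout.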

Re: Kafka Streams: got bit by WindowedSerializer (only window.start is serialized)

2017-01-16 Thread Nicolas Fouché
Hi Damian, I recall now that I copied the `WindowedSerde` class [1] from the Confluent examples, which uses the internal `WindowedSerializer` class. Better to write my own Serde then. You're right, I should not rely on internal classes, especially for data written outside Kafka Streams topol

Re: Kafka Streams: got bit by WindowedSerializer (only window.start is serialized)

2017-01-16 Thread Damian Guy
Hi Nicolas, I guess you are using the Processor API for your topology? The WindowedSerializer is an internal class that is used as part of the DSL. In the DSL a topic will be created for each window operation, so we don't need the end time as it can be calculated from the window size. However, the
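The point that the end time can be calculated from the window size can be sketched as follows. This is an illustration only, not the DSL's internal code; the class `FixedSizeWindow` and the assumption that the start timestamp sits in the last 8 bytes of the serialized key are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: recovers a window's bounds from a start-only key,
// given that the reader already knows the (fixed) window size.
class FixedSizeWindow {
    // Assumes the serialized key ends with an 8-byte window start.
    static long windowStart(byte[] serialized) {
        return ByteBuffer.wrap(serialized).getLong(serialized.length - Long.BYTES);
    }

    // The end is derived, not stored: end = start + window size.
    static long windowEnd(byte[] serialized, long windowSizeMs) {
        return windowStart(serialized) + windowSizeMs;
    }

    public static void main(String[] args) {
        long dayMs = 24L * 60 * 60 * 1000;
        byte[] serialized = ByteBuffer.allocate(1 + Long.BYTES)
                .put((byte) 'k')   // one-byte key
                .putLong(1000L)    // window start
                .array();
        System.out.println(windowEnd(serialized, dayMs)); // 86401000
    }
}
```

This only works when every record in a topic or store uses the same window size, which is the situation Damian describes for the DSL. It breaks down once windows of several sizes share one topic, because the reader no longer knows which size to add.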

Kafka Streams: got bit by WindowedSerializer (only window.start is serialized)

2017-01-16 Thread Nicolas Fouché
Hi, In the same topology, I generate aggregates with 1-day windows and 1-week windows and write them to a single topic. On Mondays, these windows have the same start time. The effect: these aggregates override each other. That happens because WindowedSerializer [1] only serializes the window s
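The collision described in this message can be reproduced with a small, self-contained sketch. It does not use the real `WindowedSerializer`; the class `WindowKeyCollision`, the string key, and the `[key][8-byte start]` layout are assumptions chosen to mirror the behaviour reported in the subject line (only the window start is serialized).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for a start-only windowed key serializer.
class WindowKeyCollision {
    static final long DAY_MS = 24L * 60 * 60 * 1000;
    static final long WEEK_MS = 7 * DAY_MS;

    // Writes [key bytes][8-byte window start]; the window end is ignored.
    static byte[] startOnlyKey(String key, long windowStart, long windowEnd) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(k.length + Long.BYTES)
                .put(k)
                .putLong(windowStart)
                .array();
    }

    public static void main(String[] args) {
        long start = 1484524800000L; // arbitrary start shared by both windows
        byte[] daily = startOnlyKey("user-42", start, start + DAY_MS);
        byte[] weekly = startOnlyKey("user-42", start, start + WEEK_MS);
        // Identical bytes: in a single topic, one record replaces the other
        // wherever records are deduplicated by key (e.g. a compacted topic).
        System.out.println(Arrays.equals(daily, weekly)); // true
    }
}
```

Two windows of different lengths that share a start time produce byte-identical keys, so a consumer reading the topic cannot tell the daily aggregate from the weekly one.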