Hi Nicolas,

I guess you are using the Processor API for your topology? The
WindowedSerializer is an internal class that is used as part of the DSL. In
the DSL a topic is created for each window operation, so we don't need the
end time: it can be calculated from the window size.
However, there is an open JIRA for this:
https://issues.apache.org/jira/browse/KAFKA-4468
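
In the meantime, a workaround is a custom serializer that also encodes the
window end time after the start time, so windows of different sizes that
share a start time no longer collide on the key. A minimal sketch of the
byte layout (the class name, method, and sample values below are my own
illustration, not part of Kafka):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class WindowedKeyCodec {

    // Encode a windowed key as: inner key bytes + 8-byte start + 8-byte end.
    // Kafka's internal WindowedSerializer only appends the 8-byte start,
    // which is why 1-day and 1-week windows starting on the same Monday
    // produce identical keys.
    public static byte[] serialize(byte[] innerKey, long startMs, long endMs) {
        ByteBuffer buf = ByteBuffer.allocate(innerKey.length + 16);
        buf.put(innerKey);
        buf.putLong(startMs);
        buf.putLong(endMs);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] key = "pageviews".getBytes(StandardCharsets.UTF_8);
        long monday = 1484524800000L; // Mon, 16 Jan 2017 00:00:00 UTC
        byte[] day  = serialize(key, monday, monday + 86_400_000L);
        byte[] week = serialize(key, monday, monday + 7 * 86_400_000L);
        // The keys now differ, thanks to the end time in the last 8 bytes.
        System.out.println(Arrays.equals(day, week)); // prints false
    }
}
```

Wrapping this layout in a `Serializer<Windowed<K>>` implementation lets you
plug it into the producer used by your Processor API sink.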

Thanks,
Damian

On Mon, 16 Jan 2017 at 11:18 Nicolas Fouché <nfou...@onfocus.io> wrote:

> Hi,
>
> In the same topology, I generate aggregates with 1-day windows and 1-week
> windows and write them to one single topic. On Mondays, these windows have
> the same start time. The effect: these aggregates override each other.
>
> That happens because WindowedSerializer [1] only serializes the window
> start time. I'm a bit surprised, since a window by definition has a start
> and an end. I suppose this was done to save on key sizes? And/or the
> assumption is that topics should not contain aggregates with different
> granularities?
>
> I have two choices then: either create as many output topics as I have
> granularities, or create my own serializer which also includes the window
> end time. What would the community recommend?
>
> Getting back to the core problem:
> I could understand that it's not "right" to store different granularities
> in one topic, and I thought it would save resources (fewer topics for
> Kafka to manage). But I'm really not sure about this default serializer:
> it does not serialize all instance variables of the `Window` class, and
> more generally does not comply with the definition of a window.
>
> [1]
>
> https://github.com/apache/kafka/blob/0.10.1/streams/src/main/java/org/apache/kafka/streams/kstream/internals/WindowedSerializer.java
>
> Thanks.
> Nicolas
>