Matthias,

Thanks again for the detailed explanation.

My customer doesn't want this information, but I do.  They want simple
aggregations (Tell me how many events happened on this stream between time
0-30, 30-60, 60-90, etc).  Sometimes the aggregations don't fire at the
time I would expect.  This is where I want to interrogate the stream prior
to the aggregation so I can see the timestamps of the partitions feeding
the aggregation, and maybe that will help me pinpoint that I have a
topic/partition that isn't being used(maybe due to a bad partitioning
scheme on my part?).

Does that make sense?

Dan

On Mon, Apr 2, 2018 at 8:27 PM Matthias J. Sax <matth...@confluent.io>
wrote:

> Atm, Kafka does not ship any ready-to-use transformers. All operators
> that are ready-to-use are provided via the DSL only and their
> implementation itself is not public API.
>
> The question arises though, if a transformer like this would be the best
> way to expose this information or if there might be a better way if we
> include something like this in the library directly.
>
> I guess it also depends on the use case in detail. Can you elaborate why
> your customer wants this information? If it's something that is
> generally useful/applicable, we are of course interested.
>
> We are always open for suggestions and happy if people contribute. If
> you are interested, please subscribe to the dev list and we can discuss
> there. Or just create a JIRA with a feature request as starting point.
> Whatever suits you better.
>
>
> -Matthias
>
>
> On 4/2/18 3:00 PM, dan bress wrote:
> > Hi Matthias,
> >
> > Yes, I'm aware of Wall-clock time punctuation, but that is not the
> behavior
> > my customers want, so I can't use that unfortunately...
> >
> > OK.  I'm going to try and write a utility "StreamTimeInspector"
> transformer
> > that will do as you suggest and give me the stats the I want.  If we find
> > this useful at my company I'll consider submitting it as a PR.
> >
> > Is there any current precedent around transformers like this in
> > KafkaStreams?  Transformers you insert for interrogation only?
> >
> > Dan
> >
> > On Mon, Apr 2, 2018 at 2:10 PM Matthias J. Sax <matth...@confluent.io>
> > wrote:
> >
> >> Dan,
> >>
> >> Kafka Streams supports two types of punctuation: event-time and
> >> wall-clock time punctuation. Wall-clock time punctuation will be called
> >> even if event-time does not progress and even if there is no new input
> >> data available for processing. Not sure if this is what you are looking
> >> for.
> >>
> >> Beside this, Processor API exposes record metadata information. Thus,
> >> you can plugin a custom `transform()` step to access and maintain the
> >> information/stats you are interested in (The provided `context` object
> >> from `init()` is your friend for this.) and maybe make it queryable via
> >> Interactive Queries?
> >>
> >> Hope this helps.
> >>
> >>
> >> -Matthias
> >>
> >> On 4/2/18 11:26 AM, dan bress wrote:
> >>> I have an app that is consuming multiple topics, and punctuating on
> >> stream
> >>> time.  I know that punctuate is driven by the min time of all the
> >>> partitions of all the topics driving the transformer that I am
> >> punctuating
> >>> on.  When I deploy my app and punctuate is not called as I expect, what
> >>> tools do I have to understand where time is per
> >>> instance/thread/topic/partition?  Does Kafka Streams expose stats for
> >> this?
> >>>
> >>> I would like something like a stat of something like:
> >>>
> >>> Kafka Streams Instance / Kafka Stream Thread / Topic / Partition /
> >>> EventTime = 1522693495291
> >>>
> >>> Then I can look at which topic/partition is lagging to debug my
> problem.
> >>>
> >>> Does something like this exist?  Is this a reasonable feature request?
> >>>
> >>> Thanks!
> >>> Dan
> >>>
> >>
> >>
> >
>
>

Reply via email to