Hi Matthias,

First, I will answer to your last question.

The main reason to have both TaskState#assignment and
TaskState#consumedOffsetsByPartition is that tasks have no consumed offsets
until at least one message is consumed for each partition even if previous
offsets exist for the consumer group.
So yes this methods are redundant as it only diverge at application startup.

About the use case, currently we are developping for a customer a little
framework based on KafkaStreams which transform/denormalize data before
ingesting into hadoop.

We have a cluster of workers (SpringBoot) which instantiate KStreams
topologies dynamicaly based on dataflow configurations.
Each configuration describes a topic to consume and how to process messages
(this looks like NiFi processors API).

Our architecture is inspired from KafkaConnect. We have topics for configs
and states which are consumed by each workers (actually we have reused some
internals classes to the connect API).

Now, we would like to develop UIs to visualize topics and partitions
consumed by our worker applications.

Also, I think it would be nice to be able,  in the futur, to develop web
UIs similar to Spark but for KafkaStreams to visualize DAGs...so maybe this
KIP is just a first step.

Thanks,

2017-03-01 22:52 GMT+01:00 Matthias J. Sax <matth...@confluent.io>:

> Thanks for the KIP.
>
> I am wondering a little bit, why you need to expose this information.
> Can you describe some use cases?
>
> Would it be worth to unify this new API with KafkaStreams#state() to get
> the overall state of an application without the need to call two
> different methods? Not sure how this unified API might look like though.
>
>
> One minor comment about the API: TaskState#assignment seems to be
> redundant. It should be the same as
> TaskState#consumedOffsetsByPartition.keySet()
>
> Or do I miss something?
>
>
> -Matthias
>
> On 3/1/17 5:19 AM, Florian Hussonnois wrote:
> > Hi Eno,
> >
> > Yes, but the state() method only returns the global state of the
> > KafkaStream application (ie: CREATED, RUNNING, REBALANCING,
> > PENDING_SHUTDOWN, NOT_RUNNING).
> >
> > An alternative to this KIP would be to change this method to return more
> > information instead of adding a new method.
> >
> > 2017-03-01 13:46 GMT+01:00 Eno Thereska <eno.there...@gmail.com>:
> >
> >> Thanks Florian,
> >>
> >> Have you had a chance to look at the new state methods in 0.10.2, e.g.,
> >> KafkaStreams.state()?
> >>
> >> Thanks
> >> Eno
> >>> On 1 Mar 2017, at 11:54, Florian Hussonnois <fhussonn...@gmail.com>
> >> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> I have just created KIP-130 to add a new method to the KafkaStreams API
> >> in
> >>> order to expose the states of threads and active tasks.
> >>>
> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP+
> >> 130%3A+Expose+states+of+active+tasks+to+KafkaStreams+public+API
> >>>
> >>>
> >>> Thanks,
> >>>
> >>> --
> >>> Florian HUSSONNOIS
> >>
> >>
> >
> >
>
>


-- 
Florian HUSSONNOIS

Reply via email to