Re: Kafka Indexing Service - Decoupling segments from consumer tasks

Gian Merlino Wed, 02 May 2018 18:32:21 -0700

Hey Dylan,

Great to hear that your experience has generally been positive!


What do you think about using compaction for this? (The feature added in
https://github.com/druid-io/druid/pull/5102.) The idea with compaction was
that it would enable a background process that goes through freshly
inserted segments and re-partitions them optimally.
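
As a rough illustration of what triggering compaction could look like, here is a minimal Python sketch that builds a task payload of type "compact". The field names (type, dataSource, interval) follow the compaction task as I understand it from that PR, so check them against the docs for your Druid version. The datasource name and interval are hypothetical placeholders, and the helper function is only for illustration.

```python
import json


def build_compaction_task(data_source, interval):
    """Build a payload for a Druid "compact" task.

    Assumption: the task accepts "type", "dataSource" and "interval";
    verify against the compaction task docs for your Druid version.
    """
    return {
        "type": "compact",
        "dataSource": data_source,  # hypothetical datasource name
        "interval": interval,       # ISO-8601 interval to compact
    }


# Hypothetical example values, not taken from this thread.
task = build_compaction_task("events_from_kafka", "2018-05-01/2018-05-02")
print(json.dumps(task, indent=2))
```

The resulting JSON would be submitted to the Overlord like any other indexing task. The sketch stops at building the payload so that it stays independent of any particular cluster setup.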

For creating multiple datasources out of one topic, there is a PR wending
its way through review right now that is relevant:
https://github.com/druid-io/druid/pull/5556.

On Wed, May 2, 2018 at 12:46 PM, Dylan Wylie <dylanwy...@gmail.com> wrote:

> Hey there,
>
> With the recent improvements to the Kafka Indexing Service we've been
> migrating over from Tranquility and have had a very positive experience.
>
> However, one of the downsides to using the KIS is that the number of
> segments generated for each period can't be smaller than the number of
> tasks required to consume the queue. So if you have a use case involving
> ingesting from a topic with a high rate of large messages, but your spec
> only extracts a small proportion of fields, you may be forced to run a
> large number of tasks that generate very small segments.
>
> This email is to check in for people's thoughts on separating consuming and
> parsing messages from indexing and segment management, in a similar fashion
> to how Tranquility operates.
>
> Potentially, we could have the supervisor spawn two types of task that can
> be configured independently: a consumer and an appender. The consumer would
> parse the message based on the spec and then pass the results to the
> appropriate appender task, which builds the segment. Another advantage of
> this approach is that it would allow creating multiple datasources from a
> single consumer group rather than ingesting the same topic multiple times.
>
> I'm quite new to the codebase so all thoughts and comments are welcome!
>
> Best regards,
> Dylan
>
