Hey Dylan,

Great to hear that your experience has generally been positive!

What do you think about using compaction for this? (The feature added in https://github.com/druid-io/druid/pull/5102.) The idea with compaction was that it would enable a background process that goes through freshly inserted segments and re-partitions them optimally.

For creating multiple datasources out of one topic, there is a PR wending its way through review right now that is relevant: https://github.com/druid-io/druid/pull/5556.

On Wed, May 2, 2018 at 12:46 PM, Dylan Wylie <dylanwy...@gmail.com> wrote:

> Hey there,
>
> With the recent improvements to the Kafka Indexing Service we've been
> migrating over from Tranquility and have had a very positive experience.
>
> However one of the downsides to using the KIS, is that the number of
> segments generated for each period can't be smaller than the number of
> tasks required to consume the queue. So if you have a use case involving
> ingesting from a topic with a high rate of large messages but your spec
> only extracts a small proportion of fields you may be forced to run a large
> number of tasks that generate very small segments.
>
> This email is to check in for peoples thoughts on separating consuming and
> parsing messages from indexing and segment management, in a similar fashion
> to how Tranquility operates.
>
> Potentially - we could have the supervisor spawn two types of task that can
> be configured independently, a consumer and an appender. The consumer would
> parse the message based on the spec and then pass the results to the
> appropriate appender task which builds the segment. Another advantage to
> this approach is that it would allow creating multiple datasources from a
> single consumer group rather than ingesting the same topic multiple times.
>
> I'm quite new to the codebase so all thoughts and comments are welcome!
>
> Best regards,
> Dylan