I'm not familiar with the Kafka Streams API, but I guess it could be used, since the data needs to be consumed from some source, processed, and the results produced to some other destination.
The main point is that you would need to specify which source partitions a particular stream processor/consumer should read, and the single destination partition its result should be produced to. The processor can either read all data from its assigned partitions and sort it or, maybe, read the data in portions (I think the best way is to position every assigned partition at the offset for the same timestamp) and then produce the sorted result to the destination topic.
As for merging everything into a single destination partition, that is just the special case where you have only one large group.

On Thu, Jan 16, 2020 at 10:38 PM Debraj Manna <subharaj.ma...@gmail.com> wrote:

> Thanks Daniyar for replying.
>
> Do kafka streams have any apis to do the partitioning and grouping that you
> are suggesting?
>
> Also if I have to merge everything into a single partition what should be
> the efficient way to do this?
>
> On Fri, Jan 17, 2020 at 6:03 AM Daniyar Kulakhmetov <dkulakhme...@liftoff.io>
> wrote:
>
> > Since you not going to merge everything into one partition, you don't need
> > to sort all messages across all partitions (because messages are sorted
> > only within partition).
> > I'd suggest splitting X partitions to Y groups and then merge source
> > partitions within each group into their destination partition.
> >
> > On Thu, Jan 16, 2020 at 10:20 AM Debraj Manna <subharaj.ma...@gmail.com>
> > wrote:
> >
> > > Just to add when this operation will be going on no new data will be added
> > > to original Kafka topic. I am trying to avoid buffering all data to a
> > > temporary datastore to sort.
> > >
> > > On Thu, 16 Jan 2020, 23:14 Debraj Manna, <subharaj.ma...@gmail.com> wrote:
> > >
> > > > Hi
> > > >
> > > > I have a Kafka topic with X partitions. Each message has a timestamp, ts.
> > > >
> > > > Can someone suggest me some way of sorting all the messages (based on ts)
> > > > across all partitions and putting it in a new topic with Y partitions
> > > > (Y < X) using Kafka java client?
> > > >
> > > > Thanks
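
P.S. To make the merge step for one group a bit more concrete, here is a rough sketch in plain Java. It is not Kafka code: the source partitions are just in-memory lists, and the names PartitionMerge, Msg, Cursor and mergeByTimestamp are made up for illustration. It also assumes each source partition is already ordered by ts; if that does not hold for your data, each partition (or portion) would have to be sorted first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

final class PartitionMerge {

    /** Hypothetical message type; only ts is used for ordering. */
    static final class Msg {
        final long ts;
        final String value;
        Msg(long ts, String value) { this.ts = ts; this.value = value; }
    }

    /** Read position inside one source partition (an in-memory list here). */
    private static final class Cursor {
        final List<Msg> partition;
        int next = 0;
        Cursor(List<Msg> partition) { this.partition = partition; }
        Msg head() { return partition.get(next); }
        boolean exhausted() { return next >= partition.size(); }
    }

    /**
     * K-way merge: assumes every input list is already sorted by ts and
     * returns a single list sorted by ts (one destination partition).
     */
    static List<Msg> mergeByTimestamp(List<List<Msg>> partitions) {
        // The heap always exposes the cursor whose current message is oldest.
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>((a, b) -> Long.compare(a.head().ts, b.head().ts));
        for (List<Msg> p : partitions) {
            if (!p.isEmpty()) {
                heap.add(new Cursor(p));
            }
        }
        List<Msg> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();   // cursor with the smallest ts
            out.add(c.head());
            c.next++;
            if (!c.exhausted()) {
                heap.add(c);          // re-insert with its new head
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Msg> p0 = Arrays.asList(new Msg(1, "a"), new Msg(4, "d"));
        List<Msg> p1 = Arrays.asList(new Msg(2, "b"), new Msg(3, "c"), new Msg(5, "e"));
        for (Msg m : mergeByTimestamp(Arrays.asList(p0, p1))) {
            System.out.println(m.ts + " " + m.value);
        }
    }
}
```

With the Java client, each cursor would presumably be backed by a KafkaConsumer that is given its source partitions explicitly through assign(), and each merged message would be sent as a ProducerRecord that names the destination partition. The heap only ever holds one entry per source partition, so nothing needs to be buffered in a temporary datastore.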