Thanks, I'll check it out. I have a Samza application that consumes many different types of messages. The messages are related to each other but do not require a join: think of them as configuration and metric information for virtual machines that modifies some central state such as databases, time-series stores, etc. So far we have used a single Kafka topic, with partitions for parallelism.
Now there is one message type (metrics) for which I want to do larger batching, for cost reasons. So I was exploring ways to put those messages on a separate Kafka topic while still using the same Samza application, instead of creating a new one. Some state (caches, etc.) is shared across message types, so launching an independent application would be wasteful. If I could control checkpointing for each topic independently, this approach could work. Does this sound like a reasonable approach?

On Sat, Oct 28, 2017 at 8:41 PM, Jagadish Venkatraman <jagadish1...@gmail.com> wrote:

> In Samza, the logical unit of processing (and hence, checkpointing) is a
> task. Hence, you cannot selectively checkpoint SSPs within a task.
>
> However, you can configure how you group your SSPs into tasks by choosing
> a Grouper. If you want to control checkpointing at the granularity of an
> SSP, then you can choose the
> org.apache.samza.container.grouper.stream.GroupBySystemStreamPartitionFactory.
>
> Config reference:
> https://samza.apache.org/learn/documentation/0.10/jobs/configuration-table.html
>
> What are you trying to do? Maybe, there's a simpler way to achieve it?
>
> On Sat, Oct 28, 2017 at 4:09 AM, Gaurav Agarwal <gauravagarw...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> If I had Samza Tasks that were consuming message from multiple topics,
>> how would checkpoint/commit work in that case? On calling
>> taskCordinator.commit(), would current offset of all topics be saved for
>> the caller task (only the partitions assigned to the caller task)? Is
>> there a way to control this behavior more granularly where I can request
>> samza to commit the offset for only a given task/topic combination only?
>>
>> --
>> thanks,
>> gaurav
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University
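
For reference, the grouper setting suggested in the quoted reply might look roughly like this in the job's properties file. This is only a sketch: the system name `kafka` and the topic names `vm-config` and `vm-metrics` are hypothetical placeholders, not names from our actual job, and the commit interval is an arbitrary example value.

```properties
# Hypothetical input topics: one for configuration messages, one for metrics.
task.inputs=kafka.vm-config,kafka.vm-metrics

# Give every system-stream-partition (SSP) its own task, so that a
# checkpoint taken by a task covers only that one SSP.
job.systemstreampartition.grouper.factory=org.apache.samza.container.grouper.stream.GroupBySystemStreamPartitionFactory

# Periodic commit interval in milliseconds (example value only).
task.commit.ms=60000
```

With this grouping, a task that calls commit on its coordinator would checkpoint only the single partition it owns, which is the per-topic granularity asked about here. The cost is more task instances, one per partition per topic.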