Our current main purpose of samza is for data pipeline, so we don't want to create multiple tasks in the single SamzaContainer. As I read Samza implementation, it will create as many tasks as the number of partitions assigned in the container, right?
The problem of that approach is, each task will have a separate buffer, which is not necessary in our use case. So, I wrote the following SystemStreamPartitionGrouper public class SingleTaskSystemStreamPartition implements SystemStreamPartitionGrouper { @Override public Map<TaskName, Set<SystemStreamPartition>> group(Set<SystemStreamPartition> ssps) { return new ImmutableMap.Builder<TaskName, Set<SystemStreamPartition>>() .put(new TaskName(ssps.iterator().next().getStream()), ssps) .build(); } } The above worked with yarn.container.count=1 but if I increase that number, containers couldn't start because they couldn't get assigned SystemStreamPartition. Could you guide me how to write SystemStreamPartitionGrouper to create a single task per SamzaContainer? Thank you Best, Jae