Hi Aram,

I assume that, based on the message fields, you would want to route the output to Cassandra, Graphite, etc.
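
To make that assumption concrete, here is a minimal sketch of routing on a message field. It is plain Java with no Samza dependency so that it runs standalone. FieldRouter, Sink, RecordingSink and the "target" field name are all hypothetical names made up for illustration; none of them are Samza APIs. In a real job, something like route() would presumably be called from your StreamTask's process() method, and each Sink would wrap a real client or a configured output system.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: every name in this file is hypothetical, not part of Samza. */
class FieldRouter {

    /** Hypothetical abstraction over one downstream system (Cassandra, Graphite, ...). */
    public interface Sink {
        void write(Map<String, Object> message);
    }

    /** In-memory stand-in for a real client, so the sketch runs without external systems. */
    public static class RecordingSink implements Sink {
        public final List<Map<String, Object>> written = new ArrayList<>();

        @Override
        public void write(Map<String, Object> message) {
            written.add(message);
        }
    }

    private final Map<String, Sink> sinksByTarget = new HashMap<>();
    private final String routingField;

    /** routingField is the (assumed) JSON field that names the destination system. */
    public FieldRouter(String routingField) {
        this.routingField = routingField;
    }

    public void register(String target, Sink sink) {
        sinksByTarget.put(target, sink);
    }

    /** Writes the message to the sink named by its routing field; returns false if none matches. */
    public boolean route(Map<String, Object> message) {
        Sink sink = sinksByTarget.get(message.get(routingField));
        if (sink == null) {
            return false; // unknown or missing target: the caller decides whether to drop or log
        }
        sink.write(message);
        return true;
    }

    public static void main(String[] args) {
        FieldRouter router = new FieldRouter("target");
        RecordingSink cassandra = new RecordingSink();
        RecordingSink graphite = new RecordingSink();
        router.register("cassandra", cassandra);
        router.register("graphite", graphite);

        Map<String, Object> metric = new HashMap<>();
        metric.put("target", "graphite");
        metric.put("value", 42);
        boolean routed = router.route(metric);

        System.out.println("routed=" + routed
                + " graphite=" + graphite.written.size()
                + " cassandra=" + cassandra.written.size());
    }
}
```

Each message is still consumed once and dispatched to exactly one sink. Whether write() blocks or hands off to a background thread is left to each Sink implementation, which is where the ordering question comes in.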

A single Samza job is an implementation of the StreamTask/WindowableTask interface. Samza will create multiple instances of your implementation and assign them to containers.

Single Samza job vs. multiple Samza jobs: If you have multiple jobs, they can be stopped, started, managed, and maintained independently. It's still possible to do all of this in a single job, and you can scale out using multiple containers.

Sync vs. async producers: This depends entirely on how you implement your producer. Do you care about ordering? That is, within the same partition, do you want to preserve ordering in your writes to Cassandra/Graphite? Keep in mind that an instance of your producer is shared across all task instances in the container.

Having multiple producers for multiple systems seems cleaner to me, since the system characteristics are different.

Thanks,
Jagadish

On Mon, Nov 16, 2015 at 3:36 AM, Aram Mkrtchyan <aram.mkrtch...@picsart.com.invalid> wrote:

> Hi guys,
>
> We're processing JSON data from Kafka using Samza, and we'd like to have a
> single Samza Job that's able to process and produce the messages to
> different systems.
>
> For example, consume messages from kafka, and produce them to Cassandra,
> Graphite and other systems, so that the messages are being consumed once.
> We want this because the tasks themselves are very simple, and we don't
> want to have separate samza jobs for them.
>
> We'd like someone to compare possible approaches.
>
> 1. Having Multiple producer systems for one task.
> 2. Having Single producer which has registry of small message handlers,
> which process messages (synchronous/asynchronous)?
> 3. Having Multiple Jobs is the only valid way of doing it.
>
> Thanks.
>

--
Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University