Hi Aram,

I assume that, based on the message fields, you want to write each message
to Cassandra, Graphite, etc.

The processing logic of a single Samza job is an implementation of the
StreamTask interface (optionally also WindowableTask). Samza creates multiple
instances of your implementation (task instances) and assigns them to
containers.
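To make the instances/containers relationship concrete, here is a simplified model in Java. It assumes the common partition-based setup, where each input partition gets its own task instance and the task instances are then spread over the containers round-robin. Samza's actual groupers are configurable and may assign tasks differently, so treat this only as an illustration; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TaskAssignmentSketch {

    // Simplified model (an assumption, not Samza's grouper code): one task
    // instance per input partition, spread round-robin over the containers.
    static Map<Integer, List<String>> assign(int partitionCount, int containerCount) {
        Map<Integer, List<String>> byContainer = new TreeMap<>();
        for (int c = 0; c < containerCount; c++) {
            byContainer.put(c, new ArrayList<>());
        }
        for (int p = 0; p < partitionCount; p++) {
            // Each partition gets its own task instance, named after the partition.
            byContainer.get(p % containerCount).add("Partition " + p);
        }
        return byContainer;
    }

    public static void main(String[] args) {
        // 8 input partitions spread across 3 containers.
        System.out.println(assign(8, 3));
        // {0=[Partition 0, Partition 3, Partition 6], 1=[Partition 1, Partition 4, Partition 7], 2=[Partition 2, Partition 5]}
    }
}
```

Adding containers spreads the same task instances more thinly, which is why a single job can still scale out.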

Single Samza job vs. multiple Samza jobs: if you have multiple jobs, each one
can be stopped, started, managed and maintained independently. It's still
possible to do all of this processing in a single job, and you can scale it
out by running multiple containers.

Sync vs. async producers: this depends entirely on how you implement your
producer. Do you care about ordering, i.e., within the same partition, do
you want your writes to Cassandra/Graphite to preserve the input order? Keep
in mind that a single instance of your producer is shared across all task
instances in the container.
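If you do want ordering within a partition but still want asynchronous writes, one option is to route all messages from the same partition through the same single-threaded queue. The sketch below is a hypothetical, dependency-free model of that idea: OrderedAsyncWriter, send and flushAndClose are illustrative names, not Samza APIs, and write() stands in for the real Cassandra/Graphite client call. Because one producer instance is shared by all task instances in a container, it keys its queues ("lanes") by partition rather than by task.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class OrderedAsyncWriter {

    private final ExecutorService[] lanes;
    // Records what was "written", per partition, so ordering can be checked.
    final Map<Integer, List<String>> written = new ConcurrentHashMap<>();

    OrderedAsyncWriter(int laneCount) {
        lanes = new ExecutorService[laneCount];
        for (int i = 0; i < laneCount; i++) {
            lanes[i] = Executors.newSingleThreadExecutor();
        }
    }

    // Asynchronous send: returns immediately. All messages from the same
    // partition go to the same single-threaded lane, so they are written in
    // submission order; different lanes can proceed in parallel.
    void send(int partition, String message) {
        lanes[partition % lanes.length].execute(() -> write(partition, message));
    }

    // Hypothetical stand-in for the real Cassandra/Graphite client call.
    private void write(int partition, String message) {
        written.computeIfAbsent(partition, p -> Collections.synchronizedList(new ArrayList<>()))
               .add(message);
    }

    // Wait until every queued write has completed, then stop the lanes.
    void flushAndClose() {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
        try {
            for (ExecutorService lane : lanes) {
                lane.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        OrderedAsyncWriter writer = new OrderedAsyncWriter(2);
        for (int i = 0; i < 5; i++) {
            writer.send(0, "p0-m" + i);
            writer.send(1, "p1-m" + i);
            writer.send(2, "p2-m" + i);
        }
        writer.flushAndClose();
        System.out.println(writer.written);
    }
}
```

The trade-off is that a slow write blocks later writes from any partition on the same lane. If you don't care about ordering, a plain thread pool (or a fully synchronous producer) is simpler.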

Having a separate producer for each system seems cleaner to me, since the
systems have different characteristics.
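Here is a dependency-free sketch of that approach (option 1 in your list): one writer per destination system, and a task that consumes each message once and fans it out based on a message field. In a real Samza job each destination would instead be configured as its own system, and the task would send to it through the MessageCollector with an OutgoingMessageEnvelope addressed to a SystemStream such as new SystemStream("graphite", "metrics"). The SystemWriter interface, the system and stream names, and the "type" field used for routing are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiSystemRouting {

    // Hypothetical minimal producer interface. Each destination system gets its
    // own implementation, so batching, retries and sync/async behaviour can differ.
    interface SystemWriter {
        void send(String stream, Map<String, String> message);
    }

    // Test double that just records what it was asked to send.
    static class RecordingWriter implements SystemWriter {
        final List<String> sent = new ArrayList<>();

        @Override
        public void send(String stream, Map<String, String> message) {
            sent.add(stream + ":" + message.get("id"));
        }
    }

    private final Map<String, SystemWriter> writers = new LinkedHashMap<>();

    void register(String system, SystemWriter writer) {
        writers.put(system, writer);
    }

    // Consume once, fan out to several systems (assumed routing rule).
    void process(Map<String, String> message) {
        // Every message is stored in Cassandra.
        writers.get("cassandra").send("events", message);
        // Only metric messages are also sent to Graphite.
        if ("metric".equals(message.get("type"))) {
            writers.get("graphite").send("metrics", message);
        }
    }

    public static void main(String[] args) {
        RecordingWriter cassandra = new RecordingWriter();
        RecordingWriter graphite = new RecordingWriter();
        MultiSystemRouting router = new MultiSystemRouting();
        router.register("cassandra", cassandra);
        router.register("graphite", graphite);

        router.process(Map.of("id", "1", "type", "click"));
        router.process(Map.of("id", "2", "type", "metric"));

        System.out.println(cassandra.sent); // [events:1, events:2]
        System.out.println(graphite.sent);  // [metrics:2]
    }
}
```

Keeping the per-system details inside each writer means the task's routing logic stays trivial, and adding another system is one more registration rather than another job.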

Thanks,
Jagadish

On Mon, Nov 16, 2015 at 3:36 AM, Aram Mkrtchyan <
aram.mkrtch...@picsart.com.invalid> wrote:

> Hi guys,
>
> We're processing JSON data from Kafka using Samza, and we'd like to have a
> single Samza job that's able to process messages and produce them to
> different systems.
>
> For example, consume messages from Kafka and produce them to Cassandra,
> Graphite and other systems, so that each message is consumed only once.
> We want this because the tasks themselves are very simple, and we don't
> want to have separate Samza jobs for them.
>
> We'd like someone to compare the possible approaches:
>
>    1. Having multiple producer systems for one task.
>    2. Having a single producer with a registry of small message handlers
>    that process messages (synchronously or asynchronously).
>    3. Having multiple jobs (is that the only valid way of doing it?).
>
> Thanks.
>

-- 
Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University
