Hi Yan,

We use Samza in a production environment using ProcessJobFactory in Docker containers because it greatly simplifies our deployment process and makes much better use of resources.

Is there any plan to make the ThreadJobFactory or ProcessJobFactory multithreaded? I will look into doing that myself, but I think it might be useful to implement this for everyone. I am sure there are plenty of cases where people do not want to use YARN, but want more parallelism in their tasks.

Lukas

-----Original Message----- From: Yan Fang
Sent: Monday, September 14, 2015 11:08 AM
To: dev@samza.apache.org
Subject: Re: Runtime Execution Model

Hi Bruno,

AFAIK, there is no existing JobFactory that brings as many threads as the
partition number. But I think nothing stops you to implement this: you can
get the partition information from the JobCoordinator, and then bring as
many threads as the partition/task number.

Since the two local factories (ThreadJobFactory and ProcessJobFactory) are
mainly for development, there is no additional document. But most of the
code here
<https://github.com/apache/samza/tree/master/samza-core/src/main/scala/org/apache/samza/job/local>
is
self-explained.

Thanks,

Fang, Yan
yanfang...@gmail.com

On Sat, Sep 12, 2015 at 1:47 PM, Bruno Bonacci <bruno.bona...@gmail.com>
wrote:

Hi,
I'm looking for additional documentation on the different RUNTIME
EXECUTION MODELS of the different `job.factory.class`.

I'm particularly interested on how each factory (ThreadJobFactory,
ProcessJobFactory and YarnJobFactory) will create tasks consume and process
messages out of Kafka and the thread model used.

I did a few tests with the ThreadJob factory consuming out of a kafka
topic with 5 partitions and I was expecting that it would use multiple
threads to consume/process the different partitions, however it is
using only one thread at runtime.

Is there any way to tell Samza to use multiple processing threads (1 per
partition)??


Thanks
Bruno


Reply via email to