Hi Yi, Does a single task consume from a single partition or it consumes from more/all partitions?
Thanks Bruno > On 14 Sep 2015, at 23:22, Yi Pan <nickpa...@gmail.com> wrote: > > Hi, Bruno, > > The number of containers are configurable in YarnJobFactory via > yarn.container.count. > Each container is a single threaded model and you can run multiple tasks in > a single container. > At maximum, you can have as many containers as the number of tasks in this > config to achieve 1 task / thread. > > Hope that clarifies the config a bit more for you. > > Thanks! > > -Yi > > On Mon, Sep 14, 2015 at 3:16 PM, Bruno Bonacci <bruno.bona...@gmail.com> > wrote: > >> Thanks Yan for writing me back, >> >> That's ok for ThreadJobFactory and ProcessJobFactory but what about the >> YarnJobFactory? >> How many task/executors will be spawning? >> >> >> Bruno >> >>> On Mon, Sep 14, 2015 at 7:08 PM, Yan Fang <yanfang...@gmail.com> wrote: >>> >>> Hi Bruno, >>> >>> AFAIK, there is no existing JobFactory that brings as many threads as the >>> partition number. But I think nothing stops you to implement this: you >> can >>> get the partition information from the JobCoordinator, and then bring as >>> many threads as the partition/task number. >>> >>> Since the two local factories (ThreadJobFactory and ProcessJobFactory) >> are >>> mainly for development, there is no additional document. But most of the >>> code here >>> < >> https://github.com/apache/samza/tree/master/samza-core/src/main/scala/org/apache/samza/job/local >>> is >>> self-explained. >>> >>> Thanks, >>> >>> Fang, Yan >>> yanfang...@gmail.com >>> >>> On Sat, Sep 12, 2015 at 1:47 PM, Bruno Bonacci <bruno.bona...@gmail.com> >>> wrote: >>> >>>> Hi, >>>> I'm looking for additional documentation on the different RUNTIME >>>> EXECUTION MODELS of the different `job.factory.class`. >>>> >>>> I'm particularly interested on how each factory (ThreadJobFactory, >>>> ProcessJobFactory and YarnJobFactory) will create tasks consume and >>> process >>>> messages out of Kafka and the thread model used. >>>> >>>> I did a few tests with the ThreadJob factory consuming out of a kafka >>>> topic with 5 partitions and I was expecting that it would use multiple >>>> threads to consume/process the different partitions, however it is >>>> using only one thread at runtime. >>>> >>>> Is there any way to tell Samza to use multiple processing threads (1 >> per >>>> partition)?? >>>> >>>> >>>> Thanks >>>> Bruno >>