Kartik,

This is for the case when you don't use YARN. ThreadJob runs locally and currently spins up a single thread for all tasks.
Lukas

On 10/20/15, Kartik Paramasivam <kparamasi...@linkedin.com.invalid> wrote:
> We have been wanting to do something similar at LinkedIn. However, we
> haven't thought through the details.
>
> If container == thread, then we would need to change the AppMaster to
> request the appropriate number of YARN 'containers' (processes), i.e. we
> would have to decouple the process count from yarn.Containers.Count.
>
> Basically, wouldn't we have to come up with a new setting, Yarn.ProcessCount?
>
> On Mon, Oct 19, 2015 at 3:49 PM, Lukas Steiblys <lu...@doubledutch.me>
> wrote:
>
>> I have been thinking lately about the least invasive way to add
>> multithreading capabilities to ThreadJobFactory, as that is the main
>> method we use to run our jobs in production. Looking at the master branch
>> code in Git, I have found the following:
>> a. The best way would be to simply spin up a new thread for each
>>    container.
>> b. The number of containers can already be specified using the
>>    configuration property job.container.count.
>> c. I can construct a new SamzaContainer for each containerModel
>>    returned from coordinator.jobModel.getContainers in ThreadJobFactory.
>> d. I can pass a list of these containers into the ThreadJob constructor,
>>    modifying it to accept an array of Runnables.
>> e. For each runnable, the submit method of ThreadJob would create a new
>>    thread and start it.
>> This should start up a new thread for each container and group the tasks
>> using the appropriate TaskNameGrouper.
>>
>> Any ideas on what I might have missed? Are there any other potential
>> solutions? Would this be a good patch for Samza in general?
>>
>> Lukas
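For illustration, steps d and e of the proposal can be sketched roughly as below. This is a minimal Java sketch under stated assumptions, not Samza's code (ThreadJob itself is written in Scala): the class `MultiThreadJob`, its `submit`/`waitForFinish`/`kill` methods, and the `samza-container-N` thread names are hypothetical, and each `Runnable` merely stands in for a SamzaContainer built from one ContainerModel.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed multi-threaded ThreadJob (steps d and e):
// one Thread per container Runnable. Names are illustrative, not Samza's API.
final class MultiThreadJob {
    private final List<Runnable> containers; // one Runnable per container (step d)
    private final List<Thread> threads = new ArrayList<>();

    MultiThreadJob(List<Runnable> containers) {
        this.containers = new ArrayList<>(containers);
    }

    // Step e: create and start a new thread for each container Runnable.
    synchronized MultiThreadJob submit() {
        if (!threads.isEmpty()) {
            throw new IllegalStateException("job already submitted");
        }
        for (int i = 0; i < containers.size(); i++) {
            Thread t = new Thread(containers.get(i), "samza-container-" + i);
            threads.add(t);
            t.start();
        }
        return this;
    }

    // Block until every container thread has exited.
    void waitForFinish() throws InterruptedException {
        List<Thread> snapshot;
        synchronized (this) {
            snapshot = new ArrayList<>(threads);
        }
        for (Thread t : snapshot) {
            t.join();
        }
    }

    // Ask all container threads to stop; the containers must honor interruption.
    synchronized void kill() {
        for (Thread t : threads) {
            t.interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int containerCount = 3; // assumption: mirrors job.container.count
        List<Runnable> containers = new ArrayList<>();
        for (int i = 0; i < containerCount; i++) {
            final int id = i;
            // Placeholder for a SamzaContainer built from one ContainerModel (step c).
            containers.add(() -> System.out.println(
                "container " + id + " running on " + Thread.currentThread().getName()));
        }
        new MultiThreadJob(containers).submit().waitForFinish();
    }
}
```

Because each container keeps its own run loop on its own thread, task grouping by the TaskNameGrouper would be unaffected. As Kartik points out, though, under YARN the container count and the process count would then become separate settings.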