I have been thinking lately about the least invasive way to add multithreading capabilities to ThreadJobFactory, as that is the main way we run our jobs in production. Looking at the master branch code in Git, I have found the following:

a. The best way would be to simply spin up a new thread for each container.
b. The number of containers can already be specified using the configuration property job.container.count.
c. In ThreadJobFactory, I can construct a new SamzaContainer for each containerModel returned from coordinator.jobModel.getContainers.
d. I can pass a list of these containers into the ThreadJob constructor, modifying it to accept an array of Runnables.
e. In its submit method, ThreadJob would create and start a new thread for each Runnable.

This should start up a new thread for each container and group the tasks using the appropriate TaskNameGrouper.
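To make steps d and e more concrete, here is a rough sketch in Java of what I have in mind. MultiThreadJobSketch is a hypothetical stand-in and not the actual Samza ThreadJob (which is written in Scala and may differ in its interface), and the thread names and the waitForFinish helper are my own assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a ThreadJob that accepts several Runnables.
// This is NOT the real Samza class; names and methods are illustrative only.
class MultiThreadJobSketch {
    private final List<Runnable> containers;
    private final List<Thread> threads = new ArrayList<>();

    MultiThreadJobSketch(List<Runnable> containers) {
        // Copy the list so later changes by the caller do not affect the job.
        this.containers = new ArrayList<>(containers);
    }

    // Create and start one thread per container runnable.
    MultiThreadJobSketch submit() {
        for (int i = 0; i < containers.size(); i++) {
            Thread t = new Thread(containers.get(i), "container-" + i);
            threads.add(t);
            t.start();
        }
        return this;
    }

    // Block until every container thread has finished.
    void waitForFinish() {
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    // Number of threads started so far.
    int threadCount() {
        return threads.size();
    }
}
```

In ThreadJobFactory, the list passed in would hold one SamzaContainer per containerModel, so the number of threads would follow job.container.count.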
Any ideas on what I might have missed? Are there any other potential solutions? Would this be a good patch for Samza in general?

Lukas