1. a. I think startup is a public method on KafkaServer, so for people embedding Kafka in some way this helps guarantee correctness.
   b. I think KafkaScheduler tries to be a bit too clever; there is a patch out there that just moves to global synchronization for the whole class, which is easier to reason about. Technically startup is not called from multiple threads, but the class's correctness should not depend on the current usage, so it should work correctly if it were.
   c. I think in cases where you actually just want to start and run N threads, using Thread directly is sensible. ExecutorService is useful but has a ton of gadgets and gizmos that obscure the basic usage in that case.
   d. Yeah, we should probably wait until the processor threads start as well. I think it probably doesn't cause misbehavior as is, but it would be better if the postcondition of startup was that all threads had started.
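To make points b through d concrete, here is a minimal Java sketch of the pattern described: a public startup() guarded by one lock for the whole method, N plain Threads, and a CountDownLatch that startup() awaits so that "all threads have started" is its postcondition. The WorkerGroup class and its methods are hypothetical illustrations, not Kafka's actual code, and the worker body is a placeholder for a real processing loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

/**
 * Hypothetical WorkerGroup (not Kafka code): startup() may be called from any
 * thread, any number of times, and returns only once every worker is running.
 */
class WorkerGroup {
    private final int numWorkers;
    private final CountDownLatch startupLatch;                          // counted down once per worker
    private final CountDownLatch shutdownLatch = new CountDownLatch(1); // released by shutdown()
    private final List<Thread> threads = new ArrayList<>();             // guarded by 'this'
    private boolean started = false;                                    // guarded by 'this', so no volatile needed

    WorkerGroup(int numWorkers) {
        this.numWorkers = numWorkers;
        this.startupLatch = new CountDownLatch(numWorkers);
    }

    /** One lock for the whole method: a repeated or concurrent call is a no-op. */
    synchronized void startup() {
        if (started) {
            return;
        }
        started = true;
        for (int i = 0; i < numWorkers; i++) {
            // Plain Thread: we only want N long-running loops, no pool or task queue.
            Thread t = new Thread(this::runWorker, "worker-" + i);
            t.setDaemon(true);
            threads.add(t);
            t.start();
        }
        try {
            // Postcondition: every worker thread has actually begun running.
            startupLatch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers to start", e);
        }
    }

    private void runWorker() {
        startupLatch.countDown(); // signal "this worker is up"
        try {
            shutdownLatch.await(); // placeholder for the real processing loop
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    synchronized void shutdown() {
        shutdownLatch.countDown(); // let every worker leave its loop
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    synchronized long aliveCount() {
        return threads.stream().filter(Thread::isAlive).count();
    }

    synchronized int threadCount() {
        return threads.size();
    }
}
```

Because every mutable field is only touched while holding the object's lock, the sketch needs no volatile. An ExecutorService could replace the loop of `new Thread(...)` calls, but for "start N loops and wait for them" it mostly adds a task queue and lifecycle API this case doesn't use.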
2. a. There are different ways to do this. My overwhelming experience has been that any attempt to share a selector across threads is very painful. Making the selector loops single-threaded really simplifies everything, and the performance also tends to be a lot better because there is far less locking inside the selector loop.
   b. Yeah, I share your skepticism of that call. I'm not sure why it is there or whether it is needed. I agree that wakeup should only be needed from other threads. It would be good to untangle that mystery; I wonder what happens if it is removed.

-Jay

On Wed, Jan 21, 2015 at 1:58 PM, Chittaranjan Hota <chitts.h...@gmail.com> wrote:

> Hello,
> Congratulations to the folks behind Kafka. It has been a smooth ride
> dealing with multi-TB data when the same setup in JMS fell apart often.
>
> Although I have been using Kafka for more than a few days now, I started
> looking into the code base only yesterday and already have doubts at the
> very beginning. I would appreciate some input on why the implementation
> is done the way it is.
>
> Version: 0.8.1
>
> THREADING RELATED
> 1. Why is the startup code synchronized? Who are the competing threads?
>    a. The startReporters func is synchronized.
>    b. KafkaScheduler startup is synchronized? There is also a volatile
>       variable declared when the whole synchronized block is itself
>       guaranteeing "happens before".
>    c. Use of native new Thread syntax instead of relying on ExecutorService.
>    d. The processor thread uses a CountDownLatch, but the main thread
>       doesn't await the processors' signal that startup is complete.
>
> NIO RELATED
> 2. a. Acceptor and each Processor thread have their own selector (since
>       they extend the abstract class AbstractServerThread). Ideally a
>       single selector suffices for multiplexing. Is there any reason why
>       multiple selectors are used?
>    b. Selector wakeup calls by Processors in the read method (line 362 of
>       SocketServer.scala) are MISSED calls since there is no thread waiting
>       on the select at that point.
>
> Looking forward to learning the code further!
> Thanks in advance.
>
> Regards,
> Chitta
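On question 2b, one detail of the JDK contract may help: the Javadoc for `java.nio.channels.Selector.wakeup()` says that if no selection operation is in progress, the next `select()` or `select(long)` returns immediately, unless `selectNow()` is invoked in between, which clears the pending wakeup. So a wakeup issued with no thread blocked is remembered rather than lost. The hypothetical `WakeupSemantics` class below is a minimal sketch that checks this on an empty selector; it says nothing about whether the call in SocketServer.scala is actually needed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Selector;

/** Hypothetical demo of the documented Selector.wakeup() semantics (not Kafka code). */
class WakeupSemantics {

    /**
     * Calls wakeup() while no thread is inside select(), optionally calls selectNow(),
     * then times a select(timeoutMs) on a selector with no registered channels.
     * Only a pending wakeup or the timeout can end that select().
     */
    static long selectMillisAfterEarlyWakeup(boolean clearWithSelectNow, long timeoutMs) {
        try (Selector selector = Selector.open()) {
            selector.wakeup();            // nobody is blocked in select() at this point
            if (clearWithSelectNow) {
                selector.selectNow();     // per the Javadoc, clears the pending wakeup
            }
            long start = System.nanoTime();
            selector.select(timeoutMs);   // returns at once if the wakeup is still pending
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With `clearWithSelectNow` false the timed `select` should return almost immediately; with it true the call should wait out the timeout instead.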