Hey All,

Another topic worth discussing is how to lay out the code for the new producer and consumer, as well as the common code they will share.

There are really three questions:
1. Which top-level sub-modules/directories should we have (currently everything is under core, but presumably we want to split that up)?
2. What jars should we produce and what will be the dependencies between these?
3. What should the package layout be within the modules?

Let's use this thread to discuss that and make a decision.

(2) is arguably the most relevant to the end-user, who presumably doesn't care how we lay out our code, so let's start with that.

One constraint we have is that there is some code that must be shared between client and server (or else we will just have tons of duplication). This is true for utilities, data formats, etc. I think the server will likely end up embedding the consumer for replication and likely the producer for other uses such as offsets, so the server will necessarily depend on the client. The client should not depend on the server or we would have a circular dependency. Thus common code can't be kept with the server.

I think there are several possibilities:
a. 2 jar solution: Have a kafka-client.jar which contains the producer, consumer, and any future admin client we might add. Have a kafka-server.jar which contains the server and depends on the client jar.
b. 3 jar solution: Have a kafka-common.jar which contains common code, plus a kafka-client.jar and a kafka-server.jar. Client would depend on common, and server would depend on common and client. One knock on this approach is that the common jar isn't really very useful on its own, and it is perhaps kind of irritating to have to have two jars for clients.
c. Multi jar solution: Have a kafka-common.jar, plus one for each client (kafka-producer.jar, kafka-consumer.jar, kafka-admin.jar).

I would vote for (a) in the absence of any other input because it is the simplest for the user, who just needs a single client jar.

For (1) we could either have the code modules mimic the resulting jars or not. I think there might be some value in separating the producer, consumer, and common code to avoid crazy internal dependencies between packages. This could be enforced either by having separate modules which compile separately and then package into one jar, or else by just keeping everything together and using checkstyle to enforce this.
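
To make the dependency constraint concrete, here is a rough sketch in Java. It is not part of any build and is not how checkstyle works; the class name JarLayoutCheck and its methods are made up for illustration, and only the jar names come from options (a) and (b) above. It treats each jar as a node, each "depends on" as an edge, and checks that no jar ends up depending on itself.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: jars are nodes, "depends on" relations are edges. */
public class JarLayoutCheck {

    /** Returns true if 'from' depends on 'to', directly or transitively. */
    static boolean dependsOn(Map<String, List<String>> deps, String from, String to) {
        return reaches(deps, from, to, new HashSet<>());
    }

    // Depth-first walk over dependency edges; 'seen' prevents infinite loops.
    private static boolean reaches(Map<String, List<String>> deps, String node,
                                   String target, Set<String> seen) {
        if (!seen.add(node)) {
            return false; // already explored from this node
        }
        for (String next : deps.getOrDefault(node, List.of())) {
            if (next.equals(target) || reaches(deps, next, target, seen)) {
                return true;
            }
        }
        return false;
    }

    /** A layout is acceptable only if no jar transitively depends on itself. */
    static boolean isAcyclic(Map<String, List<String>> deps) {
        for (String jar : deps.keySet()) {
            if (dependsOn(deps, jar, jar)) {
                return false;
            }
        }
        return true;
    }

    /** Option (a): the server depends on the client, the client on nothing. */
    static Map<String, List<String>> twoJarLayout() {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("kafka-client", List.of());
        deps.put("kafka-server", List.of("kafka-client"));
        return deps;
    }

    /** Option (b): client depends on common; server on common and client. */
    static Map<String, List<String>> threeJarLayout() {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("kafka-common", List.of());
        deps.put("kafka-client", List.of("kafka-common"));
        deps.put("kafka-server", List.of("kafka-common", "kafka-client"));
        return deps;
    }

    public static void main(String[] args) {
        System.out.println("2 jar layout acyclic: " + isAcyclic(twoJarLayout()));
        System.out.println("3 jar layout acyclic: " + isAcyclic(threeJarLayout()));
        System.out.println("client depends on server: "
                + dependsOn(twoJarLayout(), "kafka-client", "kafka-server"));
    }
}
```

If common code lived in the server jar, the client would need an edge to kafka-server while the server keeps its edge to kafka-client, and isAcyclic would reject the layout. That is the circular dependency both options are meant to avoid.
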
Currently there is only one module. The only weird thing about having it together is that the common utilities are under the "clients" module, which is a little unintuitive. I'm not sure if having modules not match the resulting jars will cause build headaches.

Okay, finally, let's discuss the layout of packages in the existing code. The most important aspect of this is that we separate public from internal classes. This will make it easier to produce clean javadocs, and will help us keep these public APIs clean and fully documented.

Currently the public packages are
  kafka.common
  kafka.common.errors
  kafka.clients.producer

Other alternatives would be to attempt to annotate public classes, or to include some naming scheme like kafka.common.api and kafka.clients.producer.api that would make it more clear which packages are public.

Guozhang had several other comments on packages in KAFKA-1227. Let's use this thread to discuss these or any other suggestions and make a decision on how to do this.

-Jay