Hey Gianmarco,

To your broader point, I agree that having a close alignment with Kafka would be a great thing in terms of adoption/discoverability/etc. The areas where I think this matters a lot are:

1. Website and docs: ideally when reading about Kafka you should be able to find out about Samza.
2. API style and naming: ideally the various interfaces should feel similar and use similar concepts and names. This is a bunch of little things (referring to topics and partitions in the same way, sharing metrics, sharing partitioning strategies, etc).
3. Release alignment--i.e. this set of versions all work together.
4. Branding--I actually think if we go down that route it would be worth considering just calling Samza something like "Kafka Streams" or "Kafka Streaming", which I think would help a lot of people understand what it is and, since Kafka is heavily adopted, would help with adoption. It always seems silly to bother with naming, but I actually think this ends up mattering a ton in how people understand the system (I guess as programmers we kind of all intuitively understand the importance of good naming).

WRT partition mapping, yeah I totally agree. I think in all proposals this is left pluggable. And I think ideally the same set of assignment strategies should be available either in the Kafka consumer or in Samza. I think at this point the only debate is whether this is controlled client side or server side.

-Jay

On Fri, Jul 3, 2015 at 1:40 AM, Gianmarco De Francisci Morales <g...@apache.org> wrote:

> Hi Jay,
>
> Thanks for your answer.
>
> > However a few things have changed since that original design:
> > 1. We now have the additional use cases of copycat and Samza
> > 2. We now realize that the assignment strategies don't actually
> > necessarily ensure each partition is assigned to only one
> > consumer--there are really valid use cases for broadcast or multiple
> > replica assignment schemes--so we can't actually make a hard
> > assertion on the server.
> >
> > So it may make sense to revisit this, I don't think it is necessarily a
> > massive change and would give more flexibility for the variety of cases.
> >
> > -Jay
>
> I totally agree, the 1-partition-1-task mapping is too restrictive.
> However, I think the fundamental operation that Samza, Copycat, and Kafka
> consumers should agree upon is "how can I specify in a simple and
> transparent way which partitions I want to consume, and how?".
> This means providing a mapping from partitions to consumer tasks, possibly
> in a transparent way so as to allow for optimizations in placement,
> co-partitioning, etc...
> This issue has the potential of generating again a lot of duplicate work,
> and I think it should be solved at the Kafka level.
> Given that Copycat and normal consumers are already inside Kafka, I think
> having Samza there as well would simplify things a lot.
> The result is that Kafka would be a complete package for handling streams:
> - Messaging, partitioning, and fault tolerance (Kafka core)
> - Ingestion (Copycat)
> - Lightweight processing (Samza)
> - Coupling with other systems (Kafka consumers)
>
> Cheers,
>
> --
> Gianmarco
>