Hey Gianmarco,

To your broader point, I agree that having a close alignment with Kafka
would be a great thing in terms of adoption/discoverability/etc. The
areas where I think this matters a lot are:
1. Website and docs: ideally when reading about Kafka you should be able to
find out about Samza.
2. API style and naming: ideally the various interfaces should feel similar
and use similar concepts and names. This is a bunch of little things
(calling topics and partitions in the same way, sharing metrics, sharing
partitioning strategies, etc).
3. Release alignment--i.e. this set of versions all work together.
4. Branding--I actually think if we go down that route it would be worth
considering just calling Samza something like "Kafka Streams" or "Kafka
Streaming" which I think would help a lot of people understand what it is
and, since Kafka is heavily adopted, would help with adoption. It always
seems silly to bother with naming, but I actually think this ends up
mattering a ton in how people understand the system (I guess as programmers
we kind of all intuitively understand the importance of good naming).

WRT partition mapping, yeah I totally agree. I think in all proposals this
is left pluggable. And I think ideally the same set of assignment
strategies should be available whether you use the Kafka consumer or Samza. I
think at this point the only debate is whether this is controlled client
side or server side.
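
As a rough illustration of what "pluggable" could mean here, the sketch below
treats an assignment strategy as a function from partitions and consumers to a
mapping. It is only a sketch: the names (round_robin, broadcast, STRATEGIES,
assign) are hypothetical and are not Kafka or Samza APIs, and it says nothing
about whether the choice is made client side or server side.

```python
from typing import Callable, Dict, List

# Hypothetical types for illustration only; not Kafka or Samza APIs.
Assignment = Dict[str, List[int]]
Strategy = Callable[[List[int], List[str]], Assignment]


def round_robin(partitions: List[int], consumers: List[str]) -> Assignment:
    """Give each partition to exactly one consumer, in rotation."""
    assignment: Assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


def broadcast(partitions: List[int], consumers: List[str]) -> Assignment:
    """Give every partition to every consumer."""
    return {c: sorted(partitions) for c in consumers}


# The same registry could back either a plain consumer or a processing job.
STRATEGIES: Dict[str, Strategy] = {
    "round-robin": round_robin,
    "broadcast": broadcast,
}


def assign(strategy_name: str, partitions: List[int],
           consumers: List[str]) -> Assignment:
    """Look up a strategy by name and apply it."""
    return STRATEGIES[strategy_name](partitions, consumers)
```

With a broadcast strategy a partition ends up with more than one consumer,
which is why a "one consumer per partition" check cannot be hard-coded.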

-Jay

On Fri, Jul 3, 2015 at 1:40 AM, Gianmarco De Francisci Morales <
g...@apache.org> wrote:

> Hi Jay,
>
> Thanks for your answer.
>
>
> > However a few things have changed since that original design:
> > 1. We now have the additional use cases of copycat and Samza
> > 2. We now realize that the assignment strategies don't actually
> > necessarily ensure each partition is assigned to only one consumer--there
> > are really valid use cases for broadcast or multiple replica assignment
> > schemes--so we can't actually make a hard assertion on the server.
> >
> > So it may make sense to revisit this, I don't think it is necessarily a
> > massive change and would give more flexibility for the variety of cases.
> >
> > -Jay
>
>
> I totally agree, the 1-partition-1-task mapping is too restrictive.
> However, I think the fundamental operation that Samza, Copycat, and Kafka
> consumers should agree upon is "how can I specify in a simple and
> transparent way which partitions I want to consume, and how?".
> This means providing a mapping from partitions to consumer tasks, possibly
> in a transparent way so as to allow for optimizations in placement,
> co-partitioning, etc...
> This issue has the potential to generate a lot of duplicate work again,
> and I think it should be solved at the Kafka level.
> Given that Copycat and normal consumers are already inside Kafka, I think
> having Samza there as well would simplify things a lot.
> The result is that Kafka would be a complete package for handling streams:
> - Messaging, partitioning, and fault tolerance (Kafka core)
> - Ingestion (Copycat)
> - Lightweight processing (Samza)
> - Coupling with other systems (Kafka consumers)
>
> Cheers,
>
> --
> Gianmarco
>
