Re: [DISCUSS] KIP-372: Naming Joins and Grouping

Matthias J. Sax Thu, 13 Sep 2018 08:33:52 -0700

Three more comments:

(1) For `Grouped` should we add `with(String name, Serde<K> key,
Serde<V> value)` to allow specifying all parameters at once?


Produced/Consumed/Serialized etc. follow a similar pattern: there is one
static method for each config parameter, plus one static method that
accepts all parameters. It might be more consistent to follow this pattern here.
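To make the pattern concrete, here is a minimal sketch of a config object with one static factory per parameter plus one all-at-once factory. `GroupedSketch`, its nested `Serde` interface, and the `withX` instance methods are hypothetical stand-ins (so the example compiles without Kafka on the classpath), not the actual Kafka Streams classes.

```java
// Hypothetical sketch of the config-object pattern: one static factory per
// parameter, plus one static factory that accepts all parameters at once.
// GroupedSketch and its nested Serde are illustrative stand-ins only.
public final class GroupedSketch<K, V> {

    // Stand-in for a key/value serde so the sketch has no dependencies.
    public interface Serde<T> {}

    private final String name;
    private final Serde<K> keySerde;
    private final Serde<V> valueSerde;

    private GroupedSketch(String name, Serde<K> keySerde, Serde<V> valueSerde) {
        this.name = name;
        this.keySerde = keySerde;
        this.valueSerde = valueSerde;
    }

    // One static factory per config parameter; unset parameters stay null.
    public static <K, V> GroupedSketch<K, V> as(String name) {
        return new GroupedSketch<>(name, null, null);
    }

    public static <K, V> GroupedSketch<K, V> keySerde(Serde<K> keySerde) {
        return new GroupedSketch<>(null, keySerde, null);
    }

    public static <K, V> GroupedSketch<K, V> valueSerde(Serde<V> valueSerde) {
        return new GroupedSketch<>(null, null, valueSerde);
    }

    // The proposed all-at-once factory: with(String name, Serde<K>, Serde<V>).
    public static <K, V> GroupedSketch<K, V> with(String name, Serde<K> keySerde, Serde<V> valueSerde) {
        return new GroupedSketch<>(name, keySerde, valueSerde);
    }

    // Instance methods return a new object with one parameter replaced,
    // so the single-parameter factories can still be combined by chaining.
    public GroupedSketch<K, V> withName(String name) {
        return new GroupedSketch<>(name, keySerde, valueSerde);
    }

    public GroupedSketch<K, V> withKeySerde(Serde<K> keySerde) {
        return new GroupedSketch<>(name, keySerde, valueSerde);
    }

    public GroupedSketch<K, V> withValueSerde(Serde<V> valueSerde) {
        return new GroupedSketch<>(name, keySerde, valueSerde);
    }

    public String getName() { return name; }
    public Serde<K> getKeySerde() { return keySerde; }
    public Serde<V> getValueSerde() { return valueSerde; }
}
```

With the all-parameter factory, `GroupedSketch.with("counts", keySerde, valueSerde)` is equivalent to `GroupedSketch.<String, Long>as("counts").withKeySerde(keySerde).withValueSerde(valueSerde)`, just shorter.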

(2) It seems that `Serialized` is only used in `groupBy` and
`groupByKey`. Because both methods accepting a `Serialized` parameter
are deprecated and replaced with methods accepting `Grouped`, should we
also deprecate `Serialized`?
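A rough sketch of why deprecating `Serialized` is safe once `Grouped` carries a superset of its parameters: the old overload can simply convert its argument and forward to the new one. All classes below (`DeprecationSketch`, `Serialized`, `Grouped`, `Stream`) are hypothetical stand-ins with serdes reduced to strings; they only illustrate the delegation idea.

```java
// Hypothetical sketch: once every method accepting Serialized has a
// replacement accepting Grouped, the old overloads (and Serialized itself)
// can be deprecated and forwarded to the new ones.
public final class DeprecationSketch {

    /** Old config object: serdes only, no name. */
    @Deprecated
    public static final class Serialized {
        final String keySerde;
        final String valueSerde;

        public Serialized(String keySerde, String valueSerde) {
            this.keySerde = keySerde;
            this.valueSerde = valueSerde;
        }
    }

    /** New config object: serdes plus an optional name (a superset of Serialized). */
    public static final class Grouped {
        final String name; // null means "no user-provided name"
        final String keySerde;
        final String valueSerde;

        public Grouped(String name, String keySerde, String valueSerde) {
            this.name = name;
            this.keySerde = keySerde;
            this.valueSerde = valueSerde;
        }
    }

    public static final class Stream {
        // New overload: the single place where the grouping logic lives.
        public String groupByKey(Grouped grouped) {
            return "groupByKey(name=" + grouped.name
                + ", key=" + grouped.keySerde
                + ", value=" + grouped.valueSerde + ")";
        }

        // Old overload: kept for compatibility; converts Serialized into an
        // unnamed Grouped and delegates, so behavior is unchanged.
        @Deprecated
        public String groupByKey(Serialized serialized) {
            return groupByKey(new Grouped(null, serialized.keySerde, serialized.valueSerde));
        }
    }
}
```

Since the deprecated path produces exactly what an unnamed `Grouped` would, nothing outside the deprecated methods would still need `Serialized`.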

(3) About naming repartition topics: thinking about this once more, I
actually prefer to use `left|right` instead of `this|other` :)
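For comparison, a small sketch of the two candidate naming patterns discussed in this thread, `[app-id]-this|other-[join-name]-repartition` and `[app-id]-[join-name]-left|right-repartition`. The class `RepartitionTopicNames` and its methods are hypothetical helpers for illustration, not Kafka Streams internals; they assume the left side corresponds to `this` stream and the right side to the `other` stream.

```java
// Hypothetical helper comparing the two proposed repartition-topic patterns.
public final class RepartitionTopicNames {

    // Left = the stream join() is called on ("this"); right = the "other" stream.
    public enum Side { LEFT, RIGHT }

    // [app-id]-this|other-[join-name]-repartition
    public static String thisOtherPattern(String appId, String joinName, Side side) {
        String marker = side == Side.LEFT ? "this" : "other";
        return appId + "-" + marker + "-" + joinName + "-repartition";
    }

    // [app-id]-[join-name]-left|right-repartition
    public static String leftRightPattern(String appId, String joinName, Side side) {
        String marker = side == Side.LEFT ? "left" : "right";
        return appId + "-" + joinName + "-" + marker + "-repartition";
    }

    public static void main(String[] args) {
        // prints "my-app-this-orders-join-repartition"
        System.out.println(thisOtherPattern("my-app", "orders-join", Side.LEFT));
        // prints "my-app-orders-join-left-repartition"
        System.out.println(leftRightPattern("my-app", "orders-join", Side.LEFT));
        // prints "my-app-orders-join-right-repartition"
        System.out.println(leftRightPattern("my-app", "orders-join", Side.RIGHT));
    }
}
```

One side effect of putting the side marker after the join name: both repartition topics of the same join share the prefix `[app-id]-[join-name]-`, so they sort next to each other in a topic listing.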


-Matthias


On 9/13/18 6:45 AM, Matthias J. Sax wrote:
> I don't know what Samza does; however, Flink requires users to specify
> names, similar to this proposal, to be able to re-identify state in case
> the topology is altered between deployments.
> 
> Flink only has state to worry about. For Kafka Streams, it's
> state plus repartition topics.
> 
> 
> -Matthias
> 
> On 9/13/18 1:48 AM, Eno Thereska wrote:
>> Hi folks,
>>
>> I know we don't normally have a "Related work" section in KIPs, but
>> sometimes I find it useful to see what others have done in similar cases.
>> Since this will be important for rolling re-deployments, I wonder what
>> other frameworks like Flink (or Samza) have done in these cases. Perhaps
>> they have done nothing, in which case it's fine to do this from first
>> principles, but IMO it would be good to know just to make sure we're
>> heading in the right direction.
>>
>> Also, I don't have a good feel for how much work this will be for an end
>> user who is doing the rolling deployment; perhaps an end-to-end example
>> would help.
>>
>> Thanks
>> Eno
>>
>> On Thu, Sep 13, 2018 at 6:22 AM, Matthias J. Sax <matth...@confluent.io>
>> wrote:
>>
>>> Follow up comments:
>>>
>>> 1) We should either use `[app-id]-this|other-[join-name]-repartition` or
>>> `[app-id]-[join-name]-left|right-repartition`, but we should not change
>>> the pattern depending on whether the user specifies a name or not. I am
>>> fine with both patterns; I just want to make sure we stick with one.
>>>
>>> 2) I don't see why we would need to do this in this KIP. KIP-307 seems
>>> to be orthogonal, and thus KIP-372 should not change any processor
>>> names; instead, KIP-307 should define a holistic strategy for all
>>> processors. Otherwise, we might end up with different strategies, or have
>>> to revert what we decide in this KIP if it's not compatible with KIP-307.
>>>
>>>
>>> -Matthias
>>>
>>>
>>> On 9/12/18 6:28 PM, Guozhang Wang wrote:
>>>> Hello Bill,
>>>>
>>>> I made a pass over your proposal and here are some questions:
>>>>
>>>> 1. For Joined names, the current proposal is to define the repartition
>>>> topic names as
>>>>
>>>> * [app-id]-this-[join-name]-repartition
>>>>
>>>> * [app-id]-other-[join-name]-repartition
>>>>
>>>>
>>>> And if [join-name] is not specified, the names stay the same, which is:
>>>>
>>>> * [previous-processor-name]-repartition for both Stream-Stream (S-S) join
>>>> and S-T join
>>>>
>>>> I think it is more natural to rename it to
>>>>
>>>> * [app-id]-[join-name]-left-repartition
>>>>
>>>> * [app-id]-[join-name]-right-repartition
>>>>
>>>>
>>>> 2. I'd suggest also using the name to define the corresponding processor
>>>> names, in addition to the repartition topic names. Note that for joins,
>>>> this may overlap with KIP-307
>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-307%3A+Allow+to+define+custom+processor+names+with+KStreams+DSL>,
>>>> as it also has proposals for defining processor names for join operators.
>>>>
>>>> 3. Could you also specify how this would affect the optimization for
>>>> merging multiple repartition topics?
>>>>
>>>> 4. In the "Compatibility, Deprecation, and Migration Plan" section, could
>>>> you also cover the following scenarios, noting whether any of the upgrade
>>>> paths would change:
>>>>
>>>>  a) changing user DSL code: under which scenarios can users now do a
>>>> rolling bounce instead of resetting the application?
>>>>
>>>>  b) upgrading from an older version to a newer version, with all the
>>>> names specified, and with optimization turned on. E.g., say we have code
>>>> written in 2.1 with all names specified, and we upgrade to 2.2 with new
>>>> optimizations that may potentially change the repartition topics. Is that
>>>> always safe to do?
>>>>
>>>>
>>>>
>>>> Guozhang
>>>>
>>>>
>>>> On Wed, Sep 12, 2018 at 4:52 PM, Bill Bejeck <bbej...@gmail.com> wrote:
>>>>
>>>>> All, I'd like to start a discussion on KIP-372 for the naming of joins
>>>>> and grouping operations in Kafka Streams.
>>>>>
>>>>> The KIP page can be found here:
>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-372%3A+Naming+Joins+and+Grouping
>>>>>
>>>>> I look forward to feedback and comments.
>>>>>
>>>>> Thanks,
>>>>> Bill
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>
