Sounds good to me to do it in a different KIP (if at all---not convinced
yet we need this restriction).


-Matthias



On 5/11/18 11:36 AM, Guozhang Wang wrote:
> Matthias' point is valid, that today we are not effectively preventing
> users to programmatically access any materialized state store and mutate it
> on the fly anyways. This is a caveat that we should improve since as Andy
> asked, the implications of mutating a materialized store is "undefined" as
> of today. But thinking about it a bit more I think we do not need to fix in
> this KIP, but rather in a different PR, to consider the following:
> 
> 1) ProcessorContext#getStateStore(): should we return the internal store,
> even if user provided the right internal store name?
> 
> 2) For stores used in materialization, should we allow users to write to
> them?
> 
> 
> Guozhang
> 
> 
> On Fri, May 11, 2018 at 11:17 AM, Andy Coates <a...@confluent.io> wrote:
> 
>> I'm no expert, so happy to go with what ever is decided.
>>
>> Implementing it so that a call to getStateStore(queryableStoreName) throws
>> an exception is trivial enough to do, though I can see your point that this
>> might be better done in a more consistent / single place, rather than
>> peppered through the code.  I'm not sure there are any other places in the
>> API that take both a Materialized instance and a set of state store names,
>> so maybe this is a special/first case?
>>
>> I guess my main question would be what are the implications if a user did
>> mutate the underlying statestate within the transformer itself? Or to put
>> it another way, would allowing access to the state store result it weird
>> and wonderful bugs, and become a common source of bugs and another 'thing'
>> developers need to know not to do, or would it 'just work'?  If its the
>> former, then I think that's a strong case to have the code explicitly stop
>> users doing dumb stuff.
>>
>> On 11 May 2018 at 11:00, Matthias J. Sax <matth...@confluent.io> wrote:
>>
>>> I don't think it worth to complicate things and restrict the access to
>>> the store that is created by the library via `Materialized` parameter.
>>> The reason is, that we don't prevent this from happening anyway atm --
>>> users can always create an arbitrary processor and connect a KTable
>>> state with the processor and mess around with the KTable state. Of
>>> course, they should not do this, but it's possible after all and I think
>>> it's a user error if they do.
>>>
>>> Thus, if we really want to restrict the access to internal
>>> created/manages stores, we should do it consistently -- but it's unclear
>>> to me how to do this: note that Processor API and DSL should be
>>> separated (they are still not as clearly as they should, but we put some
>>> effort into this the last year). Also, accessing the `context` is a PAPI
>>> level operation. If we want to restrict the access to stores there, we
>>> need to introduce two "categories" of stores -- PAPI stores and DSL
>>> stores. This would be a step backwards IMHO and mangle DSL and PAPI
>>> together in an undesired way.
>>>
>>> Instead, we should clearly communicate to the users, that if they use
>>> `Materialized` they should never write to this store as it's managed by
>>> the library (reading is ok).
>>>
>>> WDYT?
>>>
>>> -Matthias
>>>
>>> On 5/11/18 10:50 AM, Guozhang Wang wrote:
>>>> Yes I was indeed thinking about now allowing `getStateStore()` to with
>>> the name
>>>> specified in Materialized. I understand it is a bit too restrictive,
>> but
>>> I
>>>> cannot think of a elegant way to work around the following:
>>>>
>>>> 1) to programmatically enforce that the restore store is read-only.
>>>>
>>>> 2) With Materialized, the store name may not be specified by the user
>> and
>>>> hence it will be created internally; what would happen if users call
>>>> `getStateStore()` with the correct internal store name? If not the
>>>> semantics is a bit complex, if yes we are breaking the protocol to not
>>>> expose  internal store.
>>>>
>>>>
>>>> Guozhang
>>>>
>>>> On Fri, May 11, 2018 at 10:42 AM, Andy Coates <a...@confluent.io>
>> wrote:
>>>>
>>>>> Just a thought - but on the subject of disallowing access to the
>>>>> materialized state store from within the transformer's init method...
>>> might
>>>>> this not be overly restrictive? Could there be valid uses where
>>> read-only
>>>>> access would be useful / valid.
>>>>>
>>>>> On 11 May 2018 at 10:35, Andy Coates <a...@confluent.io> wrote:
>>>>>
>>>>>> OK, KIP updated:
>>>>>>  - added overloads taking `Materialized`
>>>>>>  - dropped overloads taking `ValueTransformerSupplier` in favour of
>> the
>>>>>> `withKey` variants.
>>>>>>  - added more info around the limitations of the ProcessorContext
>>> passed
>>>>>> in to the transformer's init calls, i.e. no forward calls allowed or
>>>>> calls
>>>>>> to getStateStore where the store name matches the materialized result
>>> of
>>>>>> the call.
>>>>>>
>>>>>> I'll sort out the PR next.
>>>>>>
>>>>>> On 11 May 2018 at 10:26, Damian Guy <damian....@gmail.com> wrote:
>>>>>>
>>>>>>> I'm a +1 for Guozhang's suggestion
>>>>>>>
>>>>>>> On Fri, 11 May 2018 at 10:20 Andy Coates <a...@confluent.io> wrote:
>>>>>>>
>>>>>>>> Makes sense to me.  What do others think?
>>>>>>>>
>>>>>>>> On 11 May 2018 at 10:13, Guozhang Wang <wangg...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi folks,
>>>>>>>>>
>>>>>>>>> While looking into the overloaded functions, I'm wondering if we
>> can
>>>>>>> save
>>>>>>>>> the transformers without key, i.e. only add two overloaded
>>>>> functions:
>>>>>>>>>
>>>>>>>>> <VR> KTable<K, VR> transformValues(final
>>>>>>>> ValueTransformerWithKeySupplier<?
>>>>>>>>> super K, ? super V, ? extends VR> valueTransformerSupplier,
>>>>>>>>>                                    final String...
>> stateStoreNames);
>>>>>>>>>
>>>>>>>>> <VR> KTable<K, VR> transformValues(final
>>>>>>>> ValueTransformerWithKeySupplier<?
>>>>>>>>> super K, ? super V, ? extends VR> valueTransformerSupplier,
>>>>>>>>>                                    final Materialized<K, VR,
>>>>>>>>> KeyValueStore<Bytes, byte[]>> materialized,
>>>>>>>>>                                    final String...
>> stateStoreNames);
>>>>>>>>>
>>>>>>>>> The reason is that, in KIP-149 we've added the overloaded
>> functions
>>>>>>>>> `withKey`, which should be covering the case without key already
>>>>>>> because
>>>>>>>> if
>>>>>>>>> users do not really need the key, they can just take it as a dummy
>>>>>>>>> parameter. We did not deprecate the old ones since some of them
>> have
>>>>>>> just
>>>>>>>>> been added one version back. But if we agree that by the end of
>> the
>>>>>>> day
>>>>>>>> we
>>>>>>>>> would only maintain the overloaded value functions "with key"
>> only,
>>>>>>> then
>>>>>>>> we
>>>>>>>>> should not add the ones without keys any more in new KIPs.
>>>>>>>>>
>>>>>>>>> WDYT?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Guozhang
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, May 11, 2018 at 9:42 AM, Andy Coates <a...@confluent.io>
>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Sorry for my lack of response - I've been out of action with a
>> bad
>>>>>>> back
>>>>>>>>> for
>>>>>>>>>> a few days!
>>>>>>>>>>
>>>>>>>>>> I originally had the `Materialized` overloads added to the API.
>>>>> I'll
>>>>>>>>> update
>>>>>>>>>> the KIP / PR with these again. In terms of semantics, as Matthias
>>>>>>>>> suggests,
>>>>>>>>>> these should be consistent with filter() and mapValues(), etc.
>>>>>>>>>>
>>>>>>>>>> On 8 May 2018 at 17:59, Guozhang Wang <wangg...@gmail.com>
>> wrote:
>>>>>>>>>>
>>>>>>>>>>> To follow on Matthias and Damian's comments here:
>>>>>>>>>>>
>>>>>>>>>>> If we are going to add the overload functions as
>>>>>>>>>>>
>>>>>>>>>>> ```
>>>>>>>>>>> <VR> KTable<K, VR> transformValues(final
>>>>>>> ValueTransformerSupplier<?
>>>>>>>>> super
>>>>>>>>>>> V,
>>>>>>>>>>> ? extends VR> valueTransformerSupplier,
>>>>>>>>>>>                                    final String...
>>>>>>> stateStoreNames,
>>>>>>>>>>>                                    final Materialized<K,
>>>>>>>>>>> VR, KeyValueStore<Bytes, byte[]> materialized);
>>>>>>>>>>>
>>>>>>>>>>> <VR> KTable<K, VR> transformValues(final
>>>>>>>> ValueTransformerWithKeySupplie
>>>>>>>>>> r<?
>>>>>>>>>>> super K, ? super V, ? extends VR> valueTransformerSupplier,
>>>>>>>>>>>                                    final String...
>>>>>>> stateStoreNames,
>>>>>>>>>>>                                  final Materialized<K,
>>>>>>>>>>> VR, KeyValueStore<Bytes, byte[]> materialized);
>>>>>>>>>>> ```
>>>>>>>>>>>
>>>>>>>>>>> Then are we going to still only allow the
>>>>> valueTransofmer.init() /
>>>>>>>>>>> process() to be able to access N stores, with N stores specified
>>>>>>> with
>>>>>>>>> the
>>>>>>>>>>> stateStoreNames, but not the one specified in materialized.name
>>>>>>> ()?
>>>>>>>>>>> Personally I think it should be the case as the materialized
>>>>> store
>>>>>>>>> should
>>>>>>>>>>> be managed by the Streams library itself, but we should probably
>>>>>>> help
>>>>>>>>>> users
>>>>>>>>>>> to understand if they have some stores used for the same purpose
>>>>>>>>> (storing
>>>>>>>>>>> the value that are going to be sent to the downstream changelog
>>>>>>>> stream
>>>>>>>>> of
>>>>>>>>>>> KTable), they should save that store and not creating by
>>>>>>> themselves
>>>>>>>> as
>>>>>>>>> it
>>>>>>>>>>> will be auto created by the Streams library.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Guozhang
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, May 8, 2018 at 7:45 AM, Damian Guy <
>>>>> damian....@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Initially i thought materializing a store would be overkill,
>>>>> but
>>>>>>>>> from a
>>>>>>>>>>>> consistency point of view it makes sense to add an overload
>>>>> that
>>>>>>>>> takes
>>>>>>>>>> a
>>>>>>>>>>>> `Materialized` and only create the store if that is supplied.
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, 6 May 2018 at 17:52 Matthias J. Sax <
>>>>>>> matth...@confluent.io
>>>>>>>>>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Andy,
>>>>>>>>>>>>>
>>>>>>>>>>>>> thanks for the KIP. I don't have any further comments.
>>>>>>>>>>>>>
>>>>>>>>>>>>> My 2cents about Guozhang's questions: as I like consistent
>>>>>>>>> behavior,
>>>>>>>>>> I
>>>>>>>>>>>>> think transfromValues() should behave the same way as
>>>>> filter()
>>>>>>>> and
>>>>>>>>>>>>> mapValues().
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 5/2/18 2:24 PM, Guozhang Wang wrote:
>>>>>>>>>>>>>> Hello Andy,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for the KIP. The motivation and the general
>>>>> proposal
>>>>>>>> looks
>>>>>>>>>>> good
>>>>>>>>>>>> to
>>>>>>>>>>>>>> me. I think in KTable it is indeed valuable to add the
>>>>>>>> functions
>>>>>>>>>> that
>>>>>>>>>>>>> does
>>>>>>>>>>>>>> not change key, such as mapValues, transformValues, and
>>>>>>> filter.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There are a few meta comments I have about the semantics
>>>>> of
>>>>>>> the
>>>>>>>>>> newly
>>>>>>>>>>>>> added
>>>>>>>>>>>>>> functions:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1) For the resulted KTable, how should its
>>>>>>>> "queryableStoreName()"
>>>>>>>>>> be
>>>>>>>>>>>>>> returning?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2) More specifically, how do we decide if the resulted
>>>>>>> KTable
>>>>>>>> is
>>>>>>>>> to
>>>>>>>>>>> be
>>>>>>>>>>>>>> materialized or not? E.g. if there is no store names
>>>>>>> provided
>>>>>>>>> then
>>>>>>>>>> it
>>>>>>>>>>>> is
>>>>>>>>>>>>>> likely that the resulted KTable is not materialized, or at
>>>>>>>> least
>>>>>>>>>> not
>>>>>>>>>>>>>> logically materialized and not be queryable. What if there
>>>>>>> is
>>>>>>>> at
>>>>>>>>>>> least
>>>>>>>>>>>>> one
>>>>>>>>>>>>>> state store provided? Will any of them be provided as the
>>>>>>>>>>> materialized
>>>>>>>>>>>>>> store, or should we still add a Materialized parameter for
>>>>>>> this
>>>>>>>>>>>> purpose?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 3) For its internal implementations, how should the
>>>>>>> key/value
>>>>>>>>>> serde,
>>>>>>>>>>>>>> sendOldValues flag etc be inherited from its parent
>>>>>>> processor
>>>>>>>>> node?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, May 2, 2018 at 12:43 PM, Andy Coates <
>>>>>>>> a...@confluent.io>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I would like to start a discussion for KIP 292. I would
>>>>>>>>> appreciate
>>>>>>>>>>> it
>>>>>>>>>>>> if
>>>>>>>>>>>>>>> you could review and provide feedback.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> KIP: KIP-292: Add transformValues() method to KTable
>>>>>>>>>>>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>>>>>>> 292%3A+Add+transformValues%28%29+method+to+KTable>
>>>>>>>>>>>>>>> Jira: KAFKA-6849 <https://issues.apache.org/
>>>>>>>>>> jira/browse/KAFKA-6849>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    PR: #4959 <https://github.com/apache/kafka/pull/4959>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Andy
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> -- Guozhang
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
> 
> 
> 

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to