Sounds good to me to do it in a different KIP (if at all---not convinced yet we need this restriction).
-Matthias On 5/11/18 11:36 AM, Guozhang Wang wrote: > Matthias' point is valid, that today we are not effectively preventing > users to programmatically access any materialized state store and mutate it > on the fly anyways. This is a caveat that we should improve since as Andy > asked, the implications of mutating a materialized store is "undefined" as > of today. But thinking about it a bit more I think we do not need to fix in > this KIP, but rather in a different PR, to consider the following: > > 1) ProcessorContext#getStateStore(): should we return the internal store, > even if user provided the right internal store name? > > 2) For stores used in materialization, should we allow users to write to > them? > > > Guozhang > > > On Fri, May 11, 2018 at 11:17 AM, Andy Coates <a...@confluent.io> wrote: > >> I'm no expert, so happy to go with what ever is decided. >> >> Implementing it so that a call to getStateStore(queryableStoreName) throws >> an exception is trivial enough to do, though I can see your point that this >> might be better done in a more consistent / single place, rather than >> peppered through the code. I'm not sure there are any other places in the >> API that take both a Materialized instance and a set of state store names, >> so maybe this is a special/first case? >> >> I guess my main question would be what are the implications if a user did >> mutate the underlying statestate within the transformer itself? Or to put >> it another way, would allowing access to the state store result it weird >> and wonderful bugs, and become a common source of bugs and another 'thing' >> developers need to know not to do, or would it 'just work'? If its the >> former, then I think that's a strong case to have the code explicitly stop >> users doing dumb stuff. >> >> On 11 May 2018 at 11:00, Matthias J. Sax <matth...@confluent.io> wrote: >> >>> I don't think it worth to complicate things and restrict the access to >>> the store that is created by the library via `Materialized` parameter. >>> The reason is, that we don't prevent this from happening anyway atm -- >>> users can always create an arbitrary processor and connect a KTable >>> state with the processor and mess around with the KTable state. Of >>> course, they should not do this, but it's possible after all and I think >>> it's a user error if they do. >>> >>> Thus, if we really want to restrict the access to internal >>> created/manages stores, we should do it consistently -- but it's unclear >>> to me how to do this: note that Processor API and DSL should be >>> separated (they are still not as clearly as they should, but we put some >>> effort into this the last year). Also, accessing the `context` is a PAPI >>> level operation. If we want to restrict the access to stores there, we >>> need to introduce two "categories" of stores -- PAPI stores and DSL >>> stores. This would be a step backwards IMHO and mangle DSL and PAPI >>> together in an undesired way. >>> >>> Instead, we should clearly communicate to the users, that if they use >>> `Materialized` they should never write to this store as it's managed by >>> the library (reading is ok). >>> >>> WDYT? >>> >>> -Matthias >>> >>> On 5/11/18 10:50 AM, Guozhang Wang wrote: >>>> Yes I was indeed thinking about now allowing `getStateStore()` to with >>> the name >>>> specified in Materialized. I understand it is a bit too restrictive, >> but >>> I >>>> cannot think of a elegant way to work around the following: >>>> >>>> 1) to programmatically enforce that the restore store is read-only. >>>> >>>> 2) With Materialized, the store name may not be specified by the user >> and >>>> hence it will be created internally; what would happen if users call >>>> `getStateStore()` with the correct internal store name? If not the >>>> semantics is a bit complex, if yes we are breaking the protocol to not >>>> expose internal store. >>>> >>>> >>>> Guozhang >>>> >>>> On Fri, May 11, 2018 at 10:42 AM, Andy Coates <a...@confluent.io> >> wrote: >>>> >>>>> Just a thought - but on the subject of disallowing access to the >>>>> materialized state store from within the transformer's init method... >>> might >>>>> this not be overly restrictive? Could there be valid uses where >>> read-only >>>>> access would be useful / valid. >>>>> >>>>> On 11 May 2018 at 10:35, Andy Coates <a...@confluent.io> wrote: >>>>> >>>>>> OK, KIP updated: >>>>>> - added overloads taking `Materialized` >>>>>> - dropped overloads taking `ValueTransformerSupplier` in favour of >> the >>>>>> `withKey` variants. >>>>>> - added more info around the limitations of the ProcessorContext >>> passed >>>>>> in to the transformer's init calls, i.e. no forward calls allowed or >>>>> calls >>>>>> to getStateStore where the store name matches the materialized result >>> of >>>>>> the call. >>>>>> >>>>>> I'll sort out the PR next. >>>>>> >>>>>> On 11 May 2018 at 10:26, Damian Guy <damian....@gmail.com> wrote: >>>>>> >>>>>>> I'm a +1 for Guozhang's suggestion >>>>>>> >>>>>>> On Fri, 11 May 2018 at 10:20 Andy Coates <a...@confluent.io> wrote: >>>>>>> >>>>>>>> Makes sense to me. What do others think? >>>>>>>> >>>>>>>> On 11 May 2018 at 10:13, Guozhang Wang <wangg...@gmail.com> wrote: >>>>>>>> >>>>>>>>> Hi folks, >>>>>>>>> >>>>>>>>> While looking into the overloaded functions, I'm wondering if we >> can >>>>>>> save >>>>>>>>> the transformers without key, i.e. only add two overloaded >>>>> functions: >>>>>>>>> >>>>>>>>> <VR> KTable<K, VR> transformValues(final >>>>>>>> ValueTransformerWithKeySupplier<? >>>>>>>>> super K, ? super V, ? extends VR> valueTransformerSupplier, >>>>>>>>> final String... >> stateStoreNames); >>>>>>>>> >>>>>>>>> <VR> KTable<K, VR> transformValues(final >>>>>>>> ValueTransformerWithKeySupplier<? >>>>>>>>> super K, ? super V, ? extends VR> valueTransformerSupplier, >>>>>>>>> final Materialized<K, VR, >>>>>>>>> KeyValueStore<Bytes, byte[]>> materialized, >>>>>>>>> final String... >> stateStoreNames); >>>>>>>>> >>>>>>>>> The reason is that, in KIP-149 we've added the overloaded >> functions >>>>>>>>> `withKey`, which should be covering the case without key already >>>>>>> because >>>>>>>> if >>>>>>>>> users do not really need the key, they can just take it as a dummy >>>>>>>>> parameter. We did not deprecate the old ones since some of them >> have >>>>>>> just >>>>>>>>> been added one version back. But if we agree that by the end of >> the >>>>>>> day >>>>>>>> we >>>>>>>>> would only maintain the overloaded value functions "with key" >> only, >>>>>>> then >>>>>>>> we >>>>>>>>> should not add the ones without keys any more in new KIPs. >>>>>>>>> >>>>>>>>> WDYT? >>>>>>>>> >>>>>>>>> >>>>>>>>> Guozhang >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, May 11, 2018 at 9:42 AM, Andy Coates <a...@confluent.io> >>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Sorry for my lack of response - I've been out of action with a >> bad >>>>>>> back >>>>>>>>> for >>>>>>>>>> a few days! >>>>>>>>>> >>>>>>>>>> I originally had the `Materialized` overloads added to the API. >>>>> I'll >>>>>>>>> update >>>>>>>>>> the KIP / PR with these again. In terms of semantics, as Matthias >>>>>>>>> suggests, >>>>>>>>>> these should be consistent with filter() and mapValues(), etc. >>>>>>>>>> >>>>>>>>>> On 8 May 2018 at 17:59, Guozhang Wang <wangg...@gmail.com> >> wrote: >>>>>>>>>> >>>>>>>>>>> To follow on Matthias and Damian's comments here: >>>>>>>>>>> >>>>>>>>>>> If we are going to add the overload functions as >>>>>>>>>>> >>>>>>>>>>> ``` >>>>>>>>>>> <VR> KTable<K, VR> transformValues(final >>>>>>> ValueTransformerSupplier<? >>>>>>>>> super >>>>>>>>>>> V, >>>>>>>>>>> ? extends VR> valueTransformerSupplier, >>>>>>>>>>> final String... >>>>>>> stateStoreNames, >>>>>>>>>>> final Materialized<K, >>>>>>>>>>> VR, KeyValueStore<Bytes, byte[]> materialized); >>>>>>>>>>> >>>>>>>>>>> <VR> KTable<K, VR> transformValues(final >>>>>>>> ValueTransformerWithKeySupplie >>>>>>>>>> r<? >>>>>>>>>>> super K, ? super V, ? extends VR> valueTransformerSupplier, >>>>>>>>>>> final String... >>>>>>> stateStoreNames, >>>>>>>>>>> final Materialized<K, >>>>>>>>>>> VR, KeyValueStore<Bytes, byte[]> materialized); >>>>>>>>>>> ``` >>>>>>>>>>> >>>>>>>>>>> Then are we going to still only allow the >>>>> valueTransofmer.init() / >>>>>>>>>>> process() to be able to access N stores, with N stores specified >>>>>>> with >>>>>>>>> the >>>>>>>>>>> stateStoreNames, but not the one specified in materialized.name >>>>>>> ()? >>>>>>>>>>> Personally I think it should be the case as the materialized >>>>> store >>>>>>>>> should >>>>>>>>>>> be managed by the Streams library itself, but we should probably >>>>>>> help >>>>>>>>>> users >>>>>>>>>>> to understand if they have some stores used for the same purpose >>>>>>>>> (storing >>>>>>>>>>> the value that are going to be sent to the downstream changelog >>>>>>>> stream >>>>>>>>> of >>>>>>>>>>> KTable), they should save that store and not creating by >>>>>>> themselves >>>>>>>> as >>>>>>>>> it >>>>>>>>>>> will be auto created by the Streams library. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Guozhang >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Tue, May 8, 2018 at 7:45 AM, Damian Guy < >>>>> damian....@gmail.com> >>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Initially i thought materializing a store would be overkill, >>>>> but >>>>>>>>> from a >>>>>>>>>>>> consistency point of view it makes sense to add an overload >>>>> that >>>>>>>>> takes >>>>>>>>>> a >>>>>>>>>>>> `Materialized` and only create the store if that is supplied. >>>>>>>>>>>> >>>>>>>>>>>> On Sun, 6 May 2018 at 17:52 Matthias J. Sax < >>>>>>> matth...@confluent.io >>>>>>>>> >>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Andy, >>>>>>>>>>>>> >>>>>>>>>>>>> thanks for the KIP. I don't have any further comments. >>>>>>>>>>>>> >>>>>>>>>>>>> My 2cents about Guozhang's questions: as I like consistent >>>>>>>>> behavior, >>>>>>>>>> I >>>>>>>>>>>>> think transfromValues() should behave the same way as >>>>> filter() >>>>>>>> and >>>>>>>>>>>>> mapValues(). >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -Matthias >>>>>>>>>>>>> >>>>>>>>>>>>> On 5/2/18 2:24 PM, Guozhang Wang wrote: >>>>>>>>>>>>>> Hello Andy, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks for the KIP. The motivation and the general >>>>> proposal >>>>>>>> looks >>>>>>>>>>> good >>>>>>>>>>>> to >>>>>>>>>>>>>> me. I think in KTable it is indeed valuable to add the >>>>>>>> functions >>>>>>>>>> that >>>>>>>>>>>>> does >>>>>>>>>>>>>> not change key, such as mapValues, transformValues, and >>>>>>> filter. >>>>>>>>>>>>>> >>>>>>>>>>>>>> There are a few meta comments I have about the semantics >>>>> of >>>>>>> the >>>>>>>>>> newly >>>>>>>>>>>>> added >>>>>>>>>>>>>> functions: >>>>>>>>>>>>>> >>>>>>>>>>>>>> 1) For the resulted KTable, how should its >>>>>>>> "queryableStoreName()" >>>>>>>>>> be >>>>>>>>>>>>>> returning? >>>>>>>>>>>>>> >>>>>>>>>>>>>> 2) More specifically, how do we decide if the resulted >>>>>>> KTable >>>>>>>> is >>>>>>>>> to >>>>>>>>>>> be >>>>>>>>>>>>>> materialized or not? E.g. if there is no store names >>>>>>> provided >>>>>>>>> then >>>>>>>>>> it >>>>>>>>>>>> is >>>>>>>>>>>>>> likely that the resulted KTable is not materialized, or at >>>>>>>> least >>>>>>>>>> not >>>>>>>>>>>>>> logically materialized and not be queryable. What if there >>>>>>> is >>>>>>>> at >>>>>>>>>>> least >>>>>>>>>>>>> one >>>>>>>>>>>>>> state store provided? Will any of them be provided as the >>>>>>>>>>> materialized >>>>>>>>>>>>>> store, or should we still add a Materialized parameter for >>>>>>> this >>>>>>>>>>>> purpose? >>>>>>>>>>>>>> >>>>>>>>>>>>>> 3) For its internal implementations, how should the >>>>>>> key/value >>>>>>>>>> serde, >>>>>>>>>>>>>> sendOldValues flag etc be inherited from its parent >>>>>>> processor >>>>>>>>> node? >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Guozhang >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, May 2, 2018 at 12:43 PM, Andy Coates < >>>>>>>> a...@confluent.io> >>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi everyone, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I would like to start a discussion for KIP 292. I would >>>>>>>>> appreciate >>>>>>>>>>> it >>>>>>>>>>>> if >>>>>>>>>>>>>>> you could review and provide feedback. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> KIP: KIP-292: Add transformValues() method to KTable >>>>>>>>>>>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP- >>>>>>>>>>>>>>> 292%3A+Add+transformValues%28%29+method+to+KTable> >>>>>>>>>>>>>>> Jira: KAFKA-6849 <https://issues.apache.org/ >>>>>>>>>> jira/browse/KAFKA-6849> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> PR: #4959 <https://github.com/apache/kafka/pull/4959> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Andy >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> -- Guozhang >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> -- Guozhang >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>> >>>> >>>> >>> >>> >> > > >
signature.asc
Description: OpenPGP digital signature