Hi Bill, For the record, the current proposal looks good to me also.
Thanks, -John On Tue, Sep 17, 2019 at 5:06 PM Matthias J. Sax <matth...@confluent.io> wrote: > > > Just to clarify I'll update `as(StoreSupplier, StoreSupplier)` to > > `with(..., ...)` and change the class name from `StreamJoined` to > > `StreamJoin` > > > Thanks Bill. SGTM. > > > > -Matthias > > > On 9/17/19 4:52 PM, aishwarya kumar wrote: > > Hi Bill, > > > > Thanks for clarifying, and the KIP looks great!! > > > > Best regards, > > Aishwarya > > > > On Tue, Sep 17, 2019, 6:16 PM Bill Bejeck <bbej...@gmail.com> wrote: > > > >> Hi Aishwarya, > >> > >> On Tue, Sep 17, 2019 at 1:41 PM aishwarya kumar <ash26...@gmail.com> > >> wrote: > >> > >>> Will this be applicable to Kstream-Ktable joins as well? Or do users > >> always > >>> materialize these joins explicitly? > >>> > >> > >> No, this change applies to KStream-KStream joins only. With KStream-KTable > >> joins KafkaStreams has materialized the KTable already, and we don't need > >> to do anything with the KStream instance in this case. > >> > >> > >>> I'm not sure if its even necessary (or makes sense), just trying to > >>> understand why the change is applicable to Kstream joins only? > >>> > >> > >> The full details are in the KIP, but in a nutshell, we needed to break the > >> naming of the StateStore from `Joined.withName` and provide users a way to > >> name the StateStore explicitly. While making the changes, we realized it > >> would be beneficial to give users the ability to use different WindowStore > >> implementations. The most likely reason to use different WindowStores > >> would be to enable in-memory joins. > >> > >> > >>> Best, > >>> Aishwarya > >>> > >> > >> Regards, > >> Bill > >> > >> > >>> > >>> On Tue, Sep 17, 2019 at 4:05 PM Bill Bejeck <bbej...@gmail.com> wrote: > >>> > >>>> Guozhang, > >>>> > >>>> Thanks for the comments. > >>>> > >>>> Yes, you are correct in your assessment regarding names, *if* the users > >>>> provide their own StoreSuppliers When constructing as StoreSupplier, > >> the > >>>> name can't be null, and additionally, there is no way to update the > >> name. > >>>> Further downstream, the underlying StateStore instances use the > >> provided > >>>> name for registration so they must be unique. If users don't provide > >>>> distinct names for the StoreSuppliers, KafkaStreams will thow a > >>>> StreamsException when building the topology. > >>>> > >>>> Thanks, > >>>> Bill > >>>> > >>>> > >>>> > >>>> On Tue, Sep 17, 2019 at 9:39 AM Guozhang Wang <wangg...@gmail.com> > >>> wrote: > >>>> > >>>>> Hello Bill, > >>>>> > >>>>> Thanks for the updated KIP. I made a pass on the StreamJoined > >> section. > >>>> Just > >>>>> a quick question from user's perspective: if a user wants to provide > >> a > >>>>> customized store-supplier, she is forced to also provide a name via > >> the > >>>>> store-supplier. So there's no way to say "I want to provide my own > >>> store > >>>>> engine but let the library decide its name", is that right? > >>>>> > >>>>> > >>>>> Guozhang > >>>>> > >>>>> > >>>>> On Tue, Sep 17, 2019 at 8:53 AM Bill Bejeck <bbej...@gmail.com> > >> wrote: > >>>>> > >>>>>> Bumping this discussion as we need to re-vote before the KIP > >>> deadline. > >>>>>> > >>>>>> On Fri, Sep 13, 2019 at 10:29 AM Bill Bejeck <bbej...@gmail.com> > >>>> wrote: > >>>>>> > >>>>>>> Hi All, > >>>>>>> > >>>>>>> While working on the implementation of KIP-479, some issues came > >> to > >>>>> light > >>>>>>> that the KIP as written won't work. I have updated the KIP with > >> a > >>>>>> solution > >>>>>>> I believe will solve the original problem as well as address the > >>>>>>> impediment to the initial approach. > >>>>>>> > >>>>>>> This update is a significant change, so please review the updated > >>> KIP > >>>>>>> > >>>>>> > >>>>> > >>>> > >>> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-479%3A+Add+StreamJoined+config+object+to+Join > >>>>>> and > >>>>>>> provide feedback. After we conclude the discussion, there will > >> be > >>> a > >>>>>>> re-vote. > >>>>>>> > >>>>>>> Thanks! > >>>>>>> Bill > >>>>>>> > >>>>>>> On Wed, Jul 17, 2019 at 7:01 PM Guozhang Wang < > >> wangg...@gmail.com> > >>>>>> wrote: > >>>>>>> > >>>>>>>> Hi Bill, thanks for your explanations. I'm on board with your > >>>> decision > >>>>>>>> too. > >>>>>>>> > >>>>>>>> > >>>>>>>> Guozhang > >>>>>>>> > >>>>>>>> On Wed, Jul 17, 2019 at 10:20 AM Bill Bejeck <bbej...@gmail.com > >>> > >>>>> wrote: > >>>>>>>> > >>>>>>>>> Thanks for the response, John. > >>>>>>>>> > >>>>>>>>>> If I can offer my thoughts, it seems better to just document > >>> on > >>>>> the > >>>>>>>>>> Stream join javadoc for the Materialized parameter that it > >>> will > >>>>> not > >>>>>>>>>> make the join result queriable. I'm not opposed to the > >>> queriable > >>>>>> flag > >>>>>>>>>> in general, but introducing it is a much larger > >> consideration > >>>> that > >>>>>> has > >>>>>>>>>> previously derailed this KIP discussion. In the interest of > >>> just > >>>>>>>>>> closing the gap and keeping the API change small, it seems > >>>> better > >>>>> to > >>>>>>>>>> just go with documentation for now. > >>>>>>>>> > >>>>>>>>> I agree with your statement here. IMHO the most important > >> goal > >>> of > >>>>>> this > >>>>>>>> KIP > >>>>>>>>> is to not breaking existing users and gain some consistency of > >>> the > >>>>>> API. > >>>>>>>>> > >>>>>>>>> I'll update the KIP accordingly. > >>>>>>>>> > >>>>>>>>> -Bill > >>>>>>>>> > >>>>>>>>> On Tue, Jul 16, 2019 at 11:55 AM John Roesler < > >>> j...@confluent.io> > >>>>>>>> wrote: > >>>>>>>>> > >>>>>>>>>> Hi Bill, > >>>>>>>>>> > >>>>>>>>>> Thanks for driving this KIP toward a conclusion. I'm on > >> board > >>>> with > >>>>>>>>>> your decision. > >>>>>>>>>> > >>>>>>>>>> You didn't mention whether you're still proposing to add the > >>>>>>>>>> "queriable" flag to the Materialized config object, or just > >>>>> document > >>>>>>>>>> that a Stream join is never queriable. Both options have > >> come > >>> up > >>>>>>>>>> earlier in the discussion, so it would be good to pin this > >>> down. > >>>>>>>>>> > >>>>>>>>>> If I can offer my thoughts, it seems better to just document > >>> on > >>>>> the > >>>>>>>>>> Stream join javadoc for the Materialized parameter that it > >>> will > >>>>> not > >>>>>>>>>> make the join result queriable. I'm not opposed to the > >>> queriable > >>>>>> flag > >>>>>>>>>> in general, but introducing it is a much larger > >> consideration > >>>> that > >>>>>> has > >>>>>>>>>> previously derailed this KIP discussion. In the interest of > >>> just > >>>>>>>>>> closing the gap and keeping the API change small, it seems > >>>> better > >>>>> to > >>>>>>>>>> just go with documentation for now. > >>>>>>>>>> > >>>>>>>>>> Thanks again, > >>>>>>>>>> -John > >>>>>>>>>> > >>>>>>>>>> On Thu, Jul 11, 2019 at 2:45 PM Bill Bejeck < > >>> bbej...@gmail.com> > >>>>>>>> wrote: > >>>>>>>>>>> > >>>>>>>>>>> Thanks all for the great discussion so far. > >>>>>>>>>>> > >>>>>>>>>>> Everyone has made excellent points, and I appreciate the > >>>> detail > >>>>>>>>> everyone > >>>>>>>>>>> has put into their arguments. > >>>>>>>>>>> > >>>>>>>>>>> However, after carefully evaluating all the points made so > >>>> far, > >>>>>>>>> creating > >>>>>>>>>> an > >>>>>>>>>>> overload with Materialized is still my #1 option. > >>>>>>>>>>> My reasoning for saying so is two-fold: > >>>>>>>>>>> > >>>>>>>>>>> 1. It's a small change, and IMHO since it's consistent > >>> with > >>>>> our > >>>>>>>>>> current > >>>>>>>>>>> API concerning state store usage, the cognitive load on > >>>> users > >>>>>>>> will > >>>>>>>>> be > >>>>>>>>>>> minimal. > >>>>>>>>>>> 2. It achieves the most important goal of this KIP, > >>> namely > >>>> to > >>>>>>>> close > >>>>>>>>>> the > >>>>>>>>>>> gap of naming state stores independently of the join > >>>> operator > >>>>>>>> name. > >>>>>>>>>>> > >>>>>>>>>>> Additionally, I agree with the points made by Matthias > >>> earlier > >>>>> (I > >>>>>>>>> realize > >>>>>>>>>>> there is some overlap here). > >>>>>>>>>>> > >>>>>>>>>>>> - the main purpose of this KIP is to close the naming > >> gap > >>>>> what > >>>>>> we > >>>>>>>>>> achieve > >>>>>>>>>>>> - we can allow people to use the new in-memory store > >>>>>>>>>>>> - we allow people to enable/disable caching > >>>>>>>>>>>> - we unify the API > >>>>>>>>>>>> - we decouple querying from naming > >>>>>>>>>>>> - it's a small API change > >>>>>>>>>>> > >>>>>>>>>>> Although it's not a perfect solution, IMHO the positives > >> of > >>>>> using > >>>>>>>>>>> Materialize far outweigh the negatives, and from what > >> we've > >>>>>>>> discussed > >>>>>>>>> so > >>>>>>>>>>> far, anything we implement seems to involve an additional > >>>> change > >>>>>>>> down > >>>>>>>>> the > >>>>>>>>>>> road. > >>>>>>>>>>> > >>>>>>>>>>> If others are still strongly opposed to using > >> Materialized, > >>> my > >>>>>> other > >>>>>>>>>>> preferences would be > >>>>>>>>>>> > >>>>>>>>>>> 1. Add a "withStoreName" to Joined. Although I agree > >>> with > >>>>>>>> Guozhang > >>>>>>>>>> that > >>>>>>>>>>> having a parameter that only applies to one use-case > >>> would > >>>> be > >>>>>>>>> clumsy. > >>>>>>>>>>> 2. Add a String overload for naming the store, but this > >>>> would > >>>>>> be > >>>>>>>> my > >>>>>>>>>>> least favorite option as IMHO this seems to be a step > >>>>> backward > >>>>>>>> from > >>>>>>>>>> why we > >>>>>>>>>>> introduced configuration objects in the first place. > >>>>>>>>>>> > >>>>>>>>>>> Thanks, > >>>>>>>>>>> Bill > >>>>>>>>>>> > >>>>>>>>>>> On Thu, Jun 27, 2019 at 4:45 PM Matthias J. Sax < > >>>>>>>> matth...@confluent.io > >>>>>>>>>> > >>>>>>>>>>> wrote: > >>>>>>>>>>> > >>>>>>>>>>>> Thanks for the KIP Bill! > >>>>>>>>>>>> > >>>>>>>>>>>> Great discussion to far. > >>>>>>>>>>>> > >>>>>>>>>>>> About John's idea about querying upstream stores and > >> don't > >>>>>>>>> materialize > >>>>>>>>>> a > >>>>>>>>>>>> store: I agree with Bill that this seems to be an > >>> orthogonal > >>>>>>>>> question, > >>>>>>>>>>>> and it might be better to treat it as an independent > >>>>>> optimization > >>>>>>>> and > >>>>>>>>>>>> exclude from this KIP. > >>>>>>>>>>>> > >>>>>>>>>>>>> What should be the behavior if there is no store > >>>>>>>>>>>>> configured (e.g., if Materialized with only serdes) > >> and > >>>>>>>> querying is > >>>>>>>>>>>>> enabled? > >>>>>>>>>>>> > >>>>>>>>>>>> IMHO, this could be an error case. If one wants to > >> query a > >>>>>> store, > >>>>>>>>> they > >>>>>>>>>>>> need to provide a name -- if you don't know the name, > >> how > >>>>> would > >>>>>>>> you > >>>>>>>>>>>> actually query the store (even if it would be possible > >> to > >>>> get > >>>>>> the > >>>>>>>>> name > >>>>>>>>>>>> from the `TopologyDescription`, it seems clumsy). > >>>>>>>>>>>> > >>>>>>>>>>>> If we don't want to throw an error, materializing seems > >> to > >>>> be > >>>>>> the > >>>>>>>>> right > >>>>>>>>>>>> option, to exclude "query optimization" from this KIP. I > >>>> would > >>>>>> be > >>>>>>>> ok > >>>>>>>>>>>> with this option, even if it's clumsy to get the name > >> from > >>>>>>>>>>>> `TopologyDescription`; hence, I would prefer to treat it > >>> as > >>>> an > >>>>>>>> error. > >>>>>>>>>>>> > >>>>>>>>>>>>> To get back to the current behavior, users would have > >> to > >>>>>>>>>>>>> add a "bytes store supplier" to the Materialized to > >>>> indicate > >>>>>>>> that, > >>>>>>>>>>>>> yes, they really want a state store there. > >>>>>>>>>>>> > >>>>>>>>>>>> This sound like a quite subtle semantic difference on > >> how > >>> to > >>>>> use > >>>>>>>> the > >>>>>>>>>>>> API. Might be hard to explain to users. I would prefer > >> to > >>>> not > >>>>>>>>>> introduce it. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> About Guozhang's points: > >>>>>>>>>>>> > >>>>>>>>>>>> 1a) That is actually a good point. However, I believe we > >>>>> cannot > >>>>>>>> get > >>>>>>>>>>>> around this issue easily, and it seems ok to me, to > >> expose > >>>> the > >>>>>>>> actual > >>>>>>>>>>>> store type we are using. (More thoughts later.) > >>>>>>>>>>>> > >>>>>>>>>>>> 1b) I don't see an issue with allowing users to query > >> all > >>>>>> stores? > >>>>>>>>> What > >>>>>>>>>>>> is the rational behind it? What do we gain by not > >> allowing > >>>> it? > >>>>>>>>>>>> > >>>>>>>>>>>> 2) While I understand what you are saying, we also > >>> want/need > >>>>> to > >>>>>>>> have > >>>>>>>>> a > >>>>>>>>>>>> way in the PAPI to allow users adding "internal/private" > >>>>>>>>> non-queryable > >>>>>>>>>>>> stores to a topology. That's possible via > >>>>>>>>>>>> `Materialized#withQueryingDisabled()`. We could also > >>> update > >>>>>>>>>>>> `Topology#addStateStore(StoreBuilder, boolean > >> isQueryable, > >>>>>>>>> String...)` > >>>>>>>>>>>> to address this. Again, I agree with Bill that the > >> current > >>>> API > >>>>>> is > >>>>>>>>> built > >>>>>>>>>>>> in a certain way, and if we want to change it, it should > >>> be > >>>> a > >>>>>>>>> separate > >>>>>>>>>>>> KIP, as it seems to be an orthogonal concern. > >>>>>>>>>>>> > >>>>>>>>>>>>> Instead, we just restrict KIP-307 to NOT > >>>>>>>>>>>>> use the Joined.name for state store names and always > >> use > >>>>>>>> internal > >>>>>>>>>> names > >>>>>>>>>>>> as > >>>>>>>>>>>>> well, which admittedly indeed leaves a hole of not > >> being > >>>>> able > >>>>>> to > >>>>>>>>>> cover > >>>>>>>>>>>> all > >>>>>>>>>>>>> internal names here > >>>>>>>>>>>> > >>>>>>>>>>>> I think it's important to close this gap. Naming > >> entities > >>>>> seems > >>>>>>>> to a > >>>>>>>>>>>> binary feature: if there is a gap, the feature is more > >> or > >>>> less > >>>>>>>>> useless, > >>>>>>>>>>>> rendering KIP-307 void. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> I like John's detailed list of required features and > >> what > >>>>>>>>>>>> Materialized/WindowByteStoreSuppliers offers. My take > >> is, > >>>> that > >>>>>>>> adding > >>>>>>>>>>>> Materialized including the required run-time checks is > >> the > >>>>> best > >>>>>>>>> option > >>>>>>>>>>>> we have, for the following reasons: > >>>>>>>>>>>> > >>>>>>>>>>>> - the main purpose of this KIP is to close the naming > >> gap > >>>>> what > >>>>>> we > >>>>>>>>>> achieve > >>>>>>>>>>>> - we can allow people to use the new in-memory store > >>>>>>>>>>>> - we allow people to enable/disable caching > >>>>>>>>>>>> - we unify the API > >>>>>>>>>>>> - we decouple querying from naming > >>>>>>>>>>>> - it's a small API change > >>>>>>>>>>>> > >>>>>>>>>>>> Adding an overload and only passing in a name, would > >>> address > >>>>> the > >>>>>>>> main > >>>>>>>>>>>> purpose of the KIP. However, it falls short on all the > >>> other > >>>>>>>>> "goodies". > >>>>>>>>>>>> As you mentioned, passing in `Materialized` might not be > >>>>> perfect > >>>>>>>> and > >>>>>>>>>>>> maybe we need to deprecate is at some point; but this is > >>>> also > >>>>>> true > >>>>>>>>> for > >>>>>>>>>>>> passing in just a name. > >>>>>>>>>>>> > >>>>>>>>>>>> I am also not convinced, that a `StreamJoinStore` would > >>>>> resolve > >>>>>>>> all > >>>>>>>>> the > >>>>>>>>>>>> issues. In the end, as long as we are using a > >>>> `WindowedStore` > >>>>>>>>>>>> internally, we need to expose this "implemenation > >> detail" > >>> to > >>>>>>>> users to > >>>>>>>>>>>> allow them to plug in a custom store. Adding > >>> `Materialized` > >>>>> seem > >>>>>>>> to > >>>>>>>>> be > >>>>>>>>>>>> the best short-term fix from my point of view. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> -Matthias > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> On 6/27/19 9:56 AM, Guozhang Wang wrote: > >>>>>>>>>>>>> Hi John, > >>>>>>>>>>>>> > >>>>>>>>>>>>> I actually feels better about a new interface but I'm > >>> not > >>>>> sure > >>>>>>>> if > >>>>>>>>> we > >>>>>>>>>>>> would > >>>>>>>>>>>>> need the full configuration of store / log / cache, > >> now > >>> or > >>>>> in > >>>>>>>> the > >>>>>>>>>> future > >>>>>>>>>>>>> ever for stream-stream join. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Right now I feel that 1) we want to improve our > >>>>> implementation > >>>>>>>> of > >>>>>>>>>>>>> stream-stream join, and potentially also allow users > >> to > >>>>>>>> customize > >>>>>>>>>> this > >>>>>>>>>>>>> implementation but with a more suitable interface than > >>> the > >>>>>>>> current > >>>>>>>>>>>>> WindowStore interface, how to do that is less clear > >> and > >>>>>>>>>> execution-wise > >>>>>>>>>>>> it's > >>>>>>>>>>>>> (arguably..) not urgent; 2) we want to close the last > >>> gap > >>>>>>>>>> (Stream-stream > >>>>>>>>>>>>> join) of allowing users to specify all internal names > >> to > >>>>> help > >>>>>> on > >>>>>>>>>> backward > >>>>>>>>>>>>> compatibility, which is urgent. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Therefore if we want to unblock 2) from 1) in the near > >>>>> term, I > >>>>>>>> feel > >>>>>>>>>>>>> slightly inclined to just add overload functions that > >>>> takes > >>>>>> in a > >>>>>>>>>> store > >>>>>>>>>>>> name > >>>>>>>>>>>>> for stream-stream joins only -- and admittedly, in the > >>>>> future > >>>>>>>> this > >>>>>>>>>>>> function > >>>>>>>>>>>>> maybe deprecated -- i.e. if we have to do something > >> that > >>>> we > >>>>>> "may > >>>>>>>>>> regret" > >>>>>>>>>>>> in > >>>>>>>>>>>>> the future, I'd like to pick the least intrusive > >> option. > >>>>>>>>>>>>> > >>>>>>>>>>>>> About `Joined#withStoreName`: since the Joined class > >>>> itself > >>>>> is > >>>>>>>> also > >>>>>>>>>> used > >>>>>>>>>>>> in > >>>>>>>>>>>>> other join types, I feel less comfortable to have a > >>>>>>>>>>>> `Joined#withStoreName` > >>>>>>>>>>>>> which is only going to be used by stream-stream join. > >> Or > >>>>>> maybe I > >>>>>>>>> miss > >>>>>>>>>>>>> something here about the "latter" case that you are > >>>>> referring > >>>>>>>> to? > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> Guozhang > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Mon, Jun 24, 2019 at 12:16 PM John Roesler < > >>>>>>>> j...@confluent.io> > >>>>>>>>>> wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>>> Thanks Guozhang, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Yep. Maybe we can consider just exactly what the join > >>>>> needs: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> the WindowStore<Bytes, byte[]> itself is fine, if > >>> overly > >>>>>>>> broad, > >>>>>>>>>>>>>>> since the only two methods we need are > >>> `window.put(key, > >>>>>> value, > >>>>>>>>>>>>>>> context().timestamp())` and `WindowStoreIterator<V2> > >>>> iter > >>>>> = > >>>>>>>>>>>>>>> window.fetch(key, timeFrom, timeTo)`. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> One "middle ground" would be to extract _this_ into a > >>> new > >>>>>> store > >>>>>>>>>>>>>> interface, which only supports these API calls, like > >>>>>>>>>>>>>> StreamJoinStore<K, V>. This would give us the > >> latitude > >>> we > >>>>>> need > >>>>>>>> to > >>>>>>>>>>>>>> efficiently support the exact operation without > >>>> concerning > >>>>>>>>> ourselves > >>>>>>>>>>>>>> with all the other things a WindowStore can do (which > >>> are > >>>>>>>>>> unreachable > >>>>>>>>>>>>>> for the join use case). It would also let us drop > >>> "store > >>>>>>>>> duplicates" > >>>>>>>>>>>>>> from the main WindowStore interface, since it only > >>> exists > >>>>> to > >>>>>>>>> support > >>>>>>>>>>>>>> the join use case. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> If we were to add a new StreamJoinStore interface, > >> then > >>>>> it'd > >>>>>> be > >>>>>>>>>>>>>> straightforward how we could add also > >>>>>>>>>>>>>> `Materialized.as(StreamJoinBytesStoreSupplier)` and > >> use > >>>>>>>>>> Materialized, > >>>>>>>>>>>>>> or alternatively add the ability to set the bytes > >> store > >>>> on > >>>>>>>> Joined. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Personally, I'm kind of leaning toward the latter > >> (and > >>>> also > >>>>>>>> doing > >>>>>>>>>>>>>> `Joined#withStoreName`), since adding the new > >> interface > >>>> to > >>>>>>>>>>>>>> Materialized then also pollutes the interface for its > >>>>>> _actual_ > >>>>>>>> use > >>>>>>>>>>>>>> case of materializing a table view. Of course, to > >> solve > >>>> the > >>>>>>>>>> immediate > >>>>>>>>>>>>>> problem, all we need is the store name, but we might > >>> feel > >>>>>>>> better > >>>>>>>>>> about > >>>>>>>>>>>>>> adding the store name to Joined if we _also_ feel > >> like > >>> in > >>>>> the > >>>>>>>>>> future, > >>>>>>>>>>>>>> we would add store/log/cache configuration to Joined > >> as > >>>>> well. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> -John > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 12:56 PM Guozhang Wang < > >>>>>>>>> wangg...@gmail.com> > >>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Hello John, > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> My main concern is exactly the first point at the > >>> bottom > >>>>> of > >>>>>>>> your > >>>>>>>>>>>> analysis > >>>>>>>>>>>>>>> here: "* configure the bytes store". I'm not sure if > >>>>> using a > >>>>>>>>> window > >>>>>>>>>>>> bytes > >>>>>>>>>>>>>>> store would be ideal for stream-stream windowed > >> join; > >>>> e.g. > >>>>>> we > >>>>>>>>> could > >>>>>>>>>>>>>>> consider two dimensional list sorted by timestamps > >> and > >>>>> then > >>>>>> by > >>>>>>>>>> keys to > >>>>>>>>>>>> do > >>>>>>>>>>>>>>> the join, whereas a windowed bytes store is > >> basically > >>>>> sorted > >>>>>>>> by > >>>>>>>>> key > >>>>>>>>>>>>>> first, > >>>>>>>>>>>>>>> then by timestamp. If we expose the Materialized to > >>> let > >>>>> user > >>>>>>>> pass > >>>>>>>>>> in a > >>>>>>>>>>>>>>> windowed bytes store, then we would need to change > >>> that > >>>> if > >>>>>> we > >>>>>>>>> want > >>>>>>>>>> to > >>>>>>>>>>>>>>> replace it with a different implementation > >> interface. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Guozhang > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 8:59 AM John Roesler < > >>>>>>>> j...@confluent.io> > >>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Hey Guozhang and Bill, > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> For what it's worth, I agree with you both! > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> I think it might help the discussion to look > >>> concretely > >>>>> at > >>>>>>>> what > >>>>>>>>>>>>>>>> Materialized does: > >>>>>>>>>>>>>>>> * set a WindowBytesStoreSupplier > >>>>>>>>>>>>>>>> * set a name > >>>>>>>>>>>>>>>> * set the key/value serdes > >>>>>>>>>>>>>>>> * disable/enable/configure change-logging > >>>>>>>>>>>>>>>> * disable/enable caching > >>>>>>>>>>>>>>>> * configure retention > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Further, looking into the WindowBytesStoreSupplier, > >>> the > >>>>>>>>> interface > >>>>>>>>>> lets > >>>>>>>>>>>>>> you: > >>>>>>>>>>>>>>>> * get the segment interval > >>>>>>>>>>>>>>>> * get the window size > >>>>>>>>>>>>>>>> * get whether "duplicates" are enabled > >>>>>>>>>>>>>>>> * get the retention period > >>>>>>>>>>>>>>>> * (obviously) get a WindowStore<Bytes, byte[]> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> We know that Materialized isn't exactly what we > >> need > >>>> for > >>>>>>>> stream > >>>>>>>>>> joins, > >>>>>>>>>>>>>>>> but we can see how close Materialized is to what we > >>>> need. > >>>>>> If > >>>>>>>> it > >>>>>>>>> is > >>>>>>>>>>>>>>>> close, maybe we can use it and document the gaps, > >> and > >>>> if > >>>>> it > >>>>>>>> is > >>>>>>>>> not > >>>>>>>>>>>>>>>> close, then maybe we should just add what we need > >> to > >>>>>> Joined. > >>>>>>>>>>>>>>>> Stream Join's requirements for its stores: > >>>>>>>>>>>>>>>> * a multimap store (i.e., it keeps duplicates) for > >>>>> storing > >>>>>>>>> general > >>>>>>>>>>>>>>>> (not windowed) keyed records associated with their > >>>>>> insertion > >>>>>>>>>> time, and > >>>>>>>>>>>>>>>> allows efficient time-bounded lookups and also > >>>> efficient > >>>>>>>> purges > >>>>>>>>>> of old > >>>>>>>>>>>>>>>> data. > >>>>>>>>>>>>>>>> ** Note, a properly configured > >>> WindowBytesStoreSupplier > >>>>>>>>> satisfies > >>>>>>>>>> this > >>>>>>>>>>>>>>>> requirement, and the interface supports the queries > >>> we > >>>>> need > >>>>>>>> to > >>>>>>>>>> verify > >>>>>>>>>>>>>>>> the configuration at run-time > >>>>>>>>>>>>>>>> * set a name for the store > >>>>>>>>>>>>>>>> * do _not_ set the serdes (they are already set in > >>>>> Joined) > >>>>>>>>>>>>>>>> * logging could be configurable (set to enabled > >> now) > >>>>>>>>>>>>>>>> * caching could be configurable (set to enabled > >> now) > >>>>>>>>>>>>>>>> * do _not_ configure retention (determined by > >>>>> JoinWindows) > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> So, out of six capabilities for Materialized, there > >>> are > >>>>> two > >>>>>>>> we > >>>>>>>>>> don't > >>>>>>>>>>>>>>>> want (serdes and retention). These would become > >>>> run-time > >>>>>>>> checks > >>>>>>>>>> if we > >>>>>>>>>>>>>>>> use it. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> A third questionable capability is to provide a > >>>>>>>>>>>>>>>> WindowBytesStoreSupplier. Looking at whether the > >>>>>>>>>>>>>>>> WindowBytesStoreSupplier is the right interface for > >>>>> Stream > >>>>>>>> Join: > >>>>>>>>>>>>>>>> * configuring segment interval is fine > >>>>>>>>>>>>>>>> * should _not_ configure window size (it's > >> determined > >>>> by > >>>>>>>>>> JoinWindows) > >>>>>>>>>>>>>>>> * duplicates _must_ be enabled > >>>>>>>>>>>>>>>> * retention should be _at least_ windowSize + > >>>>> gracePeriod, > >>>>>>>> but > >>>>>>>>>> note > >>>>>>>>>>>>>>>> that (unlike for Table window stores) there is no > >>>> utility > >>>>>> in > >>>>>>>>>> having a > >>>>>>>>>>>>>>>> longer retention time. > >>>>>>>>>>>>>>>> * the WindowStore<Bytes, byte[]> itself is fine, if > >>>>> overly > >>>>>>>>> broad, > >>>>>>>>>>>>>>>> since the only two methods we need are > >>> `window.put(key, > >>>>>>>> value, > >>>>>>>>>>>>>>>> context().timestamp())` and > >> `WindowStoreIterator<V2> > >>>>> iter = > >>>>>>>>>>>>>>>> window.fetch(key, timeFrom, timeTo)`. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Thus, flattening out the overlap for > >>>>>> WindowBytesStoreSupplier > >>>>>>>>>> onto the > >>>>>>>>>>>>>>>> overlap for Materialized, we have 9 capabilities > >>> total > >>>>>> (note > >>>>>>>>>> retention > >>>>>>>>>>>>>>>> is duplicated), we have 4 that we don't want: > >>>>>>>>>>>>>>>> * do _not_ set the serdes (they are already set in > >>>>> Joined) > >>>>>>>>>>>>>>>> * do _not_ configure retention (determined by > >>>>> JoinWindows) > >>>>>>>>>>>>>>>> * should _not_ configure window size (it's > >> determined > >>>> by > >>>>>>>>>> JoinWindows) > >>>>>>>>>>>>>>>> * duplicates _must_ be enabled > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> These gaps would have to be covered with run-time > >>>> checks > >>>>> if > >>>>>>>> we > >>>>>>>>>> re-use > >>>>>>>>>>>>>>>> Materialized and WindowStoreBytesStoreSupplier > >> both. > >>>>> Maybe > >>>>>>>> this > >>>>>>>>>> sounds > >>>>>>>>>>>>>>>> bad, but consider the other side, that we get 5 new > >>>>>>>> capabilities > >>>>>>>>>> we > >>>>>>>>>>>>>>>> don't require, but are still pretty nice: > >>>>>>>>>>>>>>>> * configure the bytes store > >>>>>>>>>>>>>>>> * set a name for the store > >>>>>>>>>>>>>>>> * configure caching > >>>>>>>>>>>>>>>> * configure logging > >>>>>>>>>>>>>>>> * configure segment interval > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Not sure where this nets us out, but it's food for > >>>>> thought. > >>>>>>>>>>>>>>>> -John > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On Sun, Jun 23, 2019 at 7:52 PM Guozhang Wang < > >>>>>>>>> wangg...@gmail.com > >>>>>>>>>>> > >>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Hi Bill, > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> I think by giving a Materialized param into > >>>>> stream-stream > >>>>>>>> join, > >>>>>>>>>> it's > >>>>>>>>>>>>>> okay > >>>>>>>>>>>>>>>>> (though still ideal) to say "we still would not > >>> expose > >>>>> the > >>>>>>>>> store > >>>>>>>>>> for > >>>>>>>>>>>>>>>>> queries", but it would sound a bit awkward to say > >>> "we > >>>>>> would > >>>>>>>>> also > >>>>>>>>>>>>>> ignore > >>>>>>>>>>>>>>>>> whatever the passed in store supplier but just use > >>> our > >>>>>>>> default > >>>>>>>>>> ones" > >>>>>>>>>>>>>> -- > >>>>>>>>>>>>>>>>> again the concern is that, if in the future we'd > >>> want > >>>> to > >>>>>>>> change > >>>>>>>>>> the > >>>>>>>>>>>>>>>> default > >>>>>>>>>>>>>>>>> implementation of join algorithm which no longer > >>> rely > >>>>> on a > >>>>>>>>> window > >>>>>>>>>>>>>> store > >>>>>>>>>>>>>>>>> with deduping enabled, then we need to change this > >>> API > >>>>>>>> again by > >>>>>>>>>>>>>> changing > >>>>>>>>>>>>>>>>> the store supplier type. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> If we do want to fill this hole for stream-stream > >>>> join, > >>>>> I > >>>>>>>> feel > >>>>>>>>>> just > >>>>>>>>>>>>>>>> adding > >>>>>>>>>>>>>>>>> a String typed store-name would even be less > >>>>>>>> future-intrusive > >>>>>>>>> if > >>>>>>>>>> we > >>>>>>>>>>>>>>>> expect > >>>>>>>>>>>>>>>>> this parameter to be modified later. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Does that makes sense? > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Guozhang > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> On Sat, Jun 22, 2019 at 12:51 PM Bill Bejeck < > >>>>>>>>> bbej...@gmail.com> > >>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Thanks for the comments John and Guozhang, I'll > >>>> address > >>>>>>>> each > >>>>>>>>>> one of > >>>>>>>>>>>>>>>> your > >>>>>>>>>>>>>>>>>> comments in turn. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> John, > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> I'm wondering about a missing quadrant from the > >>>> truth > >>>>>>>> table > >>>>>>>>>>>>>> involving > >>>>>>>>>>>>>>>>>>> whether a Materialized is stored or not and > >>> querying > >>>>> is > >>>>>>>>>>>>>>>>>>> enabled/disabled... What should be the behavior > >> if > >>>>> there > >>>>>>>> is > >>>>>>>>> no > >>>>>>>>>>>>>> store > >>>>>>>>>>>>>>>>>>> configured (e.g., if Materialized with only > >>> serdes) > >>>>> and > >>>>>>>>>> querying > >>>>>>>>>>>>>> is > >>>>>>>>>>>>>>>>>> enabled? > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> It seems we have two choices: > >>>>>>>>>>>>>>>>>>> 1. we can force creation of a state store in > >> this > >>>>> case, > >>>>>> so > >>>>>>>>> the > >>>>>>>>>>>>>> store > >>>>>>>>>>>>>>>>>>> can be used to serve the queries > >>>>>>>>>>>>>>>>>>> 2. we can provide just a queriable view, > >> basically > >>>>>>>> letting IQ > >>>>>>>>>>>>>> query > >>>>>>>>>>>>>>>>>>> into the "KTableValueGetter", which would > >>>>> transparently > >>>>>>>>>>>>>> construct the > >>>>>>>>>>>>>>>>>>> query response by applying the operator logic to > >>> the > >>>>>>>> upstream > >>>>>>>>>>>>>> state > >>>>>>>>>>>>>>>> if > >>>>>>>>>>>>>>>>>>> the operator state isn't already stored. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> I agree with your assertion about a missing > >>> quadrant > >>>>> from > >>>>>>>> the > >>>>>>>>>> truth > >>>>>>>>>>>>>>>> table. > >>>>>>>>>>>>>>>>>> Additionally, I too like the concept of a > >> queriable > >>>>> view. > >>>>>>>>> But I > >>>>>>>>>>>>>> think > >>>>>>>>>>>>>>>> that > >>>>>>>>>>>>>>>>>> goes a bit beyond the scope of this KIP and would > >>>> like > >>>>> to > >>>>>>>>> pursue > >>>>>>>>>>>>>> that > >>>>>>>>>>>>>>>>>> feature as follow-on work. Also thinking about > >>> this > >>>>> KIP > >>>>>>>> some > >>>>>>>>>>>>>> more, I'm > >>>>>>>>>>>>>>>>>> thinking of the changes to Materialized might be > >> a > >>>>> reach > >>>>>> as > >>>>>>>>>> well. > >>>>>>>>>>>>>>>>>> Separating the naming from a store and its > >>> queryable > >>>>>> state > >>>>>>>>> seems > >>>>>>>>>>>>>> like a > >>>>>>>>>>>>>>>>>> complex issue in and of itself and should be > >>> treated > >>>>>>>>>> accordingly. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> So here's what I'm thinking now. We add > >>> Materialzied > >>>>> to > >>>>>>>> Join, > >>>>>>>>>> but > >>>>>>>>>>>>>> for > >>>>>>>>>>>>>>>> now, > >>>>>>>>>>>>>>>>>> we internally disable querying. I know this > >> breaks > >>>> our > >>>>>>>>> current > >>>>>>>>>>>>>>>> semantic > >>>>>>>>>>>>>>>>>> approach, but I think it's crucial that we do two > >>>>> things > >>>>>> in > >>>>>>>>> this > >>>>>>>>>>>>>> KIP > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> 1. Break the naming of the state stores from > >>>> Joined > >>>>> to > >>>>>>>>>>>>>>>> Materialized, so > >>>>>>>>>>>>>>>>>> the naming of state stores follows our current > >>>>> pattern > >>>>>>>> and > >>>>>>>>>>>>>> enables > >>>>>>>>>>>>>>>>>> upgrades > >>>>>>>>>>>>>>>>>> from 2.3 to 2.4 > >>>>>>>>>>>>>>>>>> 2. Offer the ability to configure the state > >>> stores > >>>>> of > >>>>>>>> the > >>>>>>>>>> join, > >>>>>>>>>>>>>> even > >>>>>>>>>>>>>>>>>> providing a different implementation (i.e. > >>>>> in-memory) > >>>>>> if > >>>>>>>>>>>>>> desired. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> With that in mind I'm considering changing the > >> KIP > >>> to > >>>>>>>> remove > >>>>>>>>> the > >>>>>>>>>>>>>>>> changes to > >>>>>>>>>>>>>>>>>> Materialized, and we document very clearly that > >> by > >>>>>>>> providing a > >>>>>>>>>>>>>>>> Materialized > >>>>>>>>>>>>>>>>>> object with a name is only for naming the state > >>>> store, > >>>>>>>> hence > >>>>>>>>> the > >>>>>>>>>>>>>>>> changelog > >>>>>>>>>>>>>>>>>> topics and any possible configurations of the > >>> store, > >>>>> but > >>>>>>>> this > >>>>>>>>>> store > >>>>>>>>>>>>>>>> *will > >>>>>>>>>>>>>>>>>> not be available for IQ.* > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> WDYT? > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Guozhang, > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> 1. About not breaking compatibility of > >>> stream-stream > >>>>>> join > >>>>>>>>>>>>>>>> materialized > >>>>>>>>>>>>>>>>>>> stores: I think this is a valid issue to tackle, > >>> but > >>>>>> after > >>>>>>>>>>>>>> thinking > >>>>>>>>>>>>>>>> about > >>>>>>>>>>>>>>>>>>> it once more I'm not sure if exposing > >> Materialized > >>>>> would > >>>>>>>> be a > >>>>>>>>>>>>>> good > >>>>>>>>>>>>>>>>>> solution > >>>>>>>>>>>>>>>>>>> here. My rationles: > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> 1.a For stream-stream join, our current usage of > >>>>>>>> window-store > >>>>>>>>>> is > >>>>>>>>>>>>>> not > >>>>>>>>>>>>>>>>>> ideal, > >>>>>>>>>>>>>>>>>>> and we want to modify it in the near future to > >> be > >>>> more > >>>>>>>>>>>>>> efficient. Not > >>>>>>>>>>>>>>>>>>> allowing users to override such state store > >>> backend > >>>>>> gives > >>>>>>>> us > >>>>>>>>>> such > >>>>>>>>>>>>>>>> freedom > >>>>>>>>>>>>>>>>>>> (which was also considered in the original DSL > >>>>> design), > >>>>>>>>> whereas > >>>>>>>>>>>>>>>> getting a > >>>>>>>>>>>>>>>>>>> Materialized<WindowStore> basically kicks out > >> that > >>>>>> freedom > >>>>>>>>> out > >>>>>>>>>>>>>> of the > >>>>>>>>>>>>>>>>>>> window. > >>>>>>>>>>>>>>>>>>> 1.b For strema-stream join, in our original > >> design > >>>> we > >>>>>>>> intend > >>>>>>>>> to > >>>>>>>>>>>>>>>> "never" > >>>>>>>>>>>>>>>>>>> want users to query the state, since it is just > >>> for > >>>>>>>> buffering > >>>>>>>>>> the > >>>>>>>>>>>>>>>>>> upcoming > >>>>>>>>>>>>>>>>>>> records from the stream. Now I know that some > >>> users > >>>>> may > >>>>>>>>> indeed > >>>>>>>>>>>>>> want > >>>>>>>>>>>>>>>> to > >>>>>>>>>>>>>>>>>>> query it from the debugging perspective, but > >>> still I > >>>>>>>>> concerned > >>>>>>>>>>>>>> about > >>>>>>>>>>>>>>>>>>> whether leveraging IQ for debugging purposes > >> would > >>>> be > >>>>>> the > >>>>>>>>> right > >>>>>>>>>>>>>>>> solution > >>>>>>>>>>>>>>>>>>> here. And adding Materialized object opens the > >>> door > >>>> to > >>>>>> let > >>>>>>>>>> users > >>>>>>>>>>>>>>>> query > >>>>>>>>>>>>>>>>>>> about it (unless we did something intentionally > >> to > >>>>> still > >>>>>>>>>> forbids > >>>>>>>>>>>>>> it), > >>>>>>>>>>>>>>>>>> which > >>>>>>>>>>>>>>>>>>> also restricts us in the future. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> 2. About the coupling between > >> Materialized.name() > >>>> and > >>>>>>>>>> queryable: > >>>>>>>>>>>>>>>> again I > >>>>>>>>>>>>>>>>>>> think this is a valid issue. But I'm not sure if > >>> the > >>>>>>>> current > >>>>>>>>>>>>>>>>>>> "withQuerryingDisabled / Enabled" at > >> Materialized > >>> is > >>>>> the > >>>>>>>> best > >>>>>>>>>>>>>>>> approach. > >>>>>>>>>>>>>>>>>>> Here I think I agree with John, that generally > >>>>> speaking > >>>>>>>> it's > >>>>>>>>>>>>>> better > >>>>>>>>>>>>>>>> be a > >>>>>>>>>>>>>>>>>>> control function on the `KTable` itself, rather > >>> than > >>>>> on > >>>>>>>>>>>>>>>> `Materialized`, > >>>>>>>>>>>>>>>>>> so > >>>>>>>>>>>>>>>>>>> fixing it via adding functions through > >>>> `Materialized` > >>>>>>>> seems > >>>>>>>>>> not a > >>>>>>>>>>>>>>>> natural > >>>>>>>>>>>>>>>>>> approach either. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> I understand your thoughts here, and up to a > >>> point, I > >>>>>> agree > >>>>>>>>> with > >>>>>>>>>>>>>> you. > >>>>>>>>>>>>>>>>>> But concerning not providing Materialized as it > >> may > >>>>>>>> restrict > >>>>>>>>> us > >>>>>>>>>> in > >>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>> future for delivering different implementations, > >>> I'm > >>>>>>>> wondering > >>>>>>>>>> if > >>>>>>>>>>>>>> we > >>>>>>>>>>>>>>>> are > >>>>>>>>>>>>>>>>>> doing some premature optimization here. > >>>>>>>>>>>>>>>>>> My rationale for saying so > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> 1. I think the cost of not allowing the naming > >>> of > >>>>>> state > >>>>>>>>>> stores > >>>>>>>>>>>>>> for > >>>>>>>>>>>>>>>> joins > >>>>>>>>>>>>>>>>>> is too big of a gap to leave. IMHO for joins > >>> to > >>>>>> follow > >>>>>>>>> the > >>>>>>>>>>>>>> current > >>>>>>>>>>>>>>>>>> pattern of using Materialized for naming state > >>>>> stores > >>>>>>>> would > >>>>>>>>>> be > >>>>>>>>>>>>>> what > >>>>>>>>>>>>>>>> most > >>>>>>>>>>>>>>>>>> users would expect to use. As I said in my > >>>> comments > >>>>>>>>> above, I > >>>>>>>>>>>>>> think > >>>>>>>>>>>>>>>> we > >>>>>>>>>>>>>>>>>> should *not include* the changes to > >> Materialized > >>>> and > >>>>>>>>> enforce > >>>>>>>>>>>>>> named > >>>>>>>>>>>>>>>>>> stores for joins as unavailable for IQ. > >>>>>>>>>>>>>>>>>> 2. We'll still have the join methods available > >>>>>> without a > >>>>>>>>>>>>>>>> Materialized > >>>>>>>>>>>>>>>>>> allowing us to do something different > >> internally > >>>> if > >>>>> a > >>>>>>>>>>>>>> Materialized > >>>>>>>>>>>>>>>> is > >>>>>>>>>>>>>>>>>> not > >>>>>>>>>>>>>>>>>> provided. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Overall, I'm thinking maybe we should still use > >>> two > >>>>>> stones > >>>>>>>>>> rather > >>>>>>>>>>>>>>>> than > >>>>>>>>>>>>>>>>>> one > >>>>>>>>>>>>>>>>>>> to kill these two birds, and probably for this > >> KIP > >>>> we > >>>>>> just > >>>>>>>>>> focus > >>>>>>>>>>>>>> on > >>>>>>>>>>>>>>>> 1) > >>>>>>>>>>>>>>>>>>> above. And for that I'd like to not expose the > >>>>>>>> Materialized > >>>>>>>>>>>>>> either > >>>>>>>>>>>>>>>> for > >>>>>>>>>>>>>>>>>>> rationales that I've listed above. Instead, we > >>> just > >>>>>>>> restrict > >>>>>>>>>>>>>> KIP-307 > >>>>>>>>>>>>>>>> to > >>>>>>>>>>>>>>>>>> NOT > >>>>>>>>>>>>>>>>>>> use the Joined.name for state store names and > >>> always > >>>>> use > >>>>>>>>>> internal > >>>>>>>>>>>>>>>> names > >>>>>>>>>>>>>>>>>> as > >>>>>>>>>>>>>>>>>>> well, which admittedly indeed leaves a hole of > >> not > >>>>> being > >>>>>>>> able > >>>>>>>>>> to > >>>>>>>>>>>>>>>> cover > >>>>>>>>>>>>>>>>>> all > >>>>>>>>>>>>>>>>>>> internal names here, but now I feel this `hole` > >>> may > >>>>>>>> better be > >>>>>>>>>>>>>> filled > >>>>>>>>>>>>>>>> by, > >>>>>>>>>>>>>>>>>>> e.g. not creating changelog topics but just use > >>> the > >>>>>>>> upstream > >>>>>>>>> to > >>>>>>>>>>>>>>>>>>> re-bootstrap the materialized store, more > >>>> concretely: > >>>>>> when > >>>>>>>>>>>>>>>> materializing > >>>>>>>>>>>>>>>>>>> the store, try to piggy-back the changelog topic > >>> on > >>>> an > >>>>>>>>> existing > >>>>>>>>>>>>>>>> topic, > >>>>>>>>>>>>>>>>>> e.g. > >>>>>>>>>>>>>>>>>>> a) if the stream is coming directly from some > >>> source > >>>>>> topic > >>>>>>>>>>>>>> (including > >>>>>>>>>>>>>>>>>>> repartition topic), make that as changelog topic > >>> and > >>>>> if > >>>>>>>> it is > >>>>>>>>>>>>>>>> repartition > >>>>>>>>>>>>>>>>>>> topic change the retention / data purging policy > >>>>>>>> necessarily > >>>>>>>>> as > >>>>>>>>>>>>>>>> well; b) > >>>>>>>>>>>>>>>>>> if > >>>>>>>>>>>>>>>>>>> the stream is coming from some stateless > >>> operators, > >>>>>>>> delegate > >>>>>>>>>> that > >>>>>>>>>>>>>>>>>> stateless > >>>>>>>>>>>>>>>>>>> operator to the parent stream similar as a); if > >>> the > >>>>>>>> stream is > >>>>>>>>>>>>>> coming > >>>>>>>>>>>>>>>> from > >>>>>>>>>>>>>>>>>> a > >>>>>>>>>>>>>>>>>>> stream-stream join which is the only stateful > >>>> operator > >>>>>>>> that > >>>>>>>>> can > >>>>>>>>>>>>>>>> result in > >>>>>>>>>>>>>>>>>> a > >>>>>>>>>>>>>>>>>>> stream, consider merging the join into multi-way > >>>> joins > >>>>>>>> (yes, > >>>>>>>>>>>>>> this is > >>>>>>>>>>>>>>>> a > >>>>>>>>>>>>>>>>>> very > >>>>>>>>>>>>>>>>>>> hand-wavy thought, but the point here is that we > >>> do > >>>>> not > >>>>>>>> try > >>>>>>>>> to > >>>>>>>>>>>>>>>> tackle it > >>>>>>>>>>>>>>>>>>> now but leave it for a better solution :). > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> I really like this idea! I agree with you in > >> that > >>>> this > >>>>>>>>> approach > >>>>>>>>>>>>>> to too > >>>>>>>>>>>>>>>>>> much for adding in this KIP, but we could pick it > >>> up > >>>>>> later > >>>>>>>> and > >>>>>>>>>>>>>>>> leverage the > >>>>>>>>>>>>>>>>>> Optimization framework to accomplish this re-use. > >>>>>>>>>>>>>>>>>> Again, while I agree we should break the naming > >> of > >>>> join > >>>>>>>> state > >>>>>>>>>>>>>> stores > >>>>>>>>>>>>>>>> from > >>>>>>>>>>>>>>>>>> KIP-307, IMHO it's something we should fix now as > >>> it > >>>>> will > >>>>>>>> be > >>>>>>>>> the > >>>>>>>>>>>>>> last > >>>>>>>>>>>>>>>> piece > >>>>>>>>>>>>>>>>>> we can provide to give users the ability to > >>>> completely > >>>>>> make > >>>>>>>>>> their > >>>>>>>>>>>>>>>>>> topologies "upgrade proof" when adding additional > >>>>>>>> operations. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Thanks again to both of you for comments and I > >> look > >>>>>>>> forward to > >>>>>>>>>>>>>> hearing > >>>>>>>>>>>>>>>> back > >>>>>>>>>>>>>>>>>> from you. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Regards, > >>>>>>>>>>>>>>>>>> Bill > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> On Thu, Jun 20, 2019 at 2:33 PM Guozhang Wang < > >>>>>>>>>> wangg...@gmail.com> > >>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Hello Bill, > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Thanks for the KIP. Glad to see that we can > >> likely > >>>>>>>> shooting > >>>>>>>>> two > >>>>>>>>>>>>>> birds > >>>>>>>>>>>>>>>>>> with > >>>>>>>>>>>>>>>>>>> one stone. I have some concerns though about > >> those > >>>>> "two > >>>>>>>>> birds" > >>>>>>>>>>>>>>>>>> themselves: > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> 1. About not breaking compatibility of > >>> stream-stream > >>>>>> join > >>>>>>>>>>>>>>>> materialized > >>>>>>>>>>>>>>>>>>> stores: I think this is a valid issue to tackle, > >>> but > >>>>>> after > >>>>>>>>>>>>>> thinking > >>>>>>>>>>>>>>>> about > >>>>>>>>>>>>>>>>>>> it once more I'm not sure if exposing > >> Materialized > >>>>> would > >>>>>>>> be a > >>>>>>>>>>>>>> good > >>>>>>>>>>>>>>>>>> solution > >>>>>>>>>>>>>>>>>>> here. My rationles: > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> 1.a For stream-stream join, our current usage of > >>>>>>>> window-store > >>>>>>>>>> is > >>>>>>>>>>>>>> not > >>>>>>>>>>>>>>>>>> ideal, > >>>>>>>>>>>>>>>>>>> and we want to modify it in the near future to > >> be > >>>> more > >>>>>>>>>>>>>> efficient. Not > >>>>>>>>>>>>>>>>>>> allowing users to override such state store > >>> backend > >>>>>> gives > >>>>>>>> us > >>>>>>>>>> such > >>>>>>>>>>>>>>>> freedom > >>>>>>>>>>>>>>>>>>> (which was also considered in the original DSL > >>>>> design), > >>>>>>>>> whereas > >>>>>>>>>>>>>>>> getting a > >>>>>>>>>>>>>>>>>>> Materialized<WindowStore> basically kicks out > >> that > >>>>>> freedom > >>>>>>>>> out > >>>>>>>>>>>>>> of the > >>>>>>>>>>>>>>>>>>> window. > >>>>>>>>>>>>>>>>>>> 1.b For strema-stream join, in our original > >> design > >>>> we > >>>>>>>> intend > >>>>>>>>> to > >>>>>>>>>>>>>>>> "never" > >>>>>>>>>>>>>>>>>>> want users to query the state, since it is just > >>> for > >>>>>>>> buffering > >>>>>>>>>> the > >>>>>>>>>>>>>>>>>> upcoming > >>>>>>>>>>>>>>>>>>> records from the stream. Now I know that some > >>> users > >>>>> may > >>>>>>>>> indeed > >>>>>>>>>>>>>> want > >>>>>>>>>>>>>>>> to > >>>>>>>>>>>>>>>>>>> query it from the debugging perspective, but > >>> still I > >>>>>>>>> concerned > >>>>>>>>>>>>>> about > >>>>>>>>>>>>>>>>>>> whether leveraging IQ for debugging purposes > >> would > >>>> be > >>>>>> the > >>>>>>>>> right > >>>>>>>>>>>>>>>> solution > >>>>>>>>>>>>>>>>>>> here. And adding Materialized object opens the > >>> door > >>>> to > >>>>>> let > >>>>>>>>>> users > >>>>>>>>>>>>>>>> query > >>>>>>>>>>>>>>>>>>> about it (unless we did something intentionally > >> to > >>>>> still > >>>>>>>>>> forbids > >>>>>>>>>>>>>> it), > >>>>>>>>>>>>>>>>>> which > >>>>>>>>>>>>>>>>>>> also restricts us in the future. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> 2. About the coupling between > >> Materialized.name() > >>>> and > >>>>>>>>>> queryable: > >>>>>>>>>>>>>>>> again I > >>>>>>>>>>>>>>>>>>> think this is a valid issue. But I'm not sure if > >>> the > >>>>>>>> current > >>>>>>>>>>>>>>>>>>> "withQuerryingDisabled / Enabled" at > >> Materialized > >>> is > >>>>> the > >>>>>>>> best > >>>>>>>>>>>>>>>> approach. > >>>>>>>>>>>>>>>>>>> Here I think I agree with John, that generally > >>>>> speaking > >>>>>>>> it's > >>>>>>>>>>>>>> better > >>>>>>>>>>>>>>>> be a > >>>>>>>>>>>>>>>>>>> control function on the `KTable` itself, rather > >>> than > >>>>> on > >>>>>>>>>>>>>>>> `Materialized`, > >>>>>>>>>>>>>>>>>> so > >>>>>>>>>>>>>>>>>>> fixing it via adding functions through > >>>> `Materialized` > >>>>>>>> seems > >>>>>>>>>> not a > >>>>>>>>>>>>>>>> natural > >>>>>>>>>>>>>>>>>>> approach either. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Overall, I'm thinking maybe we should still use > >>> two > >>>>>> stones > >>>>>>>>>> rather > >>>>>>>>>>>>>>>> than > >>>>>>>>>>>>>>>>>> one > >>>>>>>>>>>>>>>>>>> to kill these two birds, and probably for this > >> KIP > >>>> we > >>>>>> just > >>>>>>>>>> focus > >>>>>>>>>>>>>> on > >>>>>>>>>>>>>>>> 1) > >>>>>>>>>>>>>>>>>>> above. And for that I'd like to not expose the > >>>>>>>> Materialized > >>>>>>>>>>>>>> either > >>>>>>>>>>>>>>>> for > >>>>>>>>>>>>>>>>>>> rationales that I've listed above. Instead, we > >>> just > >>>>>>>> restrict > >>>>>>>>>>>>>> KIP-307 > >>>>>>>>>>>>>>>> to > >>>>>>>>>>>>>>>>>> NOT > >>>>>>>>>>>>>>>>>>> use the Joined.name for state store names and > >>> always > >>>>> use > >>>>>>>>>> internal > >>>>>>>>>>>>>>>> names > >>>>>>>>>>>>>>>>>> as > >>>>>>>>>>>>>>>>>>> well, which admittedly indeed leaves a hole of > >> not > >>>>> being > >>>>>>>> able > >>>>>>>>>> to > >>>>>>>>>>>>>>>> cover > >>>>>>>>>>>>>>>>>> all > >>>>>>>>>>>>>>>>>>> internal names here, but now I feel this `hole` > >>> may > >>>>>>>> better be > >>>>>>>>>>>>>> filled > >>>>>>>>>>>>>>>> by, > >>>>>>>>>>>>>>>>>>> e.g. not creating changelog topics but just use > >>> the > >>>>>>>> upstream > >>>>>>>>> to > >>>>>>>>>>>>>>>>>>> re-bootstrap the materialized store, more > >>>> concretely: > >>>>>> when > >>>>>>>>>>>>>>>> materializing > >>>>>>>>>>>>>>>>>>> the store, try to piggy-back the changelog topic > >>> on > >>>> an > >>>>>>>>> existing > >>>>>>>>>>>>>>>> topic, > >>>>>>>>>>>>>>>>>> e.g. > >>>>>>>>>>>>>>>>>>> a) if the stream is coming directly from some > >>> source > >>>>>> topic > >>>>>>>>>>>>>> (including > >>>>>>>>>>>>>>>>>>> repartition topic), make that as changelog topic > >>> and > >>>>> if > >>>>>>>> it is > >>>>>>>>>>>>>>>> repartition > >>>>>>>>>>>>>>>>>>> topic change the retention / data purging policy > >>>>>>>> necessarily > >>>>>>>>> as > >>>>>>>>>>>>>>>> well; b) > >>>>>>>>>>>>>>>>>> if > >>>>>>>>>>>>>>>>>>> the stream is coming from some stateless > >>> operators, > >>>>>>>> delegate > >>>>>>>>>> that > >>>>>>>>>>>>>>>>>> stateless > >>>>>>>>>>>>>>>>>>> operator to the parent stream similar as a); if > >>> the > >>>>>>>> stream is > >>>>>>>>>>>>>> coming > >>>>>>>>>>>>>>>>>> from a > >>>>>>>>>>>>>>>>>>> stream-stream join which is the only stateful > >>>> operator > >>>>>>>> that > >>>>>>>>> can > >>>>>>>>>>>>>>>> result > >>>>>>>>>>>>>>>>>> in a > >>>>>>>>>>>>>>>>>>> stream, consider merging the join into multi-way > >>>> joins > >>>>>>>> (yes, > >>>>>>>>>>>>>> this is > >>>>>>>>>>>>>>>> a > >>>>>>>>>>>>>>>>>> very > >>>>>>>>>>>>>>>>>>> hand-wavy thought, but the point here is that we > >>> do > >>>>> not > >>>>>>>> try > >>>>>>>>> to > >>>>>>>>>>>>>>>> tackle it > >>>>>>>>>>>>>>>>>>> now but leave it for a better solution :). > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Guozhang > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> On Wed, Jun 19, 2019 at 11:41 AM John Roesler < > >>>>>>>>>> j...@confluent.io > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> Hi Bill, > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> Thanks for the KIP! Awesome job catching this > >>>>>> unexpected > >>>>>>>>>>>>>>>> consequence > >>>>>>>>>>>>>>>>>>>> of the prior KIPs before it was released. > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> The proposal looks good to me. On top of just > >>>> fixing > >>>>>> the > >>>>>>>>>>>>>> problem, > >>>>>>>>>>>>>>>> it > >>>>>>>>>>>>>>>>>>>> seems to address two other pain points: > >>>>>>>>>>>>>>>>>>>> * that naming a s > > >