Thanks Bill! Using a new configuration object was suggested by John in the original DISCUSS thread already. We rejected it because we wanted to avoid a new configuration class. However, given your analysis, it seems it's actually the right choice to introduce a new configuration class.
Hence, overall I am +1 on the proposal. Some nits about names (as always :)) - `StreamJoined.as()` does not sound right for the `StoreSupplier` overload. Maybe it's better to call that static method `with` (not 100% sure) - Should we use plural instead of singular -> `Stream{s}Joined` -> or keep singular but call it `StreamJoin` `Joined` seems to refer to the input stream(s) while `Join` would refer to the join-operator Thoughts? You suggest to deprecate existing overload what I support -- can you list deprecated method in the "Public Interface" section? -Matthias On 9/17/19 9:39 AM, Guozhang Wang wrote: > Hello Bill, > > Thanks for the updated KIP. I made a pass on the StreamJoined section. Just > a quick question from user's perspective: if a user wants to provide a > customized store-supplier, she is forced to also provide a name via the > store-supplier. So there's no way to say "I want to provide my own store > engine but let the library decide its name", is that right? > > > Guozhang > > > On Tue, Sep 17, 2019 at 8:53 AM Bill Bejeck <bbej...@gmail.com> wrote: > >> Bumping this discussion as we need to re-vote before the KIP deadline. >> >> On Fri, Sep 13, 2019 at 10:29 AM Bill Bejeck <bbej...@gmail.com> wrote: >> >>> Hi All, >>> >>> While working on the implementation of KIP-479, some issues came to light >>> that the KIP as written won't work. I have updated the KIP with a >> solution >>> I believe will solve the original problem as well as address the >>> impediment to the initial approach. >>> >>> This update is a significant change, so please review the updated KIP >>> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-479%3A+Add+StreamJoined+config+object+to+Join >> and >>> provide feedback. After we conclude the discussion, there will be a >>> re-vote. >>> >>> Thanks! >>> Bill >>> >>> On Wed, Jul 17, 2019 at 7:01 PM Guozhang Wang <wangg...@gmail.com> >> wrote: >>> >>>> Hi Bill, thanks for your explanations. I'm on board with your decision >>>> too. >>>> >>>> >>>> Guozhang >>>> >>>> On Wed, Jul 17, 2019 at 10:20 AM Bill Bejeck <bbej...@gmail.com> wrote: >>>> >>>>> Thanks for the response, John. >>>>> >>>>>> If I can offer my thoughts, it seems better to just document on the >>>>>> Stream join javadoc for the Materialized parameter that it will not >>>>>> make the join result queriable. I'm not opposed to the queriable >> flag >>>>>> in general, but introducing it is a much larger consideration that >> has >>>>>> previously derailed this KIP discussion. In the interest of just >>>>>> closing the gap and keeping the API change small, it seems better to >>>>>> just go with documentation for now. >>>>> >>>>> I agree with your statement here. IMHO the most important goal of >> this >>>> KIP >>>>> is to not breaking existing users and gain some consistency of the >> API. >>>>> >>>>> I'll update the KIP accordingly. >>>>> >>>>> -Bill >>>>> >>>>> On Tue, Jul 16, 2019 at 11:55 AM John Roesler <j...@confluent.io> >>>> wrote: >>>>> >>>>>> Hi Bill, >>>>>> >>>>>> Thanks for driving this KIP toward a conclusion. I'm on board with >>>>>> your decision. >>>>>> >>>>>> You didn't mention whether you're still proposing to add the >>>>>> "queriable" flag to the Materialized config object, or just document >>>>>> that a Stream join is never queriable. Both options have come up >>>>>> earlier in the discussion, so it would be good to pin this down. >>>>>> >>>>>> If I can offer my thoughts, it seems better to just document on the >>>>>> Stream join javadoc for the Materialized parameter that it will not >>>>>> make the join result queriable. I'm not opposed to the queriable >> flag >>>>>> in general, but introducing it is a much larger consideration that >> has >>>>>> previously derailed this KIP discussion. In the interest of just >>>>>> closing the gap and keeping the API change small, it seems better to >>>>>> just go with documentation for now. >>>>>> >>>>>> Thanks again, >>>>>> -John >>>>>> >>>>>> On Thu, Jul 11, 2019 at 2:45 PM Bill Bejeck <bbej...@gmail.com> >>>> wrote: >>>>>>> >>>>>>> Thanks all for the great discussion so far. >>>>>>> >>>>>>> Everyone has made excellent points, and I appreciate the detail >>>>> everyone >>>>>>> has put into their arguments. >>>>>>> >>>>>>> However, after carefully evaluating all the points made so far, >>>>> creating >>>>>> an >>>>>>> overload with Materialized is still my #1 option. >>>>>>> My reasoning for saying so is two-fold: >>>>>>> >>>>>>> 1. It's a small change, and IMHO since it's consistent with our >>>>>> current >>>>>>> API concerning state store usage, the cognitive load on users >>>> will >>>>> be >>>>>>> minimal. >>>>>>> 2. It achieves the most important goal of this KIP, namely to >>>> close >>>>>> the >>>>>>> gap of naming state stores independently of the join operator >>>> name. >>>>>>> >>>>>>> Additionally, I agree with the points made by Matthias earlier (I >>>>> realize >>>>>>> there is some overlap here). >>>>>>> >>>>>>>> - the main purpose of this KIP is to close the naming gap what >> we >>>>>> achieve >>>>>>>> - we can allow people to use the new in-memory store >>>>>>>> - we allow people to enable/disable caching >>>>>>>> - we unify the API >>>>>>>> - we decouple querying from naming >>>>>>>> - it's a small API change >>>>>>> >>>>>>> Although it's not a perfect solution, IMHO the positives of using >>>>>>> Materialize far outweigh the negatives, and from what we've >>>> discussed >>>>> so >>>>>>> far, anything we implement seems to involve an additional change >>>> down >>>>> the >>>>>>> road. >>>>>>> >>>>>>> If others are still strongly opposed to using Materialized, my >> other >>>>>>> preferences would be >>>>>>> >>>>>>> 1. Add a "withStoreName" to Joined. Although I agree with >>>> Guozhang >>>>>> that >>>>>>> having a parameter that only applies to one use-case would be >>>>> clumsy. >>>>>>> 2. Add a String overload for naming the store, but this would >> be >>>> my >>>>>>> least favorite option as IMHO this seems to be a step backward >>>> from >>>>>> why we >>>>>>> introduced configuration objects in the first place. >>>>>>> >>>>>>> Thanks, >>>>>>> Bill >>>>>>> >>>>>>> On Thu, Jun 27, 2019 at 4:45 PM Matthias J. Sax < >>>> matth...@confluent.io >>>>>> >>>>>>> wrote: >>>>>>> >>>>>>>> Thanks for the KIP Bill! >>>>>>>> >>>>>>>> Great discussion to far. >>>>>>>> >>>>>>>> About John's idea about querying upstream stores and don't >>>>> materialize >>>>>> a >>>>>>>> store: I agree with Bill that this seems to be an orthogonal >>>>> question, >>>>>>>> and it might be better to treat it as an independent >> optimization >>>> and >>>>>>>> exclude from this KIP. >>>>>>>> >>>>>>>>> What should be the behavior if there is no store >>>>>>>>> configured (e.g., if Materialized with only serdes) and >>>> querying is >>>>>>>>> enabled? >>>>>>>> >>>>>>>> IMHO, this could be an error case. If one wants to query a >> store, >>>>> they >>>>>>>> need to provide a name -- if you don't know the name, how would >>>> you >>>>>>>> actually query the store (even if it would be possible to get >> the >>>>> name >>>>>>>> from the `TopologyDescription`, it seems clumsy). >>>>>>>> >>>>>>>> If we don't want to throw an error, materializing seems to be >> the >>>>> right >>>>>>>> option, to exclude "query optimization" from this KIP. I would >> be >>>> ok >>>>>>>> with this option, even if it's clumsy to get the name from >>>>>>>> `TopologyDescription`; hence, I would prefer to treat it as an >>>> error. >>>>>>>> >>>>>>>>> To get back to the current behavior, users would have to >>>>>>>>> add a "bytes store supplier" to the Materialized to indicate >>>> that, >>>>>>>>> yes, they really want a state store there. >>>>>>>> >>>>>>>> This sound like a quite subtle semantic difference on how to use >>>> the >>>>>>>> API. Might be hard to explain to users. I would prefer to not >>>>>> introduce it. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> About Guozhang's points: >>>>>>>> >>>>>>>> 1a) That is actually a good point. However, I believe we cannot >>>> get >>>>>>>> around this issue easily, and it seems ok to me, to expose the >>>> actual >>>>>>>> store type we are using. (More thoughts later.) >>>>>>>> >>>>>>>> 1b) I don't see an issue with allowing users to query all >> stores? >>>>> What >>>>>>>> is the rational behind it? What do we gain by not allowing it? >>>>>>>> >>>>>>>> 2) While I understand what you are saying, we also want/need to >>>> have >>>>> a >>>>>>>> way in the PAPI to allow users adding "internal/private" >>>>> non-queryable >>>>>>>> stores to a topology. That's possible via >>>>>>>> `Materialized#withQueryingDisabled()`. We could also update >>>>>>>> `Topology#addStateStore(StoreBuilder, boolean isQueryable, >>>>> String...)` >>>>>>>> to address this. Again, I agree with Bill that the current API >> is >>>>> built >>>>>>>> in a certain way, and if we want to change it, it should be a >>>>> separate >>>>>>>> KIP, as it seems to be an orthogonal concern. >>>>>>>> >>>>>>>>> Instead, we just restrict KIP-307 to NOT >>>>>>>>> use the Joined.name for state store names and always use >>>> internal >>>>>> names >>>>>>>> as >>>>>>>>> well, which admittedly indeed leaves a hole of not being able >> to >>>>>> cover >>>>>>>> all >>>>>>>>> internal names here >>>>>>>> >>>>>>>> I think it's important to close this gap. Naming entities seems >>>> to a >>>>>>>> binary feature: if there is a gap, the feature is more or less >>>>> useless, >>>>>>>> rendering KIP-307 void. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> I like John's detailed list of required features and what >>>>>>>> Materialized/WindowByteStoreSuppliers offers. My take is, that >>>> adding >>>>>>>> Materialized including the required run-time checks is the best >>>>> option >>>>>>>> we have, for the following reasons: >>>>>>>> >>>>>>>> - the main purpose of this KIP is to close the naming gap what >> we >>>>>> achieve >>>>>>>> - we can allow people to use the new in-memory store >>>>>>>> - we allow people to enable/disable caching >>>>>>>> - we unify the API >>>>>>>> - we decouple querying from naming >>>>>>>> - it's a small API change >>>>>>>> >>>>>>>> Adding an overload and only passing in a name, would address the >>>> main >>>>>>>> purpose of the KIP. However, it falls short on all the other >>>>> "goodies". >>>>>>>> As you mentioned, passing in `Materialized` might not be perfect >>>> and >>>>>>>> maybe we need to deprecate is at some point; but this is also >> true >>>>> for >>>>>>>> passing in just a name. >>>>>>>> >>>>>>>> I am also not convinced, that a `StreamJoinStore` would resolve >>>> all >>>>> the >>>>>>>> issues. In the end, as long as we are using a `WindowedStore` >>>>>>>> internally, we need to expose this "implemenation detail" to >>>> users to >>>>>>>> allow them to plug in a custom store. Adding `Materialized` seem >>>> to >>>>> be >>>>>>>> the best short-term fix from my point of view. >>>>>>>> >>>>>>>> >>>>>>>> -Matthias >>>>>>>> >>>>>>>> >>>>>>>> On 6/27/19 9:56 AM, Guozhang Wang wrote: >>>>>>>>> Hi John, >>>>>>>>> >>>>>>>>> I actually feels better about a new interface but I'm not sure >>>> if >>>>> we >>>>>>>> would >>>>>>>>> need the full configuration of store / log / cache, now or in >>>> the >>>>>> future >>>>>>>>> ever for stream-stream join. >>>>>>>>> >>>>>>>>> Right now I feel that 1) we want to improve our implementation >>>> of >>>>>>>>> stream-stream join, and potentially also allow users to >>>> customize >>>>>> this >>>>>>>>> implementation but with a more suitable interface than the >>>> current >>>>>>>>> WindowStore interface, how to do that is less clear and >>>>>> execution-wise >>>>>>>> it's >>>>>>>>> (arguably..) not urgent; 2) we want to close the last gap >>>>>> (Stream-stream >>>>>>>>> join) of allowing users to specify all internal names to help >> on >>>>>> backward >>>>>>>>> compatibility, which is urgent. >>>>>>>>> >>>>>>>>> Therefore if we want to unblock 2) from 1) in the near term, I >>>> feel >>>>>>>>> slightly inclined to just add overload functions that takes >> in a >>>>>> store >>>>>>>> name >>>>>>>>> for stream-stream joins only -- and admittedly, in the future >>>> this >>>>>>>> function >>>>>>>>> maybe deprecated -- i.e. if we have to do something that we >> "may >>>>>> regret" >>>>>>>> in >>>>>>>>> the future, I'd like to pick the least intrusive option. >>>>>>>>> >>>>>>>>> About `Joined#withStoreName`: since the Joined class itself is >>>> also >>>>>> used >>>>>>>> in >>>>>>>>> other join types, I feel less comfortable to have a >>>>>>>> `Joined#withStoreName` >>>>>>>>> which is only going to be used by stream-stream join. Or >> maybe I >>>>> miss >>>>>>>>> something here about the "latter" case that you are referring >>>> to? >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Guozhang >>>>>>>>> >>>>>>>>> On Mon, Jun 24, 2019 at 12:16 PM John Roesler < >>>> j...@confluent.io> >>>>>> wrote: >>>>>>>>> >>>>>>>>>> Thanks Guozhang, >>>>>>>>>> >>>>>>>>>> Yep. Maybe we can consider just exactly what the join needs: >>>>>>>>>> >>>>>>>>>>> the WindowStore<Bytes, byte[]> itself is fine, if overly >>>> broad, >>>>>>>>>>> since the only two methods we need are `window.put(key, >> value, >>>>>>>>>>> context().timestamp())` and `WindowStoreIterator<V2> iter = >>>>>>>>>>> window.fetch(key, timeFrom, timeTo)`. >>>>>>>>>> >>>>>>>>>> One "middle ground" would be to extract _this_ into a new >> store >>>>>>>>>> interface, which only supports these API calls, like >>>>>>>>>> StreamJoinStore<K, V>. This would give us the latitude we >> need >>>> to >>>>>>>>>> efficiently support the exact operation without concerning >>>>> ourselves >>>>>>>>>> with all the other things a WindowStore can do (which are >>>>>> unreachable >>>>>>>>>> for the join use case). It would also let us drop "store >>>>> duplicates" >>>>>>>>>> from the main WindowStore interface, since it only exists to >>>>> support >>>>>>>>>> the join use case. >>>>>>>>>> >>>>>>>>>> If we were to add a new StreamJoinStore interface, then it'd >> be >>>>>>>>>> straightforward how we could add also >>>>>>>>>> `Materialized.as(StreamJoinBytesStoreSupplier)` and use >>>>>> Materialized, >>>>>>>>>> or alternatively add the ability to set the bytes store on >>>> Joined. >>>>>>>>>> >>>>>>>>>> Personally, I'm kind of leaning toward the latter (and also >>>> doing >>>>>>>>>> `Joined#withStoreName`), since adding the new interface to >>>>>>>>>> Materialized then also pollutes the interface for its >> _actual_ >>>> use >>>>>>>>>> case of materializing a table view. Of course, to solve the >>>>>> immediate >>>>>>>>>> problem, all we need is the store name, but we might feel >>>> better >>>>>> about >>>>>>>>>> adding the store name to Joined if we _also_ feel like in the >>>>>> future, >>>>>>>>>> we would add store/log/cache configuration to Joined as well. >>>>>>>>>> >>>>>>>>>> -John >>>>>>>>>> >>>>>>>>>> On Mon, Jun 24, 2019 at 12:56 PM Guozhang Wang < >>>>> wangg...@gmail.com> >>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> Hello John, >>>>>>>>>>> >>>>>>>>>>> My main concern is exactly the first point at the bottom of >>>> your >>>>>>>> analysis >>>>>>>>>>> here: "* configure the bytes store". I'm not sure if using a >>>>> window >>>>>>>> bytes >>>>>>>>>>> store would be ideal for stream-stream windowed join; e.g. >> we >>>>> could >>>>>>>>>>> consider two dimensional list sorted by timestamps and then >> by >>>>>> keys to >>>>>>>> do >>>>>>>>>>> the join, whereas a windowed bytes store is basically sorted >>>> by >>>>> key >>>>>>>>>> first, >>>>>>>>>>> then by timestamp. If we expose the Materialized to let user >>>> pass >>>>>> in a >>>>>>>>>>> windowed bytes store, then we would need to change that if >> we >>>>> want >>>>>> to >>>>>>>>>>> replace it with a different implementation interface. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Guozhang >>>>>>>>>>> >>>>>>>>>>> On Mon, Jun 24, 2019 at 8:59 AM John Roesler < >>>> j...@confluent.io> >>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hey Guozhang and Bill, >>>>>>>>>>>> >>>>>>>>>>>> For what it's worth, I agree with you both! >>>>>>>>>>>> >>>>>>>>>>>> I think it might help the discussion to look concretely at >>>> what >>>>>>>>>>>> Materialized does: >>>>>>>>>>>> * set a WindowBytesStoreSupplier >>>>>>>>>>>> * set a name >>>>>>>>>>>> * set the key/value serdes >>>>>>>>>>>> * disable/enable/configure change-logging >>>>>>>>>>>> * disable/enable caching >>>>>>>>>>>> * configure retention >>>>>>>>>>>> >>>>>>>>>>>> Further, looking into the WindowBytesStoreSupplier, the >>>>> interface >>>>>> lets >>>>>>>>>> you: >>>>>>>>>>>> * get the segment interval >>>>>>>>>>>> * get the window size >>>>>>>>>>>> * get whether "duplicates" are enabled >>>>>>>>>>>> * get the retention period >>>>>>>>>>>> * (obviously) get a WindowStore<Bytes, byte[]> >>>>>>>>>>>> >>>>>>>>>>>> We know that Materialized isn't exactly what we need for >>>> stream >>>>>> joins, >>>>>>>>>>>> but we can see how close Materialized is to what we need. >> If >>>> it >>>>> is >>>>>>>>>>>> close, maybe we can use it and document the gaps, and if it >>>> is >>>>> not >>>>>>>>>>>> close, then maybe we should just add what we need to >> Joined. >>>>>>>>>>>> Stream Join's requirements for its stores: >>>>>>>>>>>> * a multimap store (i.e., it keeps duplicates) for storing >>>>> general >>>>>>>>>>>> (not windowed) keyed records associated with their >> insertion >>>>>> time, and >>>>>>>>>>>> allows efficient time-bounded lookups and also efficient >>>> purges >>>>>> of old >>>>>>>>>>>> data. >>>>>>>>>>>> ** Note, a properly configured WindowBytesStoreSupplier >>>>> satisfies >>>>>> this >>>>>>>>>>>> requirement, and the interface supports the queries we need >>>> to >>>>>> verify >>>>>>>>>>>> the configuration at run-time >>>>>>>>>>>> * set a name for the store >>>>>>>>>>>> * do _not_ set the serdes (they are already set in Joined) >>>>>>>>>>>> * logging could be configurable (set to enabled now) >>>>>>>>>>>> * caching could be configurable (set to enabled now) >>>>>>>>>>>> * do _not_ configure retention (determined by JoinWindows) >>>>>>>>>>>> >>>>>>>>>>>> So, out of six capabilities for Materialized, there are two >>>> we >>>>>> don't >>>>>>>>>>>> want (serdes and retention). These would become run-time >>>> checks >>>>>> if we >>>>>>>>>>>> use it. >>>>>>>>>>>> >>>>>>>>>>>> A third questionable capability is to provide a >>>>>>>>>>>> WindowBytesStoreSupplier. Looking at whether the >>>>>>>>>>>> WindowBytesStoreSupplier is the right interface for Stream >>>> Join: >>>>>>>>>>>> * configuring segment interval is fine >>>>>>>>>>>> * should _not_ configure window size (it's determined by >>>>>> JoinWindows) >>>>>>>>>>>> * duplicates _must_ be enabled >>>>>>>>>>>> * retention should be _at least_ windowSize + gracePeriod, >>>> but >>>>>> note >>>>>>>>>>>> that (unlike for Table window stores) there is no utility >> in >>>>>> having a >>>>>>>>>>>> longer retention time. >>>>>>>>>>>> * the WindowStore<Bytes, byte[]> itself is fine, if overly >>>>> broad, >>>>>>>>>>>> since the only two methods we need are `window.put(key, >>>> value, >>>>>>>>>>>> context().timestamp())` and `WindowStoreIterator<V2> iter = >>>>>>>>>>>> window.fetch(key, timeFrom, timeTo)`. >>>>>>>>>>>> >>>>>>>>>>>> Thus, flattening out the overlap for >> WindowBytesStoreSupplier >>>>>> onto the >>>>>>>>>>>> overlap for Materialized, we have 9 capabilities total >> (note >>>>>> retention >>>>>>>>>>>> is duplicated), we have 4 that we don't want: >>>>>>>>>>>> * do _not_ set the serdes (they are already set in Joined) >>>>>>>>>>>> * do _not_ configure retention (determined by JoinWindows) >>>>>>>>>>>> * should _not_ configure window size (it's determined by >>>>>> JoinWindows) >>>>>>>>>>>> * duplicates _must_ be enabled >>>>>>>>>>>> >>>>>>>>>>>> These gaps would have to be covered with run-time checks if >>>> we >>>>>> re-use >>>>>>>>>>>> Materialized and WindowStoreBytesStoreSupplier both. Maybe >>>> this >>>>>> sounds >>>>>>>>>>>> bad, but consider the other side, that we get 5 new >>>> capabilities >>>>>> we >>>>>>>>>>>> don't require, but are still pretty nice: >>>>>>>>>>>> * configure the bytes store >>>>>>>>>>>> * set a name for the store >>>>>>>>>>>> * configure caching >>>>>>>>>>>> * configure logging >>>>>>>>>>>> * configure segment interval >>>>>>>>>>>> >>>>>>>>>>>> Not sure where this nets us out, but it's food for thought. >>>>>>>>>>>> -John >>>>>>>>>>>> >>>>>>>>>>>> On Sun, Jun 23, 2019 at 7:52 PM Guozhang Wang < >>>>> wangg...@gmail.com >>>>>>> >>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hi Bill, >>>>>>>>>>>>> >>>>>>>>>>>>> I think by giving a Materialized param into stream-stream >>>> join, >>>>>> it's >>>>>>>>>> okay >>>>>>>>>>>>> (though still ideal) to say "we still would not expose the >>>>> store >>>>>> for >>>>>>>>>>>>> queries", but it would sound a bit awkward to say "we >> would >>>>> also >>>>>>>>>> ignore >>>>>>>>>>>>> whatever the passed in store supplier but just use our >>>> default >>>>>> ones" >>>>>>>>>> -- >>>>>>>>>>>>> again the concern is that, if in the future we'd want to >>>> change >>>>>> the >>>>>>>>>>>> default >>>>>>>>>>>>> implementation of join algorithm which no longer rely on a >>>>> window >>>>>>>>>> store >>>>>>>>>>>>> with deduping enabled, then we need to change this API >>>> again by >>>>>>>>>> changing >>>>>>>>>>>>> the store supplier type. >>>>>>>>>>>>> >>>>>>>>>>>>> If we do want to fill this hole for stream-stream join, I >>>> feel >>>>>> just >>>>>>>>>>>> adding >>>>>>>>>>>>> a String typed store-name would even be less >>>> future-intrusive >>>>> if >>>>>> we >>>>>>>>>>>> expect >>>>>>>>>>>>> this parameter to be modified later. >>>>>>>>>>>>> >>>>>>>>>>>>> Does that makes sense? >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Guozhang >>>>>>>>>>>>> >>>>>>>>>>>>> On Sat, Jun 22, 2019 at 12:51 PM Bill Bejeck < >>>>> bbej...@gmail.com> >>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks for the comments John and Guozhang, I'll address >>>> each >>>>>> one of >>>>>>>>>>>> your >>>>>>>>>>>>>> comments in turn. >>>>>>>>>>>>>> >>>>>>>>>>>>>> John, >>>>>>>>>>>>>> >>>>>>>>>>>>>>> I'm wondering about a missing quadrant from the truth >>>> table >>>>>>>>>> involving >>>>>>>>>>>>>>> whether a Materialized is stored or not and querying is >>>>>>>>>>>>>>> enabled/disabled... What should be the behavior if there >>>> is >>>>> no >>>>>>>>>> store >>>>>>>>>>>>>>> configured (e.g., if Materialized with only serdes) and >>>>>> querying >>>>>>>>>> is >>>>>>>>>>>>>> enabled? >>>>>>>>>>>>>> >>>>>>>>>>>>>>> It seems we have two choices: >>>>>>>>>>>>>>> 1. we can force creation of a state store in this case, >> so >>>>> the >>>>>>>>>> store >>>>>>>>>>>>>>> can be used to serve the queries >>>>>>>>>>>>>>> 2. we can provide just a queriable view, basically >>>> letting IQ >>>>>>>>>> query >>>>>>>>>>>>>>> into the "KTableValueGetter", which would transparently >>>>>>>>>> construct the >>>>>>>>>>>>>>> query response by applying the operator logic to the >>>> upstream >>>>>>>>>> state >>>>>>>>>>>> if >>>>>>>>>>>>>>> the operator state isn't already stored. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I agree with your assertion about a missing quadrant from >>>> the >>>>>> truth >>>>>>>>>>>> table. >>>>>>>>>>>>>> Additionally, I too like the concept of a queriable view. >>>>> But I >>>>>>>>>> think >>>>>>>>>>>> that >>>>>>>>>>>>>> goes a bit beyond the scope of this KIP and would like to >>>>> pursue >>>>>>>>>> that >>>>>>>>>>>>>> feature as follow-on work. Also thinking about this KIP >>>> some >>>>>>>>>> more, I'm >>>>>>>>>>>>>> thinking of the changes to Materialized might be a reach >> as >>>>>> well. >>>>>>>>>>>>>> Separating the naming from a store and its queryable >> state >>>>> seems >>>>>>>>>> like a >>>>>>>>>>>>>> complex issue in and of itself and should be treated >>>>>> accordingly. >>>>>>>>>>>>>> >>>>>>>>>>>>>> So here's what I'm thinking now. We add Materialzied to >>>> Join, >>>>>> but >>>>>>>>>> for >>>>>>>>>>>> now, >>>>>>>>>>>>>> we internally disable querying. I know this breaks our >>>>> current >>>>>>>>>>>> semantic >>>>>>>>>>>>>> approach, but I think it's crucial that we do two things >> in >>>>> this >>>>>>>>>> KIP >>>>>>>>>>>>>> >>>>>>>>>>>>>> 1. Break the naming of the state stores from Joined to >>>>>>>>>>>> Materialized, so >>>>>>>>>>>>>> the naming of state stores follows our current pattern >>>> and >>>>>>>>>> enables >>>>>>>>>>>>>> upgrades >>>>>>>>>>>>>> from 2.3 to 2.4 >>>>>>>>>>>>>> 2. Offer the ability to configure the state stores of >>>> the >>>>>> join, >>>>>>>>>> even >>>>>>>>>>>>>> providing a different implementation (i.e. in-memory) >> if >>>>>>>>>> desired. >>>>>>>>>>>>>> >>>>>>>>>>>>>> With that in mind I'm considering changing the KIP to >>>> remove >>>>> the >>>>>>>>>>>> changes to >>>>>>>>>>>>>> Materialized, and we document very clearly that by >>>> providing a >>>>>>>>>>>> Materialized >>>>>>>>>>>>>> object with a name is only for naming the state store, >>>> hence >>>>> the >>>>>>>>>>>> changelog >>>>>>>>>>>>>> topics and any possible configurations of the store, but >>>> this >>>>>> store >>>>>>>>>>>> *will >>>>>>>>>>>>>> not be available for IQ.* >>>>>>>>>>>>>> >>>>>>>>>>>>>> WDYT? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Guozhang, >>>>>>>>>>>>>> >>>>>>>>>>>>>>> 1. About not breaking compatibility of stream-stream >> join >>>>>>>>>>>> materialized >>>>>>>>>>>>>>> stores: I think this is a valid issue to tackle, but >> after >>>>>>>>>> thinking >>>>>>>>>>>> about >>>>>>>>>>>>>>> it once more I'm not sure if exposing Materialized would >>>> be a >>>>>>>>>> good >>>>>>>>>>>>>> solution >>>>>>>>>>>>>>> here. My rationles: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 1.a For stream-stream join, our current usage of >>>> window-store >>>>>> is >>>>>>>>>> not >>>>>>>>>>>>>> ideal, >>>>>>>>>>>>>>> and we want to modify it in the near future to be more >>>>>>>>>> efficient. Not >>>>>>>>>>>>>>> allowing users to override such state store backend >> gives >>>> us >>>>>> such >>>>>>>>>>>> freedom >>>>>>>>>>>>>>> (which was also considered in the original DSL design), >>>>> whereas >>>>>>>>>>>> getting a >>>>>>>>>>>>>>> Materialized<WindowStore> basically kicks out that >> freedom >>>>> out >>>>>>>>>> of the >>>>>>>>>>>>>>> window. >>>>>>>>>>>>>>> 1.b For strema-stream join, in our original design we >>>> intend >>>>> to >>>>>>>>>>>> "never" >>>>>>>>>>>>>>> want users to query the state, since it is just for >>>> buffering >>>>>> the >>>>>>>>>>>>>> upcoming >>>>>>>>>>>>>>> records from the stream. Now I know that some users may >>>>> indeed >>>>>>>>>> want >>>>>>>>>>>> to >>>>>>>>>>>>>>> query it from the debugging perspective, but still I >>>>> concerned >>>>>>>>>> about >>>>>>>>>>>>>>> whether leveraging IQ for debugging purposes would be >> the >>>>> right >>>>>>>>>>>> solution >>>>>>>>>>>>>>> here. And adding Materialized object opens the door to >> let >>>>>> users >>>>>>>>>>>> query >>>>>>>>>>>>>>> about it (unless we did something intentionally to still >>>>>> forbids >>>>>>>>>> it), >>>>>>>>>>>>>> which >>>>>>>>>>>>>>> also restricts us in the future. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 2. About the coupling between Materialized.name() and >>>>>> queryable: >>>>>>>>>>>> again I >>>>>>>>>>>>>>> think this is a valid issue. But I'm not sure if the >>>> current >>>>>>>>>>>>>>> "withQuerryingDisabled / Enabled" at Materialized is the >>>> best >>>>>>>>>>>> approach. >>>>>>>>>>>>>>> Here I think I agree with John, that generally speaking >>>> it's >>>>>>>>>> better >>>>>>>>>>>> be a >>>>>>>>>>>>>>> control function on the `KTable` itself, rather than on >>>>>>>>>>>> `Materialized`, >>>>>>>>>>>>>> so >>>>>>>>>>>>>>> fixing it via adding functions through `Materialized` >>>> seems >>>>>> not a >>>>>>>>>>>> natural >>>>>>>>>>>>>> approach either. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I understand your thoughts here, and up to a point, I >> agree >>>>> with >>>>>>>>>> you. >>>>>>>>>>>>>> But concerning not providing Materialized as it may >>>> restrict >>>>> us >>>>>> in >>>>>>>>>> the >>>>>>>>>>>>>> future for delivering different implementations, I'm >>>> wondering >>>>>> if >>>>>>>>>> we >>>>>>>>>>>> are >>>>>>>>>>>>>> doing some premature optimization here. >>>>>>>>>>>>>> My rationale for saying so >>>>>>>>>>>>>> >>>>>>>>>>>>>> 1. I think the cost of not allowing the naming of >> state >>>>>> stores >>>>>>>>>> for >>>>>>>>>>>> joins >>>>>>>>>>>>>> is too big of a gap to leave. IMHO for joins to >> follow >>>>> the >>>>>>>>>> current >>>>>>>>>>>>>> pattern of using Materialized for naming state stores >>>> would >>>>>> be >>>>>>>>>> what >>>>>>>>>>>> most >>>>>>>>>>>>>> users would expect to use. As I said in my comments >>>>> above, I >>>>>>>>>> think >>>>>>>>>>>> we >>>>>>>>>>>>>> should *not include* the changes to Materialized and >>>>> enforce >>>>>>>>>> named >>>>>>>>>>>>>> stores for joins as unavailable for IQ. >>>>>>>>>>>>>> 2. We'll still have the join methods available >> without a >>>>>>>>>>>> Materialized >>>>>>>>>>>>>> allowing us to do something different internally if a >>>>>>>>>> Materialized >>>>>>>>>>>> is >>>>>>>>>>>>>> not >>>>>>>>>>>>>> provided. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Overall, I'm thinking maybe we should still use two >> stones >>>>>> rather >>>>>>>>>>>> than >>>>>>>>>>>>>> one >>>>>>>>>>>>>>> to kill these two birds, and probably for this KIP we >> just >>>>>> focus >>>>>>>>>> on >>>>>>>>>>>> 1) >>>>>>>>>>>>>>> above. And for that I'd like to not expose the >>>> Materialized >>>>>>>>>> either >>>>>>>>>>>> for >>>>>>>>>>>>>>> rationales that I've listed above. Instead, we just >>>> restrict >>>>>>>>>> KIP-307 >>>>>>>>>>>> to >>>>>>>>>>>>>> NOT >>>>>>>>>>>>>>> use the Joined.name for state store names and always use >>>>>> internal >>>>>>>>>>>> names >>>>>>>>>>>>>> as >>>>>>>>>>>>>>> well, which admittedly indeed leaves a hole of not being >>>> able >>>>>> to >>>>>>>>>>>> cover >>>>>>>>>>>>>> all >>>>>>>>>>>>>>> internal names here, but now I feel this `hole` may >>>> better be >>>>>>>>>> filled >>>>>>>>>>>> by, >>>>>>>>>>>>>>> e.g. not creating changelog topics but just use the >>>> upstream >>>>> to >>>>>>>>>>>>>>> re-bootstrap the materialized store, more concretely: >> when >>>>>>>>>>>> materializing >>>>>>>>>>>>>>> the store, try to piggy-back the changelog topic on an >>>>> existing >>>>>>>>>>>> topic, >>>>>>>>>>>>>> e.g. >>>>>>>>>>>>>>> a) if the stream is coming directly from some source >> topic >>>>>>>>>> (including >>>>>>>>>>>>>>> repartition topic), make that as changelog topic and if >>>> it is >>>>>>>>>>>> repartition >>>>>>>>>>>>>>> topic change the retention / data purging policy >>>> necessarily >>>>> as >>>>>>>>>>>> well; b) >>>>>>>>>>>>>> if >>>>>>>>>>>>>>> the stream is coming from some stateless operators, >>>> delegate >>>>>> that >>>>>>>>>>>>>> stateless >>>>>>>>>>>>>>> operator to the parent stream similar as a); if the >>>> stream is >>>>>>>>>> coming >>>>>>>>>>>> from >>>>>>>>>>>>>> a >>>>>>>>>>>>>>> stream-stream join which is the only stateful operator >>>> that >>>>> can >>>>>>>>>>>> result in >>>>>>>>>>>>>> a >>>>>>>>>>>>>>> stream, consider merging the join into multi-way joins >>>> (yes, >>>>>>>>>> this is >>>>>>>>>>>> a >>>>>>>>>>>>>> very >>>>>>>>>>>>>>> hand-wavy thought, but the point here is that we do not >>>> try >>>>> to >>>>>>>>>>>> tackle it >>>>>>>>>>>>>>> now but leave it for a better solution :). >>>>>>>>>>>>>> >>>>>>>>>>>>>> I really like this idea! I agree with you in that this >>>>> approach >>>>>>>>>> to too >>>>>>>>>>>>>> much for adding in this KIP, but we could pick it up >> later >>>> and >>>>>>>>>>>> leverage the >>>>>>>>>>>>>> Optimization framework to accomplish this re-use. >>>>>>>>>>>>>> Again, while I agree we should break the naming of join >>>> state >>>>>>>>>> stores >>>>>>>>>>>> from >>>>>>>>>>>>>> KIP-307, IMHO it's something we should fix now as it will >>>> be >>>>> the >>>>>>>>>> last >>>>>>>>>>>> piece >>>>>>>>>>>>>> we can provide to give users the ability to completely >> make >>>>>> their >>>>>>>>>>>>>> topologies "upgrade proof" when adding additional >>>> operations. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks again to both of you for comments and I look >>>> forward to >>>>>>>>>> hearing >>>>>>>>>>>> back >>>>>>>>>>>>>> from you. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>> Bill >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Thu, Jun 20, 2019 at 2:33 PM Guozhang Wang < >>>>>> wangg...@gmail.com> >>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hello Bill, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks for the KIP. Glad to see that we can likely >>>> shooting >>>>> two >>>>>>>>>> birds >>>>>>>>>>>>>> with >>>>>>>>>>>>>>> one stone. I have some concerns though about those "two >>>>> birds" >>>>>>>>>>>>>> themselves: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 1. About not breaking compatibility of stream-stream >> join >>>>>>>>>>>> materialized >>>>>>>>>>>>>>> stores: I think this is a valid issue to tackle, but >> after >>>>>>>>>> thinking >>>>>>>>>>>> about >>>>>>>>>>>>>>> it once more I'm not sure if exposing Materialized would >>>> be a >>>>>>>>>> good >>>>>>>>>>>>>> solution >>>>>>>>>>>>>>> here. My rationles: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 1.a For stream-stream join, our current usage of >>>> window-store >>>>>> is >>>>>>>>>> not >>>>>>>>>>>>>> ideal, >>>>>>>>>>>>>>> and we want to modify it in the near future to be more >>>>>>>>>> efficient. Not >>>>>>>>>>>>>>> allowing users to override such state store backend >> gives >>>> us >>>>>> such >>>>>>>>>>>> freedom >>>>>>>>>>>>>>> (which was also considered in the original DSL design), >>>>> whereas >>>>>>>>>>>> getting a >>>>>>>>>>>>>>> Materialized<WindowStore> basically kicks out that >> freedom >>>>> out >>>>>>>>>> of the >>>>>>>>>>>>>>> window. >>>>>>>>>>>>>>> 1.b For strema-stream join, in our original design we >>>> intend >>>>> to >>>>>>>>>>>> "never" >>>>>>>>>>>>>>> want users to query the state, since it is just for >>>> buffering >>>>>> the >>>>>>>>>>>>>> upcoming >>>>>>>>>>>>>>> records from the stream. Now I know that some users may >>>>> indeed >>>>>>>>>> want >>>>>>>>>>>> to >>>>>>>>>>>>>>> query it from the debugging perspective, but still I >>>>> concerned >>>>>>>>>> about >>>>>>>>>>>>>>> whether leveraging IQ for debugging purposes would be >> the >>>>> right >>>>>>>>>>>> solution >>>>>>>>>>>>>>> here. And adding Materialized object opens the door to >> let >>>>>> users >>>>>>>>>>>> query >>>>>>>>>>>>>>> about it (unless we did something intentionally to still >>>>>> forbids >>>>>>>>>> it), >>>>>>>>>>>>>> which >>>>>>>>>>>>>>> also restricts us in the future. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 2. About the coupling between Materialized.name() and >>>>>> queryable: >>>>>>>>>>>> again I >>>>>>>>>>>>>>> think this is a valid issue. But I'm not sure if the >>>> current >>>>>>>>>>>>>>> "withQuerryingDisabled / Enabled" at Materialized is the >>>> best >>>>>>>>>>>> approach. >>>>>>>>>>>>>>> Here I think I agree with John, that generally speaking >>>> it's >>>>>>>>>> better >>>>>>>>>>>> be a >>>>>>>>>>>>>>> control function on the `KTable` itself, rather than on >>>>>>>>>>>> `Materialized`, >>>>>>>>>>>>>> so >>>>>>>>>>>>>>> fixing it via adding functions through `Materialized` >>>> seems >>>>>> not a >>>>>>>>>>>> natural >>>>>>>>>>>>>>> approach either. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Overall, I'm thinking maybe we should still use two >> stones >>>>>> rather >>>>>>>>>>>> than >>>>>>>>>>>>>> one >>>>>>>>>>>>>>> to kill these two birds, and probably for this KIP we >> just >>>>>> focus >>>>>>>>>> on >>>>>>>>>>>> 1) >>>>>>>>>>>>>>> above. And for that I'd like to not expose the >>>> Materialized >>>>>>>>>> either >>>>>>>>>>>> for >>>>>>>>>>>>>>> rationales that I've listed above. Instead, we just >>>> restrict >>>>>>>>>> KIP-307 >>>>>>>>>>>> to >>>>>>>>>>>>>> NOT >>>>>>>>>>>>>>> use the Joined.name for state store names and always use >>>>>> internal >>>>>>>>>>>> names >>>>>>>>>>>>>> as >>>>>>>>>>>>>>> well, which admittedly indeed leaves a hole of not being >>>> able >>>>>> to >>>>>>>>>>>> cover >>>>>>>>>>>>>> all >>>>>>>>>>>>>>> internal names here, but now I feel this `hole` may >>>> better be >>>>>>>>>> filled >>>>>>>>>>>> by, >>>>>>>>>>>>>>> e.g. not creating changelog topics but just use the >>>> upstream >>>>> to >>>>>>>>>>>>>>> re-bootstrap the materialized store, more concretely: >> when >>>>>>>>>>>> materializing >>>>>>>>>>>>>>> the store, try to piggy-back the changelog topic on an >>>>> existing >>>>>>>>>>>> topic, >>>>>>>>>>>>>> e.g. >>>>>>>>>>>>>>> a) if the stream is coming directly from some source >> topic >>>>>>>>>> (including >>>>>>>>>>>>>>> repartition topic), make that as changelog topic and if >>>> it is >>>>>>>>>>>> repartition >>>>>>>>>>>>>>> topic change the retention / data purging policy >>>> necessarily >>>>> as >>>>>>>>>>>> well; b) >>>>>>>>>>>>>> if >>>>>>>>>>>>>>> the stream is coming from some stateless operators, >>>> delegate >>>>>> that >>>>>>>>>>>>>> stateless >>>>>>>>>>>>>>> operator to the parent stream similar as a); if the >>>> stream is >>>>>>>>>> coming >>>>>>>>>>>>>> from a >>>>>>>>>>>>>>> stream-stream join which is the only stateful operator >>>> that >>>>> can >>>>>>>>>>>> result >>>>>>>>>>>>>> in a >>>>>>>>>>>>>>> stream, consider merging the join into multi-way joins >>>> (yes, >>>>>>>>>> this is >>>>>>>>>>>> a >>>>>>>>>>>>>> very >>>>>>>>>>>>>>> hand-wavy thought, but the point here is that we do not >>>> try >>>>> to >>>>>>>>>>>> tackle it >>>>>>>>>>>>>>> now but leave it for a better solution :). >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Guozhang >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Wed, Jun 19, 2019 at 11:41 AM John Roesler < >>>>>> j...@confluent.io >>>>>>>>>>> >>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Bill, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks for the KIP! Awesome job catching this >> unexpected >>>>>>>>>>>> consequence >>>>>>>>>>>>>>>> of the prior KIPs before it was released. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The proposal looks good to me. On top of just fixing >> the >>>>>>>>>> problem, >>>>>>>>>>>> it >>>>>>>>>>>>>>>> seems to address two other pain points: >>>>>>>>>>>>>>>> * that naming a state store automatically causes it to >>>>> become >>>>>>>>>>>>>> queriable. >>>>>>>>>>>>>>>> * that there's currently no way to configure the bytes >>>> store >>>>>>>>>> for >>>>>>>>>>>> join >>>>>>>>>>>>>>>> windows. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> It's awesome that we can fix this issue and two others >>>> with >>>>>> one >>>>>>>>>>>>>> feature. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I'm wondering about a missing quadrant from the truth >>>> table >>>>>>>>>>>> involving >>>>>>>>>>>>>>>> whether a Materialized is stored or not and querying is >>>>>>>>>>>>>>>> enabled/disabled... What should be the behavior if >> there >>>> is >>>>> no >>>>>>>>>>>> store >>>>>>>>>>>>>>>> configured (e.g., if Materialized with only serdes) and >>>>>>>>>> querying is >>>>>>>>>>>>>>>> enabled? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> It seems we have two choices: >>>>>>>>>>>>>>>> 1. we can force creation of a state store in this case, >>>> so >>>>> the >>>>>>>>>>>> store >>>>>>>>>>>>>>>> can be used to serve the queries >>>>>>>>>>>>>>>> 2. we can provide just a queriable view, basically >>>> letting >>>>> IQ >>>>>>>>>> query >>>>>>>>>>>>>>>> into the "KTableValueGetter", which would transparently >>>>>>>>>> construct >>>>>>>>>>>> the >>>>>>>>>>>>>>>> query response by applying the operator logic to the >>>>> upstream >>>>>>>>>>>> state if >>>>>>>>>>>>>>>> the operator state isn't already stored. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Offhand, it seems like the second is actually a pretty >>>>> awesome >>>>>>>>>>>>>>>> capability. But it might have an awkward interaction >> with >>>>> the >>>>>>>>>>>> current >>>>>>>>>>>>>>>> semantics. Presently, if I provide a >>>> Materialized.withName, >>>>> it >>>>>>>>>>>> implies >>>>>>>>>>>>>>>> that querying should be enabled AND that the view >> should >>>>>>>>>> actually >>>>>>>>>>>> be >>>>>>>>>>>>>>>> stored in a state store. Under option 2 above, this >>>> behavior >>>>>>>>>> would >>>>>>>>>>>>>>>> change to NOT provision a state store and instead just >>>>> consult >>>>>>>>>> the >>>>>>>>>>>>>>>> ValueGetter. To get back to the current behavior, users >>>>> would >>>>>>>>>> have >>>>>>>>>>>> to >>>>>>>>>>>>>>>> add a "bytes store supplier" to the Materialized to >>>> indicate >>>>>>>>>> that, >>>>>>>>>>>>>>>> yes, they really want a state store there. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Behavior changes are always kind of scary, but I think >> in >>>>> this >>>>>>>>>>>> case, >>>>>>>>>>>>>>>> it might actually be preferable. In the event where >> only >>>> the >>>>>>>>>> name >>>>>>>>>>>> is >>>>>>>>>>>>>>>> provided, it means that people just wanted to make the >>>>>>>>>> operation >>>>>>>>>>>>>>>> result queriable. If we automatically convert this to a >>>>>>>>>> non-stored >>>>>>>>>>>>>>>> view, then simply upgrading results in the same >>>> observable >>>>>>>>>> behavior >>>>>>>>>>>>>>>> and semantics, but a linear reduction in local storage >>>>>>>>>> requirements >>>>>>>>>>>>>>>> and disk i/o, as well as a corresponding linear >>>> reduction in >>>>>>>>>> memory >>>>>>>>>>>>>>>> usage both on and off heap. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> What do you think? >>>>>>>>>>>>>>>> -John >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Tue, Jun 18, 2019 at 9:21 PM Bill Bejeck < >>>>>> bbej...@gmail.com >>>>>>>>>>> >>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> All, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I'd like to start a discussion for adding a >> Materialized >>>>>>>>>>>>>> configuration >>>>>>>>>>>>>>>>> object to KStream.join for naming state stores >> involved >>>> in >>>>>>>>>> joins. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>> >>>>> >>>> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-479%3A+Add+Materialized+to+Join >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Your comments and suggestions are welcome. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> Bill >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> -- Guozhang >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> -- Guozhang >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> -- Guozhang >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>> >>>>> >>>> >>>> >>>> -- >>>> -- Guozhang >>>> >>> >> > >
signature.asc
Description: OpenPGP digital signature