Eno, re: GlobalKTable - yeah that seems fine. On Tue, 11 Apr 2017 at 14:18 Eno Thereska <eno.there...@gmail.com> wrote:
> About GlobalKTables, I suppose there is no reason why they cannot also use > this KIP for consistency, e.g., today you have: > > public <K, V> GlobalKTable<K, V> globalTable(final Serde<K> keySerde, > final Serde<V> valSerde, > final String topic, > final String storeName) > > For consistency with the KIP you could also have an overload without the > store name, for people who want to construct a global ktable, but don't > care about querying it directly: > > public <K, V> GlobalKTable<K, V> globalTable(final Serde<K> keySerde, > final Serde<V> valSerde, > final String topic) > > Damian, what do you think? I'm thinking of adding this to KIP. Thanks to > Michael for bringing it up. > > Eno > > > > > On 11 Apr 2017, at 06:13, Eno Thereska <eno.there...@gmail.com> wrote: > > > > Hi Michael, comments inline: > > > >> On 11 Apr 2017, at 03:25, Michael Noll <mich...@confluent.io> wrote: > >> > >> Thanks for the updates, Eno! > >> > >> In addition to what has already been said: We should also explicitly > >> mention that this KIP is not touching GlobalKTable. I'm sure that some > >> users will throw KTable and GlobalKTable into one conceptual "it's all > >> tables!" bucket and then wonder how the KIP might affect global tables. > > > > Good point, I'll add. > > > > > >> > >> Damian wrote: > >>> I think if no store name is provided users would still be able to query > >> the > >>> store, just the store name would be some internally generated name. > They > >>> would be able to discover those names via the IQ API. > >> > >> I, too, think that users should be able to query a store even if its > name > >> was internally generated. After all, the data is already there / > >> materialized. > > > > Yes, there is nothing that will prevent users from querying internally > generated stores, but they cannot > > assume a store will necessarily be queryable. So if it's there, they can > query it. If it's not there, and they didn't > > provide a queryable name, they cannot complain and say "hey, where is my > store". If they must absolutely be certain that > > a store is queryable, then they must provide a queryable name. > > > > > >> > >> > >> Damian wrote: > >>> I think for some stores it will make sense to not create a physical > >> store, i.e., > >>> for thinks like `filter`, as this will save the rocksdb overhead. But i > >> guess that > >>> is more of an implementation detail. > >> > >> I think it would help if the KIP would clarify what we'd do in such a > >> case. For example, if the user did not specify a store name for > >> `KTable#filter` -- would it be queryable? If so, would this imply we'd > >> always materialize the state store, or...? > > > > I'll clarify in the KIP with some more examples. Materialization will be > an internal concept. A store can be queryable whether it's materialized or > not > > (e.g., through advanced implementations that compute the value of a > filter on a fly, rather than materialize the answer). > > > > Thanks, > > Eno > > > > > >> > >> -Michael > >> > >> > >> > >> > >> On Tue, Apr 11, 2017 at 9:14 AM, Damian Guy <damian....@gmail.com> > wrote: > >> > >>> Hi Eno, > >>> > >>> Thanks for the update. I agree with what Matthias said. I wonder if > the KIP > >>> should talk less about materialization and more about querying? After > all, > >>> that is what is being provided from an end-users perspective. > >>> > >>> I think if no store name is provided users would still be able to > query the > >>> store, just the store name would be some internally generated name. > They > >>> would be able to discover those names via the IQ API > >>> > >>> I think for some stores it will make sense to not create a physical > store, > >>> i.e., for thinks like `filter`, as this will save the rocksdb > overhead. But > >>> i guess that is more of an implementation detail. > >>> > >>> Cheers, > >>> Damian > >>> > >>> On Tue, 11 Apr 2017 at 00:36 Eno Thereska <eno.there...@gmail.com> > wrote: > >>> > >>>> Hi Matthias, > >>>> > >>>>> However, this still forces users, to provide a name for store that we > >>>>> must materialize, even if users are not interested in querying the > >>>>> stores. Thus, I would like to have overloads for all currently > existing > >>>>> methods having mandatory storeName paremeter, with overloads, that do > >>>>> not require the storeName parameter. > >>>> > >>>> > >>>> Oh yeah, absolutely, this is part of the KIP. I guess I didn't make it > >>>> clear, I'll clarify. > >>>> > >>>> Thanks > >>>> Eno > >>>> > >>>> > >>>>> On 10 Apr 2017, at 16:00, Matthias J. Sax <matth...@confluent.io> > >>> wrote: > >>>>> > >>>>> Thanks for pushing this KIP Eno. > >>>>> > >>>>> The update give a very clear description about the scope, that is > super > >>>>> helpful for the discussion! > >>>>> > >>>>> - To put it into my own words, the KIP focus is on enable to query > all > >>>>> KTables. > >>>>> ** The ability to query a store is determined by providing a name for > >>>>> the store. > >>>>> ** At the same time, providing a name -- and thus making a store > >>>>> queryable -- does not say anything about an actual materialization > (ie, > >>>>> being queryable and being materialized are orthogonal). > >>>>> > >>>>> > >>>>> I like this overall a lot. However, I would go one step further. > Right > >>>>> now, you suggest to add new overload methods that allow users to > >>> specify > >>>>> a storeName -- if `null` is provided and the store is not > materialized, > >>>>> we ignore it completely -- if `null` is provided but the store must > be > >>>>> materialized we generate a internal name. So far so good. > >>>>> > >>>>> However, this still forces users, to provide a name for store that we > >>>>> must materialize, even if users are not interested in querying the > >>>>> stores. Thus, I would like to have overloads for all currently > existing > >>>>> methods having mandatory storeName paremeter, with overloads, that do > >>>>> not require the storeName parameter. > >>>>> > >>>>> Otherwise, we would still have some methods which optional storeName > >>>>> parameter and other method with mandatory storeName parameter -- > thus, > >>>>> still some inconsistency. > >>>>> > >>>>> > >>>>> -Matthias > >>>>> > >>>>> > >>>>> On 4/9/17 8:35 AM, Eno Thereska wrote: > >>>>>> Hi there, > >>>>>> > >>>>>> I've now done a V2 of the KIP, that hopefully addresses the feedback > >>> in > >>>> this discussion thread: > >>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP- > >>> 114%3A+KTable+materialization+and+improved+semantics > >>>> < > >>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP- > >>> 114:+KTable+materialization+and+improved+semantics>. > >>>> Notable changes: > >>>>>> > >>>>>> - clearly outline what is in the scope of the KIP and what is not. > We > >>>> ran into the issue where lots of useful, but somewhat tangential > >>>> discussions came up on interactive queries, declarative DSL etc. The > >>> exact > >>>> scope of this KIP is spelled out. > >>>>>> - decided to go with overloaded methods, not .materialize(), to stay > >>>> within the spirit of the current declarative DSL. > >>>>>> - clarified the depreciation plan > >>>>>> - listed part of the discussion we had under rejected alternatives > >>>>>> > >>>>>> If you have any further feedback on this, let's continue on this > >>> thread. > >>>>>> > >>>>>> Thank you > >>>>>> Eno > >>>>>> > >>>>>> > >>>>>>> On 1 Feb 2017, at 09:04, Eno Thereska <eno.there...@gmail.com> > >>> wrote: > >>>>>>> > >>>>>>> Thanks everyone! I think it's time to do a V2 on the KIP so I'll do > >>>> that and we can see how it looks and continue the discussion from > there. > >>>> Stay tuned. > >>>>>>> > >>>>>>> Thanks > >>>>>>> Eno > >>>>>>> > >>>>>>>> On 30 Jan 2017, at 17:23, Matthias J. Sax <matth...@confluent.io> > >>>> wrote: > >>>>>>>> > >>>>>>>> Hi, > >>>>>>>> > >>>>>>>> I think Eno's separation is very clear and helpful. In order to > >>>>>>>> streamline this discussion, I would suggest we focus back on point > >>> (1) > >>>>>>>> only, as this is the original KIP question. > >>>>>>>> > >>>>>>>> Even if I started to DSL design discussion somehow, because I > >>> thought > >>>> it > >>>>>>>> might be helpful to resolve both in a single shot, I feel that we > >>> have > >>>>>>>> too many options about DSL design and we should split it up in two > >>>>>>>> steps. This will have the disadvantage that we will change the API > >>>>>>>> twice, but still, I think it will be a more focused discussion. > >>>>>>>> > >>>>>>>> I just had another look at the KIP, an it proposes 3 changes: > >>>>>>>> > >>>>>>>> 1. add .materialized() -> IIRC it was suggested to name this > >>>>>>>> .materialize() though (can you maybe update the KIP Eno?) > >>>>>>>> 2. remove print(), writeAsText(), and foreach() > >>>>>>>> 3. rename toStream() to toKStream() > >>>>>>>> > >>>>>>>> > >>>>>>>> I completely agree with (2) -- not sure about (3) though because > >>>>>>>> KStreamBuilder also hast .stream() and .table() as methods. > >>>>>>>> > >>>>>>>> However, we might want to introduce a KStream#toTable() -- this > was > >>>>>>>> requested multiple times -- might also be part of a different KIP. > >>>>>>>> > >>>>>>>> > >>>>>>>> Thus, we end up with (1). I would suggest to do a step backward > here > >>>> and > >>>>>>>> instead of a discussion how to express the changes in the DSL (new > >>>>>>>> overload, new methods...) we should discuss what the actual change > >>>>>>>> should be. Like (1) materialize all KTable all the time (2) all > the > >>>> user > >>>>>>>> to force a materialization to enable querying the KTable (3) allow > >>> for > >>>>>>>> queryable non-materialized KTable. > >>>>>>>> > >>>>>>>> On more question is, if we want to allow a user-forced > >>> materialization > >>>>>>>> only as as local store without changelog, or both (together / > >>>>>>>> independently)? We got some request like this already. > >>>>>>>> > >>>>>>>> > >>>>>>>> -Matthias > >>>>>>>> > >>>>>>>> > >>>>>>>> On 1/30/17 3:50 AM, Jan Filipiak wrote: > >>>>>>>>> Hi Eno, > >>>>>>>>> > >>>>>>>>> thanks for putting into different points. I want to put a few > >>> remarks > >>>>>>>>> inline. > >>>>>>>>> > >>>>>>>>> Best Jan > >>>>>>>>> > >>>>>>>>> On 30.01.2017 12:19, Eno Thereska wrote: > >>>>>>>>>> So I think there are several important discussion threads that > are > >>>>>>>>>> emerging here. Let me try to tease them apart: > >>>>>>>>>> > >>>>>>>>>> 1. inconsistency in what is materialized and what is not, what > is > >>>>>>>>>> queryable and what is not. I think we all agree there is some > >>>>>>>>>> inconsistency there and this will be addressed with any of the > >>>>>>>>>> proposed approaches. Addressing the inconsistency is the point > of > >>>> the > >>>>>>>>>> original KIP. > >>>>>>>>>> > >>>>>>>>>> 2. the exact API for materializing a KTable. We can specify 1) a > >>>>>>>>>> "store name" (as we do today) or 2) have a ".materialize[d]" > call > >>> or > >>>>>>>>>> 3) get a handle from a KTable ".getQueryHandle" or 4) have a > >>> builder > >>>>>>>>>> construct. So we have discussed 4 options. It is important to > >>>> remember > >>>>>>>>>> in this discussion that IQ is not designed for just local > queries, > >>>> but > >>>>>>>>>> also for distributed queries. In all cases an identifying > name/id > >>> is > >>>>>>>>>> needed for the store that the user is interested in querying. So > >>> we > >>>>>>>>>> end up with a discussion on who provides the name, the user (as > >>> done > >>>>>>>>>> today) or if it is generated automatically (as Jan suggests, as > I > >>>>>>>>>> understand it). If it is generated automatically we need a way > to > >>>>>>>>>> expose these auto-generated names to the users and link them to > >>> the > >>>>>>>>>> KTables they care to query. > >>>>>>>>> Hi, the last sentence is what I currently arguing against. The > user > >>>>>>>>> would never see a stringtype indentifier name or anything. All he > >>>> gets > >>>>>>>>> is the queryHandle if he executes a get(K) that will be an > >>>> interactive > >>>>>>>>> query get. with all the finding the right servers that currently > >>>> have a > >>>>>>>>> copy of this underlying store stuff going on. The nice part is > that > >>>> if > >>>>>>>>> someone retrieves a queryHandle, you know that you have to > >>>> materialized > >>>>>>>>> (if you are not already) as queries will be coming. Taking away > the > >>>>>>>>> confusion mentioned in point 1 IMO. > >>>>>>>>>> > >>>>>>>>>> 3. The exact boundary between the DSL, that is the processing > >>>>>>>>>> language, and the storage/IQ queries, and how we jump from one > to > >>>> the > >>>>>>>>>> other. This is mostly for how we get a handle on a store (so > it's > >>>>>>>>>> related to point 2), rather than for how we query the store. I > >>> think > >>>>>>>>>> we all agree that we don't want to limit ways one can query a > >>> store > >>>>>>>>>> (e.g., using gets or range queries etc) and the query APIs are > not > >>>> in > >>>>>>>>>> the scope of the DSL. > >>>>>>>>> Does the IQ work with range currently? The range would have to be > >>>>>>>>> started on all stores and then merged by maybe the client. Range > >>>> force a > >>>>>>>>> flush to RocksDB currently so I am sure you would get a > performance > >>>> hit > >>>>>>>>> right there. Time-windows might be okay, but I am not sure if the > >>>> first > >>>>>>>>> version should offer the user range access. > >>>>>>>>>> > >>>>>>>>>> 4. The nature of the DSL and whether its declarative enough, or > >>>>>>>>>> flexible enough. Damian made the point that he likes the builder > >>>>>>>>>> pattern since users can specify, per KTable, things like caching > >>> and > >>>>>>>>>> logging needs. His observation (as I understand it) is that the > >>>>>>>>>> processor API (PAPI) is flexible but doesn't provide any help at > >>> all > >>>>>>>>>> to users. The current DSL provides declarative abstractions, but > >>>> it's > >>>>>>>>>> not fine-grained enough. This point is much broader than the > KIP, > >>>> but > >>>>>>>>>> discussing it in this KIPs context is ok, since we don't want to > >>>> make > >>>>>>>>>> small piecemeal changes and then realise we're not in the spot > we > >>>> want > >>>>>>>>>> to be. > >>>>>>>>> This is indeed much broader. My guess here is that's why both > API's > >>>>>>>>> exists and helping the users to switch back and forth might be a > >>>> thing. > >>>>>>>>>> > >>>>>>>>>> Feel free to pitch in if I have misinterpreted something. > >>>>>>>>>> > >>>>>>>>>> Thanks > >>>>>>>>>> Eno > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>>> On 30 Jan 2017, at 10:22, Jan Filipiak < > jan.filip...@trivago.com > >>>> > >>>> wrote: > >>>>>>>>>>> > >>>>>>>>>>> Hi Eno, > >>>>>>>>>>> > >>>>>>>>>>> I have a really hard time understanding why we can't. From my > >>> point > >>>>>>>>>>> of view everything could be super elegant DSL only + public api > >>> for > >>>>>>>>>>> the PAPI-people as already exist. > >>>>>>>>>>> > >>>>>>>>>>> The above aproach implementing a .get(K) on KTable is foolisch > in > >>>> my > >>>>>>>>>>> opinion as it would be to late to know that materialisation > would > >>>> be > >>>>>>>>>>> required. > >>>>>>>>>>> But having an API that allows to indicate I want to query this > >>>> table > >>>>>>>>>>> and then wrapping the say table's processorname can work out > >>> really > >>>>>>>>>>> really nice. The only obstacle I see is people not willing to > >>> spend > >>>>>>>>>>> the additional time in implementation and just want a quick > shot > >>>>>>>>>>> option to make it work. > >>>>>>>>>>> > >>>>>>>>>>> For me it would look like this: > >>>>>>>>>>> > >>>>>>>>>>> table = builder.table() > >>>>>>>>>>> filteredTable = table.filter() > >>>>>>>>>>> rawHandle = table.getQueryHandle() // Does the materialisation, > >>>>>>>>>>> really all names possible but id rather hide the implication of > >>> it > >>>>>>>>>>> materializes > >>>>>>>>>>> filteredTableHandle = filteredTable.getQueryHandle() // this > >>> would > >>>>>>>>>>> _not_ materialize again of course, the source or the aggregator > >>>> would > >>>>>>>>>>> stay the only materialized processors > >>>>>>>>>>> streams = new streams(builder) > >>>>>>>>>>> > >>>>>>>>>>> This middle part is highly flexible I could imagin to force the > >>>> user > >>>>>>>>>>> todo something like this. This implies to the user that his > >>> streams > >>>>>>>>>>> need to be running > >>>>>>>>>>> instead of propagating the missing initialisation back by > >>>> exceptions. > >>>>>>>>>>> Also if the users is forced to pass the appropriate streams > >>>> instance > >>>>>>>>>>> back can change. > >>>>>>>>>>> I think its possible to build multiple streams out of one > >>> topology > >>>>>>>>>>> so it would be easiest to implement aswell. This is just what I > >>>> maybe > >>>>>>>>>>> had liked the most > >>>>>>>>>>> > >>>>>>>>>>> streams.start(); > >>>>>>>>>>> rawHandle.prepare(streams) > >>>>>>>>>>> filteredHandle.prepare(streams) > >>>>>>>>>>> > >>>>>>>>>>> later the users can do > >>>>>>>>>>> > >>>>>>>>>>> V value = rawHandle.get(K) > >>>>>>>>>>> V value = filteredHandle.get(K) > >>>>>>>>>>> > >>>>>>>>>>> This could free DSL users from anything like storenames and how > >>> and > >>>>>>>>>>> what to materialize. Can someone indicate what the problem > would > >>> be > >>>>>>>>>>> implementing it like this. > >>>>>>>>>>> Yes I am aware that the current IQ API will not support > querying > >>> by > >>>>>>>>>>> KTableProcessorName instread of statestoreName. But I think > that > >>>> had > >>>>>>>>>>> to change if you want it to be intuitive > >>>>>>>>>>> IMO you gotta apply the filter read time > >>>>>>>>>>> > >>>>>>>>>>> Looking forward to your opinions > >>>>>>>>>>> > >>>>>>>>>>> Best Jan > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> #DeathToIQMoreAndBetterConnectors > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On 30.01.2017 10:42, Eno Thereska wrote: > >>>>>>>>>>>> Hi there, > >>>>>>>>>>>> > >>>>>>>>>>>> The inconsistency will be resolved, whether with materialize > or > >>>>>>>>>>>> overloaded methods. > >>>>>>>>>>>> > >>>>>>>>>>>> With the discussion on the DSL & stores I feel we've gone in a > >>>>>>>>>>>> slightly different tangent, which is worth discussing > >>> nonetheless. > >>>>>>>>>>>> We have entered into an argument around the scope of the DSL. > >>> The > >>>>>>>>>>>> DSL has been designed primarily for processing. The DSL does > not > >>>>>>>>>>>> dictate ways to access state stores or what hind of queries to > >>>>>>>>>>>> perform on them. Hence, I see the mechanism for accessing > >>> storage > >>>> as > >>>>>>>>>>>> decoupled from the DSL. > >>>>>>>>>>>> > >>>>>>>>>>>> We could think of ways to get store handles from part of the > >>> DSL, > >>>>>>>>>>>> like the KTable abstraction. However, subsequent queries will > be > >>>>>>>>>>>> store-dependent and not rely on the DSL, hence I'm not sure we > >>> get > >>>>>>>>>>>> any grand-convergence DSL-Store here. So I am arguing that the > >>>>>>>>>>>> current way of getting a handle on state stores is fine. > >>>>>>>>>>>> > >>>>>>>>>>>> Thanks > >>>>>>>>>>>> Eno > >>>>>>>>>>>> > >>>>>>>>>>>>> On 30 Jan 2017, at 03:56, Guozhang Wang <wangg...@gmail.com> > >>>> wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>> Thinking loud here about the API options (materialize v.s. > >>>> overloaded > >>>>>>>>>>>>> functions) and its impact on IQ: > >>>>>>>>>>>>> > >>>>>>>>>>>>> 1. The first issue of the current DSL is that, there is > >>>>>>>>>>>>> inconsistency upon > >>>>>>>>>>>>> whether / how KTables should be materialized: > >>>>>>>>>>>>> > >>>>>>>>>>>>> a) in many cases the library HAS TO materialize KTables no > >>>>>>>>>>>>> matter what, > >>>>>>>>>>>>> e.g. KStream / KTable aggregation resulted KTables, and hence > >>> we > >>>>>>>>>>>>> enforce > >>>>>>>>>>>>> users to provide store names and throw RTE if it is null; > >>>>>>>>>>>>> b) in some other cases, the KTable can be materialized or > not; > >>>> for > >>>>>>>>>>>>> example in KStreamBuilder.table(), store names can be > nullable > >>>> and > >>>>>>>>>>>>> in which > >>>>>>>>>>>>> case the KTable would not be materialized; > >>>>>>>>>>>>> c) in some other cases, the KTable will never be > materialized, > >>>> for > >>>>>>>>>>>>> example KTable.filter() resulted KTables, and users have no > >>>> options to > >>>>>>>>>>>>> enforce them to be materialized; > >>>>>>>>>>>>> d) this is related to a), where some KTables are required to > >>> be > >>>>>>>>>>>>> materialized, but we do not enforce users to provide a state > >>>> store > >>>>>>>>>>>>> name, > >>>>>>>>>>>>> e.g. KTables involved in joins; a RTE will be thrown not > >>>>>>>>>>>>> immediately but > >>>>>>>>>>>>> later in this case. > >>>>>>>>>>>>> > >>>>>>>>>>>>> 2. The second issue is related to IQ, where state stores are > >>>>>>>>>>>>> accessed by > >>>>>>>>>>>>> their state stores; so only those KTable's that have > >>>> user-specified > >>>>>>>>>>>>> state > >>>>>>>>>>>>> stores will be queryable. But because of 1) above, many > stores > >>>> may > >>>>>>>>>>>>> not be > >>>>>>>>>>>>> interested to users for IQ but they still need to provide a > >>>>>>>>>>>>> (dummy?) state > >>>>>>>>>>>>> store name for them; while on the other hand users cannot > query > >>>>>>>>>>>>> some state > >>>>>>>>>>>>> stores, e.g. the ones generated by KTable.filter() as there > is > >>> no > >>>>>>>>>>>>> APIs for > >>>>>>>>>>>>> them to specify a state store name. > >>>>>>>>>>>>> > >>>>>>>>>>>>> 3. We are aware from user feedbacks that such backend details > >>>> would be > >>>>>>>>>>>>> better be abstracted away from the DSL layer, where app > >>>> developers > >>>>>>>>>>>>> should > >>>>>>>>>>>>> just focus on processing logic, while state stores along with > >>>> their > >>>>>>>>>>>>> changelogs etc would better be in a different mechanism; same > >>>>>>>>>>>>> arguments > >>>>>>>>>>>>> have been discussed for serdes / windowing triggers as well. > >>> For > >>>>>>>>>>>>> serdes > >>>>>>>>>>>>> specifically, we had a very long discussion about it and > >>>> concluded > >>>>>>>>>>>>> that, at > >>>>>>>>>>>>> least in Java7, we cannot completely abstract serde away in > the > >>>>>>>>>>>>> DSL, so we > >>>>>>>>>>>>> choose the other extreme to enforce users to be completely > >>> aware > >>>> of > >>>>>>>>>>>>> the > >>>>>>>>>>>>> serde requirements when some KTables may need to be > >>> materialized > >>>> vis > >>>>>>>>>>>>> overloaded API functions. While for the state store names, I > >>> feel > >>>>>>>>>>>>> it is a > >>>>>>>>>>>>> different argument than serdes (details below). > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> So to me, for either materialize() v.s. overloaded functions > >>>>>>>>>>>>> directions, > >>>>>>>>>>>>> the first thing I'd like to resolve is the inconsistency > issue > >>>>>>>>>>>>> mentioned > >>>>>>>>>>>>> above. So in either case: KTable materialization will not be > >>>> affect > >>>>>>>>>>>>> by user > >>>>>>>>>>>>> providing state store name or not, but will only be decided > by > >>>> the > >>>>>>>>>>>>> library > >>>>>>>>>>>>> when it is necessary. More specifically, only join operator > and > >>>>>>>>>>>>> builder.table() resulted KTables are not always materialized, > >>> but > >>>>>>>>>>>>> are still > >>>>>>>>>>>>> likely to be materialized lazily (e.g. when participated in a > >>>> join > >>>>>>>>>>>>> operator). > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> For overloaded functions that would mean: > >>>>>>>>>>>>> > >>>>>>>>>>>>> a) we have an overloaded function for ALL operators that > could > >>>>>>>>>>>>> result > >>>>>>>>>>>>> in a KTable, and allow it to be null (i.e. for the function > >>>> without > >>>>>>>>>>>>> this > >>>>>>>>>>>>> param it is null by default); > >>>>>>>>>>>>> b) null-state-store-name do not indicate that a KTable would > >>>>>>>>>>>>> not be > >>>>>>>>>>>>> materialized, but that it will not be used for IQ at all > >>>> (internal > >>>>>>>>>>>>> state > >>>>>>>>>>>>> store names will be generated when necessary). > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> For materialize() that would mean: > >>>>>>>>>>>>> > >>>>>>>>>>>>> a) we will remove state store names from ALL operators that > >>>> could > >>>>>>>>>>>>> result in a KTable. > >>>>>>>>>>>>> b) KTables that not calling materialized do not indicate that > >>> a > >>>>>>>>>>>>> KTable > >>>>>>>>>>>>> would not be materialized, but that it will not be used for > IQ > >>>> at all > >>>>>>>>>>>>> (internal state store names will be generated when > necessary). > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> Again, in either ways the API itself does not "hint" about > >>>> anything > >>>>>>>>>>>>> for > >>>>>>>>>>>>> materializing a KTable or not at all; it is still purely > >>>> determined > >>>>>>>>>>>>> by the > >>>>>>>>>>>>> library when parsing the DSL for now. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Following these thoughts, I feel that 1) we should probably > >>>> change > >>>>>>>>>>>>> the name > >>>>>>>>>>>>> "materialize" since it may be misleading to users as what > >>>> actually > >>>>>>>>>>>>> happened > >>>>>>>>>>>>> behind the scene, to e.g. Damian suggested > >>> "queryableStore(String > >>>>>>>>>>>>> storeName)", > >>>>>>>>>>>>> which returns a QueryableStateStore, and can replace the > >>>>>>>>>>>>> `KafkaStreams.store` function; 2) comparing those two options > >>>>>>>>>>>>> assuming we > >>>>>>>>>>>>> get rid of the misleading function name, I personally favor > not > >>>>>>>>>>>>> adding more > >>>>>>>>>>>>> overloading functions as it keeps the API simpler. > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> Guozhang > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Sat, Jan 28, 2017 at 2:32 PM, Jan Filipiak > >>>>>>>>>>>>> <jan.filip...@trivago.com> > >>>>>>>>>>>>> wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> thanks for your mail, felt like this can clarify some > things! > >>>> The > >>>>>>>>>>>>>> thread > >>>>>>>>>>>>>> unfortunately split but as all branches close in on what my > >>>>>>>>>>>>>> suggestion was > >>>>>>>>>>>>>> about Ill pick this to continue > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Of course only the table the user wants to query would be > >>>>>>>>>>>>>> materialized. > >>>>>>>>>>>>>> (retrieving the queryhandle implies materialisation). So In > >>> the > >>>>>>>>>>>>>> example of > >>>>>>>>>>>>>> KTable::filter if you call > >>>>>>>>>>>>>> getIQHandle on both tables only the one source that is there > >>>> would > >>>>>>>>>>>>>> materialize and the QueryHandleabstraction would make sure > it > >>>> gets > >>>>>>>>>>>>>> mapped > >>>>>>>>>>>>>> and filtered and what not uppon read as usual. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Of Course the Object you would retrieve would maybe only > wrap > >>>> the > >>>>>>>>>>>>>> storeName / table unique identifier and a way to access the > >>>> streams > >>>>>>>>>>>>>> instance and then basically uses the same mechanism that is > >>>>>>>>>>>>>> currently used. > >>>>>>>>>>>>>> From my point of view this is the least confusing way for > DSL > >>>>>>>>>>>>>> users. If > >>>>>>>>>>>>>> its to tricky to get a hand on the streams instance one > could > >>>> ask > >>>>>>>>>>>>>> the user > >>>>>>>>>>>>>> to pass it in before executing queries, therefore making > sure > >>>> the > >>>>>>>>>>>>>> streams > >>>>>>>>>>>>>> instance has been build. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> The effort to implement this is indeed some orders of > >>> magnitude > >>>>>>>>>>>>>> higher > >>>>>>>>>>>>>> than the overloaded materialized call. As long as I could > help > >>>>>>>>>>>>>> getting a > >>>>>>>>>>>>>> different view I am happy. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Best Jan > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On 28.01.2017 09:36, Eno Thereska wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Hi Jan, > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> I understand your concern. One implication of not passing > any > >>>>>>>>>>>>>>> store name > >>>>>>>>>>>>>>> and just getting an IQ handle is that all KTables would > need > >>>> to be > >>>>>>>>>>>>>>> materialised. Currently the store name (or proposed > >>>>>>>>>>>>>>> .materialize() call) > >>>>>>>>>>>>>>> act as hints on whether to materialise the KTable or not. > >>>>>>>>>>>>>>> Materialising > >>>>>>>>>>>>>>> every KTable can be expensive, although there are some > tricks > >>>> one > >>>>>>>>>>>>>>> can play, > >>>>>>>>>>>>>>> e.g., have a virtual store rather than one backed by a > Kafka > >>>> topic. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> However, even with the above, after getting an IQ handle, > the > >>>>>>>>>>>>>>> user would > >>>>>>>>>>>>>>> still need to use IQ APIs to query the state. As such, we > >>> would > >>>>>>>>>>>>>>> still > >>>>>>>>>>>>>>> continue to be outside the original DSL so this wouldn't > >>>> address > >>>>>>>>>>>>>>> your > >>>>>>>>>>>>>>> original concern. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> So I read this suggestion as simplifying the APIs by > removing > >>>> the > >>>>>>>>>>>>>>> store > >>>>>>>>>>>>>>> name, at the cost of having to materialise every KTable. > It's > >>>>>>>>>>>>>>> definitely an > >>>>>>>>>>>>>>> option we'll consider as part of this KIP. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Thanks > >>>>>>>>>>>>>>> Eno > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On 28 Jan 2017, at 06:49, Jan Filipiak < > >>>> jan.filip...@trivago.com> > >>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>> Hi Exactly > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> I know it works from the Processor API, but my suggestion > >>>> would > >>>>>>>>>>>>>>>> prevent > >>>>>>>>>>>>>>>> DSL users dealing with storenames what so ever. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> In general I am pro switching between DSL and Processor > API > >>>>>>>>>>>>>>>> easily. (In > >>>>>>>>>>>>>>>> my Stream applications I do this a lot with reflection and > >>>>>>>>>>>>>>>> instanciating > >>>>>>>>>>>>>>>> KTableImpl) Concerning this KIP all I say is that there > >>> should > >>>>>>>>>>>>>>>> be a DSL > >>>>>>>>>>>>>>>> concept of "I want to expose this __KTable__. This can be > a > >>>>>>>>>>>>>>>> Method like > >>>>>>>>>>>>>>>> KTable::retrieveIQHandle():InteractiveQueryHandle, the > >>> table > >>>>>>>>>>>>>>>> would know > >>>>>>>>>>>>>>>> to materialize, and the user had a reference to the "store > >>>> and the > >>>>>>>>>>>>>>>> distributed query mechanism by the Interactive Query > Handle" > >>>>>>>>>>>>>>>> under the hood > >>>>>>>>>>>>>>>> it can use the same mechanism as the PIP people again. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> I hope you see my point J > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Best Jan > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> #DeathToIQMoreAndBetterConnectors :) > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On 27.01.2017 21:59, Matthias J. Sax wrote: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Jan, > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> the IQ feature is not limited to Streams DSL but can also > >>> be > >>>>>>>>>>>>>>>>> used for > >>>>>>>>>>>>>>>>> Stores used in PAPI. Thus, we need a mechanism that does > >>> work > >>>>>>>>>>>>>>>>> for PAPI > >>>>>>>>>>>>>>>>> and DSL. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Nevertheless I see your point and I think we could > provide > >>> a > >>>>>>>>>>>>>>>>> better API > >>>>>>>>>>>>>>>>> for KTable stores including the discovery of remote > shards > >>> of > >>>>>>>>>>>>>>>>> the same > >>>>>>>>>>>>>>>>> KTable. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> @Michael: Yes, right now we do have a lot of overloads > and > >>> I > >>>> am > >>>>>>>>>>>>>>>>> not a > >>>>>>>>>>>>>>>>> big fan of those -- I would rather prefer a builder > >>> pattern. > >>>>>>>>>>>>>>>>> But that > >>>>>>>>>>>>>>>>> might be a different discussion (nevertheless, if we > would > >>>> aim > >>>>>>>>>>>>>>>>> for a API > >>>>>>>>>>>>>>>>> rework, we should get the changes with regard to stores > >>> right > >>>>>>>>>>>>>>>>> from the > >>>>>>>>>>>>>>>>> beginning on, in order to avoid a redesign later on.) > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> something like: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> stream.groupyByKey() > >>>>>>>>>>>>>>>>> .window(TimeWindow.of(5000)) > >>>>>>>>>>>>>>>>> .aggregate(...) > >>>>>>>>>>>>>>>>> .withAggValueSerde(new CustomTypeSerde()) > >>>>>>>>>>>>>>>>> .withStoreName("storeName); > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> (This would also reduce JavaDoc redundancy -- maybe a > >>>> personal > >>>>>>>>>>>>>>>>> pain > >>>>>>>>>>>>>>>>> point right now :)) > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> -Matthias > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> On 1/27/17 11:10 AM, Jan Filipiak wrote: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Yeah, > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Maybe my bad that I refuse to look into IQ as i don't > find > >>>> them > >>>>>>>>>>>>>>>>>> anywhere > >>>>>>>>>>>>>>>>>> close to being interesting. The Problem IMO is that > people > >>>>>>>>>>>>>>>>>> need to know > >>>>>>>>>>>>>>>>>> the Store name), so we are working on different levels > to > >>>>>>>>>>>>>>>>>> achieve a > >>>>>>>>>>>>>>>>>> single goal. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> What is your peoples opinion on having a method on > KTABLE > >>>> that > >>>>>>>>>>>>>>>>>> returns > >>>>>>>>>>>>>>>>>> them something like a Keyvalue store. There is of course > >>>>>>>>>>>>>>>>>> problems like > >>>>>>>>>>>>>>>>>> "it cant be used before the streamthreads are going and > >>>>>>>>>>>>>>>>>> groupmembership > >>>>>>>>>>>>>>>>>> is established..." but the benefit would be that for the > >>>> user > >>>>>>>>>>>>>>>>>> there is > >>>>>>>>>>>>>>>>>> a > >>>>>>>>>>>>>>>>>> consistent way of saying "Hey I need it materialized as > >>>>>>>>>>>>>>>>>> querries gonna > >>>>>>>>>>>>>>>>>> be comming" + already get a Thing that he can execute > the > >>>>>>>>>>>>>>>>>> querries on > >>>>>>>>>>>>>>>>>> in > >>>>>>>>>>>>>>>>>> 1 step. > >>>>>>>>>>>>>>>>>> What I think is unintuitive here is you need to say > >>>>>>>>>>>>>>>>>> materialize on this > >>>>>>>>>>>>>>>>>> Ktable and then you go somewhere else and find its store > >>>> name > >>>>>>>>>>>>>>>>>> and then > >>>>>>>>>>>>>>>>>> you go to the kafkastreams instance and ask for the > store > >>>> with > >>>>>>>>>>>>>>>>>> this > >>>>>>>>>>>>>>>>>> name. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> So one could the user help to stay in DSL land and > >>> therefore > >>>>>>>>>>>>>>>>>> maybe > >>>>>>>>>>>>>>>>>> confuse him less. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Best Jan > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> #DeathToIQMoreAndBetterConnectors :) > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> On 27.01.2017 16:51, Damian Guy wrote: > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> I think Jan is saying that they don't always need to be > >>>>>>>>>>>>>>>>>>> materialized, > >>>>>>>>>>>>>>>>>>> i.e., > >>>>>>>>>>>>>>>>>>> filter just needs to apply the ValueGetter, it doesn't > >>>> need yet > >>>>>>>>>>>>>>>>>>> another > >>>>>>>>>>>>>>>>>>> physical state store. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> On Fri, 27 Jan 2017 at 15:49 Michael Noll < > >>>> mich...@confluent.io> > >>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Like Damian, and for the same reasons, I am more in > favor > >>>> of > >>>>>>>>>>>>>>>>>>>> overloading > >>>>>>>>>>>>>>>>>>>> methods rather than introducing `materialize()`. > >>>>>>>>>>>>>>>>>>>> FWIW, we already have a similar API setup for e.g. > >>>>>>>>>>>>>>>>>>>> `KTable#through(topicName, stateStoreName)`. > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> A related but slightly different question is what e.g. > >>> Jan > >>>>>>>>>>>>>>>>>>>> Filipiak > >>>>>>>>>>>>>>>>>>>> mentioned earlier in this thread: > >>>>>>>>>>>>>>>>>>>> I think we need to explain more clearly why KIP-114 > >>>> doesn't > >>>>>>>>>>>>>>>>>>>> propose > >>>>>>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>>> seemingly simpler solution of always materializing > >>>> tables/state > >>>>>>>>>>>>>>>>>>>> stores. > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> On Fri, Jan 27, 2017 at 4:38 PM, Jan Filipiak < > >>>>>>>>>>>>>>>>>>>> jan.filip...@trivago.com> > >>>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>>>>>>>>> Yeah its confusing, Why shoudn't it be querable by > IQ? > >>> If > >>>>>>>>>>>>>>>>>>>>> you uses > >>>>>>>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>>>> ValueGetter of Filter it will apply the filter and > >>>> should be > >>>>>>>>>>>>>>>>>>>>> completely > >>>>>>>>>>>>>>>>>>>>> transparent as to if another processor or IQ is > >>> accessing > >>>>>>>>>>>>>>>>>>>>> it? How > >>>>>>>>>>>>>>>>>>>>> can > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> this > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> new method help? > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> I cannot see the reason for the additional > materialize > >>>>>>>>>>>>>>>>>>>>> method being > >>>>>>>>>>>>>>>>>>>>> required! Hence I suggest leave it alone. > >>>>>>>>>>>>>>>>>>>>> regarding removing the others I dont have strong > >>> opinions > >>>>>>>>>>>>>>>>>>>>> and it > >>>>>>>>>>>>>>>>>>>>> seems to > >>>>>>>>>>>>>>>>>>>>> be unrelated. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> Best Jan > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> On 26.01.2017 20:48, Eno Thereska wrote: > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> Forwarding this thread to the users list too in case > >>>> people > >>>>>>>>>>>>>>>>>>>>> would > >>>>>>>>>>>>>>>>>>>>>> like > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> to > >>>>>>>>>>>>>>>>>>>>> comment. It is also on the dev list. > >>>>>>>>>>>>>>>>>>>>>> Thanks > >>>>>>>>>>>>>>>>>>>>>> Eno > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> Begin forwarded message: > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> From: "Matthias J. Sax" <matth...@confluent.io> > >>>>>>>>>>>>>>>>>>>>>>> Subject: Re: [DISCUSS] KIP-114: KTable > >>> materialization > >>>> and > >>>>>>>>>>>>>>>>>>>>>>> improved > >>>>>>>>>>>>>>>>>>>>>>> semantics > >>>>>>>>>>>>>>>>>>>>>>> Date: 24 January 2017 at 19:30:10 GMT > >>>>>>>>>>>>>>>>>>>>>>> To: dev@kafka.apache.org > >>>>>>>>>>>>>>>>>>>>>>> Reply-To: dev@kafka.apache.org > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> That not what I meant by "huge impact". > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> I refer to the actions related to materialize a > >>> KTable: > >>>>>>>>>>>>>>>>>>>>>>> creating a > >>>>>>>>>>>>>>>>>>>>>>> RocksDB store and a changelog topic -- users should > >>> be > >>>>>>>>>>>>>>>>>>>>>>> aware about > >>>>>>>>>>>>>>>>>>>>>>> runtime implication and this is better expressed by > >>> an > >>>>>>>>>>>>>>>>>>>>>>> explicit > >>>>>>>>>>>>>>>>>>>>>>> method > >>>>>>>>>>>>>>>>>>>>>>> call, rather than implicitly triggered by using a > >>>> different > >>>>>>>>>>>>>>>>>>>>>>> overload of > >>>>>>>>>>>>>>>>>>>>>>> a method. > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> -Matthias > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> On 1/24/17 1:35 AM, Damian Guy wrote: > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> I think your definition of a huge impact and mine > are > >>>> rather > >>>>>>>>>>>>>>>>>>>>>>>> different > >>>>>>>>>>>>>>>>>>>>>>>> ;-P > >>>>>>>>>>>>>>>>>>>>>>>> Overloading a few methods is not really a huge > >>> impact > >>>>>>>>>>>>>>>>>>>>>>>> IMO. It is > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> also a > >>>>>>>>>>>>>>>>>>>>> sacrifice worth making for readability, usability of > >>> the > >>>> API. > >>>>>>>>>>>>>>>>>>>>>>>> On Mon, 23 Jan 2017 at 17:55 Matthias J. Sax < > >>>>>>>>>>>>>>>>>>>>>>>> matth...@confluent.io> > >>>>>>>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> I understand your argument, but do not agree with > >>> it. > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> Your first version (even if the "flow" is not as > >>>> nice) > >>>>>>>>>>>>>>>>>>>>>>>>> is more > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> explicit > >>>>>>>>>>>>>>>>>>>>> than the second version. Adding a stateStoreName > >>>> parameter > >>>>>>>>>>>>>>>>>>>>> is quite > >>>>>>>>>>>>>>>>>>>>>>>>> implicit but has a huge impact -- thus, I prefer > >>> the > >>>>>>>>>>>>>>>>>>>>>>>>> rather more > >>>>>>>>>>>>>>>>>>>>>>>>> verbose > >>>>>>>>>>>>>>>>>>>>>>>>> but explicit version. > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> -Matthias > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> On 1/23/17 1:39 AM, Damian Guy wrote: > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> I'm not a fan of materialize. I think it > interrupts > >>>> the > >>>>>>>>>>>>>>>>>>>>>>>>> flow, > >>>>>>>>>>>>>>>>>>>>>>>>>> i.e, > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>> table.mapValue(..).materialize().join(..).materialize() > >>>>>>>>>>>>>>>>>>>>>>>>>> compared to: > >>>>>>>>>>>>>>>>>>>>>>>>>> table.mapValues(..).join(..) > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> I know which one i prefer. > >>>>>>>>>>>>>>>>>>>>>>>>>> My preference is stil to provide overloaded > >>> methods > >>>> where > >>>>>>>>>>>>>>>>>>>>>>>>>> people can > >>>>>>>>>>>>>>>>>>>>>>>>>> specify the store names if they want, otherwise > we > >>>> just > >>>>>>>>>>>>>>>>>>>>>>>>>> generate > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> them. > >>>>>>>>>>>>>>>>>>>>> On Mon, 23 Jan 2017 at 05:30 Matthias J. Sax > >>>>>>>>>>>>>>>>>>>>>>>>>> <matth...@confluent.io > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>>>>>>>>>>>>>>> thanks for the KIP Eno! Here are my 2 cents: > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> 1) I like Guozhang's proposal about removing > >>> store > >>>>>>>>>>>>>>>>>>>>>>>>>>> name from > >>>>>>>>>>>>>>>>>>>>>>>>>>> all > >>>>>>>>>>>>>>>>>>>>>>>>>>> KTable > >>>>>>>>>>>>>>>>>>>>>>>>>>> methods and generate internal names (however, I > >>>> would > >>>>>>>>>>>>>>>>>>>>>>>>>>> do this > >>>>>>>>>>>>>>>>>>>>>>>>>>> as > >>>>>>>>>>>>>>>>>>>>>>>>>>> overloads). Furthermore, I would not force > users > >>>> to call > >>>>>>>>>>>>>>>>>>>>>>>>>>> .materialize() > >>>>>>>>>>>>>>>>>>>>>>>>>>> if they want to query a store, but add one more > >>>> method > >>>>>>>>>>>>>>>>>>>>>>>>>>> .stateStoreName() > >>>>>>>>>>>>>>>>>>>>>>>>>>> that returns the store name if the KTable is > >>>>>>>>>>>>>>>>>>>>>>>>>>> materialized. > >>>>>>>>>>>>>>>>>>>>>>>>>>> Thus, > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> also > >>>>>>>>>>>>>>>>>>>>> .materialize() must not necessarily have a parameter > >>>> storeName > >>>>>>>>>>>>>>>>>>>>>>>>>>> (ie, > >>>>>>>>>>>>>>>>>>>>>>>>>>> we > >>>>>>>>>>>>>>>>>>>>>>>>>>> should have some overloads here). > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> I would also not allow to provide a null store > >>>> name (to > >>>>>>>>>>>>>>>>>>>>>>>>>>> indicate no > >>>>>>>>>>>>>>>>>>>>>>>>>>> materialization if not necessary) but throw an > >>>>>>>>>>>>>>>>>>>>>>>>>>> exception. > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> This yields some simplification (see below). > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> 2) I also like Guozhang's proposal about > >>>>>>>>>>>>>>>>>>>>>>>>>>> KStream#toTable() > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> 3) > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> 3. What will happen when you call materialize > on > >>>>>>>>>>>>>>>>>>>>>>>>>>> KTable > >>>>>>>>>>>>>>>>>>>>>>>>>>>> that is > >>>>>>>>>>>>>>>>>>>>>>>>>>>> already > >>>>>>>>>>>>>>>>>>>>>>>>>>>> materialized? Will it create another > StateStore > >>>>>>>>>>>>>>>>>>>>>>>>>>>> (providing > >>>>>>>>>>>>>>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> name > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> is > >>>>>>>>>>>>>>>>>>>>>>>>>>> different), throw an Exception? > >>>>>>>>>>>>>>>>>>>>>>>>>>> Currently an exception is thrown, but see > below. > >>>>>>>>>>>>>>>>>>>>>>>>>>>> If we follow approach (1) from Guozhang, there > >>> is > >>>> no > >>>>>>>>>>>>>>>>>>>>>>>>>>>> need to > >>>>>>>>>>>>>>>>>>>>>>>>>>>> worry > >>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> about > >>>>>>>>>>>>>>>>>>>>>>>>>>> a second materialization and also no exception > >>>> must be > >>>>>>>>>>>>>>>>>>>>>>>>>>> throws. A > >>>>>>>>>>>>>>>>>>>>>>>>>>> call to > >>>>>>>>>>>>>>>>>>>>>>>>>>> .materialize() basically sets a "materialized > >>>> flag" (ie, > >>>>>>>>>>>>>>>>>>>>>>>>>>> idempotent > >>>>>>>>>>>>>>>>>>>>>>>>>>> operation) and sets a new name. > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> 4) > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> Rename toStream() to toKStream() for > consistency. > >>>>>>>>>>>>>>>>>>>>>>>>>>>> Not sure whether that is really required. We > >>> also > >>>> use > >>>>>>>>>>>>>>>>>>>>>>>>>>>> `KStreamBuilder#stream()` and > >>>>>>>>>>>>>>>>>>>>>>>>>>>> `KStreamBuilder#table()`, for > >>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> example, > >>>>>>>>>>>>>>>>>>>>> and > >>>>>>>>>>>>>>>>>>>>>>>>>> don't care about the "K" prefix. > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> Eno's reply: > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> I think changing it to `toKStream` would make > it > >>>>>>>>>>>>>>>>>>>>>>>>>>> absolutely > >>>>>>>>>>>>>>>>>>>>>>>>>>>> clear > >>>>>>>>>>>>>>>>>>>>>>>>>>>> what > >>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> we are converting it to. > >>>>>>>>>>>>>>>>>>>>>>>>>>> I'd say we should probably change the > >>>> KStreamBuilder > >>>>>>>>>>>>>>>>>>>>>>>>>>> methods > >>>>>>>>>>>>>>>>>>>>>>>>>>>> (but > >>>>>>>>>>>>>>>>>>>>>>>>>>>> not > >>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> in > >>>>>>>>>>>>>>>>>>>>>>>>>> this KIP). > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> I would keep #toStream(). (see below) > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> 5) We should not remove any methods but only > >>>>>>>>>>>>>>>>>>>>>>>>>>> deprecate them. > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> A general note: > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> I do not understand your comments "Rejected > >>>>>>>>>>>>>>>>>>>>>>>>>>> Alternatives". You > >>>>>>>>>>>>>>>>>>>>>>>>>>> say > >>>>>>>>>>>>>>>>>>>>>>>>>>> "Have > >>>>>>>>>>>>>>>>>>>>>>>>>>> the KTable be the materialized view" was > >>> rejected. > >>>>>>>>>>>>>>>>>>>>>>>>>>> But your > >>>>>>>>>>>>>>>>>>>>>>>>>>> KIP > >>>>>>>>>>>>>>>>>>>>>>>>>>> actually > >>>>>>>>>>>>>>>>>>>>>>>>>>> does exactly this -- the changelog abstraction > of > >>>>>>>>>>>>>>>>>>>>>>>>>>> KTable is > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> secondary > >>>>>>>>>>>>>>>>>>>>> after those changes and the "view" abstraction is > what > >>> a > >>>>>>>>>>>>>>>>>>>>>>>>>>> KTable is. > >>>>>>>>>>>>>>>>>>>>>>>>>>> And > >>>>>>>>>>>>>>>>>>>>>>>>>>> just to be clear, I like this a lot: > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> - it aligns with the name KTable > >>>>>>>>>>>>>>>>>>>>>>>>>>> - is aligns with stream-table-duality > >>>>>>>>>>>>>>>>>>>>>>>>>>> - it aligns with IQ > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> I would say that a KTable is a "view > abstraction" > >>>> (as > >>>>>>>>>>>>>>>>>>>>>>>>>>> materialization is > >>>>>>>>>>>>>>>>>>>>>>>>>>> optional). > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> -Matthias > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> On 1/22/17 5:05 PM, Guozhang Wang wrote: > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP Eno, I have a few meta > >>> comments > >>>>>>>>>>>>>>>>>>>>>>>>>>> and a few > >>>>>>>>>>>>>>>>>>>>>>>>>>>> detailed > >>>>>>>>>>>>>>>>>>>>>>>>>>>> comments: > >>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>>> 1. I like the materialize() function in > general, > >>>> but > >>>>>>>>>>>>>>>>>>>>>>>>>>>> I would > >>>>>>>>>>>>>>>>>>>>>>>>>>>> like > >>>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> to > >>>>>>>>>>>>>>>>>>>>> see > >>>>>>>>>>>>>>>>>>>>>>>>>> how other KTable functions should be updated > >>>>>>>>>>>>>>>>>>>>>>>>>> accordingly. For > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> example, > >>>>>>>>>>>>>>>>