Hi Bill,

Thanks for the KIP! Awesome job catching this unexpected consequence
of the prior KIPs before it was released.

The proposal looks good to me. On top of just fixing the problem, it
seems to address two other pain points:
* that naming a state store automatically causes it to become queryable.
* that there's currently no way to configure the bytes store for join windows.

It's awesome that we can fix this issue and two others with one feature.

I'm wondering about a missing quadrant in the truth table of whether
a Materialized is stored and whether querying is enabled. What should
the behavior be if no store is configured (e.g., a Materialized with
only serdes) and querying is enabled?
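
To make the quadrants concrete, here is a rough sketch in plain Java.
The class and method names are hypothetical (none of this is Kafka
Streams API), and the labels are just my reading of the table:

```java
// Hypothetical model of the truth table; not Kafka Streams API.
public class MaterializedTruthTable {

    // Describes one quadrant: is a store configured, and is querying enabled?
    static String describe(boolean storeConfigured, boolean queryEnabled) {
        if (storeConfigured && queryEnabled) {
            return "stored, queryable";
        } else if (storeConfigured) {
            return "stored, not queryable";
        } else if (queryEnabled) {
            // The quadrant this email asks about.
            return "UNDEFINED: queryable, but no store configured";
        } else {
            return "not stored, not queryable";
        }
    }

    public static void main(String[] args) {
        boolean[] flags = {true, false};
        for (boolean store : flags) {
            for (boolean query : flags) {
                System.out.printf("store=%b query=%b -> %s%n",
                        store, query, describe(store, query));
            }
        }
    }
}
```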

It seems we have two choices:
1. we can force creation of a state store in this case, so the store
can be used to serve the queries
2. we can provide just a queryable view, basically letting IQ query
into the "KTableValueGetter", which would transparently construct the
query response by applying the operator logic to the upstream state if
the operator state isn't already stored.
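
As a rough illustration of the difference between the two choices,
here is a minimal sketch in plain Java. The ValueGetter interface
below is a simplified stand-in I made up for this email, not the
actual internal interface, and the "operator" is just a Function:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class ValueGetterViewSketch {

    // Hypothetical, simplified stand-in for the value-getter idea.
    interface ValueGetter<K, V> {
        V get(K key);
    }

    // Option 1: the operator result lives in its own store and is read directly.
    static <K, V> ValueGetter<K, V> stored(Map<K, V> store) {
        return store::get;
    }

    // Option 2: nothing is stored for the operator; each query reads the
    // upstream state and applies the operator logic on the fly.
    static <K, V, R> ValueGetter<K, R> view(ValueGetter<K, V> upstream,
                                            Function<V, R> operator) {
        return key -> {
            V value = upstream.get(key);
            return value == null ? null : operator.apply(value);
        };
    }

    public static void main(String[] args) {
        Map<String, Integer> upstreamStore = new HashMap<>();
        upstreamStore.put("a", 2);

        ValueGetter<String, Integer> upstream = stored(upstreamStore);
        ValueGetter<String, Integer> doubled = view(upstream, v -> v * 2);

        System.out.println(doubled.get("a"));       // prints 4
        System.out.println(doubled.get("missing")); // prints null
    }
}
```

The view holds no data of its own, so it always reflects the current
upstream state, at the cost of re-running the operator on every query.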

Offhand, it seems like the second is actually a pretty awesome
capability. But it might have an awkward interaction with the current
semantics. Presently, if I provide a Materialized.withName, it implies
that querying should be enabled AND that the view should actually be
stored in a state store. Under option 2 above, this behavior would
change to NOT provision a state store and instead just consult the
ValueGetter. To get back to the current behavior, users would have to
add a "bytes store supplier" to the Materialized to indicate that,
yes, they really want a state store there.
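
Spelled out as a sketch in plain Java (hypothetical names, and a
deliberately simplified model that only looks at whether a name and a
bytes store supplier were given), the change would be:

```java
// Hypothetical model of when a state store gets provisioned; not Kafka Streams API.
public class ProvisioningSketch {

    // Current semantics as described above: a name alone already implies
    // a queryable view that is actually stored.
    static boolean storedUnderCurrentSemantics(boolean hasName, boolean hasStoreSupplier) {
        return hasName || hasStoreSupplier;
    }

    // Option 2: only an explicit bytes store supplier provisions a store;
    // a name alone yields a queryable view served via the ValueGetter.
    static boolean storedUnderOption2(boolean hasName, boolean hasStoreSupplier) {
        return hasStoreSupplier;
    }

    public static void main(String[] args) {
        boolean[] flags = {true, false};
        for (boolean name : flags) {
            for (boolean supplier : flags) {
                System.out.printf("name=%b supplier=%b -> current=%b option2=%b%n",
                        name, supplier,
                        storedUnderCurrentSemantics(name, supplier),
                        storedUnderOption2(name, supplier));
            }
        }
    }
}
```

The only row that differs is name=true, supplier=false, which is
exactly the case where behavior would change on upgrade.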

Behavior changes are always kind of scary, but I think in this case,
it might actually be preferable. When only the name is provided, it
means that people just wanted to make the operation result queryable.
If we automatically convert this to a non-stored view, then simply
upgrading results in the same observable behavior and semantics, but
with a linear reduction in local storage requirements and disk I/O,
as well as a corresponding linear reduction in memory usage both on
and off heap.

What do you think?
-John

On Tue, Jun 18, 2019 at 9:21 PM Bill Bejeck <bbej...@gmail.com> wrote:
>
> All,
>
> I'd like to start a discussion for adding a Materialized configuration
> object to KStream.join for naming state stores involved in joins.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-479%3A+Add+Materialized+to+Join
>
> Your comments and suggestions are welcome.
>
> Thanks,
> Bill
