Re: [DISCUSS] KIP-213: Second follow-up on Foreign Key Joins

John Roesler Wed, 26 Jun 2019 11:54:48 -0700

I think the "scenario trace" is very nice, but has one point that I
found confusing:

You indicate a retraction in the join output as (k,null) and a join
result as (k, leftval, rightval), but confusingly, you also write a
join result as (k, JoinResult) when one side is null. Maybe just call
it as (k, leftval, null) or (k, null, rightval)? That way the readers
can more easily determine if the results meet their expectations for
each join type.

(procedural note: if you update the KIP, you might want to send a new
"diff link" to this thread, since the one I posted at the beginning
would not automatically show your latest changes)

I was initially concerned that the proposed algorithm would wind up
propagating something that looks like a left join (k, leftval, null)
under the case that Joe pointed out, but after reviewing your
scenario, I see that it will emit a tombstone (k, null) instead. This
is appropriate, and unavoidable, since we have to retract the join
result from the logical view (the join result is a logical Table).

Looking closely at the proposal, can you explain more about the
propagateIfNull field in SubscriptionResponseWrapper?
It sort of looks like it's always going to be equal to (RHS-result != null).

In other words, can we drop that field and just send back RHS-result
or null, and then handle it on the left-hand side like:
if (rhsOriginalValueHash doesn't match) {
    emit nothing, just drop the update
} else if (joinType==inner && rhsValue == null) {
    emit tombstone
} else {
    emit joiner(lhsValue, rhsValue)
}

To your concern about emitting extra tombstones, personally, I think
it's fine. Clearly, we should try to avoid unnecessary tombstones, but
all things considered, it's not harmful to emit some unnecessary
tombstones: their payload is small, and they are trivial to handle
downstream. If users want to, they can materialize the join result to
suppress any extra tombstones, so there's a way out.

Thanks for the awesome idea. It's better than what I was thinking.
-john

On Wed, Jun 26, 2019 at 11:37 AM Adam Bellemare
<adam.bellem...@gmail.com> wrote:
>
> Thanks John.
>
> I'm looking forward to any feedback on this. In the meantime I will work on
> the unit tests to ensure that we have well-defined and readable coverage.
>
> At the moment I cannot see a way around emitting (k,null) whenever we emit
> an event that lacks a matching foreign key on the RHS, except in the
> (k,null) -> (k,fk) case.
> If this LHS oldValue=null, we know we would have emitted a deletion and so
> (k,null) would be emitted out of the join. In this case we don't need to
> send another null.
>
> Adam
>
> On Wed, Jun 26, 2019 at 11:53 AM John Roesler <j...@confluent.io> wrote:
>
> > Hi Adam,
> >
> > Thanks for the proposed revision to your KIP
> > (
> > https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=74684836&selectedPageVersions=77&selectedPageVersions=74
> > )
> >
> > in response to the concern pointed out during code review
> > (https://github.com/apache/kafka/pull/5527#issuecomment-505137962)
> >
> > We should have a brief discussion thread (here) in the mailing list to
> > make sure everyone who wants to gets a chance to consider the
> > modification to the design.
> >
> > Thanks,
> > -John
> >

Re: [DISCUSS] KIP-213: Second follow-up on Foreign Key Joins

Reply via email to