Thanks, Guozhang.

I've been thinking about the following approach: https://imgur.com/a/pP92Z

Does this approach make sense?

A key consideration will be ensuring that product dimension table updates are
processed and written to Kafka before the corresponding purchase transaction
record is processed.
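To make the type-2 requirement concrete, here is a minimal, self-contained sketch (plain Java, not the Kafka Streams API — the class and field names are made up for illustration) of the lookup semantics a type 2 SCD join needs: every version of the dimension record is retained, keyed by its effective-from timestamp, and a fact record is joined to the version in effect at the fact's own timestamp rather than to the latest overwrite.

```java
import java.util.TreeMap;

// Hypothetical illustration of type-2 SCD lookup semantics.
// A GlobalKTable (KIP-99) is type 1 (overwrite): a join always sees the
// latest value. A type-2 lookup instead keeps the full version history
// and picks the version whose effective-from timestamp is the greatest
// one not exceeding the fact record's timestamp.
public class Type2ScdLookup {
    public static void main(String[] args) {
        // Dimension history for product "p1": effective-from ts -> value
        TreeMap<Long, String> productHistory = new TreeMap<>();
        productHistory.put(100L, "p1-v1 (category=A)");
        productHistory.put(200L, "p1-v2 (category=B)");

        // A fact record timestamped between the two versions should join
        // to v1, even though v2 is the most recent value.
        long factTimestamp = 150L;
        String dimVersion = productHistory.floorEntry(factTimestamp).getValue();
        System.out.println(dimVersion);  // prints: p1-v1 (category=A)
    }
}
```

In Kafka Streams terms, this is why the ordering above matters: if the dimension update has not been written to Kafka before the fact record is processed, even a timestamp-aware lookup cannot find the correct version.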



On 17 October 2017 at 02:15, Guozhang Wang <wangg...@gmail.com> wrote:

> Hello Chris,
>
> The global table described in KIP-99 will keep the most recent snapshot of
> the table when applying updates to the table, i.e. it is like type 1:
> overwrite. So when a table or stream is joined with the global table, it is
> always joined with the most recent values of the global table.
>
> However, note that in Kafka Streams api, joining streams are synchronized
> based on their incoming record's timestamps (i.e. the library will choose
> which records to process next, either from the global dimension table's
> changelog, or from the fact table's changelog, based on their stream time
> on a best-effort basis), so if you have an updated value on the fact table,
> that update's timestamp will be aligned with the current updates on the
> global table as well.
>
>
> Guozhang
>
>
> On Mon, Oct 16, 2017 at 12:51 PM, chris snow <chsnow...@gmail.com> wrote:
>
> > The streams global KTable wiki page [1] describes a data warehouse style
> > operation whereby dimension tables are joined to fact tables.
> >
> > I’m interested in whether this approach works for type 2 slowly changing
> > dimensions [2]? In a type 2 SCD, the dimension record history is preserved
> > and the fact table record is joined to the appropriate version of the
> > dimension table record.
> >
> > —
> > [1]
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 99%3A+Add+Global+Tables+to+Kafka+Streams
> > [2] https://en.m.wikipedia.org/wiki/Slowly_changing_dimension
> >
>
>
>
> --
> -- Guozhang
>