Hi Jingsong! Thank you for all the explanations. To follow up on the points:
(1) Log implementation

Good to hear you are looking to make this extensible.

(2) Change tracking

Understood, makes sense (used for re-processing).

(3) Log Scan Startup mode

Your explanation makes sense. As I understand it, that means, though, that any managed table that uses another managed table as a source will automatically always use the "full scan" mode (snapshot + log subscription).

(4) Data retention

> - The data of the snapshot will never expire, and the user needs to
>   delete the partition by themselves if needed.
> - The expiration time of the log is unlimited by default, but it can
>   be configured. In fact, we only need the latest log by default,
>   because we have saved the previous snapshot.

Are you referring to cleanup here in the sense of garbage collection, or also to the deletion of data that would make the Managed Table semantically wrong?

Let's assume I define a managed table with a query like the one below, where the "interactions" table is derived from a stream of Kafka records.

INSERT INTO the_table
  SELECT window_end, COUNT(*)
  FROM TABLE(TUMBLE(TABLE interactions, DESCRIPTOR(ts), INTERVAL '5' MINUTES))
  GROUP BY window_end
  HAVING now() - window_end <= INTERVAL '14' DAYS;

When a user does a "SELECT *" on that table, they don't want to see old data; it would make the query reading from the managed table incorrect. Who would then filter or prune the old data? The query maintaining the managed table? The query reading from the managed table? (A sketch of the latter is further below.)

Maybe this isn't something you want to solve in the first version, but it would be good to have a definite answer on what the plan is in case data retention is part of the query semantics. If the assumption is that managed tables can never have retention defined in their query semantics (and that all filtering is done by the query that reads the managed table), then I think this is a supercritical design property that we need to make very explicit for users to understand.

(5) Unified format in the log

Makes sense.

(6, 7) PK mode and consistency

I get the technical reason that with PKs there is a case for duplicate filtering, but I am skeptical of the user experience if this detail implicitly changes the consistency users see. This may not be relevant for Flink queries that read from the changelog, but it is relevant for external programs subscribing to the changelog.

What would be the additional overhead of creating a separate config switch "log consistency" with values "transactional" and "eventual"? (Also sketched further below.) By default, this would be transactional, regardless of whether the result has a PK or not. If there is a PK, users can optimize the latency if they want by using the "eventual" setting. That way, users are not surprised when changing the schema of the table (adding a PK) suddenly changes the consistency semantics.

If I understand correctly, "transactional" with a PK would actually not be correct; it would still contain some duplicates. But that means we cannot expose a correct changelog to users (external subscribers) in all cases? The PK behavior seems like something that needs some general improvement in the SQL engine.

(8) Optimistic locking

Thanks for clarifying, that makes sense. I think that would be good to add to the FLIP. The reason why something is done is as important as what exactly is being done.
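Two quick sketches to make the above a bit more concrete (purely illustrative; the column name "cnt" and the option name 'log.consistency' are placeholders, not a concrete proposal for the FLIP):

For (4), if all retention filtering is left to the readers, then every query against the managed table would have to repeat the retention predicate itself, roughly:

  -- hypothetical reading query; "cnt" stands for the count column from the example above
  SELECT window_end, cnt
  FROM the_table
  WHERE now() - window_end <= INTERVAL '14' DAYS;

For (6, 7), the switch I have in mind would simply be a table option, roughly:

  CREATE TABLE the_table (
      window_end TIMESTAMP(3),
      cnt        BIGINT,
      PRIMARY KEY (window_end) NOT ENFORCED
  ) WITH (
      -- placeholder option name: 'transactional' by default, independent of the PK;
      -- users with a PK could opt into 'eventual' to trade consistency for latency
      'log.consistency' = 'transactional'
  );

That way, the consistency that external changelog subscribers see is an explicit choice rather than a side effect of adding a primary key.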
Thanks a lot!

Stephan

On Mon, Nov 15, 2021 at 10:41 AM Jingsong Li <jingsongl...@gmail.com> wrote:

> Hi Timo,
>
> > It would be great if we can add not only `Receive any type of changelog`
> > but also `Receive any type of datatype`.
>
> Nice, I think we can.
>
> > Please clarify whether the compact DDL is a synchronous or asynchronous
> > operation in the API? So far all DDL was synchronous. And only DML
> > asynchronous.
>
> It should be a synchronous operation.
>
> > I find this 'change-tracking' = 'false' a bit confusing. Even in batch
> > scenarios we have a changelog, only with insert-only changes. Can you
> > elaborate? Wouldn't 'exclude-from-log-store' or 'exclude-log-store' or
> > 'log.disabled' be more accurate?
>
> Change tracking is from Oracle and Snowflake [1][2][3]. It matches the
> "emit-changes" syntax. It means that after disabling it, downstream
> consumers cannot obtain the corresponding changes.
>
> > DESCRIBE DETAIL TABLE
>
> +1 to `DESCRIBE TABLE EXTENDED`.
>
> > Set checkpoint interval to 1 min if checkpoint is not enabled
> > when the planner detects a sink to a built-in dynamic table.
> > This sounds like too much magic to me.
>
> You are right. And one minute may not be enough for all situations. +1
> to throwing a detailed exception to alert the user.
>
> > GenericCatalog to `Catalog#supportsTableStorage`
>
> I originally thought about completely distinguishing it from an external
> catalog, but it is also possible to add a new method.
>
> > CatalogBaseTable.TableKind
>
> Yes, we can create a new TableKind for this table. The catalog can easily
> distinguish them.
>
> > enrichOptions(Context context)
> > Why is this method returning a Map<String, String>? Shouldn't the caller
> > assume that all options enriched via `CatalogTable.copy` should have
> > been applied by `enrichOptions`?
>
> Yes, the planner adds some options to the table before creating it.
>
> > Partitioning and Event-time:
> > Have you considered supporting semantics similar to
> > `sink.partition-commit.trigger` based on `partition-time`? It could
> > be beneficial to have the partitions committed by watermarks as well. My
> > biggest concern is how we can enable watermarking end-to-end using a
> > file store (I think for the log store this should not be a problem?).
>
> Yes, we can also write the watermark to storage, which has many advantages:
> - Users can see the progress of a partition.
> - Partition commit can trigger some optimizations for that partition,
>   like compaction.
> - For lookup joins, we can use the watermark to align. Many users
>   complain that the lookup join cannot find data when the lookup table
>   is not ready; it is difficult to align the main stream and the lookup
>   table.
>
> But this is not the current blocker. We can improve this in the future.
>
> [1] https://docs.snowflake.com/en/user-guide/streams.html
> [2] https://docs.oracle.com/database/121/ADMQS/GUID-3BAA0D48-CA35-4CD7-810E-50C703DC6FEB.htm
> [3] https://docs.oracle.com/database/121/DWHSG/advmv.htm#DWHSG-GUID-F7394DFE-7CF6-401C-A312-C36603BEB01B
>
> Best,
> Jingsong