Hi Dawid,

I don't have other concerns now. Thanks for the explanation.

Best,
Jark

On Thu, 21 Nov 2019 at 18:13, Kurt Young <ykt...@gmail.com> wrote:

> Hi all,
>
> After some offline discussion with Dawid, I changed my mind. We still agree
> on that
> we should only support reading from tables with primary key, forbid writing
> to such table.
>
> But the enforce flag should be introduced with primary key, and probably
> should be
> `NOR ENFORCE` be default. The problem with making `ENFORCE` by default is
> when
> creating a table with primary key constraint. If we follow the SQL standard
> here, we should
> do the validity check during the creating operation, and this will
> introduce a really big
> burden here and might make supporting this feature not feasible.
>
> Best,
> Kurt
>
>
> On Thu, Nov 21, 2019 at 10:25 AM Kurt Young <ykt...@gmail.com> wrote:
>
> > Hi Dawid,
> >
> > Actually I'm not 100% sure about both choices: always enforce primary key
> > by Flink and not enforce at all.
> >
> > As you said, always enforce primary key is hard to achieve since Flink
> > doesn't
> > own data most of the time. But with some certain cases, it's actually
> > doable, like
> > writing a new table in a batch job, or if we have upsert flag supported
> in
> > message
> > queue. So it's hard to say Flink has no chance to enforce primary key
> > here.
> >
> > On the other hand, not enforce primary key indeed save us a lot of work,
> > but also
> > would cause confusing sometimes. Especially when the table is written and
> > read
> > by Flink only. For example, some user created a table with primary key
> not
> > enforced,
> > and they will use it as source and sink table. This situation is quite
> > confusing, because
> > Flink didn't enforce primary key when writing, so it's hard to say we can
> > rely on such
> > information during read. It looks we are contradictory with ourself.
> >
> > AFAIK, most databases will only do validity check only when writing to
> > table, and
> > simply trust such information during read. I think we could also follow
> > this way. Since
> > we are not sure yet what kind of behavior when writing to a table with
> > primary key, I would
> > suggest to defer this util we start to talk about the source and sink
> > behavior with primary
> > key, maybe as well as introducing update, delete flag.
> >
> > To summarize, my proposal would be:
> > 1. Add primary key but not introducing enforce flag now
> > 2. Table with primary key can't be used as sink, and we will simply trust
> > pk info during read.
> >
> > Best,
> > Kurt
> >
> >
> > On Wed, Nov 20, 2019 at 9:26 PM Dawid Wysakowicz <dwysakow...@apache.org
> >
> > wrote:
> >
> >> Hi Kurt,
> >>
> >> You are right that this is included in the SQL standard.
> >>
> >> The problem is we are not a database and we do not own the data. There
> >> is no way for us to truly enforce the uniqueness constraint. We are
> >> always writing to some external systems that, moreover we are not an
> >> exclusive produce/consumer for that data. We cannot ensure that external
> >> systems will not corrupt the data.
> >>
> >> The way RDBM systems ensure uniqueness constraint is that they keep the
> >> index of all keys for a table that they check against for every write.
> >> This works only if the system that maintains the index is the only
> >> writer to that table.
> >>
> >> Moreover some of the systems also allow loosening this restriction e.g.
> >>
> >> DB2:
> >>
> >>
> https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.5.0/com.ibm.db2.luw.wn.doc/doc/c0061153.html
> >>
> >> HIVE(with a non standard syntax, they do it for the same reasons as
> >> listed above):
> >>
> >>
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Constraints
> >>
> >>
> >> What do you think?
> >>
> >> Best,
> >>
> >> Dawid
> >>
> >>
> >> On 20/11/2019 13:51, Kurt Young wrote:
> >> > Hi all,
> >> >
> >> > Thanks Dawid for bringing this up, this is very important property of
> >> SQL
> >> > and
> >> > I'm big +1 to have this.
> >> >
> >> > But before the discussion going deeply, I want to first mention that
> >> > according to
> >> > SQL standard, any unique key constraint which including primary key,
> >> should
> >> > be
> >> > always be enforced.
> >> >
> >> > Please see:
> >> >
> >> > 4.18.3.1 Introduction to table constraints
> >> >
> >> > "NOTE 50 — A unique constraint is always enforced. A table check
> >> constraint
> >> > or
> >> > a referential constraint can either be enforced or not enforced."
> >> >
> >> > Best,
> >> > Kurt
> >> >
> >> >
> >> > On Wed, Nov 20, 2019 at 5:43 PM Dawid Wysakowicz <
> >> dwysakow...@apache.org>
> >> > wrote:
> >> >
> >> >> Hey,
> >> >>
> >> >> After an offline discussion I updated the FLIP slightly. In the
> initial
> >> >> FLIP I put the OUT_OF_LINE definition at a wrong position. I moved it
> >> >> into the statement paranthesis. I also moved the primary key to the
> >> >> TableSchema class.
> >> >>
> >> >> I also extended sections about UNIQUE key support and how do the
> >> primary
> >> >> keys affect sources and sinks.
> >> >>
> >> >> Please have another look. If there are no further concerns, I would
> >> like
> >> >> to start the vote soon.
> >> >>
> >> >> Best,
> >> >>
> >> >> Dawid
> >> >>
> >> >> On 20/11/2019 04:32, Jark Wu wrote:
> >> >>> Hi Dawid,
> >> >>>
> >> >>> Thanks for driving this. Primary key is a very important and useful
> >> >> feature
> >> >>> to Flink Table API / SQL.
> >> >>>
> >> >>> I'm +1 to the general design.
> >> >>>
> >> >>> And have some thoughts as following:
> >> >>>
> >> >>> 1. > The design says Flink only support NOT ENFORCED mode. But the
> DDL
> >> >>> and KeyConstraint#primaryKey(..) can pass in ENFORCED mode.
> >> >>>     What will we do when user specify an ENFORCED mode?  Shall we
> >> forbid
> >> >>> it?
> >> >>>
> >> >>> 2. > "When creating a table, creating a primary key constraint will
> >> alter
> >> >>> the columns nullability."
> >> >>>     I think we should force the columns to be declared NOT NULL,
> >> instead
> >> >> of
> >> >>> change the columns nullability implicitly.
> >> >>>     Otherwise, it may be confused what is the nullbility of the
> >> primary
> >> >> key
> >> >>> columns.
> >> >>>
> >> >>> 3. > "Support for Unique key is not part of the FLIP. It is just
> >> >> mentioned
> >> >>> to show how can we extend the primary key concept with more
> >> constraints
> >> >> in
> >> >>> the future."
> >> >>>     I think we can include Unique Key as part of the FLIP, as it is
> >> >> already
> >> >>> stated clearly. We can put the implemenation task of unique key in a
> >> >> lower
> >> >>> priority.
> >> >>>
> >> >>> 4. > Method for retrieving primary key constraint is in
> >> CatalogBaseTable.
> >> >>>     In SQL standard, primary key is declared in line or out of line.
> >> If
> >> >> it
> >> >>> is out of line, it is still in the schema part, i.e. besides column
> >> >>> definitions and in the parentheses.
> >> >>>     So I think maybe primary key should belong to `TableSchema`
> >> becuase
> >> >> it
> >> >>> is a part of schema.
> >> >>>
> >> >>>     CREATE TABLE xxx (
> >> >>>        a int,
> >> >>>        b varchar,
> >> >>>        c bigint,
> >> >>>        primary key (a, b)
> >> >>>     ) with (...)
> >> >>>
> >> >>>
> >> >>> Best,
> >> >>> Jark
> >> >>>
> >> >>>
> >> >>> On Tue, 19 Nov 2019 at 23:18, Dawid Wysakowicz <
> >> dwysakow...@apache.org>
> >> >>> wrote:
> >> >>>
> >> >>>> Hi,
> >> >>>>
> >> >>>> I wanted to bring up the topic of primary key constraints that we
> >> could
> >> >>>> leverage for runtime optimizations. Please have a look at the
> >> proposal
> >> >>>> and I would especially draw your attention to the topic of
> >> nullability
> >> >>>> of columns that are part of a primary key. Some time in the future
> we
> >> >>>> will also be able to leverage it for upserting streaming tables.
> >> >>>>
> >> >>>> Here is the proposal:
> >> >>>>
> >> >>>>
> >> >>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP+87%3A+Primary+key+constraints+in+Table+API
> >> >>>> Best,
> >> >>>>
> >> >>>> Dawid
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>
> >>
> >>
>

Reply via email to