Hi Dawid, I don't have other concerns now. Thanks for the explanation.
Best, Jark On Thu, 21 Nov 2019 at 18:13, Kurt Young <ykt...@gmail.com> wrote: > Hi all, > > After some offline discussion with Dawid, I changed my mind. We still agree > on that > we should only support reading from tables with primary key, forbid writing > to such table. > > But the enforce flag should be introduced with primary key, and probably > should be > `NOR ENFORCE` be default. The problem with making `ENFORCE` by default is > when > creating a table with primary key constraint. If we follow the SQL standard > here, we should > do the validity check during the creating operation, and this will > introduce a really big > burden here and might make supporting this feature not feasible. > > Best, > Kurt > > > On Thu, Nov 21, 2019 at 10:25 AM Kurt Young <ykt...@gmail.com> wrote: > > > Hi Dawid, > > > > Actually I'm not 100% sure about both choices: always enforce primary key > > by Flink and not enforce at all. > > > > As you said, always enforce primary key is hard to achieve since Flink > > doesn't > > own data most of the time. But with some certain cases, it's actually > > doable, like > > writing a new table in a batch job, or if we have upsert flag supported > in > > message > > queue. So it's hard to say Flink has no chance to enforce primary key > > here. > > > > On the other hand, not enforce primary key indeed save us a lot of work, > > but also > > would cause confusing sometimes. Especially when the table is written and > > read > > by Flink only. For example, some user created a table with primary key > not > > enforced, > > and they will use it as source and sink table. This situation is quite > > confusing, because > > Flink didn't enforce primary key when writing, so it's hard to say we can > > rely on such > > information during read. It looks we are contradictory with ourself. > > > > AFAIK, most databases will only do validity check only when writing to > > table, and > > simply trust such information during read. I think we could also follow > > this way. Since > > we are not sure yet what kind of behavior when writing to a table with > > primary key, I would > > suggest to defer this util we start to talk about the source and sink > > behavior with primary > > key, maybe as well as introducing update, delete flag. > > > > To summarize, my proposal would be: > > 1. Add primary key but not introducing enforce flag now > > 2. Table with primary key can't be used as sink, and we will simply trust > > pk info during read. > > > > Best, > > Kurt > > > > > > On Wed, Nov 20, 2019 at 9:26 PM Dawid Wysakowicz <dwysakow...@apache.org > > > > wrote: > > > >> Hi Kurt, > >> > >> You are right that this is included in the SQL standard. > >> > >> The problem is we are not a database and we do not own the data. There > >> is no way for us to truly enforce the uniqueness constraint. We are > >> always writing to some external systems that, moreover we are not an > >> exclusive produce/consumer for that data. We cannot ensure that external > >> systems will not corrupt the data. > >> > >> The way RDBM systems ensure uniqueness constraint is that they keep the > >> index of all keys for a table that they check against for every write. > >> This works only if the system that maintains the index is the only > >> writer to that table. > >> > >> Moreover some of the systems also allow loosening this restriction e.g. > >> > >> DB2: > >> > >> > https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.5.0/com.ibm.db2.luw.wn.doc/doc/c0061153.html > >> > >> HIVE(with a non standard syntax, they do it for the same reasons as > >> listed above): > >> > >> > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Constraints > >> > >> > >> What do you think? > >> > >> Best, > >> > >> Dawid > >> > >> > >> On 20/11/2019 13:51, Kurt Young wrote: > >> > Hi all, > >> > > >> > Thanks Dawid for bringing this up, this is very important property of > >> SQL > >> > and > >> > I'm big +1 to have this. > >> > > >> > But before the discussion going deeply, I want to first mention that > >> > according to > >> > SQL standard, any unique key constraint which including primary key, > >> should > >> > be > >> > always be enforced. > >> > > >> > Please see: > >> > > >> > 4.18.3.1 Introduction to table constraints > >> > > >> > "NOTE 50 — A unique constraint is always enforced. A table check > >> constraint > >> > or > >> > a referential constraint can either be enforced or not enforced." > >> > > >> > Best, > >> > Kurt > >> > > >> > > >> > On Wed, Nov 20, 2019 at 5:43 PM Dawid Wysakowicz < > >> dwysakow...@apache.org> > >> > wrote: > >> > > >> >> Hey, > >> >> > >> >> After an offline discussion I updated the FLIP slightly. In the > initial > >> >> FLIP I put the OUT_OF_LINE definition at a wrong position. I moved it > >> >> into the statement paranthesis. I also moved the primary key to the > >> >> TableSchema class. > >> >> > >> >> I also extended sections about UNIQUE key support and how do the > >> primary > >> >> keys affect sources and sinks. > >> >> > >> >> Please have another look. If there are no further concerns, I would > >> like > >> >> to start the vote soon. > >> >> > >> >> Best, > >> >> > >> >> Dawid > >> >> > >> >> On 20/11/2019 04:32, Jark Wu wrote: > >> >>> Hi Dawid, > >> >>> > >> >>> Thanks for driving this. Primary key is a very important and useful > >> >> feature > >> >>> to Flink Table API / SQL. > >> >>> > >> >>> I'm +1 to the general design. > >> >>> > >> >>> And have some thoughts as following: > >> >>> > >> >>> 1. > The design says Flink only support NOT ENFORCED mode. But the > DDL > >> >>> and KeyConstraint#primaryKey(..) can pass in ENFORCED mode. > >> >>> What will we do when user specify an ENFORCED mode? Shall we > >> forbid > >> >>> it? > >> >>> > >> >>> 2. > "When creating a table, creating a primary key constraint will > >> alter > >> >>> the columns nullability." > >> >>> I think we should force the columns to be declared NOT NULL, > >> instead > >> >> of > >> >>> change the columns nullability implicitly. > >> >>> Otherwise, it may be confused what is the nullbility of the > >> primary > >> >> key > >> >>> columns. > >> >>> > >> >>> 3. > "Support for Unique key is not part of the FLIP. It is just > >> >> mentioned > >> >>> to show how can we extend the primary key concept with more > >> constraints > >> >> in > >> >>> the future." > >> >>> I think we can include Unique Key as part of the FLIP, as it is > >> >> already > >> >>> stated clearly. We can put the implemenation task of unique key in a > >> >> lower > >> >>> priority. > >> >>> > >> >>> 4. > Method for retrieving primary key constraint is in > >> CatalogBaseTable. > >> >>> In SQL standard, primary key is declared in line or out of line. > >> If > >> >> it > >> >>> is out of line, it is still in the schema part, i.e. besides column > >> >>> definitions and in the parentheses. > >> >>> So I think maybe primary key should belong to `TableSchema` > >> becuase > >> >> it > >> >>> is a part of schema. > >> >>> > >> >>> CREATE TABLE xxx ( > >> >>> a int, > >> >>> b varchar, > >> >>> c bigint, > >> >>> primary key (a, b) > >> >>> ) with (...) > >> >>> > >> >>> > >> >>> Best, > >> >>> Jark > >> >>> > >> >>> > >> >>> On Tue, 19 Nov 2019 at 23:18, Dawid Wysakowicz < > >> dwysakow...@apache.org> > >> >>> wrote: > >> >>> > >> >>>> Hi, > >> >>>> > >> >>>> I wanted to bring up the topic of primary key constraints that we > >> could > >> >>>> leverage for runtime optimizations. Please have a look at the > >> proposal > >> >>>> and I would especially draw your attention to the topic of > >> nullability > >> >>>> of columns that are part of a primary key. Some time in the future > we > >> >>>> will also be able to leverage it for upserting streaming tables. > >> >>>> > >> >>>> Here is the proposal: > >> >>>> > >> >>>> > >> >> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP+87%3A+Primary+key+constraints+in+Table+API > >> >>>> Best, > >> >>>> > >> >>>> Dawid > >> >>>> > >> >>>> > >> >>>> > >> >> > >> > >> >