Hi guys, Just my two cents.
I agree with Jark that we should use "event-time/processing-time temporal join" as the name for this join. But I'm not sure about the definition of "temporal table". Then the "temporal join" (i.e. FOR SYSTEM_TIME AS OF syntax) joins a > non-temporal table, which also sounds like contradictions. > Why would we require "temporal join" only joins a "temporal table", if the standard doesn't even define what is a "temporal table"? What about defining the term "temporal table" to be "a table is > changing over time", the same as the "dynamic table". > I'd prefer not to introduce another term if it has the same meaning as "dynamic table". I wonder whether it's possible to consider "temporal join" as a specific kind of join that may work with regular or versioned tables, instead of a join that requires a specific kind of table. On Fri, Aug 21, 2020 at 1:45 PM Jark Wu <imj...@gmail.com> wrote: > Hi everyone, > > Thank you for the great discussion @Leonard and @Fabian. > > *Regarding to choose a different name for this join:* > > From my point of view, I don't agree to introduce a new grammar called > whatever "lookup join" or "version join", because: > 1. "lookup" is a physical behavior not a logical concept. > 2. The SQL standard has proposed temporal join and it fits Flink well with > the event-time and processing-time attributes. > 3. We already have so many different join grammer, e.g. "regular join", > "interval join", "temporal join", and maybe "window join" in the future. > It may confuse users when having more join concepts. > > So I think the existing "event-time temporal join" and "processing-time > temporal join" work well and we should still use them. > > *Regarding to the "temporal table without versions":* > > I agree there are contradictions between "temporal table" and "temporal > table without versions" if we think "temporal table" tracks full history. > However, if we call "temporal table without versions" as a "regular > table", not a kind of "temporal table". > Then the "temporal join" (i.e. FOR SYSTEM_TIME AS OF syntax) joins a > non-temporal table, which also sounds like contradictions. > > I also notice that in SQL Server, the "system-versioned temporal table" is > a combination of "Temporal Table" (current data) and "History Table" [1]. > The "Temporal Table" is the current data of the database which is the same > as our definition of "temporal table without version". The documentation > says: > > > *This additional table is referred to as the history table, while the > main table that stores current (actual) row versions is referred to as the > current table or simply as the temporal table. * > > Besides, SQL:2011 doesn't define the meaning of "temporal table" [2], so I > think we can define the meaning ourselves. > > *> Interestingly, SQL:2011 manages to provide this (system-versioned > table) support without actually defining or using the terms “temporal data” > or “temporal table”.* > > What about defining the term "temporal table" to be "a table is > changing over time", the same as the "dynamic table". > A "versioned temporal table" is a special temporal table which tracks the > full history of the changing table and supports point-in-time access. > A regular temporal table only supports access to the current data (no > earlier versions). > > In this way, the "temporal join" still makes sense, because the joined > table is a "temporal table", and all the tables in Flink is "temporal > table" (or dynamic table). > > Best, > Jark > > [1]: > https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-ver15#what-is-a-system-versioned-temporal-table > [2]: > https://cs.ulb.ac.be/public/_media/teaching/infoh415/tempfeaturessql2011.pdf > > > > > > > On Thu, 20 Aug 2020 at 22:55, Fabian Hueske <fhue...@gmail.com> wrote: > >> Hi everyone, >> >> Yes, just to help user distinguish the difference between versioned >>> temporal table and latest-only temporal table. >>> >>> >> I don't think we help users to understand the differences if we invent >> new (IMO confusing) terms ("temporal table without version" or "latest-only >> temporal table") instead of using known terminology ("table"). >> >> >>> As I suggested before, it might make sense to choose a different name >>> for this join instead of using a fancy name for the tables such that the >>> name fits the name of the join. >>> >>> I think lookup join is a physical implementation of Processing-Time >>> temporal table join, we can even offer a lookup join implementation for >>> Event-time temporal table join if the external system support lookup >>> history data. >>> >>> Versioned temporal table can be used in Event-time temporal table join, >>> all temporal table(including versioned temporal tale and regular table) can >>> be used in Processing-time temporal table join, >>> lookup join is a kind of physical implementation of Processing-time >>> temporal table join which lookups the external system’s data. >>> >>> >> Lookup Join was just a quick shot proposal without putting much thought >> into it, but a different name might help to solve the naming issue with >> "temporal table without versions". >> Another proposal would be "Version Join" or "Table Version Join". >> >> >>> If we agree regular tables are temporal table, I tend to keep >>> latest-only temporal table to clarify the temporal and version concept in >>> regular table. >>> >>> >> In my understanding access to a table's history is the core property of >> temporal tables (that's also how temporal tables are defined for SQL Server >> [1] or PostgreSQL [2]). >> A "temporal table without versions"or "latest-only temporal table" is >> just a table. It does not have any other property than all the tables we've >> always dealt with. >> >> So I would say that there are temporal tables (for which earlier versions >> can be accessed) and just tables like those that our users have always used >> so far (for which there are no earlier versions). >> >> Maybe it's just me, but "temporal table without versions" or "latest-only >> temporal table" sound like contradictions to me, similar to "tiny >> skyscraper". >> >> How about we try to get a few more opinions on this? >> What do others think about this issue? >> >> >>> Best >>> Leonard >>> >>> >> Best, >> Fabian >> >> [1] >> https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-ver15 >> [2] https://pgxn.org/dist/temporal_tables/ >> >> >> Cheers, >>> Fabian >>> >>> Am Mi., 19. Aug. 2020 um 04:24 Uhr schrieb Leonard Xu <xbjt...@gmail.com >>> >: >>> >>>> Thanks Fabian and Seth for the feedback >>>> >>>> I agree the name “temporal table without version” is less accurate >>>> because this kind of temporal table has a latest version rather than has no >>>> version. >>>> >>>> How about “Latest-only temporal table” ? The related concept section >>>> updated as following: >>>> >>>> *Temporal Table:* Temporal table is a table that evolves over time, rows >>>> in temporal table are associated with one or more temporal periods. >>>> >>>> * Version:* A temporal table can split into a set of versioned >>>> table snapshots, the *version* in table snapshots represents the valid >>>> life circle of rows, the start time and the end time of the valid period >>>> can be assigned by users. >>>> >>>> * Versioned temporal table*: If the row in temporal table can track >>>> its history changes and visit its history versions, we called this kind of >>>> temporal table as versioned temporal table. >>>> >>>> * Latest-only temporal table**:* If the row in temporal table can >>>> only track its latest version, we called this kind of temporal table as >>>> latest-only temporal table. The temporal table in lookup join can only >>>> track its latest version and thus it's also a latest-only temporal table. >>>> >>>> Best >>>> Leonard >>>> >>>> 在 2020年8月19日,04:46,Seth Wiesman <sjwies...@gmail.com> 写道: >>>> >>>> +1 to the updated design. >>>> >>>> I agree with Fabian that the naming of "temporal table without version" >>>> is >>>> a bit confusing but the actual semantics make sense to me. I think just >>>> saying its a Flink managed lookup join makes sense. >>>> >>>> Seth >>>> >>>> On Tue, Aug 18, 2020 at 3:07 PM Fabian Hueske <fhue...@gmail.com> >>>> wrote: >>>> >>>> Thanks for the updated FLIP Leonard! >>>> In my opinion this was an improvement. >>>> So +1 for this design. >>>> >>>> I have just one remark regarding the terminology. >>>> I find the term "Temporal Table without Version" somewhat confusing. >>>> IMO, versions are the core principle of temporal tables and temporal >>>> tables without versions don't make much sense to me. >>>> >>>> What makes such a table a "Temporal" table? Isn't it just a regular >>>> table? >>>> If I understand the proposal correctly, "Temporal Tables without >>>> Version" >>>> can only be used in processing time temporal table joins, because this >>>> join >>>> only requests the current version. >>>> But all regular tables can be used in processing time (temporal) table >>>> joins as well. >>>> It's basically the same as a lookup join, with the only difference that >>>> the table is maintained in Flink and not accessed in an external system >>>> (for example via JDBC). >>>> >>>> Are "Temporal Tables without Version" called "Temporal" because they can >>>> be used in "processing time temporal table joins" and due to its name >>>> this >>>> join needs to join something that's called "Temporal"? >>>> In that case, we might want to rename "processing time temporal table >>>> joins" into something else that does not imply a versioning. >>>> Maybe we can call them just lookup joins to avoid introducing another >>>> term? >>>> >>>> Thanks, Fabian >>>> >>>> Am Di., 18. Aug. 2020 um 04:30 Uhr schrieb Rui Li < >>>> lirui.fu...@gmail.com>: >>>> >>>> Thanks Leonard for the clarifications! >>>> >>>> On Mon, Aug 17, 2020 at 9:17 PM Leonard Xu <xbjt...@gmail.com> wrote: >>>> >>>> >>>> But are we still able to track different views of such a >>>> table through time, as rows are added/deleted to/from the table? >>>> >>>> >>>> Yes, in fact we support temporal table from changlog which contains all >>>> possible message types(INSERT/UPDATE/DELETE). >>>> >>>> For >>>> example, suppose I have an append-only table source with event-time >>>> >>>> and PK, >>>> >>>> will I be allowed to do an event-time temporal join with this table? >>>> >>>> Yes, I list some examples in the doc, the example versioned_rates3 is >>>> this case exactly. >>>> >>>> Best >>>> Leonard >>>> >>>> >>>> >>>> On Wed, Aug 12, 2020 at 3:31 PM Leonard Xu <xbjt...@gmail.com <mailto: >>>> >>>> xbjt...@gmail.com>> wrote: >>>> >>>> >>>> Hi, all >>>> >>>> After a detailed offline discussion about the temporal table related >>>> concept and behavior, we had a reliable solution and rejected several >>>> alternatives. >>>> Compared to rejected alternatives, the proposed approach is a more >>>> >>>> unified >>>> >>>> story and also friendly to user and current Flink framework. >>>> I improved the FLIP[1] with the proposed approach and refactored the >>>> document organization to make it clear enough. >>>> >>>> Please let me know if you have any concerns, I’m looking forward your >>>> comments. >>>> >>>> >>>> Best >>>> Leonard >>>> >>>> [1] >>>> >>>> >>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL >>>> >>>> < >>>> >>>> >>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL >>>> < >>>> >>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL >>>> >>>> >>>> >>>> >>>> >>>> 在 2020年8月4日,21:25,Leonard Xu <xbjt...@gmail.com <mailto: >>>> >>>> xbjt...@gmail.com>> 写道: >>>> >>>> >>>> Hi, all >>>> >>>> I’ve updated the FLIP[1] with the terminology `ChangelogTime`. >>>> >>>> Best >>>> Leonard >>>> [1] >>>> >>>> >>>> >>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL >>>> < >>>> >>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL >>>> >>>> >>>> < >>>> >>>> >>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL >>>> < >>>> >>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL >>>> >>>> >>>> >>>> >>>> 在 2020年8月4日,20:58,Leonard Xu <xbjt...@gmail.com <mailto: >>>> >>>> xbjt...@gmail.com> <mailto: >>>> >>>> xbjt...@gmail.com <mailto:xbjt...@gmail.com>>> 写道: >>>> >>>> >>>> Hi, Timo >>>> >>>> Thanks for you response. >>>> >>>> 1) Naming: Is operation time a good term for this concept? If I >>>> >>>> read >>>> >>>> "The operation time is the time when the changes happened in system." >>>> >>>> or >>>> >>>> "The system time of DML execution in database", why don't we call it >>>> `ChangelogTime` or `SystemTime`? Introducing another terminology of >>>> >>>> time in >>>> >>>> Flink should be thought through. >>>> >>>> >>>> I agree that we should thought through. I have considered the name >>>> >>>> `ChangelogTime` and `SystemTime` too, I don’t have strong opinion on >>>> >>>> the >>>> >>>> name. >>>> >>>> >>>> I proposed `operationTime` because most changelog comes from >>>> >>>> Database >>>> >>>> and we always called an action as `operation` rather than `change` in >>>> Database, the operation time is easier to understand for database >>>> >>>> users, >>>> >>>> but it's more like a database terminology. >>>> >>>> >>>> For `SystemTime`, user may confuse which one does the system in >>>> >>>> `SystemTime` represents? Flink, Database or CDC tool. Maybe it’s >>>> >>>> not a >>>> >>>> good name. >>>> >>>> >>>> `ChangelogTime` is a pretty choice which is more unified with >>>> >>>> existed >>>> >>>> terminology `Changelog` and `ChangelogMode`, so let me use >>>> >>>> `ChangelogTime` >>>> >>>> and I’ll update the FLIP. >>>> >>>> >>>> >>>> 2) Exposing it through `org.apache.flink.types.Row`: Shall we also >>>> >>>> expose the concept of time through the user-level `Row` type? The >>>> >>>> FLIP does >>>> >>>> not mention this explictly. I think we can keep it as an internal >>>> >>>> concept >>>> >>>> but I just wanted to ask for clarification. >>>> >>>> >>>> Yes, I want to keep it as an internal concept, we have discussed >>>> >>>> that >>>> >>>> changelog time concept should be the third time concept(the other two >>>> >>>> are >>>> >>>> event-time and processing-time). It’s not easy for normal users(or to >>>> >>>> help >>>> >>>> normal users) understand the three concepts accurately, and I did not >>>> >>>> find >>>> >>>> a big enough scenario that user need to touch the changelog time for >>>> >>>> now, >>>> >>>> so I tend to do not expose the concept to users. >>>> >>>> >>>> >>>> Best, >>>> Leonard >>>> >>>> >>>> >>>> On 04.08.20 04:58, Leonard Xu wrote: >>>> >>>> Thanks Konstantin, >>>> Regarding your questions, hope my comments has address your >>>> >>>> questions >>>> >>>> and I also add a few explanation in the FLIP. >>>> >>>> Thank you all for the feedback, >>>> It seems everyone involved in this thread has reached a >>>> >>>> consensus. >>>> >>>> I will start a vote thread later. >>>> Best, >>>> Leonard >>>> >>>> 在 2020年8月3日,19:35,godfrey he <godfre...@gmail.com <mailto: >>>> >>>> godfre...@gmail.com> <mailto: >>>> >>>> godfre...@gmail.com <mailto:godfre...@gmail.com>>> 写道: >>>> >>>> >>>> Thanks Lennard for driving this FLIP. >>>> Looks good to me. >>>> >>>> Best, >>>> Godfrey >>>> >>>> Jark Wu <imj...@gmail.com <mailto:imj...@gmail.com> <mailto: >>>> >>>> imj...@gmail.com <mailto:imj...@gmail.com>>> 于2020年8月3日周一 >>>> >>>> 下午12:04写道: >>>> >>>> >>>> Thanks Leonard for the great FLIP. I think it is in very good >>>> >>>> shape. >>>> >>>> +1 to start a vote. >>>> >>>> Best, >>>> Jark >>>> >>>> On Fri, 31 Jul 2020 at 17:56, Fabian Hueske <fhue...@gmail.com >>>> >>>> <mailto:fhue...@gmail.com> >>>> >>>> <mailto:fhue...@gmail.com <mailto:fhue...@gmail.com>>> wrote: >>>> >>>> >>>> Hi Leonard, >>>> >>>> Thanks for this FLIP! >>>> Looks good from my side. >>>> >>>> Cheers, Fabian >>>> >>>> Am Do., 30. Juli 2020 um 22:15 Uhr schrieb Seth Wiesman < >>>> sjwies...@gmail.com <mailto:sjwies...@gmail.com> <mailto: >>>> >>>> sjwies...@gmail.com <mailto:sjwies...@gmail.com>> >>>> >>>> : >>>> >>>> >>>> Hi Leondard, >>>> >>>> Thank you for pushing this, I think the updated syntax looks >>>> >>>> really >>>> >>>> good >>>> >>>> and the semantics make sense to me. >>>> >>>> +1 >>>> >>>> Seth >>>> >>>> On Wed, Jul 29, 2020 at 11:36 AM Leonard Xu < >>>> >>>> xbjt...@gmail.com <mailto:xbjt...@gmail.com> >>>> >>>> <mailto:xbjt...@gmail.com <mailto:xbjt...@gmail.com>>> wrote: >>>> >>>> >>>> Hi, Konstantin >>>> >>>> >>>> 1) A "Versioned Temporal Table DDL on source" can only be >>>> >>>> joined >>>> >>>> on >>>> >>>> the >>>> >>>> PRIMARY KEY attribute, correct? >>>> >>>> Yes, the PRIMARY KEY would be join key. >>>> >>>> >>>> 2) Isn't it the time attribute in the ORDER BY clause of the >>>> >>>> VIEW >>>> >>>> definition that defines >>>> >>>> whether a event-time or processing time temporal table join >>>> >>>> is >>>> >>>> used? >>>> >>>> >>>> I think event-time or processing-time temporal table join >>>> >>>> depends on >>>> >>>> fact >>>> >>>> table’s time attribute in temporal join rather than from >>>> >>>> temporal >>>> >>>> table >>>> >>>> side, the event-time or processing time in temporal table is >>>> >>>> just >>>> >>>> used >>>> >>>> to >>>> >>>> split the validity period of versioned snapshot of temporal >>>> >>>> table. >>>> >>>> The >>>> >>>> processing time attribute is not necessary for temporal >>>> >>>> table >>>> >>>> without >>>> >>>> version, only the primary key is required, the following >>>> >>>> VIEW is >>>> >>>> also >>>> >>>> valid >>>> >>>> for temporal table without version. >>>> CREATE VIEW latest_rates AS >>>> SELECT currency, LAST_VALUE(rate) -- only keep the >>>> >>>> latest >>>> >>>> version >>>> FROM rates >>>> GROUP BY currency; -- inferred >>>> >>>> primary >>>> >>>> key >>>> >>>> >>>> >>>> >>>> 3) A "Versioned Temporal Table DDL on source" is always >>>> >>>> versioned >>>> >>>> on >>>> >>>> operation_time regardless of the lookup table attribute >>>> >>>> (event-time >>>> >>>> or >>>> >>>> processing time attribute), correct? >>>> >>>> >>>> >>>> Yes, the semantics of `FOR SYSTEM_TIME AS OF o.time` is >>>> >>>> using the >>>> >>>> o.time >>>> >>>> value to lookup the version of the temporal table. >>>> For fact table has the processing time attribute, it means >>>> >>>> only >>>> >>>> lookup >>>> >>>> the >>>> >>>> latest version of temporal table and we can do some >>>> >>>> optimization >>>> >>>> in >>>> >>>> implementation like only keep the latest version. >>>> >>>> >>>> Best >>>> Leonard >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> -- >>>> Best regards! >>>> Rui Li >>>> >>>> >>>> >>>> >>>> -- >>>> Best regards! >>>> Rui Li >>>> >>>> >>>> >>>> >>> -- Best regards! Rui Li