Hi, all

After a detailed offline discussion about the concepts and behavior related to temporal tables, we arrived at a reliable solution and rejected several alternatives. Compared to the rejected alternatives, the proposed approach tells a more unified story and is also friendlier to users and to the current Flink framework. I have updated the FLIP[1] with the proposed approach and reorganized the document to make it clearer.
Please let me know if you have any concerns; I'm looking forward to your comments.

Best
Leonard

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL

> On Aug 4, 2020, at 21:25, Leonard Xu <xbjt...@gmail.com> wrote:
>
> Hi, all
>
> I've updated the FLIP[1] with the terminology `ChangelogTime`.
>
> Best
> Leonard
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>
>> On Aug 4, 2020, at 20:58, Leonard Xu <xbjt...@gmail.com> wrote:
>>
>> Hi, Timo
>>
>> Thanks for your response.
>>
>>> 1) Naming: Is operation time a good term for this concept? If I read "The
>>> operation time is the time when the changes happened in system." or "The
>>> system time of DML execution in database", why don't we call it
>>> `ChangelogTime` or `SystemTime`? Introducing another terminology of time in
>>> Flink should be thought through.
>>
>> I agree that we should think it through. I have considered the names
>> `ChangelogTime` and `SystemTime` too; I don't have a strong opinion on the
>> name.
>>
>> I proposed `operationTime` because most changelogs come from databases, and
>> in databases an action is usually called an `operation` rather than a
>> `change`. The operation time is easier to understand for database users,
>> but it is more of a database term.
>>
>> For `SystemTime`, users may be confused about which system the "system" in
>> `SystemTime` refers to: Flink, the database, or the CDC tool. Maybe it's not
>> a good name.
>>
>> `ChangelogTime` is a good choice that is more consistent with the existing
>> terminology `Changelog` and `ChangelogMode`, so let me use `ChangelogTime`,
>> and I'll update the FLIP.
>>
>>
>>> 2) Exposing it through `org.apache.flink.types.Row`: Shall we also expose
>>> the concept of time through the user-level `Row` type?
>>> The FLIP does not mention this explicitly. I think we can keep it as an
>>> internal concept, but I just wanted to ask for clarification.
>>
>> Yes, I want to keep it as an internal concept. We have discussed that the
>> changelog time concept would be a third time concept (the other two are
>> event time and processing time). It is not easy for normal users to
>> understand the three concepts accurately (or for us to help them do so),
>> and I have not found a scenario compelling enough for users to need the
>> changelog time for now, so I tend not to expose the concept to users.
>>
>>
>> Best,
>> Leonard
>>
>>
>>>
>>> On 04.08.20 04:58, Leonard Xu wrote:
>>>> Thanks Konstantin,
>>>> Regarding your questions, I hope my comments have addressed them, and I
>>>> also added a few explanations to the FLIP.
>>>> Thank you all for the feedback.
>>>> It seems everyone involved in this thread has reached a consensus.
>>>> I will start a vote thread later.
>>>> Best,
>>>> Leonard
>>>>> On Aug 3, 2020, at 19:35, godfrey he <godfre...@gmail.com> wrote:
>>>>>
>>>>> Thanks Leonard for driving this FLIP.
>>>>> Looks good to me.
>>>>>
>>>>> Best,
>>>>> Godfrey
>>>>>
>>>>> On Mon, Aug 3, 2020 at 12:04 PM, Jark Wu <imj...@gmail.com> wrote:
>>>>>
>>>>>> Thanks Leonard for the great FLIP. I think it is in very good shape.
>>>>>> +1 to start a vote.
>>>>>>
>>>>>> Best,
>>>>>> Jark
>>>>>>
>>>>>> On Fri, 31 Jul 2020 at 17:56, Fabian Hueske <fhue...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Leonard,
>>>>>>>
>>>>>>> Thanks for this FLIP!
>>>>>>> Looks good from my side.
>>>>>>>
>>>>>>> Cheers, Fabian
>>>>>>>
>>>>>>> On Thu, Jul 30, 2020 at 22:15, Seth Wiesman <sjwies...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Leonard,
>>>>>>>>
>>>>>>>> Thank you for pushing this. I think the updated syntax looks really
>>>>>>>> good and the semantics make sense to me.
>>>>>>>>
>>>>>>>> +1
>>>>>>>>
>>>>>>>> Seth
>>>>>>>>
>>>>>>>> On Wed, Jul 29, 2020 at 11:36 AM Leonard Xu <xbjt...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi, Konstantin
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 1) A "Versioned Temporal Table DDL on source" can only be joined on
>>>>>>>>>> the PRIMARY KEY attribute, correct?
>>>>>>>>> Yes, the PRIMARY KEY is the join key.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2) Isn't it the time attribute in the ORDER BY clause of the VIEW
>>>>>>>>>> definition that defines whether an event-time or processing-time
>>>>>>>>>> temporal table join is used?
>>>>>>>>>
>>>>>>>>> I think whether an event-time or processing-time temporal table join
>>>>>>>>> is used depends on the fact table's time attribute in the temporal
>>>>>>>>> join, not on the temporal table side. The event-time or
>>>>>>>>> processing-time attribute of the temporal table is only used to split
>>>>>>>>> the validity periods of the temporal table's versioned snapshots.
>>>>>>>>> The processing-time attribute is not necessary for a temporal table
>>>>>>>>> without versions; only the primary key is required. The following
>>>>>>>>> VIEW is also a valid temporal table without versions:
>>>>>>>>>
>>>>>>>>> CREATE VIEW latest_rates AS
>>>>>>>>> SELECT currency, LAST_VALUE(rate) AS rate -- only keep the latest version
>>>>>>>>> FROM rates
>>>>>>>>> GROUP BY currency; -- inferred primary key
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 3) A "Versioned Temporal Table DDL on source" is always versioned on
>>>>>>>>>> operation_time regardless of the lookup table attribute (event-time
>>>>>>>>>> or processing-time attribute), correct?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yes, the semantics of `FOR SYSTEM_TIME AS OF o.time` is to use the
>>>>>>>>> o.time value to look up the version of the temporal table.
>>>>>>>>> If the fact table has a processing-time attribute, only the latest
>>>>>>>>> version of the temporal table is looked up, and the implementation
>>>>>>>>> can be optimized accordingly, for example by keeping only the latest
>>>>>>>>> version.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Best
>>>>>>>>> Leonard
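For illustration, the answers to Konstantin's questions above can be summarized as a single Flink SQL temporal join. This is a minimal sketch, not taken from the FLIP: the tables `orders` and `rates` and the columns `order_id`, `amount`, `currency`, `order_time`, and `rate` are hypothetical; `rates` is assumed to be a temporal table whose PRIMARY KEY is `currency`, and `order_time` is assumed to be a time attribute of `orders`.

```sql
-- Hypothetical tables: `orders` is the fact table and `order_time` is its
-- time attribute; `rates` is a temporal table with PRIMARY KEY `currency`.
SELECT
  o.order_id,
  o.amount,
  o.currency,
  o.amount * r.rate AS converted_amount
FROM orders AS o
JOIN rates FOR SYSTEM_TIME AS OF o.order_time AS r
  -- the join condition is on the temporal table's primary key
  ON o.currency = r.currency;
```

Following the answers in the thread, the kind of join is decided by `o.order_time`, the fact table's time attribute, rather than by the temporal table: if it is an event-time attribute, each order is matched with the `rates` version valid at `order_time`; if it is a processing-time attribute, only the latest version of `rates` is looked up.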