Hi, all

After a detailed offline discussion about the concepts and behavior of 
temporal tables, we arrived at a reliable solution and rejected several 
alternatives. Compared to the rejected alternatives, the proposed approach 
tells a more unified story and is friendlier to both users and the current 
Flink framework.
I have updated the FLIP[1] with the proposed approach and reorganized the 
document to make it clearer.

Please let me know if you have any concerns. I’m looking forward to your comments.


Best
Leonard

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL 


> On Aug 4, 2020, at 21:25, Leonard Xu <xbjt...@gmail.com> wrote:
> 
> Hi, all
> 
> I’ve updated the FLIP[1] with the terminology `ChangelogTime`.
> 
> Best
> Leonard
> [1]  
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL 
> 
>> On Aug 4, 2020, at 20:58, Leonard Xu <xbjt...@gmail.com> wrote:
>> 
>> Hi, Timo
>> 
>> Thanks for your response.
>> 
>>> 1) Naming: Is operation time a good term for this concept? If I read "The 
>>> operation time is the time when the changes happened in system." or "The 
>>> system time of DML execution in database", why don't we call it 
>>> `ChangelogTime` or `SystemTime`? Introducing another terminology of time in 
>>> Flink should be thought through.
>> 
>> I agree that we should think this through. I have considered the names 
>> `ChangelogTime` and `SystemTime` too, and I don’t have a strong opinion on 
>> the name.
>> 
>> I proposed `operationTime` because most changelogs come from databases, and 
>> in a database we usually call an action an `operation` rather than a 
>> `change`. Operation time is easier for database users to understand, but 
>> it is more of a database term.
>> 
>> For `SystemTime`, users may be confused about which system the name refers 
>> to: Flink, the database, or the CDC tool. Maybe it’s not a good name.
>> 
>> `ChangelogTime` is a good choice that is more consistent with the existing 
>> terminology `Changelog` and `ChangelogMode`, so let me use `ChangelogTime`, 
>> and I’ll update the FLIP.
>> 
>> 
>>> 2) Exposing it through `org.apache.flink.types.Row`: Shall we also expose 
>>> the concept of time through the user-level `Row` type? The FLIP does not 
>>> mention this explicitly. I think we can keep it as an internal concept but I 
>>> just wanted to ask for clarification.
>> 
>> Yes, I want to keep it as an internal concept. We have discussed that 
>> changelog time should be the third time concept (the other two are 
>> event-time and processing-time). It’s not easy for ordinary users to 
>> understand the three concepts accurately, and I have not found a significant 
>> enough scenario in which users need to touch changelog time for now, so I 
>> tend not to expose the concept to users.
>> 
>> 
>> Best,
>> Leonard
>> 
>> 
>>> 
>>> On 04.08.20 04:58, Leonard Xu wrote:
>>>> Thanks Konstantin,
>>>> Regarding your questions, I hope my comments have addressed them; I have 
>>>> also added a few explanations to the FLIP.
>>>> Thank you all for the feedback.
>>>> It seems everyone involved in this thread has reached a consensus.
>>>> I will start a vote thread later.
>>>> Best,
>>>> Leonard
>>>>> On Aug 3, 2020, at 19:35, godfrey he <godfre...@gmail.com> wrote:
>>>>> 
>>>>> Thanks Leonard for driving this FLIP.
>>>>> Looks good to me.
>>>>> 
>>>>> Best,
>>>>> Godfrey
>>>>> 
>>>>> On Mon, Aug 3, 2020 at 12:04 PM, Jark Wu <imj...@gmail.com> wrote:
>>>>> 
>>>>>> Thanks Leonard for the great FLIP. I think it is in very good shape.
>>>>>> +1 to start a vote.
>>>>>> 
>>>>>> Best,
>>>>>> Jark
>>>>>> 
>>>>>> On Fri, 31 Jul 2020 at 17:56, Fabian Hueske <fhue...@gmail.com 
>>>>>> <mailto:fhue...@gmail.com>> wrote:
>>>>>> 
>>>>>>> Hi Leonard,
>>>>>>> 
>>>>>>> Thanks for this FLIP!
>>>>>>> Looks good from my side.
>>>>>>> 
>>>>>>> Cheers, Fabian
>>>>>>> 
>>>>>>> On Thu, Jul 30, 2020 at 22:15, Seth Wiesman <sjwies...@gmail.com> wrote:
>>>>>>> 
>>>>>>>> Hi Leonard,
>>>>>>>> 
>>>>>>>> Thank you for pushing this, I think the updated syntax looks really good
>>>>>>>> and the semantics make sense to me.
>>>>>>>> 
>>>>>>>> +1
>>>>>>>> 
>>>>>>>> Seth
>>>>>>>> 
>>>>>>>> On Wed, Jul 29, 2020 at 11:36 AM Leonard Xu <xbjt...@gmail.com 
>>>>>>>> <mailto:xbjt...@gmail.com>> wrote:
>>>>>>>> 
>>>>>>>>> Hi, Konstantin
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 1) A "Versioned Temporal Table DDL on source" can only be joined on the
>>>>>>>>>> PRIMARY KEY attribute, correct?
>>>>>>>>> Yes, the PRIMARY KEY would be the join key.
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 2) Isn't it the time attribute in the ORDER BY clause of the VIEW
>>>>>>>>>> definition that defines whether an event-time or processing-time
>>>>>>>>>> temporal table join is used?
>>>>>>>>> 
>>>>>>>>> I think whether a temporal table join is event-time or processing-time
>>>>>>>>> depends on the fact table’s time attribute in the temporal join rather
>>>>>>>>> than on the temporal table side. The event-time or processing-time
>>>>>>>>> attribute in the temporal table is only used to split the validity
>>>>>>>>> periods of the versioned snapshots of the temporal table. The
>>>>>>>>> processing-time attribute is not necessary for a temporal table
>>>>>>>>> without versions; only the primary key is required. The following VIEW
>>>>>>>>> is also valid for a temporal table without versions.
>>>>>>>>> CREATE VIEW latest_rates AS
>>>>>>>>> SELECT currency, LAST_VALUE(rate)    -- only keep the latest version
>>>>>>>>> FROM rates
>>>>>>>>> GROUP BY currency;                           -- inferred primary key
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 3) A "Versioned Temporal Table DDL on source" is always versioned on
>>>>>>>>>> operation_time regardless of the lookup table attribute (event-time or
>>>>>>>>>> processing time attribute), correct?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Yes, the semantics of `FOR SYSTEM_TIME AS OF o.time` is to use the
>>>>>>>>> o.time value to look up the version of the temporal table.
>>>>>>>>> If the fact table uses a processing-time attribute, only the latest
>>>>>>>>> version of the temporal table is looked up, and we can optimize the
>>>>>>>>> implementation, for example by keeping only the latest version.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Best
>>>>>>>>> Leonard
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>> 
>> 
> 
