Hi guys,

Just my two cents.

I agree with Jark that we should use "event-time/processing-time temporal
join" as the name for this join.

But I'm not sure about the definition of "temporal table".

Then the "temporal join" (i.e. FOR SYSTEM_TIME AS OF syntax) joins a
> non-temporal table, which also sounds like contradictions.
>

Why would we require "temporal join" only joins a "temporal table", if the
standard doesn't even define what is a "temporal table"?

What about defining the term "temporal table" to be "a table is
> changing over time", the same as the "dynamic table".
>

I'd prefer not to introduce another term if it has the same meaning as
"dynamic table".

I wonder whether it's possible to consider "temporal join" as a specific
kind of join that may work with regular or versioned tables, instead
of a join that requires a specific kind of table.

On Fri, Aug 21, 2020 at 1:45 PM Jark Wu <imj...@gmail.com> wrote:

> Hi everyone,
>
> Thank you for the great discussion @Leonard and @Fabian.
>
> *Regarding to choose a different name for this join:*
>
> From my point of view, I don't agree to introduce a new grammar called
> whatever "lookup join" or "version join", because:
> 1. "lookup" is a physical behavior not a logical concept.
> 2. The SQL standard has proposed temporal join and it fits Flink well with
> the event-time and processing-time attributes.
> 3. We already have so many different join grammer, e.g. "regular join",
> "interval join", "temporal join", and maybe "window join" in the future.
>     It may confuse users when having more join concepts.
>
> So I think the existing "event-time temporal join" and "processing-time
> temporal join" work well and we should still use them.
>
> *Regarding to the "temporal table without versions":*
>
> I agree there are contradictions between "temporal table" and "temporal
> table without versions" if we think "temporal table" tracks full history.
> However, if we call "temporal table without versions" as a "regular
> table", not a kind of "temporal table".
> Then the "temporal join" (i.e. FOR SYSTEM_TIME AS OF syntax) joins a
> non-temporal table, which also sounds like contradictions.
>
> I also notice that in SQL Server, the "system-versioned temporal table" is
> a combination of "Temporal Table" (current data) and "History Table" [1].
> The "Temporal Table" is the current data of the database which is the same
> as our definition of "temporal table without version". The documentation
> says:
>
> > *This additional table is referred to as the history table, while the
> main table that stores current (actual) row versions is referred to as the
> current table or simply as the temporal table. *
>
> Besides, SQL:2011 doesn't define the meaning of "temporal table" [2], so I
> think we can define the meaning ourselves.
>
> *> Interestingly, SQL:2011 manages to provide this (system-versioned
> table) support without actually defining or using the terms “temporal data”
> or “temporal table”.*
>
> What about defining the term "temporal table" to be "a table is
> changing over time", the same as the "dynamic table".
> A "versioned temporal table" is a special temporal table which tracks the
> full history of the changing table and supports point-in-time access.
> A regular temporal table only supports access to the current data (no
> earlier versions).
>
> In this way, the "temporal join" still makes sense, because the joined
> table is a "temporal table", and all the tables in Flink is "temporal
> table" (or dynamic table).
>
> Best,
> Jark
>
> [1]:
> https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-ver15#what-is-a-system-versioned-temporal-table
> [2]:
> https://cs.ulb.ac.be/public/_media/teaching/infoh415/tempfeaturessql2011.pdf
>
>
>
>
>
>
> On Thu, 20 Aug 2020 at 22:55, Fabian Hueske <fhue...@gmail.com> wrote:
>
>> Hi everyone,
>>
>> Yes, just to help user distinguish the difference between versioned
>>> temporal table and  latest-only temporal table.
>>>
>>>
>> I don't think we help users to understand the differences if we invent
>> new (IMO confusing) terms ("temporal table without version" or "latest-only
>> temporal table") instead of using known terminology ("table").
>>
>>
>>> As I suggested before, it might make sense to choose a different name
>>> for this join instead of using a fancy name for the tables such that the
>>> name fits the name of the join.
>>>
>>> I think lookup join is a physical implementation of Processing-Time
>>> temporal table join, we can even offer a lookup join implementation for
>>> Event-time temporal table join if the external system support lookup
>>> history data.
>>>
>>> Versioned temporal table can be used in Event-time temporal table join,
>>> all temporal table(including versioned temporal tale and regular table) can
>>> be used in Processing-time temporal table join,
>>> lookup join is a kind of physical implementation of  Processing-time
>>> temporal table join which lookups the external system’s data.
>>>
>>>
>> Lookup Join was just a quick shot proposal without putting much thought
>> into it, but a different name might help to solve the naming issue with
>> "temporal table without versions".
>> Another proposal would be "Version Join" or "Table Version Join".
>>
>>
>>> If we agree regular tables are temporal table, I tend to keep
>>> latest-only temporal table to clarify the temporal and version concept in
>>> regular table.
>>>
>>>
>> In my understanding access to a table's history is the core property of
>> temporal tables (that's also how temporal tables are defined for SQL Server
>> [1] or PostgreSQL [2]).
>> A "temporal table without versions"or "latest-only temporal table" is
>> just a table. It does not have any other property than all the tables we've
>> always dealt with.
>>
>> So I would say that there are temporal tables (for which earlier versions
>> can be accessed) and just tables like those that our users have always used
>> so far (for which there are no earlier versions).
>>
>> Maybe it's just me, but "temporal table without versions" or "latest-only
>> temporal table" sound like contradictions to me, similar to "tiny
>> skyscraper".
>>
>> How about we try to get a few more opinions on this?
>> What do others think about this issue?
>>
>>
>>> Best
>>> Leonard
>>>
>>>
>> Best,
>> Fabian
>>
>> [1]
>> https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-ver15
>> [2] https://pgxn.org/dist/temporal_tables/
>>
>>
>> Cheers,
>>> Fabian
>>>
>>> Am Mi., 19. Aug. 2020 um 04:24 Uhr schrieb Leonard Xu <xbjt...@gmail.com
>>> >:
>>>
>>>> Thanks Fabian and Seth for the feedback
>>>>
>>>> I agree the name “temporal table without version” is less accurate
>>>> because this kind of temporal table has a latest version rather than has no
>>>> version.
>>>>
>>>> How about “Latest-only temporal table” ? The related concept section
>>>> updated as following:
>>>>
>>>> *Temporal Table:* Temporal table is a table that evolves over time, rows
>>>> in temporal table are associated with one or more temporal periods.
>>>>
>>>> *    Version:* A temporal table can split into a set of versioned
>>>> table snapshots, the *version* in table snapshots represents the valid
>>>> life circle of rows, the start time and the end time of the valid period
>>>> can be  assigned by users.
>>>>
>>>> *    Versioned temporal table*: If the row in temporal table can track
>>>> its history changes and visit its history versions, we called this kind of
>>>> temporal table as versioned temporal table.
>>>>
>>>> *    Latest-only temporal table**:* If the row in temporal table can
>>>> only track its latest version, we called this kind of temporal table as
>>>> latest-only temporal table. The temporal table in lookup join can only
>>>> track its latest version and thus it's also a latest-only temporal table.
>>>>
>>>> Best
>>>> Leonard
>>>>
>>>> 在 2020年8月19日,04:46,Seth Wiesman <sjwies...@gmail.com> 写道:
>>>>
>>>> +1 to the updated design.
>>>>
>>>> I agree with Fabian that the naming of "temporal table without version"
>>>> is
>>>> a bit confusing but the actual semantics make sense to me. I think just
>>>> saying its a Flink managed lookup join makes sense.
>>>>
>>>> Seth
>>>>
>>>> On Tue, Aug 18, 2020 at 3:07 PM Fabian Hueske <fhue...@gmail.com>
>>>> wrote:
>>>>
>>>> Thanks for the updated FLIP Leonard!
>>>> In my opinion this was an improvement.
>>>> So +1 for this design.
>>>>
>>>> I have just one remark regarding the terminology.
>>>> I find the term "Temporal Table without Version" somewhat confusing.
>>>> IMO, versions are the core principle of temporal tables and temporal
>>>> tables without versions don't make much sense to me.
>>>>
>>>> What makes such a table a "Temporal" table? Isn't it just a regular
>>>> table?
>>>> If I understand the proposal correctly, "Temporal Tables without
>>>> Version"
>>>> can only be used in processing time temporal table joins, because this
>>>> join
>>>> only requests the current version.
>>>> But all regular tables can be used in processing time (temporal) table
>>>> joins as well.
>>>> It's basically the same as a lookup join, with the only difference that
>>>> the table is maintained in Flink and not accessed in an external system
>>>> (for example via JDBC).
>>>>
>>>> Are "Temporal Tables without Version" called "Temporal" because they can
>>>> be used in "processing time temporal table joins" and due to its name
>>>> this
>>>> join needs to join something that's called "Temporal"?
>>>> In that case, we might want to rename "processing time temporal table
>>>> joins" into something else that does not imply a versioning.
>>>> Maybe we can call them just lookup joins to avoid introducing another
>>>> term?
>>>>
>>>> Thanks, Fabian
>>>>
>>>> Am Di., 18. Aug. 2020 um 04:30 Uhr schrieb Rui Li <
>>>> lirui.fu...@gmail.com>:
>>>>
>>>> Thanks Leonard for the clarifications!
>>>>
>>>> On Mon, Aug 17, 2020 at 9:17 PM Leonard Xu <xbjt...@gmail.com> wrote:
>>>>
>>>>
>>>> But are we still able to track different views of such a
>>>> table through time, as rows are added/deleted to/from the table?
>>>>
>>>>
>>>> Yes, in fact we support temporal table from changlog which contains all
>>>> possible message types(INSERT/UPDATE/DELETE).
>>>>
>>>> For
>>>> example, suppose I have an append-only table source with event-time
>>>>
>>>> and PK,
>>>>
>>>> will I be allowed to do an event-time temporal join with this table?
>>>>
>>>> Yes, I list some examples in the doc, the example versioned_rates3  is
>>>> this case exactly.
>>>>
>>>> Best
>>>> Leonard
>>>>
>>>>
>>>>
>>>> On Wed, Aug 12, 2020 at 3:31 PM Leonard Xu <xbjt...@gmail.com <mailto:
>>>>
>>>> xbjt...@gmail.com>> wrote:
>>>>
>>>>
>>>> Hi, all
>>>>
>>>> After a detailed offline discussion about the temporal table related
>>>> concept and behavior, we had a reliable solution and rejected several
>>>> alternatives.
>>>> Compared to rejected alternatives, the proposed approach is a more
>>>>
>>>> unified
>>>>
>>>> story and also friendly to user and current Flink framework.
>>>> I improved the FLIP[1] with the proposed approach and refactored the
>>>> document organization to make it clear enough.
>>>>
>>>> Please let me know if you have any concerns, I’m looking forward your
>>>> comments.
>>>>
>>>>
>>>> Best
>>>> Leonard
>>>>
>>>> [1]
>>>>
>>>>
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>>
>>>> <
>>>>
>>>>
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>> <
>>>>
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> 在 2020年8月4日,21:25,Leonard Xu <xbjt...@gmail.com <mailto:
>>>>
>>>> xbjt...@gmail.com>> 写道:
>>>>
>>>>
>>>> Hi, all
>>>>
>>>> I’ve updated the FLIP[1] with the terminology `ChangelogTime`.
>>>>
>>>> Best
>>>> Leonard
>>>> [1]
>>>>
>>>>
>>>>
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>> <
>>>>
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>>
>>>>
>>>> <
>>>>
>>>>
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>> <
>>>>
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL
>>>>
>>>>
>>>>
>>>>
>>>> 在 2020年8月4日,20:58,Leonard Xu <xbjt...@gmail.com <mailto:
>>>>
>>>> xbjt...@gmail.com> <mailto:
>>>>
>>>> xbjt...@gmail.com <mailto:xbjt...@gmail.com>>> 写道:
>>>>
>>>>
>>>> Hi, Timo
>>>>
>>>> Thanks for you response.
>>>>
>>>> 1) Naming: Is operation time a good term for this concept? If I
>>>>
>>>> read
>>>>
>>>> "The operation time is the time when the changes happened in system."
>>>>
>>>> or
>>>>
>>>> "The system time of DML execution in database", why don't we call it
>>>> `ChangelogTime` or `SystemTime`? Introducing another terminology of
>>>>
>>>> time in
>>>>
>>>> Flink should be thought through.
>>>>
>>>>
>>>> I agree that we should thought through. I have considered the name
>>>>
>>>> `ChangelogTime` and `SystemTime` too, I don’t have strong opinion on
>>>>
>>>> the
>>>>
>>>> name.
>>>>
>>>>
>>>> I proposed `operationTime` because most changelog comes from
>>>>
>>>> Database
>>>>
>>>> and we always called an action as `operation` rather than `change` in
>>>> Database, the operation time is  easier to understand  for database
>>>>
>>>> users,
>>>>
>>>> but it's more like a database terminology.
>>>>
>>>>
>>>> For `SystemTime`, user may confuse which one does the system in
>>>>
>>>> `SystemTime` represents?  Flink, Database or CDC tool.  Maybe it’s
>>>>
>>>> not a
>>>>
>>>> good name.
>>>>
>>>>
>>>> `ChangelogTime` is a pretty choice which is more unified with
>>>>
>>>> existed
>>>>
>>>> terminology `Changelog` and `ChangelogMode`, so let me use
>>>>
>>>> `ChangelogTime`
>>>>
>>>> and I’ll update the FLIP.
>>>>
>>>>
>>>>
>>>> 2) Exposing it through `org.apache.flink.types.Row`: Shall we also
>>>>
>>>> expose the concept of time through the user-level `Row` type? The
>>>>
>>>> FLIP does
>>>>
>>>> not mention this explictly. I think we can keep it as an internal
>>>>
>>>> concept
>>>>
>>>> but I just wanted to ask for clarification.
>>>>
>>>>
>>>> Yes, I want to keep it as an internal concept, we have discussed
>>>>
>>>> that
>>>>
>>>> changelog time concept should be the third time concept(the other two
>>>>
>>>> are
>>>>
>>>> event-time and processing-time). It’s not easy for normal users(or to
>>>>
>>>> help
>>>>
>>>> normal users) understand the three concepts accurately, and I did not
>>>>
>>>> find
>>>>
>>>> a big enough scenario that user need to touch the changelog time for
>>>>
>>>> now,
>>>>
>>>> so I tend to do not expose the concept to users.
>>>>
>>>>
>>>>
>>>> Best,
>>>> Leonard
>>>>
>>>>
>>>>
>>>> On 04.08.20 04:58, Leonard Xu wrote:
>>>>
>>>> Thanks Konstantin,
>>>> Regarding your questions, hope my comments has address your
>>>>
>>>> questions
>>>>
>>>> and I also add a few explanation in the FLIP.
>>>>
>>>> Thank you all for the feedback,
>>>> It seems everyone involved  in this thread has reached a
>>>>
>>>> consensus.
>>>>
>>>> I will start a vote thread  later.
>>>> Best,
>>>> Leonard
>>>>
>>>> 在 2020年8月3日,19:35,godfrey he <godfre...@gmail.com <mailto:
>>>>
>>>> godfre...@gmail.com> <mailto:
>>>>
>>>> godfre...@gmail.com <mailto:godfre...@gmail.com>>> 写道:
>>>>
>>>>
>>>> Thanks Lennard for driving this FLIP.
>>>> Looks good to me.
>>>>
>>>> Best,
>>>> Godfrey
>>>>
>>>> Jark Wu <imj...@gmail.com <mailto:imj...@gmail.com> <mailto:
>>>>
>>>> imj...@gmail.com <mailto:imj...@gmail.com>>> 于2020年8月3日周一
>>>>
>>>> 下午12:04写道:
>>>>
>>>>
>>>> Thanks Leonard for the great FLIP. I think it is in very good
>>>>
>>>> shape.
>>>>
>>>> +1 to start a vote.
>>>>
>>>> Best,
>>>> Jark
>>>>
>>>> On Fri, 31 Jul 2020 at 17:56, Fabian Hueske <fhue...@gmail.com
>>>>
>>>> <mailto:fhue...@gmail.com>
>>>>
>>>> <mailto:fhue...@gmail.com <mailto:fhue...@gmail.com>>> wrote:
>>>>
>>>>
>>>> Hi Leonard,
>>>>
>>>> Thanks for this FLIP!
>>>> Looks good from my side.
>>>>
>>>> Cheers, Fabian
>>>>
>>>> Am Do., 30. Juli 2020 um 22:15 Uhr schrieb Seth Wiesman <
>>>> sjwies...@gmail.com <mailto:sjwies...@gmail.com> <mailto:
>>>>
>>>> sjwies...@gmail.com <mailto:sjwies...@gmail.com>>
>>>>
>>>> :
>>>>
>>>>
>>>> Hi Leondard,
>>>>
>>>> Thank you for pushing this, I think the updated syntax looks
>>>>
>>>> really
>>>>
>>>> good
>>>>
>>>> and the semantics make sense to me.
>>>>
>>>> +1
>>>>
>>>> Seth
>>>>
>>>> On Wed, Jul 29, 2020 at 11:36 AM Leonard Xu <
>>>>
>>>> xbjt...@gmail.com <mailto:xbjt...@gmail.com>
>>>>
>>>> <mailto:xbjt...@gmail.com <mailto:xbjt...@gmail.com>>> wrote:
>>>>
>>>>
>>>> Hi, Konstantin
>>>>
>>>>
>>>> 1) A  "Versioned Temporal Table DDL on source" can only be
>>>>
>>>> joined
>>>>
>>>> on
>>>>
>>>> the
>>>>
>>>> PRIMARY KEY attribute, correct?
>>>>
>>>> Yes, the PRIMARY KEY would be join key.
>>>>
>>>>
>>>> 2) Isn't it the time attribute in the ORDER BY clause of the
>>>>
>>>> VIEW
>>>>
>>>> definition that defines
>>>>
>>>> whether a event-time or processing time temporal table join
>>>>
>>>> is
>>>>
>>>> used?
>>>>
>>>>
>>>> I think event-time or processing-time temporal table join
>>>>
>>>> depends on
>>>>
>>>> fact
>>>>
>>>> table’s time attribute in temporal join rather than from
>>>>
>>>> temporal
>>>>
>>>> table
>>>>
>>>> side, the event-time or processing time in temporal table is
>>>>
>>>> just
>>>>
>>>> used
>>>>
>>>> to
>>>>
>>>> split the validity period of versioned snapshot of temporal
>>>>
>>>> table.
>>>>
>>>> The
>>>>
>>>> processing time attribute is not  necessary for temporal
>>>>
>>>> table
>>>>
>>>> without
>>>>
>>>> version, only the primary key is required, the following
>>>>
>>>> VIEW is
>>>>
>>>> also
>>>>
>>>> valid
>>>>
>>>> for temporal table without version.
>>>> CREATE VIEW latest_rates AS
>>>> SELECT currency, LAST_VALUE(rate)            -- only keep the
>>>>
>>>> latest
>>>>
>>>> version
>>>> FROM rates
>>>> GROUP BY currency;                           -- inferred
>>>>
>>>> primary
>>>>
>>>> key
>>>>
>>>>
>>>>
>>>>
>>>> 3) A "Versioned Temporal Table DDL on source" is always
>>>>
>>>> versioned
>>>>
>>>> on
>>>>
>>>> operation_time regardless of the lookup table attribute
>>>>
>>>> (event-time
>>>>
>>>> or
>>>>
>>>> processing time attribute), correct?
>>>>
>>>>
>>>>
>>>> Yes, the semantics of `FOR SYSTEM_TIME AS OF o.time` is
>>>>
>>>> using the
>>>>
>>>> o.time
>>>>
>>>> value to lookup the version of the temporal table.
>>>> For fact table has the processing time attribute, it means
>>>>
>>>> only
>>>>
>>>> lookup
>>>>
>>>> the
>>>>
>>>> latest version of temporal table and we can do some
>>>>
>>>> optimization
>>>>
>>>> in
>>>>
>>>> implementation like only keep the latest version.
>>>>
>>>>
>>>> Best
>>>> Leonard
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best regards!
>>>> Rui Li
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best regards!
>>>> Rui Li
>>>>
>>>>
>>>>
>>>>
>>>

-- 
Best regards!
Rui Li

Reply via email to