[ https://issues.apache.org/jira/browse/FLINK-9712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629091#comment-16629091 ]
ASF GitHub Bot commented on FLINK-9712:
---------------------------------------

fhueske commented on a change in pull request #6741: [FLINK-9712][table,docs] Document processing time Temporal Table Joins
URL: https://github.com/apache/flink/pull/6741#discussion_r220635686

##########
File path: docs/dev/table/streaming/joins.md
##########
@@ -0,0 +1,93 @@
+---
+title: "Joins"
+nav-parent_id: streaming_tableapi
+nav-pos: 3
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+In batch processing joins are relatively easy, since we are working on a bounded completed data sets.
+In stream processing things are a little bit more complicated, especially when it comes to the issue how to handle that data can change over time.
+Because of that, there are couple of ways to actually perform the join using either Table API or SQL.
+
+For more information regarding syntax please check Joins sections in [Table API](../tableApi.html#joins) and [SQL](../sql.html#joins).
+
+* This will be replaced by the TOC
+{:toc}
+
+Regular Joins
+-------------
+
+This is the most basic case in which any new records or changes to either side of the join input are visible and are affecting whole join result.
+If there is a new record on the left side, it will be joined with all of the previous and future records on the other side.
+
+Such semantic has an important limitation:
+it requires to keep both sides of the join input on the state indefinitely and resource usage will grow indefinitely as well.
+
+Example:
+{% highlight sql %}
+SELECT * FROM Orders
+INNER JOIN Product
+ON Orders.productId = Product.id
+{% endhighlight %}
+
+Time-windowed Joins
+-------------------
+
+In this case we are restricting scope of the join to some time window.
+This allows Flink to remove old values from the state (using [watermarks](time_attributes.html) without affecting the correctness of the result.
+
+Example:
+{% highlight sql %}
+SELECT *
+FROM
+  Orders o,
+  Shipments s
+WHERE o.id = s.orderId AND
+      o.ordertime BETWEEN s.shiptime - INTERVAL '4' HOUR AND s.shiptime
+{% endhighlight %}
+
+Temporal Table Joins
+--------------------
+
+Temporal Table Joins allow to join a stream (left/probe side) with a table (right/build side) that changes over time.
+For each record from the probe side, it will be joined only with the latest version of the build side.
+That means (in contrast to [Regular Joins](#regular-joins)) if there is a new record on the build side,
+it will not affect the previous past results of the join.
+This again allow Flink to limit the number of elements that must be kept on the state.
+In order to support updates (overwrites) of previous values on the build side table, this table must define a primary key.
+
+Compared to [Time-windowed Joins](#time-windowed-joins),
+Temporal Table Joins are not defining a time window within which bounds the records will be joined.
+Records from the probe side are joined with the most recent versions of the build side and records on the build side might be arbitrary old.

Review comment:
We need to rephrase "most recent version" once we support event time.
Maybe it makes sense to explain this a bit more generally.
This would also help to understand the syntax better (why do we need to pass a proc time attribute into the temporal table function?)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Support enrichment joins in Flink SQL/Table API
> -----------------------------------------------
>
>                 Key: FLINK-9712
>                 URL: https://issues.apache.org/jira/browse/FLINK-9712
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API & SQL
>    Affects Versions: 1.5.0
>            Reporter: Piotr Nowojski
>            Assignee: Piotr Nowojski
>            Priority: Major
>              Labels: pull-request-available
>
> As described here:
> https://docs.google.com/document/d/1KaAkPZjWFeu-ffrC9FhYuxE6CIKsatHTTxyrxSBR8Sk/edit?usp=sharing

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)