leonardBang commented on a change in pull request #15642: URL: https://github.com/apache/flink/pull/15642#discussion_r616316388
########## File path: docs/content.zh/docs/dev/table/sql/queries/window-tvf.md ########## @@ -0,0 +1,264 @@ +--- +title: "窗口函数" +weight: 6 +type: docs +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# Windowing table-valued functions (Windowing TVFs) + +{{< label Streaming >}} + +Windows are at the heart of processing infinite streams. Windows split the stream into “buckets” of finite size, over which we can apply computations. This document focuses on how windowing is performed in Flink SQL and how the programmer can benefit to the maximum from its offered functionality. + +Apache Flink provides several windowing table-valued functions (TVF) to divide the elements of your table into windows, including: + +- [Tumble Windows](#tumble) +- [Hop Windows](#hop) +- [Cumulate Windows](#cumulate) +- Session Windows (will be supported soon) + +Note that each element can logically belong to more than one window, depending on the windowing table-valued function you use. For example, HOP windowing creates overlapping windows wherein a single element can be assigned to multiple windows. + +Windowing TVFs are Flink defined Polymorphic Table Functions (abbreviated PTF). PTF is the part of the SQL 2016 standard which is a special table-function, but can have table as parameter. PTF is a powerful feature to change the shape of a table. Because PTFs are semantically used like tables, their invocation occurs in a `FROM` clause of a `SELECT` statement. + +Windowing TVFs is a replacement of legacy [Grouped Window Functions]({{< ref "docs/dev/table/sql/queries/window-agg" >}}#group-window-aggregation-deprecated). Windowing TVFs is more SQL standard compliant and more powerful to support complex window-based computations, e.g. Window TopN, Window Join. However, [Grouped Window Functions]({{< ref "docs/dev/table/sql/queries/window-agg" >}}#group-window-aggregation) can only support Window Aggregation. + +See more how to apply further computations based on windowing TVF: +- [Window Aggregation]({{< ref "docs/dev/table/sql/queries/window-agg" >}}) +- [Window TopN]({{< ref "docs/dev/table/sql/queries/window-topn">}}) +- Window Join (will be supported soon) + +## Window Functions + +Apache Flink provides 3 built-in windowing TVFs: TUMBLE, `HOP` and `CUMULATE`. The return value of windowing TVF is a new relation that includes all columns of original relation as well as additional 3 columns named "window_start", "window_end", "window_time" to indicate the assigned window. The "window_time" field is a [time attributes]({{< ref "docs/dev/table/concepts/time_attributes" >}}) of the window after windowing TVF which can be used in subsequent time-based operations, e.g. another windowing TVF, or <a href="{{< ref "docs/dev/table/sql/queries/joins" >}}#interval-joins">interval joins</a>, <a href="{{< ref "docs/dev/table/sql/queries/over-agg" >}}">over aggregations</a>. The value of `window_time` always equal to `window_end - 1ms`. + +### TUMBLE + +The `TUMBLE` function assigns each element to a window of a specified window size. Tumbling windows have a fixed size and do not overlap. For example, if you specify a tumbling window with a size of 5 minutes, the current window will be evaluated and a new window will be started every five minutes as illustrated by the following figure. + +{{< img src="/fig/tumbling-windows.svg" alt="Tumbling Windows" width="70%">}} + +The `TUMBLE` function assigns a window for each row of a relation based on a [time attribute]({{< ref "docs/dev/table/concepts/time_attributes" >}}) column. The return value of `TUMBLE` is a new relation that includes all columns of original relation as well as additional 3 columns named "window_start", "window_end", "window_time" to indicate the assigned window. The original time attribute "timecol" will be a regular timestamp column after window TVF. + +`TUMBLE` function takes three required parameters: + +```sql +TUMBLE(TABLE data, DESCRIPTOR(timecol), size) +``` + +- `data`: is a table parameter that can be any relation with a time attribute column. +- `timecol`: is a column descriptor indicating which [time attributes]({{< ref "docs/dev/table/concepts/time_attributes" >}}) column of data should be mapped to tumbling windows. +- `size`: is a duration specifying the width of the tumbling windows. + +Here is an example invocation on the `Bid` table (we simplify `bidtime` only to show the time part): + +```sql +> SELECT * FROM Bid; +--+---------+-------+------+ +--| bidtime | price | item | +--+---------+-------+------+ +--| 08:07 | $2 | A | +--| 08:11 | $3 | B | +--| 08:05 | $4 | C | +--| 08:09 | $5 | D | +--| 08:13 | $1 | E | +--| 08:17 | $6 | F | +--+---------+-------+------+ + +> SELECT * FROM TABLE( + TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES)); +-- or with the named params +-- note: the DATA param must be the first +> SELECT * FROM TABLE( + TUMBLE( + DATA => TABLE Bid, + TIMECOL => DESCRIPTOR(bidtime), + SIZE => INTERVAL '10' MINUTES)); Review comment: Please add note that `Note: Currently Flink doesn't support individual window table-valued function, window table-valued function should use with aggregate operation, here we just for explanation the data produced by table-valued function` ########## File path: docs/content.zh/docs/dev/table/sql/queries/window-tvf.md ########## @@ -0,0 +1,264 @@ +--- +title: "窗口函数" +weight: 6 +type: docs +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# Windowing table-valued functions (Windowing TVFs) + +{{< label Streaming >}} + +Windows are at the heart of processing infinite streams. Windows split the stream into “buckets” of finite size, over which we can apply computations. This document focuses on how windowing is performed in Flink SQL and how the programmer can benefit to the maximum from its offered functionality. + +Apache Flink provides several windowing table-valued functions (TVF) to divide the elements of your table into windows, including: + +- [Tumble Windows](#tumble) +- [Hop Windows](#hop) +- [Cumulate Windows](#cumulate) +- Session Windows (will be supported soon) + +Note that each element can logically belong to more than one window, depending on the windowing table-valued function you use. For example, HOP windowing creates overlapping windows wherein a single element can be assigned to multiple windows. + +Windowing TVFs are Flink defined Polymorphic Table Functions (abbreviated PTF). PTF is the part of the SQL 2016 standard which is a special table-function, but can have table as parameter. PTF is a powerful feature to change the shape of a table. Because PTFs are semantically used like tables, their invocation occurs in a `FROM` clause of a `SELECT` statement. + +Windowing TVFs is a replacement of legacy [Grouped Window Functions]({{< ref "docs/dev/table/sql/queries/window-agg" >}}#group-window-aggregation-deprecated). Windowing TVFs is more SQL standard compliant and more powerful to support complex window-based computations, e.g. Window TopN, Window Join. However, [Grouped Window Functions]({{< ref "docs/dev/table/sql/queries/window-agg" >}}#group-window-aggregation) can only support Window Aggregation. + +See more how to apply further computations based on windowing TVF: +- [Window Aggregation]({{< ref "docs/dev/table/sql/queries/window-agg" >}}) +- [Window TopN]({{< ref "docs/dev/table/sql/queries/window-topn">}}) +- Window Join (will be supported soon) + +## Window Functions + +Apache Flink provides 3 built-in windowing TVFs: TUMBLE, `HOP` and `CUMULATE`. The return value of windowing TVF is a new relation that includes all columns of original relation as well as additional 3 columns named "window_start", "window_end", "window_time" to indicate the assigned window. The "window_time" field is a [time attributes]({{< ref "docs/dev/table/concepts/time_attributes" >}}) of the window after windowing TVF which can be used in subsequent time-based operations, e.g. another windowing TVF, or <a href="{{< ref "docs/dev/table/sql/queries/joins" >}}#interval-joins">interval joins</a>, <a href="{{< ref "docs/dev/table/sql/queries/over-agg" >}}">over aggregations</a>. The value of `window_time` always equal to `window_end - 1ms`. + +### TUMBLE + +The `TUMBLE` function assigns each element to a window of a specified window size. Tumbling windows have a fixed size and do not overlap. For example, if you specify a tumbling window with a size of 5 minutes, the current window will be evaluated and a new window will be started every five minutes as illustrated by the following figure. + +{{< img src="/fig/tumbling-windows.svg" alt="Tumbling Windows" width="70%">}} + +The `TUMBLE` function assigns a window for each row of a relation based on a [time attribute]({{< ref "docs/dev/table/concepts/time_attributes" >}}) column. The return value of `TUMBLE` is a new relation that includes all columns of original relation as well as additional 3 columns named "window_start", "window_end", "window_time" to indicate the assigned window. The original time attribute "timecol" will be a regular timestamp column after window TVF. + +`TUMBLE` function takes three required parameters: + +```sql +TUMBLE(TABLE data, DESCRIPTOR(timecol), size) +``` + +- `data`: is a table parameter that can be any relation with a time attribute column. +- `timecol`: is a column descriptor indicating which [time attributes]({{< ref "docs/dev/table/concepts/time_attributes" >}}) column of data should be mapped to tumbling windows. +- `size`: is a duration specifying the width of the tumbling windows. + +Here is an example invocation on the `Bid` table (we simplify `bidtime` only to show the time part): + +```sql +> SELECT * FROM Bid; +--+---------+-------+------+ +--| bidtime | price | item | +--+---------+-------+------+ +--| 08:07 | $2 | A | +--| 08:11 | $3 | B | +--| 08:05 | $4 | C | +--| 08:09 | $5 | D | +--| 08:13 | $1 | E | +--| 08:17 | $6 | F | +--+---------+-------+------+ + +> SELECT * FROM TABLE( + TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES)); +-- or with the named params +-- note: the DATA param must be the first +> SELECT * FROM TABLE( + TUMBLE( + DATA => TABLE Bid, + TIMECOL => DESCRIPTOR(bidtime), + SIZE => INTERVAL '10' MINUTES)); +--+---------+-------+------+--------------+------------+--------------+ +--| bidtime | price | item | window_start | window_end | window_time | +--+---------+-------+------+--------------+------------+--------------+ +--| 08:07 | $2 | A | 08:00 | 08:10 | 08:09:59.999 | +--| 08:11 | $3 | B | 08:10 | 08:20 | 08:19:59.999 | +--| 08:05 | $4 | C | 08:00 | 08:10 | 08:09:59.999 | +--| 08:09 | $5 | D | 08:00 | 08:10 | 08:09:59.999 | +--| 08:13 | $1 | E | 08:10 | 08:20 | 08:19:59.999 | +--| 08:17 | $6 | F | 08:10 | 08:20 | 08:19:59.999 | +--+---------+-------+------+--------------+------------+--------------+ + +-- apply aggregation on the tumbling windowed table +> SELECT window_start, window_end, SUM(price) Review comment: nit: we can use `TIMESTAMP` for `bidtime` and `DECIMAL` for price so that user can reproduce this case easily -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org