[
https://issues.apache.org/jira/browse/FLINK-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15680835#comment-15680835
]
ASF GitHub Bot commented on FLINK-4937:
---------------------------------------
Github user wuchong commented on a diff in the pull request:
https://github.com/apache/flink/pull/2792#discussion_r88796985
--- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/runtime/aggregate/IncrementalAggregateReduceFunction.scala ---
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.table.runtime.aggregate
+
+import org.apache.flink.api.common.functions.ReduceFunction
+import org.apache.flink.api.table.Row
+import org.apache.flink.util.Preconditions
+
+/**
+ * For Incremental intermediate aggregate Rows, merge every row into
+ * aggregate buffer.
+ *
+ * @param aggregates The aggregate functions.
+ * @param groupKeysMapping The index mapping of group keys between
+ *                         intermediate aggregate Row and output Row.
+ */
+class IncrementalAggregateReduceFunction(
+ private val aggregates: Array[Aggregate[_]],
+ private val groupKeysMapping: Array[(Int, Int)],
+ private val intermediateRowArity: Int) extends ReduceFunction[Row] {
+
+ Preconditions.checkNotNull(aggregates)
+ Preconditions.checkNotNull(groupKeysMapping)
+ @transient var accumulatorRow: Row = _
+
+ /**
+ * For Incremental intermediate aggregate Rows, merge value1 and value2
+ * into aggregate buffer, return aggregate buffer.
+ *
+ * @param value1 The first value to be combined.
+ * @param value2 The second value to be combined.
+ * @return The combined value of both input values.
+ *
+ */
+ override def reduce(value1: Row, value2: Row): Row = {
+
+ if (null == accumulatorRow) {
+ accumulatorRow = new Row(intermediateRowArity)
+ }
+
+ // Initiate intermediate aggregate value.
+ aggregates.foreach(_.initiate(accumulatorRow))
--- End diff --
Hi @fhueske , you are right: in the case of sliding windows, the result will be
incorrect. But the `accumulatorRow` approach has the same problem, because the
same `accumulatorRow` object is reused across multiple windows as the reduce state.
Try this case
```scala
val data = List(
(2L, 2, "Hello"),
(3L, 2, "Hello"),
(4L, 2, "Hello"))
val stream = env
.fromCollection(data)
.assignTimestampsAndWatermarks(new TimestampWithEqualWatermark())
val table = stream.toTable(tEnv, 'long, 'int, 'string)
val windowedTable = table
.groupBy('string)
.window(Slide over 10.milli every 5.milli on 'rowtime as 'w)
.select('string, 'int.count, 'w.start, 'w.end, 'w.start)
```
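For reference, the sliding-window assignment behind this case can be worked out by hand: with a 10 ms window sliding every 5 ms, each element belongs to two windows. Below is a minimal stand-alone sketch of the usual sliding-assigner arithmetic (plain Scala; the object and method names are mine, not Flink's API):

```scala
object SlidingWindowAssignment {
  // Returns the [start, end) windows a timestamp falls into, mirroring the
  // common sliding-window assigner logic: the latest window start is the
  // largest multiple of `slide` that is <= ts, and earlier starts follow at
  // `slide` intervals as long as the window still covers ts.
  def windowsFor(ts: Long, size: Long, slide: Long): Seq[(Long, Long)] = {
    val lastStart = ts - ((ts % slide) + slide) % slide
    (lastStart to (ts - size + 1) by -slide).map(start => (start, start + size))
  }
}
```

With timestamps 2, 3, and 4 ms this yields the windows [-5, 5) and [0, 10), each containing all three elements, which is why a count of 3 is expected in both result rows.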
The expected result should be
```
"Hello,3,1969-12-31 23:59:59.995,1970-01-01 00:00:00.005,1969-12-31
23:59:59.995",
"Hello,3,1970-01-01 00:00:00.0,1970-01-01 00:00:00.01,1970-01-01 00:00:00.0"
```
But actually it is
```
"Hello,4,1969-12-31 23:59:59.995,1970-01-01 00:00:00.005,1969-12-31
23:59:59.995",
"Hello,4,1970-01-01 00:00:00.0,1970-01-01 00:00:00.01,1970-01-01 00:00:00.0"
```
I think this is a bug in `HeapReducingState`: an element put into (or read from)
state should always be a copy. @aljoscha, what do you think about this?
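To make the aliasing hazard concrete, here is a minimal stand-alone sketch (plain Scala, no Flink; all names are mine): a reducer that hands back one shared mutable buffer lets a later window silently overwrite an earlier window's result.

```scala
object SharedBufferHazard {
  // A tiny stand-in for an aggregate buffer: one mutable count field.
  final class Buffer(var count: Long)

  // A reducer that reuses a single shared buffer, mimicking the shared
  // `accumulatorRow` field in the diff above.
  class SharingReducer {
    private val shared = new Buffer(0L)
    def reduce(acc: Buffer, value: Buffer): Buffer = {
      shared.count = acc.count + value.count
      shared // every caller gets back the SAME object
    }
  }

  def main(args: Array[String]): Unit = {
    val r = new SharingReducer
    // Two "windows" fold their elements independently...
    val w1 = List(new Buffer(1), new Buffer(1)).reduce(r.reduce)
    val w2 = List(new Buffer(1), new Buffer(1), new Buffer(1)).reduce(r.reduce)
    // ...but both results alias the shared buffer, so window 1's count was
    // silently overwritten by window 2's final write.
    println(w1.count) // prints 3, not the expected 2
    println(w2.count) // prints 3
  }
}
```

Copying elements on write into (and read from) state, as suggested above, breaks exactly this aliasing.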
> Add incremental group window aggregation for streaming Table API
> ----------------------------------------------------------------
>
> Key: FLINK-4937
> URL: https://issues.apache.org/jira/browse/FLINK-4937
> Project: Flink
> Issue Type: Sub-task
> Components: Table API & SQL
> Affects Versions: 1.2.0
> Reporter: Fabian Hueske
> Assignee: sunjincheng
>
> Group-window aggregates for streaming tables are currently not done in an
> incremental fashion. This means that the window collects all records and
> performs the aggregation when the window is closed instead of eagerly
> updating a partial aggregate for every added record. Since records are
> buffered, non-incremental aggregation requires more storage space than
> incremental aggregation.
> The DataStream API which is used under the hood of the streaming Table API
> features [incremental
> aggregation|https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/windows.html#windowfunction-with-incremental-aggregation]
> using a {{ReduceFunction}}.
> We should add support for incremental aggregation in group-windows.
> This is a follow-up task of FLINK-4691.
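The incremental-vs-buffered distinction described in the quoted issue can be sketched without Flink (plain Scala; the names are mine): both strategies produce the same aggregate, but the buffered variant holds every record in state until the window fires, while the incremental one keeps only a constant-size partial aggregate per window.

```scala
object IncrementalVsBuffered {
  // Non-incremental: buffer every record, aggregate when the window fires.
  // State size grows with the number of records in the window.
  def bufferedCountSum(window: Iterable[Int]): (Int, Int) = {
    val buffered = window.toList     // all records held in state
    (buffered.size, buffered.sum)    // aggregate only at fire time
  }

  // Incremental: fold each record into a running partial aggregate as it
  // arrives. State size is constant regardless of how many records arrive.
  def incrementalCountSum(window: Iterable[Int]): (Int, Int) =
    window.foldLeft((0, 0)) { case ((cnt, sum), v) => (cnt + 1, sum + v) }

  def main(args: Array[String]): Unit = {
    val records = Seq(2, 2, 2)
    println(bufferedCountSum(records))    // prints (3,6)
    println(incrementalCountSum(records)) // prints (3,6) — same result, O(1) state
  }
}
```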
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)