[
https://issues.apache.org/jira/browse/FLINK-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15680835#comment-15680835
]
ASF GitHub Bot commented on FLINK-4937:
---------------------------------------
Github user wuchong commented on a diff in the pull request:
https://github.com/apache/flink/pull/2792#discussion_r88796985
--- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/runtime/aggregate/IncrementalAggregateReduceFunction.scala ---
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.table.runtime.aggregate
+
+import org.apache.flink.api.common.functions.ReduceFunction
+import org.apache.flink.api.table.Row
+import org.apache.flink.util.Preconditions
+
+/**
+ * For Incremental intermediate aggregate Rows, merge every row into
+ * aggregate buffer.
+ *
+ * @param aggregates The aggregate functions.
+ * @param groupKeysMapping The index mapping of group keys between
+ *                         intermediate aggregate Row and output Row.
+ */
+class IncrementalAggregateReduceFunction(
+ private val aggregates: Array[Aggregate[_]],
+ private val groupKeysMapping: Array[(Int, Int)],
+ private val intermediateRowArity: Int) extends ReduceFunction[Row] {
+
+ Preconditions.checkNotNull(aggregates)
+ Preconditions.checkNotNull(groupKeysMapping)
+ @transient var accumulatorRow: Row = _
+
+ /**
+ * For Incremental intermediate aggregate Rows, merge value1 and value2
+ * into aggregate buffer, return aggregate buffer.
+ *
+ * @param value1 The first value to be combined.
+ * @param value2 The second value to be combined.
+ * @return The combined value of both input values.
+ *
+ */
+ override def reduce(value1: Row, value2: Row): Row = {
+
+ if (null == accumulatorRow) {
+ accumulatorRow = new Row(intermediateRowArity)
+ }
+
+ // Initiate intermediate aggregate value.
+ aggregates.foreach(_.initiate(accumulatorRow))
--- End diff --
Hi @fhueske , you are right: in the case of sliding windows, the result will be
incorrect. But the `accumulatorRow` approach has the same problem, because the
same `accumulatorRow` object is reused across multiple windows as the reduce state.
Try this case
```scala
val data = List(
(2L, 2, "Hello"),
(3L, 2, "Hello"),
(4L, 2, "Hello"))
val stream = env
.fromCollection(data)
.assignTimestampsAndWatermarks(new TimestampWithEqualWatermark())
val table = stream.toTable(tEnv, 'long, 'int, 'string)
val windowedTable = table
.groupBy('string)
.window(Slide over 10.milli every 5.milli on 'rowtime as 'w)
.select('string, 'int.count, 'w.start, 'w.end, 'w.start)
```
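For reference, the sliding-window assignment behind this case can be worked out by hand: with a 10 ms window sliding every 5 ms, each element belongs to two windows. Below is a minimal stand-alone sketch of the usual sliding-assigner arithmetic (plain Scala; the object and method names are mine, not Flink's API):

```scala
object SlidingWindowAssignment {
  // Returns the [start, end) windows a timestamp falls into, mirroring the
  // common sliding-window assigner logic: the latest window start is the
  // largest multiple of `slide` that is <= ts, and earlier starts follow at
  // `slide` intervals as long as the window still covers ts.
  def windowsFor(ts: Long, size: Long, slide: Long): Seq[(Long, Long)] = {
    val lastStart = ts - ((ts % slide) + slide) % slide
    (lastStart to (ts - size + 1) by -slide).map(start => (start, start + size))
  }
}
```

With timestamps 2, 3, and 4 ms this yields the windows [-5, 5) and [0, 10), each containing all three elements, which is why a count of 3 is expected in both result rows.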
The expected result should be
```
"Hello,3,1969-12-31 23:59:59.995,1970-01-01 00:00:00.005,1969-12-31
23:59:59.995",
"Hello,3,1970-01-01 00:00:00.0,1970-01-01 00:00:00.01,1970-01-01 00:00:00.0"
```
But actually it is
```
"Hello,4,1969-12-31 23:59:59.995,1970-01-01 00:00:00.005,1969-12-31
23:59:59.995",
"Hello,4,1970-01-01 00:00:00.0,1970-01-01 00:00:00.01,1970-01-01 00:00:00.0"
```
I think this is a bug in `HeapReducingState`: an element put into (or read from)
state should always be a copy. @aljoscha, what do you think about this?
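To make the aliasing hazard concrete, here is a minimal stand-alone sketch (plain Scala, no Flink; all names are mine): a reducer that hands back one shared mutable buffer lets a later window silently overwrite an earlier window's result.

```scala
object SharedBufferHazard {
  // A tiny stand-in for an aggregate buffer: one mutable count field.
  final class Buffer(var count: Long)

  // A reducer that reuses a single shared buffer, mimicking the shared
  // `accumulatorRow` field in the diff above.
  class SharingReducer {
    private val shared = new Buffer(0L)
    def reduce(acc: Buffer, value: Buffer): Buffer = {
      shared.count = acc.count + value.count
      shared // every caller gets back the SAME object
    }
  }

  def main(args: Array[String]): Unit = {
    val r = new SharingReducer
    // Two "windows" fold their elements independently...
    val w1 = List(new Buffer(1), new Buffer(1)).reduce(r.reduce)
    val w2 = List(new Buffer(1), new Buffer(1), new Buffer(1)).reduce(r.reduce)
    // ...but both results alias the shared buffer, so window 1's count was
    // silently overwritten by window 2's final write.
    println(w1.count) // prints 3, not the expected 2
    println(w2.count) // prints 3
  }
}
```

Copying elements on write into (and read from) state, as suggested above, breaks exactly this aliasing.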
> Add incremental group window aggregation for streaming Table API
> ----------------------------------------------------------------
>
> Key: FLINK-4937
> URL: https://issues.apache.org/jira/browse/FLINK-4937
> Project: Flink
> Issue Type: Sub-task
> Components: Table API & SQL
> Affects Versions: 1.2.0
> Reporter: Fabian Hueske
> Assignee: sunjincheng
>
> Group-window aggregates for streaming tables are currently not done in an
> incremental fashion. This means that the window collects all records and
> performs the aggregation when the window is closed instead of eagerly
> updating a partial aggregate for every added record. Since records are
> buffered, non-incremental aggregation requires more storage space than
> incremental aggregation.
> The DataStream API which is used under the hood of the streaming Table API
> features [incremental
> aggregation|https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/windows.html#windowfunction-with-incremental-aggregation]
> using a {{ReduceFunction}}.
> We should add support for incremental aggregation in group-windows.
> This is a follow-up task of FLINK-4691.
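The incremental-vs-buffered distinction described in the quoted issue can be sketched without Flink (plain Scala; the names are mine): both strategies produce the same aggregate, but the buffered variant holds every record in state until the window fires, while the incremental one keeps only a constant-size partial aggregate per window.

```scala
object IncrementalVsBuffered {
  // Non-incremental: buffer every record, aggregate when the window fires.
  // State size grows with the number of records in the window.
  def bufferedCountSum(window: Iterable[Int]): (Int, Int) = {
    val buffered = window.toList     // all records held in state
    (buffered.size, buffered.sum)    // aggregate only at fire time
  }

  // Incremental: fold each record into a running partial aggregate as it
  // arrives. State size is constant regardless of how many records arrive.
  def incrementalCountSum(window: Iterable[Int]): (Int, Int) =
    window.foldLeft((0, 0)) { case ((cnt, sum), v) => (cnt + 1, sum + v) }

  def main(args: Array[String]): Unit = {
    val records = Seq(2, 2, 2)
    println(bufferedCountSum(records))    // prints (3,6)
    println(incrementalCountSum(records)) // prints (3,6) — same result, O(1) state
  }
}
```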
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)