Re: Accessing columns from input stream table during Window operations

Guowei Ma Mon, 19 Apr 2021 00:40:23 -0700

Hi, Sumeet

Maybe you could try Over Windows [1], which keep the "non-group-key"
columns in the output.


[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/tableApi.html#over-windows
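
To illustrate the difference, here is a rough plain-Python model of the two
window styles. It is not Flink code. The sample rows, the 10-second window
size, and the helper names group_window_avg and over_window_avg are all
made up for this sketch. A group window emits one row per (window, key), so
there is no single "b" to emit. An over window emits one row per input row,
so "b" can pass through as is.

```python
from collections import defaultdict

# Hypothetical sample rows: (timestamp_seconds, a, b, c)
rows = [
    (1, "k1", "x", 10.0),
    (4, "k1", "y", 20.0),
    (12, "k1", "z", 60.0),
]

WINDOW = 10  # assumed tumbling window size in seconds


def group_window_avg(rows):
    """Model of a tumbling group window: one output row per (window, a).

    Several input rows collapse into one output row, so 'b' has no
    single value unless it is aggregated or made part of the key.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for ts, a, _b, c in rows:
        start = ts - ts % WINDOW
        acc = sums[(start, a)]
        acc[0] += c
        acc[1] += 1
    return [(start, start + WINDOW, a, total / n)
            for (start, a), (total, n) in sorted(sums.items())]


def over_window_avg(rows):
    """Model of an over window partitioned by 'a', ordered by timestamp,
    with an unbounded preceding range: one output row per input row.

    Every input row survives, so 'b' is emitted unchanged next to the
    running average of 'c'.
    """
    acc = defaultdict(lambda: [0.0, 0])
    out = []
    for ts, a, b, c in sorted(rows):
        state = acc[a]
        state[0] += c
        state[1] += 1
        out.append((ts, a, b, state[0] / state[1]))
    return out


print(group_window_avg(rows))
print(over_window_avg(rows))
```

Note that with an over window the average is computed over the rows
preceding each row, not once per tumbling window, so the values differ from
the group-window result.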

Best,
Guowei


On Mon, Apr 19, 2021 at 3:25 PM Sumeet Malhotra <sumeet.malho...@gmail.com>
wrote:

> Thanks Guowei! Regarding "input.c.avg" you're right, that doesn't cause
> any issues. It's only when I want to use "input.b".
>
> My use case is to basically emit "input.b" in the final sink as is, and
> not really perform any aggregation on that column - more like pass through
> from input to sink. What's the best way to achieve this? I was thinking
> that making it part of the select() clause would do it, but as you said
> there needs to be some aggregation performed on it.
>
> Thanks,
> Sumeet
>
>
> On Mon, Apr 19, 2021 at 12:26 PM Guowei Ma <guowei....@gmail.com> wrote:
>
>> Hi, Sumeet
>>       For "input.b", I think you need to aggregate the non-group-key
>> column [1].
>> But I am not sure why "input.c.avg.alias('avg_value')" fails to
>> resolve. Would you mind sharing the detailed error message?
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/tableApi.html#group-windows
>>
>> Best,
>> Guowei
>>
>>
>> On Mon, Apr 19, 2021 at 2:24 PM Sumeet Malhotra <
>> sumeet.malho...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I have a use case where I'm creating a Tumbling window as follows:
>>>
>>> "input" table has columns [Timestamp, a, b, c]
>>>
>>> input \
>>>     .window(Tumble.over(lit(10).seconds).on(input.Timestamp).alias("w")) \
>>>     .group_by(col('w'), input.a) \
>>>     .select(
>>>         col('w').start.alias('window_start'),
>>>         col('w').end.alias('window_end'),
>>>         input.b,
>>>         input.c.avg.alias('avg_value')) \
>>>     .execute_insert('MySink') \
>>>     .wait()
>>>
>>> This throws an exception that it cannot resolve the fields "b" and "c"
>>> inside the select statement. If I mention these column names inside the
>>> group_by() statement as follows:
>>>
>>> .group_by(col('w'), input.a, input.b, input.c)
>>>
>>> then the column names in the subsequent select statement can be resolved.
>>>
>>> Basically, unless the column name is explicitly made part of the
>>> group_by() clause, the subsequent select() clause doesn't resolve it. This
>>> is very similar to the example from Flink's documentation here [1]:
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/tableApi.html#overview--examples,
>>> where a similar procedure works.
>>>
>>> Any idea how I can access columns from the input stream, without having
>>> to mention them in the group_by() clause? I really don't want to group the
>>> results by those fields, but they need to be written to the sink eventually.
>>>
>>> Thanks,
>>> Sumeet
>>>
>>
