Hi Vinod,
Sorry for the late reply.
Your approach looks good to me.
A few things to note:
* It is not possible to set different idle state retention timers for
different parts of a query. All operators that support idle state retention
use the same configuration.
* The inner query with the SESSIO
To be clear I want the outer grouping to have a longer retention time (of
the order of week or month - for which we are using 'idle state retention
time') and inner grouping to have a shorter retention period (1 hour max).
So hoping the session window will do the right thing.
Thanks,
Vinod
On Tue
Thanks a lot Fabian for the detailed response. I know all the duplicates
are going to arrive within an hour max of the actual event. So using a 1
hour running session window should be fine for me.
Is the following the right way to do it in apache-flink-1.4.2?
SELECT
CONCAT_WS(
'-',
Hi Vinod,
IIRC, support for DISTINCT aggregates was added in Flink 1.6.0 (released
August, 9th 2018) [1].
Also note that by default, this query will accumulate more and more state,
i.e., for each grouping key it will hold all unique event_ids.
You could configure an idle state retention time to cl
Another interesting thing is that if I add DISTINCT in the 2nd query it
doesn't complain. But because of the inner-select it is a no-op because the
inner select is doing the deduping:
SELECT
CONCAT_WS(
'-',
CAST(MONTH(row_datetime) AS VARCHAR),
CAST(YEAR(row_datetime) AS
More details on the error with query#1 that used COUNT(DISTINCT()):
org.apache.flink.table.api.TableException: Cannot generate a valid
execution plan for the given query:
FlinkLogicalCalc(expr#0..8=[{inputs}], expr#9=[_UTF-16LE'-'],
expr#10=[CAST($t1):VARCHAR(65536) CHARACTER SET "UTF-16LE" COLLA
Hi!
We are using apache-flink-1.4.2. It seems this version doesn't support
count(DISTINCT). I am trying to find a way to dedup the stream. So I tried:
SELECT
CONCAT_WS(
'-',
CAST(MONTH(longToDateTime(rowtime)) AS VARCHAR),
CAST(YEAR(longToDateTime(rowtime)) AS VARCHAR),