Re: count(DISTINCT) in flink SQL

2019-06-11 Thread Fabian Hueske
Hi Vinod, Sorry for the late reply. Your approach looks good to me. A few things to note: * It is not possible to set different idle state retention timers for different parts of a query. All operators that support idle state retention use the same configuration. * The inner query with the SESSIO

Re: count(DISTINCT) in flink SQL

2019-06-04 Thread Vinod Mehra
To be clear I want the outer grouping to have a longer retention time (of the order of week or month - for which we are using 'idle state retention time') and inner grouping to have a shorter retention period (1 hour max). So hoping the session window will do the right thing. Thanks, Vinod On Tue

Re: count(DISTINCT) in flink SQL

2019-06-04 Thread Vinod Mehra
Thanks a lot Fabian for the detailed response. I know all the duplicates are going to arrive within an hour max of the actual event. So using a 1 hour running session window should be fine for me. Is the following the right way to do it in apache-flink-1.4.2? SELECT CONCAT_WS( '-',

Re: count(DISTINCT) in flink SQL

2019-06-03 Thread Fabian Hueske
Hi Vinod, IIRC, support for DISTINCT aggregates was added in Flink 1.6.0 (released August, 9th 2018) [1]. Also note that by default, this query will accumulate more and more state, i.e., for each grouping key it will hold all unique event_ids. You could configure an idle state retention time to cl

Re: count(DISTINCT) in flink SQL

2019-05-29 Thread Vinod Mehra
Another interesting thing is that if I add DISTINCT in the 2nd query it doesn't complain. But because of the inner-select it is a no-op because the inner select is doing the deduping: SELECT CONCAT_WS( '-', CAST(MONTH(row_datetime) AS VARCHAR), CAST(YEAR(row_datetime) AS

Re: count(DISTINCT) in flink SQL

2019-05-29 Thread Vinod Mehra
More details on the error with query#1 that used COUNT(DISTINCT()): org.apache.flink.table.api.TableException: Cannot generate a valid execution plan for the given query: FlinkLogicalCalc(expr#0..8=[{inputs}], expr#9=[_UTF-16LE'-'], expr#10=[CAST($t1):VARCHAR(65536) CHARACTER SET "UTF-16LE" COLLA

count(DISTINCT) in flink SQL

2019-05-29 Thread Vinod Mehra
Hi! We are using apache-flink-1.4.2. It seems this version doesn't support count(DISTINCT). I am trying to find a way to dedup the stream. So I tried: SELECT CONCAT_WS( '-', CAST(MONTH(longToDateTime(rowtime)) AS VARCHAR), CAST(YEAR(longToDateTime(rowtime)) AS VARCHAR),