Re: Need help with window TopN query

2021-11-04 Thread JING ZHANG
Sorry for late response, Martijn and Francesco have already give good advises to find out the problem. I only have one minor supplementary information, window rank/join/aggregate would emit results after the end of the window. Now the window size is 24 hour, is there any possible the first window i

Re: Need help with window TopN query

2021-11-04 Thread Francesco Guardiani
As a rule of thumb, I would first try to check that Flink ingests correctly your csv. Perhaps try to run just a select on your input and see if the input is parsed as expected and is ordered. On Thu, Nov 4, 2021 at 12:47 PM Martijn Visser wrote: > Hi Pavel, > > There's a Flink SQL recipe in the

Re: Need help with window TopN query

2021-11-04 Thread Martijn Visser
Hi Pavel, There's a Flink SQL recipe in the Flink SQL Cookbook for a Window TopN, see https://github.com/ververica/flink-sql-cookbook/blob/main/aggregations-and-analytics/11_window_top_n/11_window_top_n.md. I think that could help you with your use case too. Best regards, Martijn On Thu, 4 Nov

Re: Need help with window TopN query

2021-11-04 Thread Pavel Penkov
When the query changed to SELECT user_id, ts, rownum FROM ( SELECT user_id, ts, ROW_NUMBER() OVER (PARTITION BY window_start, window_end, user_id ORDER BY ts ASC) as rownum FROM TABLE( TUMBLE(TABLE visits, DESCRIPTOR(ts), INTERVAL '24' HOURS)) ) WHERE rownum = 1 runs but doesn't produce a

Re: Need help with window TopN query

2021-11-04 Thread Francesco Guardiani
I think the issue here is that the nested select is selecting all the fields produced by the TVF, including window_time (which is implicitly added by the TVF as described here ). Because of

Need help with window TopN query

2021-11-04 Thread Pavel Penkov
I'm trying to express a supposedly simple query with Flink SQL - log the first visit a day for each user. Source table is defined like CREATE TABLE visits (user_id int, ts timestamp(3), WATERMARK FOR ts AS ts) WITH ('connector' = 'filesystem', 'path' = 'file:///visits.csv', 'format' = 'csv') The