I want to try using AWS Personalize <https://aws.amazon.com/personalize/> to get content recommendations. One of the fields on the input (click) event is a list of recent impressions.
E.g.

{
  ...
  eventType: 'click',
  eventId: 'click-1',
  itemId: 'item-1',
  impression: ['item-2', 'item-3', 'item-4', 'item-5', ....],
}

Is there a way to produce this output using Flink SQL?

I tried doing a version of this but get the following error:

"Rowtime attributes must not be in the input rows of a regular join. As a workaround you can cast the time attributes of input tables to TIMESTAMP before."

Here is a simplified version of the query.

SELECT
  "user".user_id AS userId,
  "view".session_id AS sessionId,
  click.click_id AS eventId,
  CAST(click.ts AS BIGINT) AS sentAt,
  insertion.content_id AS itemId,
  impression_content_ids AS impression
FROM "user"
RIGHT JOIN "view"
  ON "user".log_user_id = "view".log_user_id
  AND "user".ts BETWEEN "view".ts - INTERVAL '30' DAY AND "view".ts + INTERVAL '1' HOUR
JOIN insertion
  ON "view".view_id = insertion.view_id
  AND "view".ts BETWEEN insertion.ts - INTERVAL '1' HOUR AND insertion.ts + INTERVAL '1' HOUR
JOIN impression
  ON insertion.insertion_id = impression.insertion_id
  AND insertion.ts BETWEEN impression.ts - INTERVAL '12' HOUR AND impression.ts + INTERVAL '1' HOUR
JOIN (
  SELECT
    log_user_id,
    CAST(COLLECT(DISTINCT impression_content_id) AS ARRAY<STRING>) AS impression_content_ids
  FROM (
    SELECT
      insertion.log_user_id AS log_user_id,
      ROW_NUMBER() OVER (PARTITION BY insertion.log_user_id ORDER BY impression.ts DESC) AS row_num,
      insertion.content_id AS impression_content_id
    FROM insertion
    JOIN impression
      ON insertion.insertion_id = impression.insertion_id
      AND insertion.ts BETWEEN impression.ts - INTERVAL '12' HOUR AND impression.ts + INTERVAL '1' HOUR
    GROUP BY insertion.log_user_id, impression.ts, insertion.content_id
  )
  WHERE row_num <= 25
  GROUP BY log_user_id
)
  ON insertion.insertion_id = impression.insertion_id
  AND insertion.ts BETWEEN impression.ts - INTERVAL '12' HOUR AND impression.ts + INTERVAL '1' HOUR
LEFT JOIN click
  ON impression.impression_id = click.impression_id
  AND impression.ts BETWEEN click.ts - INTERVAL '12' HOUR AND click.ts + INTERVAL '12' HOUR
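
For reference, here is a minimal sketch of the workaround the error message describes, as far as I understand it. The view names insertion_plain and impression_plain are made up for illustration. I am assuming that CAST(ts AS TIMESTAMP(3)) is enough to turn the rowtime attribute into a plain timestamp, and that CREATE TEMPORARY VIEW and the table.exec.state.ttl option are available in the Flink version in use.

```sql
-- Sketch only: materialize the rowtime attributes as plain timestamps so the
-- planner accepts a regular (non-interval) join on these inputs.
-- insertion_plain and impression_plain are hypothetical names.
CREATE TEMPORARY VIEW insertion_plain AS
SELECT
  insertion_id,
  log_user_id,
  content_id,
  CAST(ts AS TIMESTAMP(3)) AS ts  -- assumed to drop the rowtime attribute
FROM insertion;

CREATE TEMPORARY VIEW impression_plain AS
SELECT
  impression_id,
  insertion_id,
  CAST(ts AS TIMESTAMP(3)) AS ts  -- assumed to drop the rowtime attribute
FROM impression;

-- Regular join on the casted inputs. Without a state TTL
-- (e.g. table.exec.state.ttl) the join state can grow without bound.
SELECT
  ins.log_user_id,
  ins.content_id AS impression_content_id,
  imp.ts AS impression_ts
FROM insertion_plain AS ins
JOIN impression_plain AS imp
  ON ins.insertion_id = imp.insertion_id;
```

If this is right, the regular join would plan, but the casted columns would no longer be time attributes, so they could not drive the interval joins in the main query. That is why I am unsure whether the cast is the right approach here.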