I want to try using AWS Personalize <https://aws.amazon.com/personalize/>
to get content recommendations.  One of the fields on the input (click)
event is a list of recent impressions.

E.g.
{
  ...
  eventType: 'click',
  eventId: 'click-1',
  itemId: 'item-1',
  impression: ['item-2', 'item-3', 'item-4', 'item-5', ....],
}

Is there a way to produce this output using Flink SQL?

I tried doing a version of this but get the following error:
"Rowtime attributes must not be in the input rows of a regular join. As a
workaround you can cast the time attributes of input tables to TIMESTAMP
before."

Here is a simplified version of the query.


SELECT
    "user".user_id AS userId,
    "view".session_id AS sessionId,
    click.click_id AS eventId,
    CAST(click.ts AS BIGINT) AS sentAt,
    insertion.content_id AS itemId,
    impression_content_ids AS impression
FROM "user"
RIGHT JOIN "view"
    ON "user".log_user_id = "view".log_user_id
    AND "user".ts BETWEEN "view".ts - INTERVAL '30' DAY AND "view".ts + INTERVAL '1' HOUR
JOIN insertion
    ON "view".view_id = insertion.view_id
    AND "view".ts BETWEEN insertion.ts - INTERVAL '1' HOUR AND insertion.ts + INTERVAL '1' HOUR
JOIN impression
    ON insertion.insertion_id = impression.insertion_id
    AND insertion.ts BETWEEN impression.ts - INTERVAL '12' HOUR AND impression.ts + INTERVAL '1' HOUR
JOIN (
    SELECT log_user_id,
        CAST(COLLECT(DISTINCT impression_content_id) AS ARRAY<STRING>) AS impression_content_ids
    FROM (
        SELECT insertion.log_user_id AS log_user_id,
            ROW_NUMBER() OVER (PARTITION BY insertion.log_user_id ORDER BY impression.ts DESC) AS row_num,
            insertion.content_id AS impression_content_id
        FROM insertion
        JOIN impression
            ON insertion.insertion_id = impression.insertion_id
            AND insertion.ts BETWEEN impression.ts - INTERVAL '12' HOUR AND impression.ts + INTERVAL '1' HOUR
        GROUP BY insertion.log_user_id, impression.ts, insertion.content_id
    ) WHERE row_num <= 25
    GROUP BY log_user_id
) ON insertion.insertion_id = impression.insertion_id
    AND insertion.ts BETWEEN impression.ts - INTERVAL '12' HOUR AND impression.ts + INTERVAL '1' HOUR
LEFT JOIN click
    ON impression.impression_id = click.impression_id
    AND impression.ts BETWEEN click.ts - INTERVAL '12' HOUR AND click.ts + INTERVAL '12' HOUR
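
For reference, my reading of the workaround named in the error message is to
cast the rowtime column to a plain TIMESTAMP before the table takes part in a
regular join. Below is a minimal, untested sketch for the impression table
only. The view name impression_no_rowtime and the TIMESTAMP(3) precision are
my own assumptions, not something taken from the job above.

```sql
-- Hypothetical view: same columns as impression, but ts is cast so that it
-- is no longer a rowtime attribute. The name and TIMESTAMP(3) are assumptions.
CREATE TEMPORARY VIEW impression_no_rowtime AS
SELECT
    impression_id,
    insertion_id,
    CAST(ts AS TIMESTAMP(3)) AS ts
FROM impression;
```

As far as I understand, joins against such a view would no longer be planned
as interval joins, because the BETWEEN conditions would compare ordinary
timestamps. They would run as regular joins, so join state would presumably
need to be bounded some other way (for example with a state retention
setting).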
