Hello folks, I would love to hear back your feedback on this.
Regards, Satyam On Wed, Mar 10, 2021 at 6:53 PM Satyam Shekhar <satyamshek...@gmail.com> wrote: > Hello folks, > > I am looking to enrich rows from an unbounded streaming table by > joining it with a bounded static table while preserving rowtime for the > streaming table. For example, let's consider table two tables F and D, > where F is unbounded and D is bounded. The schema for the two tables is the > following - > > F: > |-- C0: BIGINT > |-- C1: STRING > |-- R: TIMESTAMP(3) **rowtime** > |-- WATERMARK FOR R: TIMESTAMP(3) AS `R` - INTERVAL '0' SECONDS > > D: > |-- C0: BIGINT > |-- C1: STRING NOT NULL > > I'd like to run the following query on this schema - > > select sum(F.C0), D.C1, tumble_start(F.R, interval '1' second) > from F join D ON F.C1 = D.C1 > group by D.C1, tumble(F.R, interval '1' second) > > However, I run into the following error while running the above query - > > "Rowtime attributes must not be in the input rows of a regular join. As a > workaround you can cast the time attributes of input tables to TIMESTAMP > before." > > My understanding reading the docs is that Time Temporal Join is meant to > solve this problem. So I model table D as the following - > > D: > |-- C0: BIGINT > |-- C1: STRING NOT NULL > |-- R: TIMESTAMP(3) > |-- WATERMARK FOR R: TIMESTAMP(3) AS `R` - INTERVAL '0' SECONDS > |-- CONSTRAINT 2da2dd2e-9937-48cb-9dec-4f6055713004 PRIMARY KEY (C1) > > With column D.R always set to 0 and modify the query as follows - > > select sum(F.C0), D.C1, tumble_start(F.R, interval '1' second) > from F join D FOR SYSTEM_TIME AS OF F.R ON F.C1 = D.C1 > group by D.C1, tumble(F.R, interval '1' second) > > The above query runs but does not return any result. I have the following > data in D initially - > Emit D row=+I(0,"0",1970-01-01T00:00)@time=0 > Emit D row=+I(1,"1",1970-01-01T00:00)@time=0 > Emit D row=+I(2,"2",1970-01-01T00:00)@time=0 > Emit D watermark=0 > > And F streams the following rows - > Emit F row=+I(0,"0",1970-01-01T00:00)@time=0 > Emit F row=+I(1,"1",1970-01-01T00:00:10)@time=1000 > Emit F watermark=1000 > > I expect that two rows in F will join with matching rows (on C1) in D and > produce some output. But I do not see anything in the output. > > So I have the following questions - > > 1. Is time temporal join the correct tool to solve this problem? > 2. What could be the reason for not getting any output rows in the result? > > Thanks, > Satyam > >