Re: Nondeterministic results with SQL job when parallelism is > 1

2021-04-16 Thread Jark Wu
…From: Jark Wu Date: Friday, April 16, 2021 at 5:10 AM To: Dylan Forciea Cc: Timo Walther, Piotr Nowojski <pnowoj...@apache.org>, "user@flink.apache.org" Subject: Re: Nondeterministic results with SQL job when parallelism is > 1

Re: Nondeterministic results with SQL job when parallelism is > 1

2021-04-16 Thread Dylan Forciea
…Walther, Piotr Nowojski, "user@flink.apache.org" Subject: Re: Nondeterministic results with SQL job when parallelism is > 1 Hi Dylan, I think this has the same cause as https://issues.apache.org/jira/browse/FLINK-20374. The root cause is that changelogs are shuffled by `attr` a…
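One way to picture the FLINK-20374 explanation is a toy model in plain Python. This is not Flink code, and the sink semantics are an assumption made for illustration: an update to a key is a -U message carrying the old row followed by a +U carrying the new row. If messages are routed to parallel channels by the value of `attr` instead of by the key, the two halves of one update can land on different channels, and the downstream operator may see them in either order. The helpers `apply_changelog`, `interleavings` and `possible_results` are hypothetical names; they enumerate every arrival order that still preserves each channel's own ordering.

```python
# Toy model (not Flink code): routing changelog messages by a non-key attribute
# can reorder the retraction (-U) and the new value (+U) of the same key.

def apply_changelog(messages):
    """Apply messages to a toy sink keyed by `key`: +I/+U upsert, -U/-D delete."""
    table = {}
    for kind, key, attr in messages:
        if kind in ("+I", "+U"):
            table[key] = attr
        else:  # "-U" or "-D": the toy sink deletes the row for this key
            table.pop(key, None)
    return table


def interleavings(channels):
    """Yield every merge of the channels that keeps each channel's own order."""
    channels = [c for c in channels if c]
    if not channels:
        yield []
        return
    for i, chan in enumerate(channels):
        rest = channels[:i] + [chan[1:]] + channels[i + 1:]
        for tail in interleavings(rest):
            yield [chan[0]] + tail


def possible_results(changelog, route):
    """Final sink states reachable when messages are split into channels by `route`."""
    channels = {}
    for msg in changelog:
        channels.setdefault(route(msg), []).append(msg)
    return {
        tuple(sorted(apply_changelog(order).items()))
        for order in interleavings(list(channels.values()))
    }


# Key k1 changes its `attr` from 1 to 2; -U carries the old row, +U the new one.
changelog = [("+I", "k1", 1), ("-U", "k1", 1), ("+U", "k1", 2)]

by_key = possible_results(changelog, route=lambda m: m[1])   # shuffle on the key
by_attr = possible_results(changelog, route=lambda m: m[2])  # shuffle on `attr`

print(by_key)   # {(('k1', 2),)}
print(by_attr)  # two outcomes: (('k1', 2),) or () when +U overtakes -U
```

Routing on the key keeps every message for `k1` in one channel, so only one final state is reachable. Routing on `attr` lets the +U overtake the -U, and the toy sink then deletes the row it has just written.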

Re: Nondeterministic results with SQL job when parallelism is > 1

2021-04-16 Thread Jark Wu
…truncate the “sink” table before I run the job, and this is a test environment where the source databases are static. I removed my line for setting Batch mode per Timo’s suggestion, and am still running with…

Re: Nondeterministic results with SQL job when parallelism is > 1

2021-04-14 Thread Dylan Forciea
…with MAX, which should have deterministic output. Dylan From: Piotr Nowojski Date: Wednesday, April 14, 2021 at 9:38 AM To: Dylan Forciea Cc: "user@flink.apache.org" Subject: Re: Nondetermini…

Re: Nondeterministic results with SQL job when parallelism is > 1

2021-04-14 Thread Dylan Forciea
…deterministic output. Dylan From: Piotr Nowojski Date: Wednesday, April 14, 2021 at 9:38 AM To: Dylan Forciea Cc: "user@flink.apache.org" Subject: Re: Nondeterministic results with SQL job when parallelism is > 1

Re: Nondeterministic results with SQL job when parallelism is > 1

2021-04-14 Thread Dylan Forciea
…14, 2021 at 9:38 AM To: Dylan Forciea Cc: "user@flink.apache.org" Subject: Re: Nondeterministic results with SQL job when parallelism is > 1 Hi Dylan, But if you are running your query in Streaming mode, aren't you counting retractions from the FULL JOIN? AFAIK in Streaming…
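To illustrate the point about retractions, here is a toy model in plain Python. It is not Flink's join implementation; it assumes at most one row per key on each side and only +I/-D messages. When a row arrives with no match, a streaming FULL JOIN can emit it padded with NULLs; when the match arrives later, it retracts that padded row and emits the joined one. Counting every changelog message therefore over-counts compared with the materialized result. `full_join_changelog`, `materialize` and the sample events are hypothetical.

```python
from collections import Counter

# Toy model (not Flink code) of a streaming FULL OUTER JOIN that keeps at most
# one row per key on each side. Output rows are (left_value, right_value) tuples.

def padded(side, value):
    """The NULL-padded row emitted when a row has no match yet."""
    return (value, None) if side == "L" else (None, value)


def full_join_changelog(events):
    """Return the changelog (+I / -D messages) emitted for (side, key, value) events."""
    seen = {"L": {}, "R": {}}
    out = []
    for side, key, value in events:
        other = "R" if side == "L" else "L"
        seen[side][key] = value
        if key in seen[other]:
            # The other side was emitted NULL-padded earlier: retract it, then emit the joined row.
            out.append(("-D", padded(other, seen[other][key])))
            out.append(("+I", (seen["L"][key], seen["R"][key])))
        else:
            out.append(("+I", padded(side, value)))
    return out


def materialize(changelog):
    """Apply +I / -D messages and keep only rows with a positive count."""
    rows = Counter()
    for kind, row in changelog:
        rows[row] += 1 if kind == "+I" else -1
    return +rows  # unary + drops zero and negative counts


events = [("L", 1, "a"), ("R", 1, "x"), ("L", 2, "b")]
log = full_join_changelog(events)
final = materialize(log)

print(len(log))             # 4 changelog messages, including one retraction
print(sum(final.values()))  # 2 rows in the materialized result
```

Counting the raw messages gives 4, while the table the sink would end up holding has only 2 rows, because the NULL-padded `("a", None)` was inserted and then retracted.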

Re: Nondeterministic results with SQL job when parallelism is > 1

2021-04-14 Thread Piotr Nowojski
…From: Piotr Nowojski Date: Wednesday, April 14, 2021 at 9:06 AM To: Dylan Forciea Cc: "user@flink.apache.org" Subject: Re: Nondeterministic results with SQL job when parallelism is > 1 Hi, Yes, it looks li…

Re: Nondeterministic results with SQL job when parallelism is > 1

2021-04-14 Thread Dylan Forciea
…consistent) of what comes out consistently when parallelism is set to 1. Dylan From: Dylan Forciea Date: Wednesday, April 14, 2021 at 9:08 AM To: Piotr Nowojski Cc: "user@flink.apache.org" Subject: Re: Nondeterministic results with SQL job when parallelism is > 1 Piotrek, I…

Re: Nondeterministic results with SQL job when parallelism is > 1

2021-04-14 Thread Timo Walther
…shouldn’t affect the number of records coming out, I wouldn’t think. Dylan From: Piotr Nowojski Date: Wednesday, April 14, 2021 at 9:06 AM To: Dylan Forciea Cc: "user@flink.apache.org" Subject: Re: Nondeterministic results with SQL job when parallelism is > 1 Hi, Yes, it lo…

Re: Nondeterministic results with SQL job when parallelism is > 1

2021-04-14 Thread Dylan Forciea
…part of the key, that shouldn’t affect the number of records coming out, I wouldn’t think. Dylan From: Piotr Nowojski Date: Wednesday, April 14, 2021 at 9:06 AM To: Dylan Forciea Cc: "user@flink.apache.org" Subject: Re: Nondeterministic results with SQL job when parallelism is > 1

Re: Nondeterministic results with SQL job when parallelism is > 1

2021-04-14 Thread Piotr Nowojski
Hi, Yes, it looks like your query is nondeterministic because of `FIRST_VALUE` used inside `GROUP BY`. If you have many different parallel sources, each time you run your query your first value might be different. If that's the case, you could try to confirm it with an even smaller query: SE…
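A minimal sketch of why `FIRST_VALUE` can differ between runs while `MAX` does not, assuming two parallel source splits whose rows reach a single aggregation in some interleaving. This is a plain-Python toy, not how Flink actually schedules sources; the split contents and the helper names `interleavings` and `aggregate` are made up for illustration.

```python
# Toy model (not Flink code): two parallel source splits feed one GROUP BY.
# Each split is read in order, but the aggregate sees an arbitrary interleaving.

def interleavings(a, b):
    """Yield every merge of lists a and b that keeps each list's own order."""
    if not a or not b:
        yield a + b
        return
    for tail in interleavings(a[1:], b):
        yield [a[0]] + tail
    for tail in interleavings(a, b[1:]):
        yield [b[0]] + tail


def aggregate(rows):
    """Per-key FIRST_VALUE (first row seen) and MAX over (key, value) rows."""
    first, maximum = {}, {}
    for key, value in rows:
        first.setdefault(key, value)
        maximum[key] = max(maximum.get(key, value), value)
    return first, maximum


split_a = [("k1", 3), ("k2", 5)]
split_b = [("k1", 7)]

first_results, max_results = set(), set()
for order in interleavings(split_a, split_b):
    first, maximum = aggregate(order)
    first_results.add(tuple(sorted(first.items())))
    max_results.add(tuple(sorted(maximum.items())))

print(len(first_results))  # 2: FIRST_VALUE(k1) is 3 or 7 depending on arrival order
print(len(max_results))    # 1: MAX(k1) is always 7
```

Every interleaving keeps each split's own order, yet `FIRST_VALUE` for `k1` still depends on which split delivers its row first. `MAX` is insensitive to arrival order, which is why it is expected to give deterministic output.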

Nondeterministic results with SQL job when parallelism is > 1

2021-04-14 Thread Dylan Forciea
I am running Flink 1.12.2, and I was trying to increase the parallelism of my Flink SQL job to see what happened. However, once I did that, my results became nondeterministic. This happens whether I set the `table.exec.resource.default-parallelism` config option or I set the default local parallelism t…