First, could you check whether the added filter conditions are executed
before the join operators? If they are
already pushed down and executed before the join, then there must be some real join
keys generating the data skew.
Best,
Kurt
On Tue, Jan 14, 2020 at 5:09 AM Eva Eva
wrote:
> Hi Kurt,
>
> Assuming I'm
Hi Kurt,
Assuming I'm joining two tables, "latestListings" and "latestAgents" like
below:
"SELECT * FROM latestListings l " +
"LEFT JOIN latestAgents aa ON l.listAgentKeyL = aa.ucPKA " +
"LEFT JOIN latestAgents ab ON l.buyerAgentKeyL = ab.ucPKA " +
"LEFT JOIN latestAgents
Hi,
You can try to filter out NULL values with an explicit condition like
"is not NULL".
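To illustrate why NULL join keys can matter here, below is a minimal sketch in plain Java (no Flink dependency). It assumes that rows are distributed to subtasks by a hash of the join key and that every NULL key hashes to the same value. Flink's real key-to-subtask assignment is more involved, so this is only a simplified model. The class name, method names, and the "agent-N" keys are made up for the example; the parallelism of 26 is the one mentioned in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

public class NullKeySkewSketch {

    // Simplified model (an assumption, not Flink's actual partitioner):
    // a row goes to subtask hash(key) mod parallelism.
    // Objects.hashCode(null) is 0, so every NULL key lands on subtask 0.
    public static int[] countPerSubtask(List<String> keys, int parallelism) {
        int[] counts = new int[parallelism];
        for (String key : keys) {
            int subtask = Math.floorMod(Objects.hashCode(key), parallelism);
            counts[subtask]++;
        }
        return counts;
    }

    // Mimics an explicit "is not NULL" filter applied before the join.
    public static List<String> withoutNulls(List<String> keys) {
        return keys.stream().filter(Objects::nonNull).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        int parallelism = 26; // value mentioned in the thread

        // Hypothetical data: 1000 distinct keys plus 1000 rows with a NULL key.
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            keys.add("agent-" + i);
        }
        for (int i = 0; i < 1000; i++) {
            keys.add(null);
        }

        int[] before = countPerSubtask(keys, parallelism);
        int[] after = countPerSubtask(withoutNulls(keys), parallelism);

        System.out.println("busiest subtask with NULL keys:    "
                + Arrays.stream(before).max().getAsInt());
        System.out.println("busiest subtask without NULL keys: "
                + Arrays.stream(after).max().getAsInt());
    }
}
```

In this model the subtask that receives all NULL keys gets far more rows than the others, and the imbalance disappears once the NULL rows are filtered out before partitioning.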
Best,
Kurt
On Sat, Jan 11, 2020 at 4:10 AM Eva Eva
wrote:
> Thank you both for the suggestions.
> I did a bit more analysis using the UI and identified at least one
> problem that's occurring with the job right now
Thank you both for the suggestions.
I did a bit more analysis using the UI and identified at least one
problem that's occurring with the job right now. I'm going to fix it first and then
take it from there.
*Problem that I identified:*
I'm running with a parallelism of 26. For the checkpoints that are expiring, one
o
Hi
For an expired checkpoint, you can find a message like "Checkpoint xxx of job
xx expired before completing" in jobmanager.log. Then you can go to the
checkpoint UI to find which tasks did not ack, and look at those tasks to see
what happened.
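As a rough sketch of the log search described above, the following plain Java snippet scans log lines for the "Checkpoint xxx of job xx expired before completing" message and collects the checkpoint IDs. It assumes the checkpoint ID is numeric and the job ID contains no whitespace; the class name, method name, and the log path passed on the command line are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExpiredCheckpointScan {

    // Matches the message quoted in this thread. Assumption: numeric
    // checkpoint ID, job ID without whitespace.
    private static final Pattern EXPIRED =
            Pattern.compile("Checkpoint (\\d+) of job (\\S+) expired before completing");

    // Returns the IDs of all checkpoints reported as expired, in log order.
    public static List<Long> expiredCheckpointIds(List<String> logLines) {
        List<Long> ids = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = EXPIRED.matcher(line);
            if (m.find()) {
                ids.add(Long.parseLong(m.group(1)));
            }
        }
        return ids;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ExpiredCheckpointScan <path to jobmanager.log>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println("expired checkpoints: " + expiredCheckpointIds(lines));
    }
}
```

The resulting IDs can then be looked up in the checkpoint UI to see which tasks did not ack.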
If the checkpoint was declined, you can find something
Hi Eva
If the checkpoint failed, please check the web UI or the jobmanager log to see why
it failed; it might have been declined by some specific task.
If the checkpoint expired, you can also use the web UI to see which tasks did
not respond in time; a hot task might not be able to respond in time.
Gen