Re: Please suggest helpful tools

2020-01-13 Thread Kurt Young
First, could you check whether the added filter conditions are executed before the join operators? If they are already pushed down and executed before the join, then some real join keys must be generating the data skew. Best, Kurt On Tue, Jan 14, 2020 at 5:09 AM Eva Eva wrote: > Hi Kurt, > > Assuming I'm

Re: Please suggest helpful tools

2020-01-13 Thread Eva Eva
Hi Kurt,

Assuming I'm joining two tables, "latestListings" and "latestAgents", like below:

"SELECT * FROM latestListings l " +
"LEFT JOIN latestAgents aa ON l.listAgentKeyL = aa.ucPKA " +
"LEFT JOIN latestAgents ab ON l.buyerAgentKeyL = ab.ucPKA " +
"LEFT JOIN latestAgents

Re: Please suggest helpful tools

2020-01-12 Thread Kurt Young
Hi, You can try to filter NULL values with an explicit condition like " is not NULL". Best, Kurt On Sat, Jan 11, 2020 at 4:10 AM Eva Eva wrote: > Thank you both for the suggestions. > I did a bit more analysis using UI and identified at least one > problem that's occurring with the job rn
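To illustrate why filtering NULL keys can matter, here is a minimal sketch, assuming that rows with a NULL join key are all routed to the same parallel subtask. The partitioning function `subtask_for` is a hypothetical stand-in (not Flink's actual key assignment), the row counts are made up, and only the parallelism of 26 comes from the thread.

```python
import zlib
from collections import Counter

PARALLELISM = 26  # parallelism mentioned in the thread


def subtask_for(key, parallelism=PARALLELISM):
    """Hypothetical stand-in for key-based partitioning (not Flink's real assignment)."""
    return zlib.crc32(repr(key).encode("utf-8")) % parallelism


def load_per_subtask(keys, parallelism=PARALLELISM):
    """Count how many rows each subtask would receive for the given join keys."""
    return Counter(subtask_for(k, parallelism) for k in keys)


# Made-up data: 1,000 rows with distinct agent keys, 5,000 rows with no agent key.
keys = [f"agent-{i}" for i in range(1000)] + [None] * 5000

with_nulls = load_per_subtask(keys)
without_nulls = load_per_subtask(k for k in keys if k is not None)

print("busiest subtask with NULL keys:   ", max(with_nulls.values()))
print("busiest subtask without NULL keys:", max(without_nulls.values()))
```

In this sketch every NULL-key row lands on one subtask, so that subtask carries most of the load until the NULL rows are filtered out before partitioning. Whether a real job behaves this way depends on where the filter ends up in the execution plan.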

Re: Please suggest helpful tools

2020-01-10 Thread Eva Eva
Thank you both for the suggestions. I did a bit more analysis using the UI and identified at least one problem that's occurring with the job right now. Going to fix it first and then take it from there. *Problem that I identified:* I'm running with a parallelism of 26. For the checkpoints that are expiring, one o

Re: Please suggest helpful tools

2020-01-10 Thread Congxian Qiu
Hi, For an expired checkpoint, you can find something like "Checkpoint xxx of job xx expired before completing" in jobmanager.log; then you can go to the checkpoint UI to find which tasks did not ack, and check those tasks to see what happened. If the checkpoint was declined, you can find something
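The log search described above could be sketched roughly as follows. The pattern is built only from the message text quoted in this reply; the surrounding log layout (timestamps, logger names) is not assumed, and the sample lines are made up for illustration rather than copied from a real jobmanager.log.

```python
import re

# Pattern built from the message quoted above; the exact log layout is an assumption.
EXPIRED = re.compile(r"Checkpoint (\d+) of job (\S+) expired before completing")


def expired_checkpoints(lines):
    """Return (checkpoint_id, job_id) pairs for every expiry message in the given lines."""
    found = []
    for line in lines:
        match = EXPIRED.search(line)
        if match:
            found.append((int(match.group(1)), match.group(2)))
    return found


# Made-up sample lines for illustration only; not taken from a real log.
sample = [
    "Checkpoint 41 of job job-a expired before completing.",
    "some unrelated line",
    "Checkpoint 42 of job job-a expired before completing.",
]
print(expired_checkpoints(sample))
```

For a real log, the same function can be given an open file object, since it only iterates over lines. The checkpoint ids it returns are what you would then look up in the checkpoint UI.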

Re: Please suggest helpful tools

2020-01-10 Thread Yun Tang
Hi Eva, If the checkpoint failed, please check the web UI or the jobmanager log to see why it failed; it might have been declined by some specific task. If the checkpoint expired, you can also use the web UI to see which tasks did not respond in time; some hot task might not have been able to respond in time. Gen