[ https://issues.apache.org/jira/browse/PIG-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15645119#comment-15645119 ]
Adam Szita commented on PIG-5052:
---------------------------------

You can try the following:
{code}
./pig -x spark_local
A = LOAD '../test/org/apache/pig/test/data/passwd' using PigStorage();
dump A
dump A
{code}
The second dump hangs for me. The reason is that both jobs 0 and 1 are returned (because they share the same job group ID) at JobGraphBuilder#225:
{code}
sparkContext.statusTracker().getJobIdsForGroup(jobGroupID)
{code}
...but at this point JobMetricsListener only has job 1 in finishedJobIds, because job 0 was already removed from the set when the first dump finished:
{code}
public synchronized boolean waitForJobToEnd(int jobId) throws InterruptedException {
    if (finishedJobIds.contains(jobId)) {
        finishedJobIds.remove(jobId);
        return true;
    }
    wait();
    return false;
}
{code}
So after the second dump we never see job 0 again, yet we still wait for it.

On top of this, I think it's a clearer approach to use different job group IDs for different jobs.

> Initialize MRConfiguration.JOB_ID in spark mode correctly
> ---------------------------------------------------------
>
>                 Key: PIG-5052
>                 URL: https://issues.apache.org/jira/browse/PIG-5052
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: Adam Szita
>             Fix For: spark-branch
>
>         Attachments: PIG-5052.2.patch, PIG-5052.patch
>
>
> Currently, we initialize MRConfiguration.JOB_ID in SparkUtil#newJobConf.
> We just set the value to a random string.
> {code}
> jobConf.set(MRConfiguration.JOB_ID, UUID.randomUUID().toString());
> {code}
> We need to find a Spark API to initialize it correctly.
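
The hang described in the comment can be reproduced without Spark. The following is only a minimal sketch, not the Pig code: FinishedJobs is a hypothetical stand-in for JobMetricsListener, onJobEnd and waitForAll are made-up helper names, and the wait is bounded by a timeout and an attempt limit (both assumptions) so the example terminates instead of hanging.

```java
import java.util.HashSet;
import java.util.Set;

public class JobWaitSketch {

    // Hypothetical stand-in for JobMetricsListener; only the finished-job bookkeeping is modelled.
    static class FinishedJobs {
        private final Set<Integer> finishedJobIds = new HashSet<>();

        // Record a finished job and wake up any waiting thread.
        synchronized void onJobEnd(int jobId) {
            finishedJobIds.add(jobId);
            notifyAll();
        }

        // Same shape as waitForJobToEnd in the comment, but with a bounded wait (assumption).
        synchronized boolean waitForJobToEnd(int jobId, long timeoutMillis) throws InterruptedException {
            if (finishedJobIds.contains(jobId)) {
                finishedJobIds.remove(jobId);
                return true;
            }
            wait(timeoutMillis);
            return false;
        }
    }

    // Wait for every job ID reported for a job group; give up after maxAttempts per job.
    static boolean waitForAll(FinishedJobs listener, int[] jobIds, int maxAttempts) {
        try {
            for (int jobId : jobIds) {
                boolean done = false;
                for (int i = 0; i < maxAttempts && !done; i++) {
                    done = listener.waitForJobToEnd(jobId, 10);
                }
                if (!done) {
                    return false; // the unbounded version would block here forever
                }
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        FinishedJobs listener = new FinishedJobs();

        // First dump: job 0 finishes and the group lookup returns [0].
        listener.onJobEnd(0);
        System.out.println(waitForAll(listener, new int[] {0}, 3));    // true

        // Second dump with the same group ID: the lookup returns [0, 1],
        // but job 0 was already removed from the finished set.
        listener.onJobEnd(1);
        System.out.println(waitForAll(listener, new int[] {0, 1}, 3)); // false

        // With a distinct group ID per job, the lookup would return only [1].
        System.out.println(waitForAll(listener, new int[] {1}, 3));    // true
    }
}
```

The second call fails only because job 0 is requested again after it has been consumed; with one group ID per job, the lookup never returns a job that was already waited on.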