[ 
https://issues.apache.org/jira/browse/PIG-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15645119#comment-15645119
 ] 

Adam Szita commented on PIG-5052:
---------------------------------

You can try the following:
{code}
./pig -x spark_local

A = LOAD '../test/org/apache/pig/test/data/passwd' using PigStorage();
dump A
dump A
{code}

The second dump hangs for me. The reason is that both job 0 and job 1 are returned 
(because they share the same job group ID) at JobGraphBuilder#225:
{code}
sparkContext.statusTracker().getJobIdsForGroup(jobGroupID)
{code}

...but at this point JobMetricsListener only has job 1 in finishedJobIds:
{code}
public synchronized boolean waitForJobToEnd(int jobId) throws InterruptedException {
    if (finishedJobIds.contains(jobId)) {
        finishedJobIds.remove(jobId);
        return true;
    }

    wait();
    return false;
}
{code}

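So job 0 never shows up in finishedJobIds again after the second dump 
(presumably its ID was already consumed by waitForJobToEnd during the first dump), 
yet we still wait for it.
On top of this, I think it's a clearer approach to use different job group IDs 
for different jobs.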
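
To illustrate the mismatch outside of Pig and Spark, here is a minimal, self-contained sketch. It is not the actual Pig code: the class JobGroupSketch, its methods, and the group names "group-0"/"group-1" are hypothetical stand-ins. A plain map plays the role of the status tracker's group-to-jobs lookup, and instead of blocking in wait() the sketch just reports the job IDs that would be waited on forever.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class JobGroupSketch {
    // Hypothetical stand-in for the status tracker: job group ID -> job IDs.
    private final Map<String, List<Integer>> jobsByGroup = new HashMap<>();
    // Plays the role of finishedJobIds in JobMetricsListener.
    private final Set<Integer> finishedJobIds = new HashSet<>();
    private int nextJobId = 0;

    // Simulates a job that runs to completion under the given group ID.
    int runJob(String groupId) {
        int jobId = nextJobId++;
        jobsByGroup.computeIfAbsent(groupId, k -> new ArrayList<>()).add(jobId);
        finishedJobIds.add(jobId);
        return jobId;
    }

    // Non-blocking analogue of calling waitForJobToEnd for every job in the group:
    // finished jobs are consumed, the rest are returned as "would wait forever".
    List<Integer> collectStuckJobs(String groupId) {
        List<Integer> stuck = new ArrayList<>();
        for (Integer jobId : jobsByGroup.getOrDefault(groupId, new ArrayList<>())) {
            if (!finishedJobIds.remove(jobId)) {
                stuck.add(jobId);
            }
        }
        return stuck;
    }

    // Simulates two consecutive dumps and returns the stuck jobs of the second one.
    static List<Integer> twoDumps(boolean sharedGroup) {
        JobGroupSketch sketch = new JobGroupSketch();
        String firstGroup = "group-0";
        String secondGroup = sharedGroup ? firstGroup : "group-1";

        sketch.runJob(firstGroup);
        sketch.collectStuckJobs(firstGroup);   // first dump consumes job 0

        sketch.runJob(secondGroup);
        return sketch.collectStuckJobs(secondGroup);
    }

    public static void main(String[] args) {
        System.out.println("shared group, stuck jobs: " + twoDumps(true));
        System.out.println("distinct groups, stuck jobs: " + twoDumps(false));
    }
}
```

With a shared group ID the second dump still looks for job 0 and reports it as stuck; with a distinct group ID per dump nothing is left to wait on.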

> Initialize MRConfiguration.JOB_ID in spark mode correctly
> ---------------------------------------------------------
>
>                 Key: PIG-5052
>                 URL: https://issues.apache.org/jira/browse/PIG-5052
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: Adam Szita
>             Fix For: spark-branch
>
>         Attachments: PIG-5052.2.patch, PIG-5052.patch
>
>
> Currently, we initialize MRConfiguration.JOB_ID in SparkUtil#newJobConf.  
> We just set the value to a random string.
> {code}
>         jobConf.set(MRConfiguration.JOB_ID, UUID.randomUUID().toString());
> {code}
> We need to find a Spark API to initialize it correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
