[
https://issues.apache.org/jira/browse/PIG-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16274385#comment-16274385
]
Nandor Kollar commented on PIG-5318:
------------------------------------
Attached PIG-5318_2.patch, I addressed Rohini's comments there.
As for the {{TestStoreInstances}} failure, it looks like Spark (unlike Tez and
MapReduce) creates multiple instances of {{PigOutputFormat}} while setting up
the output committers:
[setupCommitter|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L74]
is called from both
[setupJob|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L138]
and from
[setupTask|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L165],
and {{setupCommitter}} creates a new {{PigOutputFormat}} each time, saving it
in a private variable. In addition, when Spark writes to files, a new
{{PigOutputFormat}} is [getting
created|https://github.com/apache/spark/blob/branch-2.2/core/src/main/scala/org/apache/spark/internal/io/SparkHadoopMapReduceWriter.scala#L75]
too. Since POStores are serialized into the configuration, but the
StoreFuncInterface inside each store is
[transient|https://github.com/apache/pig/blob/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POStore.java#L53],
a new instance of {{STFuncCheckInstances}} is created on every
deserialization, thus {{putNext}} and {{commitTask}} will use different array
instances. I'm not sure whether this is a bug in Pig or in Spark: should
Spark consistently use the same OutputFormat instance in this case?
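A minimal, self-contained sketch of the mechanism (illustrative class names,
not Pig's actual {{POStore}}/{{StoreFuncInterface}}): a transient field is not
serialized, so each deserialized copy lazily re-creates it, and state written
through one copy is invisible to another.

```java
import java.io.*;
import java.util.*;

// Stand-in for POStore: the collection field is transient, like the
// StoreFuncInterface field in Pig, so it is re-created after deserialization.
class Store implements Serializable {
    private transient List<String> rows;

    List<String> getRows() {
        if (rows == null) {
            rows = new ArrayList<>();   // lazily re-created per deserialized copy
        }
        return rows;
    }
}

public class TransientDemo {
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    static Store deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Store) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = serialize(new Store());

        // Two deserializations, as when the task side and the commit side
        // each build their own copy from the serialized configuration.
        Store taskSide = deserialize(bytes);
        Store commitSide = deserialize(bytes);

        taskSide.getRows().add("record written in putNext");

        // The commit side's copy holds a different, empty list.
        System.out.println(commitSide.getRows().size()); // prints 0
    }
}
```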
Making {{reduceStores}}, {{mapStores}} and {{currentConf}} static inside
{{TestStoreInstances}} would solve the problem. [~rohini], [~kellyzly], what
do you think about this solution?
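A minimal sketch of the static-field idea (illustrative classes, only the
field name {{mapStores}} mirrors the test): a static collection is shared
across however many instances Spark creates, so a fresh instance at commit
time still sees the rows written earlier through another instance.

```java
import java.util.*;

// Stand-in for TestStoreInstances' store func: the shared state lives in a
// static field, so every instance observes the same collection.
class SharedStateStore {
    static final List<String> mapStores = new ArrayList<>();

    void putNext(String row) {
        mapStores.add(row);             // written through one instance
    }

    int commitTask() {
        return mapStores.size();        // visible through any other instance
    }
}

public class StaticFixDemo {
    public static void main(String[] args) {
        SharedStateStore writer = new SharedStateStore();    // used by putNext
        SharedStateStore committer = new SharedStateStore(); // fresh copy at commit

        writer.putNext("record");
        System.out.println(committer.commitTask()); // prints 1: state is shared
    }
}
```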
> Unit test failures on Pig on Spark with Spark 2.2
> -------------------------------------------------
>
> Key: PIG-5318
> URL: https://issues.apache.org/jira/browse/PIG-5318
> Project: Pig
> Issue Type: Bug
> Components: spark
> Reporter: Nandor Kollar
> Assignee: Nandor Kollar
> Attachments: PIG-5318_1.patch, PIG-5318_2.patch
>
>
> There are several failing cases when executing the unit tests with Spark 2.2:
> {code}
> org.apache.pig.test.TestAssert#testNegativeWithoutFetch
> org.apache.pig.test.TestAssert#testNegative
> org.apache.pig.test.TestEvalPipeline2#testNonStandardDataWithoutFetch
> org.apache.pig.test.TestScalarAliases#testScalarErrMultipleRowsInInput
> org.apache.pig.test.TestStore#testCleanupOnFailureMultiStore
> org.apache.pig.test.TestStoreInstances#testBackendStoreCommunication
> org.apache.pig.test.TestStoreLocal#testCleanupOnFailureMultiStore
> {code}
> All of these are related to fixes/changes in Spark.
> The TestAssert, TestScalarAliases and TestEvalPipeline2 failures could be
> fixed by asserting on the message of the exception's root cause; it looks
> like on Spark 2.2 the exception is wrapped in an additional layer.
> The TestStore and TestStoreLocal failures are also test-related problems:
> it looks like SPARK-7953 is fixed in Spark 2.2.
> The root cause of the TestStoreInstances failure is yet to be found.
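The root-cause assertion fix described in the issue could look roughly like
this (illustrative names and messages, not the actual test code): walk the
cause chain to its end and assert on that message instead of the top-level
one, so an extra wrapping layer added by Spark 2.2 does not break the test.

```java
public class RootCauseDemo {
    // Walk the cause chain to its last element.
    static Throwable rootCause(Throwable t) {
        Throwable root = t;
        while (root.getCause() != null) {
            root = root.getCause();
        }
        return root;
    }

    public static void main(String[] args) {
        // Simulate Spark 2.2 wrapping the original failure in one more layer.
        Exception original = new RuntimeException("assertion failed message");
        Exception wrapped = new RuntimeException("outer wrapper",
                new RuntimeException("inner wrapper", original));

        // Asserting on wrapped.getMessage() would see only "outer wrapper";
        // the root cause still carries the original message.
        System.out.println(rootCause(wrapped).getMessage()); // prints "assertion failed message"
    }
}
```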
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)