I am running a Spark application on a standalone cluster in a Windows 7 environment.
Here are the details:
Spark version = 1.4.0
Windows/Standalone mode
Built Hadoop 2.6.0 on Windows and set the environment variables as follows:
HADOOP_HOME = E:\hadooptar260\hadoop-2.6.0
HADOOP_CONF_DIR = E:\hadooptar260\hadoop-2.6.0\etc\hadoop   // where core-site.xml resides
Added E:\hadooptar260\hadoop-2.6.0\bin to the PATH.
Note: I am not starting Hadoop. I only want to make sure that the Hadoop libraries are available to Spark, in particular that hdfs.jar and hadoop-common.jar are on the classpath and winutils is on the system path.
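A small check can confirm what a JVM actually sees. Run it with the same classpath and environment as the Spark worker. This is a minimal sketch, not part of Spark or Hadoop, and the class name `HadoopEnvCheck` and its helpers are hypothetical. It only matches jars whose file names start with `hadoop-`, so it cannot detect Hadoop classes bundled inside a larger jar (for example, an assembly jar). It looks for `winutils.exe` in the `bin` directory under `HADOOP_HOME`, which is where Hadoop's Windows support expects it.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical diagnostic (not part of Spark or Hadoop). Run it with the same
// classpath and environment as the Spark worker, e.g. java -cp <classpath> HadoopEnvCheck
class HadoopEnvCheck {

    // Classpath entries whose file name looks like a Hadoop jar (hadoop-*.jar).
    // Hadoop classes bundled inside other jars are NOT detected by this name check.
    static List<String> hadoopJars(String classpath, String separator) {
        List<String> jars = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(separator))) {
            // Accept both '/' and '\' so Windows and Unix style paths both work.
            int slash = Math.max(entry.lastIndexOf('/'), entry.lastIndexOf('\\'));
            String fileName = entry.substring(slash + 1).toLowerCase(Locale.ROOT);
            if (fileName.startsWith("hadoop-") && fileName.endsWith(".jar")) {
                jars.add(entry);
            }
        }
        return jars;
    }

    // True if winutils.exe exists in the bin directory under the given HADOOP_HOME.
    static boolean winutilsPresent(String hadoopHome) {
        return hadoopHome != null
            && Files.isRegularFile(Paths.get(hadoopHome, "bin", "winutils.exe"));
    }

    public static void main(String[] args) {
        String classpath = System.getProperty("java.class.path");
        System.out.println("Hadoop jars on classpath: " + hadoopJars(classpath, File.pathSeparator));
        String hadoopHome = System.getenv("HADOOP_HOME");
        System.out.println("HADOOP_HOME = " + hadoopHome);
        System.out.println("winutils.exe present: " + winutilsPresent(hadoopHome));
    }
}
```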
@rem startMaster
spark-class2.cmd org.apache.spark.deploy.master.Master --host machine1.QQQ.HYD --port 7077
@rem startWorker. This worker runs on the same machine as the master
spark-class2.cmd org.apache.spark.deploy.worker.Worker spark://machine1.QQQ.HYD:7077
@rem startWorker. This worker runs on a second machine
spark-class2.cmd org.apache.spark.deploy.worker.Worker spark://machine1.QQQ.HYD:7077
@rem startApp. This command is run from the machine where the master and the first worker are running
spark-submit2 --verbose --jars /app/lib/ojdbc7.jar --driver-class-path /app/lib/ojdbc7.jar --driver-library-path /programfiles/Hadoop/hadooptar260/hadoop-2.6.0/bin --class "org.ETLProcess" --name MyETL --master spark://machine1.QQQ.HYD:7077 --deploy-mode client /app/appjar/myapp-0.1.0.jar ETLProcess 1 51
@rem To avoid the NoSuchMethodException, I tried the following
spark-submit2 --verbose --jars /app/lib/ojdbc7.jar,/app/lib/hadoop-common-2.6.0.jar,/app/lib/hadoop-hdfs-2.6.0.jar --driver-class-path /app/lib/ojdbc7.jar --driver-library-path /programfiles/Hadoop/hadooptar260/hadoop-2.6.0/bin --class "org.dwh.oem.transform.ETLProcess" --name SureETL --master spark://machine1.QQQ.HYD:7077 --deploy-mode client /app/appjar/myapp-0.1.0.jar ETLProcess 1 51
With the above, the ETL job completes successfully. It fetches the data from the database and stores it as JSON files on each of the worker nodes.
*On the first node, the files are committed properly: the _temporary folder is removed and the output is marked with _SUCCESS.*
*The issue is that the files on the second node remain in the _temporary folder, which makes them unusable for further jobs. I need help overcoming this issue.*
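The symptom can be reproduced with a small model. The sketch below is a simplified model, not Hadoop's FileOutputCommitter. It rests on two assumptions. First, the output directory is on each machine's local disk; the FileOutputCommitter line in the log shows a `file:` URI. Second, the job-level commit runs once, on the driver machine, after all tasks finish; this step moves task output out of `_temporary`, deletes `_temporary` and writes `_SUCCESS`. Under those assumptions, the job commit never visits the second node's task output. The class name and task directory names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Simplified, hypothetical model of a two-phase output commit. Each temp directory
// stands in for one machine's local disk; this is NOT Hadoop's FileOutputCommitter.
class CommitSimulation {

    // Task commit: leave the task's part file under <out>/_temporary/0/<taskDir>,
    // mirroring the "Saved output of task ... to file:/.../_temporary/0/task_..." log line.
    static void commitTask(Path out, String taskDir, String partName) throws IOException {
        Path dir = out.resolve("_temporary").resolve("0").resolve(taskDir);
        Files.createDirectories(dir);
        Files.write(dir.resolve(partName), "{}\n".getBytes(StandardCharsets.UTF_8));
    }

    // Job commit: runs once, on one machine, and can only see that machine's disk.
    // It promotes the task files it finds, deletes _temporary and writes _SUCCESS.
    static void commitJob(Path out) throws IOException {
        Path taskRoot = out.resolve("_temporary").resolve("0");
        if (Files.isDirectory(taskRoot)) {
            try (DirectoryStream<Path> tasks = Files.newDirectoryStream(taskRoot)) {
                for (Path task : tasks) {
                    try (DirectoryStream<Path> parts = Files.newDirectoryStream(task)) {
                        for (Path part : parts) {
                            Files.move(part, out.resolve(part.getFileName()));
                        }
                    }
                }
            }
        }
        deleteRecursively(out.resolve("_temporary"));
        Files.createFile(out.resolve("_SUCCESS"));
    }

    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order deletes children before their parent directories.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    // Node 1 (driver + first worker) and node 2 (second worker) each commit one task to
    // the "same" file: path, but on their own disks; only node 1 runs the job commit.
    static boolean driverCommittedWorkerStuck() {
        try {
            Path node1Out = Files.createTempDirectory("node1").resolve("dw_app_value_list");
            Path node2Out = Files.createTempDirectory("node2").resolve("dw_app_value_list");
            commitTask(node1Out, "task_m_000000", "part-00000");
            commitTask(node2Out, "task_m_000001", "part-00001");
            commitJob(node1Out);
            return Files.exists(node1Out.resolve("_SUCCESS"))
                && Files.exists(node1Out.resolve("part-00000"))
                && Files.exists(node2Out.resolve("_temporary"))
                && !Files.exists(node2Out.resolve("_SUCCESS"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("node 2 output left in _temporary: " + driverCommittedWorkerStuck());
    }
}
```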
*This is line 176 of SparkHadoopUtil.scala, where the exception below occurs:*
private def getFileSystemThreadStatistics(): Seq[AnyRef] = {
  val stats = FileSystem.getAllStatistics()
  stats.map(Utils.invoke(classOf[Statistics], _, "getThreadStatistics"))  // <== Line 176
}
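The stack traces show Spark performing a reflective lookup: load the class, then call `Class.getDeclaredMethod` for a no-argument method. The following sketch repeats that lookup so it can be checked outside Spark. It is a hypothetical helper, not Spark's `Utils.invoke`. Running it on the second machine with the executor's classpath shows whether the Hadoop classes loaded there declare these methods.

```java
// Hypothetical helper that repeats the reflective lookup visible in the stack traces
// (load the class, then Class.getDeclaredMethod with no parameter types).
class ReflectionProbe {

    // True if the class can be loaded and itself declares a no-argument method with this name.
    static boolean declaresNoArgMethod(String className, String methodName) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> cls = Class.forName(className, false, ReflectionProbe.class.getClassLoader());
            cls.getDeclaredMethod(methodName);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class/method pairs reported as missing in the log.
        String[][] probes = {
            {"org.apache.hadoop.fs.FileSystem$Statistics", "getThreadStatistics"},
            {"org.apache.hadoop.fs.FileSystem$Statistics$StatisticsData", "getBytesWritten"},
        };
        for (String[] probe : probes) {
            System.out.println(probe[0] + "." + probe[1] + "() declared: "
                + declaresNoArgMethod(probe[0], probe[1]));
        }
    }
}
```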
Below are extracts from the log, which also contains the following exceptions:
java.lang.NoSuchMethodException: org.apache.hadoop.fs.FileSystem$Statistics$StatisticsData.getBytesWritten()
java.lang.ClassNotFoundException: org.apache.hadoop.mapred.InputSplitWithLocationInfo
java.lang.NoSuchMethodException: org.apache.hadoop.fs.FileSystem$Statistics.getThreadStatistics()
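All three members are looked up reflectively, so one way to narrow the problem down is to ask the JVM two things: which jar each Hadoop class actually comes from, and which Hadoop version it reports. The sketch below is minimal and uses only the JDK. The class `HadoopOrigin` is hypothetical. It calls `org.apache.hadoop.util.VersionInfo.getVersion()` through reflection so that it compiles without Hadoop, and it prints "not found" when a class is absent. If the reported location is not one of the 2.6.0 jars, an older copy of the Hadoop classes is probably being loaded in their place.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

// Hypothetical diagnostic: where do the Hadoop classes come from, and which version are they?
class HadoopOrigin {

    // Location (jar or directory) a class was loaded from, without initializing it.
    static String loadedFrom(String className) {
        try {
            Class<?> cls = Class.forName(className, false, HadoopOrigin.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return "no code source (e.g. a JDK bootstrap class)";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not found";
        }
    }

    // Calls org.apache.hadoop.util.VersionInfo.getVersion() reflectively, so this compiles without Hadoop.
    static String hadoopVersion() {
        try {
            Class<?> versionInfo = Class.forName("org.apache.hadoop.util.VersionInfo");
            Method getVersion = versionInfo.getMethod("getVersion");
            return (String) getVersion.invoke(null);
        } catch (ReflectiveOperationException e) {
            return "not found";
        }
    }

    public static void main(String[] args) {
        System.out.println("Hadoop version: " + hadoopVersion());
        String[] classes = {
            "org.apache.hadoop.fs.FileSystem",
            "org.apache.hadoop.mapred.JobConf",
            "org.apache.hadoop.mapred.InputSplitWithLocationInfo",
        };
        for (String name : classes) {
            System.out.println(name + " <- " + loadedFrom(name));
        }
    }
}
```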
-----------------------------------------------
2015-06-30 15:55:48 DEBUG NativeCodeLoader:46 - Trying to load the custom-built native-hadoop library...
2015-06-30 15:55:48 DEBUG NativeCodeLoader:50 - Loaded the native-hadoop library
2015-06-30 15:55:48 DEBUG JniBasedUnixGroupsMapping:50 - Using JniBasedUnixGroupsMapping for Group resolution
2015-06-30 15:55:48 DEBUG JniBasedUnixGroupsMappingWithFallback:44 - Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMapping
2015-06-30 15:55:48 DEBUG Groups:80 - Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
2015-06-30 15:55:48 DEBUG UserGroupInformation:193 - hadoop login
2015-06-30 15:55:48 DEBUG UserGroupInformation:142 - hadoop login commit
-----------------------------------------------
2015-06-30 15:55:50 DEBUG Master:56 - [actor] received message RegisterApplication(ApplicationDescription(SureETL)) from Actor[akka.tcp://[email protected]:59974/user/$a#-1360185865]
2015-06-30 15:55:50 INFO Master:59 - Registering app SureETL
2015-06-30 15:55:50 INFO Master:59 - Registered app SureETL with ID app-20150630155550-0001
2015-06-30 15:55:50 INFO Master:59 - Launching executor app-20150630155550-0001/0 on worker worker-20150630154548-172.16.11.212-59791
2015-06-30 15:55:50 INFO Master:59 - Launching executor app-20150630155550-0001/1 on worker worker-20150630155002-172.16.11.133-61908
2015-06-30 15:55:50 DEBUG Master:62 - [actor] handled message (8.672752 ms) RegisterApplication(ApplicationDescription(SureETL)) from Actor[akka.tcp://[email protected]:59974/user/$a#-1360185865]
-----------------------------------------------
2015-06-30 15:56:02 DEBUG Server:228 - rpcKind=RPC_PROTOCOL_BUFFER, rpcRequestWrapperClass=class org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper, rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@604d28c6
2015-06-30 15:56:02 DEBUG Client:63 - getting client out of cache: org.apache.hadoop.ipc.Client@1511d157
2015-06-30 15:56:03 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1:56 - [actor] received message AkkaMessage(ReviveOffers,false) from Actor[akka://sparkDriver/deadLetters]
2015-06-30 15:56:03 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1:63 - Received RPC message: AkkaMessage(ReviveOffers,false)
2015-06-30 15:56:03 DEBUG AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1:62 - [actor] handled message (1.73455 ms) AkkaMessage(ReviveOffers,false) from Actor[akka://sparkDriver/deadLetters]
2015-06-30 15:56:03 DEBUG BlockReaderLocal:105 - Both short-circuit local reads and UNIX domain socket are disabled.
2015-06-30 15:56:03 DEBUG PairRDDFunctions:63 - Saving as hadoop file of type (NullWritable, Text)
2015-06-30 15:56:03 DEBUG HadoopRDD:84 - SplitLocationInfo and other new Hadoop classes are unavailable. Using the older Hadoop location info code.
java.lang.ClassNotFoundException: org.apache.hadoop.mapred.InputSplitWithLocationInfo
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Unknown Source)
at org.apache.spark.rdd.HadoopRDD$SplitInfoReflections.<init>(HadoopRDD.scala:386)
at org.apache.spark.rdd.HadoopRDD$.liftedTree1$1(HadoopRDD.scala:396)
at org.apache.spark.rdd.HadoopRDD$.<init>(HadoopRDD.scala:395)
at org.apache.spark.rdd.HadoopRDD$.<clinit>(HadoopRDD.scala)
at org.apache.spark.SparkHadoopWriter.preSetup(SparkHadoopWriter.scala:61)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1093)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1065)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1065)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1065)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply$mcV$sp(PairRDDFunctions.scala:989)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:965)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:965)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:965)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply$mcV$sp(PairRDDFunctions.scala:897)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:897)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:897)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:896)
at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply$mcV$sp(RDD.scala:1400)
at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1379)
at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1379)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1379)
at org.apache.spark.sql.json.DefaultSource.createRelation(JSONRelation.scala:99)
at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:305)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:144)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:135)
at org.dwh.oem.extract.OrderLookupExtractor$.orderLookupExtractionProcss(OrderingLookupExtractor.scala:61)
at org.dwh.oem.transform.ETLProcess$.main(ETLProcess.scala:33)
at org.dwh.oem.transform.ETLProcess.main(ETLProcess.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2015-06-30 15:56:03 INFO deprecation:1009 - mapred.tip.id is deprecated. Instead, use mapreduce.task.id
2015-06-30 15:56:03 INFO deprecation:1009 - mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
2015-06-30 15:56:03 INFO deprecation:1009 - mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
2015-06-30 15:56:03 INFO deprecation:1009 - mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
2015-06-30 15:56:03 INFO deprecation:1009 - mapred.job.id is deprecated. Instead, use mapreduce.job.id
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - +++ Cleaning closure <function2> (org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13}) +++
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - + declared fields: 4
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - public static final long org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.serialVersionUID
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - private final org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1 org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.$outer
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - private final org.apache.spark.SerializableWritable org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.wrappedConf$2
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - public final org.apache.spark.SparkHadoopWriter org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.writer$2
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - + declared methods: 3
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - public final void org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(org.apache.spark.TaskContext,scala.collection.Iterator)
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - public final java.lang.Object org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(java.lang.Object,java.lang.Object)
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - public org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1 org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.org$apache$spark$rdd$PairRDDFunctions$$anonfun$$anonfun$$$outer()
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - + inner classes: 3
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$7
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$56
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - + outer classes: 2
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - org.apache.spark.rdd.PairRDDFunctions
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - + outer objects: 2
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - <function0>
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - org.apache.spark.rdd.PairRDDFunctions@5d14e99e
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - + populating accessed fields because this is the starting closure
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - + fields accessed by starting closure: 2
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - (class org.apache.spark.rdd.PairRDDFunctions,Set())
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - (class org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1,Set($outer))
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - + outermost object is not a closure, so do not clone it: (class org.apache.spark.rdd.PairRDDFunctions,org.apache.spark.rdd.PairRDDFunctions@5d14e99e)
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - + cloning the object <function0> of class org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - + cleaning cloned closure <function0> recursively (org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1)
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - +++ Cleaning closure <function0> (org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1}) +++
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - + declared fields: 3
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - public static final long org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.serialVersionUID
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - private final org.apache.spark.rdd.PairRDDFunctions org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.$outer
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - private final org.apache.hadoop.mapred.JobConf org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.conf$4
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - + declared methods: 4
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - public final java.lang.Object org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply()
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - public final void org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply()
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - public void org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp()
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - public org.apache.spark.rdd.PairRDDFunctions org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.org$apache$spark$rdd$PairRDDFunctions$$anonfun$$$outer()
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - + inner classes: 5
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$apply$mcV$sp$2
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$7
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$6
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$56
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - + outer classes: 1
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - org.apache.spark.rdd.PairRDDFunctions
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - + outer objects: 1
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - org.apache.spark.rdd.PairRDDFunctions@5d14e99e
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - + fields accessed by starting closure: 2
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - (class org.apache.spark.rdd.PairRDDFunctions,Set())
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - (class org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1,Set($outer))
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - + outermost object is not a closure, so do not clone it: (class org.apache.spark.rdd.PairRDDFunctions,org.apache.spark.rdd.PairRDDFunctions@5d14e99e)
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - +++ closure <function0> (org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1) is now cleaned +++
2015-06-30 15:56:03 DEBUG ClosureCleaner:63 - +++ closure <function2> (org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13) is now cleaned +++
2015-06-30 15:56:03 INFO SparkContext:59 - Starting job: save at OrderingLookupExtractor.scala:61
-----------------------------------------------------------------------------------------
15-06-30 15:56:11 DEBUG SparkHadoopUtil:84 - Couldn't find method for retrieving thread-level FileSystem output data
java.lang.NoSuchMethodException: org.apache.hadoop.fs.FileSystem$Statistics$StatisticsData.getBytesWritten()
at java.lang.Class.getDeclaredMethod(Unknown Source)
at org.apache.spark.deploy.SparkHadoopUtil.getFileSystemThreadStatisticsMethod(SparkHadoopUtil.scala:182)
at org.apache.spark.deploy.SparkHadoopUtil.getFSBytesWrittenOnThreadCallback(SparkHadoopUtil.scala:162)
at org.apache.spark.rdd.PairRDDFunctions.org$apache$spark$rdd$PairRDDFunctions$$initHadoopOutputMetrics(PairRDDFunctions.scala:1129)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1101)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1095)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
2015-06-30 15:56:11 DEBUG HadoopRDD:84 - SplitLocationInfo and other new Hadoop classes are unavailable. Using the older Hadoop location info code.
java.lang.ClassNotFoundException: org.apache.hadoop.mapred.InputSplitWithLocationInfo
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Unknown Source)
at org.apache.spark.rdd.HadoopRDD$SplitInfoReflections.<init>(HadoopRDD.scala:386)
at org.apache.spark.rdd.HadoopRDD$.liftedTree1$1(HadoopRDD.scala:396)
at org.apache.spark.rdd.HadoopRDD$.<init>(HadoopRDD.scala:395)
at org.apache.spark.rdd.HadoopRDD$.<clinit>(HadoopRDD.scala)
at org.apache.spark.SparkHadoopWriter.setup(SparkHadoopWriter.scala:70)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1103)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1095)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
2015-06-30 15:56:11 DEBUG NativeIO:191 - Initialized cache for IDs to User/Group mapping with a cache timeout of 14400 seconds.
2015-06-30 15:56:11 INFO JDBCRDD:59 - closed connection
2015-06-30 15:56:11 INFO FileOutputCommitter:439 - Saved output of task 'attempt_201506301556_0000_m_000000_0' to file:/sparketl/extract/icasdb_cl/oem/lookup51/dw_app_value_list/_temporary/0/task_201506301556_0000_m_000000
2015-06-30 15:56:11 INFO SparkHadoopMapRedUtil:59 - attempt_201506301556_0000_m_000000_0: Committed
2015-06-30 15:56:11 INFO JDBCRDD:59 - closed connection
2015-06-30 15:56:11 INFO Executor:59 - Finished task 0.0 in stage 0.0 (TID 0). 624 bytes result sent to driver
--------------------------------------------------------------------------------------
2015-06-30 15:57:03 DEBUG SparkHadoopUtil:84 - Couldn't find method for retrieving thread-level FileSystem output data
java.lang.NoSuchMethodException: org.apache.hadoop.fs.FileSystem$Statistics.getThreadStatistics()
at java.lang.Class.getDeclaredMethod(Unknown Source)
at org.apache.spark.util.Utils$.invoke(Utils.scala:2069)
at org.apache.spark.deploy.SparkHadoopUtil$$anonfun$getFileSystemThreadStatistics$1.apply(SparkHadoopUtil.scala:176)
at org.apache.spark.deploy.SparkHadoopUtil$$anonfun$getFileSystemThreadStatistics$1.apply(SparkHadoopUtil.scala:176)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
at scala.collection.Iterator$class.foreach(Iterator.scala:750)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1202)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at org.apache.spark.deploy.SparkHadoopUtil.getFileSystemThreadStatistics(SparkHadoopUtil.scala:176)
at org.apache.spark.deploy.SparkHadoopUtil.getFSBytesWrittenOnThreadCallback(SparkHadoopUtil.scala:161)
at org.apache.spark.rdd.PairRDDFunctions.org$apache$spark$rdd$PairRDDFunctions$$initHadoopOutputMetrics(PairRDDFunctions.scala:1129)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1101)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1095)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
-----------------------------------------------
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/output-folder-structure-not-getting-commited-and-remains-as-temporary-tp23557.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.