Hi all,
I am saving some Hive query results into a local directory:
val hdfsFilePath = "hdfs://master:ip/tempFile"
val localFilePath = "file:///home/hduser/tempFile"
val res = hiveContext.sql(s"""my hql code here""")
res.printSchema()  // works
res.show()         // works
res.map { x => tranRow2Str(x) }.coalesce(1).saveAsTextFile(hdfsFilePath)  // still works
res.map { x => tranRow2Str(x) }.coalesce(1).saveAsTextFile(localFilePath) // wrong!
In the end I get the correct results in hdfsFilePath, but nothing in
localFilePath: the directory was created, but it contains only a _SUCCESS
file and no part-* files.
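As a possible workaround I am considering collecting the (small) result to the driver and writing it locally with plain Java IO instead of saveAsTextFile — this is only a sketch, and it assumes the coalesced result is small enough to fit in driver memory:

```scala
import java.io.PrintWriter

// Workaround sketch: bring the rows to the driver, then write a single local
// file there, using the same tranRow2Str helper as above.
// Assumes the result fits in driver memory (collect() materializes it all).
val lines = res.map(x => tranRow2Str(x)).collect()
val writer = new PrintWriter(localFilePath.stripPrefix("file://"))
try lines.foreach(writer.println) finally writer.close()
```

Not sure whether that is the right approach, or whether saveAsTextFile to a file:// path is supposed to work on a cluster at all.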
Here is the relevant log (any thoughts?):
15/11/04 09:57:41 INFO scheduler.DAGScheduler: Got job 4 (saveAsTextFile at
myApp.scala:112) with 1 output partitions (allowLocal=false)
// line 112 is where I call saveAsTextFile to save the results locally.
15/11/04 09:57:41 INFO scheduler.DAGScheduler: Final stage: ResultStage
42(saveAsTextFile at MyApp.scala:112)
15/11/04 09:57:41 INFO scheduler.DAGScheduler: Parents of final stage:
List(ShuffleMapStage 41)
15/11/04 09:57:41 INFO scheduler.DAGScheduler: Missing parents: List()
15/11/04 09:57:41 INFO scheduler.DAGScheduler: Submitting ResultStage 42
(MapPartitionsRDD[106] at saveAsTextFile at MyApp.scala:112), which has no
missing parents
15/11/04 09:57:41 INFO storage.MemoryStore: ensureFreeSpace(160632) called with
curMem=3889533, maxMem=280248975
15/11/04 09:57:41 INFO storage.MemoryStore: Block broadcast_28 stored as values
in memory (estimated size 156.9 KB, free 263.4 MB)
15/11/04 09:57:41 INFO storage.MemoryStore: ensureFreeSpace(56065) called with
curMem=4050165, maxMem=280248975
15/11/04 09:57:41 INFO storage.MemoryStore: Block broadcast_28_piece0 stored as
bytes in memory (estimated size 54.8 KB, free 263.4 MB)
15/11/04 09:57:41 INFO storage.BlockManagerInfo: Added broadcast_28_piece0 in
memory on 192.168.70.135:32836 (size: 54.8 KB, free: 266.8 MB)
15/11/04 09:57:41 INFO spark.SparkContext: Created broadcast 28 from broadcast
at DAGScheduler.scala:874
15/11/04 09:57:41 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from
ResultStage 42 (MapPartitionsRDD[106] at saveAsTextFile at MyApp.scala:112)
15/11/04 09:57:41 INFO scheduler.TaskSchedulerImpl: Adding task set 42.0 with 1
tasks
15/11/04 09:57:41 INFO scheduler.TaskSetManager: Starting task 0.0 in stage
42.0 (TID 2018, 192.168.70.129, PROCESS_LOCAL, 5097 bytes)
15/11/04 09:57:41 INFO storage.BlockManagerInfo: Added broadcast_28_piece0 in
memory on 192.168.70.129:54062 (size: 54.8 KB, free: 1068.8 MB)
15/11/04 09:57:47 INFO scheduler.TaskSetManager: Finished task 0.0 in stage
42.0 (TID 2018) in 6362 ms on 192.168.70.129 (1/1)
15/11/04 09:57:47 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 42.0, whose
tasks have all completed, from pool
15/11/04 09:57:47 INFO scheduler.DAGScheduler: ResultStage 42 (saveAsTextFile
at MyApp.scala:112) finished in 6.360 s
15/11/04 09:57:47 INFO scheduler.DAGScheduler: Job 4 finished: saveAsTextFile
at MyApp.scala:112, took 6.588821 s
15/11/04 09:57:47 INFO handler.ContextHandler: stopped
o.s.j.s.ServletContextHandler{/metrics/json,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped
o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped
o.s.j.s.ServletContextHandler{/api,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped
o.s.j.s.ServletContextHandler{/,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped
o.s.j.s.ServletContextHandler{/static,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped
o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped
o.s.j.s.ServletContextHandler{/executors/threadDump,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped
o.s.j.s.ServletContextHandler{/executors/json,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped
o.s.j.s.ServletContextHandler{/executors,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped
o.s.j.s.ServletContextHandler{/environment/json,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped
o.s.j.s.ServletContextHandler{/environment,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped
o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped
o.s.j.s.ServletContextHandler{/storage/rdd,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped
o.s.j.s.ServletContextHandler{/storage/json,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped
o.s.j.s.ServletContextHandler{/storage,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped
o.s.j.s.ServletContextHandler{/stages/pool/json,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped
o.s.j.s.ServletContextHandler{/stages/pool,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped
o.s.j.s.ServletContextHandler{/stages/stage/json,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped
o.s.j.s.ServletContextHandler{/stages/stage,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped
o.s.j.s.ServletContextHandler{/stages/json,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped
o.s.j.s.ServletContextHandler{/stages,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped
o.s.j.s.ServletContextHandler{/jobs/job/json,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped
o.s.j.s.ServletContextHandler{/jobs/job,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped
o.s.j.s.ServletContextHandler{/jobs/json,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped
o.s.j.s.ServletContextHandler{/jobs,null}
15/11/04 09:57:47 INFO ui.SparkUI: Stopped Spark web UI at
http://192.168.70.135:4040
15/11/04 09:57:47 INFO scheduler.DAGScheduler: Stopping DAGScheduler
15/11/04 09:57:47 INFO cluster.SparkDeploySchedulerBackend: Shutting down all
executors
15/11/04 09:57:47 INFO cluster.SparkDeploySchedulerBackend: Asking each
executor to shut down
15/11/04 09:57:47 INFO spark.MapOutputTrackerMasterEndpoint:
MapOutputTrackerMasterEndpoint stopped!
15/11/04 09:57:47 INFO util.Utils: path =
/home/hduser/sparkTmp/spark-9b7a61ab-73a6-47af-87f6-fce4a5bbddb7/blockmgr-c5b7fdb9-f5ec-46b6-a1f0-d24287778c41,
already present as root for deletion.
15/11/04 09:57:47 INFO storage.MemoryStore: MemoryStore cleared
15/11/04 09:57:47 INFO storage.BlockManager: BlockManager stopped
15/11/04 09:57:47 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
15/11/04 09:57:47 INFO
scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
OutputCommitCoordinator stopped!
15/11/04 09:57:47 INFO remote.RemoteActorRefProvider$RemotingTerminator:
Shutting down remote daemon.
15/11/04 09:57:47 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote
daemon shut down; proceeding with flushing remote transports.
15/11/04 09:57:48 INFO spark.SparkContext: Successfully stopped SparkContext
15/11/04 09:57:48 INFO util.Utils: Shutdown hook called
15/11/04 09:57:48 INFO util.Utils: Deleting directory
/tmp/spark-436a46ea-71fa-4b1b-ba39-06ed95a1af06
15/11/04 09:57:48 INFO util.Utils: Deleting directory
/home/hduser/sparkTmp/spark-9b7a61ab-73a6-47af-87f6-fce4a5bbddb7
Best regards,
Jack