Hi all,

I am saving some Hive query results both to HDFS and to a local directory:
val hdfsFilePath = "hdfs://master:ip/ tempFile "
val localFilePath = "file:///home/hduser/tempFile"

val res = hiveContext.sql(s"""my hql codes here""")
res.printSchema()  // works
res.show()         // works
res.map { x => tranRow2Str(x) }.coalesce(1).saveAsTextFile(hdfsFilePath)   // still works
res.map { x => tranRow2Str(x) }.coalesce(1).saveAsTextFile(localFilePath)  // wrong!

In the end I get the correct results in hdfsFilePath, but nothing in localFilePath. The localFilePath directory does get created, but it contains only a _SUCCESS file and no part-**** files.

Here is the log (any thoughts?). Line 112 of MyApp.scala is the saveAsTextFile call that writes to the local path:

15/11/04 09:57:41 INFO scheduler.DAGScheduler: Got job 4 (saveAsTextFile at myApp.scala:112) with 1 output partitions (allowLocal=false)
15/11/04 09:57:41 INFO scheduler.DAGScheduler: Final stage: ResultStage 42(saveAsTextFile at MyApp.scala:112)
15/11/04 09:57:41 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 41)
15/11/04 09:57:41 INFO scheduler.DAGScheduler: Missing parents: List()
15/11/04 09:57:41 INFO scheduler.DAGScheduler: Submitting ResultStage 42 (MapPartitionsRDD[106] at saveAsTextFile at MyApp.scala:112), which has no missing parents
15/11/04 09:57:41 INFO storage.MemoryStore: ensureFreeSpace(160632) called with curMem=3889533, maxMem=280248975
15/11/04 09:57:41 INFO storage.MemoryStore: Block broadcast_28 stored as values in memory (estimated size 156.9 KB, free 263.4 MB)
15/11/04 09:57:41 INFO storage.MemoryStore: ensureFreeSpace(56065) called with curMem=4050165, maxMem=280248975
15/11/04 09:57:41 INFO storage.MemoryStore: Block broadcast_28_piece0 stored as bytes in memory (estimated size 54.8 KB, free 263.4 MB)
15/11/04 09:57:41 INFO storage.BlockManagerInfo: Added broadcast_28_piece0 in memory on 192.168.70.135:32836 (size: 54.8 KB, free: 266.8 MB)
15/11/04 09:57:41 INFO spark.SparkContext: Created broadcast 28 from broadcast at DAGScheduler.scala:874
15/11/04 09:57:41 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 42 (MapPartitionsRDD[106] at saveAsTextFile at MyApp.scala:112)
15/11/04 09:57:41 INFO scheduler.TaskSchedulerImpl: Adding task set 42.0 with 1 tasks
15/11/04 09:57:41 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 42.0 (TID 2018, 192.168.70.129, PROCESS_LOCAL, 5097 bytes)
15/11/04 09:57:41 INFO storage.BlockManagerInfo: Added broadcast_28_piece0 in memory on 192.168.70.129:54062 (size: 54.8 KB, free: 1068.8 MB)
15/11/04 09:57:47 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 42.0 (TID 2018) in 6362 ms on 192.168.70.129 (1/1)
15/11/04 09:57:47 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 42.0, whose tasks have all completed, from pool
15/11/04 09:57:47 INFO scheduler.DAGScheduler: ResultStage 42 (saveAsTextFile at MyApp.scala:112) finished in 6.360 s
15/11/04 09:57:47 INFO scheduler.DAGScheduler: Job 4 finished: saveAsTextFile at MyApp.scala:112, took 6.588821 s
15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
15/11/04 09:57:47 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.70.135:4040
15/11/04 09:57:47 INFO scheduler.DAGScheduler: Stopping DAGScheduler
15/11/04 09:57:47 INFO cluster.SparkDeploySchedulerBackend: Shutting down all executors
15/11/04 09:57:47 INFO cluster.SparkDeploySchedulerBackend: Asking each executor to shut down
15/11/04 09:57:47 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
15/11/04 09:57:47 INFO util.Utils: path = /home/hduser/sparkTmp/spark-9b7a61ab-73a6-47af-87f6-fce4a5bbddb7/blockmgr-c5b7fdb9-f5ec-46b6-a1f0-d24287778c41, already present as root for deletion.
15/11/04 09:57:47 INFO storage.MemoryStore: MemoryStore cleared
15/11/04 09:57:47 INFO storage.BlockManager: BlockManager stopped
15/11/04 09:57:47 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
15/11/04 09:57:47 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
15/11/04 09:57:47 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/11/04 09:57:47 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
15/11/04 09:57:48 INFO spark.SparkContext: Successfully stopped SparkContext
15/11/04 09:57:48 INFO util.Utils: Shutdown hook called
15/11/04 09:57:48 INFO util.Utils: Deleting directory /tmp/spark-436a46ea-71fa-4b1b-ba39-06ed95a1af06
15/11/04 09:57:48 INFO util.Utils: Deleting directory /home/hduser/sparkTmp/spark-9b7a61ab-73a6-47af-87f6-fce4a5bbddb7

Best regards,
Jack
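P.S. One workaround I am going to try is to avoid saveAsTextFile with a file:/// URI altogether: collect the result to the driver and write the local file there myself. This is only a rough, untested sketch; it assumes the result is small enough to fit in driver memory, tranRow2Str is my own helper from above, and the output path is just an example:

```scala
import java.io.PrintWriter

// Pull the formatted rows back to the driver process...
val lines = res.map(x => tranRow2Str(x)).collect()

// ...and write them with plain Java I/O, so the file is created
// on the driver's own filesystem (example path).
val writer = new PrintWriter("/home/hduser/tempFile.txt")
try {
  lines.foreach(writer.println)
} finally {
  writer.close()
}
```

I also notice in the log that task 0.0 ran on 192.168.70.129 while the driver is on 192.168.70.135, so I wonder whether the part file was actually written to that worker's local filesystem rather than the driver's; I will check there as well.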