Thank you.

Backbutton.co.uk
¯\_(ツ)_/¯
♡۶Java♡۶RMI ♡۶
Make Use Method {MUM}
makeuse.org
<http://www.backbutton.co.uk>
On Thu, 26 Mar 2020 at 19:18, Reynold Xin <r...@databricks.com> wrote:

> bcc dev, +user
>
> You need to print out the result. take itself doesn't print; you only got
> the results printed to the console because the Scala REPL automatically
> prints the value returned by take.
>
> On Thu, Mar 26, 2020 at 12:15 PM, Zahid Rahman <zahidr1...@gmail.com> wrote:
>
>> I am running the same code with the same libraries but am not getting the
>> same output.
>>
>> scala> case class flight (DEST_COUNTRY_NAME: String,
>>      |                    ORIGIN_COUNTRY_NAME: String,
>>      |                    count: BigInt)
>> defined class flight
>>
>> scala> val flightDf = spark.read.parquet("/data/flight-data/parquet/2010-summary.parquet/")
>> flightDf: org.apache.spark.sql.DataFrame = [DEST_COUNTRY_NAME: string, ORIGIN_COUNTRY_NAME: string ... 1 more field]
>>
>> scala> val flights = flightDf.as[flight]
>> flights: org.apache.spark.sql.Dataset[flight] = [DEST_COUNTRY_NAME: string, ORIGIN_COUNTRY_NAME: string ... 1 more field]
>>
>> scala> flights.filter(flight_row => flight_row.ORIGIN_COUNTRY_NAME != "Canada").map(flight_row => flight_row).take(3)
>> res0: Array[flight] = Array(flight(United States,Romania,1), flight(United States,Ireland,264), flight(United States,India,69))
>>
>> <!------------------------------------------------------------------------------------------------------------------------------
>> 20/03/26 19:09:00 INFO SparkContext: Running Spark version 3.0.0-preview2
>> 20/03/26 19:09:00 INFO ResourceUtils: ==============================================================
>> 20/03/26 19:09:00 INFO ResourceUtils: Resources for spark.driver:
>> 20/03/26 19:09:00 INFO ResourceUtils: ==============================================================
>> 20/03/26 19:09:00 INFO SparkContext: Submitted application: chapter2
>> 20/03/26 19:09:00 INFO SecurityManager: Changing view acls to: kub19
>> 20/03/26 19:09:00 INFO SecurityManager: Changing modify acls to: kub19
>> 20/03/26 19:09:00 INFO SecurityManager: Changing view acls groups to:
>> 20/03/26 19:09:00 INFO SecurityManager: Changing modify acls groups to:
>> 20/03/26 19:09:00 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(kub19); groups with view permissions: Set(); users with modify permissions: Set(kub19); groups with modify permissions: Set()
>> 20/03/26 19:09:00 INFO Utils: Successfully started service 'sparkDriver' on port 46817.
>> 20/03/26 19:09:00 INFO SparkEnv: Registering MapOutputTracker
>> 20/03/26 19:09:00 INFO SparkEnv: Registering BlockManagerMaster
>> 20/03/26 19:09:00 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
>> 20/03/26 19:09:00 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
>> 20/03/26 19:09:00 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
>> 20/03/26 19:09:00 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-0baf5097-2595-4542-99e2-a192d7baf37c
>> 20/03/26 19:09:00 INFO MemoryStore: MemoryStore started with capacity 127.2 MiB
>> 20/03/26 19:09:01 INFO SparkEnv: Registering OutputCommitCoordinator
>> 20/03/26 19:09:01 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
>> 20/03/26 19:09:01 INFO Utils: Successfully started service 'SparkUI' on port 4041.
>> 20/03/26 19:09:01 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://localhost:4041
>> 20/03/26 19:09:01 INFO Executor: Starting executor ID driver on host localhost
>> 20/03/26 19:09:01 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 38135.
>> 20/03/26 19:09:01 INFO NettyBlockTransferService: Server created on localhost:38135
>> 20/03/26 19:09:01 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
>> 20/03/26 19:09:01 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, localhost, 38135, None)
>> 20/03/26 19:09:01 INFO BlockManagerMasterEndpoint: Registering block manager localhost:38135 with 127.2 MiB RAM, BlockManagerId(driver, localhost, 38135, None)
>> 20/03/26 19:09:01 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, localhost, 38135, None)
>> 20/03/26 19:09:01 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, localhost, 38135, None)
>> 20/03/26 19:09:01 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/home/kub19/spark-3.0.0-preview2-bin-hadoop2.7/projects/spark-warehouse').
>> 20/03/26 19:09:01 INFO SharedState: Warehouse path is 'file:/home/kub19/spark-3.0.0-preview2-bin-hadoop2.7/projects/spark-warehouse'.
>> 20/03/26 19:09:02 INFO SparkContext: Starting job: parquet at chapter2.scala:18
>> 20/03/26 19:09:02 INFO DAGScheduler: Got job 0 (parquet at chapter2.scala:18) with 1 output partitions
>> 20/03/26 19:09:02 INFO DAGScheduler: Final stage: ResultStage 0 (parquet at chapter2.scala:18)
>> 20/03/26 19:09:02 INFO DAGScheduler: Parents of final stage: List()
>> 20/03/26 19:09:02 INFO DAGScheduler: Missing parents: List()
>> 20/03/26 19:09:02 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at parquet at chapter2.scala:18), which has no missing parents
>> 20/03/26 19:09:02 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 72.8 KiB, free 127.1 MiB)
>> 20/03/26 19:09:02 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 25.9 KiB, free 127.1 MiB)
>> 20/03/26 19:09:02 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:38135 (size: 25.9 KiB, free: 127.2 MiB)
>> 20/03/26 19:09:02 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1206
>> 20/03/26 19:09:02 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at parquet at chapter2.scala:18) (first 15 tasks are for partitions Vector(0))
>> 20/03/26 19:09:02 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
>> 20/03/26 19:09:02 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7560 bytes)
>> 20/03/26 19:09:02 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
>> 20/03/26 19:09:02 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1840 bytes result sent to driver
>> 20/03/26 19:09:02 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 204 ms on localhost (executor driver) (1/1)
>> 20/03/26 19:09:02 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
>> 20/03/26 19:09:02 INFO DAGScheduler: ResultStage 0 (parquet at chapter2.scala:18) finished in 0.304 s
>> 20/03/26 19:09:02 INFO DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
>> 20/03/26 19:09:02 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
>> 20/03/26 19:09:02 INFO DAGScheduler: Job 0 finished: parquet at chapter2.scala:18, took 0.332643 s
>> 20/03/26 19:09:03 INFO BlockManagerInfo: Removed broadcast_0_piece0 on localhost:38135 in memory (size: 25.9 KiB, free: 127.2 MiB)
>> 20/03/26 19:09:04 INFO V2ScanRelationPushDown:
>> Pushing operators to parquet file:/data/flight-data/parquet/2010-summary.parquet
>> Pushed Filters:
>> Post-Scan Filters:
>> Output: DEST_COUNTRY_NAME#0, ORIGIN_COUNTRY_NAME#1, count#2L
>>
>> 20/03/26 19:09:04 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 290.0 KiB, free 126.9 MiB)
>> 20/03/26 19:09:04 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 24.3 KiB, free 126.9 MiB)
>> 20/03/26 19:09:04 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:38135 (size: 24.3 KiB, free: 127.2 MiB)
>> 20/03/26 19:09:04 INFO SparkContext: Created broadcast 1 from take at chapter2.scala:20
>> 20/03/26 19:09:04 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 290.1 KiB, free 126.6 MiB)
>> 20/03/26 19:09:04 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 24.3 KiB, free 126.6 MiB)
>> 20/03/26 19:09:04 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:38135 (size: 24.3 KiB, free: 127.2 MiB)
>> 20/03/26 19:09:04 INFO SparkContext: Created broadcast 2 from take at chapter2.scala:20
>> 20/03/26 19:09:04 INFO CodeGenerator: Code generated in 159.155401 ms
>> 20/03/26 19:09:04 INFO SparkContext: Starting job: take at chapter2.scala:20
>> 20/03/26 19:09:04 INFO DAGScheduler: Got job 1 (take at chapter2.scala:20) with 1 output partitions
>> 20/03/26 19:09:04 INFO DAGScheduler: Final stage: ResultStage 1 (take at chapter2.scala:20)
>> 20/03/26 19:09:04 INFO DAGScheduler: Parents of final stage: List()
>> 20/03/26 19:09:04 INFO DAGScheduler: Missing parents: List()
>> 20/03/26 19:09:04 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[5] at take at chapter2.scala:20), which has no missing parents
>> 20/03/26 19:09:04 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 22.7 KiB, free 126.6 MiB)
>> 20/03/26 19:09:04 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 8.1 KiB, free 126.6 MiB)
>> 20/03/26 19:09:04 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on localhost:38135 (size: 8.1 KiB, free: 127.1 MiB)
>> 20/03/26 19:09:04 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1206
>> 20/03/26 19:09:04 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[5] at take at chapter2.scala:20) (first 15 tasks are for partitions Vector(0))
>> 20/03/26 19:09:04 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
>> 20/03/26 19:09:04 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, executor driver, partition 0, PROCESS_LOCAL, 7980 bytes)
>> 20/03/26 19:09:04 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
>> 20/03/26 19:09:05 INFO FilePartitionReader: Reading file path: file:///data/flight-data/parquet/2010-summary.parquet/part-r-00000-1a9822ba-b8fb-4d8e-844a-ea30d0801b9e.gz.parquet, range: 0-3921, partition values: [empty row]
>> 20/03/26 19:09:05 INFO ZlibFactory: Successfully loaded & initialized native-zlib library
>> 20/03/26 19:09:05 INFO CodecPool: Got brand-new decompressor [.gz]
>> 20/03/26 19:09:05 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 1762 bytes result sent to driver
>> 20/03/26 19:09:05 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 219 ms on localhost (executor driver) (1/1)
>> 20/03/26 19:09:05 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
>> 20/03/26 19:09:05 INFO DAGScheduler: ResultStage 1 (take at chapter2.scala:20) finished in 0.235 s
>> 20/03/26 19:09:05 INFO DAGScheduler: Job 1 is finished. Cancelling potential speculative or zombie tasks for this job
>> 20/03/26 19:09:05 INFO TaskSchedulerImpl: Killing all running tasks in stage 1: Stage finished
>> 20/03/26 19:09:05 INFO DAGScheduler: Job 1 finished: take at chapter2.scala:20, took 0.238010 s
>> 20/03/26 19:09:05 INFO CodeGenerator: Code generated in 17.77886 ms
>> 20/03/26 19:09:05 INFO SparkUI: Stopped Spark web UI at http://localhost:4041
>> 20/03/26 19:09:05 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
>> 20/03/26 19:09:05 INFO MemoryStore: MemoryStore cleared
>> 20/03/26 19:09:05 INFO BlockManager: BlockManager stopped
>> 20/03/26 19:09:05 INFO BlockManagerMaster: BlockManagerMaster stopped
>> 20/03/26 19:09:05 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
>> 20/03/26 19:09:05 INFO SparkContext: Successfully stopped SparkContext
>> 20/03/26 19:09:05 INFO ShutdownHookManager: Shutdown hook called
>> 20/03/26 19:09:05 INFO ShutdownHookManager: Deleting directory /tmp/spark-6d99677e-ae1b-4894-aa32-3a79fb0b4307
>>
>> Process finished with exit code 0
>>
>> <!----------------------------------------------------------------------------------------------------
>>
>> import org.apache.spark.sql.SparkSession
>>
>> object chapter2 {
>>
>>   // define a specific data type (case class), then manipulate it using the filter and map functions;
>>   // this is also known as an Encoder
>>   case class flight (DEST_COUNTRY_NAME: String,
>>                      ORIGIN_COUNTRY_NAME: String,
>>                      count: BigInt)
>>
>>   def main(args: Array[String]): Unit = {
>>
>>     // unlike in the interactive shell, the SparkSession must be created explicitly here (avoids IntelliJ errors)
>>     val spark = SparkSession.builder.master("local[*]").appName("chapter2").getOrCreate
>>
>>     // looks like a hard-coded system workaround
>>     import spark.implicits._
>>     val flightDf = spark.read.parquet("/data/flight-data/parquet/2010-summary.parquet/")
>>     val flights = flightDf.as[flight]
>>     flights.filter(flight_row => flight_row.ORIGIN_COUNTRY_NAME != "Canada").map(flight_row => flight_row).take(3)
>>
>>     spark.stop()
>>   }
>> }
>>
>> Backbutton.co.uk
>> ¯\_(ツ)_/¯
>> ♡۶Java♡۶RMI ♡۶
>> Make Use Method {MUM}
>> makeuse.org
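For reference, a minimal sketch of the change Reynold describes above: keep the value returned by take and print it explicitly, since a compiled application (unlike the Scala REPL) does not echo expression values to the console. The local name firstThree is only for illustration; any way of printing an Array would do, or one could call .show(3) on the Dataset instead of take.

    // sketch: same pipeline as in the chapter2 object above, but the result of take(3)
    // is bound to a value and printed, instead of being discarded
    val firstThree = flights
      .filter(flight_row => flight_row.ORIGIN_COUNTRY_NAME != "Canada")
      .map(flight_row => flight_row)
      .take(3)

    firstThree.foreach(println)   // prints one flight row per line
    // or: println(firstThree.mkString("Array(", ", ", ")"))   // REPL-style single line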