Programmatic: parquet file corruption error

2020-03-26 Thread Zahid Rahman
Hi, When I run the code for a user-defined data type dataset using a case class in Scala, and run it in the interactive spark-shell against a parquet file, the results are as expected. However, when I then run the same code programmatically in the IntelliJ IDE, Spark gives a file corruption error. Step

Need to order iterator values in spark dataframe

2020-03-26 Thread Ranjan, Abhinav
Hi, I have a dataframe which has data like:

key | code | code_value
1   | c1   | 11
1   | c2   | 12
1   | c2   | 9
1   | c3

Re: Need to order iterator values in spark dataframe

2020-03-26 Thread Enrico Minack
Abhinav, you can repartition by your key, then sortWithinPartitions, and then groupByKey. Since the data are already hash-partitioned by key, Spark should not shuffle the data and hence not change the sort within each partition: ds.repartition($"key").sortWithinPartitions($"code").groupBy($"key") Enrico
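Enrico's Dataset chain needs a live Spark session, but the grouping-with-ordered-values idea can be sketched on plain Scala collections (the key/code/code_value names come from the thread; a local groupBy followed by a per-group sort stands in for repartition + sortWithinPartitions, which is an analogue, not the Dataset API itself):

```scala
// Rows shaped like the thread's sample data: (key, code, code_value).
val rows = Seq((1, "c2", 12), (1, "c1", 11), (1, "c2", 9), (2, "c3", 7))

// Group by key, then sort each group's values by code -- the same effect
// that ds.repartition($"key").sortWithinPartitions($"code").groupBy($"key")
// aims for on a Dataset, done here in local memory.
val grouped: Map[Int, Seq[(String, Int)]] =
  rows.groupBy(_._1)
      .map { case (k, vs) => k -> vs.map(v => (v._2, v._3)).sortBy(_._1) }

// grouped(1) now lists codes in order: c1, c2, c2
```

Note that sortBy is stable, so rows sharing a code keep their relative order within the group.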

Re: Need to order iterator values in spark dataframe

2020-03-26 Thread Zahid Rahman
I believe I logged an issue first and I should get a response first. I was ignored. Regards Did you know there are 8 million people in kashmir locked up in their homes by the Hindutwa (Indians) for 8 months. Now the whole planet is locked up in their homes. You didn't take notice of them either.

results of taken(3) not appearing in console window

2020-03-26 Thread Zahid Rahman
I am running the same code with the same libraries but not getting the same output.

scala> case class flight (DEST_COUNTRY_NAME: String,
     |   ORIGIN_COUNTRY_NAME: String,
     |   count: BigInt)
defined class flight

scala> val flightDf = spark.read.parquet

Re: results of taken(3) not appearing in console window

2020-03-26 Thread Reynold Xin
bcc dev, +user You need to print out the result. Take itself doesn't print. You only got the results printed to the console because the Scala REPL automatically prints the returned value from take. On Thu, Mar 26, 2020 at 12:15 PM, Zahid Rahman < zahidr1...@gmail.com > wrote: > > I am running
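Reynold's point reproduces with plain Scala collections just as well: take returns a value and prints nothing, and outside the REPL nothing echoes it for you (the flight fields follow the thread's case class; the local Seq stands in for a Dataset):

```scala
case class Flight(DEST_COUNTRY_NAME: String, ORIGIN_COUNTRY_NAME: String, count: BigInt)

val flights = Seq(
  Flight("United States", "Romania", 15),
  Flight("United States", "Croatia", 1),
  Flight("United States", "Ireland", 344)
)

// take(2) only returns the first two elements; it prints nothing by itself.
val first2 = flights.take(2)

// In a compiled program you must print explicitly -- the spark-shell REPL
// was doing this for you automatically after every expression.
first2.foreach(println)
```

The same applies to a Dataset's take in a program run from IntelliJ: without an explicit println (or show), the returned Array is silently discarded.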

BUG: take with SparkSession.master[url]

2020-03-26 Thread Zahid Rahman
with the following SparkSession configuration

val spark = SparkSession.builder().master("local[*]").appName("Spark Session take").getOrCreate();

this line works

flights.filter(flight_row => flight_row.ORIGIN_COUNTRY_NAME != "Canada").map(flight_row => flight_row).take(5)

however if change the

Re: BUG: take with SparkSession.master[url]

2020-03-26 Thread Wenchen Fan
Which Spark/Scala version do you use? On Fri, Mar 27, 2020 at 1:24 PM Zahid Rahman wrote: > > with the following sparksession configuration > > val spark = SparkSession.builder().master("local[*]").appName("Spark Session > take").getOrCreate(); > > this line works > > flights.filter(flight_row

Re: BUG: take with SparkSession.master[url]

2020-03-26 Thread Zahid Rahman
I have configured it in IntelliJ as external jars (spark-3.0.0-preview2-bin-hadoop2.7/jar); I am not pulling anything from Maven. Backbutton.co.uk ¯\_(ツ)_/¯ ♡۶Java♡۶RMI ♡۶ Make Use Method {MUM} makeuse.org On Fri, 27 Mar 2020 at 05:45, Wenchen Fan wrote: > Which Spark/Scal

Re: BUG: take with SparkSession.master[url]

2020-03-26 Thread Wenchen Fan
Your Spark cluster, spark://192.168.0.38:7077, how is it deployed if you just include Spark dependency in IntelliJ? On Fri, Mar 27, 2020 at 1:54 PM Zahid Rahman wrote: > I have configured in IntelliJ as external jars > spark-3.0.0-preview2-bin-hadoop2.7/jar > > not pulling anything from maven.

Re: BUG: take with SparkSession.master[url]

2020-03-26 Thread Zahid Rahman
sbin/start-master.sh
sbin/start-slave.sh spark://192.168.0.38:7077
On Fri, 27 Mar 2020 at 05:59, Wenchen Fan wrote: > Your Spark cluster, spark://192.168.0.38:7077, how is it deployed if y

Re: BUG: take with SparkSession.master[url]

2020-03-26 Thread Zahid Rahman
~/spark-3.0.0-preview2-bin-hadoop2.7$ sbin/start-slave.sh spark://192.168.0.38:7077
~/spark-3.0.0-preview2-bin-hadoop2.7$ sbin/start-master.sh
On Fri, 27 Mar 2020 at 06:12, Zahid Rahman wr