RE: Performance Degradation in Spark 3.0.2 compared to Spark 3.0.1

2021-09-05 Thread Sharma, Prakash (Nokia - IN/Bangalore)
Hi, we figured out the issue: it was caused by a higher value of spark.network.timeout in our configuration. After reducing the value of this parameter, the results are in line with Spark 3.0.1. Thank you for the support. Thank you, Prakash From: Mich Talebzadeh Sent: Tuesday, August 31, 2021 1:
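For readers who hit the same symptom, here is a minimal sketch of setting this parameter explicitly. It assumes an application that configures Spark through `SparkConf` in Scala. The thread does not say which value caused the slowdown or which value fixed it. The `120s` below is Spark's documented default and serves only as a placeholder. The object name and app name are hypothetical. Per the Spark configuration docs, `spark.network.timeout` also acts as the fallback for several other timeouts, such as `spark.rpc.askTimeout`, when those are not set. It should also stay well above `spark.executor.heartbeatInterval`.

```scala
import org.apache.spark.SparkConf

object NetworkTimeoutConfig {
  // Placeholder: 120s is Spark's documented default, not the value
  // discussed in this thread (which the message does not state).
  val NetworkTimeout = "120s"

  // Builds a SparkConf with an explicit network timeout. The result can be
  // passed to SparkSession.builder().config(conf) or new SparkContext(conf).
  def build(): SparkConf =
    new SparkConf()
      .setAppName("network-timeout-example") // hypothetical app name
      .set("spark.network.timeout", NetworkTimeout)
}
```

The same setting can be supplied at submit time with `--conf spark.network.timeout=...` instead of hard-coding it.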

Spark Pair RDD write to Hive

2021-09-05 Thread Anil Dasari
Hello, I have a use case where the users of a group id are persisted to a Hive table. The pseudocode looks like this:
    usersRDD = sc.parallelize(..)
    usersPairRDD = usersRDD.map(u => (u.groupId, u))
    groupedUsers = usersPairRDD.groupByKey()
Can I save the groupedUsers RDD into Hive tables where the table name is k
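Here is a minimal Scala sketch of one way to approach this. It assumes Spark SQL with Hive support is available and that the goal is one table per group key. `User`, its fields, the `users_` prefix and `tableNameFor` are hypothetical and do not come from the question. The question's pseudocode uses an RDD. This sketch switches to a DataFrame and `DataFrameWriter.saveAsTable` because RDDs have no direct Hive table writer.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Hypothetical record type: the message does not show the real User class.
case class User(groupId: String, name: String)

object WriteUsersPerGroup {

  // Hive table names must be plain identifiers, so a raw key value cannot
  // always be used as-is. The "users_" prefix and the replacement rule are
  // illustrative assumptions.
  def tableNameFor(groupId: String): String =
    "users_" + groupId.toLowerCase.replaceAll("[^a-z0-9_]", "_")

  // Writes each group's users to its own table. Instead of groupByKey, the
  // distinct keys are collected to the driver and one filtered write is
  // issued per key, so no single group has to be materialized as one
  // in-memory Iterable.
  def writePerGroup(spark: SparkSession, users: Seq[User]): Seq[String] = {
    import spark.implicits._
    val usersDF = users.toDF()
    usersDF.cache() // scanned once per group below
    val groupIds = usersDF.select("groupId").distinct().as[String].collect().toSeq
    val tables = groupIds.map { gid =>
      val table = tableNameFor(gid)
      usersDF
        .filter($"groupId" === gid)
        .write
        .mode(SaveMode.Overwrite) // replaces the table if it already exists
        .saveAsTable(table)
      table
    }
    usersDF.unpersist()
    tables
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("write-users-per-group")
      .master("local[*]")
      .enableHiveSupport() // needs the spark-hive module on the classpath
      .getOrCreate()
    val sample = Seq(User("g1", "alice"), User("g1", "bob"), User("g2", "carol"))
    writePerGroup(spark, sample).foreach(t => println(s"wrote table $t"))
    spark.stop()
  }
}
```

This runs one Spark job per key, which is only reasonable for a modest number of groups. The sketch also does not detect two keys that sanitize to the same table name. If separate tables are not a hard requirement, a common alternative is a single table partitioned by the key, for example `usersDF.write.partitionBy("groupId").saveAsTable("users")`.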

Re: Appending a static dataframe to a stream create Parquet file fails

2021-09-05 Thread eugen . wintersberger
Hi Jungtaek, thanks for your reply. I was afraid that the problem is not only on my side but rather of a conceptual nature. I guess I have to rethink my approach. However, since you mentioned DeltaLake: I have the same problem with DeltaLake, but the other way around. I cannot write with a strea

Re: Spark Stream on Kubernetes Cannot Set up JavaSparkContext

2021-09-05 Thread Jacek Laskowski
Hi, I still have no idea, but I noticed "org.apache.spark.streaming.kafka010.KafkaRDDPartition" and "--jars "spark-yarn_2.12-3.1.2.jar,spark-core_2.12-3.1.2.jar,kafka-clients-2.8.0.jar,spark-streaming-kafka-0-10_2.12-3.1.2.jar,spark-token-provider-kafka-0-10_2.12-3.1.2.jar" \", which bothers me quite a lot.