I am testing on Spark 3.0 preview release

Punya Maremalla Tue, 19 Nov 2019 07:28:15 -0800

Hi All,


I downloaded spark-3.0.0-preview-bin-hadoop2.7.tgz and am trying to use new 
APIs such as from_csv and map_entries, but for some reason I am getting 
errors.


I don't see the from_csv function at all. Tab completion at the scala> prompt 
shows only these:

scala> import org.apache.spark.sql.functions.from_
from_json   from_unixtime   from_utc_timestamp
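One way to double-check what the shell is actually loading is sketched below. It is only a minimal sketch: `methodsWithPrefix` is a hypothetical helper written for this check, not part of Spark, and it assumes the shell's classpath is what determines which functions tab completion can see.

```scala
import scala.util.Try

// Hypothetical helper (not part of Spark): returns the distinct public method
// names on a class that start with the given prefix, or an empty list if the
// class is not on the classpath.
def methodsWithPrefix(className: String, prefix: String): Seq[String] = {
  val methods = Try(Class.forName(className)).toOption.toSeq.flatMap(_.getMethods.toSeq)
  methods.map(_.getName).filter(_.startsWith(prefix)).distinct.sorted
}

// In a Spark shell this prints the from_* functions that are really loaded.
println(methodsWithPrefix("org.apache.spark.sql.functions", "from_"))
```

If from_csv is missing from that list, it may be worth printing `spark.version` in the same shell to confirm that the 3.0.0 preview build is the one being launched rather than an older installation.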


I am also trying to find the information below, but I don't see any helpful 
documentation on it. Please advise.

Currently, all our applications run on Spark 2.2. We are planning to upgrade 
them to Spark 3.0. I am doing a feature analysis of Spark 3.0 and documenting 
suggestions on what we can use from it.

Following are the features that we use heavily in Spark 2.2:

- RDD
- DataFrame SQL
- YARN Scheduler
- Streaming
- updateStateByKey for stateful processing
- ZooKeeper-based Kafka integration
- spark-avro package for serialization/deserialization, and custom features 
  built over it

I would like to understand whether any of the above features are significantly 
impacted, or whether more efficient alternatives are available in Spark 3.0.

We are also planning to migrate from the YARN scheduler to the Kubernetes 
scheduler. What new features are available there?

Are there any significant new SQL optimizations in Spark 3.0?

Please let me know. Thanks in advance.


Regards,
Ajay
