Hi,
I have a production job that registers four different dataframes as
tables in PySpark 1.6.2. When we upgraded to Spark 2.0, only three of the
four dataframes are getting registered; the fourth one is not.
There are no code changes whatsoever. The only change is the Spark version.
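For reference, registering a dataframe as a table typically looks like the sketch below; the table and path names are just illustrative, not the actual production job.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("register-tables").getOrCreate()

# Illustrative inputs standing in for the four production dataframes.
orders = spark.read.parquet("s3a://my-bucket/orders")
customers = spark.read.parquet("s3a://my-bucket/customers")

# Spark 1.6.x style (still present in 2.0, but deprecated):
orders.registerTempTable("orders")

# Spark 2.0 replacement:
customers.createOrReplaceTempView("customers")

spark.sql("SELECT count(*) FROM orders").show()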
Hi
I have a job which saves a dataframe as a parquet file to S3.
I built a jar using your repository https://github.com/rdblue/s3committer.
I added the following config to the Spark Session:
config("spark.hadoop.spark.sql.parquet.output.committer.class",
"com.netflix.bdp.s3.S3PartitionedOutputCommitter")
Hi Ryan and Steve,
Thanks very much for your reply.
I was finally able to get Ryan's repo working for me by changing the output
committer to FileOutputFormat instead of ParquetOutputCommitter in Spark, as
Steve suggested.
However, it is not working in append mode while saving the dataframe.
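For context, the write that hits the problem is just an append-mode save along these lines (the path is a placeholder):

# With mode("append") Spark falls back to the default file output committer
# instead of the custom one, which is why the append case behaves differently.
df.write.mode("append").parquet("s3a://my-bucket/output/")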
Hi,
As @Venkata krishnan pointed out, Spark does not allow a DFOC (direct file
output committer) when append mode is enabled.
In the following class in Spark there is a small check:
org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol
if (isAppend) {
  // If we are appending data to an existing dir, we will only use the output
  // committer associated with the file output format, since it is not safe
  // to use a custom committer for appending.
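One possible way around that check, assuming downstream readers can cope with it, is to avoid append mode and instead overwrite into a fresh sub-directory per run, so isAppend stays false and the user-defined committer is still used (names below are illustrative only):

batch_id = "2017-06-01"  # e.g. a date or run id

# Each run writes its own directory with overwrite, so the append check never fires.
(df.write
   .mode("overwrite")
   .parquet("s3a://my-bucket/output/batch={}".format(batch_id)))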
Is there anything similar to the S3 connector for Google Cloud Storage?
Since Google Cloud Storage is also an object store rather than a file
system, I imagine the same problem that the S3 connector is trying to solve
arises with Google Cloud Storage as well.
Thanks,
rishi