sqlContext.registerDataFrameAsTable is not working properly in pyspark 2.0

2016-09-14 Thread sririshindra
Hi, I have a production job that registers four different dataframes as tables in pyspark 1.6.2. When we upgraded to Spark 2.0, only three of the four dataframes get registered; the fourth does not. There are no code changes whatsoever. The only change is th ...
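[For reference, a minimal sketch of the registration step under Spark 2.0, with stand-in dataframes and hypothetical view names (table1..table4); in 2.0 the DataFrame-side equivalent of sqlContext.registerDataFrameAsTable is createOrReplaceTempView:]

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("register-tables").getOrCreate()

    # Four stand-in dataframes; the real job builds these from production data.
    dfs = {name: spark.range(5) for name in
           ["table1", "table2", "table3", "table4"]}

    # Spark 1.6 style was: sqlContext.registerDataFrameAsTable(df, name)
    # Spark 2.0 equivalent, called on the DataFrame itself:
    for name, df in dfs.items():
        df.createOrReplaceTempView(name)

    # Confirm all four views are actually visible to the catalog.
    print(sorted(t.name for t in spark.catalog.listTables()))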

Re: Output Committers for S3

2017-03-27 Thread sririshindra
Hi, I have a job which saves a dataframe as a parquet file to s3. I built a jar using your repository https://github.com/rdblue/s3committer. I added the following config to the Spark Session: config("spark.hadoop.spark.sql.parquet.output.committer.class", "com.netflix.bdp.s3.S3Partitioned ...
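[A minimal sketch of how such a config might be attached to the session. The committer class name is truncated in the archive, so the full name below (com.netflix.bdp.s3.S3PartitionedOutputCommitter) is an assumption, the bucket path is a placeholder, and the s3committer jar must already be on the driver and executor classpath:]

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("s3-committer")
        # Assumed full class name; the archived message truncates it.
        .config("spark.hadoop.spark.sql.parquet.output.committer.class",
                "com.netflix.bdp.s3.S3PartitionedOutputCommitter")
        .getOrCreate()
    )

    # Placeholder write to illustrate the step being configured.
    df = spark.range(10)
    df.write.mode("overwrite").parquet("s3a://my-bucket/output/")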

Re: Output Committers for S3

2017-06-16 Thread sririshindra
Hi Ryan and Steve, thanks very much for your replies. I was finally able to get Ryan's repo working for me by changing the output committer to FileOutputFormat instead of ParquetOutputCommitter in Spark, as Steve suggested. However, it is not working in append mode while saving the data frame.
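[A sketch of the behavior being described, reusing the assumed session config and placeholder path from the earlier message: the custom committer takes effect for overwrite, but Spark falls back to the default committer for append, for the reason quoted in the next message:]

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        # Assumed committer class, as in the earlier message.
        .config("spark.hadoop.spark.sql.parquet.output.committer.class",
                "com.netflix.bdp.s3.S3PartitionedOutputCommitter")
        .getOrCreate()
    )

    df = spark.range(10)
    out = "s3a://my-bucket/events"  # placeholder path

    df.write.mode("overwrite").parquet(out)  # custom committer is used
    df.write.mode("append").parquet(out)     # falls back to the default committer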

Re: Output Committers for S3

2017-06-17 Thread sririshindra
Hi, as @Venkata krishnan pointed out, Spark does not allow DFOC (a direct file output committer) when append mode is enabled. In the following class in Spark there is a small check, org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol: if (isAppend) { // If we are appending data to an existing d ...
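[The quoted check is truncated in the archive. Reconstructed from memory of the Spark 2.x Scala source, not a verbatim copy, the guard looks roughly like this:]

    // Inside SQLHadoopMapReduceCommitProtocol's committer setup (paraphrased):
    if (isAppend) {
      // If we are appending data to an existing dir, only the output committer
      // associated with the file output format is used, since it is not safe to
      // use a custom committer for appending: e.g. on S3, a direct parquet output
      // committer may leave partial data behind when the appending job fails.
      // (The Spark source cites SPARK-8578 here.)
      logInfo(s"Using default output committer for append: " +
        committer.getClass.getCanonicalName)
    } else {
      // Only in non-append mode does Spark honor the user-configured class from
      // spark.sql.parquet.output.committer.class.
    }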

Re: Output Committers for S3

2017-06-19 Thread sririshindra
Is there anything similar to the s3 committer for Google Cloud Storage? Since Google Cloud Storage is also an object store rather than a file system, I imagine the same problem the s3 committer is trying to solve arises with Google Cloud Storage as well. Thanks, rishi