Re: Output Committers for S3

2017-06-20 Thread Steve Loughran
> On 20 Jun 2017, at 07:49, sririshindra wrote:
>
> Is there anything similar to the S3 connector for Google Cloud Storage?
> Since Google Cloud Storage is also an object store rather than a file
> system, I imagine the same problem that the S3 connector is trying to solve
> arises with Google Cloud

Re: Output Committers for S3

2017-06-20 Thread Steve Loughran
On 19 Jun 2017, at 16:55, Ryan Blue <rb...@netflix.com.INVALID> wrote: I agree, the problem is that Spark is trying to be safe and avoid the direct committer. We also modify Spark to avoid its logic. We added a property that causes Spark to always use the output committer if the destina

Re: Output Committers for S3

2017-06-19 Thread sririshindra

Re: Output Committers for S3

2017-06-19 Thread Ryan Blue
hrc file. Please add anything that I might have missed. Also please look at Ryan's talk at Spark Summit a few days ago (Improving Apache Spark with S3 by Ryan Blue <https://www.youtube.com/watch?v=BgHrff5yAQo>)

Re: Output Committers for S3

2017-06-17 Thread sririshindra
's talk at Spark Summit a few days ago (Improving Apache Spark with S3 by Ryan Blue <https://www.youtube.com/watch?v=BgHrff5yAQo>)

Re: Output Committers for S3

2017-06-17 Thread Venkatakrishnan Sowrirajan
make Spark call the PartitionedOutputCommiter even when the file already exists in S3?

Re: Output Committers for S3

2017-06-16 Thread sririshindra

Re: Output Committers for S3

2017-03-28 Thread Ryan Blue
it$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apac

Re: Output Committers for S3

2017-03-28 Thread Steve Loughran
ation.getClass(Configuration.java:2221)
    ... 28 more

Can you please point out my mistake? If possible, can you give a working example of saving a DataFrame as a Parquet file in S3?

Re: Output Committers for S3

2017-03-27 Thread sririshindra
on.getClass(Configuration.java:2221)
    ... 28 more

Can you please point out my mistake? If possible, can you give a working example of saving a DataFrame as a Parquet file in S3?

Re: Output Committers for S3

2017-02-22 Thread Matthew Schauer
afety. Even if there were, I couldn't use it, because I'm stuck on Spark 1.5 and there doesn't seem to be a way to force the use of a given output committer.

Re: Output Committers for S3

2017-02-21 Thread Ryan Blue
d. I will be interested in following and possibly contributing to S3Guard in the future.

Re: Output Committers for S3

2017-02-21 Thread Ryan Blue
On Tue, Feb 21, 2017 at 6:15 AM, Steve Loughran wrote:
> On 21 Feb 2017, at 01:00, Ryan Blue wrote:
> > You'd have to encode the task ID in the output file name to identify files
> > to roll back in the event you need to revert a task, but if you have
> > partitioned output, you have to do a lo

Re: Output Committers for S3

2017-02-21 Thread Matthew Schauer
ead. I will be interested in following and possibly contributing to S3Guard in the future.

Re: Output Committers for S3

2017-02-21 Thread Steve Loughran
On 21 Feb 2017, at 14:15, Steve Loughran <ste...@hortonworks.com> wrote: What your patch has made me realise is that I could also do a delayed-commit copy by reading in a file, doing a multipart put to its final destination, and again, postponing the final commit. This is something whic

Re: Output Committers for S3

2017-02-21 Thread Steve Loughran
any feedback on my proposed "safe" append strategy, and 2) is there any way to circumvent the restriction on append committers without editing and recompiling Spark? Discussion of solutions in Spark 2.1 is also welcome.

Re: Output Committers for S3

2017-02-21 Thread Steve Loughran

Re: Output Committers for S3

2017-02-20 Thread Ryan Blue
1 is also welcome.

--
Ryan Blue
Software Engineer
Netflix

Output Committers for S3

2017-02-20 Thread Matthew Schauer
f solutions in Spark 2.1 is also welcome.