Re: Append In-Place to S3

2018-06-03 Thread Tayler Lawrence Jones
Sorry, actually my last message is not true for anti join; I was thinking of semi join. -TJ On Sun, Jun 3, 2018 at 14:57 Tayler Lawrence Jones wrote: > A left join with a null filter is only the same as a left anti join if the join keys can be guaranteed unique in the existing data. Sinc
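
A small sketch of the correction above, in plain Python rather than Spark so it runs anywhere. The helper names (left_join, left_anti_join, left_semi_join, inner_join) and the sample rows are hypothetical and only mimic the SQL join semantics; null join keys are ignored. It suggests that a left join with a null filter returns the same rows as a left anti join even when the existing data has duplicate keys, while duplicates do change the result when an inner join is used in place of a semi join.

```python
# Hypothetical pure-Python stand-ins for SQL joins on a single key column.
# Rows are dicts; null keys are not modelled here.

def left_join(left, right, key):
    """Left outer join: keep every left row; unmatched rows pair with None."""
    out = []
    for l in left:
        matches = [r for r in right if r[key] == l[key]]
        if matches:
            out.extend((l, r) for r in matches)
        else:
            out.append((l, None))
    return out

def left_anti_join(left, right, key):
    """Left rows whose key does not appear on the right."""
    right_keys = {r[key] for r in right}
    return [l for l in left if l[key] not in right_keys]

def left_semi_join(left, right, key):
    """Left rows whose key appears on the right, each returned once."""
    right_keys = {r[key] for r in right}
    return [l for l in left if l[key] in right_keys]

def inner_join(left, right, key):
    """Inner join: one output pair per matching (left, right) combination."""
    return [(l, r) for l in left for r in right if l[key] == r[key]]

# Sample data (assumed): key 1 is duplicated in the existing data.
new_rows = [{"id": 1}, {"id": 2}, {"id": 3}]
existing_rows = [{"id": 1}, {"id": 1}, {"id": 2}]

# Left join + null filter versus left anti join: same rows despite duplicates.
anti_via_null_filter = [l for l, r in left_join(new_rows, existing_rows, "id") if r is None]
anti = left_anti_join(new_rows, existing_rows, "id")

# Semi join versus inner join: the inner join repeats id 1 once per duplicate.
semi = left_semi_join(new_rows, existing_rows, "id")
inner_left_side = [l for l, _ in inner_join(new_rows, existing_rows, "id")]
```

With this sample data, key uniqueness matters only for the semi join comparison: the inner join yields one row per matching pair, so a duplicated key on the right repeats the left row.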

Re: Append In-Place to S3

2018-06-03 Thread Tayler Lawrence Jones
On Mon, 4 Jun 2018 at 6:42 am, Tayler Lawrence Jones <t.jonesd...@gmail.com> wrote: >> The issue is not the append vs overwrite - perhaps those responders do not know anti join semantics. Further, overwrite on S3 is a bad pattern due to S3 eventual consiste

Re: Append In-Place to S3

2018-06-03 Thread Tayler Lawrence Jones
The issue is not append vs overwrite - perhaps those responders do not know anti join semantics. Further, overwrite on S3 is a bad pattern due to S3 eventual consistency issues. First, your SQL query is wrong, as you don’t close the parenthesis of the CTE (the “with” part). In fact, it looks like y

Re: Writing files to s3 with out temporary directory

2017-11-20 Thread Tayler Lawrence Jones
It is an open issue with the Hadoop file committer, not Spark. The simple workaround is to write to HDFS and then copy to S3. Netflix gave a talk about their custom output committer at the last Spark Summit, which is a clever, efficient way of doing that - I’d check it out on YouTube. They have open sourced t