Re: community feedback on RedShift with Spark

2017-04-24 Thread Aakash Basu
Hey Afshin, your point 1 is far faster than point 2. It gets faster still if you know how to properly use distKey and sortKey on the tables being loaded. Thanks, Aakash. https://www.linkedin.com/in/aakash-basu-5278b363 On 24-Apr-2017 10:37 PM, "Afshin, Bardia" wrote: I w
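To illustrate the distKey and sortKey remark, here is a minimal sketch that builds a Redshift CREATE TABLE statement with both keys set. The helper name `build_create_table_sql` and the table and column names in the usage note are hypothetical and do not come from the thread; which columns make good keys depends on your join and filter patterns.

```python
def build_create_table_sql(table, columns, dist_key, sort_keys):
    """Build a Redshift CREATE TABLE statement with DISTKEY and SORTKEY.

    Hypothetical helper for illustration only.
    columns: list of (name, sql_type) pairs.
    """
    column_names = [name for name, _ in columns]
    # Refuse keys that are not declared columns of the table.
    for key in [dist_key, *sort_keys]:
        if key not in column_names:
            raise ValueError(f"unknown column: {key}")

    column_sql = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE {table} (\n    {column_sql}\n)\n"
        f"DISTSTYLE KEY\n"
        f"DISTKEY ({dist_key})\n"
        f"SORTKEY ({', '.join(sort_keys)});"
    )
```

For example, `build_create_table_sql("events", [("user_id", "BIGINT"), ("event_time", "TIMESTAMP")], "user_id", ["event_time"])` distributes rows by `user_id` and sorts them by `event_time`. The resulting string can be run through whatever SQL client you already use against the cluster.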

Re: community feedback on RedShift with Spark

2017-04-24 Thread Matt Deaver
Redshift COPY is immensely faster than trying to do INSERT statements. I did some rough testing of loading data using INSERT and COPY, and COPY is vastly superior, to the point that if speed is at all an issue for your process you shouldn't even consider using INSERT. On Mon, Apr 24, 2017 at 11:07
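As a rough sketch of the COPY route, the snippet below builds the statement that bulk-loads files Spark has already written to S3. It assumes the files are Parquet and that the cluster authenticates through an IAM role; the thread specifies neither. The helper name `build_copy_sql`, the bucket path, and the role ARN are hypothetical placeholders.

```python
def build_copy_sql(table, s3_prefix, iam_role):
    """Build a Redshift COPY statement for Parquet files under an S3 prefix.

    Hypothetical helper for illustration only. Assumes Spark wrote the
    data as Parquet and that iam_role grants the cluster read access.
    """
    if not s3_prefix.startswith("s3://"):
        raise ValueError("s3_prefix must start with s3://")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"FORMAT AS PARQUET;"
    )


if __name__ == "__main__":
    # Placeholder values; substitute your own table, bucket and role.
    print(build_copy_sql(
        "events",
        "s3://example-bucket/exports/events/",
        "arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
    ))
```

One COPY loads every file under the prefix in parallel across the cluster's slices, which is a large part of why it outpaces row-by-row INSERT.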

community feedback on RedShift with Spark

2017-04-24 Thread Afshin, Bardia
I wanted to reach out to the community to get an understanding of everyone's experience with maximizing performance, that is, decreasing load time, when loading multiple large datasets into RedShift. Two approaches: 1. Spark writes file to S3, RedShift COPY INTO from S3 bucket. 2.