Re: DynamoDB data source questions

2016-04-14 Thread Travis Crawford
Hi Reynold,

Thanks for the tips. I made some changes based on your suggestion, and now the table scan happens on executors.

https://github.com/traviscrawford/spark-dynamodb/blob/master/src/main/scala/com/github/traviscrawford/spark/dynamodb/DynamoDBRelation.scala

sqlContext.sparkContext .para
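The preview above is cut off, so the actual code is not visible here. As a rough sketch of the general idea of moving a scan onto executors: DynamoDB's Scan operation accepts `Segment` and `TotalSegments` parameters, so a table scan can be split into independent units of work. The names below (`ScanSegment`, `ParallelScanSketch`, `scanSegment`) are hypothetical and not taken from the linked repository, and the in-memory "table" is only a stand-in for a real DynamoDB client.

```scala
// One unit of work in a parallel scan: worker `segment` out of `totalSegments`.
// Hypothetical type for illustration; not from the spark-dynamodb project.
final case class ScanSegment(segment: Int, totalSegments: Int)

object ParallelScanSketch {
  // Build the full list of segments, numbered 0 until total.
  def segments(total: Int): Seq[ScanSegment] = {
    require(total > 0, "total must be positive")
    (0 until total).map(i => ScanSegment(i, total))
  }

  // Assumed stand-in for a real segment scan: an in-memory "table" where
  // the item at index i belongs to segment i % totalSegments.
  def scanSegment(table: Vector[String], s: ScanSegment): Vector[String] =
    table.zipWithIndex.collect {
      case (item, idx) if idx % s.totalSegments == s.segment => item
    }

  def main(args: Array[String]): Unit = {
    val table = Vector("item-0", "item-1", "item-2", "item-3", "item-4")
    // Each segment is scanned independently; the union covers the whole table.
    val scanned = segments(3).flatMap(s => scanSegment(table, s))
    println(scanned.sorted.mkString(", "))
  }
}
```

In a Spark job, the segment list could presumably be distributed with `SparkContext.parallelize(segs, segs.size)` followed by a `flatMap` that scans one segment per task, so the reads run on executors rather than on the driver.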

Re: DynamoDB data source questions

2016-04-13 Thread Reynold Xin
Responses inline

On Wed, Apr 13, 2016 at 7:45 AM, Travis Crawford wrote:

> Hi Spark gurus,
>
> At Medium we're using Spark for an ETL job that scans DynamoDB tables and
> loads into Redshift. Currently I use a parallel scanner implementation that
> writes files to local disk, then have Spark rea

DynamoDB data source questions

2016-04-13 Thread Travis Crawford
Hi Spark gurus,

At Medium we're using Spark for an ETL job that scans DynamoDB tables and loads into Redshift. Currently I use a parallel scanner implementation that writes files to local disk, then have Spark read them as a DataFrame. Ideally we could read the DynamoDB table directly as a DataFr