Hi Reynold,
Thanks for the tips. I made some changes based on your suggestion, and now
the table scan happens on executors.
https://github.com/traviscrawford/spark-dynamodb/blob/master/src/main/scala/com/github/traviscrawford/spark/dynamodb/DynamoDBRelation.scala
sqlContext.sparkContext
  .parallelize(...)
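In full, that pattern looks roughly like the sketch below (the table name and segment count are placeholders and the scan logic is simplified; DynamoDBRelation.scala in the repo above is the actual implementation). Each segment of a DynamoDB parallel scan becomes one Spark partition, so the scan work runs on the executors instead of the driver.

import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient
import com.amazonaws.services.dynamodbv2.model.ScanRequest
import scala.collection.JavaConverters._

val totalSegments = 8  // one Spark partition per DynamoDB scan segment

val items = sqlContext.sparkContext
  .parallelize(0 until totalSegments, totalSegments)
  .flatMap { segment =>
    // Runs on an executor: scan only this segment of the table.
    val client = new AmazonDynamoDBClient()
    val request = new ScanRequest()
      .withTableName("my-table")  // placeholder table name
      .withSegment(segment)
      .withTotalSegments(totalSegments)
    // Items as raw strings for illustration; real code would map
    // AttributeValues to typed columns.
    client.scan(request).getItems.asScala.map(_.toString)
  }

Note the sketch reads only the first page of each segment; a complete scan would loop while getLastEvaluatedKey is non-null, passing it back via withExclusiveStartKey.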
Responses inline
On Wed, Apr 13, 2016 at 7:45 AM, Travis Crawford wrote:
> Hi Spark gurus,
>
> At Medium we're using Spark for an ETL job that scans DynamoDB tables and
> loads into Redshift. Currently I use a parallel scanner implementation that
> writes files to local disk, then have Spark read them as a DataFrame.
Hi Spark gurus,
At Medium we're using Spark for an ETL job that scans DynamoDB tables and
loads into Redshift. Currently I use a parallel scanner implementation that
writes files to local disk, then have Spark read them as a DataFrame.
Ideally we could read the DynamoDB table directly as a DataFrame.
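Spark's Data Sources API supports exactly that: implement BaseRelation with TableScan, and buildScan returns the RDD that backs the DataFrame. A minimal sketch follows; the class name, single-column schema, and constructor arguments are illustrative assumptions, not the spark-dynamodb connector's actual interface.

import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient
import com.amazonaws.services.dynamodbv2.model.ScanRequest
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, TableScan}
import org.apache.spark.sql.types.{StringType, StructField, StructType}
import scala.collection.JavaConverters._

// Illustrative relation that exposes a DynamoDB table as a DataFrame.
class DynamoDBScanRelation(table: String, totalSegments: Int)
                          (@transient val sqlContext: SQLContext)
  extends BaseRelation with TableScan {

  // Simplified to a single string column; a real connector would
  // infer a typed schema from the table's items.
  override def schema: StructType =
    StructType(Seq(StructField("item", StringType)))

  override def buildScan(): RDD[Row] = {
    // Copy fields to locals so the closure doesn't capture `this`.
    val (name, segments) = (table, totalSegments)
    sqlContext.sparkContext
      .parallelize(0 until segments, segments)
      .flatMap { segment =>
        val client = new AmazonDynamoDBClient()
        val request = new ScanRequest()
          .withTableName(name)
          .withSegment(segment)
          .withTotalSegments(segments)
        client.scan(request).getItems.asScala.map(i => Row(i.toString))
      }
  }
}

// Usage: the table then reads directly as a DataFrame.
val df = sqlContext.baseRelationToDataFrame(
  new DynamoDBScanRelation("my-table", 8)(sqlContext))

The same pagination caveat as above applies: a full scan loops on getLastEvaluatedKey rather than reading one page per segment.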