Re: DynamoDB input source

2014-07-21 Thread Ian Wilkinson
Hi,

I am invoking the spark-shell (Spark 1.0.0) with:

spark-shell --jars \
libs/aws-java-sdk-1.3.26.jar,\
libs/httpclient-4.1.1.jar,\
libs/httpcore-nio-4.1.jar,\
libs/gson-2.1.jar,\
libs/httpclient-cache-4.1.1.jar,\
libs/httpmime-4.1.1.jar,\
libs/hive-dynamodb-handler-0.11.0.jar,\
libs/httpcore-4
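Because `--jars` takes a single comma-separated list with no spaces, a long list like this can be generated rather than typed. A minimal sketch, assuming every needed jar sits directly under a local `libs/` directory; `jars_arg` is a hypothetical helper, not part of Spark:

```python
import glob
import os


def jars_arg(lib_dir):
    """Join every .jar file in lib_dir into the comma-separated form --jars expects."""
    # --jars takes one argument: jar paths separated by commas, no spaces.
    jars = sorted(glob.glob(os.path.join(lib_dir, "*.jar")))
    return ",".join(jars)


if __name__ == "__main__":
    # Print a command line that could be pasted into a shell.
    print("spark-shell --jars " + jars_arg("libs"))
```

Sorting keeps the generated command stable between runs; note that it picks up every jar in the directory, so unrelated jars should live elsewhere.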

Re: DynamoDB input source

2014-07-04 Thread Nick Pentreath
Interesting. I would have thought they would make that available publicly. Unfortunately, unless you can use Spark on EMR, I guess your options are to hack it by spinning up an EMR cluster and getting the JAR, or to fall back to using boto and rolling your own :(

Re: DynamoDB input source

2014-07-04 Thread Ian Wilkinson
Trying to discover the source for DynamoDBInputFormat. It doesn't appear in:
- https://github.com/aws/aws-sdk-java
- https://github.com/apache/hive

Then I came across http://stackoverflow.com/questions/1704/jar-containing-org-apache-hadoop-hive-dynamodb. Unsure whether this represents the latest s

Re: DynamoDB input source

2014-07-04 Thread Nick Pentreath
I should qualify that by saying there is boto support for DynamoDB, but not for the InputFormat. You could roll your own Python-based connection, but this involves figuring out how to split the data in DynamoDB; the InputFormat takes care of this, so it should be the easier approach.
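One way to do that splitting by hand is DynamoDB's parallel scan: each worker passes `Segment` and `TotalSegments` to the Scan API and pages through its share by feeding `LastEvaluatedKey` back as `ExclusiveStartKey`. A minimal sketch, assuming a boto3-style `scan(**params)` callable that returns a dict with `Items` and, when more pages remain, `LastEvaluatedKey` (boto3's `client.scan` has this shape; the 2014-era boto API differs). `scan_segment` and the in-memory `make_fake_scan` are hypothetical helpers for illustration only:

```python
def scan_segment(scan, table_name, segment, total_segments):
    """Yield every item in one parallel-scan segment, following pagination."""
    params = {
        "TableName": table_name,
        "Segment": segment,
        "TotalSegments": total_segments,
    }
    while True:
        page = scan(**params)
        for item in page.get("Items", []):
            yield item
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            break  # no more pages in this segment
        # Resume this segment where the previous page stopped.
        params["ExclusiveStartKey"] = last_key


def make_fake_scan(items, page_size=2):
    """Hypothetical in-memory stand-in for a DynamoDB Scan, for local testing.

    Item i belongs to segment i % TotalSegments (real DynamoDB divides the
    table by hash-key space), and the "key" is just a list position.
    """
    def scan(TableName, Segment, TotalSegments, ExclusiveStartKey=None):
        mine = [it for i, it in enumerate(items) if i % TotalSegments == Segment]
        start = 0 if ExclusiveStartKey is None else ExclusiveStartKey["pos"]
        result = {"Items": mine[start:start + page_size]}
        if start + page_size < len(mine):
            result["LastEvaluatedKey"] = {"pos": start + page_size}
        return result
    return scan
```

In Spark, each segment could then become one partition, e.g. `sc.parallelize(range(n), n).flatMap(...)` with the DynamoDB client created inside the function rather than shipped from the driver. This gives explicit control over the split, but read-capacity throttling is left to you, which the InputFormat would otherwise handle.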

Re: DynamoDB input source

2014-07-04 Thread Ian Wilkinson
Excellent. Let me get browsing on this.

Huge thanks,
ian

On 4 Jul 2014, at 16:47, Nick Pentreath wrote:
> No boto support for that.
>
> In master there is Python support for loading Hadoop inputFormat. Not sure if
> it will be in 1.0.1 or 1.1
>
> In master docs under the programming guide a

Re: DynamoDB input source

2014-07-04 Thread Nick Pentreath
No boto support for that. In master there is Python support for loading a Hadoop InputFormat; not sure if it will be in 1.0.1 or 1.1. In the master docs, the programming guide has instructions, and the examples project has PySpark examples of using Cassandra and HBase. These should h
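With that Python support, the InputFormat route might look like the sketch below, which only assembles the arguments for PySpark's `SparkContext.hadoopRDD(inputFormatClass, keyClass, valueClass, ..., conf=...)`. The class names and `dynamodb.*` configuration keys are assumptions taken from Amazon's EMR DynamoDB connector and may differ in the jar you actually obtain; `dynamodb_hadoop_rdd_args` is a hypothetical helper:

```python
def dynamodb_hadoop_rdd_args(table_name, endpoint, read_percent="0.5"):
    """Build keyword arguments for PySpark's SparkContext.hadoopRDD.

    ASSUMPTION: class names and conf keys follow Amazon's EMR DynamoDB
    connector; verify them against the jar on your classpath.
    """
    return {
        # Assumed to implement the old mapred InputFormat API, hence hadoopRDD
        # rather than newAPIHadoopRDD.
        "inputFormatClass": "org.apache.hadoop.dynamodb.read.DynamoDBInputFormat",
        "keyClass": "org.apache.hadoop.io.Text",
        "valueClass": "org.apache.hadoop.dynamodb.DynamoDBItemWritable",
        # Hadoop configuration values are passed as strings.
        "conf": {
            "dynamodb.servicename": "dynamodb",
            "dynamodb.input.tableName": table_name,
            "dynamodb.endpoint": endpoint,
            # Fraction of the table's provisioned read capacity to consume.
            "dynamodb.throughput.read.percent": read_percent,
        },
    }
```

Usage would then be `rdd = sc.hadoopRDD(**dynamodb_hadoop_rdd_args("my_table", "dynamodb.us-east-1.amazonaws.com"))`. Since `DynamoDBItemWritable` is not one of the Writables PySpark converts automatically, a JVM-side `valueConverter` would likely be needed, much as the HBase examples use converters.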

Re: DynamoDB input source

2014-07-04 Thread Ian Wilkinson
Hi Nick,

I’m going to be working with Python primarily. Are you aware of comparable boto support?

ian

On 4 Jul 2014, at 16:32, Nick Pentreath wrote:
> You should be able to use DynamoDBInputFormat (I think this should be part of
> AWS libraries for Java) and create a HadoopRDD from that.

Re: DynamoDB input source

2014-07-04 Thread Nick Pentreath
You should be able to use DynamoDBInputFormat (I think this should be part of the AWS libraries for Java) and create a HadoopRDD from that.

On Fri, Jul 4, 2014 at 8:28 AM, Ian Wilkinson wrote:
> Hi,
>
> I noticed mention of DynamoDB as input source in
>
> http://ampcamp.berkeley.edu/wp-content/uplo

DynamoDB input source

2014-07-04 Thread Ian Wilkinson
Hi, I noticed mention of DynamoDB as an input source in http://ampcamp.berkeley.edu/wp-content/uploads/2012/06/matei-zaharia-amp-camp-2012-advanced-spark.pdf. Unfortunately, Google isn't turning up any further mention of this support. Any pointers would be well received. Big than