Hi,
I am invoking the spark-shell (Spark 1.0.0) with:
spark-shell --jars \
libs/aws-java-sdk-1.3.26.jar,\
libs/httpclient-4.1.1.jar,\
libs/httpcore-nio-4.1.jar,\
libs/gson-2.1.jar,\
libs/httpclient-cache-4.1.1.jar,\
libs/httpmime-4.1.1.jar,\
libs/hive-dynamodb-handler-0.11.0.jar,\
libs/httpcore-4
Interesting - I would have thought they would make that available publicly.
Unfortunately, unless you can use Spark on EMR, I guess your options are to
hack it by spinning up an EMR cluster and getting the JAR, or maybe fall
back to using boto and rolling your own :(
On Fri, Jul 4, 2014 at 9:28
I'm trying to find the source for DynamoDBInputFormat.
It does not appear in:
- https://github.com/aws/aws-sdk-java
- https://github.com/apache/hive
Then came across
http://stackoverflow.com/questions/1704/jar-containing-org-apache-hadoop-hive-dynamodb.
Unsure whether this represents the latest s
I should qualify that: boto does support DynamoDB, but not the InputFormat.
You could roll your own Python-based connection, but that involves
figuring out how to split the data in DynamoDB. The InputFormat takes care of
this, so it should be the easier approach.
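
To illustrate the splitting problem: DynamoDB's Scan operation accepts
Segment and TotalSegments parameters (a parallel scan), so one hand-rolled
approach might be to issue one scan per segment and treat each segment as a
partition. The sketch below only builds the request parameters. The helper
name segment_scan_requests and the table name are hypothetical, and sending
the requests (through boto or otherwise) is deliberately left out.

```python
def segment_scan_requests(table_name, total_segments):
    """Build one Scan request body per segment of a parallel scan.

    Segment numbers are zero-based and must be below TotalSegments.
    """
    if total_segments < 1:
        raise ValueError("total_segments must be at least 1")
    return [
        {
            "TableName": table_name,
            "Segment": segment,
            "TotalSegments": total_segments,
        }
        for segment in range(total_segments)
    ]


if __name__ == "__main__":
    # "my-table" is a placeholder, not a table from this thread.
    for request in segment_scan_requests("my-table", 4):
        print(request)
```

Each request could then be handed to a separate worker. Paging within a
segment (following LastEvaluatedKey) would still need to be handled by
whatever client sends the requests.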
On Fr
Excellent. Let me get browsing on this.
Huge thanks,
ian
On 4 Jul 2014, at 16:47, Nick Pentreath wrote:
> No boto support for that.
>
> In master there is Python support for loading Hadoop inputFormat. Not sure if
> it will be in 1.0.1 or 1.1
>
> In the master docs under the programming guide a
No boto support for that.
In master there is Python support for loading Hadoop InputFormats. Not sure if
it will be in 1.0.1 or 1.1
In the master docs, under the programming guide, there are instructions, and
under the examples project there are PySpark examples of using Cassandra and
HBase. These should h
Hi Nick,
I’m going to be working with Python primarily. Are you aware of
comparable boto support?
ian
On 4 Jul 2014, at 16:32, Nick Pentreath wrote:
> You should be able to use DynamoDBInputFormat (I think this should be part of
> AWS libraries for Java) and create a HadoopRDD from that.
>
>
You should be able to use DynamoDBInputFormat (I think this should be part
of AWS libraries for Java) and create a HadoopRDD from that.
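
A rough sketch of what that might look like from Python, assuming a PySpark
version that provides SparkContext.hadoopRDD (elsewhere in this thread that
support is described as being in master only). The class names and
configuration keys below are assumptions that depend on the connector JAR on
the classpath. They are not confirmed anywhere in this thread, so check them
against the JAR you actually have. The helper names are hypothetical.

```python
# ASSUMED class names: verify against the connector JAR you are using.
INPUT_FORMAT = "org.apache.hadoop.dynamodb.read.DynamoDBInputFormat"
KEY_CLASS = "org.apache.hadoop.io.Text"
VALUE_CLASS = "org.apache.hadoop.dynamodb.DynamoDBItemWritable"


def dynamodb_job_conf(table_name, endpoint):
    """Build the Hadoop configuration passed to the InputFormat.

    The key names are ASSUMED, not taken from this thread.
    """
    return {
        "dynamodb.input.tableName": table_name,
        "dynamodb.endpoint": endpoint,
    }


def load_table(sc, table_name, endpoint):
    """Create an RDD over a DynamoDB table.

    `sc` is an existing pyspark SparkContext started with the connector
    JAR on its classpath.
    """
    return sc.hadoopRDD(
        INPUT_FORMAT,
        KEY_CLASS,
        VALUE_CLASS,
        conf=dynamodb_job_conf(table_name, endpoint),
    )
```

No key or value converters are passed here, so how the item values arrive on
the Python side depends on PySpark's default conversion. A custom converter
may well be needed in practice.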
On Fri, Jul 4, 2014 at 8:28 AM, Ian Wilkinson wrote:
> Hi,
>
> I noticed mention of DynamoDB as input source in
>
> http://ampcamp.berkeley.edu/wp-content/uplo
Hi,
I noticed mention of DynamoDB as input source in
http://ampcamp.berkeley.edu/wp-content/uploads/2012/06/matei-zaharia-amp-camp-2012-advanced-spark.pdf.
Unfortunately, Google is not coming to my rescue on finding
further mention for this support.
Any pointers would be well received.
Big than