Interesting - I would have thought they would make that available publicly.
Unfortunately, unless you can use Spark on EMR, I guess your options are to
hack it by spinning up an EMR cluster and getting the JAR, or maybe fall
back to using boto and rolling your own :(
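
If you do roll your own, the splitting part might look roughly like the
sketch below. It assumes you use DynamoDB's parallel scan, where each Scan
request carries a Segment and TotalSegments parameter. The helper name
build_segment_requests and the table name "my_table" are made up for
illustration, and nothing here actually talks to AWS; it only builds the
per-segment request parameters.

```python
def build_segment_requests(table_name, total_segments):
    """Return one Scan request dict per segment of a parallel scan.

    Segment and TotalSegments are the DynamoDB Scan parameters used for
    parallel scans; segments are numbered from 0 to total_segments - 1.
    """
    if total_segments < 1:
        raise ValueError("total_segments must be at least 1")
    return [
        {
            "TableName": table_name,
            "Segment": segment,
            "TotalSegments": total_segments,
        }
        for segment in range(total_segments)
    ]


# "my_table" is a placeholder table name, not something from this thread.
requests = build_segment_requests("my_table", 4)
for request in requests:
    print(request)
```

Each dict could then be handed to whatever client you use to run the scan,
for example by distributing the list with sc.parallelize(requests) and
scanning one segment per task. That last step is untested on my side.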



On Fri, Jul 4, 2014 at 9:28 AM, Ian Wilkinson <ia...@me.com> wrote:

> Trying to discover source for the DynamoDBInputFormat.
> Not appearing in:
>
> - https://github.com/aws/aws-sdk-java
> - https://github.com/apache/hive
>
> Then came across
> http://stackoverflow.com/questions/17077774/jar-containing-org-apache-hadoop-hive-dynamodb
> .
> Unsure whether this represents the latest situation…
>
> ian
>
>
> On 4 Jul 2014, at 16:58, Nick Pentreath <nick.pentre...@gmail.com> wrote:
>
> I should qualify by saying there is boto support for DynamoDB, but not
> for the InputFormat. You could roll your own Python-based connection, but
> this involves figuring out how to split the data in DynamoDB. The
> InputFormat takes care of this, so it should be the easier approach.
> —
> Sent from Mailbox <https://www.dropbox.com/mailbox>
>
>
> On Fri, Jul 4, 2014 at 8:51 AM, Ian Wilkinson <ia...@me.com> wrote:
>
>> Excellent. Let me get browsing on this.
>>
>> Huge thanks,
>> ian
>>
>>
>>  On 4 Jul 2014, at 16:47, Nick Pentreath <nick.pentre...@gmail.com>
>> wrote:
>>
>> No boto support for that.
>>
>> In master there is Python support for loading Hadoop inputFormat. Not
>> sure if it will be in 1.0.1 or 1.1
>>
>> In the master docs, the programming guide has instructions, and the
>> examples project has PySpark examples of using Cassandra and HBase.
>> These should hopefully give you enough to get started.
>>
>> Depending on how easy it is to use the DynamoDB format, you may have to
>> write a custom converter (see the mentioned examples for more details).
>>
>> Sent from my iPhone
>>
>> On 4 Jul 2014, at 08:38, Ian Wilkinson <ia...@me.com> wrote:
>>
>> Hi Nick,
>>
>> I’m going to be working with python primarily. Are you aware of
>> comparable boto support?
>>
>> ian
>>
>>  On 4 Jul 2014, at 16:32, Nick Pentreath <nick.pentre...@gmail.com>
>> wrote:
>>
>> You should be able to use DynamoDBInputFormat (I think this should be
>> part of AWS libraries for Java) and create a HadoopRDD from that.
>>
>>
>> On Fri, Jul 4, 2014 at 8:28 AM, Ian Wilkinson <ia...@me.com> wrote:
>>
>>> Hi,
>>>
>>> I noticed mention of DynamoDB as input source in
>>>
>>> http://ampcamp.berkeley.edu/wp-content/uploads/2012/06/matei-zaharia-amp-camp-2012-advanced-spark.pdf
>>> .
>>>
>>> Unfortunately, Google is not coming to my rescue on finding
>>> further mention for this support.
>>>
>>> Any pointers would be well received.
>>>
>>> Big thanks,
>>> ian
>>>
>>
>>
>>
>>
>
>
