I know it isn't exactly what you are asking for, but you could solve it
like this:

Have the driver program query Dynamo for the S3 file keys. Then call
sc.textFile on each of the file keys and .union the results together to
make your RDD.

You could wrap that up in a function and it wouldn't be too painful to
reuse. I don't personally know about creating custom RDDs in Java.
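
To make that concrete, here is a rough sketch of the kind of reusable
function I mean. It is only an illustration, not tested against a real
cluster. The class name KeyUnion and the method loadAndUnion are made up
for this example, and the Dynamo query that produces the list of keys is
left out entirely. The helper is written against plain Java interfaces so
it does not depend on Spark itself: you pass in how to load one key and
how to combine two results.

```java
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Hypothetical helper: loads each key and folds the results together.
final class KeyUnion {
    private KeyUnion() {}

    /**
     * Applies `load` to every key and combines the results pairwise
     * with `union`, left to right. With Spark, `load` would wrap
     * sc.textFile and `union` would wrap JavaRDD.union.
     */
    static <R> R loadAndUnion(List<String> keys,
                              Function<String, R> load,
                              BinaryOperator<R> union) {
        if (keys.isEmpty()) {
            // Assumption: an empty key list is treated as a caller error.
            throw new IllegalArgumentException("no S3 keys to load");
        }
        R result = load.apply(keys.get(0));
        for (String key : keys.subList(1, keys.size())) {
            result = union.apply(result, load.apply(key));
        }
        return result;
    }
}
```

Assuming sc is a JavaSparkContext and keys holds full S3 URIs returned by
your Dynamo query, the call would look something like:

JavaRDD<String> rdd = KeyUnion.loadAndUnion(
    keys, key -> sc.textFile(key), (a, b) -> a.union(b));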

On Mon, May 25, 2015 at 10:37 PM, Swaranga Sarma <[email protected]>
wrote:

> My data is in S3 and is indexed in Dynamo. For example, if I want to load
> data given a time range, I will first need to query Dynamo for the S3 file
> keys for the corresponding time range and then load them in Spark. The
> files may not always be in the same S3 path prefix, hence
> sc.textFile("s3://directory_path/") won't work. I am looking for
> pointers on how to implement something analogous to
> HadoopRDD or JdbcRDD but in Java.
>
> I am looking to do something similar to what they have done here:
> https://github.com/lagerspetz/TimeSeriesSpark/blob/master/src/spark/timeseries/dynamodb/DynamoDbRDD.scala.
> That one reads data from Dynamo; my custom RDD would query DynamoDB for
> the S3 file keys and then load them from S3.
>
> On Mon, May 25, 2015 at 8:19 PM, Alex Robbins <
> [email protected]> wrote:
>
>> If a Hadoop InputFormat already exists for your data source, you can load
>> it from there. Otherwise, maybe you can dump your data source out as text
>> and load it from there. Without more detail on what your data source is,
>> it'll be hard for anyone to help.
>>
>> On Mon, May 25, 2015 at 5:00 PM, swaranga <[email protected]>
>> wrote:
>>
>>> Hello,
>>>
>>> I have a custom data source and I want to load the data into Spark to
>>> perform some computations. For this I see that I might need to implement
>>> a new RDD for my data source.
>>>
>>> I am a complete Scala noob and I am hoping that I can implement the RDD
>>> in Java only. I looked around the internet and could not find any
>>> resources.
>>> Any pointers?
>>>
>>
>
