By the way - if you're going this route, see
https://github.com/datastax/spark-cassandra-connector
On Thu, Jul 16, 2015 at 2:40 PM Vetle Leinonen-Roeim wrote:
> You'll probably have to install it separately.
>
> On Thu, Jul 16, 2015 at 2:29 PM Jem Tucker wrote:
>
> [...]ne database which is installed separately?
>
> Thanks,
>
> Jem
>
> On Thu, Jul 16, 2015 at 12:59 PM Vetle Leinonen-Roeim wrote:
>
>> Hi,
>>
>> Not sure how IndexedRDD is persisted, but perhaps you're better off using
>> a NoSQL database for lookup
Hi,
Not sure how IndexedRDD is persisted, but perhaps you're better off using a
NoSQL database for lookups (for example Cassandra, with the Cassandra
connector)? That should give you good performance on lookups, but
persisting those billion records will likely take some time.
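On the original question quoted below (saving a list of RDDs in parallel): an RDD of RDDs cannot work, because RDD actions can only be invoked from the driver, not from inside another RDD's transformations. Spark does, however, accept jobs submitted from several driver threads at once, so one common approach is to launch each save from its own Scala Future. A minimal sketch using only the standard library; the object name `ParallelSave`, the helper `runAll`, and the output paths are hypothetical, and the Spark call appears only in the commented usage because it needs a live SparkContext.

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

object ParallelSave {
  // Runs each action on its own Future so the driver submits them concurrently
  // rather than one after another, then waits for all of them to complete.
  // Results are returned in the same order as the input actions.
  // If an action throws, Await.result rethrows the exception.
  def runAll[T](actions: Seq[() => T]): Seq[T] = {
    // `blocking` tells the global ExecutionContext that the action may block
    // (a Spark save blocks until its job finishes), so the pool can add
    // threads instead of capping concurrency at the number of cores.
    val futures = actions.map(action => Future(blocking(action())))
    Await.result(Future.sequence(futures), Duration.Inf)
  }

  // Hypothetical usage with the list from the question (requires a live
  // SparkContext; the output paths are placeholders):
  //
  //   val list: List[RDD[String]] = List(rdd1, rdd2, rdd3, rdd4)
  //   ParallelSave.runAll(list.zipWithIndex.map { case (rdd, i) =>
  //     () => rdd.saveAsTextFile(s"/tmp/output/rdd$i")
  //   })
}
```

With several jobs running at once, setting `spark.scheduler.mode` to `FAIR` may share executors more evenly between them; by default, jobs within one application are scheduled FIFO.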
On Thu, Jul 16, 2015 at 7:37 AM Brandon White wrote:
> Hello,
>
> I have a list of RDDs:
>
> List(rdd1, rdd2, rdd3, rdd4)
>
> I would like to save these RDDs in parallel. Right now, each save runs
> sequentially. I tried using an RDD of RDDs, but that does not work.
>
> list.foreach { rd