Hi Vetle,

As far as I am aware, IndexedRDD is persisted in the same way as regular
RDDs. Do you know whether Cassandra can be embedded in my application, or
does it have to be a standalone database that is installed separately?

Thanks,

Jem

On Thu, Jul 16, 2015 at 12:59 PM Vetle Leinonen-Roeim <[email protected]> wrote:

> Hi,
>
> I'm not sure how IndexedRDD is persisted, but you might be better off
> using a NoSQL database for lookups (for example Cassandra, with the
> Cassandra connector). That should give you good lookup performance,
> although persisting a billion records will take some time in any case.
>
> Regards,
> Vetle
>
>
> On Thu, Jul 16, 2015 at 10:02 AM Jem Tucker <[email protected]> wrote:
>
>> Hello,
>>
>> I have been using IndexedRDD as a large lookup table (1 billion records)
>> to join with small tables (1 million rows). The performance of IndexedRDD
>> is great until it has to be persisted to disk. Are there any alternatives
>> to IndexedRDD, or any changes to how I use it, that would improve
>> performance with large data volumes?
>>
>> Kindest Regards,
>>
>> Jem
>>
>
