Hi,

I'm not sure how IndexedRDD is persisted, but you may be better off using a
NoSQL database for the lookups (for example Cassandra, with the Cassandra
connector). That should give you good lookup performance, although
persisting a billion records will probably take some time in any case.
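
To illustrate the access pattern I have in mind, here is a minimal sketch
only. It makes some loud assumptions: a plain Python dict stands in for the
keyed store (such as a Cassandra table), `lookup_join` is a hypothetical
helper and not part of any connector API, and the sample data is made up.
The point is that the join is driven from the small table, with one keyed
lookup per row, rather than by scanning the large dataset.

```python
def lookup_join(small_rows, store):
    """Inner-join (key, value) rows from the small table against a keyed store.

    Does one point lookup per small-table row; rows whose key is not
    present in the store are dropped.
    """
    joined = []
    for key, value in small_rows:
        if key in store:  # point lookup by key, no scan of the large side
            joined.append((key, value, store[key]))
    return joined


# Assumption: a dict stands in for the large keyed store (the 1 billion records).
large_store = {1: "a", 2: "b", 3: "c"}
# Assumption: a short list stands in for the small table (the 1 million rows).
small_table = [(2, "x"), (4, "y"), (3, "z")]

result = lookup_join(small_table, large_store)
print(result)
```

With a real store, the cost of the join would then depend mostly on the
size of the small table and the latency of each lookup.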

Regards,
Vetle


On Thu, Jul 16, 2015 at 10:02 AM Jem Tucker <jem.tuc...@gmail.com> wrote:

> Hello,
>
> I have been using IndexedRDD as a large lookup (1 billion records) to join
> with small tables (1 million rows). The performance of IndexedRDD is great
> until it has to be persisted on disk. Are there any alternatives to
> IndexedRDD or any changes to how I use it to improve performance with big
> data volumes?
>
> Kindest Regards,
>
> Jem
>
