Hi,

I'm not sure how IndexedRDD is persisted, but you may be better off using a NoSQL database for the lookups (for example Cassandra, through the Spark Cassandra connector). That should give you good lookup performance, but persisting those billion records sounds like something that will take some time in any case.
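
To make the access pattern concrete, here is a minimal sketch of what I mean by "lookups": instead of joining against the whole large dataset, you query a keyed store once per row of the small table. This is not Spark or Cassandra code. An in-memory sqlite3 table stands in for the external store, and the table name, column names and helper functions are all hypothetical.

```python
import sqlite3


def build_store(records):
    """Load (key, value) pairs into a keyed table.

    sqlite3 is only a stand-in for the external store (e.g. Cassandra);
    the table and column names are made up for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lookup (key INTEGER PRIMARY KEY, value TEXT)")
    conn.executemany("INSERT INTO lookup VALUES (?, ?)", records)
    conn.commit()
    return conn


def join_small_against_store(conn, small_rows):
    """Join the small table to the store with one primary-key lookup per row.

    Rows whose key is missing from the store are dropped, which gives
    inner-join semantics.
    """
    joined = []
    for key, payload in small_rows:
        row = conn.execute(
            "SELECT value FROM lookup WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:
            joined.append((key, payload, row[0]))
    return joined


# Large side: 1000 records here, standing in for the billion-record lookup.
store = build_store((i, f"v{i}") for i in range(1000))

# Small side: only these keys are ever fetched from the store.
small = [(3, "a"), (42, "b"), (5000, "c")]
result = join_small_against_store(store, small)
```

The work done is proportional to the size of the small table, not the large one, which is why a store with fast key lookups fits this use case. Loading the large side into the store is still a one-off cost.
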
Regards,
Vetle

On Thu, Jul 16, 2015 at 10:02 AM Jem Tucker <jem.tuc...@gmail.com> wrote:

> Hello,
>
> I have been using IndexedRDD as a large lookup (1 billion records) to join
> with small tables (1 million rows). The performance of indexedrdd is great
> until it has to be persisted on disk. Are there any alternatives to
> IndexedRDD or any changes to how I use it to improve performance with big
> data volumes?
>
> Kindest Regards,
>
> Jem