Hi Vetle,

As far as I am aware, IndexedRDD is persisted in the same way as ordinary RDDs. Do you know whether Cassandra can be embedded in my application, or does it have to be a standalone database that is installed separately?

Thanks,
Jem

On Thu, Jul 16, 2015 at 12:59 PM Vetle Leinonen-Roeim <[email protected]> wrote:

> Hi,
>
> Not sure how IndexedRDD is persisted, but perhaps you're better off using
> a NOSQL database for lookups (perhaps using Cassandra, with the Cassandra
> connector)? That should give you good performance on lookups, but
> persisting those billion records sounds like something that will take some
> time in any case.
>
> Regards,
> Vetle
>
>
> On Thu, Jul 16, 2015 at 10:02 AM Jem Tucker <[email protected]> wrote:
>
>> Hello,
>>
>> I have been using IndexedRDD as a large lookup (1 billion records) to
>> join with small tables (1 million rows). The performance of indexedrdd is
>> great until it has to be persisted on disk. Are there any alternatives to
>> IndexedRDD or any changes to how I use it to improve performance with big
>> data volumes?
>>
>> Kindest Regards,
>>
>> Jem
