Re: Indexed Store for lookup table

2015-07-16 Thread Vetle Leinonen-Roeim
By the way, if you're going this route, see https://github.com/datastax/spark-cassandra-connector

On Thu, Jul 16, 2015 at 2:40 PM Vetle Leinonen-Roeim wrote:
> You'll probably have to install it separately.
>
> On Thu, Jul 16, 2015 at 2:29 PM Jem Tucker wrote:
> >>

Re: Indexed Store for lookup table

2015-07-16 Thread Vetle Leinonen-Roeim
ne database which is installed separately?
>
> Thanks,
>
> Jem
>
> On Thu, Jul 16, 2015 at 12:59 PM Vetle Leinonen-Roeim wrote:
>
>> Hi,
>>
>> Not sure how IndexedRDD is persisted, but perhaps you're better off using
>> a NoSQL database for lookup

Re: Indexed Store for lookup table

2015-07-16 Thread Vetle Leinonen-Roeim
Hi,

Not sure how IndexedRDD is persisted, but perhaps you're better off using a NoSQL database for lookups (for example Cassandra, with the Cassandra connector)? That should give you good lookup performance, but persisting those billion records sounds like something that will take some time

Re: Running foreach on a list of rdds in parallel

2015-07-15 Thread Vetle Leinonen-Roeim
On Thu, Jul 16, 2015 at 7:37 AM Brandon White wrote:
> Hello,
>
> I have a list of RDDs
>
> List(rdd1, rdd2, rdd3, rdd4)
>
> I would like to save these RDDs in parallel. Right now, it is running each
> operation sequentially. I tried using an RDD of RDDs but that does not work.
>
> list.foreach { rd
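The question above asks how to run several save operations concurrently instead of one after another. Spark can run multiple jobs at the same time within one application when they are submitted from separate threads, so one common approach is to wrap each action in a Scala `Future`. The sketch below assumes that approach; `saveRdd` is a hypothetical stand-in for a real action such as `rdd.saveAsTextFile(path)`, and it only sleeps so the example runs without Spark. The `/tmp/out/...` paths are made up for illustration.

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object ParallelSaves {
  // Hypothetical stand-in for rdd.saveAsTextFile(path): it sleeps to
  // simulate a long-running Spark action and returns the output path.
  def saveRdd(path: String): String = {
    Thread.sleep(200)
    path
  }

  // Start one Future per RDD so each save is submitted from its own
  // thread, then wait for all of them. Results keep the input order.
  def saveAll(names: List[String]): List[String] = {
    val saves: List[Future[String]] = names.map { name =>
      Future {
        // `blocking` lets the global pool add threads for blocking work.
        blocking(saveRdd(s"/tmp/out/$name"))
      }
    }
    Await.result(Future.sequence(saves), 1.minute)
  }

  def main(args: Array[String]): Unit = {
    val start = System.nanoTime()
    val written = saveAll(List("rdd1", "rdd2", "rdd3", "rdd4"))
    val elapsedMs = (System.nanoTime() - start) / 1000000
    println(written.mkString(", "))
    println(s"elapsed: $elapsedMs ms")
  }
}
```

With enough threads available, the four simulated saves should finish in roughly the time of one rather than four. On a real cluster, whether the jobs actually overlap also depends on free executor resources and the scheduler configuration.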