Re: sc.parallelize with defaultParallelism=1

2015-09-30 Thread Marcelo Vanzin
> From: Andy Dang
> Sent: Wednesday, September 30, 2015 8:17 PM
> To: Nicolae Marasoiu
> Cc: user@spark.apache.org
> Subject: Re: sc.parallelize with defaultParallelism=1
>
> Can't you just load the data from HBase first, and then

Re: sc.parallelize with defaultParallelism=1

2015-09-30 Thread Nicolae Marasoiu
m/r part.

From: Andy Dang
Sent: Wednesday, September 30, 2015 8:17 PM
To: Nicolae Marasoiu
Cc: user@spark.apache.org
Subject: Re: sc.parallelize with defaultParallelism=1

Can't you just load the data from HBase first, and then call sc.parallelize on your dataset?

-Andy

---

Re: sc.parallelize with defaultParallelism=1

2015-09-30 Thread Andy Dang
Can't you just load the data from HBase first, and then call sc.parallelize on your dataset?

-Andy

---
Regards,
Andy (Nam) Dang

On Wed, Sep 30, 2015 at 12:52 PM, Nicolae Marasoiu <nicolae.maras...@adswizz.com> wrote:

> Hi,
>
> When calling sc.parallelize(data,1), is there a preference wh

sc.parallelize with defaultParallelism=1

2015-09-30 Thread Nicolae Marasoiu
Hi,

When calling sc.parallelize(data,1), is there a preference where to put the data? I see 2 possibilities: sending it to a worker node, or keeping it on the driver program.

I would prefer to keep the data local to the driver. The use case is when I just need to load a bit of data from HBas
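
For readers following the thread, here is a rough sketch in plain Python (not Spark code) of what asking for a single slice means for a small local collection. It assumes the collection is cut into contiguous, roughly even chunks by index, which is my understanding of how a parallelized collection is split. The helper name `slice_collection` and the sample rows are hypothetical, and the sketch says nothing about which node ends up running the resulting partition.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def slice_collection(data: Sequence[T], num_slices: int) -> List[List[T]]:
    """Hypothetical helper: cut a local collection into contiguous chunks.

    Chunk i covers indices [i * n // num_slices, (i + 1) * n // num_slices),
    an assumed model of how a parallelized collection is partitioned.
    """
    if num_slices < 1:
        raise ValueError("num_slices must be at least 1")
    n = len(data)
    return [
        list(data[(i * n) // num_slices:((i + 1) * n) // num_slices])
        for i in range(num_slices)
    ]


# Made-up rows standing in for "a bit of data" loaded on the driver.
rows = ["row-a", "row-b", "row-c", "row-d", "row-e"]

single = slice_collection(rows, 1)   # one partition holding every row
triple = slice_collection(rows, 3)   # three contiguous partitions
```

With one slice the whole collection stays together as a single partition, which is the case asked about above; where that partition is processed is up to the scheduler and is not modeled here.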