NPE in Parquet

2015-01-20 Thread Alessandro Baretta
All, I strongly suspect this might be caused by a glitch in the communication with Google Cloud Storage, to which my job is writing, as this NPE shows up fairly randomly. Any ideas? Exception in thread "Thread-126" java.lang.NullPointerException at scala.collection.mutable.ArrayO

Scaladoc

2014-10-30 Thread Alessandro Baretta
How do I build the Scaladoc HTML files from the Spark source distribution? Alex Baretta

Still struggling with building documentation

2014-11-07 Thread Alessandro Baretta
I finally came to realize that there is a special Maven target to build the scaladocs, although arguably a very unintuitive one: mvn verify. So now I have scaladocs for each package, but not for the whole Spark project. Specifically, build/docs/api/scala/index.html is missing. Indeed the whole build

Re: Still struggling with building documentation

2014-11-11 Thread Alessandro Baretta
have a separate thing with new dependencies in order to > build the web docs, but that's how it is at the moment. > > Nick > > On Fri, Nov 7, 2014 at 3:39 PM, Alessandro Baretta > wrote: > >> I finally came to realize that there is a special maven target to build >

Spark Shell slowness on Google Cloud

2014-12-17 Thread Alessandro Baretta
All, I'm using the Spark shell to interact with a small test deployment of Spark, built from the current master branch. I'm processing a dataset comprising a few thousand objects on Google Cloud Storage, split into a half dozen directories. My code constructs an object--let me call it the Dataset

Re: Spark Shell slowness on Google Cloud

2014-12-17 Thread Alessandro Baretta
Lee wrote: > > I'm curious if you're seeing the same thing when using bdutil against > GCS? I'm wondering if this may be an issue concerning the transfer rate of > Spark -> Hadoop -> GCS Connector -> GCS. > > > On Wed Dec 17 2014 at 10:09:17 PM Alessa

Re: Spark Shell slowness on Google Cloud

2014-12-17 Thread Alessandro Baretta
n just as fast > scans? > > > On Wed Dec 17 2014 at 10:44:45 PM Alessandro Baretta < > alexbare...@gmail.com> wrote: > >> Denny, >> >> No, gsutil scans through the listing of the bucket quickly. See the >> following. >> >> alex@had

Re: Spark Shell slowness on Google Cloud

2014-12-17 Thread Alessandro Baretta
n Wed, Dec 17, 2014 at 11:24 PM, Alessandro Baretta wrote: > > Well, what do you suggest I run to test this? But more importantly, what > information would this give me? > > On Wed, Dec 17, 2014 at 10:46 PM, Denny Lee wrote: >> >> Oh, it makes sense of gsutil scans

/tmp directory fills up

2015-01-09 Thread Alessandro Baretta
Gents, I'm building Spark using the current master branch and deploying it to Google Compute Engine on top of Hadoop 2.4/YARN via bdutil, Google's Hadoop cluster provisioning tool. bdutil configures Spark with spark.local.dir=/hadoop/spark/tmp, but this option is ignored in combination with YAR

Re: Job priority

2015-01-10 Thread Alessandro Baretta
v, +user > > http://spark.apache.org/docs/latest/job-scheduling.html > > > On Sat, Jan 10, 2015 at 4:40 PM, Alessandro Baretta > wrote: > >> Is it possible to specify a priority level for a job, such that the active >> jobs might be scheduled in order of priority? >> >> Alex >> > >

Re: Job priority

2015-01-10 Thread Alessandro Baretta
/docs/latest/job-scheduling.html#configuring-pool-properties > > "Setting a high weight such as 1000 also makes it possible to implement > *priority* between pools—in essence, the weight-1000 pool will always get > to launch tasks first whenever it has jobs active." > > On Sat,

Re: Job priority

2015-01-11 Thread Alessandro Baretta
> > On Sunday, January 11, 2015, Alessandro Baretta > wrote: > >> Cody, >> >> Maybe I'm not getting this, but it doesn't look like this page is >> describing a priority queue scheduling policy. What this section discusses >> is how resources