How does the replicate() method in BlockManager.scala acquire resources for RDD replication

2014-09-15 Thread rapelly kartheek
Hi, I was tracing the flow of the replicate() method in BlockManager.scala, trying to find out where exactly in the code resources are acquired for RDD replication. I find that the BlockManagerMaster.getPeers() method returns only one BlockManagerId for all the RDD partitions. But, the
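As context for the question above, here is a highly simplified, hypothetical model of the peer-selection step — not Spark's actual BlockManager code — sketching the idea that the master hands back candidate peer BlockManagerIds and replicate() pushes the block to (replication - 1) of them. Note that on a cluster with a single other worker, a getPeers-style call would naturally return just one peer:

```scala
import scala.util.Random

// Hypothetical stand-in for Spark's BlockManagerId (host/port identity of a
// block manager); the real class carries more fields.
case class BlockManagerId(host: String, port: Int)

// Simplified model: shuffle the peers the master reported and keep enough of
// them to reach the requested replication factor (the origin already holds
// one copy, so we need replication - 1 targets).
def choosePeers(allPeers: Seq[BlockManagerId], replication: Int,
                seed: Long): Seq[BlockManagerId] =
  new Random(seed).shuffle(allPeers.toList).take(replication - 1)

val peers = Seq(BlockManagerId("w1", 7077), BlockManagerId("w2", 7077),
                BlockManagerId("w3", 7077))
val targets = choosePeers(peers, replication = 2, seed = 1L)
assert(targets.size == 1)  // replication 2 => original copy + 1 replica
```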

Re: NullWritable not serializable

2014-09-15 Thread Matei Zaharia
Can you post the exact code for the test that worked in 1.0? I can't think of much that could've changed. The one possibility is if we had some operations that were computed locally on the driver (this happens with things like first() and take(), which will try to do the first partition locally
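The usual shape of this problem — illustrated here without Spark or Hadoop, using a hypothetical stand-in class rather than NullWritable itself — is that shipping a record requires java.io.Serializable, so Writable-style values should be mapped to plain Scala/Java types before any operation that moves records off an executor:

```scala
import java.io._

// Stand-in for a Hadoop Writable holder: deliberately NOT Serializable,
// mimicking the failure mode of NullWritable inside a shipped record.
class NotWritableLike(val value: String)

// Attempt a Java-serialization round trip; report whether it succeeds.
def roundTrip(obj: AnyRef): Boolean =
  try {
    val bytes = new ByteArrayOutputStream()
    new ObjectOutputStream(bytes).writeObject(obj)
    true
  } catch { case _: NotSerializableException => false }

assert(roundTrip("converted value"))            // a plain String serializes fine
assert(!roundTrip(new NotWritableLike("raw")))  // the Writable-style holder does not
```

In Spark terms, the analogous fix is to map each (key, value) pair to serializable types (e.g. String) immediately after reading, before calling first(), take(), or collect().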

Re: PARSING_ERROR from kryo

2014-09-15 Thread npanj
Hi Andrew, No, I could not figure out the root cause. This seems to be a non-deterministic error... I didn't see the same error after rerunning the same program, but I noticed the same error in a different program. First I thought that this may be related to SPARK-2878, but @Graham replied that this looks irre

Wiki page for Operations/Monitoring tools?

2014-09-15 Thread Otis Gospodnetic
Hi, I'm looking for a suitable place on the Wiki to add some info about a Spark monitoring tool we've built. The Wiki looks nice and orderly, so I didn't want to go in and mess it up without asking where to put such info. I don't see an existing "Operations" or "Monitoring" or similar page. Should

Re: Why does the BernoulliSampler class use a lower and upper bound?

2014-09-15 Thread Xiangrui Meng
It is also used in RDD.randomSplit. -Xiangrui On Mon, Sep 15, 2014 at 4:23 PM, Erik Erlandson wrote: > I'm climbing under the hood in there for SPARK-3250, and I see this: > > override def sample(items: Iterator[T]): Iterator[T] = { > items.filter { item => > val x = rng.nextDouble() >

Why does the BernoulliSampler class use a lower and upper bound?

2014-09-15 Thread Erik Erlandson
I'm climbing under the hood in there for SPARK-3250, and I see this:

  override def sample(items: Iterator[T]): Iterator[T] = {
    items.filter { item =>
      val x = rng.nextDouble()
      (x >= lb && x < ub) ^ complement
    }
  }

The clause (x >= lb && x < ub) is equivalent to (x < ub-lb), which is fas
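A standalone re-implementation of the quoted filter clause (hypothetical helper, not Spark's actual BernoulliSampler) makes the point of the bounds concrete: with a shared seed, adjacent ranges such as [0.0, 0.7) and [0.7, 1.0) deterministically partition the same input, which is exactly the property RDD.randomSplit needs — a single fraction could not guarantee the splits are disjoint and exhaustive:

```scala
import scala.util.Random

// Keep an item iff its uniform draw lands in [lb, ub), optionally complemented,
// mirroring the clause quoted from BernoulliSampler above.
def boundedSample[T](items: Iterator[T], lb: Double, ub: Double,
                     seed: Long, complement: Boolean = false): Iterator[T] = {
  val rng = new Random(seed)
  items.filter { _ =>
    val x = rng.nextDouble()
    (x >= lb && x < ub) ^ complement
  }
}

val data = (1 to 1000).toList
// Same seed, same input order => each item sees the same draw in both passes,
// so the two ranges split the data with no overlap and no loss.
val left  = boundedSample(data.iterator, 0.0, 0.7, seed = 42L).toSet
val right = boundedSample(data.iterator, 0.7, 1.0, seed = 42L).toSet
assert((left & right).isEmpty)
assert(left.size + right.size == data.size)
```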

Re: PARSING_ERROR from kryo

2014-09-15 Thread Andrew Ash
I should clarify: I'm not using GraphX, it's a different application-specific Kryo registrator that causes the same stacktrace ending in PARSING_ERROR: com.esotericsoftware.kryo.KryoException: java.io.IOException: failed to uncompress the chunk: PARSING_ERROR(2) com.esotericsoftware.kryo.io.Input.

Re: CoHadoop Papers

2014-09-15 Thread Colin McCabe
This feature is called "block affinity groups" and it's been under discussion for a while, but isn't fully implemented yet. HDFS-2576 is not a complete solution because it doesn't change the way the balancer works, just the initial placement of blocks. Once heterogeneous storage management (HDFS-

Re: PARSING_ERROR from kryo

2014-09-15 Thread Ankur Dave
At 2014-09-15 08:59:48 -0700, Andrew Ash wrote: > I'm seeing the same exception now on the Spark 1.1.0 release. Did you ever > get this figured out? > > [...] > > On Thu, Aug 21, 2014 at 2:14 PM, npanj wrote: >> I am getting PARSING_ERROR while running my job on the code checked out up >> to com

Re: PARSING_ERROR from kryo

2014-09-15 Thread Andrew Ash
Hi npanj, I'm seeing the same exception now on the Spark 1.1.0 release. Did you ever get this figured out? Andrew On Thu, Aug 21, 2014 at 2:14 PM, npanj wrote: > Hi All, > > I am getting PARSING_ERROR while running my job on the code checked out up > to commit# db56f2df1b8027171da1b8d2571d1f2

Re: Spark authenticate enablement

2014-09-15 Thread Tom Graves
Spark authentication does work in standalone mode (at least it did; I haven't tested it in a while). The same shared secret has to be set on all the daemons (master and workers) and then also in the configs of any applications submitted. Since everyone shares the same secret, it's by no means idea
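A minimal sketch of the settings described above, using Spark's standard security properties (the secret value is a placeholder; every daemon and every submitted application must carry identical values):

```
# conf/spark-defaults.conf -- identical on the master, all workers,
# and in the config of each submitted application
spark.authenticate        true
spark.authenticate.secret <shared-secret-value>
```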

Re: Adding abstraction in MLlib

2014-09-15 Thread Reynold Xin
Hi Egor, Thanks for the suggestion. It is definitely our intention and practice to post design docs as soon as they are ready, and to keep iteration cycles short. As a matter of fact, we encourage design docs for major features to be posted before implementation starts, and WIP pull requests before they are ful