Re: CoHadoop Papers

2014-09-15 Thread Colin McCabe
This feature is called "block affinity groups" and it's been under discussion for a while, but isn't fully implemented yet. HDFS-2576 is not a complete solution because it doesn't change the way the balancer works, just the initial placement of blocks. Once heterogeneous storage management (HDFS-

Re: Exception in Spark 1.0.1: com.esotericsoftware.kryo.KryoException: Buffer underflow

2014-08-01 Thread Colin McCabe
On Fri, Aug 1, 2014 at 2:45 PM, Andrew Ash wrote: > After several days of debugging, we think the issue is that we have > conflicting versions of Guava. Our application was running with Guava 14 > and the Spark services (Master, Workers, Executors) had Guava 16. We had > custom Kryo serializers

Re: Suggestion for SPARK-1825

2014-07-25 Thread Colin McCabe
situation. best, Colin On Fri, Jul 25, 2014 at 11:23 AM, Colin McCabe wrote: > I have a similar issue with SPARK-1767. There are basically three ways to > resolve the issue: > > 1. Use reflection to access classes newer than 0.21 (or whatever the > oldest version of Hadoop is that S

Re: Suggestion for SPARK-1825

2014-07-25 Thread Colin McCabe
I have a similar issue with SPARK-1767. There are basically three ways to resolve the issue: 1. Use reflection to access classes newer than 0.21 (or whatever the oldest version of Hadoop is that Spark supports) 2. Add a build variant (in Maven this would be a profile) that deals with this. 3. Aut
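Option 1 above (probing for an API by reflection instead of linking against it at compile time) can be sketched as follows. This is only an illustration of the pattern — it uses JDK classes rather than real Hadoop APIs, and `callIfPresent` is a made-up helper name:

```java
import java.lang.reflect.Method;

// Sketch of the reflection approach: look up a method by name at runtime
// so one binary works on both old and new versions of a dependency.
// The method names probed below are JDK stand-ins, not Hadoop APIs.
public class ReflectiveCompat {
    /** Invoke target.methodName() if it exists; otherwise return fallback. */
    static Object callIfPresent(Object target, String methodName, Object fallback) {
        try {
            Method m = target.getClass().getMethod(methodName);
            return m.invoke(target);
        } catch (NoSuchMethodException e) {
            // Running against a version that lacks the method: degrade gracefully.
            return fallback;
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // String.length() exists everywhere, so the reflective call succeeds.
        System.out.println(callIfPresent("hadoop", "length", -1));  // prints 6
        // A method that doesn't exist falls through to the fallback value.
        System.out.println(callIfPresent("hadoop", "notAMethod", "fallback"));
    }
}
```

The cost of this approach is that missing APIs become runtime fallbacks rather than compile errors, which is exactly the trade-off being weighed against a build variant.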

Re: SCALA_HOME or SCALA_LIBRARY_PATH not set during build

2014-06-03 Thread Colin McCabe
e in 1.0.1 as > > well. For other people running into this, you can export SCALA_HOME to > > any value and it will work. > > > > - Patrick > > > > On Sat, May 31, 2014 at 8:34 PM, Colin McCabe > wrote: > >> Spark currently supports two build systems, sbt

Re: SCALA_HOME or SCALA_LIBRARY_PATH not set during build

2014-05-31 Thread Colin McCabe
Spark currently supports two build systems, sbt and Maven. sbt will download the correct version of Scala, but with Maven you need to supply it yourself and set SCALA_HOME. It sounds like the instructions need to be updated-- perhaps create a JIRA? best, Colin On Sat, May 31, 2014 at 7:06 PM,

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-31 Thread Colin McCabe
ed... if so what about all its dependencies. > I wonder if it would be possible to put Hadoop and its dependencies "in a box," (as it were) by using a separate classloader for them. That might solve this without requiring an uber-jar. It would be nice to not have to transfe
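The "in a box" idea above is typically done with a child-first classloader: classes are resolved from the boxed jars before delegating to the parent, so the boxed dependency versions can't clash with the application's. A minimal sketch (an illustration of the technique, not Spark's implementation):

```java
import java.net.URL;
import java.net.URLClassLoader;

// Child-first classloader: try our own jars before the parent, which is
// the reverse of the JVM's normal parent-first delegation. Handing Hadoop's
// jars to one of these isolates its dependency versions from the app's.
public class ChildFirstClassLoader extends URLClassLoader {
    public ChildFirstClassLoader(URL[] jars, ClassLoader parent) {
        super(jars, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    // Look in our own jars first.
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Not in our jars: fall back to normal delegation
                    // (e.g. for JDK classes, which only the parent can load).
                    c = super.loadClass(name, false);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    public static void main(String[] args) throws Exception {
        // With no jars of its own, everything falls through to the parent.
        ClassLoader cl = new ChildFirstClassLoader(new URL[0],
                ChildFirstClassLoader.class.getClassLoader());
        System.out.println(cl.loadClass("java.util.ArrayList").getName());
    }
}
```

The catch, as the thread notes, is the boundary: any type that crosses between the two classloaders must come from a shared parent, or casts fail with a `ClassCastException` even though the class names match.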

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-30 Thread Colin McCabe
ioned, there might be a few edge cases where this breaks reflection, but I don't think that's an issue for most libraries. So in the worst case we could end up needing apps to follow us in lockstep for Kryo or maybe Akka, but not the whole kit and caboodle like with Hadoop. best, Colin - P
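The reflection edge case alluded to here is that shading/relocation rewrites class references in compiled bytecode but cannot see a class name stored in a string literal. A small illustration of the failure mode — the `shaded.` prefix is an invented relocation, using JDK classes only to keep the example self-contained:

```java
// Why shading can break reflection: the relocator rewrites bytecode
// symbols, but a class name held in a string is opaque to it.
public class ShadingGotcha {
    public static void main(String[] args) {
        // A direct reference is a bytecode symbol; relocation rewrites it,
        // so this keeps working inside a shaded jar.
        java.util.ArrayList<String> list = new java.util.ArrayList<>();
        list.add("ok");
        System.out.println(list.get(0));  // prints ok

        // A name in a string literal is NOT rewritten. If the dependency
        // had been relocated under "shaded.", a hardcoded Class.forName
        // would be looking up a name that no longer exists:
        try {
            Class.forName("shaded.java.util.ArrayList");
        } catch (ClassNotFoundException e) {
            System.out.println("reflection broke: " + e.getMessage());
        }
    }
}
```

Libraries that bootstrap themselves via `Class.forName` on configured names (serializers and actor systems tend to) are exactly the ones this bites, which is consistent with Kryo and Akka being called out above.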

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-30 Thread Colin McCabe
First of all, I think it's great that you're thinking about this. API stability is super important and it would be good to see Spark get on top of this. I want to clarify a bit about Hadoop. The problem that Hadoop faces is that the Java package system isn't very flexible. If you have a method

Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Colin McCabe
t Another solution would be to use newInstance and build your own FS cache, essentially. I don't think it would be that much code. This might be nicer because you could implement things like closing FileSystem objects that haven't been used in a while. cheers, Colin > On Thu, May 22
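The suggested design — create uncached instances (as Hadoop's real `FileSystem.newInstance` does) and manage them in your own cache with idle-based closing — can be sketched generically. Everything below is an assumption-level illustration: the cache class and its names are invented, and a `Closeable` type parameter stands in for `org.apache.hadoop.fs.FileSystem` so the sketch stays self-contained:

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.Supplier;

// Sketch of a hand-rolled cache: entries are created on demand (the way
// FileSystem.newInstance creates uncached instances in Hadoop), reused
// while fresh, and closed once idle for longer than maxIdleMillis.
public class IdleExpiringCache<K, V extends Closeable> {
    private static final class Entry<V> {
        final V value;
        long lastUsed;
        Entry(V v, long now) { value = v; lastUsed = now; }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long maxIdleMillis;

    public IdleExpiringCache(long maxIdleMillis) {
        this.maxIdleMillis = maxIdleMillis;
    }

    /** Return the cached value for key, creating it with factory if absent. */
    public synchronized V get(K key, Supplier<V> factory) {
        long now = System.currentTimeMillis();
        Entry<V> e = entries.get(key);
        if (e == null) {
            e = new Entry<>(factory.get(), now);  // like FileSystem.newInstance(...)
            entries.put(key, e);
        }
        e.lastUsed = now;
        return e.value;
    }

    /** Close and drop anything unused for longer than maxIdleMillis. */
    public synchronized void evictIdle() throws IOException {
        long now = System.currentTimeMillis();
        for (Iterator<Entry<V>> it = entries.values().iterator(); it.hasNext(); ) {
            Entry<V> e = it.next();
            if (now - e.lastUsed > maxIdleMillis) {
                e.value.close();
                it.remove();
            }
        }
    }
}
```

In the real Hadoop setting the key would mirror what the built-in cache uses (roughly scheme, authority, and UGI), which is what makes a private cache a drop-in alternative to `FileSystem.get`.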

Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-22 Thread Colin McCabe
The FileSystem cache is something that has caused a lot of pain over the years. Unfortunately we (in Hadoop core) can't change the way it works now because there are too many users depending on the current behavior. Basically, the idea is that when you request a FileSystem with certain options wi

Re: [VOTE] Release Apache Spark 1.0.0 (RC10)

2014-05-21 Thread Colin McCabe
Hi Kevin, Can you try https://issues.apache.org/jira/browse/SPARK-1898 to see if it fixes your issue? Running in YARN cluster mode, I had a similar issue where Spark was able to create a Driver and an Executor via YARN, but then it stopped making any progress. Note: I was using a pre-release ver