Re: spark 1.0 standalone application

2014-05-19 Thread Shivaram Venkataraman
On a related note, there is also a staging Apache repository where the latest RC gets pushed to: https://repository.apache.org/content/repositories/staging/org/apache/spark/spark-core_2.10/ -- The artifact here is just named "1.0.0" (similar to the RC-specific repository that Patrick mentioned). So i
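
A minimal sbt sketch of building against that staging repository (sbt 0.13-era syntax assumed; the URL is the one from the message above):

    resolvers += "Apache Staging" at "https://repository.apache.org/content/repositories/staging/"

    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"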

Re: spark 1.0 standalone application

2014-05-19 Thread Nan Zhu
First time hearing that there is a temporary Maven repository……. -- Nan Zhu On Monday, May 19, 2014 at 10:10 PM, Patrick Wendell wrote: > Whenever we publish a release candidate, we create a temporary Maven > repository that hosts the artifacts. We do this precisely for the case > you are running into

Re: [VOTE] Release Apache Spark 1.0.0 (rc9)

2014-05-19 Thread Patrick Wendell
We're cancelling this RC in favor of rc10. There were two blockers: an issue with Windows run scripts and an issue with the packaging for Hadoop 1 when Hive support is bundled. https://issues.apache.org/jira/browse/SPARK-1875 https://issues.apache.org/jira/browse/SPARK-1876 Thanks everyone for th

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-19 Thread DB Tsai
In 1.0 there is a new option for users to choose which classloader has higher priority via spark.files.userClassPathFirst, but I decided to submit the PR for 0.9 first. We use this patch in our lab and we can use those jars added by sc.addJar without reflection. https://github.com/apache/spark/pull/8
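
For reference, a sketch of turning that option on via the usual SparkConf route (the jar path and application name are hypothetical):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("user-classpath-first-demo")
      // Give classes from jars added via sc.addJar priority over Spark's own copies.
      .set("spark.files.userClassPathFirst", "true")
    val sc = new SparkContext(conf)
    sc.addJar("/path/to/my-udfs.jar")  // hypothetical jar added at runtime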

Re: spark 1.0 standalone application

2014-05-19 Thread nit
Thanks everyone. I followed Patrick's suggestion and it worked like a charm. -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/spark-1-0-standalone-application-tp6698p6710.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com

Re: TorrentBroadcast aka Cornet?

2014-05-19 Thread Andrew Ash
Thanks for the info Matei. Andrew On Mon, May 19, 2014 at 12:38 AM, Matei Zaharia wrote: > TorrentBroadcast is actually slightly simpler, but it’s based on that. It > has similar performance. I’d like to make it the default in a future > version, we just haven’t had a ton of testing with it yet

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-19 Thread DB Tsai
Good summary! We fixed it in branch 0.9 since our production is still in 0.9. I'm porting it to 1.0 now, and hopefully will submit a PR for 1.0 tonight. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-19 Thread Sandy Ryza
It just hit me why this problem is showing up on YARN and not on standalone. The relevant difference between YARN and standalone is that, on YARN, the app jar is loaded by the system classloader instead of Spark's custom URL classloader. On YARN, the system classloader knows about [the classes in

Re: BUG: graph.triplets does not return proper values

2014-05-19 Thread Reynold Xin
reduce always returns a single element - maybe you are misunderstanding what the reduce function in collections does. On Mon, May 19, 2014 at 3:32 PM, GlennStrycker wrote: > I tried adding .copy() everywhere, but still only get one element returned, > not even an RDD object. > > orig_graph.edges.
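
To illustrate the point: reduce on a Scala collection (or an RDD) folds all elements pairwise into one value, not a collection:

    val xs = Seq(1, 2, 3, 4)
    // reduce combines elements pairwise until a single value remains.
    val total = xs.reduce((a, b) => a + b)  // 10 -- a single Int, not a Seq or RDD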

Re: spark 1.0 standalone application

2014-05-19 Thread Sujeet Varakhedi
Threads like these are great candidates to be part of the "Contributors guide". I will create a JIRA to update the guide with data from past threads like these. Sujeet On Mon, May 19, 2014 at 7:10 PM, Patrick Wendell wrote: > Whenever we publish a release candidate, we create a temporary Maven > repository

Re: spark 1.0 standalone application

2014-05-19 Thread Patrick Wendell
Whenever we publish a release candidate, we create a temporary Maven repository that hosts the artifacts. We do this precisely for the case you are running into (where a user wants to build an application against it to test). You can build against the release candidate by just adding that repository
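
A sketch of what that looks like in an sbt build; the repository URL below is a placeholder, since the real one is announced with each release candidate:

    // Placeholder URL -- substitute the one from the RC announcement email.
    resolvers += "Spark RC staging" at "https://repository.apache.org/content/repositories/orgapachespark-XXXX/"

    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"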

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-19 Thread Patrick Wendell
Having a user define a custom class inside of an added jar and instantiate it directly inside of an executor is definitely supported in Spark and has been for a really long time (several years). This is something we do all the time in Spark. DB - I'd hold off on re-architecting this until

Re: spark 1.0 standalone application

2014-05-19 Thread Mark Hamstra
That's the crude way to do it. If you run `sbt/sbt publishLocal`, then you can resolve the artifact from your local cache in the same way that you would resolve it if it were deployed to a remote cache. That's just the build step. Actually running the application will require the necessary jars
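
A sketch of the build-side half of that, assuming a standard sbt project: after running `sbt/sbt publishLocal` in the Spark checkout, the application can depend on the locally published artifact:

    // Resolves from the local ivy cache (~/.ivy2/local) that publishLocal populates;
    // the exact version string depends on what the checked-out branch declares.
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"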

Re: [VOTE] Release Apache Spark 1.0.0 (rc9)

2014-05-19 Thread Nan Zhu
Just reran my test on rc5; everything works. Built applications with sbt and the spark-*.jar compiled with Hadoop 2.3. +1 -- Nan Zhu On Sunday, May 18, 2014 at 11:07 PM, witgo wrote: > How to reproduce this bug? > > > -- Original -- > From: "Patric

Re: spark 1.0 standalone application

2014-05-19 Thread Nan Zhu
Yeah, you have to put the spark-assembly-*.jar into the lib directory of your application. Best, -- Nan Zhu On Monday, May 19, 2014 at 9:48 PM, nit wrote: > I am not much comfortable with sbt. I want to build a standalone application > using spark 1.0 RC9. I can build sbt assembly for my applicatio
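
That works because sbt treats jars under lib/ as unmanaged dependencies, so no build.sbt change is needed. A hypothetical layout (the assembly jar name is illustrative):

    my-app/
      build.sbt
      lib/spark-assembly-1.0.0-hadoop2.3.0.jar   <- copied from the Spark build
      src/main/scala/MyApp.scala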

spark and impala , which is more fitter for MPP

2014-05-19 Thread liuguodong
Hi, all. My question is: Spark and Impala -- which is a better fit for MPP? The motivation is the case below: 1. three big tables need a join operation (about 100 fields per table, more than 1TB per table); 2. besides the above tables, it is very possible that they need

spark 1.0 standalone application

2014-05-19 Thread nit
I am not very comfortable with sbt. I want to build a standalone application using Spark 1.0 RC9. I can build an sbt assembly for my application with Spark 0.9.1, and I think in that case Spark is pulled from the Akka repository? Now if I want to use 1.0 RC9 for my application, what is the process? (FYI

Re: BUG: graph.triplets does not return proper values

2014-05-19 Thread GlennStrycker
I tried adding .copy() everywhere, but still only get one element returned, not even an RDD object. orig_graph.edges.map(_.copy()).flatMap(edge => Seq(edge) ).map(edge => (Edge(edge.copy().srcId, edge.copy().dstId, edge.copy().attr), 1)).reduce( (A,B) => { if (A._1.copy().dstId == B._1.copy().srcI

Re: BUG: graph.triplets does not return proper values

2014-05-19 Thread Reynold Xin
Yea unfortunately you need that as well. When 1.0 is released, you wouldn't need to do that anymore. BTW - you can also just check out the source code from GitHub to build 1.0. The current branch-1.0 branch is already at release-candidate status - so it should be almost identical to the actual

Re: BUG: graph.triplets does not return proper values

2014-05-19 Thread GlennStrycker
Thanks, rxin, this worked! I am having a similar problem with .reduce... do I need to insert .copy() functions in that statement as well? This part works: orig_graph.edges.map(_.copy()).flatMap(edge => Seq(edge) ).map(edge => (Edge(edge.copy().srcId, edge.copy().dstId, edge.copy().attr), 1)).collect

Re: BUG: graph.triplets does not return proper values

2014-05-19 Thread Reynold Xin
This was an optimization that reuses a triplet object in GraphX, and when you do a collect directly on triplets, the same object is returned. It has been fixed in Spark 1.0 here: https://issues.apache.org/jira/browse/SPARK-1188 To work around this in older versions of Spark, you can add a copy step to
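
A sketch of that workaround, copying the triplet fields into fresh tuples before collecting (GraphX 0.9-era API assumed; `graph` stands in for the graph from the original report):

    import org.apache.spark.graphx.Graph

    def collectTriplets(graph: Graph[Int, Int]) =
      graph.triplets
        // Copy the fields out so the reused triplet object no longer matters
        // once the iterator recycles it.
        .map(t => (t.srcId, t.dstId, t.srcAttr, t.dstAttr, t.attr))
        .collect()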

BUG: graph.triplets does not return proper values

2014-05-19 Thread GlennStrycker
graph.triplets does not work -- it returns incorrect results I have a graph with the following edges: orig_graph.edges.collect = Array(Edge(1,4,1), Edge(1,5,1), Edge(1,7,1), Edge(2,5,1), Edge(2,6,1), Edge(3,5,1), Edge(3,6,1), Edge(3,7,1), Edge(4,1,1), Edge(5,1,1), Edge(5,2,1), Edge(5,3,1), Edge(
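
For context, a minimal sketch of the reported call using the first few edges from the list above (GraphX 0.9-era API assumed):

    import org.apache.spark.SparkContext
    import org.apache.spark.graphx.{Edge, Graph}

    def reproduce(sc: SparkContext): Unit = {
      val edges = sc.parallelize(Seq(
        Edge(1L, 4L, 1), Edge(1L, 5L, 1), Edge(1L, 7L, 1), Edge(2L, 5L, 1)))
      // Vertices not listed explicitly get the default attribute (1 here).
      val graph = Graph.fromEdges(edges, 1)
      // Pre-1.0 GraphX reuses a single triplet object internally, so a plain
      // collect can appear to return the same triplet over and over.
      graph.triplets.collect().foreach(println)
    }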

Re: question about Spark repositories in GitHub

2014-05-19 Thread Matei Zaharia
“master” is where development happens, while branch-1.0, branch-0.9, etc are for maintenance releases in those versions. Most likely if you want to contribute you should use master. Some of the other named branches were for big features in the past, but none are actively used now. Matei On May

question about Spark repositories in GitHub

2014-05-19 Thread Gil Vernik
Hello, I am new to the Spark community, so I apologize if I ask something obvious. I am following the document about contributing to Spark, where it's written that I need to fork the https://github.com/apache/spark repository. I got a little bit confused since the repository https://github.com/apach

Re: TorrentBroadcast aka Cornet?

2014-05-19 Thread Matei Zaharia
TorrentBroadcast is actually slightly simpler, but it’s based on that. It has similar performance. I’d like to make it the default in a future version, we just haven’t had a ton of testing with it yet (kind of an oversight in this release unfortunately). Matei On May 19, 2014, at 12:07 AM, And
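
For anyone who wants to try it before it becomes the default, a sketch of opting in via configuration (property name as of the 0.9/1.0-era docs):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("torrent-broadcast-test")
      // The default in this era is HttpBroadcastFactory; opt into the BitTorrent-like one.
      .set("spark.broadcast.factory", "org.apache.spark.broadcast.TorrentBroadcastFactory")
    val sc = new SparkContext(conf)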

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-19 Thread Sean Owen
I don't think a custom classloader is necessary. Well, it occurs to me that this is no new problem. Hadoop, Tomcat, etc. all run custom user code that creates new user objects without reflection. I should go see how that's done. Maybe it's totally valid to set the thread's context classloader for

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-19 Thread Andrew Ash
Sounds like the problem is that classloaders always look in their parents before themselves, and Spark users want executors to pick up classes from their custom code before the ones in Spark plus its dependencies. Would a custom classloader that delegates to the parent after first checking itself
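
A minimal sketch of the child-first delegation Andrew describes -- an illustration of the idea, not Spark's actual implementation:

    import java.net.{URL, URLClassLoader}

    // Checks its own URLs before delegating to the parent -- the reverse of the
    // default parent-first model.
    class ChildFirstClassLoader(urls: Array[URL], parent: ClassLoader)
        extends URLClassLoader(urls, parent) {
      override def loadClass(name: String, resolve: Boolean): Class[_] = synchronized {
        val alreadyLoaded = findLoadedClass(name)
        val c =
          if (alreadyLoaded != null) alreadyLoaded
          else {
            try findClass(name)  // this loader's own jars first
            catch { case _: ClassNotFoundException => super.loadClass(name, resolve) }
          }
        if (resolve) resolveClass(c)
        c
      }
    }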

Re: Calling external classes added by sc.addJar needs to be through reflection

2014-05-19 Thread DB Tsai
Hi Sean, It's true that the issue here is the classloader, and due to the classloader delegation model, users have to use reflection in the executors to pick up the right classloader in order to use those classes added by the sc.addJar API. However, it's very inconvenient for users, and not documented in Spark
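
Concretely, the reflection dance being described looks something like this inside a closure running on an executor (the class name is hypothetical):

    // The context classloader on an executor knows about jars added via sc.addJar.
    val loader = Thread.currentThread().getContextClassLoader
    val clazz = Class.forName("com.example.MyHelper", true, loader)  // hypothetical class
    val instance = clazz.newInstance()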

TorrentBroadcast aka Cornet?

2014-05-19 Thread Andrew Ash
Hi Spark devs, Is the algorithm for TorrentBroadcast the same as Cornet from the paper below? http://www.mosharaf.com/wp-content/uploads/orchestra-sigcomm11.pdf If so it would be nice