Re: spark_classpath in core/pom.xml and yarn/pom.xml

2014-09-25 Thread Ye Xianjin
Hi Sandy, sorry for bothering you. The tests run OK even with the SPARK_CLASSPATH setting still there, but it gives a config warning and will potentially interfere with other settings, as Marcelo said. The warning goes away if I take it out. And Marcelo, I believe the setting in core/pom should not be u

Re: spark_classpath in core/pom.xml and yarn/pom.xml

2014-09-25 Thread Marcelo Vanzin
BTW I removed it from the yarn pom since it was not used (and actually interfered with a test I was writing). I did not touch the core pom, but I wouldn't be surprised if it's not needed there either. On Thu, Sep 25, 2014 at 3:29 PM, Sandy Ryza wrote: > Hi Ye, > > I think git blame shows me beca

Re: spark_classpath in core/pom.xml and yarn/pom.xml

2014-09-25 Thread Sandy Ryza
Hi Ye, I think git blame shows me because I fixed the formatting in core/pom.xml, but I don't actually know the original reason for setting SPARK_CLASSPATH there. Do the tests run OK if you take it out? -Sandy On Thu, Sep 25, 2014 at 1:59 AM, Ye Xianjin wrote: > hi, Sandy Ryza: > I beli

Re: do MIMA checking before all test cases start?

2014-09-25 Thread Patrick Wendell
Yeah we can also move it first. Wouldn't hurt. On Thu, Sep 25, 2014 at 6:39 AM, Nicholas Chammas wrote: > It might still make sense to make this change if MIMA checks are always > relatively quick, for the same reason we do style checks first. > > On Thu, Sep 25, 2014 at 12:25 AM, Nan Zhu wrote:

Code reading tips for Spark source

2014-09-25 Thread Mozumder, Monir
Folks, I am starting to explore the Spark framework and hope to contribute to it in the future. I was wondering if you have any documentation or tips for quickly understanding the inner workings of the code. I am new to both Spark and Scala and am taking a look at the *Rdd*.scala files in the so
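
One low-cost way into the internals, for readers in the same position, is to run a tiny job locally and print its lineage: each line of toDebugString names an RDD subclass that can then be read under core/src/main/scala/org/apache/spark/rdd/. A minimal sketch (the app name and local master are arbitrary choices, not anything from the thread):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._

    object ExploreRddLineage {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("explore-rdd-lineage").setMaster("local[2]"))

        val counts = sc.parallelize(Seq("a", "b", "a"))
          .map(word => (word, 1))   // narrow transformation
          .reduceByKey(_ + _)       // wide transformation (introduces a shuffle)

        // Each line of the lineage corresponds to an RDD subclass that can then
        // be read in the org.apache.spark.rdd package.
        println(counts.toDebugString)

        sc.stop()
      }
    }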

VertexRDD partition imbalance

2014-09-25 Thread Larry Xiao
Hi all, VertexRDD is partitioned with HashPartitioner, and it exhibits some imbalance across tasks. For example, Connected Components with partition strategy EdgePartition2D: Aggregated Metrics by Executor: Executor ID | Task Time | Total Tasks | Failed Tasks | Succeeded Tasks | Input | Shuffle Read | Shuf
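
A rough sketch of how this kind of observation can be reproduced, assuming a local edge list (the file path below is made up): partition the graph with EdgePartition2D, run connected components, and count how many entries each VertexRDD partition ends up holding.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}

    object VertexBalanceCheck {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("vertex-balance").setMaster("local[4]"))

        // Load an edge list and repartition edges with the 2D strategy from the thread.
        val graph = GraphLoader.edgeListFile(sc, "data/edges.txt")
          .partitionBy(PartitionStrategy.EdgePartition2D)

        val cc = graph.connectedComponents()

        // Count entries per partition of the resulting VertexRDD to see the skew
        // that shows up as task imbalance in the UI's per-executor metrics.
        cc.vertices
          .mapPartitionsWithIndex((i, it) => Iterator((i, it.size)))
          .collect()
          .foreach { case (i, n) => println(s"vertex partition $i holds $n entries") }

        sc.stop()
      }
    }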

Re: Spark SQL use of alias in where clause

2014-09-25 Thread Du Li
Thanks, Yanbo and Nicholas. Now it makes more sense: query optimization is the answer. /Du From: Nicholas Chammas <nicholas.cham...@gmail.com> Date: Thursday, September 25, 2014 at 6:43 AM To: Yanbo Liang <yanboha...@gmail.com> Cc: Du Li <l...@yahoo-inc.com.invalid>, "dev@

Re: MLlib enable extension of the LabeledPoint class

2014-09-25 Thread Niklas Wilcke
Hi Egor Pahomov, thanks for your suggestions. I think I will go with the dirty workaround because I don't want to maintain my own version of Spark for now. Maybe I will do that later, when I feel ready to contribute to the project. Kind Regards, Niklas Wilcke On 25.09.2014 16:27, Egor Pahomov wrote: > I ag

Re: MLlib enable extension of the LabeledPoint class

2014-09-25 Thread Niklas Wilcke
Hi Yu Ishikawa, I'm sorry, but I can't share my code via GitHub at the moment. Hopefully I can in a few months. I don't want to change the type of the label, but that would also be a very nice improvement. Making LabeledPoint abstract is exactly what I need. That enables me to create a class like
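
For illustration only, a hypothetical sketch of the kind of subclassing this proposal would allow. The real mllib.regression.LabeledPoint is a case class (not abstract) as of 1.1, so the AbstractLabeledPoint and RecordPair names below are made up and do not compile against the current API; they only show the shape of what is being asked for:

    import org.apache.spark.mllib.linalg.{Vector, Vectors}

    // Hypothetical: what an abstract base would look like if LabeledPoint were opened up.
    abstract class AbstractLabeledPoint(val label: Double, val features: Vector)

    // A duplicate-detection flavored subclass carrying extra record information.
    class RecordPair(
        label: Double,
        features: Vector,
        val leftId: String,
        val rightId: String)
      extends AbstractLabeledPoint(label, features)

    object AbstractLabeledPointSketch {
      def main(args: Array[String]): Unit = {
        val p = new RecordPair(1.0, Vectors.dense(0.3, 0.9), "rec-17", "rec-42")
        println(s"${p.leftId} vs ${p.rightId}: label=${p.label}")
      }
    }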

Re: MLlib enable extension of the LabeledPoint class

2014-09-25 Thread Egor Pahomov
I agree with Yu that you should tell us more about your intentions, but a possible dirty workaround is to create a wrapper class for LabeledPoint with all the additional information you need, unwrap the values before training, and wrap them again afterwards (look at zipWithIndex - it helps match back additional informa
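
A minimal sketch of that wrapper-plus-zipWithIndex workaround against the MLlib 1.x API; the EnrichedPoint case class, record ids, and toy data below are hypothetical, not from the thread:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._
    import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint

    // Hypothetical wrapper carrying extra metadata next to each LabeledPoint.
    case class EnrichedPoint(point: LabeledPoint, recordId: String)

    object WrapperWorkaround {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("wrapper-workaround").setMaster("local[2]"))

        val enriched = sc.parallelize(Seq(
          EnrichedPoint(LabeledPoint(1.0, Vectors.dense(1.0, 0.0)), "rec-1"),
          EnrichedPoint(LabeledPoint(0.0, Vectors.dense(0.0, 1.0)), "rec-2")
        ))

        // Unwrap before training ...
        val model = LogisticRegressionWithSGD.train(enriched.map(_.point), 10)

        // ... then use zipWithIndex to match predictions back to the metadata.
        val indexedMeta = enriched.zipWithIndex().map(_.swap)
        val indexedPred = enriched.map(e => model.predict(e.point.features)).zipWithIndex().map(_.swap)
        indexedMeta.join(indexedPred).values.collect().foreach {
          case (e, prediction) => println(s"${e.recordId}: predicted $prediction")
        }

        sc.stop()
      }
    }

The join on the zipWithIndex keys lines up because both indexed RDDs derive from the same enriched RDD with no shuffle in between, so element order and partitioning match.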

Re: MLlib enable extension of the LabeledPoint class

2014-09-25 Thread Yu Ishikawa
Hi Egor Pahomov, Thank you for your comment! -- Yu Ishikawa -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/MLlib-enable-extension-of-the-LabeledPoint-class-tp8546p8551.html Sent from the Apache Spark Developers List mailing list archive at Nab

Re: MLlib enable extension of the LabeledPoint class

2014-09-25 Thread Egor Pahomov
@Yu Ishikawa, I think the right place for such a discussion is https://issues.apache.org/jira/browse/SPARK-3573 2014-09-25 18:02 GMT+04:00 Yu Ishikawa : > Hi Niklas Wilcke, > > As you said, it is difficult to extend LabeledPoint class in > mlli

Re: MLlib enable extension of the LabeledPoint class

2014-09-25 Thread Yu Ishikawa
Hi Niklas Wilcke, As you said, it is difficult to extend the LabeledPoint class in mllib.regression. Do you want to extend the LabeledPoint class in order to use a type other than Double? If you have your code on GitHub, could you show it to us? I want to know what you want to do. > Community By t

Re: Spark SQL use of alias in where clause

2014-09-25 Thread Nicholas Chammas
That is correct. Aliases in the SELECT clause can only be referenced in the ORDER BY and HAVING clauses. Otherwise, you'll have to just repeat the statement, like concat() in this case. A more elegant alternative, which is probably not available in Spark SQL yet, is to use Common Table Expressions
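
A small sketch of the repeat-the-expression pattern described above, using the 1.1-era SQLContext API and a simple arithmetic expression in place of the thread's concat() so it stays within the basic SQL parser; the Person case class, table name, and data are made up:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // Hypothetical record type; the table and column names are illustrative only.
    case class Person(name: String, age: Int)

    object AliasInWhereSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("alias-in-where").setMaster("local[2]"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.createSchemaRDD

        sc.parallelize(Seq(Person("Ada", 36), Person("Alan", 41)))
          .registerTempTable("people")

        // The alias double_age from the SELECT clause cannot be referenced in
        // WHERE, so the expression is simply repeated there.
        val result = sqlContext.sql(
          "SELECT name, age * 2 AS double_age FROM people WHERE age * 2 > 75")

        result.collect().foreach(println)
        sc.stop()
      }
    }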

Re: do MIMA checking before all test cases start?

2014-09-25 Thread Nicholas Chammas
It might still make sense to make this change if MIMA checks are always relatively quick, for the same reason we do style checks first. On Thu, Sep 25, 2014 at 12:25 AM, Nan Zhu wrote: > yeah, I tried that, but there is always an issue when I ran dev/mima, > > it always gives me some binary comp

MLlib enable extension of the LabeledPoint class

2014-09-25 Thread Niklas Wilcke
Hi Spark developers, I am trying to implement a framework with Spark and MLlib to do duplicate detection. I'm not familiar with Spark and Scala, so please be patient with me. In order to enrich the LabeledPoint class with some information, I tried to extend it and add some properties. But the ML algorit

Re: spark_classpath in core/pom.xml and yarn/pom.xml

2014-09-25 Thread Ye Xianjin
Hi Sandy Ryza, I believe it was you who originally added SPARK_CLASSPATH in core/pom.xml, in the org.scalatest section. Is this still needed in 1.1? I noticed this setting because when I looked into unit-tests.log, it shows something like the following: > 14/09/24 23:57:19.246 WARN SparkConf: >
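
For context, the SparkConf warning quoted above is presumably the deprecation notice for the SPARK_CLASSPATH environment variable. A minimal sketch of the replacement that notice points at (the jar path is made up; the driver-side equivalent is normally passed to spark-submit via --driver-class-path rather than set programmatically):

    import org.apache.spark.{SparkConf, SparkContext}

    object ExtraClasspathSketch {
      def main(args: Array[String]): Unit = {
        // spark.executor.extraClassPath is the config key that supersedes the
        // SPARK_CLASSPATH environment variable for executor-side jars.
        val conf = new SparkConf()
          .setAppName("extra-classpath-sketch")
          .setMaster("local[2]")
          .set("spark.executor.extraClassPath", "/opt/libs/extra.jar")

        val sc = new SparkContext(conf)
        println(sc.getConf.get("spark.executor.extraClassPath"))
        sc.stop()
      }
    }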