Re: NoClassDefFoundError with Spark 1.3

2015-05-08 Thread Ganelin, Ilya
From: Olivier Girardot <ssab...@gmail.com> Date: Friday, May 8, 2015 at 6:40 AM To: Akhil Das <ak...@sigmoidanalytics.com>, "Ganelin, Ilya" <ilya.gane...@capitalone.com> Cc: dev <dev@spark.apache.org> Subject: Re: N

NoClassDefFoundError with Spark 1.3

2015-05-07 Thread Ganelin, Ilya
Hi all – I’m attempting to build a project with SBT and run it on Spark 1.3 (this previously worked before we upgraded to CDH 5.4 with Spark 1.3). I have the following in my build.sbt: scalaVersion := "2.10.4" libraryDependencies ++= Seq( "org.apache.spark" %% "spark-core" % "1.3.0" % "prov
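
For reference, a minimal build.sbt sketch along the lines described above, assuming the truncated dependency scope is "provided" so the CDH-supplied Spark jars are used at runtime:

  scalaVersion := "2.10.4"

  libraryDependencies ++= Seq(
    // "provided" keeps spark-core out of the application jar; the cluster supplies it at runtime
    "org.apache.spark" %% "spark-core" % "1.3.0" % "provided"
  )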

Re: Should we let everyone set Assignee?

2015-04-22 Thread Ganelin, Ilya
As a contributor, I've never felt shut out from the Spark community, nor have I seen any examples of territorial behavior. A few times I've expressed interest in more challenging work and the response I received was generally "go ahead and give it a shot, just understand that this is sensitive code

Problems with cleanup throughout code base

2015-03-30 Thread Ganelin, Ilya
Hi all, when looking into a fix for a deadlock in the SparkContext shutdown code for https://issues.apache.org/jira/browse/SPARK-6492, I noticed that the “isStopped” flag is set to true before executing the actual shutdown code. This is a problem since it means that if the shutdown sequence does
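
A rough sketch of the ordering concern, with hypothetical names rather than the actual SparkContext code: if the flag flips before cleanup runs, a failure mid-shutdown leaves the object claiming to be stopped while resources are still live, whereas setting the flag only after the cleanup (or in a finally around it) avoids that.

  // Illustrative pattern only, not the real SparkContext implementation.
  class Service {
    @volatile private var isStopped = false

    def stop(): Unit = synchronized {
      if (!isStopped) {
        try {
          releaseResources()   // the actual shutdown work
        } finally {
          isStopped = true     // mark stopped only once the shutdown sequence has run
        }
      }
    }

    private def releaseResources(): Unit = { /* stop schedulers, close pools, etc. */ }
  }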

Spark tests hang on local machine due to "testGuavaOptional" in JavaAPISuite

2015-03-10 Thread Ganelin, Ilya
Hi all – building Spark on my local machine with build/mvn clean package test runs until it hits the JavaAPISuite where it hangs indefinitely. Through some experimentation, I’ve narrowed it down to the following test: /** * Test for SPARK-3647. This test needs to use the maven-built assembly t

Re: Highly interested in contributing to spark

2015-01-02 Thread Ganelin, Ilya
I might be seeing a similar error - I'm trying to build behind a proxy. I was able to build until recently, but now when I run mvn clean package, I get the following errors: I would love to know what's going on here. Exception in thread "pool-1-thread-1" Exception in thread "main" java.lang.Excep

RE: Newest ML-Lib on Spark 1.1

2014-12-12 Thread Ganelin, Ilya
Thanks! Sent with Good (www.good.com) -Original Message- From: Sean Owen [so...@cloudera.com] Sent: Friday, December 12, 2014 04:54 PM Eastern Standard Time To: Ganelin, Ilya Cc: dev Subject: Re: Newest ML-Lib on Spark 1.1 Could you specify what problems yo

Newest ML-Lib on Spark 1.1

2014-12-12 Thread Ganelin, Ilya
Hi all – we’re running CDH 5.2 and would be interested in having the latest and greatest ML Lib version on our cluster (with YARN). Could anyone help me out in terms of figuring out what build profiles to use to get this to play well? Will I be able to update ML-Lib independently of updating the

Adding RDD function to segment an RDD (like substring)

2014-12-09 Thread Ganelin, Ilya
Hi all – a utility that I’ve found useful several times now when working with RDDs is to be able to reason about segments of the RDD. For example, if I have two large RDDs and I want to combine them in a way that would be intractable in terms of memory or disk storage (e.g. a cartesian) but a p
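
A minimal sketch of what such a helper might look like (the name and signature are illustrative, not the actual proposal from the thread), using zipWithIndex plus a range filter:

  import org.apache.spark.rdd.RDD
  import scala.reflect.ClassTag

  // Hypothetical "substring for RDDs": keep only elements whose position falls in [start, end).
  def segment[T: ClassTag](rdd: RDD[T], start: Long, end: Long): RDD[T] =
    rdd.zipWithIndex()
       .filter { case (_, idx) => idx >= start && idx < end }
       .map(_._1)

  // e.g. combine one manageable slice at a time instead of materializing the full cartesian:
  // segment(bigRdd, 0L, 1000000L).cartesian(otherRdd)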

Re: Handling stale PRs

2014-12-08 Thread Ganelin, Ilya
Thank you for pointing this out, Nick. I know that for myself and my colleague who are starting to contribute to Spark, it's definitely discouraging to have fixes sitting in the pipeline. Could you recommend any other ways that we can facilitate getting these PRs accepted? Clean, well-tested code i

Re: Spurious test failures, testing best practices

2014-11-30 Thread Ganelin, Ilya
Hi, Patrick - with regards to testing on Jenkins, is the process for this to submit a pull request for the branch or is there another interface we can use to submit a build to Jenkins for testing? On 11/30/14, 6:49 PM, "Patrick Wendell" wrote: >Hey Ryan, > >A few more things here. You should fee

Re: Trouble testing after updating to latest master

2014-11-29 Thread Ganelin, Ilya
"sbt/sbt compile"? Also, if you can, can you reproduce >it if you checkout only the spark master branch and not merged with >your own code? Finally, if you can reproduce it on master, can you >perform a bisection to find out which commit caused it? > >- Patrick > >On

Trouble testing after updating to latest master

2014-11-29 Thread Ganelin, Ilya
Hi all – I’ve just merged in the latest changes from the Spark master branch to my local branch. I am able to build just fine with mvn clean package. However, when I attempt to run dev/run-tests, I get the following error: Using /Library/Java/JavaVirtualMachines/jdk1.8.0_20.jdk/Contents/Home as d

Re: Skipping Bad Records in Spark

2014-11-14 Thread Ganelin, Ilya
Hi Qiuzhuang - you have two options: 1) Within the map step, define a validation function that will be executed on every record. 2) Use the filter function to create a filtered dataset prior to processing. On 11/14/14, 10:28 AM, "Qiuzhuang Lian" wrote: >Hi, > >MapReduce has the feature of skippi
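
A small sketch of both options on a hypothetical RDD[String] of raw comma-separated lines (the names and parsing logic are illustrative):

  import scala.util.Try

  // Option 2: filter out records that fail a validation predicate before processing.
  val valid = rawLines.filter(line => line.split(",").length == 3)

  // Option 1 (variant): validate inside the map step, silently dropping records that fail to parse.
  val parsed = rawLines.flatMap(line => Try(line.split(",")(1).toDouble).toOption)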

RE: Spark- How can I run MapReduce only on one partition in an RDD?

2014-11-13 Thread Ganelin, Ilya
want to use one partition but create a large RDD. I just want to map each partition one by one. So I can quickly get the early map result from the RDD. That's why I want to read a file on HDFS to create multiple RDDs. Any suggestions? Thanks, Tim 2014-11-13 17:05 GMT-06:00 Ganelin, Ilya
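
One way to get early results partition by partition (a sketch, not necessarily the approach recommended in the thread) is to run a job over a single partition index at a time with SparkContext.runJob, using the Spark 1.x overload that takes a partition list:

  // `sc` and `rdd` (an RDD[String]) are assumed to already exist; the per-partition function is illustrative.
  for (p <- 0 until rdd.partitions.length) {
    // last argument is allowLocal = false in the Spark 1.x signature
    val Array(count) = sc.runJob(rdd, (iter: Iterator[String]) => iter.count(_.nonEmpty), Seq(p), false)
    println(s"partition $p -> $count non-empty records")  // available as soon as this one partition finishes
  }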

RE: Appropriate way to add a debug flag

2014-11-07 Thread Ganelin, Ilya
PM Eastern Standard Time To: Ganelin, Ilya; dev Subject: Re: Appropriate way to add a debug flag (Whoops, forgot to copy dev@ in my original reply; adding it back) Yeah, the GraphViz part was mostly for fun and for understanding cyclic object graphs. In general, an object graph might contain cyc

Appropriate way to add a debug flag

2014-11-05 Thread Ganelin, Ilya
Hello all – I am working on https://issues.apache.org/jira/browse/SPARK-3694 and would like to understand the appropriate mechanism by which to check for a debug flag before printing a graph traversal of dependencies of an RDD or Task. I understand that I can use the logging utility and use logD
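
A rough sketch of the guard pattern in question, assuming a class that mixes in Spark's Logging trait; the traversal helper and the `rdd` value are hypothetical:

  // Build the expensive debug string only when debug logging is enabled.
  // logDebug already takes its message by name, so the guard is optional,
  // but an explicit check makes the intent clear when the traversal is costly.
  if (log.isDebugEnabled) {
    logDebug(s"Dependency graph for RDD ${rdd.id}:\n" + traverseDependencies(rdd))
  }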