Re: SQLQuerySuite error

2014-07-24 Thread Michael Armbrust
Thanks for reporting back. I was pretty confused trying to reproduce the error :) On Thu, Jul 24, 2014 at 1:09 PM, Stephen Boesch wrote: > OK I did find my error. The missing step: > > mvn install > > I should have republished (mvn install) all of the other modules. > > The mvn -pl will r

Re: "Dynamic variables" in Spark

2014-07-24 Thread Neil Ferguson
That would work well for me! Do you think it would be necessary to specify which accumulators should be available in the registry, or would we just broadcast all named accumulators registered in SparkContext and make them available in the registry? Anyway, I'm happy to make the necessary cha

Re: "Dynamic variables" in Spark

2014-07-24 Thread Patrick Wendell
What if we have a registry for accumulators, where you can access them statically by name? - Patrick On Thu, Jul 24, 2014 at 1:51 PM, Neil Ferguson wrote: > I realised that my last reply wasn't very clear -- let me try and clarify. > > The patch for named accumulators looks very useful, however
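A rough sketch of what such a name-based registry could look like, in Scala. AccumulatorRegistry and its register/get methods are hypothetical (nothing like this exists in Spark at this point), the two-argument sc.accumulator(initial, name) call assumes the named-accumulators patch discussed in this thread, and making the registry reachable from task closures on executors is exactly the open question here:

    import scala.collection.concurrent.TrieMap
    import org.apache.spark.{Accumulator, SparkContext}

    // Hypothetical driver-side registry, keyed by accumulator name.
    object AccumulatorRegistry {
      private val byName = TrieMap.empty[String, Accumulator[Long]]

      def register(sc: SparkContext, name: String): Accumulator[Long] =
        byName.getOrElseUpdate(name, sc.accumulator(0L, name))

      def get(name: String): Option[Accumulator[Long]] = byName.get(name)
    }

    // Driver-side usage. On an executor this object would be re-initialized
    // empty, which is why the thread discusses broadcasting the registered
    // accumulators so task code can reach them.
    // val f1Time = AccumulatorRegistry.register(sc, "f1-time")
    // AccumulatorRegistry.get("f1-time").foreach(_ += 42L)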

Re: "Dynamic variables" in Spark

2014-07-24 Thread Neil Ferguson
I realised that my last reply wasn't very clear -- let me try and clarify. The patch for named accumulators looks very useful; however, in Shivaram's example he was able to retrieve the named task metrics (statically) from a TaskMetrics object, as follows: TaskMetrics.get("f1-time") However, I do
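For context, a minimal runnable sketch of how this has to be written today: the accumulator must be captured explicitly in every closure that records the timing, which is the boilerplate that a static TaskMetrics.get("f1-time")-style lookup would remove. The two-argument sc.accumulator(0L, "f1-time") call assumes the named-accumulators patch mentioned above:

    import org.apache.spark.{SparkConf, SparkContext}

    object NamedAccumulatorToday {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("named-acc").setMaster("local[2]"))
        val f1Time = sc.accumulator(0L, "f1-time")   // named accumulator (pending patch)

        val result = sc.parallelize(1 to 1000).map { x =>
          val start = System.nanoTime()
          val y = x * x                              // stand-in for the real "f1" work
          f1Time += System.nanoTime() - start        // must be captured by the closure
          y
        }
        println(s"count = ${result.count()}, f1-time (ns) = ${f1Time.value}")
        sc.stop()
      }
    }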

Re: SQLQuerySuite error

2014-07-24 Thread Stephen Boesch
OK I did find my error. The missing step: mvn install. I should have republished (mvn install) all of the other modules. The mvn -pl will rely on the modules locally published, and so the latest code that I had git pull'ed was not being used (except the sql/core module code). The tests are

SQLQuerySuite error

2014-07-24 Thread Stephen Boesch
Are other developers seeing the following error for the recently added substr() method? If not, any ideas why the following invocation of tests would be failing for me - i.e. how the given invocation would need to be tweaked? mvn -Pyarn -Pcdh5 test -pl sql/core -DwildcardSuites=org.apache.spark.
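For anyone trying to reproduce this, the substr() coverage in SQLQuerySuite boils down to a query of roughly this shape. This is only a sketch: the table and column names are made up, it assumes an existing SparkContext sc, and the table-registration call differs between branches (registerAsTable in the 1.0.x line):

    import org.apache.spark.sql.SQLContext

    case class Person(name: String, age: Int)

    val sqlContext = new SQLContext(sc)
    import sqlContext._   // implicit conversion from RDD[Person] to SchemaRDD

    sc.parallelize(Seq(Person("Alice", 30), Person("Bob", 25))).registerAsTable("people")

    // Exercises the recently added substr() support in sql/core.
    sqlContext.sql("SELECT SUBSTR(name, 1, 2) FROM people").collect().foreach(println)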

Re: continuing processing when errors occur

2014-07-24 Thread Art Peel
Sorry, I sent this to the dev list instead of user. Please ignore. I'll re-post to the correct list. Regards, Art On Thu, Jul 24, 2014 at 11:09 AM, Art Peel wrote: > Our system works with RDDs generated from Hadoop files. It processes each > record in a Hadoop file and for a subset of those

continuing processing when errors occur

2014-07-24 Thread Art Peel
Our system works with RDDs generated from Hadoop files. It processes each record in a Hadoop file and for a subset of those records generates output that is written to an external system via RDD.foreach. There are no dependencies between the records that are processed. If writing to the external s
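One common pattern for this (a hedged sketch, not necessarily what the original poster ends up doing): wrap the external write in scala.util.Try inside the foreach, keep going on failure, and count failures with an accumulator. writeToExternalSystem and the record values are hypothetical stand-ins:

    import scala.util.{Failure, Success, Try}
    import org.apache.spark.{SparkConf, SparkContext}

    object ContinueOnError {
      // Hypothetical stand-in for the real external write, which may throw.
      def writeToExternalSystem(record: String): Unit =
        if (record.startsWith("bad")) throw new RuntimeException(s"write failed for $record")

      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("continue-on-error").setMaster("local[2]"))
        val failures = sc.accumulator(0)

        sc.parallelize(Seq("a", "b", "bad-1", "c")).foreach { record =>
          Try(writeToExternalSystem(record)) match {
            case Success(_) => // written successfully
            case Failure(e) => failures += 1   // count (or log e) and keep processing
          }
        }
        println(s"failed writes: ${failures.value}")
        sc.stop()
      }
    }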

Re: Configuring Spark Memory

2014-07-24 Thread Aaron Davidson
More documentation on this would undoubtedly be useful. Many of the properties changed or were deprecated in Spark 1.0, and I'm not sure our current set of documentation via user lists is up to par, since many of the previous suggestions are deprecated. On Thu, Jul 24, 2014 at 10:14 AM, Martin Goo

Re: Configuring Spark Memory

2014-07-24 Thread Martin Goodson
Great - thanks for the clarification, Aaron. The offer stands for me to write some documentation and an example that covers this without leaving *any* room for ambiguity. -- Martin Goodson | VP Data Science (0)20 3397 1240 On Thu, Jul 24, 2014 at 6:09 PM, Aaron David

Re: Configuring Spark Memory

2014-07-24 Thread Aaron Davidson
Whoops, I was mistaken in my original post last year. By default, there is one executor per node per Spark Context, as you said. "spark.executor.memory" is the amount of memory that the application requests for each of its executors. SPARK_WORKER_MEMORY is the amount of memory a Spark Worker is wil
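To make the distinction concrete, a small sketch (the values are arbitrary examples): spark.executor.memory is set per application, while SPARK_WORKER_MEMORY is set per machine in conf/spark-env.sh:

    import org.apache.spark.{SparkConf, SparkContext}

    // Per-application: memory requested for each executor of this application
    // (one executor per worker node per SparkContext in standalone mode).
    val conf = new SparkConf()
      .setAppName("memory-config-example")
      .set("spark.executor.memory", "4g")
    val sc = new SparkContext(conf)

    // Per-worker (not set in application code): in conf/spark-env.sh,
    //   export SPARK_WORKER_MEMORY=16g
    // caps the total memory that Worker will hand out to executors from
    // all applications running on that machine.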

pre-filtered hadoop RDD use case

2014-07-24 Thread Eugene Cheipesh
Hello, I have an interesting use case for a pre-filtered RDD. I have two solutions that I am not entirely happy with and would like to get some feedback and thoughts. Perhaps it is a use case that could be more explicitly supported in Spark. My data has well-defined semantics for the key values t
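The baseline approach today is to load the Hadoop file and filter on the keys afterwards, which still reads and deserializes every record; a minimal sketch (made-up path and key predicate, assuming an existing SparkContext sc):

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapred.TextInputFormat

    // Plain Hadoop RDD: keys here are byte offsets, standing in for data
    // with real key semantics.
    val raw = sc.hadoopFile[LongWritable, Text, TextInputFormat]("hdfs:///data/input")

    // Filtering after the fact is correct but scans everything, which is
    // what a truly pre-filtered RDD would avoid.
    val wanted = raw.filter { case (key, _) => key.get() % 2 == 0 }
    println(wanted.count())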

GraphX graph partitioning strategy

2014-07-24 Thread Larry Xiao
Hi all, I'm implementing graph partitioning strategies for GraphX, learning from research on graph computing. I have two questions: - a specific implementation question: In the current design, only the vertex IDs of src and dst are provided (PartitionStrategy.scala). And some strategies require knowledge
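To illustrate the first question, a custom strategy currently only gets the source and destination vertex IDs plus the number of partitions; a toy sketch of the existing extension point (the modulo placement is just an example):

    import org.apache.spark.graphx.{PartitionID, PartitionStrategy, VertexId}

    // A toy strategy using only what the current interface provides. Strategies
    // that need extra knowledge (degrees, global structure, ...) cannot obtain
    // it through this signature, which is the issue raised above.
    object SourceModPartition extends PartitionStrategy {
      override def getPartition(src: VertexId, dst: VertexId, numParts: PartitionID): PartitionID =
        (math.abs(src) % numParts).toInt
    }

    // Usage on an existing graph: graph.partitionBy(SourceModPartition)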