Thanks for reporting back. I was pretty confused trying to reproduce the
error :)
On Thu, Jul 24, 2014 at 1:09 PM, Stephen Boesch wrote:
> OK I did find my error. The missing step:
>
> mvn install
>
> I should have republished (mvn install) all of the other modules.
>
> The mvn -pl will r
That would work well for me! Do you think it would be necessary to specify
which accumulators should be available in the registry, or would we just
broadcast all named accumulators registered in SparkContext and make them
available in the registry?
Anyway, I'm happy to make the necessary cha
What if we have a registry for accumulators, where you can access them
statically by name?
- Patrick
On Thu, Jul 24, 2014 at 1:51 PM, Neil Ferguson wrote:
> I realised that my last reply wasn't very clear -- let me try and clarify.
>
> The patch for named accumulators looks very useful, however
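For illustration, a minimal sketch of what such a name-keyed registry might
look like. AccumulatorRegistry and its register/get methods are hypothetical
names, not an existing Spark API:

    // Hypothetical sketch only -- not part of Spark.
    import java.util.concurrent.ConcurrentHashMap
    import org.apache.spark.Accumulator

    object AccumulatorRegistry {
      private val accumulators = new ConcurrentHashMap[String, Accumulator[_]]()

      // Called when a named accumulator is created (e.g. by SparkContext).
      def register(name: String, acc: Accumulator[_]): Unit = {
        accumulators.put(name, acc)
      }

      // Static lookup by name, e.g. from task code.
      def get[T](name: String): Option[Accumulator[T]] =
        Option(accumulators.get(name)).map(_.asInstanceOf[Accumulator[T]])
    }

Whether every named accumulator registered in SparkContext would be broadcast
into such a registry, or only selected ones, is the open question raised in
Neil's reply above.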
I realised that my last reply wasn't very clear -- let me try and clarify.
The patch for named accumulators looks very useful, however in Shivaram's
example he was able to retrieve the named task metrics (statically) from a
TaskMetrics object, as follows:
TaskMetrics.get("f1-time")
However, I do
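For context, the named-accumulators patch being discussed attaches a name when
the accumulator is created on the driver, roughly along these lines (the exact
signature may differ from the patch; sc is assumed to be an existing
SparkContext, rdd an existing RDD, and timeSpentIn a hypothetical helper):

    // Driver side: create an accumulator with a display name.
    val f1Time = sc.accumulator(0L, "f1-time")

    // Task side today: the accumulator must be captured in the closure.
    rdd.foreach { record =>
      f1Time += timeSpentIn(record)
    }

The appeal of Shivaram's TaskMetrics.get("f1-time") example, and of a registry,
appears to be looking the value up by name from task code instead of capturing
the accumulator in the closure.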
OK I did find my error. The missing step:
mvn install
I should have republished (mvn install) all of the other modules.
The mvn -pl build relies on the locally published modules, so the latest
code that I had git pulled was not being used (except for the sql/core
module code).
The tests are
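In other words, the sequence that avoids the stale-module problem is to
publish everything locally first and only then run the module-scoped build
(the wildcard suite value below is a placeholder for the suite under test):

    mvn install
    mvn -Pyarn -Pcdh5 test -pl sql/core -DwildcardSuites=<suite>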
Are other developers seeing the following error for the recently added
substr() method? If not, any ideas why the following invocation of tests
would be failing for me - i.e. how the given invocation would need to be
tweaked?
mvn -Pyarn -Pcdh5 test -pl sql/core
-DwildcardSuites=org.apache.spark.
Sorry, I sent this to the dev list instead of user. Please ignore. I'll
re-post to the correct list.
Regards,
Art
On Thu, Jul 24, 2014 at 11:09 AM, Art Peel wrote:
> Our system works with RDDs generated from Hadoop files. It processes each
> record in a Hadoop file and for a subset of those
Our system works with RDDs generated from Hadoop files. It processes each
record in a Hadoop file and for a subset of those records generates output
that is written to an external system via RDD.foreach. There are no
dependencies between the records that are processed.
If writing to the external s
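As a sketch of the pattern being described, writing from within the RDD often
looks like the following, using foreachPartition rather than plain foreach so
one connection to the external system can be reused per partition.
ExternalClient, shouldEmit and toOutput are hypothetical stand-ins for the
application's real client and record-handling logic:

    rdd.foreachPartition { records =>
      val client = ExternalClient.connect()   // hypothetical client
      try {
        // Only a subset of the records generates output.
        records.filter(shouldEmit).foreach { rec =>
          client.write(toOutput(rec))
        }
      } finally {
        client.close()
      }
    }

Because there are no dependencies between the records, each partition can be
written out independently.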
More documentation on this would undoubtedly be useful. Many of the
properties changed or were deprecated in Spark 1.0, and I'm not sure our
current set of documentation via user lists is up to par, since many of the
previous suggestions are now deprecated.
On Thu, Jul 24, 2014 at 10:14 AM, Martin Goo
Great - thanks for the clarification, Aaron. The offer stands for me to
write some documentation and an example that covers this without leaving
*any* room for ambiguity.
--
Martin Goodson | VP Data Science
(0)20 3397 1240
On Thu, Jul 24, 2014 at 6:09 PM, Aaron David
Whoops, I was mistaken in my original post last year. By default, there is
one executor per node per Spark Context, as you said.
"spark.executor.memory" is the amount of memory that the application
requests for each of its executors. SPARK_WORKER_MEMORY is the amount of
memory a Spark Worker is wil
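For reference, spark.executor.memory is a per-application setting, e.g. via
SparkConf (the 4g figure is only an illustration), while SPARK_WORKER_MEMORY
is configured on the worker side (typically in spark-env.sh) and caps the
total memory the worker will hand out to executors:

    import org.apache.spark.{SparkConf, SparkContext}

    // Each executor launched for this application requests 4g.
    val conf = new SparkConf()
      .setAppName("example")
      .set("spark.executor.memory", "4g")
    val sc = new SparkContext(conf)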
Hello,
I have an interesting use case for a pre-filtered RDD. I have two solutions
that I am not entirely happy with and would like to get some feedback and
thoughts. Perhaps it is a use case that could be more explicitly supported
in Spark.
My data has well-defined semantics for the key values t
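As a baseline for comparison, pre-filtering a keyed RDD before the main
computation is simply the following, where keyedRdd, keyPredicate and process
are hypothetical stand-ins for the data and for whatever the key semantics
allow:

    // Restrict the RDD to the keys of interest up front.
    val filtered = keyedRdd.filter { case (key, _) => keyPredicate(key) }
    val results  = filtered.mapValues(process)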
Hi all,
I'm implementing a graph partitioning strategy for GraphX, learning from
research on graph computing.
I have two questions:
- a specific implementation question:
In the current design, only the vertex IDs of src and dst are provided
(PartitionStrategy.scala).
And some strategies require knowledge
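For reference, a custom strategy really does only get the two vertex IDs, as
in the sketch below. SimpleEdgeHashPartition is a made-up example, and the
exact trait signature should be checked against PartitionStrategy.scala:

    import org.apache.spark.graphx.{PartitionID, PartitionStrategy, VertexId}

    // Made-up example strategy: hash the (src, dst) pair to a partition.
    // getPartition only sees the two vertex IDs, so any extra knowledge a
    // strategy needs (degrees, labels, ...) must come from elsewhere.
    object SimpleEdgeHashPartition extends PartitionStrategy {
      override def getPartition(src: VertexId, dst: VertexId,
                                numParts: PartitionID): PartitionID = {
        ((src, dst).hashCode() & Int.MaxValue) % numParts
      }
    }

A strategy object like this is what Graph.partitionBy accepts.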