Re: "Dynamic variables" in Spark

2014-08-15 Thread Neil Ferguson
I've opened SPARK-3051 (https://issues.apache.org/jira/browse/SPARK-3051) based on this thread.

Neil

On Thu, Jul 24, 2014 at 10:30 PM, Neil Ferguson wrote:
> That would work well for me! Do you think it would be necessary to specify
> which accumulators should be available in the registry, or ...

Re: "Dynamic variables" in Spark

2014-07-24 Thread Neil Ferguson
That would work well for me! Do you think it would be necessary to specify which accumulators should be available in the registry, or would we just broadcast all named accumulators registered in SparkContext and make them available in the registry? Anyway, I'm happy to make the necessary changes ...

Re: "Dynamic variables" in Spark

2014-07-24 Thread Patrick Wendell
What if we have a registry for accumulators, where you can access them statically by name?

- Patrick

On Thu, Jul 24, 2014 at 1:51 PM, Neil Ferguson wrote:
> I realised that my last reply wasn't very clear -- let me try and clarify.
>
> The patch for named accumulators looks very useful, however ...
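[A minimal sketch of the registry idea floated here. The AccumulatorRegistry object and its methods are hypothetical, not Spark API; only Accumulator itself is real.]

    import scala.collection.concurrent.TrieMap
    import org.apache.spark.Accumulator

    // Hypothetical registry: named accumulators are registered once and can
    // then be looked up statically by name from anywhere in task code.
    object AccumulatorRegistry {
      private val byName = TrieMap.empty[String, Accumulator[_]]

      // Called when a named accumulator is created (and, for task-side lookup
      // to work, again on each executor when the task deserializes it).
      def register(name: String, acc: Accumulator[_]): Unit = {
        byName.put(name, acc)
      }

      // Static lookup by name, no handle threaded through the call chain.
      def get[T](name: String): Option[Accumulator[T]] =
        byName.get(name).map(_.asInstanceOf[Accumulator[T]])
    }

Task code could then write AccumulatorRegistry.get[Long]("f1-time").foreach(_ += elapsed). The open question -- discussed in the reply above -- is how named accumulators reach the registry on the executors, hence the idea of broadcasting everything registered in SparkContext.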

Re: "Dynamic variables" in Spark

2014-07-24 Thread Neil Ferguson
I realised that my last reply wasn't very clear -- let me try and clarify. The patch for named accumulators looks very useful; however, in Shivaram's example he was able to retrieve the named task metrics (statically) from a TaskMetrics object, as follows:

    TaskMetrics.get("f1-time")

However, I do ...
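[To make the contrast concrete, here is roughly what works today -- a sketch assuming the named-accumulator overload from the patch under discussion (plain sc.accumulator(0L) works without it). The accumulator reaches the task only because the closure captures the f1Time handle, which is exactly what library code with a fixed signature cannot rely on.]

    import org.apache.spark.{SparkConf, SparkContext}

    object ClosureCaptureToday {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("f1-time").setMaster("local[2]"))

        // Named accumulator, per the patch Patrick mentions.
        val f1Time = sc.accumulator(0L, "f1-time")

        sc.parallelize(1 to 1000).foreach { n =>
          val start = System.nanoTime()
          math.sqrt(n.toDouble)                // stand-in for the real work
          f1Time += System.nanoTime() - start  // handle captured by the closure
        }

        println(s"f1-time: ${f1Time.value} ns")
        sc.stop()
      }
    }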

Re: "Dynamic variables" in Spark

2014-07-23 Thread Neil Ferguson
Hi Patrick. That looks very useful. The thing that seems to be missing from Shivaram's example is the ability to access TaskMetrics statically (this is the same problem that I am trying to solve with dynamic variables). You mention defining an accumulator on the RDD. Perhaps I am missing ...

Re: "Dynamic variables" in Spark

2014-07-22 Thread Patrick Wendell
Shivaram, You should take a look at this patch, which adds support for naming accumulators -- it is likely to get merged in soon. I actually started this patch by supporting named TaskMetrics, similar to what you have there, but then I realized there is too much semantic overlap with accumulators, ...

Re: "Dynamic variables" in Spark

2014-07-22 Thread Neil Ferguson
Hi Christopher,

Thanks for your reply. I'll try and address your points -- please let me know if I missed anything. Regarding clarifying the problem statement, let me try and do that with a real-world example. I have a method whose performance I want to measure, which has the following signature ...
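[The preview cuts off before Neil's actual signature, so the method below is purely illustrative -- hypothetical names, not his code. It shows the shape of the problem: a library method running inside a task whose signature offers no way to pass in an accumulator or metrics object.]

    // Hypothetical library method: the signature is fixed by an API, so the
    // caller cannot thread an accumulator through to it. Where should the
    // elapsed time be recorded?
    def transform(records: Seq[String]): Seq[String] = {
      val start = System.nanoTime()
      val result = records.map(_.toUpperCase)
      val elapsed = System.nanoTime() - start
      // With a dynamic variable or a registry, something like
      //   Metrics.record("transform-time", elapsed)   // hypothetical
      // could go here without changing the signature.
      result
    }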

Re: "Dynamic variables" in Spark

2014-07-22 Thread Shivaram Venkataraman
From reading Neil's first e-mail, I think the motivation is to get some metrics in ADAM? I've run into a similar use-case with having user-defined metrics in long-running tasks, and I think a nice way to solve this would be to have user-defined TaskMetrics. To state my problem more clearly, ...
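[A sketch of what user-defined TaskMetrics might look like, following Shivaram's TaskMetrics.get("f1-time") example. The object and its methods are hypothetical, not Spark API; the per-thread map stands in for state an executor would reset at task start and ship back with the task result.]

    import scala.collection.mutable

    object TaskMetrics {
      // One mutable map per task thread.
      private val perThread = new ThreadLocal[mutable.Map[String, Long]] {
        override def initialValue(): mutable.Map[String, Long] = mutable.Map.empty
      }

      // Look up (or lazily create) a named metric for the current task thread.
      def get(name: String): Metric = new Metric(name)

      class Metric(name: String) {
        def +=(delta: Long): Unit = {
          val m = perThread.get()
          m(name) = m.getOrElse(name, 0L) + delta
        }
        def value: Long = perThread.get().getOrElse(name, 0L)
      }
    }

Code anywhere on a task's call stack could then write TaskMetrics.get("f1-time") += elapsed, matching the static-access pattern in Shivaram's example.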

Re: "Dynamic variables" in Spark

2014-07-22 Thread Neil Ferguson
Hi Reynold,

Thanks for your reply. Accumulators are, of course, stored in the Accumulators object as thread-local variables. However, the Accumulators object isn't public, so when a Task is executing there's no way to get the set of accumulators for the current thread -- accumulators still have to ...
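[For reference, this is what a "dynamic variable" means in Scala terms: scala.util.DynamicVariable (standard library) binds a value for the dynamic extent of a block, visible on that thread to everything the block calls. The metric plumbing below is made up for illustration; it is the mechanism, not a proposed Spark API.]

    import scala.util.DynamicVariable

    object DynamicVariableExample {
      // Bound around task execution; readable from any code the task calls.
      val taskAccumulators =
        new DynamicVariable[Map[String, Long => Unit]](Map.empty)

      def main(args: Array[String]): Unit = {
        var total = 0L
        taskAccumulators.withValue(Map("f1-time" -> ((d: Long) => total += d))) {
          deepLibraryCall()
        }
        println(s"f1-time: $total ns")
      }

      // No accumulator parameter, yet it can still record the metric.
      def deepLibraryCall(): Unit = {
        val start = System.nanoTime()
        Thread.sleep(10) // stand-in for the real work
        taskAccumulators.value.get("f1-time")
          .foreach(_(System.nanoTime() - start))
      }
    }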

Re: "Dynamic variables" in Spark

2014-07-21 Thread Reynold Xin
Thanks for the thoughtful email, Neil and Christopher. If I understand this correctly, it seems like the dynamic variable is just a variant of the accumulator (a static one, since it is a global object). Accumulators are already implemented using thread-local variables under the hood. Am I misunderstanding ...

Re: "Dynamic variables" in Spark

2014-07-21 Thread Christopher Nguyen
Hi Neil, first off, I'm generally a sympathetic advocate for making changes to Spark internals to make it easier/better/faster/more awesome. In this case, I'm (a) not clear about what you're trying to accomplish, and (b) a bit worried about the proposed solution. On (a): it is stated that you want ...