Pinging again - any thoughts?

On Wed, 23 Mar 2016 at 09:17 Mike Sukmanowsky <mike.sukmanow...@gmail.com>
wrote:
> Thanks Ted and Silvio. I think I'll need a bit more hand-holding here,
> sorry. The way we use ES Hadoop is in pyspark via
> org.elasticsearch.hadoop.mr.EsOutputFormat in a saveAsNewAPIHadoopFile
> call. Given the Hadoop interop, I wouldn't assume that the EsOutputFormat
> class
> <https://github.com/elastic/elasticsearch-hadoop/blob/master/mr/src/main/java/org/elasticsearch/hadoop/mr/EsOutputFormat.java>
> could be modified to define a new Source and register it via
> MetricsSystem.createMetricsSystem. This feels like a good feature request
> for Spark actually: "Support Hadoop Counters in Input/OutputFormats as
> Spark metrics", but I wanted some feedback first to see if that makes sense.
>
> That said, some of the custom RDD classes
> <https://github.com/elastic/elasticsearch-hadoop/tree/master/spark/core/main/scala/org/elasticsearch/spark/rdd>
> could probably be modified to register a new Source when they read from or
> write to Elasticsearch.
>
> On Tue, 22 Mar 2016 at 15:17 Silvio Fiorito <silvio.fior...@granturing.com>
> wrote:
>
>> Hi Mike,
>>
>> It’s been a while since I worked on a custom Source, but I think all you
>> need to do is put your Source in the org.apache.spark package.
>>
>> Thanks,
>> Silvio
>>
>> From: Mike Sukmanowsky <mike.sukmanow...@gmail.com>
>> Date: Tuesday, March 22, 2016 at 3:13 PM
>> To: Silvio Fiorito <silvio.fior...@granturing.com>,
>> "user@spark.apache.org" <user@spark.apache.org>
>> Subject: Re: Spark Metrics Framework?
>>
>> The Source class is private
>> <https://github.com/apache/spark/blob/v1.4.1/core/src/main/scala/org/apache/spark/metrics/source/Source.scala#L22-L25>
>> to the spark package, and any new Sources added to the metrics registry
>> must be of type Source
>> <https://github.com/apache/spark/blob/v1.4.1/core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala#L144-L152>.
>> So unless I'm mistaken, we can't define a custom source. I linked to 1.4.1
>> code, but the same is true in 1.6.1.
>>
>> On Mon, 21 Mar 2016 at 12:05 Silvio Fiorito <
>> silvio.fior...@granturing.com> wrote:
>>
>>> You could use the metric sources and sinks described here:
>>> http://spark.apache.org/docs/latest/monitoring.html#metrics
>>>
>>> If you want to push the metrics to another system you can define a
>>> custom sink. You can also extend the metrics by defining a custom source.
>>>
>>> From: Mike Sukmanowsky <mike.sukmanow...@gmail.com>
>>> Date: Monday, March 21, 2016 at 11:54 AM
>>> To: "user@spark.apache.org" <user@spark.apache.org>
>>> Subject: Spark Metrics Framework?
>>>
>>> We make extensive use of the elasticsearch-hadoop library for
>>> Hadoop/Spark. In trying to troubleshoot our Spark applications, it'd be
>>> very handy to have access to some of the many metrics
>>> <https://www.elastic.co/guide/en/elasticsearch/hadoop/current/metrics.html>
>>> that the library makes available when running in MapReduce mode. The
>>> library's author noted
>>> <https://discuss.elastic.co/t/access-es-hadoop-stats-from-spark/44913>
>>> that Spark doesn't offer any similar metrics API whereby these metrics
>>> could be reported or aggregated.
>>>
>>> Are there any plans to bring a metrics framework similar to Hadoop's
>>> Counter system to Spark, or is there an alternative means for us to grab
>>> metrics exposed when using the Hadoop APIs to load/save RDDs?
>>>
>>> Thanks,
>>> Mike
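For reference, a minimal sketch of what Silvio's suggestion could look like
against Spark 1.6.x. The class and metric names here (EsHadoopSource,
docsWritten) are made up for illustration, and the code only compiles because
it sits inside an org.apache.spark package, which is exactly the workaround
being discussed:

    // Hypothetical sketch: placing the class in org.apache.spark.metrics.source
    // lets it extend the private[spark] Source trait.
    package org.apache.spark.metrics.source

    import com.codahale.metrics.{Counter, MetricRegistry}

    class EsHadoopSource extends Source {
      override val sourceName: String = "es.hadoop"
      override val metricRegistry: MetricRegistry = new MetricRegistry

      // Example counter that a writer could increment per document indexed.
      val docsWritten: Counter =
        metricRegistry.counter(MetricRegistry.name("docsWritten"))
    }

Registration would then have to happen from similarly placed code, e.g.
SparkEnv.get.metricsSystem.registerSource(new EsHadoopSource), since SparkEnv's
metricsSystem member is likewise private[spark]. Once registered, any sinks
configured in metrics.properties should report the new counters alongside
Spark's built-in sources.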