Thanks Silvio, JIRA submitted https://issues.apache.org/jira/browse/SPARK-14332.
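(For anyone who finds this thread later: the JIRA proposes roughly the bridge Silvio sketches below - read the Hadoop counters off the completed job, then hand them to Spark's metrics system. Here is a hypothetical sketch of that idea, assuming the Codahale/Dropwizard metrics library Spark uses internally; publishCounters and the "hadoopCounters" prefix are made-up names, not part of any patch:)

    import scala.collection.JavaConverters._
    import com.codahale.metrics.{Gauge, MetricRegistry}
    import org.apache.hadoop.mapreduce.Job

    object CounterBridge {
      // Snapshot the Hadoop counters of a finished Job into a Codahale
      // registry. Something like this would have to run wherever
      // SparkHadoopWriter holds the JobContext, per Silvio's note below.
      // (Re-registering the same name throws; a real patch would handle that.)
      def publishCounters(job: Job, registry: MetricRegistry): Unit = {
        for (group <- job.getCounters.asScala; counter <- group.asScala) {
          val value = counter.getValue // fixed snapshot, not a live view
          registry.register(
            MetricRegistry.name("hadoopCounters", group.getName, counter.getName),
            new Gauge[Long] { override def getValue: Long = value })
        }
      }
    }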
On Fri, 25 Mar 2016 at 12:46 Silvio Fiorito <silvio.fior...@granturing.com> wrote:

> Hi Mike,
>
> Sorry, got swamped with work and didn't get a chance to reply.
>
> I misunderstood what you were trying to do. I thought you were just
> looking to create custom metrics vs. looking for the existing Hadoop
> Output Format counters.
>
> I'm not familiar enough with the Hadoop APIs, but I think it would
> require a change to the SparkHadoopWriter
> <https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala>
> class, since it generates the JobContext which is required to read the
> counters. It could then publish the counters to the Spark metrics system.
>
> I would suggest going ahead and submitting a JIRA request if there isn't
> one already.
>
> Thanks,
> Silvio
>
> From: Mike Sukmanowsky <mike.sukmanow...@gmail.com>
> Date: Friday, March 25, 2016 at 10:48 AM
> To: Silvio Fiorito <silvio.fior...@granturing.com>, "user@spark.apache.org" <user@spark.apache.org>
> Subject: Re: Spark Metrics Framework?
>
> Pinging again - any thoughts?
>
> On Wed, 23 Mar 2016 at 09:17 Mike Sukmanowsky <mike.sukmanow...@gmail.com> wrote:
>
>> Thanks Ted and Silvio. I think I'll need a bit more hand-holding here,
>> sorry. The way we use ES Hadoop is in pyspark via
>> org.elasticsearch.hadoop.mr.EsOutputFormat in a saveAsNewAPIHadoopFile
>> call. Given the Hadoop interop, I wouldn't assume that the EsOutputFormat
>> class
>> <https://github.com/elastic/elasticsearch-hadoop/blob/master/mr/src/main/java/org/elasticsearch/hadoop/mr/EsOutputFormat.java>
>> could be modified to define a new Source and register it via
>> MetricsSystem.createMetricsSystem. This feels like a good feature request
>> for Spark actually: "Support Hadoop Counters in Input/OutputFormats as
>> Spark metrics", but I wanted some feedback first to see if that makes
>> sense.
>>
>> That said, some of the custom RDD classes
>> <https://github.com/elastic/elasticsearch-hadoop/tree/master/spark/core/main/scala/org/elasticsearch/spark/rdd>
>> could probably be modified to register a new Source when they perform
>> reading/writing from/to Elasticsearch.
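(For readers following along: the call described above looks roughly like the sketch below, shown in Scala rather than pyspark for concreteness. The node address, index/type, and document fields are placeholders, not settings from our actual job.)

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.io.{NullWritable, Text}
    import org.apache.spark.{SparkConf, SparkContext}
    import org.elasticsearch.hadoop.mr.{EsOutputFormat, LinkedMapWritable}

    object EsWriteExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("es-write-example"))

        // Placeholder connection settings - point these at a real cluster.
        val conf = new Configuration()
        conf.set("es.nodes", "localhost:9200")
        conf.set("es.resource", "myindex/mytype")

        // EsOutputFormat ignores the key; each document travels as a MapWritable.
        val docs = sc.parallelize(Seq("alice", "bob")).map { name =>
          val doc = new LinkedMapWritable()
          doc.put(new Text("name"), new Text(name))
          (NullWritable.get(), doc)
        }

        // The path argument is unused by EsOutputFormat, hence the dummy "-".
        docs.saveAsNewAPIHadoopFile(
          "-",
          classOf[NullWritable],
          classOf[LinkedMapWritable],
          classOf[EsOutputFormat],
          conf)
      }
    }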
>> On Tue, 22 Mar 2016 at 15:17 Silvio Fiorito <silvio.fior...@granturing.com> wrote:
>>
>>> Hi Mike,
>>>
>>> It's been a while since I worked on a custom Source, but I think all
>>> you need to do is put your Source in the org.apache.spark package.
>>>
>>> Thanks,
>>> Silvio
>>>
>>> From: Mike Sukmanowsky <mike.sukmanow...@gmail.com>
>>> Date: Tuesday, March 22, 2016 at 3:13 PM
>>> To: Silvio Fiorito <silvio.fior...@granturing.com>, "user@spark.apache.org" <user@spark.apache.org>
>>> Subject: Re: Spark Metrics Framework?
>>>
>>> The Source class is private
>>> <https://github.com/apache/spark/blob/v1.4.1/core/src/main/scala/org/apache/spark/metrics/source/Source.scala#L22-L25>
>>> to the spark package, and any new Sources added to the metrics registry
>>> must be of type Source
>>> <https://github.com/apache/spark/blob/v1.4.1/core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala#L144-L152>.
>>> So unless I'm mistaken, we can't define a custom source. I linked to
>>> 1.4.1 code, but the same is true in 1.6.1.
>>>
>>> On Mon, 21 Mar 2016 at 12:05 Silvio Fiorito <silvio.fior...@granturing.com> wrote:
>>>
>>>> You could use the metric sources and sinks described here:
>>>> http://spark.apache.org/docs/latest/monitoring.html#metrics
>>>>
>>>> If you want to push the metrics to another system you can define a
>>>> custom sink. You can also extend the metrics by defining a custom
>>>> source.
>>>>
>>>> From: Mike Sukmanowsky <mike.sukmanow...@gmail.com>
>>>> Date: Monday, March 21, 2016 at 11:54 AM
>>>> To: "user@spark.apache.org" <user@spark.apache.org>
>>>> Subject: Spark Metrics Framework?
>>>>
>>>> We make extensive use of the elasticsearch-hadoop library for
>>>> Hadoop/Spark. In trying to troubleshoot our Spark applications, it'd
>>>> be very handy to have access to some of the many metrics
>>>> <https://www.elastic.co/guide/en/elasticsearch/hadoop/current/metrics.html>
>>>> that the library makes available when running in MapReduce mode. The
>>>> library's author noted
>>>> <https://discuss.elastic.co/t/access-es-hadoop-stats-from-spark/44913>
>>>> that Spark doesn't offer any similar metrics API whereby these metrics
>>>> could be reported or aggregated.
>>>>
>>>> Are there any plans to bring a metrics framework similar to Hadoop's
>>>> Counter system to Spark, or is there an alternative means for us to
>>>> grab metrics exposed when using Hadoop APIs to load/save RDDs?
>>>>
>>>> Thanks,
>>>> Mike
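(A closing note for archive readers: Silvio's workaround above - declaring the Source inside Spark's own namespace so it can see the private[spark] Source trait - would look something like the minimal sketch below. EsHadoopSource and bytesWritten are illustrative names, and this leans on internal APIs as of Spark 1.4-1.6, so it could break in later releases.)

    // Lives in Spark's namespace only to reach the private[spark] Source
    // trait and MetricsSystem - an internal-API workaround, not a
    // supported extension point.
    package org.apache.spark.metrics.source

    import com.codahale.metrics.{Counter, MetricRegistry}
    import org.apache.spark.SparkEnv

    class EsHadoopSource extends Source {
      override val sourceName: String = "es.hadoop"
      override val metricRegistry: MetricRegistry = new MetricRegistry()

      // Illustrative counter; bump it from wherever the ES writes happen.
      val bytesWritten: Counter =
        metricRegistry.counter(MetricRegistry.name("bytesWritten"))
    }

    object EsHadoopSource {
      // Register with the running driver/executor's metrics system so any
      // configured sink (console, Graphite, JMX, ...) picks it up.
      def register(): EsHadoopSource = {
        val source = new EsHadoopSource
        SparkEnv.get.metricsSystem.registerSource(source)
        source
      }
    }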