How about using `SparkListener`? You can collect I/O statistics through `TaskMetrics#inputMetrics` yourself.
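As a rough sketch of the idea (written against the Spark 2.x listener API; the listener class name and the aggregation into a single counter are my own illustration, not an official recipe):

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Hypothetical listener that sums bytes read across all finished tasks.
class InputBytesListener extends SparkListener {
  @volatile var totalBytesRead: Long = 0L

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    // taskMetrics can be null for failed/ignored tasks, so guard first.
    val metrics = taskEnd.taskMetrics
    if (metrics != null) {
      totalBytesRead += metrics.inputMetrics.bytesRead
    }
  }
}

// Registration, assuming an existing SparkContext `sc`:
//   val listener = new InputBytesListener()
//   sc.addSparkListener(listener)
//   // ... run jobs, then inspect listener.totalBytesRead
```

Note that in Spark 1.6 `inputMetrics` is an `Option[InputMetrics]`, so you would pattern-match on it instead of reading it directly.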
// maropu

On Mon, Jul 4, 2016 at 11:46 AM, Pedro Rodriguez <ski.rodrig...@gmail.com> wrote:
> Hi All,
>
> I noticed on some Spark jobs it shows you input/output read size. I am
> implementing a custom RDD which reads files and would like to report these
> metrics to Spark since they are available to me.
>
> I looked through the RDD source code and a couple different
> implementations and the best I could find were some Hadoop metrics. Is
> there a way to simply report the number of bytes a partition read so Spark
> can put it on the UI?
>
> Thanks,
> --
> Pedro Rodriguez
> PhD Student in Large-Scale Machine Learning | CU Boulder
> Systems Oriented Data Scientist
> UC Berkeley AMPLab Alumni
>
> pedrorodriguez.io | 909-353-4423
> github.com/EntilZha | LinkedIn
> <https://www.linkedin.com/in/pedrorodriguezscience>

--
---
Takeshi Yamamuro