Hello,

Spark collects HDFS read/write metrics per application/job; see the details at http://spark.apache.org/docs/latest/monitoring.html.
I have connected Spark metrics to Graphite and display nice graphs in Grafana.

BR,
Arek

On Thu, Dec 31, 2015 at 2:00 PM, Steve Loughran <ste...@hortonworks.com> wrote:
>
>> On 30 Dec 2015, at 13:19, alvarobrandon <alvarobran...@gmail.com> wrote:
>>
>> Hello:
>>
>> Is there any way of monitoring the number of bytes or blocks read and
>> written by a Spark application? I'm running Spark on YARN and I want to
>> measure how I/O-intensive a set of applications is. The closest thing I
>> have seen is the HDFS DataNode logs in YARN, but they don't seem to
>> contain Spark-application-specific reads and writes.
>>
>> 2015-12-21 18:29:15,347 INFO
>> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src:
>> /127.0.0.1:53805, dest: /127.0.0.1:50010, bytes: 72159, op: HDFS_WRITE,
>> cliID: DFSClient_NONMAPREDUCE_-1850086307_1, offset: 0, srvID:
>> a9edc8ad-fb09-4621-b469-76de587560c0, blockid:
>> BP-189543387-138.100.13.81-1450715936956:blk_1073741837_1013, duration:
>> 2619119
>> hadoop-alvarobrandon-datanode-usuariop81.fi.upm.es.log:2015-12-21
>> 18:29:15,429 INFO org.apache.hadoop.hdfs.server.d
>>
>> Is there any trace of this kind of operation to be found in any log?
>
>
> 1. The HDFS NameNode and DataNodes all collect metrics on their use, with
>    org.apache.hadoop.hdfs.server.datanode.metrics.DataNodeMetrics being the
>    most interesting for I/O.
> 2. FileSystem.Statistics is a static structure collecting data on operations
>    and bytes for each thread in a client process.
> 3. The HDFS input streams also support some read statistics (ReadStatistics,
>    via getReadStatistics()).
> 4. Recent versions of HDFS are also adding HTrace support, to trace
>    end-to-end performance.
>
> I'd start with FileSystem.Statistics; if that's not being collected across
> Spark jobs, it should be possible.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
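For reference, the Graphite wiring Arek mentions goes through Spark's conf/metrics.properties file using the GraphiteSink described in the monitoring docs linked above. A minimal sketch, assuming Graphite's default plaintext port; the host value is a placeholder you would replace with your own:

```properties
# Send all Spark metrics (driver, executors, master, worker) to Graphite.
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=graphite.example.com
*.sink.graphite.port=2003
# Report every 10 seconds.
*.sink.graphite.period=10
*.sink.graphite.unit=seconds
```

Grafana can then be pointed at the Graphite instance as a data source to build the per-application dashboards.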
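In the meantime, the DataNode clienttrace lines quoted above can be aggregated directly to get rough per-client byte counts. A sketch in Python, assuming the log format shown in the original message (the sample line below is the one from the thread); note that the DFSClient IDs still have to be correlated with a Spark application by hand:

```python
import re
from collections import defaultdict

# Matches DataNode clienttrace entries of the form quoted in the thread, e.g.
# "... bytes: 72159, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1850086307_1, ..."
CLIENTTRACE = re.compile(
    r"bytes:\s*(?P<bytes>\d+),\s*op:\s*(?P<op>HDFS_READ|HDFS_WRITE),"
    r"\s*cliID:\s*(?P<cli>\S+?),"
)

def aggregate(lines):
    """Sum bytes per (client ID, operation) across clienttrace log lines."""
    totals = defaultdict(int)
    for line in lines:
        m = CLIENTTRACE.search(line)
        if m:
            totals[(m.group("cli"), m.group("op"))] += int(m.group("bytes"))
    return dict(totals)

# The clienttrace line quoted in the original question.
sample = [
    "2015-12-21 18:29:15,347 INFO "
    "org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: "
    "/127.0.0.1:53805, dest: /127.0.0.1:50010, bytes: 72159, op: HDFS_WRITE, "
    "cliID: DFSClient_NONMAPREDUCE_-1850086307_1, offset: 0, srvID: "
    "a9edc8ad-fb09-4621-b469-76de587560c0, blockid: "
    "BP-189543387-138.100.13.81-1450715936956:blk_1073741837_1013, "
    "duration: 2619119",
]
print(aggregate(sample))
```

This only measures traffic seen by the DataNodes whose logs you collect; the FileSystem.Statistics route suggested above measures from the client side instead.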