[ https://issues.apache.org/jira/browse/PIG-5135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967670#comment-15967670 ]
Adam Szita commented on PIG-5135:
---------------------------------

[~kellyzly] I've checked this. {{assertEquals(30, inputStats.get(0).getBytes());}} is fine, but {{assertEquals(18, inputStats.get(1).getBytes());}} does not hold: Spark returns -1 here.

The plan generated for Spark consists of 4 jobs, the last one being responsible for the replicated join. That job does 3 loads, and thus SparkPigStats handles this as -1. (Even after adding together the bytes from all load ops in this job I got a different result than 18.) I guess compression is also at work in the tmp file part generation, which further alters the number of bytes being read.

I would say we should leave the exclusion for Spark as is, but update the comment section, since we don't get the expected numbers for a different reason. What do you think?

> HDFS bytes read stats are always 0 in Spark mode
> ------------------------------------------------
>
>                 Key: PIG-5135
>                 URL: https://issues.apache.org/jira/browse/PIG-5135
>             Project: Pig
>          Issue Type: Bug
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: Adam Szita
>             Fix For: spark-branch
>
>         Attachments: PIG-5135.0.patch, PIG-5135.1.patch
>
>
> I discovered this while running TestOrcStoragePushdown unit test in Spark
> mode where the test depends on the value of this stat.
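
For readers following the comment above, here is a minimal Java sketch of the behaviour it describes: a job-level bytes-read counter can only be attributed to an input when the job has exactly one load, otherwise -1 is reported. The class {{InputBytesSketch}}, the method {{bytesForInput}} and the value 48 are hypothetical and chosen for illustration only; this is not the actual SparkPigStats code.

```java
class InputBytesSketch {
    // Sentinel meaning "bytes read cannot be attributed to a single input".
    static final long UNKNOWN = -1L;

    // Hypothetical helper (an assumption, not Pig's real API):
    // loadsInJob   = number of load operators in the job
    // jobBytesRead = job-level bytes-read counter
    static long bytesForInput(int loadsInJob, long jobBytesRead) {
        // A single job-level counter cannot be split between several loads,
        // so anything other than exactly one load is reported as unknown.
        return loadsInJob == 1 ? jobBytesRead : UNKNOWN;
    }

    public static void main(String[] args) {
        // Job with a single load: the counter is reported as is.
        System.out.println(bytesForInput(1, 30L));
        // Job with 3 loads, like the replicated join job (48 is an arbitrary value).
        System.out.println(bytesForInput(3, 48L));
    }
}
```

Under this assumption, a test that expects an exact byte count for an input read by a multi-load job cannot pass in Spark mode, which is consistent with keeping the exclusion.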