[ https://issues.apache.org/jira/browse/HIVE-24717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277535#comment-17277535 ]
Mustafa İman commented on HIVE-24717:
-------------------------------------

This caused many precommit tests to fail. The reason is below:
# The majority of tests run on the local file system, which is a ChecksumFileSystem.
# ChecksumFileSystem#listStatusIterator() does not ignore checksum files (.A.crc files).
# listStatusIterator's next() returns A first, .A.crc second.
# .A.crc files are automatically deleted by ChecksumFileSystem when file A is deleted.
# Since we delete file A before the iterator processes .A.crc, the iterator cannot find .A.crc when it checks its permissions.
# On Linux and macOS, LocalFileSystem invokes "ls -ld .A.crc" to get the permissions.
# This exits with code 2 on Linux, and the iterator throws.

Interestingly, the tests run fine on my computer. The reason is that I use a Mac: "ls -ld" returns 1 for the same error on macOS. Hadoop ignores exit code 1 but not 2, so the tests pass on Mac but fail on Linux.

Apparently this is fixed in Hadoop 3.2 via https://issues.apache.org/jira/browse/HADOOP-12502 . We are using Hadoop 3.1.0. We need to either upgrade to Hadoop 3.2 or get the fix backported to the Hadoop 3.1 line.

> Migrate to listStatusIterator in moving files
> ---------------------------------------------
>
>                 Key: HIVE-24717
>                 URL: https://issues.apache.org/jira/browse/HIVE-24717
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Mustafa İman
>            Assignee: Mustafa İman
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hive.java makes various HDFS listStatus calls when moving
> files/directories around. These code paths are used for insert overwrite
> table/partition queries.
> listStatus is a blocking call, whereas listStatusIterator is backed by a
> RemoteIterator and fetches pages in the background. Hive should take
> advantage of that, since Hadoop has recently implemented listStatusIterator
> for S3: https://issues.apache.org/jira/browse/HADOOP-17074
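
The failure sequence described in the comment can be reproduced with a small in-memory model. This is only a rough sketch and not Hadoop code: ToyChecksumFs, listLazily, listEagerly and the other names are hypothetical, the "stat" call merely stands in for the "ls -ld" permission lookup, and the eager variant is an assumed approximation of how a blocking listing behaves.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

// Toy in-memory model of the failure sequence; none of this is Hadoop code.
class ChecksumListingDemo {

    // Hypothetical stand-in for a checksummed local file system.
    static final class ToyChecksumFs {
        private final TreeSet<String> names = new TreeSet<>();

        static String sidecarOf(String name) {
            return "." + name + ".crc";
        }

        static boolean isSidecar(String name) {
            return name.startsWith(".") && name.endsWith(".crc");
        }

        // Creating a data file also creates its checksum sidecar.
        void create(String name) {
            names.add(name);
            names.add(sidecarOf(name));
        }

        // Deleting a data file also removes its sidecar; missing names are ignored.
        void delete(String name) {
            names.remove(name);
            names.remove(sidecarOf(name));
        }

        // Stands in for the permission lookup; throws if the file is gone.
        String stat(String name) {
            if (!names.contains(name)) {
                throw new IllegalStateException("No such file: " + name);
            }
            return name;
        }

        // Names in listing order: each data file first, then its sidecar.
        private List<String> snapshotNames() {
            List<String> snapshot = new ArrayList<>();
            for (String name : names) {
                if (!isSidecar(name)) {
                    snapshot.add(name);
                    snapshot.add(sidecarOf(name));
                }
            }
            return snapshot;
        }

        // Lazy listing: an entry is checked only when next() reaches it.
        Iterator<String> listLazily() {
            Iterator<String> pending = snapshotNames().iterator();
            return new Iterator<String>() {
                @Override
                public boolean hasNext() {
                    return pending.hasNext();
                }

                @Override
                public String next() {
                    return stat(pending.next());
                }
            };
        }

        // Eager listing: every entry is checked before anything is returned.
        List<String> listEagerly() {
            List<String> checked = new ArrayList<>();
            for (String name : snapshotNames()) {
                checked.add(stat(name));
            }
            return checked;
        }
    }

    // Deletes each entry as soon as the lazy iterator yields it.
    static boolean deleteDuringLazyListingFails(ToyChecksumFs fs) {
        Iterator<String> entries = fs.listLazily();
        try {
            while (entries.hasNext()) {
                fs.delete(entries.next());
            }
            return false;
        } catch (IllegalStateException e) {
            return true; // the sidecar vanished before the iterator reached it
        }
    }

    // Deletes entries only after the whole listing has been materialized.
    static boolean deleteAfterEagerListingFails(ToyChecksumFs fs) {
        try {
            for (String name : fs.listEagerly()) {
                fs.delete(name);
            }
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        ToyChecksumFs lazyFs = new ToyChecksumFs();
        lazyFs.create("A");
        System.out.println("lazy listing failed: " + deleteDuringLazyListingFails(lazyFs));

        ToyChecksumFs eagerFs = new ToyChecksumFs();
        eagerFs.create("A");
        System.out.println("eager listing failed: " + deleteAfterEagerListingFails(eagerFs));
    }
}
```

In this model the lazy variant fails on the second next() call, because deleting A has already removed .A.crc, while the eager variant has checked every entry before the first delete. The model does not cover the exit-code difference between Linux and macOS.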