[jira] [Commented] (HADOOP-19139) [ABFS]: No GetPathStatus call for opening AbfsInputStream

ASF GitHub Bot (Jira) Fri, 19 Jul 2024 01:59:47 -0700


    [ 
https://issues.apache.org/jira/browse/HADOOP-19139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867196#comment-17867196
 ]


ASF GitHub Bot commented on HADOOP-19139:
-----------------------------------------

saxenapranav commented on code in PR #6699:
URL: https://github.com/apache/hadoop/pull/6699#discussion_r1683883811


##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java:
##########
@@ -385,32 +434,74 @@ private int readLastBlock(final byte[] b, final int off, 
final int len)
     // data need to be copied to user buffer from index bCursor,
     // AbfsInutStream buffer is going to contain data from last block start. In
     // that case bCursor will be set to fCursor - lastBlockStart
-    long lastBlockStart = max(0, contentLength - footerReadSize);
+    if (!getFileStatusInformationPresent()) {
+      long lastBlockStart = max(0, (fCursor + len) - footerReadSize);
+      bCursor = (int) (fCursor - lastBlockStart);
+      return optimisedRead(b, off, len, lastBlockStart, min(fCursor + len, 
footerReadSize), true);
+    }
+    long lastBlockStart = max(0, getContentLength() - footerReadSize);
     bCursor = (int) (fCursor - lastBlockStart);
     // 0 if contentlength is < buffersize
-    long actualLenToRead = min(footerReadSize, contentLength);
-    return optimisedRead(b, off, len, lastBlockStart, actualLenToRead);
+    long actualLenToRead = min(footerReadSize, getContentLength());
+    return optimisedRead(b, off, len, lastBlockStart, actualLenToRead, false);
   }
 
   private int optimisedRead(final byte[] b, final int off, final int len,
-      final long readFrom, final long actualLen) throws IOException {
+      final long readFrom, final long actualLen,
+      final boolean isOptimizedReadWithoutContentLengthInformation) throws 
IOException {
     fCursor = readFrom;
     int totalBytesRead = 0;
     int lastBytesRead = 0;
     try {
       buffer = new byte[bufferSize];
+      boolean fileStatusInformationPresentBeforeRead = 
getFileStatusInformationPresent();
+      /*
+       * Content length would not be available for the first optimized read in 
case
+       * of lazy head optimization in inputStream. In such case, read of the 
first optimized read
+       * would be done without the contentLength constraint. Post first call, 
the contentLength
+       * would be present and should be used for further reads.
+       */
       for (int i = 0;
-           i < MAX_OPTIMIZED_READ_ATTEMPTS && fCursor < contentLength; i++) {
+           i < MAX_OPTIMIZED_READ_ATTEMPTS && 
(!getFileStatusInformationPresent()
+               || fCursor < getContentLength()); i++) {
         lastBytesRead = readInternal(fCursor, buffer, limit,
             (int) actualLen - limit, true);
         if (lastBytesRead > 0) {
           totalBytesRead += lastBytesRead;
           limit += lastBytesRead;
           fCursor += lastBytesRead;
           fCursorAfterLastRead = fCursor;
+
+          /*
+           * In non-lazily opened inputStream, the contentLength would be 
available before
+           * opening the inputStream. In such case, optimized read would 
always be done
+           * on the last part of the file.
+           *
+           * In lazily opened inputStream, the contentLength would not be 
available before
+           * opening the inputStream. In such case, contentLength conditioning 
would not be
+           * applied to execute optimizedRead. Hence, the optimized read may 
not be done on the
+           * last part of the file. If the optimized read is done on the 
non-last part of the
+           * file, inputStream should read only the amount of data requested 
by optimizedRead,
+           * as the buffer supplied would be only of the size of the data 
requested by optimizedRead.
+           */
+          boolean shouldBreak = !fileStatusInformationPresentBeforeRead
+              && totalBytesRead == (int) actualLen;
+          if (shouldBreak) {
+            break;
+          }
         }
       }
     } catch (IOException e) {
+      if (e instanceof FileNotFoundException) {

Review Comment:
   Good point. Taken.





> [ABFS]: No GetPathStatus call for opening AbfsInputStream
> ---------------------------------------------------------
>
>                 Key: HADOOP-19139
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19139
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/azure
>            Reporter: Pranav Saxena
>            Assignee: Pranav Saxena
>            Priority: Major
>              Labels: pull-request-available
>
> Read API gives contentLen and etag of the path. This information would be 
> used in future calls on that inputStream. Prior information of eTag is of not 
> much importance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (HADOOP-19139) [ABFS]: No GetPathStatus call for opening AbfsInputStream

Reply via email to