[ 
https://issues.apache.org/jira/browse/HADOOP-19641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18012186#comment-18012186
 ] 

Steve Loughran commented on HADOOP-19641:
-----------------------------------------

Are you using the openFile seek policy as suggested? Parquet will tell you when 
it's a parquet file, and its read policy is common: 8-byte footer, real footer, 
row groups. 
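
To make that read sequence concrete, here is a minimal sketch of how a reader 
could turn the 8-byte tail into the range of the real footer. It relies only on 
the Parquet file layout (a 4-byte little-endian footer length followed by the 
"PAR1" magic at the end of the file). The class and method names are 
hypothetical; this is not Hadoop or Parquet library code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of Hadoop or Parquet. */
final class ParquetTailSketch {
    /** Size of the first read: 4-byte footer length + 4-byte magic. */
    static final int TAIL_LENGTH = 8;

    /** Returns {footerOffset, footerLength} derived from the last 8 bytes of the file. */
    static long[] footerRange(long fileLength, byte[] tail) {
        if (tail.length != TAIL_LENGTH || fileLength < TAIL_LENGTH) {
            throw new IllegalArgumentException("need the last 8 bytes of the file");
        }
        // The final 4 bytes must be the magic string "PAR1".
        String magic = new String(tail, 4, 4, StandardCharsets.US_ASCII);
        if (!"PAR1".equals(magic)) {
            throw new IllegalArgumentException("not a Parquet tail: " + magic);
        }
        // The 4 bytes before the magic hold the footer length, little-endian.
        int footerLength = ByteBuffer.wrap(tail, 0, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
        long footerOffset = fileLength - TAIL_LENGTH - footerLength;
        if (footerLength < 0 || footerOffset < 0) {
            throw new IllegalArgumentException("corrupt footer length: " + footerLength);
        }
        return new long[] {footerOffset, footerLength};
    }

    public static void main(String[] args) {
        // Build a fake tail that declares a 1000-byte footer.
        byte[] tail = ByteBuffer.allocate(TAIL_LENGTH)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putInt(1000)
                .put("PAR1".getBytes(StandardCharsets.US_ASCII))
                .array();
        long[] range = footerRange(1_000_000L, tail);
        System.out.println("second read: offset=" + range[0] + " length=" + range[1]);
    }
}
```

The first two reads are small and sit at the very end of the file, and the 
row-group reads that follow jump to offsets named in the footer, so sequential 
prefetching from the first read position is unlikely to help.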

> ABFS: [ReadAheadV2] First Read should bypass ReadBufferManager
> --------------------------------------------------------------
>
>                 Key: HADOOP-19641
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19641
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/azure
>    Affects Versions: 3.4.1
>            Reporter: Anuj Modi
>            Assignee: Anuj Modi
>            Priority: Major
>              Labels: Performance
>
> We have observed across multiple workload runs that when we start 
> reading data from an input stream, the first read that reaches the stream has 
> to be done synchronously even if we trigger a prefetch request for that 
> particular offset. Most of the time we end up doing the extra work of checking 
> whether the prefetch has been triggered, removing it from the pending queue, 
> and then doing a direct remote read in the workload thread itself.
> To avoid all this overhead, we will always bypass read-ahead for the very 
> first read of each input stream and trigger read-aheads from the second read 
> onwards.
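
The proposed behaviour can be sketched as a toy model. Everything here is an 
assumption made for illustration: the class, its fields, and the "queue N 
blocks starting at the current offset" rule are hypothetical and do not mirror 
the actual ABFS input stream or ReadBufferManager code.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of "first read bypasses read-ahead"; all names are hypothetical. */
final class FirstReadBypassSketch {
    private final int readAheadBlocks;   // how many blocks to prefetch per read
    private final long blockSize;        // size of one prefetch block
    private boolean firstRead = true;    // true until the stream serves its first read
    private final List<Long> queued = new ArrayList<>();

    FirstReadBypassSketch(int readAheadBlocks, long blockSize) {
        this.readAheadBlocks = readAheadBlocks;
        this.blockSize = blockSize;
    }

    /** Returns which path served the read: "direct" or "readahead". */
    String read(long offset) {
        if (firstRead) {
            // First read: skip the prefetch queue entirely and read synchronously.
            firstRead = false;
            return "direct";
        }
        // Later reads: queue prefetches for this offset and the blocks after it.
        for (int i = 0; i < readAheadBlocks; i++) {
            queued.add(offset + i * blockSize);
        }
        return "readahead";
    }

    /** Offsets for which a prefetch has been queued so far. */
    List<Long> queuedPrefetchOffsets() {
        return queued;
    }

    public static void main(String[] args) {
        FirstReadBypassSketch stream = new FirstReadBypassSketch(2, 4L);
        System.out.println(stream.read(0L) + " " + stream.queuedPrefetchOffsets());
        System.out.println(stream.read(4L) + " " + stream.queuedPrefetchOffsets());
    }
}
```

In this model the first read never touches the queue, so there is nothing to 
check or remove before the direct remote read.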



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
