[ https://issues.apache.org/jira/browse/FLINK-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16047668#comment-16047668 ]
ASF GitHub Bot commented on FLINK-6776:
---------------------------------------

Github user StefanRRichter commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4019#discussion_r121632647

    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/fs/hdfs/HadoopDataInputStream.java ---
    @@ -89,4 +99,14 @@ public long skip(long n) throws IOException {
         public org.apache.hadoop.fs.FSDataInputStream getHadoopInputStream() {
             return fsDataInputStream;
         }
    +
    +    public void forceSeek(long seekPos) throws IOException {
    --- End diff --

    I agree that documentation wouldn't hurt. This class as a whole was rather undocumented, but it is also internal, and users will only interact with it through `FSDataInputStream`, which does not expose those methods. I can write something anyway :)


> Use skip instead of seek for small forward repositioning in DFS streams
> -----------------------------------------------------------------------
>
>                 Key: FLINK-6776
>                 URL: https://issues.apache.org/jira/browse/FLINK-6776
>             Project: Flink
>          Issue Type: Improvement
>          Components: State Backends, Checkpointing
>            Reporter: Stefan Richter
>            Assignee: Stefan Richter
>            Priority: Minor
>
> Reading checkpoint metadata and finding key-groups in restores sometimes
> requires seeking in input streams. Currently, we always use a seek, even for
> small position changes. As small true seeks are far more expensive than small
> reads/skips, we should just skip over small gaps instead of performing the
> seek.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
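
The idea in the issue description (skip over small forward gaps, fall back to a true seek otherwise) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the pull request: `SeekableStream`, `CountingStream`, `seekOrSkip`, `demo` and the 1 MiB threshold `MAX_SKIP_BYTES` are all hypothetical names and values chosen for the example, and the `forceSeek` method visible in the diff is not reproduced here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

class SkipOrSeekSketch {

    /** Hypothetical minimal view of a seekable stream (not Flink's or Hadoop's actual API). */
    interface SeekableStream {
        long getPos() throws IOException;
        void seek(long pos) throws IOException;
        long skip(long n) throws IOException;
    }

    /** Assumed threshold: forward gaps up to this many bytes are skipped instead of seeked. */
    static final long MAX_SKIP_BYTES = 1024 * 1024;

    /** Repositions the stream to target, preferring skip() for small forward gaps. */
    static void seekOrSkip(SeekableStream in, long target) throws IOException {
        long delta = target - in.getPos();
        if (delta > 0 && delta <= MAX_SKIP_BYTES) {
            // skip() may advance fewer bytes than requested, so loop until the gap is closed.
            while (delta > 0) {
                long skipped = in.skip(delta);
                if (skipped <= 0) {
                    // No progress: fall back to a true seek.
                    in.seek(target);
                    return;
                }
                delta -= skipped;
            }
        } else if (delta != 0) {
            // Backward or large forward repositioning: perform a true seek.
            in.seek(target);
        }
    }

    /** In-memory stand-in for a DFS stream that only counts seek and skip calls. */
    static final class CountingStream implements SeekableStream {
        private final long length;
        private long pos;
        int seeks;
        int skips;

        CountingStream(long length) {
            this.length = length;
        }

        @Override
        public long getPos() {
            return pos;
        }

        @Override
        public void seek(long newPos) throws IOException {
            if (newPos < 0 || newPos > length) {
                throw new IOException("seek out of range: " + newPos);
            }
            pos = newPos;
            seeks++;
        }

        @Override
        public long skip(long n) {
            long step = Math.max(0, Math.min(n, length - pos));
            pos += step;
            skips++;
            return step;
        }
    }

    /** Runs three repositionings and returns {final position, seek count, skip count}. */
    static long[] demo() {
        CountingStream in = new CountingStream(8L * 1024 * 1024);
        try {
            seekOrSkip(in, 4096);              // small forward gap: handled by skip
            seekOrSkip(in, 4L * 1024 * 1024);  // large forward gap: handled by seek
            seekOrSkip(in, 1024);              // backward move: handled by seek
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new long[] {in.getPos(), in.seeks, in.skips};
    }
}
```

Calling `SkipOrSeekSketch.demo()` exercises one small forward move, one large forward move and one backward move against the in-memory stand-in. The right threshold for a real DFS stream would depend on how expensive a true seek is compared with reading and discarding the skipped bytes, so the constant above should be treated as a placeholder.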