pan3793 commented on code in PR #50765:
URL: https://github.com/apache/spark/pull/50765#discussion_r2068334750
##########
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:
##########
@@ -89,24 +90,29 @@ public abstract class SpecificParquetRecordReaderBase<T> extends RecordReader<Vo
   @Override
   public void initialize(InputSplit inputSplit, TaskAttemptContext taskAttemptContext)
       throws IOException, InterruptedException {
-    initialize(inputSplit, taskAttemptContext, Option.empty());
+    initialize(inputSplit, taskAttemptContext, Option.empty(), Option.empty(), Option.empty());
   }

   public void initialize(
       InputSplit inputSplit,
       TaskAttemptContext taskAttemptContext,
+      Option<HadoopInputFile> inputFile,
+      Option<SeekableInputStream> inputStream,
       Option<ParquetMetadata> fileFooter) throws IOException, InterruptedException {
     Configuration configuration = taskAttemptContext.getConfiguration();
     FileSplit split = (FileSplit) inputSplit;
     this.file = split.getPath();
+    ParquetReadOptions options = HadoopReadOptions
+        .builder(configuration, file)
+        .withRange(split.getStart(), split.getStart() + split.getLength())
+        .build();
     ParquetFileReader fileReader;
-    if (fileFooter.isDefined()) {
-      fileReader = new ParquetFileReader(configuration, file, fileFooter.get());

Review Comment:
   This constructor internally calls `HadoopInputFile.fromPath(file, configuration)`, which produces an unnecessary `GetFileInfo` RPC.
   ```
   public static HadoopInputFile fromPath(Path path, Configuration conf) throws IOException {
     FileSystem fs = path.getFileSystem(conf);
     return new HadoopInputFile(fs, fs.getFileStatus(path), conf);
   }
   ```
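   To make the cost concrete, here is a minimal, self-contained sketch. It does not use the real Hadoop or Parquet classes: `CountingFileSystem`, `Status`, and `InputFile` are hypothetical stand-ins that only count status lookups, with each `getFileStatus` call standing for one `GetFileInfo` RPC. The `fromStatus`-style factory mirrors the idea behind parquet-hadoop's `HadoopInputFile.fromStatus(FileStatus, Configuration)`, which reuses a status the caller already holds.

   ```java
   import java.util.HashMap;
   import java.util.Map;

   // Stand-in types only: these are NOT the Hadoop/Parquet classes. They model
   // just enough behaviour to count metadata lookups.
   final class FileStatusReuseSketch {

     /** Hypothetical stand-in for a Hadoop FileStatus: path plus length. */
     static final class Status {
       final String path;
       final long length;

       Status(String path, long length) {
         this.path = path;
         this.length = length;
       }
     }

     /** Stand-in for a FileSystem; each getFileStatus call models one GetFileInfo RPC. */
     static final class CountingFileSystem {
       private final Map<String, Long> lengths = new HashMap<>();
       int statusCalls = 0;

       void put(String path, long length) {
         lengths.put(path, length);
       }

       Status getFileStatus(String path) {
         statusCalls++;
         Long length = lengths.get(path);
         if (length == null) {
           throw new IllegalArgumentException("No such file: " + path);
         }
         return new Status(path, length);
       }
     }

     /** Stand-in for an input file handle with the two factory shapes under discussion. */
     static final class InputFile {
       final CountingFileSystem fs;
       final Status status;

       private InputFile(CountingFileSystem fs, Status status) {
         this.fs = fs;
         this.status = status;
       }

       // Mirrors the fromPath shape: always looks the status up again.
       static InputFile fromPath(CountingFileSystem fs, String path) {
         return new InputFile(fs, fs.getFileStatus(path));
       }

       // Mirrors a fromStatus-style factory: reuses a status the caller already has.
       static InputFile fromStatus(CountingFileSystem fs, Status status) {
         return new InputFile(fs, status);
       }
     }

     public static void main(String[] args) {
       CountingFileSystem fs = new CountingFileSystem();
       fs.put("/data/part-0.parquet", 1024L);

       // The caller fetched the status once, e.g. while reading the footer.
       Status known = fs.getFileStatus("/data/part-0.parquet");

       InputFile.fromPath(fs, "/data/part-0.parquet"); // triggers a second lookup
       System.out.println("lookups after fromPath:   " + fs.statusCalls);

       InputFile.fromStatus(fs, known); // no further lookup
       System.out.println("lookups after fromStatus: " + fs.statusCalls);
     }
   }
   ```

   In this model the lookup counter rises when the handle is built from a path and stays unchanged when it is built from the already-known status, which is the saving this comment is about.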