udaysagar2177 opened a new pull request, #17871: URL: https://github.com/apache/pinot/pull/17871
## Summary

The Arrow plugin previously only had `ArrowMessageDecoder` for streaming ingestion (e.g. Kafka). This PR adds `ArrowRecordReader` implementing the `RecordReader` interface, enabling batch ingestion from Arrow IPC files, consistent with how other formats (Avro, JSON, ORC, Parquet, etc.) support both streaming and batch.

**Changes:**

- Add `ArrowRecordReader` that reads Arrow IPC files using `ArrowFileReader`, iterating row-by-row across batches
- Add `ARROW` to `FileFormat` enum and register in `RecordReaderFactory`
- Generalize `ArrowToGenericRowConverter` to accept `ArrowReader` (parent of both `ArrowStreamReader` and `ArrowFileReader`) instead of `ArrowStreamReader`
- Make `convertSingleRow` public with a reuse overload for row recycling
- Add `fieldsToRead` filtering support to `ArrowToGenericRowConverter`
- Add `ArrowRecordReaderTest` extending `AbstractRecordReaderTest` (10,000 random records across all Pinot field types, multi-batch writing, field filtering test)

**Note:** Arrow IPC file format requires seekable channels, so gzip compression is not supported (test overridden to skip).

## Test plan

- [x] `ArrowRecordReaderTest.testRecordReader` - 10,000 random records with all Pinot SV/MV field types, verifies read + rewind
- [x] `ArrowRecordReaderTest.testFieldsToReadFiltering` - verifies only requested fields are extracted
- [x] `ArrowRecordReaderTest.testGzipRecordReader` - overridden (Arrow IPC doesn't support gzip)
- [x] All 16 existing `ArrowMessageDecoderTest` tests pass (no regressions from converter changes)
- [x] `RecordReaderFactoryTest` passes with new ARROW registration
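
For readers unfamiliar with the pattern, the row-by-row iteration across batches, the `fieldsToRead` filtering, the row-reuse overload, and rewind described in this PR can be sketched roughly as below. This is an illustrative, stdlib-only sketch and not the PR's code: `BatchRowIterator` and its methods are hypothetical names, and each record batch is modeled as a plain list of maps rather than Arrow vectors read through `ArrowFileReader`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical stand-in for a batch-oriented record reader (NOT the PR's ArrowRecordReader).
final class BatchRowIterator {
  // Each inner list models one record batch; each map models one row.
  private final List<List<Map<String, Object>>> batches;
  // Assumption for this sketch: null or empty means "read every field".
  private final Set<String> fieldsToRead;
  private int batchIndex;
  private int rowIndex;

  BatchRowIterator(List<List<Map<String, Object>>> batches, Set<String> fieldsToRead) {
    this.batches = batches;
    this.fieldsToRead = fieldsToRead;
    rewind();
  }

  boolean hasNext() {
    // Advance past exhausted or empty batches until a row is available.
    while (batchIndex < batches.size() && rowIndex >= batches.get(batchIndex).size()) {
      batchIndex++;
      rowIndex = 0;
    }
    return batchIndex < batches.size();
  }

  // Fills the caller-supplied row so one object can be recycled across calls.
  Map<String, Object> next(Map<String, Object> reuse) {
    if (!hasNext()) {
      throw new NoSuchElementException("no more rows");
    }
    reuse.clear();
    Map<String, Object> source = batches.get(batchIndex).get(rowIndex++);
    boolean readAll = fieldsToRead == null || fieldsToRead.isEmpty();
    for (Map.Entry<String, Object> entry : source.entrySet()) {
      if (readAll || fieldsToRead.contains(entry.getKey())) {
        reuse.put(entry.getKey(), entry.getValue());
      }
    }
    return reuse;
  }

  // Restart from the first row of the first batch.
  void rewind() {
    batchIndex = 0;
    rowIndex = 0;
  }

  public static void main(String[] args) {
    List<Map<String, Object>> first = new ArrayList<>();
    first.add(Map.of("id", 1, "name", "a"));
    first.add(Map.of("id", 2, "name", "b"));
    List<Map<String, Object>> second = new ArrayList<>();
    second.add(Map.of("id", 3, "name", "c"));

    List<List<Map<String, Object>>> batches = new ArrayList<>();
    batches.add(first);
    batches.add(second);

    BatchRowIterator reader = new BatchRowIterator(batches, Set.of("id"));
    Map<String, Object> row = new HashMap<>();
    while (reader.hasNext()) {
      System.out.println(reader.next(row)); // only the "id" field is copied
    }
  }
}
```

Passing the destination row into `next` keeps allocation out of the per-row loop, which is the usual motivation for a reuse overload; the trade-off is that callers must copy a row if they need to keep it past the next call.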
