udaysagar2177 opened a new pull request, #17871:
URL: https://github.com/apache/pinot/pull/17871

   ## Summary
   
   The Arrow plugin previously provided only `ArrowMessageDecoder`, which 
covers streaming ingestion (e.g. Kafka). This PR adds `ArrowRecordReader`, an 
implementation of the `RecordReader` interface that enables batch ingestion 
from Arrow IPC files. This brings Arrow in line with the other formats (Avro, 
JSON, ORC, Parquet, etc.), which support both streaming and batch ingestion.
   
   **Changes:**
   - Add `ArrowRecordReader` that reads Arrow IPC files using 
`ArrowFileReader`, iterating row-by-row across batches
   - Add `ARROW` to `FileFormat` enum and register in `RecordReaderFactory`
   - Generalize `ArrowToGenericRowConverter` to accept `ArrowReader` (the 
common superclass of `ArrowStreamReader` and `ArrowFileReader`) instead of 
`ArrowStreamReader`
   - Make `convertSingleRow` public with a reuse overload for row recycling
   - Add `fieldsToRead` filtering support to `ArrowToGenericRowConverter`
   - Add `ArrowRecordReaderTest` extending `AbstractRecordReaderTest` (10,000 
random records across all Pinot field types, multi-batch writing, field 
filtering test)
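
   The row-by-row iteration, row reuse, and `fieldsToRead` filtering listed 
above can be illustrated with a minimal, dependency-free sketch. It is not the 
code in this PR: `BatchRowCursor` and its methods are hypothetical names, and 
batches are modeled as in-memory lists of maps rather than Arrow vectors loaded 
through `ArrowFileReader`.

```java
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

/**
 * Hypothetical stand-in for a batch-backed record reader. Each batch is a list
 * of rows (column name -> value); a real reader would load batches from an
 * Arrow IPC file instead of holding them in memory.
 */
final class BatchRowCursor {
  private final List<List<Map<String, Object>>> batches;
  private final Set<String> fieldsToRead; // null or empty means "read every field"
  private int batchIndex;
  private int rowIndex;

  BatchRowCursor(List<List<Map<String, Object>>> batches, Set<String> fieldsToRead) {
    this.batches = batches;
    this.fieldsToRead = fieldsToRead;
  }

  /** Skips exhausted or empty batches, then reports whether a row remains. */
  boolean hasNext() {
    while (batchIndex < batches.size() && rowIndex >= batches.get(batchIndex).size()) {
      batchIndex++;
      rowIndex = 0;
    }
    return batchIndex < batches.size();
  }

  /** Copies the next row into {@code reuse}, keeping only the requested fields. */
  Map<String, Object> next(Map<String, Object> reuse) {
    if (!hasNext()) {
      throw new NoSuchElementException("no rows left");
    }
    reuse.clear(); // recycle the caller's row object instead of allocating a new one
    Map<String, Object> source = batches.get(batchIndex).get(rowIndex++);
    for (Map.Entry<String, Object> entry : source.entrySet()) {
      boolean wanted = fieldsToRead == null || fieldsToRead.isEmpty()
          || fieldsToRead.contains(entry.getKey());
      if (wanted) {
        reuse.put(entry.getKey(), entry.getValue());
      }
    }
    return reuse;
  }

  /** Restarts iteration from the first row of the first batch. */
  void rewind() {
    batchIndex = 0;
    rowIndex = 0;
  }
}
```

   A caller would loop with `while (cursor.hasNext()) { cursor.next(reuse); }` 
and call `rewind()` to start over, which mirrors the read + rewind check in the 
test plan.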
   
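   **Note:** The Arrow IPC file format requires a seekable channel 
(`ArrowFileReader` locates the footer at the end of the file before reading any 
batch), so gzip-compressed input is not supported. The inherited gzip test is 
overridden to skip.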
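
   To illustrate why seekability matters, here is a small sketch under stated 
assumptions. An Arrow IPC file ends with the magic bytes `ARROW1`, preceded by 
the footer and its length, so a reader has to jump to the end of the file 
first. The `ArrowFileMagic` class and `endsWithMagic` method are hypothetical 
helpers written for this example; they are not part of this PR or of the Arrow 
Java API.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: checks for the trailing magic of an Arrow IPC file. */
final class ArrowFileMagic {
  private static final byte[] MAGIC = "ARROW1".getBytes(StandardCharsets.US_ASCII);

  private ArrowFileMagic() {
  }

  /**
   * Returns true if the channel ends with the Arrow file magic. The check needs
   * random access via position(), which a forward-only gzip stream cannot offer.
   */
  static boolean endsWithMagic(SeekableByteChannel channel) throws IOException {
    long size = channel.size();
    if (size < MAGIC.length) {
      return false;
    }
    channel.position(size - MAGIC.length); // jump straight to the tail of the file
    ByteBuffer tail = ByteBuffer.allocate(MAGIC.length);
    while (tail.hasRemaining()) {
      if (channel.read(tail) < 0) {
        return false; // hit end of channel before reading the full tail
      }
    }
    return Arrays.equals(tail.array(), MAGIC);
  }
}
```

   A `GZIPInputStream` only supports forward reads, so the equivalent check 
would require decompressing the whole input first.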
   
   ## Test plan
   - [x] `ArrowRecordReaderTest.testRecordReader` - 10,000 random records with 
all Pinot SV/MV field types, verifies read + rewind
   - [x] `ArrowRecordReaderTest.testFieldsToReadFiltering` - verifies only 
requested fields are extracted
   - [x] `ArrowRecordReaderTest.testGzipRecordReader` - overridden to skip 
(the Arrow IPC file format needs a seekable channel, so gzip input is 
unsupported)
   - [x] All 16 existing `ArrowMessageDecoderTest` tests pass (no regressions 
from converter changes)
   - [x] `RecordReaderFactoryTest` passes with new ARROW registration


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

