[ 
https://issues.apache.org/jira/browse/ARROW-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17662338#comment-17662338
 ] 

Rok Mihevc commented on ARROW-5317:
-----------------------------------

This issue has been migrated to [issue 
#16729|https://github.com/apache/arrow/issues/16729] on GitHub. Please see the 
[migration documentation|https://github.com/apache/arrow/issues/14542] for 
further details.

> [Rust] [Parquet] impl IntoIterator for SerializedFileReader
> -----------------------------------------------------------
>
>                 Key: ARROW-5317
>                 URL: https://issues.apache.org/jira/browse/ARROW-5317
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Rust
>            Reporter: Fabio Silva
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 0.14.0
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> This is a follow up to [https://github.com/apache/arrow/issues/4301].
> The current row iterator, *RowIter*, borrows the *FileReader*, so the user 
> has to keep the file reader alive for as long as the iterator is alive.
> This makes it hard to iterate over multiple *FileReader* / *RowIter* pairs.
> {code:java}
> fn main() {
>     let path1 = Path::new("path-to/1.snappy.parquet");
>     let path2 = Path::new("path-to/2.snappy.parquet");
>     let vec = vec![path1, path2];
>     let it = vec.iter()
>         .map(|p| {
>             File::open(p).unwrap()
>         })
>         .map(|f| {
>             SerializedFileReader::new(f).unwrap()
>         })
>         .flat_map(|reader| -> RowIter {
>             RowIter::from_file(None, &reader).unwrap()
> //|             |                        |
> //|             |                        `reader` is borrowed here
> //|             returns a value referencing data owned by the current function
>         })
>     ;
>     for r in it {
>         println!("{}", r);
>     }
> }
> {code}
> One solution could be to implement a row iterator that takes ownership of the 
> reader,
> perhaps by implementing *std::iter::IntoIterator* for *SerializedFileReader*:
> {code:java}
> ....
> .map(|p| {
>     File::open(p).unwrap()
> })
> .map(|f| {
>     SerializedFileReader::new(f).unwrap()
> })
> .flat_map(|r| r.into_iter())
> ....
> {code}
>  
> Happy to put up a PR for this.
> Please let me know if this makes sense, or if you already have some way of 
> doing this.
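To illustrate the ownership idea proposed above, here is a minimal, std-only sketch. It does not use the parquet crate: `ToyReader`, `OwnedRowIter`, and `collect_rows` are hypothetical names invented for this example, and rows are modeled as plain `String`s. The only point is that an iterator which owns its reader can be returned from a `flat_map` closure, whereas a borrowing one cannot.

```rust
// Hypothetical stand-in for a file reader that owns its data.
// This is NOT the parquet crate's SerializedFileReader.
struct ToyReader {
    rows: Vec<String>,
}

impl ToyReader {
    fn new(rows: &[&str]) -> Self {
        ToyReader {
            rows: rows.iter().map(|s| s.to_string()).collect(),
        }
    }
}

// Owning row iterator: it holds the reader itself rather than a reference,
// so it carries no lifetime tied to a local variable.
struct OwnedRowIter {
    reader: ToyReader,
    pos: usize,
}

impl Iterator for OwnedRowIter {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        // Clone the current row (if any) and advance the cursor.
        let row = self.reader.rows.get(self.pos).cloned();
        if row.is_some() {
            self.pos += 1;
        }
        row
    }
}

// Consuming the reader yields the owning iterator.
impl IntoIterator for ToyReader {
    type Item = String;
    type IntoIter = OwnedRowIter;

    fn into_iter(self) -> OwnedRowIter {
        OwnedRowIter { reader: self, pos: 0 }
    }
}

// Chains the rows of several readers, mirroring the flat_map call in the
// proposal. Each reader is moved into its iterator, so nothing is borrowed
// from inside the closure.
fn collect_rows(readers: Vec<ToyReader>) -> Vec<String> {
    readers.into_iter().flat_map(|r| r.into_iter()).collect()
}
```

Calling `collect_rows(vec![ToyReader::new(&["a", "b"]), ToyReader::new(&["c"])])` yields the rows of the first reader followed by those of the second. Whether the real reader can be moved into an iterator this cheaply depends on its internals, which this sketch does not model.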



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
