Re: Creating filesystems that read local files

2020-08-26 Thread Weston Pace
Ok. I think I have it figured out as:

num_rows = 0
dataset = pa.dataset.dataset(short_files, filesystem=subtree_filesystem)
for fragment in dataset.get_fragments():
    fragment.ensure_complete_metadata()
    if fragment.row_groups:
        for row_group in fragment.row_groups:
            num_ro

Re: Creating filesystems that read local files

2020-08-26 Thread Weston Pace
Thanks Joris / Antoine. It appears I will have to learn the new datasets API. I can confirm that SubTreeFileSystem is working for me. In case there is still interest, here is my earlier code that reproduces the issue: https://gist.github.com/westonpace/4107c1c492cdd78d611595d43e72964d It

Re: Creating filesystems that read local files

2020-08-26 Thread Joris Van den Bossche
Hi Weston, Currently there are two filesystem interfaces in pyarrow, a legacy one in `pyarrow.filesystem` and a new one in `pyarrow.fs` (see https://issues.apache.org/jira/browse/ARROW-9645 and https://arrow.apache.org/docs/python/filesystems_deprecated.html, docs are still a bit scarce). Based

Re: Creating filesystems that read local files

2020-08-26 Thread Antoine Pitrou
Hi Weston, Can you show the code for your experiment? (or post equivalent code) Regards Antoine. On 25/08/2020 at 23:38, Weston Pace wrote: > I created a RelativeFileSystem that extended FileSystem and proxied > calls to a LocalFileSystem instance. This filesystem allowed me to > specify

Re: Creating filesystems that read local files

2020-08-25 Thread Weston Pace
Actually my workaround (extending LocalFileSystem) does not work since `open` is never called in this case and the path is not normalized to the base directory. On Tue, Aug 25, 2020 at 11:38 AM Weston Pace wrote: > > I created a RelativeFileSystem that extended FileSystem and proxied > calls to a

Creating filesystems that read local files

2020-08-25 Thread Weston Pace
I created a RelativeFileSystem that extended FileSystem and proxied calls to a LocalFileSystem instance. This filesystem allowed me to specify a base directory and then all paths were resolved relative to that base directory (so fs.open("foo.parquet") became self.target.open("C:\Datadir\foo.parque
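The base-directory idea can be sketched without pyarrow. The class below is a hypothetical, stdlib-only illustration, not the class from the thread and not a pyarrow type; its method names (`open`, `exists`, `isdir`, `ls`) are illustrative. It follows from the follow-up that wrapping only `open` was not enough because `open` was never called. Every path-taking method goes through one `_resolve` helper, so no entry point can bypass normalization. In pyarrow itself, the thread settles on `pyarrow.fs.SubTreeFileSystem` for this job.

```python
import os


class RelativeFileSystem:
    """Hypothetical proxy resolving all paths against a base directory."""

    def __init__(self, base_dir):
        self.base_dir = os.path.abspath(base_dir)

    def _resolve(self, path):
        # Join onto the base directory and collapse "." / ".." parts.
        full = os.path.normpath(os.path.join(self.base_dir, path))
        # Reject paths that escape the base directory ("../x" or absolute).
        if os.path.commonpath([self.base_dir, full]) != self.base_dir:
            raise ValueError(f"{path!r} escapes {self.base_dir!r}")
        return full

    # Every public method resolves its path first, not just open().
    def open(self, path, mode="rb"):
        return open(self._resolve(path), mode)

    def exists(self, path):
        return os.path.exists(self._resolve(path))

    def isdir(self, path):
        return os.path.isdir(self._resolve(path))

    def ls(self, path=""):
        return sorted(os.listdir(self._resolve(path)))
```

With `fs = RelativeFileSystem(base_dir)`, a call such as `fs.open("foo.parquet")` reads `foo.parquet` inside `base_dir`, while `fs.open("../other")` raises `ValueError` instead of leaving the directory.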