Hi Weston,

Currently there are two filesystem interfaces in pyarrow: a legacy one in `pyarrow.filesystem` and a new one in `pyarrow.fs` (see https://issues.apache.org/jira/browse/ARROW-9645 and https://arrow.apache.org/docs/python/filesystems_deprecated.html; the docs are still a bit scarce).
Based on your description, I assume you are using the "legacy" LocalFileSystem. In the new filesystems, however, I think the feature you are looking for already exists: it is called "SubTreeFileSystem" and is created from a base directory and another filesystem instance.

Best,
Joris

On Tue, 25 Aug 2020 at 23:38, Weston Pace <weston.p...@gmail.com> wrote:
> I created a RelativeFileSystem that extended FileSystem and proxied
> calls to a LocalFileSystem instance. This filesystem allowed me to
> specify a base directory and then all paths were resolved relative to
> that base directory (so fs.open("foo.parquet") became
> self.target.open("C:\Datadir\foo.parquet")).
>
> However, because it was not a LocalFileSystem instance it was treated
> differently by arrow at:
>
> https://github.com/apache/arrow/blob/de8bfddae8704a998d910f2a84bd1e2f7bd934d1/python/pyarrow/parquet.py#L1043
>
> Instead of using a native file reader the open method was called and
> it read from a python file object. Besides the performance impact I
> also received a "ResourceWarning: unclosed file" when running `read`
> on a dataset piece.
>
> To avoid these warnings I changed RelativeFileSystem to subclass
> LocalFileSystem instead of proxy to it.
>
> Is this the recommended approach for reading local files? If so I can
> probably add something to the filesystems docs. Part of the problem
> is that the undesired behavior can be difficult to detect. Had I not
> been running with warnings on I would not have noticed the
> ResourceWarning or, if that ResourceWarning is patched away, I
> probably would never have noticed it until I realized my performance
> dropped for some reason.