Actually, my workaround (extending LocalFileSystem) does not work, since `open` is never called in this case and the path is not normalized to the base directory.
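
For reference, here is a rough sketch of the kind of wrapper I am describing. It is only an illustration: the names `RelativeFileSystem`, `_resolve` and `opener` are hypothetical, and it uses Python's built-in `open` as a stand-in rather than the real pyarrow filesystem classes, so treat it as an outline of the idea and not as the pyarrow API.

```python
import os


class RelativeFileSystem:
    """Hypothetical sketch: resolve every path against a base directory.

    `opener` stands in for the wrapped filesystem's open method; the
    built-in open() is used here only to keep the example self-contained.
    """

    def __init__(self, base_dir, opener=open):
        self.base_dir = os.path.abspath(base_dir)
        self.opener = opener

    def _resolve(self, path):
        # Note: os.path.join discards base_dir if `path` is already absolute.
        return os.path.join(self.base_dir, path)

    def open(self, path, mode="rb"):
        # Path normalization happens only here, so a reader that opens
        # the path string itself never sees the resolved path.
        return self.opener(self._resolve(path), mode)
```

The weak point is visible in the sketch: the base directory is applied only inside `open`, so any code path that skips `open` and reads the file natively receives the unresolved relative path.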

On Tue, Aug 25, 2020 at 11:38 AM Weston Pace <[email protected]> wrote:
>
> I created a RelativeFileSystem that extended FileSystem and proxied
> calls to a LocalFileSystem instance. This filesystem allowed me to
> specify a base directory and then all paths were resolved relative to
> that base directory (so fs.open("foo.parquet") became
> self.target.open("C:\Datadir\foo.parquet")).
>
> However, because it was not a LocalFileSystem instance it was treated
> differently by arrow at:
>
> https://github.com/apache/arrow/blob/de8bfddae8704a998d910f2a84bd1e2f7bd934d1/python/pyarrow/parquet.py#L1043
>
> Instead of using a native file reader the open method was called and
> it read from a Python file object. Besides the performance impact I
> also received a "ResourceWarning: unclosed file" when running `read`
> on a dataset piece.
>
> To avoid these warnings I changed RelativeFileSystem to subclass
> LocalFileSystem instead of proxy to it.
>
> Is this the recommended approach for reading local files? If so I can
> probably add something to the filesystems docs. Part of the problem
> is that the undesired behavior can be difficult to detect. Had I not
> been running with warnings on I would not have noticed the
> ResourceWarning or, if that ResourceWarning is patched away, I
> probably would never have noticed it until I realized my performance
> dropped for some reason.
