I'm in favor of using a submodule for testing data files to avoid bloating the git repository. So far this hasn't been too painful with the Parquet test data files.

On Sun, Jan 27, 2019 at 10:36 AM Andy Grove <andygrov...@gmail.com> wrote:
>
> That's a fair point about not needing a submodule... I was thinking about
> converting some of the shared Parquet files to CSV to help with testing
> DataFusion. I guess I can just put them there for now and if other
> implementations are interested we can just move them to a shared directory.
>
> Thanks,
>
> Andy.
>
> On Sun, Jan 27, 2019 at 9:31 AM Antoine Pitrou <anto...@python.org> wrote:
>
> >
> > Well, CSV isn't a standard like Parquet is, meaning each implementation
> > can choose their own middle grounds and interpretations.
> >
> > Also, the parquet-testing submodule exists because Parquet
> > implementations are spread across different repositories. If we want a
> > common location for CSV files across Arrow implementations, we don't
> > really need a submodule ;-)
> >
> > Regards
> >
> > Antoine.
> >
> >
> > On 27/01/2019 at 17:28, Andy Grove wrote:
> > > I like the fact that we have a parquet-testing submodule that is shared
> > > across implementations. Is there any interest in having an equivalent
> > > for CSV files?
> > >
> > > Andy.