Hello, I am wondering whether pandas.read_parquet(engine='pyarrow') takes
advantage of the Parquet format by loading only the requested columns, and by
pruning the partition-column sub-directories when a partition column is
included in the load and then filtered on afterwards. Looking at the code for
pandas.read_parquet, it is not clear to me.

For example something like:

stocks_close_df = pd.read_parquet(
    'data/v4.parquet',
    columns=['DateTime', 'Close', 'Ticker'],
    engine='pyarrow',
)

# Filter the data to just this ticker
stocks_close_df = stocks_close_df[stocks_close_df.Ticker == ticker][
    ['DateTime', 'Close']
]

Thanks,
Russell Jurney @rjurney <http://twitter.com/rjurney>
russell.jur...@gmail.com LI <http://linkedin.com/in/russelljurney> FB
<http://facebook.com/jurney> datasyndrome.com
