Hi All,

I have data partitioned by year=yyyy/month=mm/day=dd. What is the best way to get two months of data from a given year (say, June and July)?

Two ways I can think of:

1. Use unionAll on the two month directories:

    df1 = sqc.read.parquet('xxx/year=2015/month=6')
    df2 = sqc.read.parquet('xxx/year=2015/month=7')
    df = df1.unionAll(df2)

2. Load the whole year, then filter:

    df = sqc.read.parquet('xxx/year=2015/').filter('month in (6, 7)')

Which of the above is better? Or are there better ways to handle this?

Thank you,
Wei
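
P.S. A possible third option, as a rough sketch rather than a tested answer: build the list of month directories and pass them all to a single read.parquet call. The helper names month_paths and load_months are made up for illustration. The sketch assumes sqc is an existing SQLContext and that the Spark version supports the basePath read option (1.6 or later, as far as I know); without basePath, reading a month directory directly would leave year and month out of the resulting columns, since partition discovery starts at the path given.

```python
def month_paths(base, year, months):
    """Build one partition directory path per requested month (hypothetical helper)."""
    return ["{}/year={}/month={}".format(base, year, m) for m in months]


def load_months(sqc, base, year, months):
    """Read only the listed month directories in a single call.

    Assumption: `sqc` is an existing SQLContext and the Spark version
    supports the `basePath` option, which keeps the year/month partition
    columns in the resulting DataFrame.
    """
    paths = month_paths(base, year, months)
    return sqc.read.option("basePath", base).parquet(*paths)


# Path construction only; no Spark context is needed for this part.
paths = month_paths("xxx", 2015, [6, 7])
print(paths)
```

With a live context this would be called as df = load_months(sqc, 'xxx', 2015, [6, 7]), which avoids the explicit unionAll while still touching only the two month directories.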