RE: RE: Re: Pyarrow minimal build for lambda layers

2023-11-29 Thread Steven Handerson
Oh - important detail, the directions that I was following are in https://arrow.apache.org/docs/developers/python.html .
Steve
On 2023/11/27 18:38:55 Akshara Sadheesh wrote:
> Thank you so much for your reply Raul! So I did run the build using the
> build_venv.sh file. The issue was I think I d

RE: RE: Re: Pyarrow minimal build for lambda layers

2023-11-29 Thread Steven Handerson
I have a similar problem - using a conda build, following the pyarrow build instructions. It works fine on the build machine, but building and installing the wheel ends up missing some libraries (libutf8, for starters). I’m kind of a newbie in this regard, could someone spell out how you can do

Re: [parquet][Iceberg] Should hive partition keys appear as corresponding columns in the file

2023-11-29 Thread Fokko Driesprong
Hey Haocheng, The partitioning in Iceberg is logical, instead of physical. The directory structure (/dt=2021-03-01/) is there just for convenience, but Iceberg does not rely on the actual directory structure. The partition information is stored in the metadata layer (manifests and manifest list).
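For readers unfamiliar with the hive-style layout discussed in this thread, here is a minimal sketch of how a reader that does rely on directory names could recover partition keys from a path. It is not Iceberg's or Arrow's implementation; the function name `parse_hive_partitions` is hypothetical and only the standard library is used. Per the message above, Iceberg itself takes its partition information from the metadata layer rather than from the path.

```python
def parse_hive_partitions(path: str) -> dict[str, str]:
    """Collect key=value directory segments from a hive-style path.

    Hypothetical helper for illustration only: segments without '='
    (table name, file name, empty segments) are ignored.
    """
    partitions: dict[str, str] = {}
    for segment in path.split("/"):
        key, sep, value = segment.partition("=")
        if sep and key:
            partitions[key] = value
    return partitions


if __name__ == "__main__":
    # Path taken from the original question in this thread.
    print(parse_hive_partitions("//myTable/dt=2019-10-31/lang=en/0.parquet"))
```

The values come back as plain strings; a path-based reader would still have to decide on their types, which is one reason the keys may or may not also be stored as columns in the file.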

Re: [parquet][Iceberg] Should hive partition keys appear as corresponding columns in the file

2023-11-29 Thread Micah Kornfield
I don't think there is a strong consensus here, unfortunately: different people might want different things, and there is the issue of legacy systems. As another example, whether to include partition columns in data files is a configuration option in Hudi. If I were creating new data from scra

[parquet][Iceberg] Should hive partition keys appear as corresponding columns in the file

2023-11-29 Thread Haocheng Liu
Hi community,
I want to solicit people's thoughts on the different toolchain behaviors of whether the hive partition keys should appear as columns in the underlying parquet file. Say I have data layout as:
//myTable/dt=2019-10-31/lang=en/0.parquet
//myTable/dt=2018-10-31/lang=fr/1.parquet
IIRC