Re: Help building pyarrow 4.0.1

2021-11-11 Thread Alenka Frim
Hi Mathieu, The error in your case is new to me but still similar to what I was getting. I tried - downgrading Python from 3.10 to 3.9 (building the latest Arrow release), - updating Xcode and the Command Line Tools, - adding -DARROW_INSTALL_NAME_RPATH=OFF to cmake. There is a ticket for cmake in Jira

[Documentation] New Contributor's Guide - Survey

2021-11-18 Thread Alenka Frim
Dear Arrow user, As a beginner open source contributor to Apache Arrow project I am working on a New Contributor’s Guide that would help others in the process of making their first PR. I made a very short survey to help me with the information I should include. I invite you to share your opini

Re: [Documentation] New Contributor's Guide - Survey

2021-11-28 Thread Alenka Frim
Thu, Nov 18, 2021 at 4:26 PM Alenka Frim wrote: > Dear Arrow user, > > As a beginner open source contributor to Apache Arrow project I am working > on a New Contributor’s Guide that would help others in the process of > making their first PR. > > I made a very short surve

Re: PyArrow + GCSFS not loading data when using filters...

2022-01-12 Thread Alenka Frim
Hello Kelton, playing around with the files you referenced and with the code you added, the following can be observed and improved to make the code work: *1) Defining the partitioning of a dataset* Running *data.files* on your dataset shows that the files are partitioned according to the *hi

Re: [Documentation] New Contributor's Guide - Survey

2022-02-11 Thread Alenka Frim
nt by submitting a survey response and to all that helped with the work! Feel free to propose any changes to the documentation by submitting a PR ;) Best, Alenka On Mon, Nov 29, 2021 at 8:27 AM Alenka Frim wrote: > Hello everybody! > > Just a kind reminder to submit your opinion on the Surv

Re: RPATH and Brew on MacOS

2022-02-21 Thread Alenka Frim
Hi Will, maybe it is connected to https://github.com/apache/arrow/pull/11602. Alenka On Sat, Feb 19, 2022 at 8:18 AM James Duong wrote: > Hi Will, > > Is your goal to have libarrow be loaded from a relative path of > libparquet? I've found that @loader_path works well for this and is close > t

Re: FIleNotFound Error on root directory with fsspec partitioned dataset

2022-02-21 Thread Alenka Frim
Hi Kelton, I can reproduce the same error if I try to load all the data with data = ds.dataset("global-radiosondes/hires-sonde", filesystem=fs) or data = pq.ParquetDataset("global-radiosondes/hires-sonde", filesystem=fs, use_legacy_dataset=False). Could you share your code, where you read a speci

Re: [Parquet][Python, C++]Seg fault using new dataset api; filters not work with old dataset api

2022-04-14 Thread Alenka Frim
Hi Xinyu, > The result parquet file can be read by Spark. But using ParquetDataset > with use_legacy_dataset=False will result in segmentation fault. Set > use_legacy_dataset=True works fine. > The new implementation does not support row_group_size. Can you try using max_rows_per_group together wit

Re: [Parquet][Python, C++]Seg fault using new dataset api; filters not work with old dataset api

2022-04-14 Thread Alenka Frim
> > I assume the new implementation is for reading? Like when writing a > Parquet file we can still change the row group size. The seg fault > comes from reading, where we do not need to pass in row group size as > parameters. > Oh sorry, I misunderstood! For the filtering case, yes filtering is

Re: [DISCUSS] Apache Arrow Meetup in Europe

2025-03-06 Thread Alenka Frim
+1 from me too, great idea - would definitely like to attend and help with organisation! V On Thu, Mar 6, 2025 at 5:31 PM Raúl Cumplido wrote: > +1, sounds like a great idea. I would definitely attend! > > On Thu, Mar 6, 2025 at 4:28 PM Denny Lee > wrote: > > > +1 (and would lo

Re: [Python] - serializing pyarrow filter expression

2023-03-16 Thread Alenka Frim via user
Hi Ishan, I do not think there is an option to specify a compute expression with Substrait at the moment. There is a plan to support it in C++: https://github.com/apache/arrow/issues/33985 and afterwards we could bind it in Python and use that functionality in PyArrow as well. Bes