Hi Mathieu,
The error in your case is new to me but still similar to what I was getting.
I tried
- downgrading Python from 3.10 to 3.9 (building the latest Arrow release),
- updating Xcode and the Command Line Tools,
- adding -DARROW_INSTALL_NAME_RPATH=OFF to the cmake invocation.
There is a ticket for cmake in Jira
Dear Arrow user,
As a beginner open source contributor to the Apache Arrow project, I am
working on a New Contributor’s Guide that would help others in the process
of making their first PR.
I made a very short survey to help me decide what information I should
include. I invite you to share your opinion
On Thu, Nov 18, 2021 at 4:26 PM Alenka Frim wrote:
> Dear Arrow user,
>
> As a beginner open source contributor to the Apache Arrow project, I am
> working on a New Contributor’s Guide that would help others in the process
> of making their first PR.
>
> I made a very short survey
Hello Kelton,
Playing around with the files you referenced and the code you added, the
following can be observed and improved to make the code work:
*1) Defining the partitioning of a dataset*
Running *data.files* on your dataset shows that the files are
partitioned according to the *hi
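A minimal sketch of that inspection, assuming a hive-style layout (the
path and the partition field below are invented, not the actual data):

    import pyarrow.dataset as ds

    # Hypothetical path; partitioning="hive" reads key=value directory
    # names as partition fields.
    data = ds.dataset("my-bucket/data", format="parquet", partitioning="hive")

    # The file list reveals the partition layout, e.g.
    # my-bucket/data/year=2021/part-0.parquet
    print(data.files)

    # Inferred partition fields also show up in the schema.
    print(data.schema)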
nt by submitting a survey
response and to all who helped with the work!
Feel free to propose any changes to the documentation by submitting a PR ;)
Best,
Alenka
On Mon, Nov 29, 2021 at 8:27 AM Alenka Frim wrote:
> Hello everybody!
>
> Just a kind reminder to submit your opinion on the Survey
Hi Will,
Maybe it is connected to https://github.com/apache/arrow/pull/11602.
Alenka
On Sat, Feb 19, 2022 at 8:18 AM James Duong wrote:
> Hi Will,
>
> Is your goal to have libarrow be loaded from a path relative to
> libparquet? I've found that @loader_path works well for this and is close
> t
Hi Kelton,
I can reproduce the same error if I try to load all the data with

    data = ds.dataset("global-radiosondes/hires-sonde", filesystem=fs)

or

    data = pq.ParquetDataset("global-radiosondes/hires-sonde", filesystem=fs,
                             use_legacy_dataset=False)

Could you share your code where you read a speci
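For context, reading a single partition instead of the whole tree might
look like the sketch below; the bucket path is the one from this thread,
while the anonymous-S3 setup, the hive partitioning, and the *station*
column are assumptions:

    import pyarrow.dataset as ds
    from pyarrow import fs as pafs

    # Anonymous S3 access; only the bucket/prefix comes from the thread.
    fs = pafs.S3FileSystem(anonymous=True)

    data = ds.dataset("global-radiosondes/hires-sonde", filesystem=fs,
                      format="parquet", partitioning="hive")

    # Read only the rows matching one (hypothetical) partition value
    # instead of materializing the full dataset.
    table = data.to_table(filter=ds.field("station") == "72403")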
Hi Xinyu,
> The resulting Parquet file can be read by Spark. But using ParquetDataset
> with use_legacy_dataset=False will result in a segmentation fault. Setting
> use_legacy_dataset=True works fine.
>
The new implementation does not support row_group_size.
Can you try using max_rows_per_group together with
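A rough sketch of what that could look like with the datasets writer (the
table, the output path, and the 64,000-row group size are invented, and
parameter availability depends on the pyarrow version):

    import pyarrow as pa
    import pyarrow.dataset as ds

    table = pa.table({"id": list(range(1_000_000))})

    # write_dataset exposes row-group sizing directly; 64_000 rows per
    # group is an arbitrary example value, not a recommendation.
    ds.write_dataset(table, "out", format="parquet",
                     max_rows_per_group=64_000,
                     min_rows_per_group=64_000)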
>
> I assume the new implementation is for reading? Like when writing a
> Parquet file we can still change the row group size. The seg fault
> comes from reading, where we do not need to pass in row group size as
> parameters.
>
Oh sorry, I misunderstood!
For the filtering case, yes filtering is
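For context, filtering with the non-legacy reader typically looks
something like this (the file name, column, and threshold are invented):

    import pyarrow.parquet as pq

    # Predicate pushdown: row groups whose statistics cannot match the
    # filter are skipped rather than read and discarded.
    dataset = pq.ParquetDataset("data.parquet", use_legacy_dataset=False,
                                filters=[("value", ">", 100)])
    table = dataset.read()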
+1 from me too, great idea - would definitely like to attend and help
with organisation!
On Thu, 6 Mar 2025 at 17:31, Raúl Cumplido
wrote:
> +1, sounds like a great idea. I would definitely attend!
>
> On Thu, 6 Mar 2025 at 16:28, Denny Lee ()
> wrote:
>
> > +1 (and would lo
Hi Ishan,
I do not think there is an option to specify a compute expression
with Substrait at the moment.
There is a plan to support it in C++ in the future:
https://github.com/apache/arrow/issues/33985. Once that lands, we could
bind it in Python and use the functionality in PyArrow as well.
Best,