On 10/05/2022 at 04:36, Andrew Piskorski wrote:
On Mon, May 09, 2022 at 07:00:47PM +0200, Antoine Pitrou wrote:
Generally, the Arrow IPC file/stream formats are designed for large
data. If you have many very small files you might try to rethink how you
store your data on disk.
Ah. Is thi
Do we have to give it a particular name at all? Most of the C++
subcomponents simply have a description ("the datasets layer", etc.).
There are probably more important topics to spend our time on.
Regards
Antoine.
On 09/05/2022 at 21:44, Ian Cook wrote:
Reflecting on this discussion si
Thanks for the feedback, Krisztián! Lots of good insights on the current
release process.
I can see that you were already taking actions towards the process I was
describing. I will write some notes to update the current process to
reflect that on the Release documentation [1] and will share.
I s
On 10/05/2022 at 13:27, Raul Cumplido wrote:
I still think there is some value in standardising the "feature freeze" on
new release candidates once a first release candidate has been created and
only add required fixes for the follow up RCs. What I would like to avoid
with that is rushing bi
I think it is important to give the C++ execution engine a separate name,
as has been said by Wes and Jacques. Two reasons for that, IMO:
1. The more things we lend the Arrow brand outside of the format, the
harder it becomes for outside users to grasp what "Arrow" is.
2. Giving the C++ engine a n
Hello,
I ran into a problem with running PyArrow that I locally built. The build
worked fine (or so it seems) but then the testing procedure had a failure due
to not being able to load pyarrow._dataset, which I manually confirmed. I'd
appreciate any guidance on how to fix this error.
Below are
I think you need to add:
export PYARROW_WITH_DATASET=1
On Tue, May 10, 2022 at 7:07 AM Yaron Gvili wrote:
>
> Hello,
>
> I ran into a problem with running PyArrow that I locally built. The build
> worked fine (or so it seems) but then the testing procedure had a failure due
> to not being
Hi Yaron,
Does `import pyarrow` work?
On Tue, May 10, 2022 at 1:07 PM Yaron Gvili wrote:
> Hello,
>
> I ran into a problem with running PyArrow that I locally built. The build
> worked fine (or so it seems) but then the testing procedure had a failure
> due to not being able to load pyarrow._da
That said, tests which require should be skipped gracefully instead of
failing.
On 10/05/2022 at 19:13, Weston Pace wrote:
I think you need to add:
export PYARROW_WITH_DATASET=1
On Tue, May 10, 2022 at 7:07 AM Yaron Gvili wrote:
Hello,
I ran into a problem with running PyArrow
I am planning on cutting a release candidate later this week.
I have 2 PRs related to release prep work that I would like to get merged
prior to that:
- https://github.com/apache/arrow-datafusion/pull/2479
- https://github.com/apache/arrow-datafusion/pull/2495
I also have these PRs for new featu
On 10/05/2022 at 19:16, Antoine Pitrou wrote:
That said, tests which require should be skipped gracefully instead of
failing.
Oops... some words got swallowed:
tests which require *the dataset module* should be skipped gracefully
instead of failing.
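A minimal sketch of how a test could detect a missing optional extension module and be skipped rather than fail. The helper name `has_module` is hypothetical and not part of PyArrow; it relies only on the standard library's `importlib.util.find_spec`. In a pytest suite, a similar effect is commonly obtained with `pytest.importorskip` or a `skipif` marker.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if `name` can be located.

    The module itself is not imported, although its parent packages are.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # The parent package is missing or broken, e.g. pyarrow itself
        # is not installed.
        return False


# Hypothetical usage in a test module (assumes pytest is available):
#   pytestmark = pytest.mark.skipif(
#       not has_module("pyarrow._dataset"),
#       reason="pyarrow was built without PYARROW_WITH_DATASET=1",
#   )
```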
On 10/05/2022 at 19:13, Weston P
> Does `import pyarrow` work?
Yes. Also, all but one unit test succeeded:
=== short test summary info ===
FAILED pyarrow/t
For discussion I've put up https://github.com/apache/arrow/pull/13115 to add
this for the C data/stream interfaces.
On Mon, May 9, 2022, at 15:42, Antoine Pitrou wrote:
> On 09/05/2022 at 20:28, Tomek Drabas wrote:
>> I am new to this board so please, let me know if any of this doesn't make
>>
> I think you need to add:
>
> export PYARROW_WITH_DATASET=1
This worked, thanks. I think the documentation [1] may need to be fixed to clarify
that DATASET is also an optional component.
[1] https://arrow.apache.org/docs/developers/python.html#build-and-test
Yaron.
A couple of other names derived from the Ace- vibe:
Acero ("steel" or sometimes "sword" in Spanish but apparently also
"maple" in Italian). Also rhymes with Arrow but not sure if this is
good or bad
Acera ("pavement" or "sidewalk" in Spanish)
On Tue, May 10, 2022 at 9:53 AM Will Jones wrote:
Hello!
I am trying to fix C++ code style & lint for my PR.
Currently I am running "archery lint --cpplint --clang-format --clang-tidy
--fix" and encountered 2 issues:
1. File
/home/icexelloss/workspace/arrow/cpp/src/arrow/compute/exec/concurrent_bounded_queue.h
failed C++/CLI lint check: Uses
"Acero" has a nice ring to it. Almost as if you said "ACE Arrow" really
fast. And maybe the steel / iron meaning gives a sort of close-to-metal
vibe (similar to what Rust's name evokes), though I'm not a Spanish
speaker with a meaningful understanding of the words' connotations.
On Tue, May 10,
As a Spanish speaking person, I cannot think of a misleading or bad
connotation for the word "acero". The word is generally used to refer to
either steel materials (actual definition) or as a simile/metaphor
comparing to something very strong. We can view this as a self-laud on the
robust and power
I like Acero too. I like it because (as a non-Spanish speaker, at least) it
has no obvious meaning or connotation and once the community starts to use
this name for the project, that is the meaning that it will come to have.
Just like Gandiva (a word I was not familiar with when I learned about the
1. You are not allowed to include <mutex> in any public header file.
It has something to do with Windows (I forget the details). If you
can move all use of mutex into the implementation that works.
Sometimes we have to use the pimpl pattern to make this happen.
Another alternative is to include "arrow/ut
If you are reading this as a dataset, and you are not partitioning on
your disk, then it is going to read the entire content of every file,
because there is no statistics-based partitioning currently enabled
with IPC files.
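To illustrate why directory partitioning helps when files carry no usable statistics: a reader can decide from the path alone which files to skip, without opening them. This is a minimal pure-Python sketch, not how Arrow's dataset layer is implemented. The hive-style `key=value` directory layout, the helper names `partition_values` and `prune`, and the file names are all assumptions made for illustration.

```python
from pathlib import PurePosixPath


def partition_values(path: str) -> dict:
    """Parse hive-style `key=value` directory segments from a file path."""
    values = {}
    # Skip the last component, which is the file name itself.
    for segment in PurePosixPath(path).parts[:-1]:
        key, sep, value = segment.partition("=")
        if sep:
            values[key] = value
    return values


def prune(paths, key, wanted):
    """Keep only files whose directory names say `key` equals `wanted`."""
    return [p for p in paths if partition_values(p).get(key) == wanted]


# Hypothetical layout; the file names are made up for illustration.
files = [
    "data/year=2021/part-0.arrow",
    "data/year=2022/part-0.arrow",
    "data/year=2022/part-1.arrow",
]
selected = prune(files, "year", "2022")
```

Only the files in `selected` would then need to be read; the others are excluded from their path alone.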
If you have some kind of filter, and you can partition your data on
the sa
Hi,
In
"Re: [DISC][Release] More control on Release Candidates commits" on Tue, 10
May 2022 13:27:09 +0200,
Raul Cumplido wrote:
> I still think there is some value in standardising the "feature freeze" on
> new release candidates once a first release candidate has been created and
> only
@Li appreciate your thoughts on these important pieces. Let me walk through
one by one.
> Numeric code written in numpy/pandas + some relational logic (e.g.,
> np.where to select rows). People like this type of UDFs because they are
> very familiar with pandas/numpy and can be immediately product