Congratulations Alenka!
On Tue, Jul 1, 2025 at 8:52 AM Bryce Mecum wrote:
> Congrats Alenka! Thanks for all you do.
>
> On Tue, Jul 1, 2025 at 12:38 AM Raúl Cumplido wrote:
> >
> > The Project Management Committee (PMC) for Apache Arrow has invited
> Alenka
> > Frim to become a PMC member and w
+1 (binding)
On Thu, May 1, 2025 at 10:14 PM Dewey Dunnington
wrote:
> +1 (binding)!
>
> On Thu, May 1, 2025 at 10:42 PM Ian Cook wrote:
>
> > +1 (binding)
> >
> > Ian
> >
> > On Thu, May 1, 2025 at 9:51 PM David Li wrote:
> >
> > > +1 (binding)
> > >
> > > On Fri, May 2, 2025, at 08:00, Joel
+1 from me, assuming this is acceptable to domoritz / trxcllnt. I feel we
have struggled to find maintainers for JS (outside of a few dedicated and
extremely helpful ones).
Ideally (perhaps idealistically), separating the code into its own
repository will help reduce the barrier for those who wan
I've written a draft at [1] but for simplicity's sake I will include the
text of the proposal inline below.
[1] https://github.com/westonpace/arrow/tree/feat/turtle-extension-type
TURTLE
==
* Extension name: ``arrow.turtle``.
* The storage type of the extension is ``Struct`` where the struc
First, the subject of this email is IPC, which is confusing. From the
discussion it sounds like we are primarily talking about FFI. It sounds
like the options here are:
1. Silently realign unless user opts out of realignment
The user will not get any runtime errors (easier to use) but a method
Congrats Rok! Glad to have you here!
On Thu, Mar 20, 2025 at 5:18 AM Rok Mihevc wrote:
> Thank you! It's a pleasure working with you all!
>
> Best,
> Rok
>
> On Thu, Mar 20, 2025 at 12:06 PM Alenka Frim
> wrote:
>
> > Yay! Congratulations Rok, well deserved!!
> >
> > On Thu, Mar 20, 2025, 04:
Congrats Ian!
On Fri, Mar 21, 2025 at 12:03 PM Saurabh Singh
wrote:
> Congratulations Ian!
>
> On Fri, 21 Mar 2025 at 19:28, Felipe Oliveira Carvalho <
> felipe...@gmail.com>
> wrote:
>
> > Congrats! 🚀
> >
> > On Fri, Mar 21, 2025 at 7:23 AM Nic Crane wrote:
> >
> > > Congrats!
> > >
> > > On T
Congrats Matthijs!
On Fri, Mar 21, 2025 at 1:51 PM Andrew Lamb wrote:
> Hi,
>
> On behalf of the Arrow PMC, I'm happy to announce that Matthijs Brobbel
> has accepted an invitation to become a committer on Apache
> Arrow. Welcome, and thank you for your contributions!
>
> Andrew
>
Congrats Jacob!
On Mon, Mar 17, 2025 at 5:04 AM Neal Richardson
wrote:
> Congratulations!
>
> On Mon, Mar 17, 2025 at 7:00 AM Ian Cook wrote:
>
> > Congratulations Jacob!
> >
> > On Mon, Mar 17, 2025 at 05:16 Raúl Cumplido wrote:
> >
> > > Congratulations Jacob!
> > >
> > > On Mon, Mar 17, 2025
+1
> A possible reason for hesitation is that it provides us yet another
stream that requires maintainer attention
I had been lukewarm on discussions in the past partly because of this and
partly because they cannot be "closed" and grow forever. However, I have
since learned that the latter conc
Congrats and welcome JB!
On Mon, Mar 10, 2025 at 12:42 PM Bryce Mecum wrote:
> Congrats JB. And welcome.
>
> On Sun, Mar 9, 2025 at 11:05 PM Sutou Kouhei wrote:
> >
> > Hi,
> >
> > On behalf of the Arrow PMC, I'm happy to announce that
> > Jean-Baptiste Onofré has accepted an invitation to beco
+1 (binding) Verified on aarch64 Ubuntu 24.04
On Thu, Feb 27, 2025 at 5:00 AM Xuanwo wrote:
> +1 non-binding
>
> Tested locally on archlinux x86_64.
>
> On Thu, Feb 27, 2025, at 20:36, Raphael Taylor-Davies wrote:
> > +1 (binding)
> >
> > Verified on x86_64 GNU/Linux
> >
> > On 27/02/2025 12:23,
Congrats Bryce!
On Wed, Feb 5, 2025 at 8:35 PM Saurabh Singh wrote:
> Congratulations Bryce.
>
> On Thu, 6 Feb 2025 at 07:41, Gang Wu wrote:
>
> > Congrats Bryce!
> >
> > On Thu, Feb 6, 2025 at 9:57 AM Ruoxi Sun wrote:
> >
> > > Congrats Bryce, well deserved!
> > >
> > >
> > > *Regards,*
> > >
Congratulations Ed!
On Wed, Jan 29, 2025 at 2:20 AM Andrew Lamb wrote:
> On behalf of the Arrow PMC, I'm happy to announce that Ed Seidl
> has accepted an invitation to become a committer on Apache
> Arrow. Welcome, and thank you for your contributions!
>
> Andrew
>
Congratulations!
On Tue, Dec 3, 2024, 3:21 PM Ian Cook wrote:
> Congratulations and thanks for all your great work—not just on Arrow but on
> so many parts of the surrounding ecosystem!
>
> On Tue, Dec 3, 2024 at 6:15 PM David Li wrote:
>
> > Congrats!
> >
> > On Wed, Dec 4, 2024, at 07:38, Rok
> Admittedly, I would like to know why this is being done in this fashion,
but that is tangential to my issue.
IIRC, this is a limitation given to us by the AWS C++ SDK. See [1]. The
AWS C++ SDK has static state and they do not manage it with static local
variables. As a result, the initializa
Congratulations Laurent!
On Mon, Nov 25, 2024 at 4:28 AM wish maple wrote:
> Congrats!
>
> Best,
> Xuwei Fu
>
> On Mon, Nov 25, 2024 at 17:35, David Li wrote:
>
> > On behalf of the Arrow PMC, I'm happy to announce that Laurent Goujon has
> > accepted an invitation to become a committer on Apache Arrow. Welc
Congratulations!
On Mon, Nov 18, 2024, 7:13 PM Jacob Wujciak wrote:
> Congratulations and welcome Adam!
>
> On Tue, Nov 19, 2024 at 03:31, wish maple <
> maplewish...@gmail.com> wrote:
> >
> > Congrats Adam!
> >
> > Best,
> > Xuwei Fu
> >
> > On Tue, Nov 19, 2024 at 08:31, Sutou Kouhei wrote:
> >
> > >
Congrats Neal!
On Wed, Oct 30, 2024, 4:46 AM Raúl Cumplido wrote:
> Thanks Andy for your work during last year and thanks Neal for stepping up!
>
> Raúl
>
> On Wed, Oct 30, 2024 at 12:28, Andrew Lamb
> wrote:
> >
> > I am pleased to announce that the Arrow Project has a new PMC chair and
On behalf of the Arrow PMC, I'm happy to announce that Rossi Sun has
accepted an invitation to become a committer on Apache Arrow. Welcome,
and thank you for your contributions!
> Do you folks believe Duckdb and Datafusion (latter being similar to spark
sql) will be an overkill?
No, I don't believe it would be overkill.
I also wouldn't compare either one to Spark SQL. Spark SQL is meant to be
a distributed query engine that typically requires a cluster of some sort
to o
> Currently, it is not
> possible to assign the implicit ordering in the scan node. Such an option has
> been added in other nodes[0]. This problem is mentioned here [1]. I
> have started to work on it [2] but I am unsure how to move forward
> because I did not find any clear roadmap about ordering in g
Congratulations Will
On Tue, Oct 1, 2024, 2:25 PM Bryce Mecum wrote:
> Congrats Will!
>
> On Tue, Oct 1, 2024 at 9:55 AM Dewey Dunnington
> wrote:
> >
> > On behalf of the Arrow PMC, I'm happy to announce that Will Wyd has
> > accepted an invitation to become a committer on Apache Arrow. Welcom
It also seems that two variations of the variant encoding are being
discussed. The original spec, currently housed in Spark, creates a variant
array in row-major order, that is, each element in the array, is contained
contiguously. So, if you have objects like `{"a": 7, "b": 3}` then the
values f
Notes from the meetup:
https://docs.google.com/document/d/1g0_oiEE0GPQP24LAP3Z8pGSoWnw6C3bsr5If8K7c_Xw/edit?usp=sharing
Thanks to Bryce for taking notes!
On Mon, Aug 12, 2024 at 6:15 AM Weston Pace wrote:
> The exact location has been updated in the doc. Looking forward to seeing
> s
The exact location has been updated in the doc. Looking forward to seeing
some of you tomorrow.
On Thu, Jul 18, 2024 at 11:41 AM Weston Pace wrote:
> I'd like to announce an Arrow Meetup on August 13th, 2024 from 5:30PM to
> 7:30PM. Details can be found at [1]. All are welcome.
+1 for getting some kind of async implementation into the spec. I have
proposed a few alternative approaches in the PR.
On Fri, Aug 9, 2024 at 1:18 PM Matt Topol wrote:
> Hello All, I'd like to discuss the potential addition of an
> asynchronous-oriented version of the C Data Stream interface.
+1 (binding)
Looked through the spec & C++/python PRs.
On Mon, Aug 5, 2024 at 7:41 AM Ian Cook wrote:
> +1 (non-binding)
>
> I reviewed the spec addition.
>
> On Mon, Aug 5, 2024 at 3:37 PM Antoine Pitrou wrote:
>
> >
> > Binding +1 (but posted one minor comment on the format PR).
> >
> > Than
+1 as well. 32-bit keys were chosen because the expectation was that
hashtable spilling would come along soon. Since it didn't, I think it's a
good idea to use 64-bit keys until spilling is added.
On Mon, Aug 5, 2024 at 6:05 AM Antoine Pitrou wrote:
>
> I don't have any concrete data to test t
+1 (binding)
On Wed, Jul 24, 2024 at 8:01 AM Dane Pitkin wrote:
> +1 (non-binding)
>
> I reviewed the spec and Java implementation.
>
> On Wed, Jul 24, 2024 at 10:37 AM Ian Cook wrote:
>
> > +1 (non-binding)
> >
> > I reviewed the spec additions.
> >
> > Ian
> >
> > On Wed, Jul 24, 2024 at 10:2
I'd like to announce an Arrow Meetup on August 13th, 2024 from 5:30PM to
7:30PM. Details can be found at [1]. All are welcome.
We will be discussing what’s going on in the Arrow community and what
community members have planned or would like to see in the coming years.
If you think you can make
ng a coherent abstraction extremely difficult.
> >
>
> > Iceberg also took a similar approach with its File IO abstraction [2].
> >
>
> > [1]:
> >
> https://docs.rs/object_store/latest/object_store/#why-not-a-filesystem-interface
> > [2]: https://ta
> The markers are necessary to offer file system semantics on top of object
> stores. You will get a ton of subtle bugs otherwise.
Yes, object stores and filesystems are different. If you expect your
filesystem to act like a filesystem then these things need to be done in
order to avoid these bug
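
To make the marker point concrete, here is a minimal sketch in plain Python. It assumes a toy object store that is just a flat key-to-bytes mapping; `FlatStore`, `create_dir`, and `dir_exists` are hypothetical names for illustration and are not the API of any Arrow filesystem.

```python
class FlatStore:
    """Toy object store: a flat mapping from key to bytes, with no directories."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def delete(self, key):
        del self.objects[key]

    def list_prefix(self, prefix):
        # Object stores can list keys by prefix; that is all a "directory" is.
        return sorted(k for k in self.objects if k.startswith(prefix))


def create_dir(store, path):
    # Hypothetical helper: a zero-byte marker object whose key ends in "/".
    store.put(path.rstrip("/") + "/", b"")


def dir_exists(store, path):
    # A "directory" exists if any key (marker or file) lives under its prefix.
    return bool(store.list_prefix(path.rstrip("/") + "/"))


# Without a marker, the directory vanishes when its last file is deleted.
no_marker = FlatStore()
no_marker.put("data/part-0", b"x")
no_marker.delete("data/part-0")

# With a marker, the directory survives, which is what a filesystem user expects.
with_marker = FlatStore()
create_dir(with_marker, "data")
with_marker.put("data/part-0", b"x")
with_marker.delete("data/part-0")
```

After this runs, `dir_exists(no_marker, "data")` is false while `dir_exists(with_marker, "data")` is true, which is the kind of subtle difference the markers are there to hide.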
turn pa.list_(pa_type)
> > elif isinstance(obj, dict):
> > items = []
> > for k, child_obj in obj.items():
> > pa_type = _convert_to_arrow_type(k, child_obj)
> > items.append((k, pa_type))
> > return pa.struct(
+1 for empty stream/file as schema serialization. I have used this
approach myself on more than one occasion and it works well. It can even
be useful for transmitting schemas between different arrow-native libraries
in the same language (e.g. rust->rust) since it allows the different
libraries to
pyarrow uses C++ code internally. With the large files I would guess that
less than 0.1% of your pyarrow benchmark is spent in the Python interpreter.
Given this fact, my main advice is to not worry too much about the
difference between pyarrow and carrow. A lot of work goes into pyarrow to
make
I've noticed that a number of Arrow people will be in Seattle in August. I
know there are a number of Arrow contributors that live in the Seattle area
as well. I'd like to organize a face-to-face meetup for the Arrow
community and have created an issue for discussion[1]. I welcome any
input, fee
No vote is required from an ASF perspective (this is not a release)
No vote is required from Arrow conventions (this is not a spec change and
does not impact more than one implementation)
I will send a message to the parquet ML to solicit feedback.
On Fri, May 24, 2024 at 8:22 AM Laurent Goujon
> I think what we are slowly converging on is the need for a spec to
> describe the encoding of Arrow array statistics as Arrow arrays.
This has been something that has always been desired for the Arrow IPC
format too.
My preference would be (apologies if this has been mentioned before):
- Agree
+1 (binding)
I also tested on Ubuntu 22.04 with USE_CONDA=1
dev/release/verify-release-candidate.sh 12 4
On Mon, May 20, 2024 at 5:20 AM David Li wrote:
> My vote: +1 (binding)
>
> Are any other PMC members able to take a look?
>
> On Fri, May 17, 2024, at 23:36, Dewey Dunnington wrote:
> > +1
Congrats Dane!
On Tue, May 7, 2024, 7:30 AM Nic Crane wrote:
> Congrats Dane, well deserved!
>
> On Tue, 7 May 2024 at 15:16, Gang Wu wrote:
> >
> > Congratulations Dane!
> >
> > Best,
> > Gang
> >
> > On Tue, May 7, 2024 at 10:12 PM Ian Cook wrote:
> >
> > > Congratulations Dane!
> > >
> > >
I think "inheritance" and "composition" are more concerns for
implementations than they are for spec (I could be wrong here).
So it seems that it would be sufficient to write the HLLSKETCH's canonical
definition as "this is an extension of the JSON logical type and supports
all the same storage ty
+1 (binding)
On Tue, Apr 30, 2024 at 7:53 AM Rok Mihevc wrote:
> Thanks for all the reviews and comments! I've included the big-endian
> requirement so the proposed language is now as below.
> I'll leave the vote open until after the May holiday.
>
> Rok
>
> UUID
>
>
> * Extension name: `ar
+1 (binding)
I agree we should be explicit about RFC-8259
On Mon, Apr 29, 2024 at 4:46 PM David Li wrote:
> +1 (binding)
>
> assuming we explicitly state RFC-8259
>
> On Tue, Apr 30, 2024, at 08:02, Matt Topol wrote:
> > +1 (binding)
> >
> > On Mon, Apr 29, 2024 at 5:36 PM Ian Cook wrote:
> >
e from_arrow_device method which returns a cudf::table?
> > Should
> > > it error, or should it create a table with a single column?
> >
> > Presumably it should just error? I can see this being ambiguous if there
> > were an API that dynamically returned either
> *As per Apache Parquet Community Parquet V2 is not final yet so it is not
> official . They are advising not to use Parquet V2 for writing (though
code
> is available ) .*
This would be news to me. Parquet releases are listed (by the parquet
community) at [1]
The vote to release parquet 2.10 i
I tend to agree with Dewey. Using run-end-encoding to represent a scalar
is clever and would keep the C Data Interface more compact. Also, a struct
array is a superset of a record batch (assuming the metadata is kept in the
schema). Consumers should always be able to deserialize into a struct
ar
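
As a rough illustration of the run-end-encoding idea, here is a pure-Python sketch. It only assumes the general REE layout (a `run_ends` child holding cumulative end indices and a `values` child); the `RunEndEncoded` class and `scalar_as_ree` helper are hypothetical and are not part of any Arrow library.

```python
from bisect import bisect_right


class RunEndEncoded:
    """Toy run-end-encoded array: values[k] covers logical indices up to run_ends[k]."""

    def __init__(self, run_ends, values):
        if len(run_ends) != len(values):
            raise ValueError("run_ends and values must have the same length")
        self.run_ends = list(run_ends)
        self.values = list(values)

    def __len__(self):
        # The logical length is the last run end (0 for an empty array).
        return self.run_ends[-1] if self.run_ends else 0

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # The first run whose end is strictly greater than index holds the value.
        return self.values[bisect_right(self.run_ends, index)]


def scalar_as_ree(value, length):
    """Represent a scalar broadcast to `length` rows as a single run."""
    if length <= 0:
        raise ValueError("length must be positive")
    return RunEndEncoded([length], [value])


broadcast = scalar_as_ree(7, 1000)
```

Here `broadcast` behaves like 1000 copies of `7` while storing one run end and one value, which is why the representation stays compact.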
> people generally find use in Arrow schemas independently of concrete data.
This makes sense. I think we do want to encourage use of Arrow as a "type
system" even if there is no data involved. And, given that we cannot
easily change a field's data type property to "optional" it makes sense to
u
> may want an Other type to signal that it would fail if asked to provide
particular columns.
I interpret "would fail" to mean we are still speaking in some kind of
"planning stage" and not yet actually creating arrays. So I don't know
that this needs to be a data type. In other words, I see thi
Congratulations!
On Thu, Apr 11, 2024 at 9:12 AM wish maple wrote:
> Congrats!
>
> Best,
> Xuwei Fu
>
> On Thu, Apr 11, 2024 at 23:22, Kevin Gurney wrote:
>
> > Congratulations, Sarah!! Well deserved!
> >
> > From: Jacob Wujciak
> > Sent: Thursday, April 11, 2024 11:14 AM
> >
> Probably major versions should match between C++ and PyArrow, but I guess
> we could have diverging minor and patch versions. Or at least patch
> versions given that
> a new minor version is usually cut for bug fixes too.
I believe even this would be difficult. Stable ABIs are very finicky in
C
Forgot link:
[1]
https://developer.mozilla.org/en-US/docs/WebAssembly/JavaScript_interface/Memory
On Tue, Apr 2, 2024 at 11:38 AM Weston Pace wrote:
> Thanks for taking the time to address my concerns.
>
> > I've split the S3/HTTP URI flight pieces out into a separate document
was easier for iterating on the
> > protocol
> > > specification than a markdown PR for the Arrow documentation as I could
> > > more visually express things without a preview of the rendered
> markdown.
> > If
> > > it would get people to be more li
Wouldn't support for ADT require expressing more than 1 type id per
record? In other words, if `put` has type id 1, `delete` has type id 2,
and `erase` has type id 3 then there is no way to express something is (for
example) both type id 1 and type id 3 because you can only have one type id
per re
Congratulations Joel!
On Mon, Apr 1, 2024 at 1:16 PM Bryce Mecum wrote:
> Congrats, Joel!
>
> On Mon, Apr 1, 2024 at 6:59 AM Matt Topol wrote:
> >
> > On behalf of the Arrow PMC, I'm happy to announce that Joel Lubinitsky
> has
> > accepted an invitation to become a committer on Apache Arrow. W
Thank you for bringing this up. I'm in favor of this. I think there are
several motivations but the main ones are:
1. Decoupling the versions will allow components to have no release, or
only a minor release, when there are no breaking changes
2. We do have some vote fatigue I think and we don
I'm sorry for the very late reply. Until yesterday I had no real concept
of what this was talking about and so I had stayed out.
I'm +0 only because it isn't clear what we are voting on. There is a word
doc with no implementation or PR. I think there could be an implementation
/ PR. For exampl
> I don't think there is currently a direct equivalent to
> `FlightRecordBatchStream` in the arrow javascript library, but you should
> be able to combine the data header + body and then read it using the
> `fromIPC` functions since it's just the Arrow IPC format
The RecordBatchReader[1] _should_
Congratulations!
On Sun, Mar 17, 2024, 8:01 PM Jacob Wujciak wrote:
> Congrats, well deserved!
>
> On Mon, Mar 18, 2024, 03:24, Nic Crane wrote:
>
> > On behalf of the Arrow PMC, I'm happy to announce that Bryce Mecum has
> > accepted an invitation to become a committer on Apache Arrow. Welco
Felipe's points are good.
I don't know that you need to adapt the entire ADBC, it sort of depends
what you're after. I see what you've got right now as more of an SQL
abstraction layer. For example, similar to things like [1][2][3] (though 3
is more of an ORM). If you like the SQL interface tha
+1 (binding)
On Fri, Mar 1, 2024 at 3:33 AM Andrew Lamb wrote:
> Hello,
>
> As we have discussed[1][2] I would like to vote on the proposal to
> create a new Apache Top Level Project for DataFusion. The text of the
> proposed resolution and background document is copy/pasted below
>
> If the com
Congrats!
On Fri, Feb 16, 2024 at 3:07 AM Raúl Cumplido wrote:
> Congratulations!!
>
> On Fri, Feb 16, 2024 at 12:02, Daniël Heres
> wrote:
> >
> > Congratulations!
> >
> > On Fri, Feb 16, 2024, 11:33 Metehan Yıldırım <
> metehan.yildi...@synnada.ai>
> > wrote:
> >
> > > Congrats!
> > >
+1. There have been a few times I've attempted to run the verification
scripts. They have failed, but I was pretty confident it was a problem
with my environment mixing with the verification script and not a problem
in the software itself and I didn't take the time to debug the verification
scrip
I agree engines can use their own strategy. Requiring explicit casts is
probably ok as long as it is well documented but I think I lean slightly
towards implicitly falling back to the storage type. I do think
people still shy away from extension types. Adding the extension type to
an impli
> > > The vote will be open for at least 72 hours.
> > >
> > > [ ] +1
> > > [ ] +0
> > > [ ] -1 Keep Flight SQL experimental because...
> > >
> > > On Fri, Dec 8, 2023, at 13:37, Weston Pace wrote:
> > >> +1
> > &
+1
On Fri, Dec 8, 2023 at 10:33 AM Micah Kornfield
wrote:
> +1
>
> On Fri, Dec 8, 2023 at 10:29 AM Andrew Lamb wrote:
>
> > I agree it is time to "promote" ArrowFlightSQL to the same level as other
> > standards in Arrow
> >
> > Now that it is used widely (we use and count on it too at InfluxDa
Congratulations Felipe!
On Thu, Dec 7, 2023 at 8:38 AM wish maple wrote:
> Congrats Felipe!!!
>
> Best,
> Xuwei Fu
>
> On Thu, Dec 7, 2023 at 23:42, Benjamin Kietzman wrote:
>
> > On behalf of the Arrow PMC, I'm happy to announce that Felipe Oliveira
> > Carvalho
> > has accepted an invitation to become a co
Congrats Andy!
On Mon, Nov 27, 2023, 7:31 PM wish maple wrote:
> Congrats Andy!
>
> Best,
> Xuwei Fu
>
> On Mon, Nov 27, 2023 at 20:47, Andrew Lamb wrote:
>
> > I am pleased to announce that the Arrow Project has a new PMC chair and
> VP
> > as per our tradition of rotating the chair once a year. I have resi
Congratulations James
On Fri, Nov 17, 2023 at 6:07 AM Metehan Yıldırım <
metehan.yildi...@synnada.ai> wrote:
> Congratulations!
>
> On Thu, Nov 16, 2023 at 10:45 AM Sutou Kouhei wrote:
>
> > On behalf of the Arrow PMC, I'm happy to announce that James Duong
> > has accepted an invitation to beco
Congratulations Raúl!
On Mon, Nov 13, 2023 at 1:34 PM Ben Harkins
wrote:
> Congrats, Raúl!!
>
> On Mon, Nov 13, 2023 at 4:30 PM Bryce Mecum wrote:
>
> > Congrats, Raúl!
> >
> > On Mon, Nov 13, 2023 at 10:28 AM Andrew Lamb
> > wrote:
> > >
> > > The Project Management Committee (PMC) for Apache
+1 for the original proposal as well.
---
The (minor) problem I see with flags is that there isn't much point to this
feature if you are gating on a flag. I'm assuming the goal is what Dewey
originally mentioned which is making buffer calculations easier. However,
if you're gating the feature w
Is this buffer lengths buffer only present if the array type is Utf8View?
Or are you suggesting that other types might want to adopt this as well?
On Thu, Oct 26, 2023 at 10:00 AM Dewey Dunnington
wrote:
> > I expect C code to not be much longer than this :-)
>
> nanoarrow's buffer-length-calcul
Congratulations Xuwei!
On Mon, Oct 23, 2023 at 3:38 AM wish maple wrote:
> Thanks kou and every nice person in arrow community!
>
> I've learned a lot during learning and contribution to arrow and
> parquet. Thanks for everyone's help.
> Hope we can bring more fancy features in the future!
>
> B
> Of course, what I'm really asking for is to see how Lance would compare
;-)
> P.S. The second paper [2] also talks about ML workloads (in Section 5.8)
> and GPU performance (in Section 5.9). It also cites Lance as one of the
> future formats (in Section 5.6.2).
Disclaimer: I work for LanceDB an
Congratulations Jon!
On Sun, Oct 15, 2023, 1:51 PM Neal Richardson
wrote:
> Congratulations!
>
> On Sun, Oct 15, 2023 at 1:35 PM Bryce Mecum wrote:
>
> > Congratulations, Jon!
> >
> > On Sat, Oct 14, 2023 at 9:24 AM Andrew Lamb
> wrote:
> > >
> > > The Project Management Committee (PMC) for Ap
Congratulations!
On Sun, Oct 15, 2023, 8:51 AM Gang Wu wrote:
> Congrats!
>
> On Sun, Oct 15, 2023 at 10:49 PM David Li wrote:
>
> > Congrats & welcome Curt!
> >
> > On Sun, Oct 15, 2023, at 09:03, wish maple wrote:
> > > Congratulations!
> > >
> > > On Sun, Oct 15, 2023 at 20:48, Raúl Cumplido wrote:
> > >
> I feel the broader question here is what is Arrow's intended use case -
interchange or execution
The line between interchange and execution is not always clear. For
example, I think we would like Arrow to be considered as a standard for UDF
libraries.
On Fri, Oct 6, 2023 at 7:34 AM Mark Raasve
In other languages I have seen this called "async local"[1][2][3]. I'm not
sure of any C++ implementations. Folly's fibers claim to have fiber-local
variables[4] but I can't find the actual code to use them. I can't seem to
find reference to the concept in boost's asio or cppcoro.
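
For a feel of what "async local" means, Python's standard `contextvars` module is one example of the concept (I'm not claiming it is one of the references above). A minimal sketch, where `request_id`, `handler`, and `run_demo` are names made up for the illustration:

```python
import asyncio
import contextvars

# An "async local": each asyncio task sees its own value for this variable.
request_id = contextvars.ContextVar("request_id", default=None)


async def handler(name, delay):
    request_id.set(name)
    # Yield to the event loop so the other task runs and sets its own value.
    await asyncio.sleep(delay)
    # The value set by this task is still visible here, not the other task's.
    return request_id.get()


async def main():
    # gather() wraps each coroutine in a task, and each task gets a copy of
    # the current context, so the two set() calls do not interfere.
    return await asyncio.gather(handler("a", 0.02), handler("b", 0.01))


def run_demo():
    return asyncio.run(main())
```

Calling `run_demo()` gives each handler back its own value even though the two tasks interleave on one thread, and the value outside the tasks stays at the default.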
I've also see
+1
Thanks to all for the discussion and thanks to Ben for all of the great
work.
On Mon, Aug 21, 2023 at 9:16 AM wish maple wrote:
> +1 (non-binding)
>
> It would help a lot when processing UTF-8 related data!
>
> Xuwei
>
> On Tue, Aug 22, 2023 at 00:11, Andrew Lamb wrote:
>
> > +1
> >
> > This is a great e
> But I can't figure out how to express "select struct field 0 from field 2
> of the original table where field 2 is a struct column"
>
> Any idea what the Substrait message should look like for the above?
I believe it would be:
```
"expression": {
"selection": {
"direct_reference": {
> I would welcome a draft PR showcasing the changes necessary in the IPC
> format definition, and in the C Data Interface specification (no need to
> actually implement them for now :-)).
I've proposed something at [1].
> One sketch of an idea: define sets of types that we can call “kinds”**
> (e
size to 6/8/16, The system works fine. CPU is about
> > 100%. like 2.1.1
> > 2.2.2 for bucket_size to 32, the bug comes back. CPU halts at 550%.
> >
> > 2.3 io_thread_count to 8
> > 2.3.1 for bucket_size to 16, it fails somehow. After transferring
> > done, the m
goes up as well, to 800%.
> 1. Sometimes the writing queue can recover: the CPU goes down after
> the memory has accumulated, the writing speed recovers, and memory goes
> back to normal.
> 2. Sometimes, it can't. IOBPS goes down sharply, and CPU never goes
> down after that.
>
> How
You'll need to measure more but generally the bottleneck for writes is
usually going to be the disk itself. Unfortunately, standard OS buffered
I/O has some pretty negative behaviors in this case. First I'll describe
what I generally see happen (the last time I profiled this was a while back
but
on! Very helpful explanation.
>
> On Tue, Jul 25, 2023 at 6:41 PM Weston Pace wrote:
>
> > 1) As a rule of thumb I would probably prefer `async_scheduler`. It's
> more
> > feature rich and simpler to use and is meant to handle "long running"
> tasks
> > (
above it is probably ok to assume an implicit
ordering in many cases).
On Wed, Jul 26, 2023 at 8:18 AM Weston Pace wrote:
> > I think the key problem is that the input stream is unordered. The
> > input stream is an ArrowArrayStream imported from the Python side, and then
rator.
> > Also, I'd like to have a discussion on the dataset scanner: does it produce a
> > stable sequence of record batches (as an implicit ordering) when the
> > underlying storage is not changed? For my situation, the downstream
> > executor may crash, then it would request to con
1) As a rule of thumb I would probably prefer `async_scheduler`. It's more
feature rich and simpler to use and is meant to handle "long running" tasks
(e.g. 10s-100s of ms or more).
The scheduler is a bit more complex and is intended for very fine-grained
scheduling. It's currently only used in
> Reading the source code of exec_plan.cc, DeclarationToReader called
> DeclarationToRecordBatchGenerator, which ignores the sequence_output
> parameter in SinkNodeOptions, also, it calls validate which should
> fail if the SinkNodeOptions honors the sequence_output. Then it seems
> that Declaratio
> Also, I don't understand why there are two versions of the hash table
> ("hashing32" and "hashing64" apparently). What's the rationale? How is
> the user meant to choose between them? Say a Substrait plan is being
> executed: which hashing variant is chosen and why?
It's not user-configurable.
Yes, those are the two main approaches to hashing in the code base that I
am aware of as well. I haven't seen any real concrete comparison and
benchmarks between the two. If collisions between NA and 0 are a problem
it would probably be ok to tweak the hash value of NA to something unique.
I susp
> I may be missing something, but why copy to *out_values++ instead of
> *out_values and add 32 to out_values afterwards? Otherwise I agree this is
> the way to go.
I agree with Jin. You should probably be incrementing `out` by 32 each
time `VisitValue` is called.
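
The pattern can be sketched in Python as a stand-in for the C++ visitor. This assumes each value is exactly 32 bytes wide (taken from the "32" in the discussion); `copy_fixed_width` is a hypothetical helper, not code from the PR.

```python
VALUE_WIDTH = 32  # bytes per value; an assumption based on the "32" above


def copy_fixed_width(values, width=VALUE_WIDTH):
    """Copy fixed-width values into one contiguous output buffer."""
    out = bytearray(len(values) * width)
    offset = 0
    for value in values:  # plays the role of one VisitValue call per element
        if len(value) != width:
            raise ValueError("every value must be exactly `width` bytes")
        out[offset:offset + width] = value
        # Advance by the full value width, not by one byte, so the next
        # value does not overwrite most of this one.
        offset += width
    return bytes(out)
```

Advancing by one (the `*out_values++` style) would only be right if each visit wrote a single element of the output pointer's type; with whole 32-byte values the offset has to move by 32.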
On Mon, Jul 17, 2023 at 6:38 AM
f challenges are in the forefront.
> I agree 100% that this sort of interoperability is what makes Arrow so
> compelling and something we should work very hard to preserve. This is
> the crux of my concern with standardising alternative layouts. I
> definitely hope that with time Arrow will penetr
Yes, that is correct.
What Substrait calls "groupings" is what is often referred to in SQL as
"grouping sets". These allow you to compute the same aggregates but group
by different criteria. Two very common ways of creating grouping sets are
"group by cube" and "group by rollup". Snowflake's do
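
As a small sketch of how the two shorthands expand into grouping sets (plain Python, no Substrait or Acero API involved; `rollup`, `cube`, and `aggregate` are names invented for the example):

```python
from itertools import combinations


def rollup(columns):
    """Prefixes of the column list, longest first, ending with the grand total ()."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]


def cube(columns):
    """Every subset of the columns, largest first, ending with the grand total ()."""
    return [
        subset
        for n in range(len(columns), -1, -1)
        for subset in combinations(columns, n)
    ]


def aggregate(rows, grouping_sets, measure):
    """Compute the same SUM once per grouping set."""
    results = []
    for keys in grouping_sets:
        totals = {}
        for row in rows:
            group = tuple(row[k] for k in keys)
            totals[group] = totals.get(group, 0) + row[measure]
        results.append((keys, totals))
    return results
```

For columns `a` and `b`, `rollup` yields the sets `(a, b)`, `(a)`, `()` and `cube` additionally yields `(b)`; `aggregate` then computes the same sum once per set.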
jor bottleneck so can't speak authoritatively here.
> >>
> >> Which leads on to my major concern with this proposal, that it adds
> >> complexity and cognitive load to the specification and implementations,
> >> whilst not meaningfully improving the performance
I agree the experiment isn't working very well. I've been meaning to
change my listing from `compute` to `acero` for a while. I'd be +1 for
just removing it though.
On Tue, Jul 4, 2023, 6:44 AM Dewey Dunnington
wrote:
> Just a note that for me, the main problem is that I get automatic
> review
Congratulations Kevin!
On Mon, Jul 3, 2023 at 5:18 PM Sutou Kouhei wrote:
> On behalf of the Arrow PMC, I'm happy to announce that Kevin Gurney
> has accepted an invitation to become a committer on Apache
> Arrow. Welcome, and thank you for your contributions!
>
> --
> kou
>
> is this overflow considered a bug? Or is large exec batch something that
should be avoided?
This is not a bug and it is something that should be avoided.
Some of the hash-join internals expect small batches. I actually thought
the limit was 32Ki and not 64Ki because I think there may be some p
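
One way to stay under such a limit is to slice large batches before they reach the join. A minimal sketch, where the 32Ki default is only the figure I recalled above and not a confirmed constant, and `slice_batches` is a hypothetical helper:

```python
def slice_batches(num_rows, max_rows=32 * 1024):
    """Yield (offset, length) pairs that cover num_rows in chunks of at most max_rows."""
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    for offset in range(0, num_rows, max_rows):
        yield offset, min(max_rows, num_rows - offset)
```

Each `(offset, length)` pair could then be used to take a zero-copy slice of the original batch.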
Is your use case to operate on a batch of graphs? For example, do you have
hundreds or thousands of graphs that you need to run these algorithms on at
once?
Or is your use case to operate on a single large graph? If it's the
single-graph case then how many nodes do you have?
If it's one graph a
We do this quite a bit in the Arrow<->Parquet bridge if IIUC. There are
macros defined like this:
```
#define BEGIN_PARQUET_CATCH_EXCEPTIONS try {
#define END_PARQUET_CATCH_EXCEPTIONS \
}\
catch (const ::parquet::ParquetStat