Re: [ANNOUNCE] New Arrow PMC member: Alenka Frim

2025-07-03 Thread Felipe Oliveira Carvalho
Great news. Congratulations Alenka! -- Felipe On Wed, Jul 2, 2025 at 6:27 PM Nic Crane wrote: > Congrats Alenka! > > On Wed, 2 Jul 2025 at 05:03, Alenka Frim wrote: > > > Thank you all for the support and for welcoming me so openly > > into the community right from the start. I’m really lookin

Re: [DISCUSS] Parquet/Iceberg adding new interval types & Arrow compatibility

2025-06-22 Thread Felipe Oliveira Carvalho
What about adding a canonical extension type so teams using Arrow don't have to keep re-inventing timestamps and duration types? Using Decimal128 as storage type for these since we are missing 128-bit integers (another debate). -- Felipe On Sun, Jun 22, 2025 at 9:48 AM Antoine Pitrou wrote: >

Re: [DISCUSS][C++] Switch to C++20

2025-05-19 Thread Felipe Oliveira Carvalho
+1 Is it fair to say most users of Arrow C++ do that via Python/R or shared libraries? Making the migration to a recent C++ standard relatively safe? -- Felipe On Mon, May 19, 2025 at 1:14 PM Antoine Pitrou wrote: > > Hello, > > I am proposing that we switch Arrow C++ to require C++20. > > C++

Re: [DISCUSS] Arrow Variant Extension Type

2025-05-12 Thread Felipe Oliveira Carvalho
> As far as relying on union types, the reason we can't do so is because > the specific purpose of this Variant type is that we don't know the > types up front, it's dynamic. This is why "VARIANT" is a misnomer for this type. It's a DYNAMIC type, not a VARIANT (a type that can be a sum of multiple

Re: [DISCUSS] Split JS release process

2025-04-16 Thread Felipe Oliveira Carvalho
+1 from me for the same reasons listed by Weston above. On Tue, Apr 15, 2025 at 6:02 AM Weston Pace wrote: > +1 from me, assuming this is acceptable to domoritz / trxcllnt. I feel we > have struggled to find maintainers for JS (outside of a few dedicated and > extremely helpful ones). > > Ideally

Re: [ANNOUNCE] New Arrow committer: Matthijs Brobbel (mbrobbel)

2025-04-05 Thread Felipe Oliveira Carvalho
Congrats! On Sat, 22 Mar 2025 at 13:23 Gang Wu wrote: > Congrats! > > On Sun, Mar 23, 2025 at 12:21 AM Bryce Mecum wrote: > > > Congrats! > > > > On Fri, Mar 21, 2025 at 1:51 PM Andrew Lamb > wrote: > > > > > > Hi, > > > > > > On behalf of the Arrow PMC, I'm happy to announce that Matthijs Bro

Re: [C++][FlightRPC] Preferred PR Format for the Arrow ODBC Driver

2025-04-01 Thread Felipe Oliveira Carvalho
Hi Alina, I don't speak for the whole community but one approach I took in the past [1] was opening a huge PR that does "everything" and as the reviews on that one progressed I would extract smaller, self-contained PRs that got more detailed reviews. Once those were merged, I would rebase and repe

Re: Request for comments on adding new IPC option 'ensure_memory_alignment'

2025-03-27 Thread Felipe Oliveira Carvalho
Hi, All this complexity everywhere when arrow-rs could simply check the alignment when they ingest external buffers and re-allocate to ensure alignment. I'm in favor of producers of Arrow arrays like a Flight client ensuring alignment as early as possible (when buffers are allocated for arrays de

Re: [ANNOUNCE] New Arrow PMC member: Ian Cook

2025-03-21 Thread Felipe Oliveira Carvalho
Congrats! 🚀 On Fri, Mar 21, 2025 at 7:23 AM Nic Crane wrote: > Congrats! > > On Thu, 20 Mar 2025, 23:15 Ed Seidl, wrote: > > > Congrats Ian! > > > > Cheers, > > Ed > > > > On 2025/03/20 08:04:03 Sutou Kouhei wrote: > > > The Project Management Committee (PMC) for Apache Arrow has invited > > >

Re: [ANNOUNCE] New Arrow PMC member: Rok Mihevc

2025-03-19 Thread Felipe Oliveira Carvalho
Congratulations Rok! Well deserved. On Wed, Mar 19, 2025 at 7:42 PM David Li wrote: > Congrats Rok! > > On Thu, Mar 20, 2025, at 06:09, Fokko Driesprong wrote: > > Congrats Rok! > > > > Op wo 19 mrt 2025 om 22:08 schreef Adam Reeve > > > >> Congratulations Rok! > >> > >> On Thu, 20 Mar 2025 at

Re: [ANNOUNCE] New Arrow PMC member: Jacob Wujciak

2025-03-18 Thread Felipe Oliveira Carvalho
Great news! Congratulations Jacob. -- Felipe On Tue, Mar 18, 2025 at 3:33 AM Jean-Baptiste Onofré wrote: > Congrats Jacob ! > > Regards > JB > > On Mon, Mar 17, 2025 at 6:23 AM Sutou Kouhei wrote: > > > > The Project Management Committee (PMC) for Apache Arrow has invited > > Jacob Wujciak to

Re: [ANNOUNCE] New Arrow PMC member: Bryce Mecum

2025-02-05 Thread Felipe Oliveira Carvalho
Great news! Congratulations, Bryce. On Wed, Feb 5, 2025 at 6:15 PM Neal Richardson wrote: > Congrats, Bryce! > > On Wed, Feb 5, 2025 at 2:09 PM William Ayd .invalid> > wrote: > > > Congrats! > > > > Sent from my iPhone > > > > > On Feb 5, 2025, at 2:51 PM, Ian Cook wrote: > > > > > > Congratu

Re: [VOTE] Apache Arrow array representation of statistics

2024-12-23 Thread Felipe Oliveira Carvalho
+1 On Mon, Dec 23, 2024 at 2:37 AM Sutou Kouhei wrote: > Hi, > > I would like to propose standardizing how to represent > statistics as Apache Arrow array. > > Motivation: > > * We want to pass not only Apache Arrow data but also > statistics of them through the C data interface for query >

Re: [DISCUSS] Arrow array representation of statistics

2024-12-17 Thread Felipe Oliveira Carvalho
> I think it's fair not to mention any other Arrow-like transport mechanism > since the benefits of transporting the statistics as an Arrow array are > less clear right now. When we (or applications) start thinking about more advanced statistics like compressed histograms and sketch data structure

Re: [ANNOUNCE] New Arrow PMC member: Gang Wu

2024-12-03 Thread Felipe Oliveira Carvalho
Congratulations Gang Wu! On Tue, Dec 3, 2024 at 9:21 PM Weston Pace wrote: > Congratulations! > > On Tue, Dec 3, 2024, 3:21 PM Ian Cook wrote: > > > Congratulations and thanks for all your great work—not just on Arrow but > on > > so many parts of the surrounding ecosystem! > > > > On Tue, Dec

Re: [VOTE] Add Async C Data Interface

2024-10-25 Thread Felipe Oliveira Carvalho
+1 from me. I reviewed the PR some time ago and it's not a trivial protocol, but the complexity seems warranted and necessary. On Thu, Oct 24, 2024 at 6:02 PM Dewey Dunnington wrote: > Thanks Matt for putting this together! > > I was initially concerned about the complexity of the proposal; > h

Re: [ANNOUNCE] New Arrow committer: Rossi Sun

2024-10-22 Thread Felipe Oliveira Carvalho
Great news! Congratulations. — Felipe On Tue, 22 Oct 2024 at 16:03 Weston Pace wrote: > On behalf of the Arrow PMC, I'm happy to announce that Rossi Sun has > accepted an invitation to become a committer on Apache Arrow. Welcome, > and thank you for your contributions! >

Re: Query of Arrow Flight SQL with S3 as a storage for parquet files

2024-10-16 Thread Felipe Oliveira Carvalho
Hi Susmit, For an example of what David Li is proposing, you can take a look at this project (https://github.com/voltrondata/sqlflite). It's a Flight SQL server (in C++ though) that can forward queries to either SQLite or DuckDB. -- Felipe On Wed, Oct 16, 2024 at 10:22 AM David Li wrote: > If

Re: [Discuss][C++] Deprecate precompiled headers option?

2024-10-02 Thread Felipe Oliveira Carvalho
I say we remove it. Arrow has great header hygiene (compared to other codebases I've worked on). With a little bit more effort we can probably eliminate long header include chains. -- Felipe On Wed, Oct 2, 2024 at 6:53 AM Antoine Pitrou wrote: > > Hello, > > Long ago, we added a ARROW_USE_PREC

Re: How to return auth token back to client?

2024-09-16 Thread Felipe Oliveira Carvalho
In what language are you implementing your Flight service? For C++ you implement a flight::ServerMiddlewareFactory. That allows you to populate response headers with the returned JWT token [1] -- Felipe [1] https://github.com/voltrondata/sqlflite/blob/f8c72976ea9eef7c7de264b7f93da3ae1fa2bcd7/src

Re: [VOTE] Allow Decimal32 and Decimal64 bitwidths in Arrow Format

2024-09-04 Thread Felipe Oliveira Carvalho
+1 (non-binding) Micah Kornfield, you have a good point. One spec without at least 2 implementations is not a serious spec. But I think we can run the vote now and defer the merging of the implementations until we are confident both implementations are almost complete, tested and the spec text co

Re: [VOTE][Format] Bool8 Canonical Extension Type

2024-08-06 Thread Felipe Oliveira Carvalho
+1 (non-binding) -- Felipe On Tue, Aug 6, 2024 at 6:24 AM Gang Wu wrote: > +1 (non-binding) > > Looked through the spec and C++ impl. > > Best, > Gang > > On Tue, Aug 6, 2024 at 11:55 AM wish maple wrote: > > > +1 (non-binding) > > > > Best, > > Xuwei Fu > > > > David Li 于2024年8月6日周二 10:20写道:

Re: [DISCUSS] 8-bit Boolean Canonical Extension Type

2024-07-18 Thread Felipe Oliveira Carvalho
I think it would confuse implementors of the spec and people implementing kernels way too much. “the bool Arrow type” should probably not start meaning two different things. — Felipe On Fri, 19 Jul 2024 at 01:26 Micah Kornfield wrote: > As Boolean is already in the arrow type system I think it

Re: [DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-12 Thread Felipe Oliveira Carvalho
Hi, The markers are necessary to offer file system semantics on top of object stores. You will get a ton of subtle bugs otherwise. If instead of arrow::FileSystem, Arrow offered an arrow::ObjectStore interface that wraps local filesystems and object stores with object-store semantics (i.e. no con

Re: [DISCUSS] Statistics through the C data interface

2024-07-11 Thread Felipe Oliveira Carvalho
o leverage the fact that libraries handles unions gracefully, this could be: map, dense_union<...needed types based on stat kinds in the keys...>> X is either sparse or dense. A possible alternative is to use a custom struct instead of map and reduce the levels of nesting: struct> --

Re: [DISCUSS] Donation of a User-Defined Function Framework for Apache Arrow

2024-07-04 Thread Felipe Oliveira Carvalho
f { > > >> >> >>>let left = left.as_primitive(); > > >> >> >>>let right = right.as_primitive(); > > >> >> >>>res = binary(left, right, |l, r| gcd(l, r)); > > >> >> >>>Arc::new(res

Re: [DISCUSS] Statistics through the C data interface

2024-07-01 Thread Felipe Oliveira Carvalho
Hi, You can promise that well-known int32 statistic keys won't ever be higher than a certain value (2^18) [1] like TCP IP ports (well-known ports in [0, 2^10)) but for non-standard statistics from open-source products the key=0 combined with string label is the way to go, otherwise collisions woul

Re: [DISCUSS] Donation of a User-Defined Function Framework for Apache Arrow

2024-06-28 Thread Felipe Oliveira Carvalho
On Fri, Jun 28, 2024 at 11:07 AM Andrew Lamb wrote: > > Hi Xuanwo, > > Sorry for the delay in responding. I think the ability to easily write > functions that "feel" like native functions in whatever language and be > able to generate arrow / vectorized versions of them is quite valuable. > This

Re: [DISCUSS] Statistics through the C data interface

2024-06-14 Thread Felipe Oliveira Carvalho
On Sun, Jun 9, 2024 at 7:53 PM Sutou Kouhei wrote: > > Hi, > > In > "Re: [DISCUSS] Statistics through the C data interface" on Sun, 9 Jun 2024 > 22:11:54 +0200, > Antoine Pitrou wrote: > > Fields: > | Name | Type | Comments | > ||---

Re: [Discuss][C++] Switch to mimalloc by default?

2024-06-08 Thread Felipe Oliveira Carvalho
+1. I think the benefits outweigh the risks. On Wed, Jun 5, 2024 at 3:05 PM Anja wrote: > > I did want to start off by acknowledging that all of the pros you listed > for mimalloc are accurate. > > I did want to contribute the times that people have been caught off-guard > by the perceived increa

Re: [DISCUSS] Statistics through the C data interface

2024-06-08 Thread Felipe Oliveira Carvalho
te: > > > > Le 07/06/2024 à 18:30, Felipe Oliveira Carvalho a écrit : > > On Fri, Jun 7, 2024 at 6:24 AM Antoine Pitrou wrote: > >> > >> > >> Le 07/06/2024 à 04:27, Felipe Oliveira Carvalho a écrit : > >>> I've been thinking about how t

Re: [DISCUSS] Statistics through the C data interface

2024-06-07 Thread Felipe Oliveira Carvalho
On Fri, Jun 7, 2024 at 6:24 AM Antoine Pitrou wrote: > > > Le 07/06/2024 à 04:27, Felipe Oliveira Carvalho a écrit : > > I've been thinking about how to encode statistics on Arrow arrays and > > how to keep the set of statistics known by both producers and >

Re: [DISCUSS] Statistics through the C data interface

2024-06-06 Thread Felipe Oliveira Carvalho
I've been thinking about how to encode statistics on Arrow arrays and how to keep the set of statistics known by both producers and consumers (i.e. standardized). The statistics array(s) could be a map< // the column index or null if the statistics refer to whole table or batch column:

Re: [VOTE] Migration of parquet-cpp issues to Arrow's issue tracker

2024-05-29 Thread Felipe Oliveira Carvalho
+1 (non-binding) On Wed, 29 May 2024 at 11:30 Micah Kornfield wrote: > +1 (non-binding for Parquet, Binding for Arrow if that makes a difference) > > > > On Wed, May 29, 2024 at 7:15 AM Rok Mihevc wrote: > > > # sending this to both dev@arrow and dev@parquet > > > > Hi all, > > > > Following th

Re: [DISCUSS] Statistics through the C data interface

2024-05-23 Thread Felipe Oliveira Carvalho
I want to +1 on what Dewey is saying here and some comments. Sutou Kouhei wrote: > ADBC may be a bit larger to use only for transmitting statistics. ADBC has > statistics related APIs but it has more other APIs. It's impossible to keep the responsibility of communication protocols cleanly separa

Re: [ANNOUNCE] New Arrow committer: Dane Pitkin

2024-05-07 Thread Felipe Oliveira Carvalho
Great news. Congratulations Dane! On Tue, May 7, 2024 at 7:57 PM Vibhatha Abeykoon wrote: > > Congratulations Dane!!! > > Vibhatha Abeykoon > > > On Wed, May 8, 2024 at 4:02 AM Jacob Wujciak wrote: > > > Congrats! > > > > Am Di., 7. Mai 2024 um 23:19 Uhr schrieb Bryce Mecum > >: > > > > > Congr

Re: [VOTE][Format] UUID canonical extension type

2024-04-29 Thread Felipe Oliveira Carvalho
Isn't that easily decodable from the UUID data itself? If you allow the version to be specified as metadata, you now have to validate and make sure it's consistent with the version encoded in the contents of the UUID column. And UUID versions are more of a concern for UUID generation than consumpt
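A minimal sketch of the "decodable from the data itself" point, assuming the 16 bytes follow the RFC 4122 layout, where the version sits in the high four bits of byte 6. The helper name `uuid_version` is hypothetical and not part of any Arrow API; only Python's standard `uuid` module is used.

```python
import uuid


def uuid_version(raw: bytes) -> int:
    """Read the version nibble from a 16-byte UUID in RFC 4122 layout."""
    if len(raw) != 16:
        raise ValueError("a UUID is exactly 16 bytes")
    # Assumption: big-endian RFC 4122 layout, version in the high 4 bits of byte 6.
    return raw[6] >> 4


# Compare the hand-rolled read with what the standard library reports.
for u in (uuid.uuid1(), uuid.uuid4(), uuid.uuid5(uuid.NAMESPACE_DNS, "arrow.apache.org")):
    print(u, uuid_version(u.bytes), u.version)
```

Because the version can be read per value like this, a separate version field in the type metadata would be a second source of truth that has to be validated against the column contents.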

Re: Unsupported/Other Type

2024-04-11 Thread Felipe Oliveira Carvalho
The OP used UUID as an example. Would that be enough or the request is for a flexible mechanism that allows the creation of one-off nominal types for very specific use-cases? — Felipe On Thu, 11 Apr 2024 at 05:06 Antoine Pitrou wrote: > > Yes, JSON and UUID are obvious candidates for new canoni

Re: [Format][Union] polymorphic vectors vs ADT style vectors

2024-04-02 Thread Felipe Oliveira Carvalho
Algebraic Data Types (Sums and Products) are very abstract. This means they don't fully specify a concrete/physical layout [1]: different physical layouts can match the same algebraic definition. As an in-memory data format specification, Arrow doesn't and shouldn't rigidly specify concretization r

Re: [DISCUSS] Looking for feedback on my Rust library

2024-03-14 Thread Felipe Oliveira Carvalho
Two comments: —— Since this library is analogous to things like ADBC, ODBC, and JDBC, it’s more of a “driver” than a “connector”. This might make your life easier when explaining what it does. It’s not a black and white thing, but “connector” might imply networking to some people. I believe you

Re: [DISCUSS] Status and future of @ApacheArrow Twitter account

2024-01-29 Thread Felipe Oliveira Carvalho
> I have found Twitter an extremely effective way for an open-source project to communicate with the “exo-community” — people who are interested in the project but not so invested that they join the email list. An open source project needs to perform pretty much all of the functions of a for-profit

Re: [ANNOUNCE] New Arrow committer: Felipe Oliveira Carvalho

2023-12-08 Thread Felipe Oliveira Carvalho
> wrote: > > > > > > > > > Congratulations, Felipe! > > > > > ________ > > > > > From: Daniël Heres > > > > > Sent: Thursday, December 7, 2023 2:59 PM > > > > > To: dev@arrow.apache.org >

Re: [ANNOUNCE] New Arrow PMC member: Raúl Cumplido

2023-11-13 Thread Felipe Oliveira Carvalho
Congratulations! Well deserved. On Mon, Nov 13, 2023 at 5:16 PM Neal Richardson wrote: > Congratulations! > > On Mon, Nov 13, 2023 at 3:10 PM Matt Topol wrote: > > > Congratulations Raul!! > > > > On Mon, Nov 13, 2023, 3:09 PM Antoine Pitrou wrote: > > > > > > > > Welcome Raul, we're glad to h

Re: [ANNOUNCE] New Arrow committer: Xuwei Fu

2023-10-23 Thread Felipe Oliveira Carvalho
Congratulations Xuwei! — Felipe On Mon, 23 Oct 2023 at 10:26 Vibhatha Abeykoon wrote: > Congratulations Xuwei! > > On Mon, Oct 23, 2023 at 6:38 PM Weston Pace wrote: > > > Congratulations Xuwei! > > > > On Mon, Oct 23, 2023 at 3:38 AM wish maple > wrote: > > > > > Thanks kou and every nice pe

Re: [VOTE][Format] C data interface format strings for Utf8View and BinaryView

2023-10-18 Thread Felipe Oliveira Carvalho
+1 On Wed, Oct 18, 2023 at 2:49 PM Dewey Dunnington wrote: > +1! > > On Wed, Oct 18, 2023 at 2:14 PM Matt Topol wrote: > > > > +1 > > > > On Wed, Oct 18, 2023 at 1:05 PM Antoine Pitrou > wrote: > > > > > +1 > > > > > > Le 18/10/2023 à 19:02, Benjamin Kietzman a écrit : > > > > Hello all, > > >

Re: Apache Arrow file format

2023-10-17 Thread Felipe Oliveira Carvalho
It’s not the best since the format is really focused on in-memory representation and direct computation, but you can do it: https://arrow.apache.org/docs/python/feather.html — Felipe On Tue, 17 Oct 2023 at 23:26 Nara wrote: > Hi, > > Is it a good idea to use Apache Arrow as a file format? Loo

Re: Language-specific discussion (with C# example)

2023-10-17 Thread Felipe Oliveira Carvalho
The Zulip is https://ursalabs.zulipchat.com/ On Tue, Oct 17, 2023 at 9:55 PM Will Jones wrote: > Hi Curt, > > I think the most visible place for now would be creating an issue for > discussion. > > In the future, if you and some others want to have a place to discuss C# > development, you could

Re: [Vote][Format] (new proposal) C data interface format string for ListView and LargeListView arrays

2023-10-11 Thread Felipe Oliveira Carvalho
> > > But I also reiterate my plea that these existing parsers get fixed so > as > > > to entirely validate the format string instead of stopping early. > > > > > > Regards > > > > > > Antoine. > > > > > > > >

[Vote][Format] (new proposal) C data interface format string for ListView and LargeListView arrays

2023-10-06 Thread Felipe Oliveira Carvalho
Hello, I'm writing to propose "+vl" and "+vL" as format strings for list-view and large list-view arrays passing through the Arrow C data interface [1]. The previous proposal was considered a bad idea because existing parsers of these format strings might be looking at only the first `l` (or `L`)
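A rough sketch of the concern with the earlier "+lv" / "+Lv" proposal, assuming a consumer that inspects only the character after "+". Both parsers below are hypothetical illustrations, not code from any Arrow implementation, and they cover only a subset of the nested format strings.

```python
def first_char_parse(fmt: str) -> str:
    """Hypothetical lenient parser: looks only at the character after '+'."""
    nested = {"l": "list", "L": "large_list", "s": "struct", "m": "map"}
    if not fmt.startswith("+"):
        return "non-nested"
    return nested.get(fmt[1:2], "unrecognized")


def strict_parse(fmt: str) -> str:
    """Hypothetical strict parser: the whole format string must match."""
    known = {
        "+l": "list",
        "+L": "large_list",
        "+s": "struct",
        "+m": "map",
        "+vl": "list_view",        # proposed in this thread
        "+vL": "large_list_view",  # proposed in this thread
    }
    return known.get(fmt, "unrecognized")


# "+lv" is silently misread as a plain list by the lenient parser,
# while "+vl" is rejected outright, which is the safer failure mode.
print(first_char_parse("+lv"))  # list
print(first_char_parse("+vl"))  # unrecognized
print(strict_parse("+vl"))      # list_view
```

Putting the `v` first means an old consumer that stops early fails loudly instead of interpreting a list-view as a list.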

Re: [Vote][Format] C data interface format string for ListView and LargeListView arrays

2023-10-06 Thread Felipe Oliveira Carvalho
n. My vote would be +1 for +vl and > +vL. > > On Thu, Oct 5, 2023 at 6:40 PM Felipe Oliveira Carvalho > wrote: > > > > > Union format strings share enough properties that having them in the > > > same switch case doesn't result in additional complexity...lis

Re: [Vote][Format] C data interface format string for ListView and LargeListView arrays

2023-10-05 Thread Felipe Oliveira Carvalho
here a reason > >> that +lv and +Lv were chosen over a single-character version (i.e., > >> maybe +v and +V)? A single-character version is (slightly) easier to > >> parse in C. > >> > >> On Thu, Oct 5, 2023 at 2:00 PM Felipe Oliveira Carvalho

Re: [Vote][Format] C data interface format string for ListView and LargeListView arrays

2023-10-05 Thread Felipe Oliveira Carvalho
to parse the format string are already rather > unwieldy...it would be a nice quality-of-life improvement (although by > no means a required one) to use a separate character. > > On Thu, Oct 5, 2023 at 3:34 PM Felipe Oliveira Carvalho > wrote: > > > > This ma

Re: [Vote][Format] C data interface format string for ListView and LargeListView arrays

2023-10-05 Thread Felipe Oliveira Carvalho
where this discussion may have occurred...is there a reason > that +lv and +Lv were chosen over a single-character version (i.e., > maybe +v and +V)? A single-character version is (slightly) easier to > parse in C. > > On Thu, Oct 5, 2023 at 2:00 PM Felipe Oliveira Carvalho > wrot

[Vote][Format] C data interface format string for ListView and LargeListView arrays

2023-10-05 Thread Felipe Oliveira Carvalho
Hello, I'm writing to propose "+lv" and "+Lv" as format strings for list-view and large list-view arrays passing through the Arrow C data interface [1]. The vote will be open for at least 72 hours. [ ] +1 - I'm in favor of this new C Data Format string [ ] +0 [ ] -1 - I'm against adding this new

Re: [VOTE][Format] Add ListView and LargeListView Arrays to Arrow Format

2023-10-02 Thread Felipe Oliveira Carvalho
> > >> > There'll probably be some minor comments to the format PR, but those > > >> > don't deter from accepting these new layouts into the standard. > > >> > > > >> > Regards > > >> > > > >> > Antoi

Re: [VOTE][Format] Add ListView and LargeListView Arrays to Arrow Format

2023-09-29 Thread Felipe Oliveira Carvalho
run into similar issues as [1]? > > Kind Regards, > > Raphael Taylor-Davies > > [1]: https://lists.apache.org/thread/l8t1vj5x1wdf75mdw3wfjvnxrfy5xomy > > On 29/09/2023 13:09, Felipe Oliveira Carvalho wrote: > > Hello, > > > > I'd like to propose adding L

[VOTE][Format] Add ListView and LargeListView Arrays to Arrow Format

2023-09-29 Thread Felipe Oliveira Carvalho
Hello, I'd like to propose adding ListView and LargeListView arrays to the Arrow format. Previous discussion in [1][2], columnar format description and flatbuffers changes in [3]. There are implementations available in both C++ [4] and Go [5]. I'm working on the integration tests which I will pus

Re: [DISCUSS][C++] Raw pointer string views

2023-09-28 Thread Felipe Oliveira Carvalho
My take here is that Ben did an excellent job in hiding the fact that C++ has two variations of the format without leaking the pointer version via the interfaces through which Arrow arrays are communicated to other implementations. As things stand right now, there is no zero-copy transfer of point

Re: [C++] Potential cache/memory leak when reading parquet

2023-09-06 Thread Felipe Oliveira Carvalho
> (a) stays pretty stable throughout the scan (stays < 1G), (b) keeps increasing during the scan (looks linear to the number of files scanned). I wouldn't take this to mean a memory leak but the memory allocator not paging out virtual memory that has been allocated throughout the scan. Could you r

Re: [DISCUSS][Format] Starting the draft implementation of the ArrayView array format

2023-08-21 Thread Felipe Oliveira Carvalho
I marked the C++ implementation PR ready for review today and will soon be working on the Go implementation. https://github.com/apache/arrow/pull/35345 Note that differently from Velox's ArrayVector, the Arrow implementation (ListView) also features a 64-bit version (LargeListView) to be symmetri

Re: [VOTE][Format] Add Utf8View Arrays to Arrow Format

2023-08-18 Thread Felipe Oliveira Carvalho
+1 (non-binding) — Felipe On Fri, 18 Aug 2023 at 18:48 Jacob Wujciak-Jens wrote: > +1 (non-binding) > > On Fri, Aug 18, 2023 at 6:04 PM L. C. Hsieh wrote: > > > +1 (binding) > > > > On Fri, Aug 18, 2023 at 5:53 AM Neal Richardson > > wrote: > > > > > > +1 > > > > > > Thanks all for the though

[Format] C data interface format string for run-end encoded arrays

2023-08-15 Thread Felipe Oliveira Carvalho
Hello, I'm writing to inform you that I'm proposing "+r" as format string for run-end encoded arrays passing through the Arrow C data interface [1]. Feel free to also discuss in the linked PR with the changes to bridge.cc and reference docs. [1] https://arrow.apache.org/docs/format/CDataInterfac

Re: [DISCUSS] Canonical alternative layout proposal

2023-08-05 Thread Felipe Oliveira Carvalho
at a single logical type may have > multiple physical layouts. I agree. E.g. variable size list<32>, variable > size list<64>, and REE are the physical layouts that, combined with the > logical type "string", give you "string", "large string", an

Re: [DISCUSS] Canonical alternative layout proposal

2023-08-01 Thread Felipe Oliveira Carvalho
A major difficulty in making the Arrow array types open for extension [1] is that as soon as we define an (a) universal representation* or (b) abstract interface, we close the door for vectorization. (a) prevents having new vectorization friendly formats and (b) limits the implementation of new vec

Re: Question about TypeHolder in arrow

2023-07-04 Thread Felipe Oliveira Carvalho
int8(), int16()… all return the same shared_ptr that gets inc-ref’d on every "creation". But any code taking type pointers shouldn't assume it comes from `static` storage. All uses of a non-owning TypeHolder should be based on something else ensuring the shared_ptr is alive while the TypeHolder is

Re: Question about nested columnar validity

2023-06-29 Thread Felipe Oliveira Carvalho
Values in the `offsets` Buffer of a ListArray can’t be left undefined because the length of a valid entry before a NULL entry is the offset associated with that NULL entry minus the previous offset. The ListViewArray format I’m working on doesn’t have that restriction because all the information a
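A small pure-Python sketch of the offsets argument, assuming plain lists stand in for the Arrow buffers. The function name `list_lengths` is made up for illustration.

```python
def list_lengths(offsets):
    """Length of entry i is offsets[i + 1] - offsets[i]."""
    return [offsets[i + 1] - offsets[i] for i in range(len(offsets) - 1)]


# Logical array: [[a, b], NULL, [c]]
validity = [True, False, True]
offsets = [0, 2, 2, 3]
print(list_lengths(offsets))  # [2, 0, 1]

# If the offset at the NULL slot were left undefined (say 99), the length
# of the valid entry *before* it would be computed from garbage.
garbage = [0, 99, 2, 3]
print(list_lengths(garbage))  # [99, -97, 1]
```

The offset stored at the NULL slot doubles as the end of the preceding entry, which is why it has to hold a meaningful value even though the NULL entry itself carries no data.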

Re: [DISCUSS][Format] Starting the draft implementation of the ArrayView array format

2023-06-15 Thread Felipe Oliveira Carvalho
> don't see anything inherently wrong with it, and if it ain't broke we > really shouldn't be trying to fix it. > > Kind Regards, > > Raphael Taylor-Davies > > On 14 June 2023 17:52:52 BST, Felipe Oliveira Carvalho > wrote: > > Genera

Re: [DISCUSS][Format] Starting the draft implementation of the ArrayView array format

2023-06-14 Thread Felipe Oliveira Carvalho
tView aspires to, such an addition could require non trivial changes to > many / all of those implementations (and the APIs they expose). > > Andrew > > On Wed, Jun 14, 2023 at 12:53 PM Felipe Oliveira Carvalho < > felipe...@gmail.com> wrote: > > > General appr

Re: [DISCUSS][Format] Starting the draft implementation of the ArrayView array format

2023-06-14 Thread Felipe Oliveira Carvalho
just between > systems?" > > > On Wed, Jun 14, 2023 at 2:07 AM Antoine Pitrou wrote: > > > > > I agree that ListView cannot be an extension type, given that it > > features a new layout, and therefore cannot reasonably be backed by an > > existing storage type (AFAICT

Re: [DISCUSS][Format] Starting the draft implementation of the ArrayView array format

2023-06-06 Thread Felipe Oliveira Carvalho
ypes might be deprecated in favor of view variants [2]. Others > > > were > > > > > > > worried that it might undermine the perception that the Arrow > > > format > > > > is > > > > > > > stable. I think it might be worth thinking about "

Re: [VOTE][Format] Add experimental ArrowDeviceArray to C-Data API

2023-05-25 Thread Felipe Oliveira Carvalho
+1 for me. The C structs are clean and leave good room for extension. -- Felipe On Thu, May 25, 2023 at 12:04 PM David Li wrote: > +1 for me. > > (Heads up: on the PR, there was some discussion since the last email and > the meaning of 'experimental' was clarified.) > > On Tue, May 23, 2023, a

Re: New datatype: Huge integers & decimals

2023-05-24 Thread Felipe Oliveira Carvalho
Have you considered using fixed-length binary values for these? Crypto algorithms might logically be defined in terms of mathematical operations on integers, but their efficient implementation tends to feature inlined operations at the machine word level instead of generic add, div, mod, mul opera
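A minimal sketch of the fixed-length binary suggestion, assuming 256-bit unsigned integers stored as 32-byte values. The width, the big-endian byte order, and the helper names are all assumptions made for illustration; the thread does not specify them.

```python
WIDTH = 32  # bytes per value, i.e. 256 bits (assumed width)


def encode_u256(n: int) -> bytes:
    """Pack an unsigned integer into a fixed-width big-endian byte string."""
    if not 0 <= n < 1 << (8 * WIDTH):
        raise ValueError("value does not fit in 256 bits")
    return n.to_bytes(WIDTH, "big")


def decode_u256(raw: bytes) -> int:
    """Unpack a fixed-width big-endian byte string into an integer."""
    if len(raw) != WIDTH:
        raise ValueError("expected exactly 32 bytes")
    return int.from_bytes(raw, "big")


big = 2**200 + 12345
raw = encode_u256(big)
print(len(raw), decode_u256(raw) == big)
```

With big-endian unsigned values of equal width, plain byte-wise comparison agrees with numeric order, so sorting and min/max work on the binary values without decoding them.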

Re: [DISCUSS][Format] Starting the draft implementation of the ArrayView array format

2023-05-21 Thread Felipe Oliveira Carvalho
. For example, > operations > >> that slice these containers can be implemented in a zero-copy manner by > >> just rearranging the lengths/offsets indices, without ever touching the > >> larger internal buffers. This is a similar motivation as for StringView > >

Re: [DISCUSS][Format] Starting the draft implementation of the ArrayView array format

2023-05-19 Thread Felipe Oliveira Carvalho
ort for the type, including compute kernels? Or are they likely to > just > > convert this type to ListArray at import boundaries? > > > > Because if it turns out to be the latter, then we might as well ask Velox > > to export this type as ListArray and save the rest of

Re: Freeing memory when working with static crt in windows.

2023-05-12 Thread Felipe Oliveira Carvalho
> I am actually trying to switch to arrow_static.lib. Perhaps the issue is arrow_static.lib being linked with a static crt that's not the one you are using in your project? On Fri, May 12, 2023 at 3:13 PM Arkadiy Vertleyb (BLOOMBERG/ 120 PARK) < avertl...@bloomberg.net> wrote: > This is not only

Re: [DISCUSS][Format] Starting the draft implementation of the ArrayView array format

2023-05-11 Thread Felipe Oliveira Carvalho
1m2ggz2kdq > > >> > > >> On Tue, Apr 25, 2023 at 3:13 PM Will Jones > > wrote: > > >> > > >>> Hi Felipe, > > >>> > > >>> Thanks for the introduction. I'd be interested to hear about the > > >>

Re: [ANNOUNCE] New Arrow PMC member: Matt Topol

2023-05-03 Thread Felipe Oliveira Carvalho
Congratulations, Matt! On Wed, 3 May 2023 at 14:37 Andrew Lamb wrote: > The Project Management Committee (PMC) for Apache Arrow has invited > Matt Topol (zeroshade) to become a PMC member and we are pleased to > announce > that Matt has accepted. > > Congratulations and welcome! >

Re: [DISCUSS][Format] Starting the draft implementation of the ArrayView array format

2023-04-26 Thread Felipe Oliveira Carvalho
After Weston's suggestion above, I've renamed files and classes in my WIP implementation: ArrayView -> ListView On Wed, Apr 26, 2023 at 11:08 AM Ian Cook wrote: > +1 to what Weston and Joris suggested regarding the name. "ListView" > seems like the best name to use for this layout in Arrow. > >

[DISCUSS][Format] Starting the draft implementation of the ArrayView array format

2023-04-25 Thread Felipe Oliveira Carvalho
Hi folks, I would like to start a public discussion on the inclusion of a new array format to Arrow — array-view array. The name is also up for debate. This format is inspired by Velox's ArrayVector format [1]. Logically, this array represents an array of arrays. Each element is an array-view (of
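A rough sketch of the "array of arrays" idea, assuming each element is described by an (offset, size) pair into a shared child values array, as in the ListView layout this discussion led to. Plain Python lists stand in for buffers, and the function name is hypothetical.

```python
def list_view_to_pylist(values, offsets, sizes):
    """Materialize each view as values[offset : offset + size]."""
    return [values[off:off + size] for off, size in zip(offsets, sizes)]


values = [10, 20, 30, 40]
# Views may appear out of order and need not cover the child array contiguously.
offsets = [2, 0, 0]
sizes = [2, 2, 0]
print(list_view_to_pylist(values, offsets, sizes))  # [[30, 40], [10, 20], []]
```

Since every element carries its own size, reordering or slicing elements only rewrites the small offsets and sizes arrays and leaves the child values untouched.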

Re: [DISCUSS] The default commit message for merge button

2023-01-31 Thread Felipe Oliveira Carvalho
+1 for "pull request title *and* description". Being able to read descriptions without leaving the editor is handy. Keeping that information tracked in the repo means we don’t depend on GitHub to reconstruct the history of the project. On Tue, 31 Jan 2023 at 06:43 Antoine Pitrou wrote: > > +1 f