Re: Do we have nightly source tar ball

2022-07-07 Thread Joris Van den Bossche
We do upload nightly wheels to an alternative PyPI index (see https://arrow.apache.org/docs/python/install.html#installing-nightly-packages), at https://pypi.fury.io/arrow-nightlies/pyarrow, and it seems we actually also upload an sdist there. (it could still be more reliable to use HEAD, though,

Re: DISCUSS: [Format] Rules and procedures for Canonical extension types

2022-08-17 Thread Joris Van den Bossche
+1 on the overall proposal, documenting those in a central place sounds good to me. On Wed, 17 Aug 2022 at 18:10, Antoine Pitrou wrote: > > > > * The specification text to be added *must* follow these requirements > > 1) It *must* have a well-defined name starting with "ARROW:" > One remar

Re: [ANNOUNCE] New Arrow PMC member: Weston Pace

2022-09-06 Thread Joris Van den Bossche
Congrats Weston! It is great to have you on the team! On Tue, 6 Sept 2022 at 06:10, Weston Pace wrote: > Thank you everyone! I look forward to continuing to work with you all. > > On Mon, Sep 5, 2022 at 3:56 PM Kun Liu wrote: > > > > Congrats Weston!! > > > > > > Gavin Ray 于2022年9月6日周二 08:04写

Re: Usage of the name Feather?

2022-09-06 Thread Joris Van den Bossche
Personally, I like the "Feather" name (and actually think it could help disambiguate the file vs in-memory distinction), but I understand that we have chosen a certain path (eg ".arrow" is the official registered extension), and have to move on. However, I think we need to be very careful in how w

Re: Parser for expressions

2022-10-12 Thread Joris Van den Bossche
Another advantage of "add(x, y)" is that this matches our current string representation for expressions. Although that might give the impression that we support anything that we output as string, and so that raises the question if we want to make this explicit: if we add parsing capabilities, woul

Re: [DISCUSS] Move issue tracking to

2022-10-24 Thread Joris Van den Bossche
I would also support a migration of our issues to GitHub. It seems unlikely to me that another third-party tool would be good enough to make the whole experience better (given that we already use GitHub for PRs). And I agree with others that continuing to use JIRA is not a good option with this change. A

Re: [ANNOUNCE] New Arrow PMC member: Nicola Crane

2022-10-26 Thread Joris Van den Bossche
Congratulations Nic! ;) On Wed, 26 Oct 2022 at 19:26, Weston Pace wrote: > > Thanks Nic and congratulations! > > On Wed, Oct 26, 2022 at 8:28 AM Raúl Cumplido wrote: > > > > Thanks Nic for your contributions! > > > > El mié, 26 oct 2022 a las 17:17, Antoine Pitrou () > > escribió: > > > > > > >

Re: [ANNOUNCE] New Arrow PMC member: Jacob Quinn

2022-10-26 Thread Joris Van den Bossche
Congratulations! On Wed, 26 Oct 2022 at 19:25, Weston Pace wrote: > > Congrats Jacob! > > On Wed, Oct 26, 2022 at 6:10 AM Jacob Wujciak > wrote: > > > > Congrats! > > > > On Wed, Oct 26, 2022 at 8:31 AM Alenka Frim > > wrote: > > > > > Congratulations! > > > > > > On Wed, Oct 26, 2022 at 7:54 A

Re: [VOTE] Move issue tracking to GitHub Issues

2022-10-27 Thread Joris Van den Bossche
+1 On Thu, 27 Oct 2022 at 07:27, Jacob Quinn wrote: > > +1 > > On Wed, Oct 26, 2022 at 5:04 PM Neal Richardson > wrote: > > > I propose that we move issue tracking from the ASF's Jira to GitHub Issues. > > This has been discussed on [1] and [2] and there seems to be consensus. A > > number of Ar

Re: [Discuss][Python] Stop publishing universal wheels?

2022-10-27 Thread Joris Van den Bossche
The cibuildwheel documentation has a note about this (https://cibuildwheel.readthedocs.io/en/stable/faq/#universal2), quoting: > The dual-architecture universal2 has a few benefits, but a key benefit to > a universal wheel is that a user can bundle these wheels into an application > and ship a sin

Re: [ANNOUNCE] New Arrow committer: Eric Patrick Hanson

2022-10-27 Thread Joris Van den Bossche
Congratulations, and welcome Eric! On Thu, 27 Oct 2022 at 13:53, Eric Hanson wrote: > > thanks, I'm excited to join! > > On 2022/10/26 21:38:53 Sutou Kouhei wrote: > > On behalf of the Arrow PMC, I'm happy to announce that Eric Patrick Hanson > > has accepted an invitation to become a committer o

Re: [ANNOUNCE] New Arrow committer: Ben Baumgold

2022-10-27 Thread Joris Van den Bossche
Congratulations, and welcome Ben! On Thu, 27 Oct 2022 at 05:11, Weston Pace wrote: > > Congratulations Ben! > > On Wed, Oct 26, 2022 at 2:05 PM David Li wrote: > > > > Welcome Ben! > > > > On Wed, Oct 26, 2022, at 17:57, Ian Joiner wrote: > > > Congrats Ben! > > > > > > Ian > > > > > > On Wednes

Re: [ANNOUNCE] New Arrow committer: Bogumił Kamiński

2022-10-27 Thread Joris Van den Bossche
Congrats! On Wed, 26 Oct 2022 at 23:56, Ian Joiner wrote: > > Congrats Bogumił! > > Ian > > On Tuesday, October 25, 2022, Sutou Kouhei wrote: > > > Hi, > > > > On behalf of the Arrow PMC, I'm happy to announce that Bogumił Kamiński > > has accepted an invitation to become a committer on Apache >

Re: [RESULT][VOTE] Release Apache Arrow 10.0.0 - RC0

2022-10-27 Thread Joris Van den Bossche
- [?] Upload wheels/sdist to pypi Done: https://pypi.org/project/pyarrow/10.0.0 On Wed, 26 Oct 2022 at 13:56, Neal Richardson wrote: > > I will submit the R package to CRAN. > > Neal > > On Wed, Oct 26, 2022 at 4:40 AM Sutou Kouhei wrote: > > > Thanks!!! > > > > Current status: > > > > - [Done]

Re: Request for Patch release of 10.0.1

2022-11-08 Thread Joris Van den Bossche
Hey Matt, See also the "[DISCUSS] Pyarrow wheels for Python 3.11" thread, where I think the conclusion is that we need a 10.0.1 bugfix release anyway for Python as well (and in practice this means a bug fix release for the full repo). For that purpose, a 10.0.1 milestone is created, and normally a

Re: [DISCUSS]: Interim plan for new users reporting issues before GitHub migration

2022-11-18 Thread Joris Van den Bossche
+1 on pointing new users directly to GitHub issues. On Thu, 17 Nov 2022 at 21:15, MAURICIO ANDRES VARGAS SEPULVEDA wrote: > > Hi! > > +Inf to Nic's point > > Asking to write a Gh issue seems to be the easiest > > Get Outlook for Android > >

Re: [VOTE] Disable ASF Jira issue reporting

2022-11-24 Thread Joris Van den Bossche
On Thu, 24 Nov 2022 at 11:31, Antoine Pitrou wrote: > > > Are all the required labels ready? I don't seem to see the components > in https://github.com/apache/arrow/labels. Also, we should curate the > existing labels and namespace all the remaining ones so that the > categories can be easily unde

Re: [VOTE] Disable ASF Jira issue reporting

2022-11-24 Thread Joris Van den Bossche
+1 On Wed, 23 Nov 2022 at 22:37, Todd Farmer wrote: > > Hello, > > I would like to propose that issue reporting in ASF Jira for the Apache > Arrow project be disabled, and all users directed to use GitHub issues for > reporting going forward. GitHub issue reporting is now enabled [1] in > respons

Re: Arrow sync call November 23 at 12:00 US/Eastern, 17:00 UTC

2022-11-28 Thread Joris Van den Bossche
FYI: Raúl also already opened a PR to update the merge script to work with github issues: https://github.com/apache/arrow/pull/14731 Personally I also think that we should consider using the merge button instead of our script (or at least re-evaluate what the script still does better, or might now

Re: Arrow sync call November 23 at 12:00 US/Eastern, 17:00 UTC

2022-11-28 Thread Joris Van den Bossche
On Mon, 28 Nov 2022 at 12:09, Joris Van den Bossche wrote: > > FYI: Raúl also already opened a PR to update the merge script to work > with github issues: https://github.com/apache/arrow/pull/14731 (sorry, that PR is to update the github actions workflow (the bot that comments on PRs)

Current state of using GitHub issues for Arrow

2022-11-30 Thread Joris Van den Bossche
Hi all, There is a separate vote thread about already using GitHub issues for all new issues (and in practice users also are already doing that, since new JIRA signup has been disabled). And so to prepare for this, there has been some ongoing work to update our workflows to handle GitHub issues (b

Re: [ANNOUNCE] New Arrow committer: Raúl Cumplido

2022-12-07 Thread Joris Van den Bossche
Congrats Raúl! On Wed, 7 Dec 2022 at 09:03, Raúl Cumplido wrote: > Thank you everyone! > > El mar, 6 dic 2022, 17:30, Weston Pace escribió: > > > Congratulations! > > > > On Tue, Dec 6, 2022 at 7:57 AM Nic wrote: > > > > > > Congratulations! > > > > > > On Tue, 6 Dec 2022 at 15:49, Ian Cook w

Re: Current state of using GitHub issues for Arrow

2022-12-08 Thread Joris Van den Bossche
On Tue, 6 Dec 2022 at 08:41, Benson Muite wrote: > > > > For sure the exact workflows will still be further refined while starting > > to use this. And if there are things missing or unclear in the current > > practices around how to handle GitHub issues or any other feedback or > > ideas, this th

Re: [ANNOUNCE] New Arrow committer: Jacob Wujciak

2022-12-15 Thread Joris Van den Bossche
Congrats! On Fri, 16 Dec 2022 at 03:22, Dewey Dunnington wrote: > > Congrats, Jacob! > > On Thu, Dec 15, 2022 at 9:26 PM Matt Topol wrote: > > > Congrats Jacob!! > > > > On Thu, Dec 15, 2022, 7:53 PM Neal Richardson > > > > wrote: > > > > > Congrats! > > > > > > On Thu, Dec 15, 2022 at 7:00 PM

Re: [DISCUSS] The default commit message for merge button

2023-01-31 Thread Joris Van den Bossche
I would personally prefer to use just "Pull request title" instead of "Pull request title and description". In my experience, including the description in the commit message (as we already do) more often gives noise to the output of `git log`, and you can always go from the commit to the PR to see

Re: [DISCUSS] The default commit message for merge button

2023-01-31 Thread Joris Van den Bossche
> > > > > > > On Tue, 31 Jan 2023 at 06:43 Antoine Pitrou > > > wrote: > > > > > > > > > >> > > > > >> +1 for "pull request title *and* description". > > > > >> > > > > >>

Re: [DISCUSS] Fixed shape tensor Canonical Extension Type

2023-02-03 Thread Joris Van den Bossche
On Thu, 2 Feb 2023 at 16:06, Clark Zinzow wrote: > > Hi Alenka, > > Great work on the RFC, I'm super excited to see this! I was planning to > open a similar RFC at some point over the next few weeks, so this just > saved me a bunch of work. :D > > At the Ray project [1], we've developed two tensor

Re: [DISCUSS] Fixed shape tensor Canonical Extension Type

2023-02-14 Thread Joris Van den Bossche
On Tue, 7 Feb 2023 at 19:32, Quentin Lhoest wrote: > > Hi, > > If I remember correctly one can already pass `types_mapper` > to `pa.Table.to_pandas`, to allow Ray or HF Datasets to define > their own pandas extension types associated to the arrow > extension types. I guess this could also be used

Re: [VOTE] Release Apache Arrow ADBC 0.2.0 - RC1

2023-02-15 Thread Joris Van den Bossche
+1 (binding) I ran the verification on Ubuntu 20.04 using conda: $ USE_CONDA=1 ARROW_TMPDIR=/tmp/adbc-verification ./dev/release/verify-release-candidate.sh 0.2.0 1 ... Release candidate looks good! I only had a problem with installing some ruby dependencies (for GLIB tests), not finding /usr/bi

Re: [VOTE] Release Apache Arrow ADBC 0.2.0 - RC1

2023-02-16 Thread Joris Van den Bossche
On Wed, 15 Feb 2023 at 21:31, Sutou Kouhei wrote: > > > not finding /usr/bin/mkdir > > Could you show the log of this? Yes: current directory: /tmp/adbc-verification/apache-arrow-adbc-0.2.0/glib/vendor/bundle/ruby/3.1.0/gems/fiddle-1.1.1/ext/fiddle make DESTDIR\= install make: /usr/bin/mkdir: Co

Re: Proposal: renaming the 'master' branch to 'main'

2023-02-17 Thread Joris Van den Bossche
Also for https://github.com/apache/arrow the default branch is now renamed to "main". You will see some instructions the first time visiting the github repo since the rename, but copying them here below. You can rename the master branch on your fork as well (visiting https://github.com//arrow will

Re: [VOTE] Format: Fixed shape tensor Canonical Extension Type

2023-02-21 Thread Joris Van den Bossche
On Tue, 21 Feb 2023 at 18:00, Rok Mihevc wrote: > > > > > Should we rule that `dim_names` and `permutation` are mutually exclusive? > > > > Since `dim_names` have to "map to the physical layout (row-major)" that > means permutation will always be trivial which indeed makes it unnecessary > to stor

Re: [VOTE] Release Apache Arrow ADBC 0.2.0 - RC1

2023-02-21 Thread Joris Van den Bossche
On Wed, 22 Feb 2023 at 00:55, Sutou Kouhei wrote: > > Hi, > > In > "Re: [VOTE] Release Apache Arrow ADBC 0.2.0 - RC1" on Thu, 16 Feb 2023 > 09:19:50 +0100, > Joris Van den Bossche wrote: > > > current directory: > > /tmp/adbc-verification/a

Re: [VOTE] Release Apache Arrow nanoarrow 0.1.0 - RC1

2023-03-02 Thread Joris Van den Bossche
+1 (binding) Verified on Ubuntu 20.04 It worked with conda R for me, I only needed to ensure to install a conda compiler to get it building (https://github.com/apache/arrow-nanoarrow/pull/142) On Thu, 2 Mar 2023 at 05:29, Jin Shang wrote: > > +1 (non-binding). Verified on macOS 12.5 aarch64 and

Re: [VOTE][Format] Fixed shape tensor Canonical Extension Type

2023-03-07 Thread Joris Van den Bossche
+1 (binding) On Tue, 7 Mar 2023 at 23:35, David Li wrote: > > +1 (binding) > > Just one comment, though: since we also define a separate "Tensor" IPC > structure in Arrow, maybe we should state the relationship somewhere in the > documentation? (Even if the answer is "no relationship".) > > On

Re: [ANNOUNCE] New Arrow PMC member: Will Jones

2023-03-13 Thread Joris Van den Bossche
Congrats Will! On Mon, 13 Mar 2023 at 22:01, Michel Miotto Barbosa wrote: > > Congratulations Wiil! > > A disposição | At your disposal > > Michel Miotto Barbosa > https://www.linkedin.com/in/michelmiottobarbosa/ > mmiottobarb...@gmail.com > +55 11 984 342 347 > > > > > On Mon, Mar 13, 2023 at 2:

Re: Proposal: add a bot to close PRs that haven't been updated in 30 days

2023-03-31 Thread Joris Van den Bossche
I am personally not a huge fan of auto-closing PRs. Especially not after a short period like 30 days (I think that's too short for an open source project), and we have to be careful with messaging. Very often such a PR is "stale" because it is waiting for reviews. I know we have the labels now that

Re: Proposal: add a bot to close PRs that haven't been updated in 30 days

2023-03-31 Thread Joris Van den Bossche
On Fri, 31 Mar 2023 at 17:38, Alessandro Molina wrote: > > .. > My question probably would be... If a PR was sitting ignored for 30 days > without anyone from the community feeling the need to review and merge it > and without its primary author feeling the need to push for getting it > merged. Is

Re: [VOTE] Formalize how to change format

2023-04-26 Thread Joris Van den Bossche
+1 On Wed, 26 Apr 2023 at 04:18, Sutou Kouhei wrote: > > Hi, > > I've added one more note about documentation: > > We must update the corresponding documentation (files in > ``_) > too. > > https://github.com/apache/arrow/pull/35

Re: [DISCUSS][Format] Starting the draft implementation of the ArrayView array format

2023-04-26 Thread Joris Van den Bossche
On Wed, 26 Apr 2023 at 02:37, Weston Pace wrote: > > For context, there was some discussion on this back in [1]. At that time > this was called "sequence view" but I do not like that name. However, > array-view array is a little confusing. Given this is similar to list can > we go with list-vie

Re: [ANNOUNCE] New Arrow PMC member: Matt Topol

2023-05-04 Thread Joris Van den Bossche
Congrats Matt! On Thu, 4 May 2023 at 06:31, Nic Crane wrote: > > Congratulations! > > On Thu, 4 May 2023, 05:24 Vibhatha Abeykoon, wrote: > > > Congratulations Matt! > > > > On Thu, May 4, 2023 at 7:35 AM Ian Cook wrote: > > > > > Congratulations Matt!!! > > > > > > On Wed, May 3, 2023 at 9:55 

Re: [ANNOUNCE] New Arrow committer: Marco Neumann

2023-05-11 Thread Joris Van den Bossche
Congrats Marco! On Thu, 11 May 2023 at 15:05, Weston Pace wrote: > > Congratulations! > > On Thu, May 11, 2023 at 4:28 AM vin jake wrote: > > > Congratulations Marco! > > > > On Thu, May 11, 2023 at 7:18 AM Andrew Lamb wrote: > > > > > On behalf of the Arrow PMC, I'm happy to announce that Marc

Re: [VOTE] Release Apache Arrow 12.0.1 - RC1

2023-06-12 Thread Joris Van den Bossche
+1 (verified source release on Ubuntu 20.04, using conda) On Sat, 10 Jun 2023 at 22:31, Sutou Kouhei wrote: > > +1 > > I ran the followings on Debian GNU/Linux sid: > > * TEST_DEFAULT=0 \ > TEST_SOURCE=1 \ > LANG=C \ > TZ=UTC \ > CUDAToolkit_ROOT=/usr \ > ARROW_CMA

Re: Converting Pandas DataFrame <-> Struct Array?

2023-06-13 Thread Joris Van den Bossche
I think your original code roundtripping through RecordBatch (`pa.RecordBatch.from_pandas(df).to_struct_array()`) is the best option at the moment. The RecordBatch<->StructArray part is a cheap (zero-copy) conversion, and by using RecordBatch.from_pandas, you can rely on all pandas<->arrow conversi

Re: [Python] Dataset scanner fragment skip options.

2023-06-13 Thread Joris Van den Bossche
On Mon, 12 Jun 2023 at 21:30, Jerald Alex wrote: > > hi Weston, > > Thank you so much for taking the time to respond. Really appreciate it. > > I'm using parquet files. So would it be possible to elaborate the below.? I > cannot seem to find any documentation for ParquetFileFragment. > > "there ma

Re: [ANNOUNCE] New Arrow PMC member: Jie Wen (jakevin / jackwener)

2023-06-13 Thread Joris Van den Bossche
Congratulations! On Mon, 12 Jun 2023 at 22:00, Raúl Cumplido wrote: > > Congratulations Jie!!! > > El lun, 12 jun 2023, 20:35, Matt Topol escribió: > > > Congrats Jie! > > > > On Sun, Jun 11, 2023 at 9:20 AM Andrew Lamb wrote: > > > > > The Project Management Committee (PMC) for Apache Arrow ha

[Parquet C++] Plan to bump default write version from 2.4 -> 2.6 (include nanoseconds LogicalType)

2023-06-15 Thread Joris Van den Bossche
Hi all, Bringing up https://github.com/apache/arrow/issues/35746 to the mailing list: this issue proposes to bump the default Parquet version we use for writing to Parquet files in the C++ library (and in the various bindings including pyarrow and R arrow) from the current default of "2.4" to "2.6

Re: [Parquet C++] Plan to bump default write version from 2.4 -> 2.6 (include nanoseconds LogicalType)

2023-06-15 Thread Joris Van den Bossche
> Ian > > On Thu, Jun 15, 2023 at 12:25 PM Joris Van den Bossche > wrote: > > > > Hi all, > > > > Bringing up https://github.com/apache/arrow/issues/35746 to the > > mailing list: this issue proposes to bump the default Parquet version > > we use for writin

Re: [ANNOUNCE] New Arrow PMC member: Dewey Dunnington

2023-06-23 Thread Joris Van den Bossche
Congrats Dewey! On Fri, 23 Jun 2023 at 16:54, Jacob Wujciak-Jens wrote: > > Well deserved! Congratulations Dewey! > > Ian Cook schrieb am Fr., 23. Juni 2023, 16:32: > > > Congratulations Dewey! > > > > On Fri, Jun 23, 2023 at 10:03 AM Matt Topol > > wrote: > > > > > > Congrats Dewey!! > > > > >

Re: [ANNOUNCE] New Arrow committer: Kevin Gurney

2023-07-04 Thread Joris Van den Bossche
Congrats Kevin! On Tue, 4 Jul 2023 at 13:47, David Li wrote: > > Welcome Kevin! > > On Tue, Jul 4, 2023, at 05:55, Raúl Cumplido wrote: > > Congratulations Kevin!!! > > > > El mar, 4 jul 2023 a las 3:32, Weston Pace () > > escribió: > >> > >> Congratulations Kevin! > >> > >> On Mon, Jul 3, 2023

Re: Do we need CODEOWNERS ?

2023-07-04 Thread Joris Van den Bossche
I think it can be useful in certain cases, where the selection is specific enough (for example if all Go-related PRs are not too much for Matt, this feature sounds useful for him. I can also imagine if you are working on flight, just getting notifications for changes to the flight-related files mig

Re: [VOTE][Format] Add Utf8View Arrays to Arrow Format

2023-08-22 Thread Joris Van den Bossche
+1 On Mon, 21 Aug 2023 at 19:33, Weston Pace wrote: > > +1 > > Thanks to all for the discussion and thanks to Ben for all of the great > work. > > > On Mon, Aug 21, 2023 at 9:16 AM wish maple wrote: > > > +1 (non-binding) > > > > It would help a lot when processing UTF-8 related data! > > > > Xu

Re: [VOTE][Format] Variable shape tensor canonical extension type

2023-10-06 Thread Joris Van den Bossche
Worth noting that there were some minor changes made to the spec while the vote was active: - The "uniform_dimensions" metadata key was removed, since this can also be inferred from the "uniform_shape" information - The shape of non-constant dimensions in the "uniform_shape" entry is now represente

Re: [Vote][Format] (new proposal) C data interface format string for ListView and LargeListView arrays

2023-10-07 Thread Joris Van den Bossche
+1 On Sat, 7 Oct 2023 at 10:44, Antoine Pitrou wrote: > > > +1 from me. > > But I also reiterate my plea that these existing parsers get fixed so as > to entirely validate the format string instead of stopping early. > > Regards > > Antoine. > > > Le 06/10/2023 à 23:26, Felipe Oliveira Carvalho a

Re: [ANNOUNCE] New Arrow PMC member: Jonathan Keane

2023-10-14 Thread Joris Van den Bossche
Congratulations! On Sat, 14 Oct 2023 at 20:02, Matt Topol wrote: > > Congrats Jon!!! > > On Sat, Oct 14, 2023, 1:42 PM David Li wrote: > > > Congrats Jon! > > > > On Sat, Oct 14, 2023, at 13:25, Ian Cook wrote: > > > Congratulations Jonathan! > > > > > > On Sat, Oct 14, 2023 at 13:24 Andrew Lamb

Re: [ANNOUNCE] New Arrow committer: Curt Hagenlocher

2023-10-17 Thread Joris Van den Bossche
Welcome to the team, Curt! On Mon, 16 Oct 2023 at 23:17, Curt Hagenlocher wrote: > > Thanks, all! > > On Mon, Oct 16, 2023 at 9:19 AM Dane Pitkin > wrote: > > > Congrats Curt! > > > > On Mon, Oct 16, 2023 at 12:00 PM Kevin Gurney > > > > wrote: > > > > > Congratulations, Curt! > > > ___

Re: [VOTE][Format] C data interface format strings for Utf8View and BinaryView

2023-10-19 Thread Joris Van den Bossche
+1 On Wed, 18 Oct 2023 at 23:33, Jonathan Keane wrote: > > +1 > > -Jon > > > On Wed, Oct 18, 2023 at 2:26 PM Felipe Oliveira Carvalho < > felipe...@gmail.com> wrote: > > > +1 > > > > On Wed, Oct 18, 2023 at 2:49 PM Dewey Dunnington > > wrote: > > > > > +1! > > > > > > On Wed, Oct 18, 2023 at 2:1

Re: [ANNOUNCE] New Arrow committer: Xuwei Fu

2023-10-26 Thread Joris Van den Bossche
Congrats! On Wed, 25 Oct 2023 at 08:23, Ian Joiner wrote: > > Congrats! > > On Mon, Oct 23, 2023 at 2:33 AM Sutou Kouhei wrote: > > > On behalf of the Arrow PMC, I'm happy to announce that Xuwei Fu > > has accepted an invitation to become a committer on Apache > > Arrow. Welcome, and thank you f

Re: [ANNOUNCE] New Arrow PMC member: Raúl Cumplido

2023-11-13 Thread Joris Van den Bossche
Congrats, and thanks for all the releases you've already managed! Joris On Tue, 14 Nov 2023 at 08:15, Alenka Frim wrote: > > Yay! Congratulations Raul!!! > > On Tue, Nov 14, 2023 at 6:33 AM Vibhatha Abeykoon > wrote: > > > Congratulations Raúl !!! > > > > On Tue, Nov 14, 2023 at 10:54 AM wish m

Re: [ANNOUNCE] New Arrow committer: James Duong

2023-11-15 Thread Joris Van den Bossche
Congrats! On Thu, 16 Nov 2023 at 08:44, Sutou Kouhei wrote: > > On behalf of the Arrow PMC, I'm happy to announce that James Duong > has accepted an invitation to become a committer on Apache > Arrow. Welcome, and thank you for your contributions! > > -- > kou > >

Re: [ANNOUNCE] New Arrow committer: Felipe Oliveira Carvalho

2023-12-11 Thread Joris Van den Bossche
Congrats Felipe! ;) On Mon, 11 Dec 2023 at 05:41, Alenka Frim wrote: > > Congratulations Felipe!! > > On Fri, Dec 8, 2023 at 12:25 PM Felipe Oliveira Carvalho < > felipe...@gmail.com> wrote: > > > Thank you everyone! > > > > -- > > Felipe > > github.com/felipecrv > > > > On Thu, Dec 7, 2023 at 11

Re: [VOTE] Release Apache Arrow 14.0.2 - RC3

2023-12-14 Thread Joris Van den Bossche
+1 Successfully verified C++ and Python source and Python wheels on Ubuntu 20.04. On Wed, 13 Dec 2023 at 22:40, Raúl Cumplido wrote: > > Hi, > > A couple of minor nits for the release verification. > > The PR with the verification tasks [1] shows a couple of binary > verification failures. > > "

Re: [VOTE] Move Arrow DataFusion Subproject to new Top Level Apache Project

2024-03-01 Thread Joris Van den Bossche
+1 (binding) On Fri, 1 Mar 2024 at 22:18, Sutou Kouhei wrote: > > +1 > > In > "[VOTE] Move Arrow DataFusion Subproject to new Top Level Apache Project" > on Fri, 1 Mar 2024 06:33:08 -0500, > Andrew Lamb wrote: > > > Hello, > > > > As we have discussed[1][2] I would like to vote on the prop

Re: [INFO] Arrow 16.0.0 feature freeze - 8th April

2024-03-14 Thread Joris Van den Bossche
On Thu, 14 Mar 2024 at 17:28, Adam Lippai wrote: > > Pandas and NumPy will have major releases in the next month or so. Tracking > each other’s timelines might help avoiding unexpected breaks. > Yes, we are aware of that. Here is the issue for numpy 2.0 compatibility: https://github.com/apache/ar

[DISCUSS] Expanding the Arrow PyCapsule Protocol with (non-CPU) Device support

2024-03-26 Thread Joris Van den Bossche
Hi all, Last year, we defined a protocol exposing the C Data Interface (schema, array and stream) in Python through PyCapsule objects and dunder methods `__arrow_c_schema/array/stream__` (https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html). A bit earlier last year, we als

Re: [DISCUSS] Versioning and releases for apache/arrow components

2024-04-09 Thread Joris Van den Bossche
I am also in favor of this idea in general and in principle, but (somewhat repeating others) I think we should be aware that this will create _more_ work overall for releasing (refactoring release scripts (at least initially), deciding which version to use for which component, etc), and not les

Re: [ANNOUNCE] New Arrow committer: Sarah Gilmore

2024-04-11 Thread Joris Van den Bossche
Congrats! On Thu, 11 Apr 2024 at 22:56, Sarah Gilmore wrote: > > Thank you everyone! It's been awesome working with everyone and look > forwarding to continuing to do so! 😄 > > From: Ian Cook > Sent: Thursday, April 11, 2024 2:43 PM > To: dev@arrow.apache.org >

Re: [VOTE] Release Apache Arrow 16.0.0 - RC0

2024-04-17 Thread Joris Van den Bossche
+1 (binding) Tested source with conda on Ubuntu On Wed, 17 Apr 2024 at 16:28, Vibhatha Abeykoon wrote: > > I executed the following > > # Verifying C++ > > ```bash > TEST_DEFAULT=0 TEST_CPP=1 ./verify-release-candidate.sh 16.0.0 0 > ``` > > # Verifying C++ and Python > > ```bash > TEST_DEFAULT=0

Re: [VOTE][Format] UUID canonical extension type

2024-04-30 Thread Joris Van den Bossche
+1 (binding) On Tue, 30 Apr 2024 at 19:52, Jacob Wujciak wrote: > +1 (non-binding) > > Am Di., 30. Apr. 2024 um 17:48 Uhr schrieb Weston Pace < > weston.p...@gmail.com>: > > > +1 (binding) > > > > On Tue, Apr 30, 2024 at 7:53 AM Rok Mihevc wrote: > > > > > Thanks for all the reviews and comment

[ANNOUNCE] New Arrow committer: Dane Pitkin

2024-05-07 Thread Joris Van den Bossche
On behalf of the Arrow PMC, I'm happy to announce that Dane Pitkin has accepted an invitation to become a committer on Apache Arrow. Welcome, and thank you for your contributions! Joris

Re: [VOTE][Format] Opaque canonical extension type

2024-07-24 Thread Joris Van den Bossche
+1 (binding) On Wed, 24 Jul 2024 at 07:34, David Li wrote: > > Hello, > > I'd like to propose the 'Opaque' canonical extension type. Prior discussion > can be found at [1] and the proposal and implementations for C++, Go, Java, > and Python can be found at [2]. The proposal is additionally repr

Re: [VOTE][Format] Bool8 Canonical Extension Type

2024-08-06 Thread Joris Van den Bossche
+1 (binding) On Tue, 6 Aug 2024 at 17:41, Matt Topol wrote: > > +1 (binding) > > On Tue, Aug 6, 2024 at 11:40 AM Felipe Oliveira Carvalho < > felipe...@gmail.com> wrote: > > > +1 (non-binding) > > > > -- > > Felipe > > > > On Tue, Aug 6, 2024 at 6:24 AM Gang Wu wrote: > > > > > +1 (non-binding)

Re: [VOTE] Split Go release process

2024-08-27 Thread Joris Van den Bossche
+1 (binding) On Mon, 26 Aug 2024 at 09:56, Antoine Pitrou wrote: > > +1 (binding) > > Le 26/08/2024 à 04:37, Sutou Kouhei a écrit : > > Hi, > > > > I would like to propose splitting Go release process. > > > > Motivation: > > > > * We want to reduce needless major releases because major > >re

Re: [VOTE] Allow Decimal32 and Decimal64 bitwidths in Arrow Format

2024-09-05 Thread Joris Van den Bossche
+1 (binding) On Fri, 6 Sept 2024 at 03:57, Gang Wu wrote: > > +1 (non-binding) > > On Fri, Sep 6, 2024 at 3:57 AM Sutou Kouhei wrote: > > > +1 (binding) > > > > In > > "[VOTE] Allow Decimal32 and Decimal64 bitwidths in Arrow Format" on Wed, > > 4 Sep 2024 17:17:49 -0400, > > Matt Topol wro

[DISCUSS] Monorepo GitHub workflow: allow one issue with multiple PRs

2024-09-11 Thread Joris Van den Bossche
Hi all, This is a discussion specifically for the GitHub development workflow we use in the monorepo, i.e. https://github.com/apache/arrow/ We have the unwritten(?) (but implied by our tooling) rule that we should always have one issue for one PR to close that issue. I would like to di

Re: Standard/Reserved Metadata keys

2021-04-13 Thread Joris Van den Bossche
On Thu, 8 Apr 2021 at 19:52, Micah Kornfield wrote: > 1. Do the standard libraries handle that metadata key automatically ? > > C++, Python and Java have facilities to support them automatically > (extensions needs to register themselves), I'm not sure about other > languages. > > 2. Are there

Re: [VOTE] Move Rust components to new repos and process

2021-04-15 Thread Joris Van den Bossche
+1 (non-binding) Joris On Thu, 15 Apr 2021 at 15:42, Wes McKinney wrote: > +1 (binding) > > On Thu, Apr 15, 2021 at 7:31 AM Weston Steimel > wrote: > > > > +1 > > > > On Thu, 15 Apr 2021 at 00:05, Andy Grove wrote: > > > > > This vote is to determine if the Arrow PMC is in favor of the Rust >

Re: [Python] Custom Metadata in PyArrow

2021-04-26 Thread Joris Van den Bossche
On Fri, 23 Apr 2021 at 14:50, Michael Lavina wrote: > Hello Team, > > The docs for Custom Metadata in PyArrow say TODO > https://arrow.apache.org/docs/python/data.html#custom-schema-and-field-metadata > So I am wondering if someone has any example of adding some custom > metadata to PyArrow. > W

Re: Issue with pyarrow v4.0.0 - Write parquet files with non str datatypes

2021-04-27 Thread Joris Van den Bossche
Hi Jorge, How did you install pyarrow 4.0.0? The error you show typically points to an installation issue (eg built with a wrong numpy) Best, Joris On Tue, 27 Apr 2021 at 16:47, Jorge Alarcon wrote: > Hi everybody, > > > > Please, there is an issue with pyarrow (version 4.0.0) when you try to

Re: [Python] Who has been able to use PyArrow 4.0.0?

2021-04-28 Thread Joris Van den Bossche
On Wed, 28 Apr 2021 at 10:05, Ying Zhou wrote: > > On the other hand a Conda installation is not even possible. Does anyone > know what’s going on? > For conda installation: the conda packages for pyarrow 4.0 were uploaded around 1 hour ago, so this should now be possible. Joris > > Ying

Re: [VOTE] Register media types (MIME types) for Apache Arrow formats to IANA

2021-05-04 Thread Joris Van den Bossche
+1 On Tue, 4 May 2021 at 13:41, Weston Pace wrote: > Per ARROW-7396 I would like to propose an application to the IANA to > register media types for the Arrow IPC formats (both file and > streaming). > > The proposed application is available as [1]. It is based on previous > discussion in a dra

Re: New style in documentation on the website looks great

2021-05-04 Thread Joris Van den Bossche
Thanks, I am happy that people like it! It's a slightly customized version of the pydata-sphinx-theme, to feature a single sidebar and some custom colors. Concrete feedback is certainly welcome (I am no design expert ;)). Joris On Sun, 2 May 2021 at

Re: [ANNOUNCE] New Arrow PMC member: Benjamin Kietzman

2021-05-06 Thread Joris Van den Bossche
Congrats! On Thu, 6 May 2021 at 07:03, Weston Pace wrote: > Congratulations Ben! > > On Wed, May 5, 2021 at 6:48 PM Micah Kornfield > wrote: > > > Congrats! > > > > On Wed, May 5, 2021 at 4:33 PM David Li wrote: > > > > > Congrats Ben! Well deserved. > > > > > > Best, > > > David > > > > > > O

Re: [Format] Timestamp timezone semantics?

2021-06-02 Thread Joris Van den Bossche
On Wed, 2 Jun 2021 at 13:56, Antoine Pitrou wrote: > > Hello, > > For the first time I notice this piece of information about the > timestamp type: > > > /// * If the time zone is set to a valid value, values can be > displayed as > > /// "localized" to that time zone, even though the under

Re: [NIGHTLY] Arrow Build Report for Job nightly-2021-06-06-0

2021-06-07 Thread Joris Van den Bossche
The three "test-ubuntu-18.04-cpp" failing builds are due to a Gandiva test case failure, for which I opened https://issues.apache.org/jira/browse/ARROW-12987 (and Anthony is already fixing it). Looking into the "kartothek" failures (for which I opened https://issues.apache.org/jira/browse/ARROW-12

Re: Representation of "null" values for non-numeric types in Arrow/Pandas interop

2021-06-08 Thread Joris Van den Bossche
Hi Li, It's correct that arrow uses "None" for null values when converting a string array to numpy / pandas. As far as I am aware, there is currently no option to control that (and to make it use np.nan instead), and I am not sure there would be much interest in adding such an option. Now, I know

Re: Representation of "null" values for non-numeric types in Arrow/Pandas interop

2021-06-09 Thread Joris Van den Bossche
> > As a workaround, the "fill_null" compute function can be used to replace > nulls with nans: > > >>> nan = pa.scalar(np.NaN, type=pa.float64()) > >>> pa.Array.from_pandas(s).fill_null(nan).to_pandas() > > On Tue, Jun 8, 2021, 16:15 Joris Van den Bo

[Discuss] Handling timezones in (C++) compute kernels for timestamp data

2021-06-10 Thread Joris Van den Bossche
Hi all, There was recently a discussion on the interpretation of the spec about the "timezone" field of timestamp type (and different timestamp-related types that Arrow should have). See https://lists.apache.org/thread.html/r017084eed74edbc95810fc049056570f45b0bb034d6eeadd647e8621%40%3Cdev.arrow.a

Re: [Discuss] Handling timezones in (C++) compute kernels for timestamp data

2021-06-10 Thread Joris Van den Bossche
On Thu, 10 Jun 2021 at 18:06, Antoine Pitrou wrote: > > On Thu, 10 Jun 2021 17:33:23 +0200 > Joris Van den Bossche wrote: > > > > We just merged a PR to add some kernels to extract fields from timestamps > > (year, month, day, hour, etc -> ARROW-11759 > > &

Re: [Format][Important] Needed clarification of timezone-less timestamps

2021-06-14 Thread Joris Van den Bossche
On Mon, 14 Jun 2021 at 17:57, Antoine Pitrou wrote: > > ... > > Joris' interpretation is that timestamp *values* are expressed in an > arbitrary "local time" that is unknown and unspecified. It is therefore > difficult to exactly interpret them, since the timezone information is > unavailable. > >

Re: [Format][Important] Needed clarification of timezone-less timestamps

2021-06-15 Thread Joris Van den Bossche
Some inline answers to Weston's email below: On Tue, 15 Jun 2021 at 07:34, Weston Pace wrote: > ... > Let's pretend two astronomers observe a meteoroid impact on the moon. > We are talking about two different ways they can record the time. The > first method, universal time, is done by recording
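
The distinction discussed here can be sketched with the Python standard library alone. This is an illustration under assumptions, not a statement of Arrow's semantics: the observer locations, the fixed UTC offsets and the date are invented, and "wall clock" stands in for the second recording method, which is cut off in the snippet above.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets chosen for illustration (no daylight-saving handling).
tokyo = timezone(timedelta(hours=9))
new_york = timezone(timedelta(hours=-4))

# Method 1: record the instant. Both observers agree on it, and the
# zone only changes how that instant is displayed.
impact = datetime(2021, 6, 15, 12, 0, tzinfo=timezone.utc)
same_instant = impact.astimezone(tokyo) == impact.astimezone(new_york)

# Method 2: record what the local wall clock shows, with no zone attached.
# The two naive values differ and cannot be compared as instants.
wall_tokyo = impact.astimezone(tokyo).replace(tzinfo=None)
wall_new_york = impact.astimezone(new_york).replace(tzinfo=None)

print(same_instant, wall_tokyo, wall_new_york)
```

The naive values are still useful for questions like "at what hour of the local day did this happen", which is the kind of use the zone-less interpretation is meant to serve.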

Re: [Format][Important] Needed clarification of timezone-less timestamps

2021-06-15 Thread Joris Van den Bossche
On Mon, 14 Jun 2021 at 21:57, Adam Hooper wrote: > > On Mon, Jun 14, 2021 at 3:25 PM Weston Pace wrote: > > > > So it's wrong to put "timezone=UTC", because in Arrow, the "timezone" > > field > > > means, "how the data is *displayed*." The data isn't displayed as UTC. > > > > I don't think users

Re: [Format][Important] Needed clarification of timezone-less timestamps

2021-06-15 Thread Joris Van den Bossche
On Tue, 15 Jun 2021 at 10:11, Antoine Pitrou wrote: > > > Le 15/06/2021 à 09:31, Joris Van den Bossche a écrit : > > > > (but I also don't fully understand your point here, as your "they > > would get the correct histogram" seems to imply a positive stat

Re: [Format][Important] Needed clarification of timezone-less timestamps

2021-06-15 Thread Joris Van den Bossche
A general observation: it might be useful to get back to the message of Julian Hyde in the previous email thread about this 2 weeks ago (https://lists.apache.org/thread.html/r5a89aa20b1cb812dc01a3817a5bfb365971577986d586dcc7ee21e72%40%3Cdev.arrow.apache.org%3E). Quoting part of that email: On Wed,

Re: [Python] Drop Python 3.6 and Numpy 1.16 support?

2021-06-24 Thread Joris Van den Bossche
Note that the last bug-fix release of Python 3.6 already happened on 2018-12-11 (3.6.8 release), and since then it's only supported for source-only and security-only releases. But agreed with Antoine that it's currently not a big burden to keep Python 3.6 a bit longer. With the change of the relea

Re: [VOTE] Clarify meaning of timestamp without time zone to equal the concept of "LocalDateTime"

2021-06-25 Thread Joris Van den Bossche
+1 On Thu, 24 Jun 2021 at 21:21, Micah Kornfield wrote: > +1 (binding) > > On Thu, Jun 24, 2021 at 12:17 PM Weston Pace > wrote: > > > The discussion in [1] led to the following proposal which I would like > > to submit for a vote. > > > > --- > > Arrow allows a timestamp column to omit the tim

Re: [Python] ascii_trim bug & documentation

2021-07-01 Thread Joris Van den Bossche
Hi, I think this is fixed on the master branch. With master, I get: >>> pc.utf8_trim("aba", characters="a") >>> pc.ascii_trim("aba", characters="a") while with released pyarrow 4.0.1, the second one indeed raises an error (not directly sure when it was fixed). For the documentation, contribut

Re: [python] [iter_batches] Is there any value to an iterator based parquet reader in python?

2021-07-05 Thread Joris Van den Bossche
There is a recent JIRA where a row-wise iterator was discussed: https://issues.apache.org/jira/browse/ARROW-12970. This should not be too hard to add (although there is a linked JIRA about improving the performance of the pyarrow -> python objects conversion, which might require some more engineer

Re: [python] [iter_batches] Is there any value to an iterator based parquet reader in python?

2021-07-06 Thread Joris Van den Bossche
> implementation to better leverage row groups, without the need to keep in > memory the whole Table when you are iterating over data. While the current > jira issue seems to suggest the implementation for Table once it's already > fully available. > > On Tue, Jul 6, 2021 at

Re: [Rust] Eliminate Timezone field from Timestamp types?

2021-07-07 Thread Joris Van den Bossche
On Wed, 7 Jul 2021 at 18:46, Jorge Cardoso Leitão wrote: > Hi, > > AFAIK timezone is part of the spec. And for reference, the current spec (Schema flatbuffer file) for timestamp is at https://github.com/apache/arrow/blob/6c8d30ea8fd2750b999840872d3f6cbdc8f8/format/Schema.fbs#L217-L247. >
