Re: ARROW-11465

2022-05-18 Jacques Nadeau
I second Weston's comments. The idea of separate files is part of the de jure spec but not the de facto one. It's up to the Parquet community whether the de facto spec should be "altered". AFAIK, zero OSS readers support use of this field. On Wed, May 18, 2022, 8:53 AM Weston Pace wrote: > I

Re: Merge a pull request with GitHub API

2022-05-18 Sutou Kouhei
Hi, > I assume that the GH API approach would be able to preserve the > author/co-author attribution Yes. https://github.com/apache/arrow/pull/13184 does it. https://github.com/apache/arrow/commit/6faee474db2ff246f798d63b79a3100a1f15204c was merged by https://github.com/apache/arrow/pull/13184 .

Re: Merge a pull request with GitHub API

2022-05-18 Sutou Kouhei
Hi, Thanks for the information! We enabled only "squash" merges via https://issues.apache.org/jira/browse/INFRA-17869, but we should use .asf.yml when we need to change it. Thanks, -- kou In "Re: Merge a pull request with GitHub API" on Wed, 18 May 2022 10:16:17 +0200, Jarek Potiuk wrote: > Ju

Re: Merge a pull request with GitHub API

2022-05-18 Sutou Kouhei
Hi, > we should just ensure that commits are squashed > and rebased on top of the main/master branch. We can do this by passing "squash" as the "merge_method" parameter: https://docs.github.com/en/rest/pulls/pulls#merge-a-pull-request https://github.com/apache/arrow/pull/13184/files#diff-4aea0167
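A minimal sketch of that call in Python, using only the standard library. It builds (but does not send) a request to GitHub's "Merge a pull request" endpoint, `PUT /repos/{owner}/{repo}/pulls/{pull_number}/merge`, with `merge_method` set to `"squash"`. The helper name `build_squash_merge_request` and the token handling are hypothetical; the linked GitHub docs are the authority on current headers and parameters.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_squash_merge_request(owner: str, repo: str, pull_number: int,
                               token: str) -> urllib.request.Request:
    """Build (without sending) a squash-merge request for one pull request.

    Hypothetical helper: only the endpoint path and the "merge_method"
    parameter come from GitHub's REST docs; the rest is illustrative.
    """
    url = f"{GITHUB_API}/repos/{owner}/{repo}/pulls/{pull_number}/merge"
    # "squash" collapses the PR into a single commit on the base branch.
    body = json.dumps({"merge_method": "squash"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_squash_merge_request("apache", "arrow", 13184, token="<token>")
    print(req.get_method(), req.full_url)
    # Actually merging would require: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes it easy to inspect what a merge script would do before it touches a real PR.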

Re: Merge a pull request with GitHub API

2022-05-18 Wes McKinney
One of the benefits of the current merge script is that the PR description is preserved (maybe this is also possible with this method), and authors and co-authors are preserved by the explicit by-lines, e.g. "Lead-authored-by: Nic Crane", "Co-authored-by: Ian Cook", "Signed-off-by: Ian Cook". I assume th
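If the GitHub API route is adopted, the by-lines could still be appended to the commit message before it is passed as `commit_message`. The sketch below is a hypothetical helper, not the current merge script's code; it assumes the caller supplies each person as a preformatted string (whether that includes an email address is left to the caller).

```python
from typing import Sequence


def build_commit_message(description: str, lead_author: str,
                         co_authors: Sequence[str], signer: str) -> str:
    """Append by-line trailers to a PR description (hypothetical helper)."""
    lines = [f"Lead-authored-by: {lead_author}"]
    # Skip the lead author if they also appear among the co-authors.
    lines += [f"Co-authored-by: {a}" for a in co_authors if a != lead_author]
    lines.append(f"Signed-off-by: {signer}")
    # Git trailers belong in the final paragraph, after a blank line.
    return description.rstrip() + "\n\n" + "\n".join(lines)
```

For example, `build_commit_message("Fix bug.\n", "Nic Crane", ["Ian Cook"], "Ian Cook")` yields the description followed by the three by-lines in the order shown in the message above.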

Re: ARROW-11465

2022-05-18 Weston Pace
I can try and clarify my earlier feedback: This is an Arrow datasets question if your goal is to create multiple independent parquet files, each one a complete file, and read them as a combined dataset. This is not an Arrow question (but instead a parquet question) if your goal is to create a sing

ARROW-11465

2022-05-18 Jeszy
Hello, I wanted to circle back to this topic and make sure the community reaches a decision. Although there was sporadic discussion on JIRA[1], the PR[2], and this list[3] in the past, the messaging across these channels changed over time. E.g. while the PR comment is negative, the much more

Re: Merge a pull request with GitHub API

2022-05-18 Raul Cumplido Dominguez
I like the idea. As a new contributor, this is something that also confused me. It also has the side effect of making it easy to tell PRs that have been merged apart from PRs that have been closed without merging, which currently requires more investigation. On Wed, May 18, 2022 at 10:16 AM Jarek

Re: Merge a pull request with GitHub API

2022-05-18 Jarek Potiuk
Just a small comment here (a friendly comment from a visitor :)). If you are following a squash & rebase workflow: in Apache Airflow we merge exclusively with the GitHub UI's merge button. You can configure .asf.yml to only allow "squash & rebase", and then squashing and rebasing happen automatically when you

Re: Merge a pull request with GitHub API

2022-05-18 Antoine Pitrou
That sounds OK to me; we should just ensure that commits are squashed and rebased on top of the main/master branch (also, the commit title and description should inherit the PR's corresponding fields). On 18/05/2022 at 05:43, Sutou Kouhei wrote: Hi, How about using GitHub API instead