Re: [Rust][DataFusion] Supporting input_file_name()

2021-02-24 Thread Mike Seddon
Thanks Micah. It is actually the Rust implementation that is the odd one out. Ideally, adding a metadata KeyValue to the RecordBatch plus your suggested 'reserved' key would be the best option. On Thu, Feb 25, 2021 at 3:26 PM Micah Kornfield wrote: > Thanks for looking into it. I would guess it is l

Re: [Rust][DataFusion] Supporting input_file_name()

2021-02-24 Thread Micah Kornfield
Thanks for looking into it. I would guess it is likely possible to "hoist" metadata from a record batch schema object to the Message, but I understand if it isn't something you want to pursue. On Wed, Feb 24, 2021 at 8:19 PM Mike Seddon wrote: > Hi Micah, > Thank you for providing this information. I

Re: [Rust][DataFusion] Supporting input_file_name()

2021-02-24 Thread Mike Seddon
Hi Micah, Thank you for providing this information. I have reviewed the documentation you provided and have a few conclusions: 1. RecordBatch does not have the capability to attach user defined metadata (KeyValue attributes): https://github.com/apache/arrow/blob/master/format/Message.fbs#L83 2. Sc

Re: [Rust][DataFusion] Supporting input_file_name()

2021-02-24 Thread Micah Kornfield
The process would be to create a PR proposing an update to the custom metadata specification [1] that reserves a new word and describes its use. Then send a [DISCUSS] email on this list. Once there is consensus we can formally vote and merge the change. [1] https://github.com/apache/arrow/blob/master/

Re: [Rust][DataFusion] Supporting input_file_name()

2021-02-24 Thread Mike Seddon
Thanks for both of your comments. @Andrew Schema.metadata does look like a logical place to house the information so that would solve part of the problem. Do you have any thoughts on whether we change the function signature: From: Result + Send + Sync>; To: Result + Send + Sync>; @Micah It w

Re: [Proposal] Allow source-only release vote for patch releases

2021-02-24 Thread Wes McKinney
I'm supportive of this plan. I think it's been previously discussed to allow Rust to do its own minor/point releases (which presumably would try to obey SemVer) so that also seems fine to me. On Wed, Feb 24, 2021 at 4:11 PM Andy Grove wrote: > > Thanks for writing this up, Neal. I think this is a

Re: [Proposal] Allow source-only release vote for patch releases

2021-02-24 Thread Andy Grove
Thanks for writing this up, Neal. I think this is a pragmatic solution and I support this approach. The only risk I see is that there may be a temptation to use this for more than bug fixes and as a way to bypass the official release process but I think the committers just need to be careful about

[Proposal] Allow source-only release vote for patch releases

2021-02-24 Thread Neal Richardson
Hi all, We've had some discussion about ways to reduce the cost of releasing and ways to allow maintainers of subprojects to make more frequent maintenance releases. Specifically, see these two recent mailing list threads: * https://lists.apache.org/thread.html/rf43d270b4dde2dce601c69fdbb0ab9e7412

Re: [Rust][DataFusion] Supporting input_file_name()

2021-02-24 Thread Micah Kornfield
At least in C++ (and the IPC format), a schema can be shared across many RecordBatches, which might have different sources. It might be useful to define a reserved metadata key (similar to extension types) so that the data can be interpreted consistently. On Wed, Feb 24, 2021 at 11:29 AM Andrew L

Re: [Rust][DataFusion] Supporting input_file_name()

2021-02-24 Thread Andrew Lamb
I wonder if you could add the file_name as metadata on the `Schema` of the RecordBatch rather than the RecordBatch itself? Since every RecordBatch has a schema, I don't fully understand the need to add something additional to the RecordBatch. https://docs.rs/arrow/3.0.0/arrow/datatypes/struct.Schem

Apache Arrow Rust Sync Call 2/24/2021

2021-02-24 Thread Andy Grove
Attendees
- Andy Grove
- Mike Seddon
- Andrew Lamb
- Fernando Herrera
- Neville Dipale
- Remi Dettai
Topics Discussed
- JIRA automation to reduce burden for contributors. Andrew is going to work on this.
- Discussed the increasing num

Re: DataFusion Postgres License Requirements

2021-02-24 Thread Wes McKinney
I think as long as it is clear which sources contain derived third-party works it is okay, but it is important to keep them contained to certain files and clearly marked, so that if someone else were to derive a third-party work from Apache Arrow, they will know that the original Postgres licens

Re: [C++] adopting an SIMD library - xsimd

2021-02-24 Thread Antoine Pitrou
For the record, a PR is now up for bundling xsimd with Arrow C++: https://github.com/apache/arrow/pull/9556 Regards Antoine. On Tue, 9 Feb 2021 11:14:45 +0800 Yibo Cai wrote: > This topic was discussed in an earlier thread [1], but has not landed yet. > > PR https://github.com/apache/arrow/pull/94

[NIGHTLY] Arrow Build Report for Job nightly-2021-02-24-0

2021-02-24 Thread Crossbow
Arrow Build Report for Job nightly-2021-02-24-0 All tasks: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-24-0 Failed Tasks: - test-build-vcpkg-win: URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-24-0-github-test-build-vcpkg-w