Re: [DISCUSS] Extend specification with the definition of equality?

2020-11-12 Thread Micah Kornfield
Hi Jorge, I think it would make sense to add some clarifications to the document per Wes's comments. Do you want to maybe try to make a PR? One small edge case to consider is how NaN float values are compared. -Micah On Thu, Nov 12, 2020 at 8:44 PM Jorge Cardoso Leitão < jorgecarlei...@gmail.com
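
To illustrate the NaN edge case mentioned above: under IEEE 754 a NaN never compares equal to anything, itself included, so an equality definition has to say which behaviour it wants. The sketch below contrasts the two options. The function names are hypothetical, and the thread does not say which behaviour the specification should adopt.

```python
import math


def ieee_equal(a: float, b: float) -> bool:
    """Plain IEEE 754 comparison: NaN is never equal to anything."""
    return a == b


def nan_aware_equal(a: float, b: float) -> bool:
    """Alternative (assumption, not from the thread): two NaNs count as equal."""
    if math.isnan(a) and math.isnan(b):
        return True
    return a == b


nan = float("nan")
print(ieee_equal(nan, nan))       # IEEE semantics
print(nan_aware_equal(nan, nan))  # NaN-aware semantics
```

Whichever rule is chosen, it would need to be applied consistently across implementations for array equality to agree between them.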

Re: [DISCUSS] Alternative design for KMS interaction in parquet-cpp

2020-11-12 Thread Micah Kornfield
I skimmed through and this seems like a clean design (I would have to reread the PR to do a comparison). A few thoughts off the top of my head: > - Multiple internal classes are left public in header files, where it would be preferred that public classes be kept to a minimum. I think some o

Re: Request “Contributor” permissions

2020-11-12 Thread Sutou Kouhei
Done! -- kou In "Request “Contributor” permissions" on Thu, 12 Nov 2020 19:34:22 -0800, Bill Zhao wrote: > Hi, > > I would like to assign a Jira ticket to myself. The ticket is: > > https://issues.apache.org/jira/browse/ARROW-10574 > > Can a project maintainer please grant me the "Contr

Re: [DISCUSS] Extend specification with the definition of equality?

2020-11-12 Thread Jorge Cardoso Leitão
Hi Wes, Thanks a lot. I agree. My question is whether we should make it explicit in the specification. AFAIK, "if the data represented in the slot is equal" depends on the datatype: for variable sized arrays with offsets (e.g. strings), the equality of slot i is something along the lines of: star
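
The message above is cut off in this listing. As a rough sketch of the general idea it raises (slot equality for offset-based arrays depends on the bytes the offsets point at, not on the offsets themselves), the following compares one slot from each of two string arrays. The function name, the argument layout, and the rule that two null slots compare equal are all assumptions made for illustration; they are not taken from the thread or from the specification.

```python
from typing import Optional, Sequence


def string_slot_equal(
    offsets_a: Sequence[int], data_a: bytes, valid_a: Optional[Sequence[bool]], i: int,
    offsets_b: Sequence[int], data_b: bytes, valid_b: Optional[Sequence[bool]], j: int,
) -> bool:
    """Compare slot i of array A with slot j of array B (hypothetical helper)."""
    a_valid = valid_a is None or valid_a[i]
    b_valid = valid_b is None or valid_b[j]
    if not a_valid or not b_valid:
        # Assumption: two null slots are equal; a null never equals a non-null.
        return a_valid == b_valid
    # Compare the referenced bytes, not the raw offset values.
    slice_a = data_a[offsets_a[i]:offsets_a[i + 1]]
    slice_b = data_b[offsets_b[j]:offsets_b[j + 1]]
    return slice_a == slice_b


# Both arrays hold ["ab", "c"], but B's offsets start at 2 into a larger buffer.
offsets_a, data_a = [0, 2, 3], b"abc"
offsets_b, data_b = [2, 4, 5], b"xxabc"
print(string_slot_equal(offsets_a, data_a, None, 0, offsets_b, data_b, None, 0))
```

The two arrays here have different offsets and different underlying buffers, yet each pair of slots refers to the same bytes, which is why a comparison of raw buffers would not be a sufficient definition.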

Request “Contributor” permissions

2020-11-12 Thread Bill Zhao
Hi, I would like to assign a Jira ticket to myself. The ticket is: https://issues.apache.org/jira/browse/ARROW-10574 Can a project maintainer please grant me the "Contributor" permission? Thank you, Weiyang (Bill)

Re: [Rust]: Architecture support

2020-11-12 Thread Patrick Horan
I agree with Andrew. If we are going to be serious about supporting additional architectures we should merge these changes along with changes adding them to CI. On November 12, 2020, Andrew Lamb wrote: > In general I think expanding support for additional architectures is a good idea. The list

Re: Pandas Block Manager

2020-11-12 Thread Micah Kornfield
Hi Nicholas, I don't think allowing for flexibility of non 8 byte aligned types is a good idea. The specification explicitly calls out the alignment requirements and allowing for writers to output different non-aligned values potentially breaks other implementations. I'm not sure of your exact us

Re: Pandas Block Manager

2020-11-12 Thread Nicholas White
OK got everything to work, https://github.com/apache/arrow/pull/8644 (part of ARROW-10573 now) is ready for review. I've updated the test case to show it is possible to zero-copy a pandas DataFrame! The next step is to dig into `arrow_to_pandas.cc` to make it work automagically... On Wed, 11 Nov 2

Re: Rust ParquetReader trait

2020-11-12 Thread Andrew Lamb
Here is what I had to do to our code in IOx to adapt it to use the new Parquet interfaces -- perhaps it will be helpful to you too https://github.com/influxdata/influxdb_iox/pull/395 On Thu, Nov 12, 2020 at 8:46 AM Rémi Dettai wrote: > Here is an example: > https://gist.github.com/rdettai/950b1e

Re: [Rust]: Architecture support

2020-11-12 Thread Andrew Lamb
In general I think expanding support for additional architectures is a good idea. The list of possible features and support you describe is pretty substantial, and I suggest we are careful about taking on too much too soon. In terms of what to support, I would personally recommend waiting for some

Re: Patch Release 2.0.1?

2020-11-12 Thread Antoine Pitrou
ARROW-10519 would also be nice to have in a bugfix release. Regards Antoine. Le 11/11/2020 à 21:21, Micah Kornfield a écrit : > There are a couple of Parquet bugs that I think might warrant a patch > release. The most pressing I think is: ARROW-10493 which can potentially > lose data silentl

Re: Support for reading arbitrary nested objects

2020-11-12 Thread Micah Kornfield
Hi Renato, I would suggest reading the Arrow specification [1] which explains how nesting is handled. -Micah [1] https://arrow.apache.org/docs/format/Columnar.html On Thu, Nov 12, 2020 at 6:40 AM Renato Marroquín Mogrovejo < renatoj.marroq...@gmail.com> wrote: > Hi Micah, > > Thanks for the ans

Re: Support for reading arbitrary nested objects

2020-11-12 Thread Renato Marroquín Mogrovejo
Hi Micah, Thanks for the answer! Yes, basically that was my question. I was not sure about the full extent of support for nested data, but it seems that both (the Parquet binding and the Arrow format) support reading and writing nested objects. Just a couple of follow-up questions: - If Arrow/Feather file format

Re: Rust ParquetReader trait

2020-11-12 Thread Rémi Dettai
Here is an example: https://gist.github.com/rdettai/950b1ed3e8e2f0fc416a6e8f3659b7e6 Rusoto is kind of annoying because it's forcing you to use async... This is not the solution I ended up with because I'm calling this from DataFusion and *sync* was not playing very well with *async*. But it gives

Re: [Rust]: Architecture support

2020-11-12 Thread vertexclique vertexclique
In addition to the previous e-mail, there are some changes at companies like Apple: the next generation will ship with AArch64, as you know. My local test showed that we can build on: * aarch64-unknown-linux-gnu But we need a sysroot image for: * aarch64-apple-darwin AArch64 on Apple is another topi

[Rust]: Architecture support

2020-11-12 Thread vertexclique vertexclique
Hi Team; There are 3 topics that fall under this: * no_std compatibility * endianness compatibility * target datapath size (32-bit/64-bit, rust naming target_pointer_width) So after the sync call yesterday, Micah said that there had been efforts on that for some time on the Java and C++ side. That's nice. Curren

Re: Rust ParquetReader trait

2020-11-12 Thread vertexclique vertexclique
Hi Remi; I see. I am unsure how much needs to change on our side, since I haven't yet estimated the adaptation/refactoring needed for it. If possible, can you share the S3 implementation that you've worked on? It will guide us in doing the estimate, and if possible we want to adopt the