Re: [DISCUSS] Other website improvements?

2020-07-27 Thread kekronbekron
Please consider adding benchmarks or comparisons of sorts. I don't know how doable this is, but it'll be good to drive people away from other storage formats in favour of Arrow/Parquet enterprise-wide. - KB ‐‐‐ Original Message ‐‐‐ On Tuesday, July 28, 2020 3:17 AM, Wes McKinney wrote

[DISCUSS] Support of higher bit-width Decimal type

2020-07-27 Thread Micah Kornfield
Hi Arrow Dev, ZetaSQL (Google's open source standard SQL library) recently introduced a BigNumeric [1] type which requires a 256 bit width to properly support it. I'd like to add support (possibly in collaboration with some of my colleagues) for 256 bit width Decimals in Arrow to sup

Re: Gandiva and Threads

2020-07-27 Thread Matt Youill
Managed to track down the issue (sort of). I removed a call to set_chunksize on TableBatchReader where the chunk size was less than the number of rows in a table being read. Runs fine now (tested with hundreds of threads over millions of rows). Strangely, Gandiva fails if I don't call set_chunksize for

Re: [DISCUSS] Other website improvements?

2020-07-27 Thread Neal Richardson
My outstanding to-do list from this iteration of the website includes: * Richer "use cases" page with more examples, ideally with runnable code but also good to have more links to blog posts (including on sites other than arrow.apache.org) showcasing the use cases * Better "getting started" user gu

Re: Gandiva and Threads

2020-07-27 Thread Wes McKinney
Crashing when running from multiple threads doesn't sound right; perhaps there are some missing synchronizations in internal data structures. Could you open a JIRA issue and show the backtraces of any crashes or other clues about how to reproduce the issues? On Sun, Jul 26, 2020 at 8:12 PM Matt Yo

[DISCUSS] Other website improvements?

2020-07-27 Thread Wes McKinney
Thanks to Neal and others who helped with the website overhaul for the 1.0.0 release, definitely feels like a big improvement to me. What other things would we like to do with the website? For example, it might be nice to have a "call out" on the front page to a news item like the most recent code

Re: [ext] Re: language independent representation of filter expressions

2020-07-27 Thread Wes McKinney
I am OK with using the .proto files for now while the serialization protocol is in development and focusing on capturing the functional requirements and leaving the Protobuf vs Flatbuffers debate for later. I don't think that JSON is an adequate substitute, because if Protobuf is the desired / off

Re: Next library version

2020-07-27 Thread Wes McKinney
Yes, I do not think we will frequently (if ever) make MINOR library releases in SemVer parlance, so the next non-patch release should be 2.0.0. On Mon, Jul 27, 2020 at 5:10 AM Krisztián Szűcs wrote: > > Hi, > > During the release process I set up the next version to 1.1.0. > > Wes has noted that t

Re: Versioning of arrow

2020-07-27 Thread Wes McKinney
Yes, the TL;DR is that we do not at this time intend to make minor LIBRARY releases in SemVer parlance, even if there are no backwards incompatible changes. Either we will make Major releases or Patch releases of the libraries. We will likely make minor releases of the columnar protocol, though. T

Re: Versioning of arrow

2020-07-27 Thread Neal Richardson
https://arrow.apache.org/docs/format/Versioning.html is the statement that came from the resolution of the previous discussion. IIRC the discussion came between the 0.15 and 0.16 releases, if you want to search the mailing list archives. I wouldn't want to speak for everyone, but I believe there a

Versioning of arrow

2020-07-27 Thread Jorge Cardoso Leitão
Hi, First off, congrats on the 1.0.0 release! I am writing because I am trying to understand the versioning scheme we will use going forward. As far as I understand, 1.0.0 was assigned to all subcomponents of arrow. I.e. I can now use pyarrow and assign something like >=1,<2 on a setup.py. However, lo

Re: [VOTE] Release Apache Arrow 1.0.0 - RC2

2020-07-27 Thread Neal Richardson
CRAN has accepted the R package: https://cran.r-project.org/web/packages/arrow/index.html I'll stay on top of the Homebrew formula but no guarantees when the maintainers will accept it. Neal On Mon, Jul 27, 2020 at 2:36 AM Krisztián Szűcs wrote: > Updated checklist: > > 1. [done] rebase mast

Re: Introducing Cylon

2020-07-27 Thread Niranda Perera
Hi Micah, Thank you very much for raising these questions. We are further analyzing the reasons for Cylon's performance improvement. We believe the main reason is using Arrow and columnar format and it helps our shuffleByIndex-compute-recreateData approach (more like BSP). And we are getting nati

[NIGHTLY] Arrow Build Report for Job nightly-2020-07-27-0

2020-07-27 Thread Crossbow
Arrow Build Report for Job nightly-2020-07-27-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-07-27-0 Failed Tasks: - conda-linux-gcc-py36-cpu: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-07-27-0-azure-conda-linux-gcc-py36-cp

Next library version

2020-07-27 Thread Krisztián Szűcs
Hi, During the release process I set up the next version to 1.1.0. Wes has noted that the next version should be 2.0.0 in correspondence with the versioning documentation [1]. I assume we're going to make incompatible API changes in the upcoming releases, so I agree that we should prefer major re

Re: [VOTE] Release Apache Arrow 1.0.0 - RC2

2020-07-27 Thread Krisztián Szűcs
Updated checklist: 1. [done] rebase master 2. [done] upload source 3. [done] upload binaries 4. [done] update website 5. [done] upload ruby gems 6. [wontdo] upload js packages 8. [done] upload C# packages 9. [done] upload rust crates 10. [pending-pr] update conda recipes 11. [done] upload

Re: [VOTE] Release Apache Arrow 1.0.0 - RC2

2020-07-27 Thread Krisztián Szűcs
On Fri, Jul 24, 2020 at 7:43 PM Andy Grove wrote: > > Rust crates are published. This went very smoothly this time. Thanks Andy! > > On Fri, Jul 24, 2020 at 11:26 AM Andy Grove wrote: > > > Never mind, I figured it out ... > > https://downloads.apache.org/arrow/arrow-1.0.0/ > > > > On Fri, Jul 24,

Re: [VOTE] Release Apache Arrow 1.0.0 - RC2

2020-07-27 Thread Krisztián Szűcs
On Fri, Jul 24, 2020 at 11:51 PM Sutou Kouhei wrote: > > I'll update MSYS2 package: > > 1. [done] rebase master > 2. [done] upload source > 3. [done] upload binaries > 4. [done/PR-ready] update website > 5. [done] upload ruby gems > 6. [ ] upload js packages > 8. [done] upload C# packages >