Re: Usage of the name Feather?

2022-10-25 Thread Maarten Breddels
In my mind there were two/three formats:
* 2 related: IPC stream/file: native storage, everything memory-mappable, slight overhead from having to read in metadata due to chunking.
* feather: like IPC, but with possible compression/codecs, so not memory-mappable (at least not of any practical use), to pro

Re: [ANNOUNCE] New Arrow PMC member: Joris Van den Bossche

2021-11-18 Thread Maarten Breddels
Nice Joris, congratulations! On Thu, Nov 18, 2021 at 9:34 AM Nic wrote: > Congratulations, great news! > > On Thu, 18 Nov 2021 at 07:27, Joris Van den Bossche < > jorisvandenboss...@gmail.com> wrote: > > > Thanks all! > > > > On Thu, 18 Nov 2021 at 08:10, Jorge Cardoso Leitão < > > jorgecarl

Re: [ANNOUNCE] Official media types (MIME types) for Apache Arrow formats

2021-06-24 Thread Maarten Breddels
Great work, nice to see this formalized. On Thu, Jun 24, 2021 at 9:17 AM Antoine Pitrou wrote: > > Can we document them in the format docs and/or in the FAQ? > > > On Thu, 24 Jun 2021 10:47:34 +0900 (JST) > Sutou Kouhei wrote: > > Hi, > > > > The official media types (MIME types) for Apache

Re: [C++] Breakpoints and VSCode integration

2021-02-25 Thread Maarten Breddels
Hi Ying, If you manage to get the debugger to work nicely with VS Code, could you share the instructions on how to set this up? I usually just use gdb, but that can be a bit crude; I would love to use a visual debugger sometimes. Regards, Maarten Breddels Software engineer / consultant / data

Re: upper() / lower() for utf8 strings

2020-12-23 Thread Maarten Breddels
that never got off the ground. So yes, if you can do what Wes suggests, that would be great. cheers, Maarten Breddels Software engineer / consultant / data scientist Python / C++ / Javascript / Jupyter www.maartenbreddels.com / vaex.io maartenbredd...@gmail.com +31 6 2464 0838 <+31+6+24640

Re: Development with C++ and Cython APIs in Arrow

2020-11-06 Thread Maarten Breddels
Another approach I took is: https://github.com/vaexio/vaex-arrow-ext But it uses pybind11, not Cython. (from mobile phone) On Fri, 6 Nov 2020, 17:19 Vibhatha Abeykoon, wrote: > One more question about packaging, here when the API requires both Cython > and C++ APIs, > Pyarrow dependency must a

Re: How to autoformat cpp code?

2020-07-16 Thread Maarten Breddels
I'm using VS Code with the Clang-Format plugin, configured with: "editor.formatOnSave": true, "clang-format.executable": "clang-format-7". I installed clang-format-7 with apt-get, I think. It will auto-format on save. On Thu 16 Jul 2020 at 22:19, Micah Kornfield wrote: > If you ha

Re: Developing a C++ Python extension

2020-07-02 Thread Maarten Breddels
our on Windows: You don't actually link > against numpy but you statically link a set of functions that are resolved > to NumPy's function when you import numpy. Quick googling leads to > https://github.com/yugr/Implib.so which could provide something similar > for Linux.

Re: Developing a C++ Python extension

2020-07-02 Thread Maarten Breddels
Ok, thanks! I'm setting up a repo with an example here, using pybind11: https://github.com/vaexio/vaex-arrow-ext and I'll just try all possible combinations and report back. cheers, Maarten Breddels

Developing a C++ Python extension

2020-07-02 Thread Maarten Breddels
where someone installed a pyarrow 2014 wheel, or built from source, or installed from conda-forge? cheers, Maarten Breddels

Sharing our experience adopting (py) Arrow in Vaex

2020-07-02 Thread Maarten Breddels
sts for 32-bit offset string arrays) Overall, we're quite positive, and as you see, the pain points are not fundamental issues, but annoyances that might be easy to fix and would make adoption smoother/faster. cheers, Maarten Breddels

Re: [Discuss] Extremely dubious Python equality semantics

2020-07-01 Thread Maarten Breddels
I think that if __eq__ does not return True/False exclusively, __bool__ should raise an exception to avoid these unexpected truthy results. Python users are used to that from NumPy. On Wed 1 Jul 2020 at 15:40, Joris Van den Bossche < jorisvandenboss...@gmail.com> wrote: > On Wed, 1 Jul 2020 at 09:4
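
A toy sketch of the rule proposed in this reply, written in plain Python. The class names `Column` and `Mask` are hypothetical and stand in for any array type. `==` returns an element-wise result, and that result refuses to be used as a boolean, as NumPy arrays do. This prevents `if a == b:` from silently being truthy.

```python
class Mask:
    """Element-wise comparison result whose truth value is ambiguous."""

    def __init__(self, flags):
        self.flags = list(flags)

    def __bool__(self):
        # Same policy as NumPy: force the caller to say what they mean.
        raise ValueError("truth value of an element-wise comparison is ambiguous; "
                         "use .all() or .any()")

    def all(self):
        return all(self.flags)

    def any(self):
        return any(self.flags)


class Column:
    """Toy column whose == compares element-wise instead of returning True/False."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        return Mask(a == b for a, b in zip(self.values, other.values))

    __hash__ = None  # defining __eq__ already disables hashing; made explicit here
```

With this, `if Column([1, 2]) == Column([1, 3]):` raises `ValueError` instead of evaluating as true.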

Re: Using gdb on a test

2020-06-15 Thread Maarten Breddels
ooking at the loaded > shared libraries. > > François > > On Mon, Jun 15, 2020 at 10:38 AM Antoine Pitrou > wrote: > > > > > > Hi Maarten, > > > > You should build in debug mode, i.e. pass -DCMAKE_BUILD_TYPE=Debug > > > > Regards > &

Using gdb on a test

2020-06-15 Thread Maarten Breddels
o tab completion, because I assume this is not exported (although the symbol is visible using nm path/to/libarrow.so.100). Is there an easy (or hard?) way to get a breakpoint there, and what might be the reason I cannot put a breakpoint at TransformAsciiUpper. cheers, Maarten Breddels

[jira] [Created] (ARROW-9100) Add ascii_lower kernel

2020-06-11 Thread Maarten Breddels (Jira)
Maarten Breddels created ARROW-9100: --- Summary: Add ascii_lower kernel Key: ARROW-9100 URL: https://issues.apache.org/jira/browse/ARROW-9100 Project: Apache Arrow Issue Type: Task

Re: [C++][Discuss] Approaches for SIMD optimizations

2020-06-09 Thread Maarten Breddels
Hi Antoine, Adding xsimd to the list of options: * https://github.com/xtensor-stack/xsimd Not sure how it compares to the rest though. cheers, Maarten

[jira] [Created] (ARROW-8865) windows distribution for 0.17.1 seems broken (conda only?

2020-05-19 Thread Maarten Breddels (Jira)
Maarten Breddels created ARROW-8865: --- Summary: windows distribution for 0.17.1 seems broken (conda only? Key: ARROW-8865 URL: https://issues.apache.org/jira/browse/ARROW-8865 Project: Apache Arrow

Re: Strategy for mixing large_string and string with chunked arrays

2019-12-17 Thread Maarten Breddels
On Wed 27 Nov 2019 at 19:37, Wes McKinney wrote: > On Tue, Nov 26, 2019 at 9:40 AM Maarten Breddels > wrote: > > > > On Tue 26 Nov 2019 at 15:02, Wes McKinney wrote: > > > > > hi Maarten > > > > > > I opened https://issues.apache.org/j

Re: Non-chunked large files / hdf5 support

2019-12-17 Thread Maarten Breddels
nary type. > > > > > > Another solution is to create a `FixedBuilder` class where > > - the number of elements is known > > - the data type is of fixed width > > - nullability is known (whether you need an extra buffer). > > > > I think sooner or later we'll ne

Re: [Python] Exposing compute kernels

2019-12-17 Thread Maarten Breddels
Hi Uwe, Having it in a separate package/module/namespace makes it easier to make it an optional install in the future, should that happen. Also, it would be more tab-completion friendly. Cheers, Maarten > On 17 Dec 2019, at 10:24, Uwe L. Korn wrote: > > Hello all, > > we have developed quit

Non-chunked large files / hdf5 support

2019-11-26 Thread Maarten Breddels
In vaex I always write the data to hdf5 as 1 large chunk (per column). The reason is that it allows the mmapped columns to be exposed as a single numpy array (talking numerical data only for now), which many people are quite comfortable with. The strategy for vaex to write unchunked data, is to fi

Re: Strategy for mixing large_string and string with chunked arrays

2019-11-26 Thread Maarten Breddels
. Also in vaex, all the processing happens in chunks, and no chunk will ever be that large (for the near future...). In vaex, when exporting to hdf5, I always write in 1 chunk, and that's where most of my issues show up. cheers, Maarten > > - Wes > > [1]: > https:/

Strategy for mixing large_string and string with chunked arrays

2019-11-26 Thread Maarten Breddels
uld play better with pa.ChunkedArray. Regards, Maarten Breddels

[jira] [Created] (ARROW-3686) Support for masked arrays in to/from numpy

2018-11-01 Thread Maarten Breddels (JIRA)
Maarten Breddels created ARROW-3686: --- Summary: Support for masked arrays in to/from numpy Key: ARROW-3686 URL: https://issues.apache.org/jira/browse/ARROW-3686 Project: Apache Arrow Issue

[jira] [Created] (ARROW-3685) Better roundtrip between numpy and arrow binary array

2018-11-01 Thread Maarten Breddels (JIRA)
Maarten Breddels created ARROW-3685: --- Summary: Better roundtrip between numpy and arrow binary array Key: ARROW-3685 URL: https://issues.apache.org/jira/browse/ARROW-3685 Project: Apache Arrow

[jira] [Created] (ARROW-3669) pyarrow swallows big endian arrow without converting or error msg

2018-11-01 Thread Maarten Breddels (JIRA)
Maarten Breddels created ARROW-3669: --- Summary: pyarrow swallows big endian arrow without converting or error msg Key: ARROW-3669 URL: https://issues.apache.org/jira/browse/ARROW-3669 Project