Re: [DISCUSSION] New Flags for Arrow C Interface Schema

2024-04-24 Thread Keith Kraus
> I believe several array implementations (e.g., numpy, R) are able to broadcast/recycle a length-1 array. Run-end-encoding is also an option that would make that broadcast explicit without expanding the scalar. Some libraries behave this way, i.e. Polars, but others like Pandas and cuDF only broa
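
To make the two options in the quoted text concrete, here is a minimal pure-Python sketch of recycling a length-1 array versus keeping the scalar as a single run. The helper names (`recycle`, `run_end_encode_scalar`, `run_end_decode`) are hypothetical and are not Arrow, NumPy, R, or Polars APIs; the run-end layout is simplified to a dict of cumulative run ends and per-run values.

```python
# Hypothetical helpers for illustration only; not part of any library.

def recycle(values, length):
    """Expand a length-1 list to `length` items; pass through lists already that long."""
    if len(values) == length:
        return list(values)
    if len(values) == 1:
        return values * length
    raise ValueError(f"cannot recycle length {len(values)} to {length}")


def run_end_encode_scalar(value, length):
    """Represent `length` copies of `value` as one run, without expanding it."""
    return {"run_ends": [length], "values": [value]}


def run_end_decode(encoded):
    """Expand the simplified run-end-encoded mapping back into a plain list."""
    out = []
    start = 0
    for end, value in zip(encoded["run_ends"], encoded["values"]):
        out.extend([value] * (end - start))
        start = end
    return out


column = [10, 20, 30]
scalar = [5]

expanded = recycle(scalar, len(column))          # implicit broadcast, materialized
encoded = run_end_encode_scalar(5, len(column))  # explicit broadcast, one stored value
summed = [a + b for a, b in zip(column, expanded)]
```

The encoded form stores the value once regardless of the logical length, which is the sense in which it makes the broadcast explicit without expanding the scalar.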

Re: [VOTE] C++: switch to C++17

2022-08-24 Thread Keith Kraus
+1 (non-binding) On Wed, Aug 24, 2022 at 12:12 PM David Li wrote: > +1 (binding) > > On Wed, Aug 24, 2022, at 12:06, Ivan Ogasawara wrote: > > +1 (non-binding) > > > > On Wed, Aug 24, 2022 at 12:00 PM Sasha Krassovsky < > krassovskysa...@gmail.com> > > wrote: > > > >> ++1 (non-binding) > >> > >>

Re: cmake FindPackage fails in Windows

2022-08-19 Thread Keith Kraus
t; > > -- Configuring incomplete, errors occurred! > See also "C:/temp/build/CMakeFiles/CMakeOutput.log". > (temp) PS C:\temp\build> conda list | findstr "arrow" > arrow-cpp 8.0.0 py310h38b8b19_0 > pyarrow 8.0.0 py310h26aae1b_

Re: cmake FindPackage fails in Windows

2022-08-19 Thread Keith Kraus
hat should have changed CMake handling between 8.0.0 and 8.0.1. -Keith On Fri, Aug 19, 2022 at 1:31 PM Niranda Perera wrote: > Hi Keith, > Interestingly it was working with 8.0.1. So, I am guessing 8.0.0 Windows > artifacts have been overridden by the point release? > > On Fri,

Re: cmake FindPackage fails in Windows

2022-08-19 Thread Keith Kraus
Hey Niranda, Could you share exactly which 8.0.0 package you have installed? The output of `conda list arrow-cpp` should show it. On Thu, Aug 18, 2022 at 9:37 PM Niranda Perera wrote: > This issue is not there in v9.0.0 as well. > > On Thu, Aug 18, 2022 at 9:34 PM Niranda Perera > wrote: > > >

Re: DISCUSS: [C++] Switch to C++17

2022-08-17 Thread Keith Kraus
+1 (non-binding) From having previously run a large C++ project that migrated from C++11 to C++17, there was a huge quality-of-life improvement for developers and it made attracting new developers much easier. One potential pitfall: C++17 wasn't supported by NVIDIA compilers until CUDA Toolkit 1

Re: [Discuss][Format] Add 32-bit and 64-bit Decimals

2022-03-03 Thread Keith Kraus
Libcudf / cuDF have supported 32-bit and 64-bit decimals for a few releases now (as well as 128-bit decimals in the past couple of releases) and they've generally been received positively from the community. Being able to roundtrip them through Arrow would definitely be nice as well! On Thu, Mar 3

Re: [ANNOUNCE] New Arrow PMC chair: Kouhei Sutou

2022-01-25 Thread Keith Kraus
Congrats Kou! Thanks for all of your work! On Tue, Jan 25, 2022 at 8:04 PM Weston Pace wrote: > Congratulations Kou! > > On Tue, Jan 25, 2022 at 8:22 AM Neal Richardson > wrote: > > > > Congratulations! > > > > Neal > > > > On Tue, Jan 25, 2022 at 12:53 PM Benson Muite < > benson_mu...@emailplu

Re: Preparing for version 7.0.0 release

2022-01-13 Thread Keith Kraus
I responded on the JIRA ticket, but this is just the runtime library which is backwards compatible and doesn't force an upgrade of the actual compiler in any way. I have built and run libcudf and cudf in my environment which has gcc / g++ 9.4.0 with libstdcxx 11.2.0 without issue. On Thu, Jan 13,

Re: [ANNOUNCE] New Arrow PMC member: Joris Van den Bossche

2021-11-17 Thread Keith Kraus
Congrats Joris! On Wed, Nov 17, 2021 at 6:35 PM Weston Pace wrote: > Congratulations! I continue to be grateful for Joris' many contributions > and advice. This is great news. > > On Wed, Nov 17, 2021, 12:56 PM Wes McKinney wrote: > > > The Project Management Committee (PMC) for Apache Arrow

Re: Arrow in HPC

2021-10-26 Thread Keith Kraus
Outside of just HPC, integrating UCX would potentially allow taking advantage of its shared memory backend which would be interesting from a performance perspective in the single-node, multi-process case in many situations. Not sure it's worth the UCX dependency in the long run, but would allow us

Re: [C++] Decimal arithmetic edge cases

2021-09-30 Thread Keith Kraus
For another point of reference, here are Microsoft's docs for SQL Server on resulting precision and scale for different operators, including its overflow rules: https://docs.microsoft.com/en-us/sql/t-sql/data-types/precision-scale-and-length-transact-sql?view=sql-server-ver15 -Keith On Thu, Sep 30,
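
As a rough illustration of the kind of rule that page describes, the sketch below computes the result precision and scale for decimal addition and multiplication. The formulas are written from my recollection of the linked SQL Server documentation and should be checked against it. The function names are hypothetical, and the overflow handling (the cap on maximum precision and the resulting scale reduction) is deliberately not modeled.

```python
# Hypothetical helpers; formulas assumed from the SQL Server docs linked above.
# Inputs are (precision, scale) pairs for the two operands.

def add_result_type(p1, s1, p2, s2):
    """Result (precision, scale) for e1 + e2 or e1 - e2, before any overflow rule."""
    scale = max(s1, s2)
    # Widest integer part, plus the result scale, plus one digit for a carry.
    precision = max(p1 - s1, p2 - s2) + scale + 1
    return precision, scale


def multiply_result_type(p1, s1, p2, s2):
    """Result (precision, scale) for e1 * e2, before any overflow rule."""
    return p1 + p2 + 1, s1 + s2


added = add_result_type(10, 2, 5, 3)
multiplied = multiply_result_type(10, 2, 5, 3)
```

For decimal(10, 2) and decimal(5, 3) operands, the sum needs 8 integer digits, 3 fractional digits, and one carry digit.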

Re: [Python] manylinux2014 and _GLIBCXX_USE_CXX11_ABI setting

2021-09-10 Thread Keith Kraus
e new ABI. -Keith On Fri, Sep 10, 2021 at 11:13 AM Antoine Pitrou wrote: > > Le 10/09/2021 à 17:05, Keith Kraus a écrit : > > For what it's worth, setting it to 1 as opposed to 0 will make the > package > > incompatible with CentOS / RHEL 7 as the glibc they ship d

Re: [Python] manylinux2014 and _GLIBCXX_USE_CXX11_ABI setting

2021-09-10 Thread Keith Kraus
For what it's worth, setting it to 1 as opposed to 0 will make the package incompatible with CentOS / RHEL 7 as the glibc they ship does not support the new ABI. -Keith On Fri, Sep 10, 2021, 4:53 AM Philipp Moritz wrote: > Ah ok, that makes sense! I'm also not even sure if > _GLIBCXX_USE_CXX11_

Re: [ANNOUNCE] New Arrow committer: Nic Crane

2021-09-09 Thread Keith Kraus
Congrats Nic! On Thu, Sep 9, 2021 at 11:47 AM Neal Richardson wrote: > On behalf of the Apache Arrow PMC, I'm happy to announce that Nic Crane > has accepted an invitation to become a committer on Apache Arrow. > > Welcome and thank you for your contributions! > > Neal >

Re: Arrow in HPC

2021-09-09 Thread Keith Kraus
There's nothing stopping us from transmitting HTTP/2 or another binary protocol over UCX. You can think of UCX as a transport layer abstraction library which allows transparently taking advantage of things like RDMA over InfiniBand / RoCE, inter-process shared memory, TCP sockets, etc. The other t

Re: [DISCUSS][Python] Public Cython API

2021-08-25 Thread Keith Kraus
If I remember correctly the reason cuDF interacts with the Cython code for IPC stuff is that in the past the existing IPC machinery in Arrow didn't work correctly with GPU memory. If that is fixed I think there's a case to remove this code entirely from cuDF and instruct users to use the higher lev

Re: [VOTE][Format] Clarify allowed value range for the Time types

2021-08-20 Thread Keith Kraus
+1 (non-binding) On Fri, Aug 20, 2021 at 9:49 AM Rok Mihevc wrote: > +1 (non-binding) > > On Fri, Aug 20, 2021 at 3:46 PM Jorge Cardoso Leitão > wrote: > > > > +1 > > > > On Fri, Aug 20, 2021 at 2:43 PM David Li wrote: > > > > > +1 > > > > > > On Thu, Aug 19, 2021, at 18:33, Weston Pace wrote:

Re: [VOTE][Format] Add in a new interval type can combines Month, Days and Nanoseconds

2021-08-17 Thread Keith Kraus
+1 (non-binding) On Tue, Aug 17, 2021 at 7:34 PM Jorge Cardoso Leitão < jorgecarlei...@gmail.com> wrote: > +1 > > On Tue, Aug 17, 2021 at 8:50 PM Micah Kornfield > wrote: > > > Hello, > > As discussed previously [1], I'd like to call a vote to add a new > interval > > type which is a triple of M

Re: [DISCUSS] Splitting out the Arrow format directory

2021-08-13 Thread Keith Kraus
> Personally, I do not care about the speed of IR processing right now. > Any non-trivial (and probably trivial too) computation done > by an IR consumer will dwarf the cost of IR processing. Of course, > we shouldn't prematurely pessimize either, but there's no reason > to spend time worrying abou

Re: [Question] what is the purpose of the typeids in the UnionArray?

2021-08-13 Thread Keith Kraus
How would using the typeid directly work with arbitrary Extension types? -Keith On Fri, Aug 13, 2021 at 12:49 PM Jorge Cardoso Leitão < jorgecarlei...@gmail.com> wrote: > Hi, > > In the UnionArray, there is a level of indirection between types (buffer of > i8s) -> typeId (i8) -> field. For examp
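
The indirection described in the quoted message (types buffer -> type id -> field) can be sketched in plain Python as follows. This is a simplified sparse-union model with made-up data; `type_codes`, `children`, `types_buffer`, and `union_get` are illustrative names, not Arrow APIs.

```python
# Illustrative sparse-union model; all names and values are hypothetical.

type_codes = [5, 7]                  # type ids declared on the union type, one per field
children = [
    [1, None, 3],                    # field 0: integers
    [None, "b", None],               # field 1: strings
]
types_buffer = [5, 7, 5]             # one type id per logical slot (the i8 buffer)

# The extra level of indirection: type id -> position of the child field.
code_to_child = {code: index for index, code in enumerate(type_codes)}


def union_get(i):
    """Resolve slot i through types buffer -> type id -> child field."""
    child_index = code_to_child[types_buffer[i]]
    # In a sparse union, every child has the same length as the union itself.
    return children[child_index][i]


values = [union_get(i) for i in range(len(types_buffer))]
```

Because the type ids are arbitrary small integers rather than child positions, they identify a field of the union rather than a data type as such.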

Re: [DISCUSS] Dropping support for Visual Studio 2015

2021-08-09 Thread Keith Kraus
+1 as well. Are there any build platforms that we're currently supporting that still use vs2015? Conda-forge did its migration ~1.5 years ago: https://github.com/conda-forge/conda-forge-pinning-feedstock/pull/501. -Keith On Mon, Aug 9, 2021 at 12:01 PM Antoine Pitrou wrote: > > +1 for requiring

JIRA Access

2021-07-29 Thread Keith Kraus
Hello, Could someone give me access to assign myself to JIRA issues? Would like to assign myself to https://issues.apache.org/jira/browse/ARROW-13500. Thanks! Keith

[jira] [Created] (ARROW-6043) Array equals returns incorrectly if NaNs are in arrays

2019-07-25 Thread Keith Kraus (JIRA)
Keith Kraus created ARROW-6043: -- Summary: Array equals returns incorrectly if NaNs are in arrays Key: ARROW-6043 URL: https://issues.apache.org/jira/browse/ARROW-6043 Project: Apache Arrow
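
The snippet above does not say which result was considered incorrect, so the following is only a sketch of why NaNs make array equality ambiguous: under IEEE 754 a NaN never compares equal to itself, so an equality routine has to pick a policy. Both helper functions are hypothetical and operate on plain Python lists, not Arrow arrays.

```python
import math

# Hypothetical helpers illustrating two possible equality policies.

def equals_ieee(left, right):
    """Element-wise equality using IEEE semantics (NaN != NaN)."""
    return len(left) == len(right) and all(a == b for a, b in zip(left, right))


def equals_nans_equal(left, right):
    """Element-wise equality that treats two NaNs in the same slot as equal."""
    def same(a, b):
        if isinstance(a, float) and isinstance(b, float):
            if math.isnan(a) and math.isnan(b):
                return True
        return a == b
    return len(left) == len(right) and all(same(a, b) for a, b in zip(left, right))


a = [1.0, math.nan, 3.0]
b = [1.0, math.nan, 3.0]

strict_result = equals_ieee(a, b)         # NaN slots never match
lenient_result = equals_nans_equal(a, b)  # NaN slots match each other
```

Note that comparing the lists directly with `a == b` is not a reliable stand-in for the strict policy, since Python's container comparison short-circuits on object identity.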

Re: Error building cuDF on new Arrow with std::variant backport

2019-07-23 Thread Keith Kraus
Just following up, in case anyone was following along: this turned out to be an NVCC bug that we've reported to the relevant team internally. We moved the `ipc.cu` file to `ipc.cpp` and it works as expected with gcc. Thanks everyone! -Keith On 7/22/19, 12:52 PM, "Keith Kraus" wr

Re: Error building cuDF on new Arrow with std::variant backport

2019-07-22 Thread Keith Kraus
nding construct? Regards Antoine. Le 22/07/2019 à 18:46, Keith Kraus a écrit : > I temporarily removed the csr related code that has the namespace clash and confirmed that the same compilation warnings and errors still occur. > > On 7/20/19, 1:03 AM,

Re: Error building cuDF on new Arrow with std::variant backport

2019-07-22 Thread Keith Kraus
I temporarily removed the csr related code that has the namespace clash and confirmed that the same compilation warnings and errors still occur. On 7/20/19, 1:03 AM, "Micah Kornfield" wrote: The namespace collision is a definite possibility, especially if you are using g++ which seems

Re: New CI system: Ursabot

2019-06-21 Thread Keith Kraus
There's nvidia-docker (https://github.com/NVIDIA/nvidia-docker) which handles passing through the GPU devices and necessary driver modules into a docker container. CUDA doesn't get mapped in as it's userspace so you'll need to either use an image with CUDA baked in (i.e. https://hub.docker.com/

[jira] [Created] (ARROW-5008) ORC Reader Core Dumps in PyArrow if `/etc/localtime` does not exist

2019-03-25 Thread Keith Kraus (JIRA)
Keith Kraus created ARROW-5008: -- Summary: ORC Reader Core Dumps in PyArrow if `/etc/localtime` does not exist Key: ARROW-5008 URL: https://issues.apache.org/jira/browse/ARROW-5008 Project: Apache Arrow

[jira] [Created] (ARROW-4766) Casting empty boolean array causes segfault

2019-03-04 Thread Keith Kraus (JIRA)
Keith Kraus created ARROW-4766: -- Summary: Casting empty boolean array causes segfault Key: ARROW-4766 URL: https://issues.apache.org/jira/browse/ARROW-4766 Project: Apache Arrow Issue Type: Bug

[jira] [Created] (ARROW-4324) [Python] Array dtype inference incorrect when created from list of mixed numpy scalars

2019-01-22 Thread Keith Kraus (JIRA)
Keith Kraus created ARROW-4324: -- Summary: [Python] Array dtype inference incorrect when created from list of mixed numpy scalars Key: ARROW-4324 URL: https://issues.apache.org/jira/browse/ARROW-4324

[jira] [Created] (ARROW-3374) [Python] Dictionary has out-of-bound index when creating DictionaryArray from Pandas with NaN

2018-09-30 Thread Keith Kraus (JIRA)
Keith Kraus created ARROW-3374: -- Summary: [Python] Dictionary has out-of-bound index when creating DictionaryArray from Pandas with NaN Key: ARROW-3374 URL: https://issues.apache.org/jira/browse/ARROW-3374