[C++]Handling client disconnects on DoExchange (memory leak?)

2021-06-25 Thread Radu Teodorescu
Hi, I am seeing a memory leak server side caused by calls to DoExchange: The simplest repro is having a Flight server that implements DoExchange like this DoExchange(…) { while (true) { … reader->Next(&chunk); if (chunk.app_metadata == null

Re: Pyarrow RecordBatchStreamWriter and dictionaries

2021-04-22 Thread Radu Teodorescu
Hi, I am seeing a similar problem when serializing tables with lists of dictionary encoded elements: each resulting chunk is pointing to the first chunk’s original dictionary. Is this a known issue/limitation? I can follow with a repro otherwise. Thank you Radu > On Sep 28, 2020, at 1:26 PM, Wes

Re: Columns/Field index semantic for parquet FileReader

2020-11-16 Thread Radu Teodorescu
. > On Nov 16, 2020, at 7:18 PM, Radu Teodorescu > wrote: > > Hi, > (my apologies if this has already been discussed) > I just took a stab at the struct support in parquet FileReader and I am a bit > confused by the column index semantic when trying to read a subset of col

Columns/Field index semantic for parquet FileReader

2020-11-16 Thread Radu Teodorescu
Hi, (my apologies if this has already been discussed) I just took a stab at the struct support in parquet FileReader and I am a bit confused by the column index semantic when trying to read a subset of columns from a subset of row groups: Say I have a single column arrow table top: struct {

Re: mutual TLS peer_identity in arrow flight

2020-10-26 Thread Radu Teodorescu
rs there might think >> about how/if mTLS fits their design. >> >> https://lists.apache.org/thread.html/r485888f4f818e8e4722dc6c53491fb4c68ee7ac16d1c769612e61d21%40%3Cdev.arrow.apache.org%3E >> >> Best, >> David >> >> On 10/26/20, Radu Teodorescu wro

mutual TLS peer_identity in arrow flight

2020-10-26 Thread Radu Teodorescu
Hi, I have a follow-up question/feature proposal in the context of mutual TLS (introduced by https://issues.apache.org/jira/browse/ARROW-8742 ): In the context of mutual TLS the client is authenticated at TLS level and the client identity is avai

Re: MacOS CI issue

2020-09-12 Thread Radu Teodorescu
'm not sure if there is a JIRA open yet about fixing it, but if not > we should open one. It seems related to changes in Homebrew > > On Sat, Sep 12, 2020 at 3:42 PM Radu Teodorescu > wrote: >> >> the other task that is failing is also failing on a grpc d

Re: MacOS CI issue

2020-09-12 Thread Radu Teodorescu
tack (most recent call first): 571 C:/Miniconda37-x64/envs/arrow/Library/lib/cmake/grpc/gRPCConfig.cmake:21 (find_package) 572 cmake_modules/ThirdpartyToolchain.cmake:2472 (find_package) 573 CMakeLists.txt:495 (include) > On Sep 12, 2020, at 4:30 PM, Radu Teodorescu > wrote: > > Hi

MacOS CI issue

2020-09-12 Thread Radu Teodorescu
Hi, I am struggling to debug a task that is failing in CI: https://github.com/apache/arrow/pull/8130/checks?check_run_id=1106595772 Looks like a CMake failure and that is strange given my PR only touches existing c++ fil

Re: Multifile parquet support

2020-09-07 Thread Radu Teodorescu
Hi Radu, >> It might be easier to get feedback on some concrete code. Perhaps make a PR >> with a proof of concept and we can discuss there? >> >> Neal >> >> On Fri, Sep 4, 2020 at 4:27 AM Radu Teodorescu >> wrote: >> >>> Micah and

Re: Arrow as a streaming format

2020-09-04 Thread Radu Teodorescu
Hi Pedro, You should be able to use flight for this: pack your subscription call in a DoGet and listen on the FlightDataStream for new data. I think you can control the granularity of your messages through the size of the record batches you are writing, but I am not a flight developer so don’t t

Re: Multifile parquet support

2020-09-04 Thread Radu Teodorescu
ow.apache.org%3E > > On Thu, Sep 3, 2020 at 3:20 PM Radu Teodorescu > wrote: > >> Hello, >> What is the current thinking around allowing the logical content of a >> parquet file to be split across multiple files? >> I see that in theory there is support for readi

Multifile parquet support

2020-09-03 Thread Radu Teodorescu
Hello, What is the current thinking around allowing the logical content of a parquet file to be split across multiple files? I see that in theory there is support for reading files where different row groups are in separate files but I cannot see any features that allow that for writing. On a s

conversion between pyspark.DataFrame and pyarrow.Table

2020-08-26 Thread Radu Teodorescu
Hi, I noticed that arrow is mentioned as an optional intermediary format for converting between pandas DFs and spark DFs. Is there a way to explicitly convert a pyarrow Table to a spark DataFrame and the other way around? Absent that, going pyspark->pandas->pyarrow and back works but it’s obviou

Best ways to implement push notifications over Flight?

2020-08-19 Thread Radu Teodorescu
Hi, I am looking at the best way to push notifications from a server to clients over flight and I have a few questions on the approach: A. Is there a standard way of doing it and/or does this fundamentally go against flight philosophy? B. One approach is to run a doGet and then have the server

Re: Building an executable with arrow flight (C++)

2020-08-17 Thread Radu Teodorescu
de that when > linking. > > On Mon, Aug 17, 2020 at 12:47 PM Radu Teodorescu > wrote: >> >> ok - here is a simple illustration of my challenges building with arrow >> flight: https://github.com/raduteo/hello_flight >> <https://github.com/raduteo/hello_flight>

Re: Building an executable with arrow flight (C++)

2020-08-17 Thread Radu Teodorescu
740 > > On Thu, Aug 13, 2020 at 9:44 PM Radu Teodorescu > wrote: > >> Hi Wes, >> >> I will certainly give that a shot and provide feedback - my typical setup >> with arrow has so far used ExternalProject and I tend to prefer this for >> development vs

Re: Building an executable with arrow flight (C++)

2020-08-13 Thread Radu Teodorescu
's a bug and you should open a JIRA issue. We just > worked a bunch on this for 1.0.0 and after so it's important that this > work consistently. > >> On Thu, Aug 13, 2020 at 4:20 PM Radu Teodorescu >> wrote: >> >> I can produce something isolated shortly -

Re: Building an executable with arrow flight (C++)

2020-08-13 Thread Radu Teodorescu
nd C++ file set to > reproduce your case? > > > Thanks, > -- > kou > > In > "Building an executable with arrow flight (C++)" on Thu, 13 Aug 2020 > 12:06:49 -0400, > Radu Teodorescu wrote: > >> Hello, >> I am trying to build a serve

Building an executable with arrow flight (C++)

2020-08-13 Thread Radu Teodorescu
Hello, I am trying to build a server that uses arrow flight and getting into a bit of a rabbit hole with dependency inclusion. I have arrow included as an external project and so far everything has worked really smoothly (I have executables building with arrow, parquet arrow and I also have arr

Re: Proposal for arrow DataFrame low level structure and primitives (Was: Two proposals for expanding arrow Table API (virtual arrays and random access))

2020-08-05 Thread Radu Teodorescu
proposals. It is a reference implementation, but certainly not something that can be dropped in directly in its current form (for example, I am leaning quite heavily on c++14/17 and a bit of 20), but if the vision makes sense I would love to bring that into arrow. > On Wed, Aug 5, 2020 at 7:

Re: Proposal for arrow DataFrame low level structure and primitives (Was: Two proposals for expanding arrow Table API (virtual arrays and random access))

2020-08-05 Thread Radu Teodorescu
it. Thank you Radu > On Jun 25, 2020, at 3:10 PM, Radu Teodorescu > wrote: > > Understood and agreed > My proposal really addresses a number of mechanisms on layer 2 ( "Virtual" > tables) in your taxonomy (I can adjust interface names accordingly as part of >

Re: Writing null structs to parquet

2020-07-30 Thread Radu Teodorescu
timeline for that? (I can work around it for now, but it would be nice to have at some point) … maybe that can be my first contribution given enough time :). > On Jul 30, 2020, at 9:26 AM, Radu Teodorescu > wrote: > > > Thank you Micah! > I spent a bit of time trying to ge

Re: Writing null structs to parquet

2020-07-30 Thread Radu Teodorescu
a bug in JIRA? >>> >>> I'm looking into it to see if I can figure out what is going on. >>> >>> Thanks, >>> Micah >>> >>> On Wed, Jul 29, 2020 at 1:07 PM Radu Teodorescu >>> wrote: >>> >>>> Is the current

Writing null structs to parquet

2020-07-29 Thread Radu Teodorescu
Is the current version supposed to allow struct columns with null values to be written to parquet? I narrowed it down to a table with one column and two rows, and the resulting parquet file is broken according to both parquet-tools and our own reader (it looks like a buffer is no

Re: Deep copy for ArrayData,Array, Table in C++ API

2020-06-29 Thread Radu Teodorescu
>> On Fri, 26 Jun 2020 13:56:26 -0400 >> Radu Teodorescu wrote: >>> Looks like Concatenate is my best bet if I am looking at putting together >>> ranges, certainly doesn’t look as neatly packaged as Take, but this might >>> be the right tool for this job. >

Re: Deep copy for ArrayData,Array, Table in C++ API

2020-06-26 Thread Radu Teodorescu
Looks like Concatenate is my best bet if I am looking at putting together ranges, certainly doesn’t look as neatly packaged as Take, but this might be the right tool for this job. > On Jun 26, 2020, at 1:01 PM, Radu Teodorescu > wrote: > > That is fabulous and pretty much it!

Re: Deep copy for ArrayData,Array, Table in C++ API

2020-06-26 Thread Radu Teodorescu
work you guys have been putting into this project) Radu > On Jun 26, 2020, at 12:39 PM, Micah Kornfield wrote: > > This sounds like the Take kernel? > > On Friday, June 26, 2020, Radu Teodorescu > wrote: > >> (Lightweight topic this time) >> Are there any ex

Deep copy for ArrayData,Array, Table in C++ API

2020-06-26 Thread Radu Teodorescu
(Lightweight topic this time) Are there any existing functions for deep copying Array, ArrayData or Table objects in the C++ API? Ultimately, I am trying to get a bunch of sparse row ranges from a ranges into a contiguous new Table - I can see how I can copy Buffer and I can implement it all myse

Re: Proposal for arrow DataFrame low level structure and primitives (Was: Two proposals for expanding arrow Table API (virtual arrays and random access))

2020-06-25 Thread Radu Teodorescu
eparation of concerns > between these three layers . I'll dig in in more detail sometime after > July 4. > > Thanks > Wes > > > > > On Thu, Jun 25, 2020 at 11:50 AM Radu Teodorescu > wrote: >> >> Here it is as a pull request: >> https://gith

Proposal for arrow DataFrame low level structure and primitives (Was: Two proposals for expanding arrow Table API (virtual arrays and random access))

2020-06-25 Thread Radu Teodorescu
e code. > On Jun 17, 2020, at 6:11 PM, Neal Richardson > wrote: > > Maybe a draft pull request? If you put "WIP" in the pull request title, CI > won't run builds on it, so it's suitable for rough outlines and collecting > feedback. > > Neal > >

Re: Two proposals for expanding arrow Table API (virtual arrays and random access)

2020-06-17 Thread Radu Teodorescu
- Wes > > [1]: https://issues.apache.org/jira/browse/ARROW-1329 > [2]: > https://docs.google.com/document/d/1XHe_j87n2VHGzEbnLe786GHbbcbrzbjgG8D0IXWAeHg/edit#heading=h.g70gstc7jq4h > > On Wed, Jun 17, 2020 at 2:48 PM Radu Teodorescu > wrote: >> >> Hi folks,

Two proposals for expanding arrow Table API (virtual arrays and random access)

2020-06-17 Thread Radu Teodorescu
Hi folks, While I’ve been communicating with some members of this group in the past, this is my first official post so please excuse/correct/guide me as needed. Logistics first: I put most of the content of my proposals in a Google doc, but if more appropriate, we can keep the conversation going b