Re: [Julia][Python] trouble with loading parquet files from interfaces in different languages

2023-03-21 Thread Bryce Mecum
Hi Kazu, from your description of the behavior you're seeing and the code you've provided, it looks like you may be mixing up the two file formats (Arrow IPC and Parquet) in your code. Your Julia code looks like it's using the Arrow IPC file format, whereas your Python code looks like it's using the

Re: [Julia][Python] trouble with loading parquet files from interfaces in different languages

2023-03-22 Thread Bryce Mecum
> …Parquet.jl and
> Arrow.jl and reload it. Am I correct?
>
> Like:
> # convert a parquet file into the Arrow IPC format
> tab = Parquet.readfile("blah.parquet")
> Arrow.write("blah.arrow", tab)
>
> # reload it into in-memory data
> tab2 = Arrow.read("blah.arrow")

Re: How to troubleshoot curlCode 18 errors

2023-03-22 Thread Bryce Mecum
Your error looks very similar to one already reported [1] that had to do with using a non-AWS, S3-compatible storage provider (R2 in this case), though a solution was never provided. Are you seeing this error with AWS S3 or with another provider? [1] https://github.com/apache/arrow/issues/33275 On Wed

Re: [C++] S3 timeout on linux

2023-06-12 Thread Bryce Mecum
Do you still get the timeout if you build a standalone program that uses only Arrow C++? It might also give us more information if you run your code with S3's log level turned up. So instead of using EnsureS3Initialized, initialize S3 manually, like: S3GlobalOptions options; options.log_level = S3L

Re: [Go] Are builders and CSV readers goroutine safe

2023-07-06 Thread Bryce Mecum
Hi Gus, did you ever get an answer to your questions? From a look at the source code, neither the CSV reader nor the builders look goroutine-safe. However, your usage of the CSV reader above looks safe to me because 'record' gets copied into each goroutine invocation. Importantly, the builder would ne

Re: Optimizing read performance - wide data frames

2023-09-28 Thread Bryce Mecum
Hi Richard, I tried to reproduce [1] something akin to what you describe and I also see worse-than-expected performance. I did find a GitHub Issue [2] describing performance issues with wide record batches which might be relevant here, though I'm not sure. Have you tried the same kind of workflow

Re: Format for Arrow Flight descriptor and schema for DoPut from Go

2023-10-13 Thread Bryce Mecum
Hi Simon, take a look at the highlighted portion in scenario.go [1] and see if that helps. I think it's pretty similar to what you're wanting to do. PS: Your email shows up as a mostly-white wall of text. To read it I had to paste it into a plain text editor. [1] https://github.com/apache/arrow/

Re: [C++][Parquet] Unable to read memory??

2023-11-16 Thread Bryce Mecum
Your code is correct, so I think something else is going on. Can you give us more details about your environment, such as how you're getting the Arrow C++ DLLs (NuGet, conda, building from source) and how you're compiling your program?

On Thu, Nov 16, 2023 at 4:27 AM wrote:
> Hi,
>
> I’m t

Re: [Python] FlightServerBase how to use/test TLS

2024-01-04 Thread Bryce Mecum
Hi Rick, as mentioned in your thread on the dev mailing list [1], in the above code your server isn't listening using TLS and your client isn't trying to connect over TLS. This has to do with how you're constructing your locations for each. In your server code, use flight.Location.for_grpc_tls ins

Re: Chunk Table into RecordBatches of at most 512MB each

2024-02-26 Thread Bryce Mecum
I filed a minor PR [1] to improve the documentation so it's clear what units are involved, as I think the current language is vague.

[1] https://github.com/apache/arrow/pull/40251

On Sun, Feb 25, 2024 at 9:08 PM Kevin Liu wrote:
> Hey folks,
>
> I'm working with the PyArrow API for Tables and R

Re: [Python] Read dataset -> project -> write dataset, without intermediate table?

2024-03-11 Thread Bryce Mecum
Hi Nic, I think you can do this with just the Scanner [1]:

    import pyarrow as pa
    import pyarrow.dataset as ds

    taxi_ds = ds.dataset(
        "~/Datasets/nyc-taxi",
        partitioning=ds.partitioning(pa.schema([("year", pa.int16())]), flavor="hive"),
    )
    expr = ...  # placeholder: some expression equivalent to your case_when above
    scanner = taxi_ds.scanner(columns={'new_col': expr})

Re: [Go][Javascript] How to transfer data between frontend and backend?

2024-03-28 Thread Bryce Mecum
Hi Tom, the short answer is that you want to send your table as Arrow IPC and let the libraries do most of the work of serializing and deserializing it. That said, what this looks like in a real-world scenario like yours isn't well documented yet, but that is rapidly changing. Ian Cook has been wor

Re: [C++] Building a ChunkedArray with allocation size control

2024-07-04 Thread Bryce Mecum
Hi Eric, could you elaborate on what you mean by this?

> as ChunkedArray is being built via the API.

Sharing some code, either here or as a link, might be helpful.

On Thu, Jul 4, 2024 at 11:12 AM Eric Jacobs wrote:
> Hi,
> I would like to build a ChunkedArray but I need to limit the maximum
> s

Re: [DISCUSS][Acero] Upgrading to 64-bit row offsets in row table

2024-08-01 Thread Bryce Mecum
Thanks for driving this forward. I didn't see the links in my email client, so I'm adding them below in case it helps others:

Issue: https://github.com/apache/arrow/issues/43495
PR: https://github.com/apache/arrow/pull/43389

On Thu, Aug 1, 2024 at 4:06 AM Ruoxi Sun wrote:
> Hello everyone,
>
> We'

Re: Issue loading IPC data in javascript

2024-08-20 Thread Bryce Mecum
Your Arrow JS code looks fine; you may be running into browser security restrictions (i.e., CORS). How is the file being hosted? What do you get when you take Arrow JS out of the equation and just print the fetch response (e.g., with response.text())?

On Mon, Aug 19, 2024 at 11:48 PM Simon Knight wrote:
> H

Re: Issue loading IPC data in javascript

2024-08-21 Thread Bryce Mecum
> …rings using
> "string_view", which I don't think the javascript library supports.
> If I write out using pandas, it all seems to work correctly.
>
> Thanks
>
> On Tue, 20 Aug 2024 at 17:16, Bryce Mecum wrote:
>>
>> Your Arrow JS code looks fine, you ma

Re: Writing Field Metadata

2025-01-06 Thread Bryce Mecum
Are you able to share your code, particularly how you build your ArrowWriterProperties? The Arrow schema, and therefore the field-level metadata, is actually stored in the Parquet file as an opaque blob: opaque in the sense that standard Parquet tools don't interpret it. You'll have to read it in wi

Re: Writing Field Metadata

2025-01-08 Thread Bryce Mecum
[1] https://github.com/apache/arrow/issues/31018
[2] https://github.com/apache/arrow/blob/4ede48c89b8ec80bbd1895357f272c5fb61bc9b6/cpp/examples/arrow/parquet_read_write.cc#L115-L116

On Wed, Jan 8, 2025 at 8:46 AM Andrew Bell wrote:
> Thanks for your response
>
> On Mon, Jan 6, 2025

Re: api gateway with arrow flight grpc

2025-03-18 Thread Bryce Mecum
> …ice can access headers for validation if needed and redirect vectors
> it receives to the designated flight server using another flight client.
>
> Hope it helps
>
> On Sat, Mar 15, 2025 at 3:03 AM kekronbekron
> wrote:
>
> Sure -
>
> https://www.definite.app/blog/duck

Re: api gateway with arrow flight grpc

2025-03-14 Thread Bryce Mecum
Hi kekronbekron, can you share any pointers to the pattern you mention and where people are talking about it? It sounds like something I might be interested in tracking.

On Thu, Mar 13, 2025 at 7:27 PM kekronbekron wrote:
> I'm embarking on exactly this.
> Amusing how this pattern has become "v

Re: api gateway with arrow flight grpc

2025-03-25 Thread Bryce Mecum
> …i get the path for the "-import-path" and
> also where can i get the file Flight.proto?
> ____
> From: Bryce Mecum
> Sent: Tuesday, March 18, 2025 12:15 PM
> To: user@arrow.apache.org
> Subject: Re: api gateway with arrow flight grpc