Hi Kazu, from the description of what behavior you're seeing and the code
you've provided, it looks like you may be mixing up the two file formats
(Arrow IPC and Parquet) in your code. Your Julia code looks like it's using
the Arrow IPC file format whereas your Python code looks like it's using
the Parquet format.
> ... Parquet.jl and
> Arrow.jl and reload it. Am I correct?
>
> Like:
> # convert a parquet file into the Arrow IPC format
> tab = Parquet.readfile("blah.parquet")
> Arrow.write("blah.arrow", tab)
>
> # reload it into in-memory data
> tab2 = Arrow.read("blah.arrow")
Your error looks very similar to one already reported [1] that had to do
with using a non-AWS S3 compatible storage provider (R2 in this case),
though a solution was never provided. Are you seeing this error using AWS
S3 or another provider?
[1] https://github.com/apache/arrow/issues/33275
On Wed
Do you still get the timeout if you build a standalone program that
just uses Arrow C++? It might give us some more information if you run
your code with S3's log level turned up. So instead of using
EnsureS3Initialized, initialize S3 manually like,
S3GlobalOptions options;
options.log_level = S3L
Hi Gus, did you ever get an answer to your questions?
From a look at the source code, neither the CSV reader nor the builders
look goroutine-safe. However, your usage of the CSV reader above looks
safe to me because 'record' gets copied into each goroutine
invocation. Importantly, the builder would ne
Hi Richard,
I tried to reproduce [1] something akin to what you describe and I
also see worse-than-expected performance. I did find a GitHub Issue
[2] describing performance issues with wide record batches which might
be relevant here, though I'm not sure.
Have you tried the same kind of workflow
Hi Simon, take a look at the highlighted portion in scenario.go [1]
and see if that helps. I think it's pretty similar to what you're
wanting to do.
PS: Your email shows up as a mostly-white wall of text. To read it I
had to paste it into a plain text editor.
[1]
https://github.com/apache/arrow/
Your code is correct so I think something else is going on. Can you
give us more details about your environment, such as how you're
getting the Arrow C++ DLLs (nuget, conda, building from source) and
how you're compiling your program?
On Thu, Nov 16, 2023 at 4:27 AM wrote:
>
> Hi,
>
>
>
> I’m t
Hi Rick, as mentioned in your thread on the dev mailing list [1], in
the above code your server isn't listening using TLS and your client
isn't trying to connect over TLS. This has to do with how you're
constructing your locations for each.
In your server code, use flight.Location.for_grpc_tls instead.
I filed a minor PR [1] to improve the documentation so it's clear what
units are involved as I think the current language is vague.
[1] https://github.com/apache/arrow/pull/40251
On Sun, Feb 25, 2024 at 9:08 PM Kevin Liu wrote:
>
> Hey folks,
>
> I'm working with the PyArrow API for Tables and R
Hi Nic,
I think you can do this with just the Scanner [1]:
taxi_ds = ds.dataset("~/Datasets/nyc-taxi", partitioning =
ds.partitioning(pa.schema([("year", pa.int16())]), flavor="hive"))
expr = # some expression equivalent to your case_when above
scanner = taxi_ds.scanner(columns={'new_col': expr})
Hi Tom,
The short answer is that you want to send your table as Arrow IPC and
let the libraries do most of the work serializing and deserializing.
That said, what this looks like in a real world scenario like yours
isn't currently well-documented but that is rapidly changing. Ian Cook
has been wor
Hi Eric, could you elaborate on what you mean by this?
> as ChunkedArray is being built via the API.
Sharing some code, either here or as a link, might be helpful.
On Thu, Jul 4, 2024 at 11:12 AM Eric Jacobs wrote:
> Hi,
> I would like to build a ChunkedArray but I need to limit the maximum
> s
Thanks for driving this forward. I didn't see the links in my email client,
so I'm adding them below in case it helps others:
Issue: https://github.com/apache/arrow/issues/43495
PR: https://github.com/apache/arrow/pull/43389
On Thu, Aug 1, 2024 at 4:06 AM Ruoxi Sun wrote:
> Hello everyone,
>
> We'
Your Arrow JS code looks fine, you may be running into browser
security (i.e., CORS). How is the file being hosted? What do you get
when you take Arrow JS out of the equation and just print the fetch
response (like with response.text())?
On Mon, Aug 19, 2024 at 11:48 PM Simon Knight wrote:
>
> H
> ...strings using
> "string_view", which I don't think the javascript library supports.
> If I write out using pandas, it all seems to work correctly.
>
> Thanks
>
> On Tue, 20 Aug 2024 at 17:16, Bryce Mecum wrote:
>>
>> Your Arrow JS code looks fine, you ma
Are you able to share your code, particularly how you build your
ArrowWriterProperties?
The Arrow Schema and therefore the field-level metadata is actually
stored in the Parquet file as an opaque blob. Opaque in the sense that
it's opaque to the standard Parquet tools. You'll have to read it in
wi
[1] https://github.com/apache/arrow/issues/31018
[2]
https://github.com/apache/arrow/blob/4ede48c89b8ec80bbd1895357f272c5fb61bc9b6/cpp/examples/arrow/parquet_read_write.cc#L115-L116
On Wed, Jan 8, 2025 at 8:46 AM Andrew Bell wrote:
>
> Thanks for your response
>
> On Mon, Jan 6, 2025
ice can access headers for validation if needed and redirect vectors
> it receives to the designated flight server using another flight client.
>
> Hope it helps
>
> On Sat, Mar 15, 2025 at 3:03 AM kekronbekron
> wrote:
>
> Sure -
>
> https://www.definite.app/blog/duck
Hi kekronbekron, can you share any pointers to the pattern you mention
and where people are talking about it? It sounds like something I
might be interested in tracking.
On Thu, Mar 13, 2025 at 7:27 PM kekronbekron
wrote:
>
> I'm embarking on exactly this.
> Amusing how this pattern has become "v
i get the path for the "-import-path" and
> also where can i get the file Flight.proto?
> ____
> From: Bryce Mecum
> Sent: Tuesday, March 18, 2025 12:15 PM
> To: user@arrow.apache.org
> Subject: Re: api gateway with arrow flight grpc
>