Re: [DataFusion] Question about async/await?

2021-09-13 Thread Evan Chan
The other suggestion would be to have a way to monitor and watch for when the CPU-bound thread pool saturates, which can result in queues backing up into the main dispatch async threads as well. I.e., there might be some spillover to watch out for if the CPU thread pool fills up. -Evan > On Sep

Re: [Rust] Eliminate Timezone field from Timestamp types?

2021-07-07 Thread Evan Chan
with how it handles it. imo that is an >>> artifact of being currently difficult (API wise) to create an array with a >>> timezone, which has caused people to not use it much (and thus not >>> implement kernels with it / test it properly). >>> >>> I do not see

[Rust] Eliminate Timezone field from Timestamp types?

2021-07-07 Thread Evan Chan
Hi folks, Some of us are having a discussion about a direction change for Rust Arrow timestamp types, which currently support both a resolution field (Ns, Micros, Ms, Seconds) similar to the other language implementations, but also optionally a timezone string field. I believe the timezone fiel

Re: [C++] Adopting a library for (distributed) tracing

2021-04-30 Thread Evan Chan
Dear David, OpenTelemetry tracing is definitely the future; I guess the question is how far down the stack we want to put it. I think it would be useful for Flight and other higher-level modules, and for DataFusion, for example, it would be really useful. As for being alpha, I don’t think it

Re: [Rust] DataFusion & Ballista User Guide

2021-04-30 Thread Evan Chan
That sounds awesome! Looking forward to contributing to this. > On Apr 21, 2021, at 6:46 AM, Andy Grove wrote: > > We just merged a PR that adds a minimal structure for a user guide in > mdbook format, based on the existing Ballista user guide. > > https://github.com/apache/arrow-datafusion/tr

Re: [RUST] parquet2 experiment

2021-04-17 Thread Evan Chan
This sounds like really awesome work! If it is in its own repo, would that mean the current implementation in Arrow would just be left there? Good parquet support seems really important to have. Evan > On Apr 17, 2021, at 3:14 AM, Andrew Lamb wrote: > > It sounds like exciting work Jorge --

Re: [Rust][Datafusion] Timestamp Millisecond support

2021-04-17 Thread Evan Chan
/github.com/apache/arrow/pull/10005#discussion_r612551640> > > > > On Thu, Apr 15, 2021 at 4:41 PM Evan Chan <mailto:e...@urbanlogiq.com>> wrote: > >> Hi folks, >> >> So currently Arrow Rust/DataFusion supports four types of Timestamp >> arrays,

[Rust][Datafusion] Timestamp Millisecond support

2021-04-15 Thread Evan Chan
Hi folks, So currently Arrow Rust/DataFusion supports four types of Timestamp arrays, with Nano, Micro, Millisecond and Second resolution. However, the best supported by far are Nanos. For example, in DataFusion, the following only works for Nanos and not the other resolutions: * CAST(x as TI

[jira] [Created] (ARROW-8202) [Rust] SIGSEGV when using StringBuilder with jemalloc

2020-03-24 Thread Evan Chan (Jira)
Evan Chan created ARROW-8202: Summary: [Rust] SIGSEGV when using StringBuilder with jemalloc Key: ARROW-8202 URL: https://issues.apache.org/jira/browse/ARROW-8202 Project: Apache Arrow Issue

Re: Summary of RLE and other compression efforts?

2020-03-24 Thread Evan Chan
Hi Micah, Hope everyone is staying safe! > On Mar 16, 2020, at 9:41 PM, Micah Kornfield wrote: > > I feel a little uncomfortable in the fact that there isn't a more clearly > defined dividing line for what belongs in Arrow and what doesn't. I suppose > this is what discussions like these are

Re: Summary of RLE and other compression efforts?

2020-03-14 Thread Evan Chan
/compression.md#predictive-nibblepacking> Happy to give more details, perhaps in a separate channel if needed. -Evan > > Thanks, > -Micah > > [1] https://github.com/apache/arrow/pull/4815/files > <https://github.com/apache/arrow/pull/4815/files> > > > >

Re: Summary of RLE and other compression efforts?

2020-03-11 Thread Evan Chan
any sort of encoding/compression must be supportable > across multiple languages/platforms. > > Thanks, > Micah > > On Tue, Mar 10, 2020 at 3:12 PM Wes McKinney wrote: > >> On Tue, Mar 10, 2020 at 5:01 PM Evan Chan >> wrote: >>> >>> Ma

Re: Summary of RLE and other compression efforts?

2020-03-10 Thread Evan Chan
igation turned out to be quite the same >> solution as the one in https://github.com/powturbo/Turbo-Transpose. >> >> >> Maybe the points I sent can be helpful. >> >> >> Kind regards, >> >> Martin >> >> _

Re: Summary of RLE and other compression efforts?

2020-03-10 Thread Evan Chan
> > Note that at some point my investigation turned out to be quite the same > solution as the one in https://github.com/powturbo/Turbo-Transpose. > > > Maybe the points I sent can be helpful. > > > Kind regards, > > Martin > > __

Summary of RLE and other compression efforts?

2020-03-09 Thread Evan Chan
Hi folks, I’m curious about the state of efforts for more compressed encodings in the Arrow columnar format. I saw discussions previously about RLE, but is there a place to summarize all of the different efforts that are ongoing to bring more compressed encodings? Is there an effort to compre

[Rust] Dictionary encoding for strings?

2020-03-09 Thread Evan Chan
Hi, Does the Rust implementation support dictionary encoded strings? It is not in the documentation anywhere, but there seem to be some variable-sized dictionary structs in the code base. If not, is there a plan to support it? Does DataFusion support reading from dictionary strings? It seems a