I don't think you are missing anything. The parquet encoding is baked
into the data on the disk so re-encoding at some stage is inevitable.
Re-encoding in python like you are doing is going to be inefficient.
I think you will want to do the re-encoding in C++. Unfortunately, I
don't think we have
> I don't think replacing Scalar compute paths with dedicated paths for
> RLE-encoded data would ever be a simplification. Also, when a kernel
> hasn't been upgraded with a native path for RLE data, former Scalar
> Datums would now be expanded to the full RLE-decoded version before
> running the ke
We've had some evidence for a while now that the kernel functions
suffer from an overhead problem that prevents us from effectively
utilizing cache. The latest and greatest evidence of this might be
[1]. A number of people have made some very interesting suggestions
that I think could really cut
h, I can try to have a first draft
> PR ready to go maybe by Monday (I was going to work on this over the
> weekend when I can have some uninterrupted time to do the
> refactoring). I'm not sure that a new registry is going to be needed
>
> On Thu, Jun 2, 2022 at 2:50 AM Anto
Efficiently reading from a data source is something that has a bit of
complexity (parsing files, connecting to remote data sources, managing
parallel reads, etc.) Ideally we don't want users to have to reinvent
these things as they go. The datasets module in Arrow-C++ has a lot
of code here alrea
't quite well-defined enough to be
> meaningfully integrated (except perhaps via a generic "stream of batches"
> entrypoint), and even if we wanted to feed JDBC/ODBC into an ExecPlan, we'd
> have to do some work that would look roughly like writing an ADBC driver, so
>
then the design would need some other registry or mechanism for
> passing the deserialized data source-UDF to the execution plan.
> 5. The data-source UDF is specific to an execution plan, so definitely
> specific to the user who created the Substrait plan in which it is embedded.
> U
I tried to use CLion for a little while with mixed results. CLion
integrates well with cmake. However, CLion seems to rely heavily on
clang-tidy and I was unable to configure clang-tidy in such a way that
it ran reasonably quickly. I think part of the problem is that CLion
wanted to use all of m
I can try and give a more detailed answer later in the week but the
gist of it is that Arrow manages all "buffer allocations" with a
memory pool. These are the allocations for the actual data in the
arrays. These are the allocations that use the memory pool configured
by ARROW_DEFAULT_MEMORY_POOL
"--with-private-namespace=je_arrow_private_"
"--without-export"
"--disable-shared"
# Don't override operator new()
"--disable-cxx"
"--disable-libdl"
# See https://github.com/jemalloc/jemalloc/issues/1237
"--disable-initial-exec-tls"
${EP_
Congratulations all!
On Wed, Jun 22, 2022, 10:27 AM Dragoș Moldovan-Grünfeld <
dragos.m...@gmail.com> wrote:
> Congratulations!
>
> Sent from my iPhone
>
> > On 22 Jun 2022, at 18:13, Neal Richardson
> wrote:
> >
> > On behalf of the Arrow PMC, I'm happy to announce that
> >
> > Dewey Dunningto
This seems reasonable to me. A very similar interface is the
RecordBatchReader[1] which is roughly (glossing over details)...
```
class RecordBatchReader {
  virtual std::shared_ptr<Schema> schema() const = 0;
  virtual Result<std::shared_ptr<RecordBatch>> Next() = 0;
  virtual Status Close() = 0;
};
```
This seems pretty close to
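For intuition, the same pull-based contract can be mirrored in plain Python (names here are illustrative, not pyarrow's actual API):

```python
from abc import ABC, abstractmethod

class BatchReader(ABC):
    """Toy analogue of the RecordBatchReader contract above:
    a schema, a pull-based Next(), and a Close()."""

    @abstractmethod
    def schema(self): ...

    @abstractmethod
    def next(self):
        """Return the next batch, or None when exhausted."""

    @abstractmethod
    def close(self): ...

class ListReader(BatchReader):
    """Trivial implementation that serves pre-materialized batches."""
    def __init__(self, schema, batches):
        self._schema = schema
        self._batches = iter(batches)
        self._closed = False

    def schema(self):
        return self._schema

    def next(self):
        return next(self._batches, None)

    def close(self):
        self._closed = True

reader = ListReader(("a", "b"), [[1, 2], [3, 4]])
out = []
while (batch := reader.next()) is not None:
    out.append(batch)
reader.close()
# out now holds both batches in order
```

The important property is the same as in the C++ interface: the consumer pulls one batch at a time and the producer does not need to materialize everything up front.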
, and then the reentrancy problem is moot since
> no parallel-access occurs. OTOH, if the Python-based data-source can be
> accessed in parallel, the above sorting-queue solution is better suited and
> would avoid the reentrancy problem of a ReadNext function.
>
>
> Yaron.
>
This is only for the situation where ALL inputs and outputs are
scalar. Scalars, at the kernel level, do not have length. So in this
case there is nothing to repeat. It does build a buffer, but just
with a single value, so it is all O(1).
On Wed, Jun 29, 2022 at 9:49 AM Antoine Pitrou wrote:
>
Given that Acero does not do any planner / optimizer type tasks I'm
not sure you will find anything like this in arrow-cpp or pyarrow.
What you are describing I sometimes refer to as "plan slicing and
dicing". I have wondered if we will someday need this in Acero but I
fear it is a slippery slope
he source
> > > code
> > > >> for "substrait" from `
> > > >>
> > >
> > https://github.com/substrait-io/substrait/archive/${ARROW_SUBSTRAIT_BUILD_VERSION}.tar.gz
> > > >> ` where `ARROW_SUBSTRAIT_BUILD_VERSION` is set in
> >
At the moment that log is used primarily for Arrow developers and is
not likely to be terribly useful beyond that. It is not, as far as I
know, very extensible. I think you can only configure it to log to
stderr or to a single file. However, it could be made extensible if
someone were motivated
ython object, which is more convenient to manipulate from
> Python, after unpickling in from a field in the Substrait plan. It's just
> read-only access to the field from Python, but still needs access to the
> Substrait protobuf Python classes. This case was mentioned in my previous
Memory profiling would be very helpful. Thanks for looking into this.
A few thoughts:
* Peak allocation is an important number for many users. One major
goal for Acero is to get to a point where it can constrain peak
allocation to a preconfigured amount for a single query. We are close
but not
+1 (I'm assuming, as Neal described, I can just reassign the issue to
myself and it won't confuse the assignment bot)
On Fri, Jul 8, 2022 at 8:29 AM Jacob Wujciak wrote:
>
> I support this idea and a 90 days threshold seems good to me!
>
> On Fri, Jul 8, 2022 at 8:02 PM Neal Richardson
> wrote:
Are you changing the default memory pool to a LoggingMemoryPool?
Where are you doing this? For a benchmark I think you would need to
change the implementation in the benchmark file itself.
Similarly, is AsofJoinNode using the default memory pool or the memory
pool of the exec plan? It should be
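The logging-pool idea being discussed, wrapping a pool so every allocation is recorded as it passes through, can be illustrated with a toy Python sketch (this is only the pattern, not Arrow's LoggingMemoryPool API):

```python
class Pool:
    """Stand-in for a default memory pool: tracks bytes allocated."""
    def __init__(self):
        self.bytes_allocated = 0

    def allocate(self, size):
        self.bytes_allocated += size
        return bytearray(size)

class LoggingPool:
    """Delegates to a wrapped pool and records each allocation,
    mirroring the decorator pattern of a logging memory pool."""
    def __init__(self, wrapped):
        self.wrapped = wrapped
        self.log = []

    def allocate(self, size):
        self.log.append(("allocate", size))
        return self.wrapped.allocate(size)

pool = LoggingPool(Pool())
pool.allocate(64)
pool.allocate(128)
# pool.log records both calls; pool.wrapped still does the real accounting
```

The point made in the email follows directly: allocations only show up in the log if the component in question was actually handed the logging wrapper rather than the default pool.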
also expect to see some allocations from
> TableSourceNode going through the logging memory pool, even if AsOfJoinNode
> was using the default memory pool instead of the Exec Plan's pool, but I am
> not seeing anything come through...
>
> -----Original Message-----
>
end to end benchmark of "scan - join - write" I think would make sense to
> include all arrow memory allocation (if that makes sense)
>
> On Mon, Jul 11, 2022 at 4:37 PM Weston Pace wrote:
>
> > > Is there anything else I'd need to change?
This might be an interesting topic for the Substrait community. You
can find ways to contact them at [1]. I don't know GraphQL well
enough but from what I do know it seems like a GraphQL -> Substrait
converter would be useful, at the very least.
[1] https://substrait.io/community/
On Mon, Jul 1
er and
> MakeReaderGenerator) to generate for a regular source node.
>
> -----Original Message-----
> From: Weston Pace
> Sent: Monday, July 11, 2022 4:37 PM
> To: dev@arrow.apache.org
> Subject: Re: cpp Memory Pool Clarification
>
> > Is there anything else I
> After some quick debugging, I found that the asof node's StopProducing (a
condition necessary to finish the plan) is called shortly after the
error output.
StopProducing should probably more accurately be named "Abort" or
"StopRightNow". If you run the plan to completion normally I do not
be
If you are using a source node (which it appears you are) then it will
be creating new thread tasks for each batch. So, in theory, these
could get out of order. My guess is that the file reader is slow
enough that by the time you load batch N from disk and decode it, you
have a pretty good chance
> 4) control is not returned to the processing thread
Yes, it looks like the current implementation does not return control
to the processing thread, but I think this is correct, or at least "as
designed". The thread will be used to continue iterating the source.
> control is not returned to the
>
> Would the new first class support in the scheduler be something similar to
> what's available currently in BackpressureMonitor? We are looking to
> implement some more custom backpressure schemes that depend on batch
> ordering/completion rather than memory size.
>
22 at 12:32 PM Ivan Chau wrote:
>
> Hi Weston,
>
> Not sure if the diagrams came through here -- is there some other place I
> need to view them?
>
> Ivan
>
> -----Original Message-----
> From: Weston Pace
> Sent: Thursday, July 21, 2022 10:59 PM
> T
aring
>
> On Fri, Jul 22, 2022 at 12:32 PM Ivan Chau wrote:
> >
> > Hi Weston,
> >
> > Not sure if the diagrams came through here -- is there some other place I
> > need to view them?
> >
> > Ivan
> >
> > -----Original Message-----
> >
presures are handled in Acero, I am curious if there has been
> any more progress on this since May or any future plans?
>
> Thanks,
> Li
>
> On Mon, May 23, 2022 at 10:37 PM Weston Pace wrote:
>
> > > About point 2. I have previously seen the pipeline prioritizatio
1) Yes, that sounds correct. The file readers will read from files in
parallel (even if there is one file it can read from row groups in
parallel). There is no guarantee these reads will finish
sequentially.
2) Hmm, this one will work for now, because the executor==nullptr
behavior is to borrow
I think, from a compute perspective, one would just cast before doing
anything. So you wouldn't need much beyond parse and unparse. For
example, if you have a JSON document and you want to know the largest
value of $.weather.temperature then you could do...
MAX(STRUCT_FIELD(PARSE_JSON("json_col"
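A rough plain-Python equivalent of that expression, using the stdlib json module (the uppercase function names above are the email's hypothetical sketch, not an existing Arrow API):

```python
import json

docs = [
    '{"weather": {"temperature": 21.5}}',
    '{"weather": {"temperature": 30.1}}',
    '{"weather": {"temperature": 25.0}}',
]

# PARSE_JSON: deserialize each document
parsed = [json.loads(d) for d in docs]
# STRUCT_FIELD: extract $.weather.temperature from each
temps = [p["weather"]["temperature"] for p in parsed]
# MAX: aggregate over the extracted column
max_temp = max(temps)
```

This matches the point in the email: once the document is parsed into a structured value, ordinary field access and aggregation do the rest, so parse/unparse plus a cast covers most use cases.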
I'm not sure of the exact error you are getting but I suspect this may
be related to something I am currently working on[1]. I can reproduce
it fairly easily without GCS:
```
import pyarrow as pa
import pyarrow.dataset as ds
my_dataset = ds.dataset(['/some/big/file.csv'], format='csv')
batch_ite
Just a few additional thoughts:
> at least as measured by
> the memory pools max_memory() method.
The parquet reader does a fair amount of allocation on the global
system allocator (i.e. not using a memory pool). Typically this
should be small in comparison with the data buffers themselves (whic
My first suspicion on a test timeout is usually a deadlock. That
being said, I haven't looked at this test / change in any real detail
so I don't know if that's the case here. How long does the test take
to run locally?
Second, I would try and remove sleeps, and make sure to use the
utilities Sl
+1. I'm very much in favor of upgrading to C++17. I am lucky to
often get to work with people that are new to the Arrow C++ code base
and a common feedback is that the code is quite complex. While I do
not think moving to C++17 will solve this problem by itself I'm pretty
confident that being ab
I agree can be reduced by sampling. Could you
> > explain how to use SCOPED_TEST, or refer to documentation about it? I
> > understand your idea, just looking for an example use of SCOPED_TEST.
> >
> >
> > Yaron.
> >
>
or if they wanted to use newer features (which
could be an incentive to upgrade their R version).
On Wed, Aug 17, 2022 at 4:30 AM Weston Pace wrote:
>
> +1. I'm very much in favor of upgrading to C++17. I am lucky to
> often get to work with people that are new to the Arrow C++
> Any particular reason why this should be 10.0 and not 9.0 for example?
(is due to an incoming feature of note?)
No. I only said 10.0 because Neal's tactical suggestion earlier in
this thread would mean that 10.0 would be the last build that had
C++11 support. If we choose not to follow that sug
+1 (non-binding)
On Wed, Aug 24, 2022 at 9:24 AM Keith Kraus
wrote:
>
> +1 (non-binding)
>
> On Wed, Aug 24, 2022 at 12:12 PM David Li wrote:
>
> > +1 (binding)
> >
> > On Wed, Aug 24, 2022, at 12:06, Ivan Ogasawara wrote:
> > > +1 (non-binding)
> > >
> > > On Wed, Aug 24, 2022 at 12:00 PM Sasha
I don't know of any work being done to turn Acero into a distributed
query engine.
However, I would hope that Acero can be used in a distributed query
engine, and would be a useful component.
If there are features that Acero would need in this environment (e.g.
some kind of exec node for speciali
+1 (non-binding). This is maybe implied but I would add that
modification of extension types must also require a vote and should be
backwards compatible. Furthermore, extension types (particularly
those with extensive parameterization/serialization) should discuss how
future additions would be mad
I agree as well. I think most lingering uses of the term "feather"
are in pyarrow and R however, so it might be good to hear from some of
those maintainers.
On Mon, Aug 29, 2022 at 9:35 AM Antoine Pitrou wrote:
>
>
> I agree with this as well.
>
> Regards
>
> Antoine.
>
>
> On Mon, 29 Aug 2022
Congratulations!
On Sun, Sep 4, 2022 at 5:04 AM Andy Grove wrote:
>
> Congratulations, L. C.!
>
> On Sun, Sep 4, 2022 at 8:09 AM Wang Xudong wrote:
>
> > Congrats!
> >
> > David Li wrote on Sun, Sep 4, 2022 at 19:54:
> >
> > > Congrats & welcome Liang-Chi!
> > >
> > > On Sun, Sep 4, 2022, at 06:22, Andrew La
On Mon, Sep 5, 2022 at 1:56 AM Sutou Kouhei
> > wrote:
> > > >
> > > >> The Project Management Committee (PMC) for Apache Arrow has invited
> > > >> Weston Pace to become a PMC member and we are pleased to announce
> > > >> that Weston Pace has accepted.
> > > >>
> > > >> Congratulations and welcome!
> > > >>
> > >
> > >
> >
It seems like a reasonable approach. I think my initial gut feeling
would be that initializing and finalizing state for each change of key
might be a bit heavyweight in cases where there are only a few values
per key. I think these cases are fairly common as a data
simplification / cleaning pass.
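The per-key init/finalize pattern being described can be sketched as follows (illustrative Python, not Acero's API); with only a few values per run, the per-key setup and teardown steps dominate the accumulation work:

```python
from itertools import groupby

def segmented_sum(keys, values):
    """Aggregate values over runs of equal (pre-sorted) keys,
    initializing and finalizing state on every key change."""
    out = []
    for key, group in groupby(zip(keys, values), key=lambda kv: kv[0]):
        state = 0                 # initialize state for this key
        for _, v in group:
            state += v            # accumulate
        out.append((key, state))  # finalize state for this key
    return out

result = segmented_sum(["a", "a", "b", "c", "c"], [1, 2, 3, 4, 5])
# one (key, sum) pair per run of equal keys
```

If the data averages two or three values per key, the init/finalize calls happen nearly as often as the additions themselves, which is the overhead concern raised above.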
Congratulations!
On Thu, Sep 8, 2022 at 8:32 AM David Li wrote:
>
> Congrats & welcome, Yanghong!
>
> On Thu, Sep 8, 2022, at 11:04, Daniël Heres wrote:
> > Congratulations!
> >
> > On Thu, Sep 8, 2022, 17:02 Andy Grove wrote:
> >
> >> Congratulations, Yanghong!
> >>
> >> On Thu, Sep 8, 2022 at
I'd agree with Micah. I'm also not aware of anyone working on this.
The docs clarify a bit more on the details[1]. I think we'd need a
bit more thinking around an "update/append" workflow too.
That being said, updates, transactions, and appends are something that
the Iceberg project has thought
Breaking changes should be documented in the release notes which are
announced on the Arrow blog[1][2]. In addition, in pyarrow, changes
to non-experimental APIs (and often also those made to experimental
APIs) should go through a deprecation cycle where a warning is emitted
for at least one relea
Congrats Remzi!
On Mon, Sep 12, 2022 at 5:42 PM Rok Mihevc wrote:
>
> Congrats!
>
> Rok
>
> On Sun, Sep 11, 2022 at 4:27 AM Ian Joiner wrote:
>
> > Congrats Remzi!
> >
> > On Sat, Sep 10, 2022 at 8:12 AM Andrew Lamb wrote:
> >
> > > On behalf of the Arrow PMC, I'm happy to announce that Remzi Y
> The alternative path of subclassing SourceNode and having ExecNode::Init or
> ExecNode::StartProducing seems quite a bit of change (also I don't think
> SourceNode is exposed via public header). But let me know if you think I am
> missing something.
Agreed that we don't want to go this route. D
> the result set can be read.)
> - Those partitions then each become a Fragment, and then they can be read in
> parallel by Dataset.
>
> It sounds like the service in question here isn't quite that complex, though,
> so no need to necessarily go that far.
>
> On Tue, Sep 13,
I'm going to bump this because it would be good to get feedback. In
particular it would be nice to get feedback on the suggested format
change[1]. We are currently moving forward on coming up with an IPC
format proposal which we will share when ready.
The two interesting points that jump out to
> > >> difficult to calculate offsets. Translating an array offset to a
> > >> buffer offset takes O(log(N)) time. If the run ends are encoded as
> > >> a
> > >> child array (so the RLE array has no buffers and two child arrays)
> > >> then this
the run ends buffer is
> physical size of the array (or larger) which cannot be easily
> determined without iterating over the whole buffer.
>
> But we need a valid buffer size, so we can resolve logical to physical
> offsets using binary search. Also I'm not sure if it is
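The logical-to-physical resolution being discussed can be sketched with a binary search over the run-ends child array (an illustrative Python sketch, not Arrow's implementation):

```python
from bisect import bisect_right

def logical_to_physical(run_ends, logical_index):
    """Resolve a logical array offset to the index of the run that
    contains it, in O(log N), by binary-searching the run ends."""
    return bisect_right(run_ends, logical_index)

# Logical values [1, 1, 1, 2, 2, 3] encode as:
run_ends = [3, 5, 6]
values = [1, 2, 3]

# Logical index 4 falls in the second run (values[1] == 2)
assert values[logical_to_physical(run_ends, 4)] == 2
```

This is why a valid run-ends buffer size matters: the search space for the binary search is the physical length, which cannot be inferred from the logical length alone.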
Thank you everyone, I think I was pretty far off base in representing the
work Tobias had done and both Tobias and Matt have clarified well.
* There are two child arrays not necessarily for slicing but more to help
distinguish between the logical length (there are no buffers with the
logical leng
First, I think you are correct that there is a lot of value to users here.
I'd love for a capability like this to someday be in pyarrow too for Arrow
compute functions.
I think there is a distinct enough difference between "a query language"
and "a programming language". However, both of them are
Congratulations!
On Mon, Sep 19, 2022 at 6:17 PM Yijie Shen wrote:
>
> Congratulations, Raphael!
>
> On Tue, Sep 20, 2022 at 11:44 AM L. C. Hsieh wrote:
>
> > Congratulations!
> >
> > On Mon, Sep 19, 2022 at 7:40 PM Andy Grove wrote:
> > >
> > > Congratulations, Raphael!
> > >
> > > On Mon, Sep
Congratulations Dan
On Tue, Sep 20, 2022 at 10:52 AM David Li wrote:
>
> Congrats, Dan!
>
> On Tue, Sep 20, 2022, at 13:43, L. C. Hsieh wrote:
> > Congratulations!
> >
> > On Tue, Sep 20, 2022 at 10:38 AM Chao Sun wrote:
> >>
> >> Congrats Dan!
> >>
> >> On Tue, Sep 20, 2022 at 10:17 AM Ian Join
I'm not great at this build stuff but I think the basic idea is that
you will need to package your custom nodes into a shared object.
You'll need to then somehow trigger that shared object to load from
python. This seems like a good place to invoke the initialize method.
Currently pyarrow has to
Funny you should mention this, I just ran into the same problem :).
We use StartAndCollect so much in our unit tests that there must be
some usefulness there. You are correct that it is not an API that can
be used outside of tests.
I added utility methods DeclarationToTable, DeclarationToBatches,
In pyarrow it is "string(s) -> arrow Table". However, in the actual
C++ (e.g. relation_internal.cc) it is already "string(s) ->
compute::Declaration" which should be sufficiently general for your
needs. A "compute::Declaration" is a combination of node factory name
and node options so you should
Currently Substrait only has a binary (protobuf) serialization (and a
protobuf JSON one but that's not really human writable and barely
human readable). Substrait does not have a text serialization. I
believe there is some desire for one (maybe Sasha wants to give it a
try?). A text format for S
> >>>>> I was thinking of proving out a design here before going there. However
> >>>>> we
> >>>>> could also just go straight there :)
> >>>>>
> >>>>> Regarding infix operators and such the edge case I was thinking of is
> >>>
1. Yes.
2. I was going to say yes but...on closer examination...it appears
that it is not applying backpressure.
The SinkNode accumulates batches in a queue and applies backpressure.
I thought we were using a sink node since it is the normal "accumulate
batches into a queue" sink. However, the Su
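The accumulate-and-signal behavior described for SinkNode can be sketched as a bounded queue with high/low watermarks (a toy illustration of the pattern; the real backpressure machinery differs in detail):

```python
from collections import deque

class BackpressuredSink:
    """Accumulates batches in a queue; signals backpressure when the
    queue grows past a high watermark and releases it once the queue
    drains below a low watermark. Names are illustrative."""
    def __init__(self, high=4, low=2):
        self.queue = deque()
        self.high, self.low = high, low
        self.paused = False

    def push(self, batch):
        self.queue.append(batch)
        if len(self.queue) >= self.high:
            self.paused = True   # ask upstream to stop producing

    def pop(self):
        batch = self.queue.popleft()
        if self.paused and len(self.queue) <= self.low:
            self.paused = False  # resume upstream
        return batch

sink = BackpressuredSink()
for i in range(4):
    sink.push(i)       # hits the high watermark -> backpressure applied
sink.pop()
sink.pop()             # drains to the low watermark -> released
```

The two watermarks avoid thrashing: producers are not resumed the instant a single batch drains, only once the queue has meaningfully emptied.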
> Does that sound like a reasonable way to do this?
It's not ideal.
I may be assuming here but I think your problem is more that there is
no way to more flexibly describe a source in python and less that you
need to change the default.
For example, if you could do something like this (in pyarrow
initialization seems cleaner to me because there are many
> >>> other
> >>> extension points that we initialize (add registering in the
> >>> default_exec_factory_registry
> >>> similar to
> >>> https://github.com/apache/arrow/blob/m
Yes. Something like:
if (ErrorIfNotOk(flight_writer->WriteRecordBatch(...))) return;
Today this method calls `output->ErrorReceived(...)`. The original
idea (I think) is that, possibly, a downstream node could "handle" the
error. However, in practice, nothing does that, and all errors
propagat
> Maybe to take a step back - why do we want this in the Arrow
> repositories/under Arrow governance?
I think this is the important question. What is the goal here?
If the goal is to help spread awareness then we can link to a repo
somewhere (e.g. a "projects that use Arrow" section or somethin
+1 for GH issues mainly because it lowers the barrier to entry and
JIRA won't be an acceptable solution any longer with infra's proposed
changes. I suspect I'd be +1 even without the infra change though
providing everyone else was willing to make the switch.
On Mon, Oct 24, 2022 at 8:19 AM Jacob
Congratulations Bogumił.
On Wed, Oct 26, 2022 at 6:10 AM Jacob Wujciak
wrote:
>
> Congrats!
>
> On Wed, Oct 26, 2022 at 8:31 AM Alenka Frim
> wrote:
>
> > Congratulations!
> >
> > On Wed, Oct 26, 2022 at 7:55 AM Daniël Heres
> > wrote:
> >
> > > Congratulations!
> > >
> > > On Wed, Oct 26, 2022
Congrats Jacob!
On Wed, Oct 26, 2022 at 6:10 AM Jacob Wujciak
wrote:
>
> Congrats!
>
> On Wed, Oct 26, 2022 at 8:31 AM Alenka Frim
> wrote:
>
> > Congratulations!
> >
> > On Wed, Oct 26, 2022 at 7:54 AM Daniël Heres
> > wrote:
> >
> > > Congratulations!
> > >
> > > On Wed, Oct 26, 2022, 07:50 B
Thanks Nic and congratulations!
On Wed, Oct 26, 2022 at 8:28 AM Raúl Cumplido wrote:
>
> Thanks Nic for your contributions!
>
> On Wed, Oct 26, 2022 at 17:17, Antoine Pitrou ()
> wrote:
>
> >
> > Welcome, Nic!
> >
> >
> > On 26/10/2022 at 16:37, Dewey Dunnington wrote:
> > > Congrats, Nic!
Congratulations Ben!
On Wed, Oct 26, 2022 at 2:05 PM David Li wrote:
>
> Welcome Ben!
>
> On Wed, Oct 26, 2022, at 17:57, Ian Joiner wrote:
> > Congrats Ben!
> >
> > Ian
> >
> > On Wednesday, October 26, 2022, Sutou Kouhei wrote:
> >
> >> On behalf of the Arrow PMC, I'm happy to announce that Be
Congrats Eric!
On Wed, Oct 26, 2022 at 2:05 PM David Li wrote:
>
> Welcome Eric!
>
> On Wed, Oct 26, 2022, at 17:57, Ian Joiner wrote:
> > Congrats Eric!
> >
> > Ian
> >
> > On Wednesday, October 26, 2022, Sutou Kouhei wrote:
> >
> >> On behalf of the Arrow PMC, I'm happy to announce that Eric P
FileSystemDataset is part of the public API (and in a pxd file[1]). I
would agree it's fair to say that pyarrow datasets are no longer
experimental.
> Instead we subclass Dataset and return a custom scanner we created. And our
> Dataset subclass *should* be a FileSystemDataset subclass, but
> F
Congratulations
On Thu, Nov 3, 2022, 6:25 AM Patrick Horan wrote:
> Congrats Jiang!
>
> On Thu, Nov 3, 2022, at 1:52 AM, Wang Xudong wrote:
> > Congratulations!
> >
> > Yijie Shen wrote on Thu, Nov 3, 2022 at 11:08:
> >
> > > Congratulations Jiang!
> > >
> > > On Thu, Nov 3, 2022 at 9:54 AM vin jake wrote:
it
> supports appends, but we're working on full schema evolution / support. We
> had to do this outside of iceberg because we're not using parquet). Do you
> have documentation for how you're envisioning schema evolution to work in
> Arrow? Would you be open to chatting w
Indentation works well when you omit the other arguments (e.g. ...)
but once you mix in the arguments for the nodes (especially if those
arguments have their own indentation / structure) then it ends up
becoming unreadable I think. I prefer the idea of each node having
its own block, with no inde
Congrats Jarrett!
On Thu, Nov 3, 2022 at 11:25 AM Jacob Wujciak
wrote:
>
> Congratulations!
>
> On Thu, Nov 3, 2022 at 2:40 PM Rok Mihevc wrote:
>
> > Congratulations!
> >
> > On Thu, Nov 3, 2022 at 12:31 AM David Li wrote:
> >
> > > Welcome Jarrett!
> > >
> > > On Tue, Nov 1, 2022, at 17:15, S
Congrats!
On Thu, Nov 3, 2022, 11:06 PM Benson Muite
wrote:
> Congratulations
> On 11/4/22 01:29, Vibhatha Abeykoon wrote:
> > Congratulations
> >
> > On Thu, Nov 3, 2022 at 7:09 PM Rok Mihevc wrote:
> >
> >> Congratulations!
> >>
> >> On Thu, Nov 3, 2022 at 12:31 AM David Li wrote:
> >>
> >>>
From a datasets / Acero perspective I have been thinking about this in
the back of my mind for a while and decided to write my thoughts down
in a document. I will send it in a separate email.
On Tue, Nov 8, 2022 at 9:53 AM Micah Kornfield wrote:
>
> Hi Matthew,
> Could you give some more specif
I've created a document[1] that both describes the general idea of
schema evolution as well as my best guess at how it should work. This
is written from an Acero / datasets perspective but the information
should be generally applicable / accessible.
I am doing some work in the scanner to enable a
tions to it as they think of the next step.
> While here you need to think backward. Obviously you can append to the top
> as you write your pipeline ,but that's still a bit counterintuitive.
>
> Just my two cents.
>
>
>
> On Thu, Nov 3, 2022 at 8:08 PM Weston Pace wrot
Sorry about that. I've enabled it now.
On Wed, Nov 9, 2022, 9:34 PM Micah Kornfield wrote:
> It doesn't look like comment access is enabled?
>
> On Wed, Nov 9, 2022 at 5:16 PM Weston Pace wrote:
>
> > I've created a document[1] that both describes the genera
nvolves for each file figuring out how to convert to
> the desired. I found it easiest to do this per column of the desired
> schema. Then it can be (1) reference a column (2) reference a column and
> cast or (3) create a column of nulls of a given type.
>
> Is something like that
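The three per-column strategies quoted above (reference a column, reference and cast, or fill with nulls) can be sketched as follows (illustrative Python with schemas as plain dicts, not the datasets API):

```python
def adapt_batch(batch, file_schema, desired_schema):
    """Adapt a batch read with file_schema to desired_schema, one
    desired column at a time. Schemas are {name: type} dicts and a
    batch is {name: list-of-values}; all names are illustrative."""
    n_rows = len(next(iter(batch.values()))) if batch else 0
    out = {}
    for name, want_type in desired_schema.items():
        if name not in file_schema:
            out[name] = [None] * n_rows                      # (3) nulls
        elif file_schema[name] == want_type:
            out[name] = batch[name]                          # (1) reference
        else:
            out[name] = [want_type(v) for v in batch[name]]  # (2) cast
    return out

adapted = adapt_batch(
    {"a": [1, 2], "b": ["3", "4"]},
    {"a": int, "b": str},
    {"a": int, "b": int, "c": float},
)
# "a" is referenced as-is, "b" is cast str -> int, "c" is null-filled
```

Driving the adaptation from the desired schema, rather than the file schema, is what makes this composable: extra file columns are dropped for free and missing ones are handled uniformly.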
Congrats!
On Mon, Nov 14, 2022 at 8:14 AM L. C. Hsieh wrote:
>
> Congratulations!
>
> On Mon, Nov 14, 2022 at 7:21 AM Andy Grove wrote:
> >
> > Congratulations!
> >
> > On Mon, Nov 14, 2022 at 4:58 AM Andrew Lamb wrote:
> >
> > > Congratulations!
> > >
> > > On Sun, Nov 13, 2022 at 10:15 PM Yij
One thing to note is that you need to have something like "closes #123" in
the PR description or a comment in order for GitHub to close the relevant
issue when the PR is merged. This isn't too much of a burden to check I
think but took a bit of getting used to for me in Substrait where we use
the
Congratulations!
On Tue, Dec 6, 2022 at 7:57 AM Nic wrote:
>
> Congratulations!
>
> On Tue, 6 Dec 2022 at 15:49, Ian Cook wrote:
>
> > Congratulations Raúl!
> >
> > On Tue, Dec 6, 2022 at 10:43 AM Matt Topol wrote:
> > >
> > > Congrats Raúl!!
> > >
> > > On Tue, Dec 6, 2022 at 9:53 AM Dewey Dun
Congratulations Jacob!
On Thu, Dec 15, 2022 at 3:27 PM David Li wrote:
>
> Congrats & welcome Jacob!
>
> On Thu, Dec 15, 2022, at 18:14, Nic Crane wrote:
> > On behalf of the Arrow PMC, I'm happy to announce that Jacob Wujciak has
> > accepted an invitation to become a committer on Apache Arrow.
+1
I agree that run-end encoding makes more sense but also don't see it
as a deal breaker.
The most compelling counter-argument I've seen for new types is to
avoid a schism where some implementations do not support the newer
types. However, for the type proposed here I think the risk is low
beca
Congratulations!
On Sun, Dec 25, 2022, 9:44 PM Remzi Yang <1371656737...@gmail.com> wrote:
> Congratulation Andrew!
>
> On Mon, 26 Dec 2022 at 13:40, David Li wrote:
>
> > Congrats Andrew!
> >
> > On Mon, Dec 26, 2022, at 00:26, vin jake wrote:
> > > congratulation!
> > >
> > > Sutou Kouhei 于 2
There was a discussion a while back about representing complex numbers
that seems similar[1]. If both fields were the same type you could
use a fixed size list array. However, since you want two different
types you'd want some kind of "packed struct" which does not exist (to
my knowledge) today.
I think it would be reasonable to state that a reference
implementation must be a complete implementation (i.e. supports all
existing types) that is not derived from another implementation (e.g.
you can't pick pyarrow and arrow-c++). If an implementation does not
plan on ever supporting a new arra
Congratulations Jie!
On Sun, Jan 8, 2023 at 10:28 AM Rok Mihevc wrote:
>
> Congrats Jie!
>
> Rok
>
> On Sun, Jan 8, 2023 at 7:00 PM Raúl Cumplido wrote:
>
> > Congratulations Jie!
> >
> > On Sun, Jan 8, 2023, 18:45, David Li wrote:
> >
> > > Congrats Jie & welcome!
> > >
> > > On Sun, Jan 8,
Start:
There have been a few calls in the past for an improved workflow for
reviewing PRs. I think a bot that highlights pull requests that need
attention (e.g. has no reviews in the "changes requested" state, also
some way of knowing how long it's been waiting) would be helpful.
There has been
On further thought it seems a little odd to me that crashes are not
critical. However, many of our crashes are from a failure to properly
validate user input, which I agree isn't as critical. Would it be too
nuanced to say that:
* A crash, given valid input, is critical
* A crash, given invali
I've got a fix[1] in for the verification script for C#. There are
more details in the issue and the PR but IMO we are compatible with
C#7 and C#6, we simply were not testing it correctly. I have run the
tests locally with both 6.0 and 7.0 sdks and they passed.
[1] https://github.com/apache/arro