> At the moment, as we are not exposing the execution engine primitives to
> Python users, are you expecting to expose them via this approach?
From our side, these APIs are not directly exposed to the end user; rather,
they are primitives that we can build on top of.
The end user would just do sth li
What Yaron is going for is really something similar to custom data source
in Spark (
https://levelup.gitconnected.com/easy-guide-to-create-a-custom-read-data-source-in-apache-spark-3-194afdc9627a)
that allows utilizing existing Python APIs that know how to read a data
source as a stream of record ba
Actually, "UDF" might be the wrong terminology here - This is more of a
"custom Python data source" than "Python user defined functions". (Although
under the hood it can probably reuse lots of the UDF logic to execute the
custom data source)
On Fri, Jun 3, 2022 at
Yaron,
Do you mind also linking the previous mailing list discussion here?
On Wed, Jun 22, 2022 at 11:40 AM Yaron Gvili wrote:
> Hi,
>
> I'd like to get the community's feedback about a design proposal
> (discussed below) for integrating user-defined Python-based data-sources in
> Arrow. This i
Hi,
I just noticed there is no specific lib file for Acero/Arrow compute when I
have BUILD_COMPUTE=ON - is it included in the libarrow.so?
Thanks!
Li
obably want to keep casts
> (and perhaps a few other kernels) in the main library. AIUI, we may also
> want to split out acero/the "engine" as well (or at least give it its own
> CMake flag eventually).
>
> https://issues.apache.org/jira/browse/ARROW-8891
>
> -David
>
>
Yaron, do we need to parse the substrait protobuf in Python so that we can
get the UDFs and register them with Pyarrow?
On Mon, Jul 4, 2022 at 1:24 PM Yaron Gvili wrote:
> This rewriting of the package is basically what I had in mind; the `_ep`
> was just to signal a private package, which cannot
Hello,
I wonder if we have nightly source tarball published somewhere?
Li
thub.com/apache/arrow/archive/refs/heads/master.tar.gz
> [2]: https://github.com/ursacomputing/crossbow/releases
>
> On Thu, Jul 7, 2022, at 10:39, Li Jin wrote:
> > Hello,
> >
> > I wonder if we have nightly source tarball published somewhere?
> >
> > Li
>
and it seems we actually
> also upload an sdist there.
>
> (it could still be more reliable to used HEAD, though, if you want to be
> sure to always have the latest. If our nightly release CI is failing, the
> index might be outdated for some days)
>
> Joris
>
> On Thu, 7 J
Thanks! This is very helpful.
On Thu, Jul 7, 2022 at 11:33 AM Jacob Wujciak wrote:
> The crossbow tarballs do not contain the arrow source, they only contain
> the crossbow source (aka a few yaml files).
>
> On Thu, Jul 7, 2022 at 5:29 PM Li Jin wrote:
>
> > Thanks bot
Hello,
I am trying to build Arrow/Pyarrow with our internal build system (cmake
based) and encountered an error when running a pyarrow test:
ImportError while importing test module
'/home/ljin/vats/add-arrowpython-master/ext/public/python/pyarrow/master/dist/lib/python3.9/pyarrow/tests/test_table.py
TE enabled, is that the case?
>
>
>
>
> Le 07/07/2022 à 22:16, Li Jin a écrit :
> > Hello,
> >
> > I am trying to build Arrow/Pyarrow with our internal build system (cmake
> > based) and encounter and error when running pyarrow test:
> >
> > I
> TableSourceNode wouldn't need to allocate since it runs against memory
that's already been allocated.
Is the memory "that is already allocated" tracked in any allocators? For an
end-to-end benchmark of "scan - join - write", I think it would make sense to
include all Arrow memory allocation (if that
your table. This might give you
> something to compare/contrast allocation of an individual node with.
>
> On Mon, Jul 11, 2022 at 2:04 PM Li Jin wrote:
> >
> > > TableSourceNode wouldn't need to allocate since it runs against memory
> > that's already been alloc
Hello!
I am new to flight and want to look into implementing a C++ client for our
existing flight-based data service. I don't really know where to start so
wonder if some resources/pointers can be shared?
Thanks,
Li
apache.org/docs/cpp/flight.html
>
> -David
>
> On Wed, Jul 13, 2022, at 15:32, Li Jin wrote:
> > Hello!
> >
> > I am new to flight and want to look into implementing a C++ client for
> our
> > existing flight-based data service. I don't really know where to s
Hello!
I am working on integrating the latest Arrow C++ into our internal build
system. Currently I am planning to build the substrait C++ classes
independently and provide header locations and .so files to the Arrow
CMake file - I wonder if that is a good approach? (We cannot download the
substrait tar
t
> ninja-debug`, I'm getting a libsubstrait.a in build/debug. I'm not familiar
> enough with Arrow's build system to provide more help there.
>
> Regards,
> Jeroen
>
> On Mon, 18 Jul 2022 at 18:00, Li Jin wrote:
>
> > Hello!
> >
> > I am working on
SUBSTRAIT_URL works for both a *.tar.gz file and a repository
> > directory. In my experience, there is no need to also set any
> > sha256-related setting.
> >
> >
> > Yaron.
> >
> > From: Li Jin
> > Sent: Monda
Thanks Weston, two follow-up questions:
(1) What is the threading model when passing "executor=nullptr" to
"ExecContext"? (Does it only use one thread?)
(2) For the file reader, if we want to ensure batches coming out of the
reader are ordered but also have parallelism, I'd imagine doing sth like
Hi!
Since the scheduler improvement work came up in some recent discussions
about how backpressure is handled in Acero, I am curious if there has been
any more progress on this since May, or any future plans?
Thanks,
Li
On Mon, May 23, 2022 at 10:37 PM Weston Pace wrote:
> > About point 2. I h
Hi,
Ivan and I are debugging some behavior of the source node this morning and
I was hoping to clarify that our understanding is correct.
We observed that when using source node with a generator:
https://github.com/apache/arrow/blob/66c66d040bbf81a4819b276aee306625dc02837c/cpp/src/arrow/compute/e
Sorry the link to the generator above is wrong - We traced into the code
and found it uses BackgroundGenerator:
https://github.com/apache/arrow/blob/78fb2edd30b602bd54702896fa78d36ec6fefc8c/cpp/src/arrow/util/async_generator.h#L1581
On Mon, Jul 25, 2022 at 11:07 AM Li Jin wrote:
> Hi,
>
n how to
> obtain such a guarantee.
>
>
> Yaron.
>
> From: Li Jin
> Sent: Monday, July 25, 2022 11:10 AM
> To: dev@arrow.apache.org
> Subject: Re: [C++] Clarifying the behavior of source node and executor
>
> Sorry the link to the gen
'll
> look into adding this sequential-option to source-node and report back.
>
>
> Yaron.
> ________
> From: Li Jin
> Sent: Monday, July 25, 2022 11:39 AM
> To: dev@arrow.apache.org
> Subject: Re: [C++] Clarifying the behavior of source node and ex
't think I can throw out any specific dates but I think it is
> safe to say that these issues are important to Voltron Data as well.
>
> [1] https://issues.apache.org/jira/browse/ARROW-16072
> [2] https://issues.apache.org/jira/browse/ARROW-15732
> [3] https://issues.apache.or
Hi!
I saw this error when linking my code against arrow flight and suspect I
didn't write my cmake correctly:
"error: undefined reference to arrow::flight::Location::Location()"
I followed https://arrow.apache.org/docs/cpp/build_system.html#cmake and
linked my executable with arrow_shared. Is th
(This is with Arrow 7.0.0)
On Fri, Jul 29, 2022 at 3:52 PM Li Jin wrote:
> Hi!
>
> I saw this error when linking my code against arrow flight and suspect I
> didn't write my cmake correctly:
>
> "error: undefined reference to arrow::flight::Location::Loca
1]. You can see a small workaround at [2].
>
> [1]: https://issues.apache.org/jira/browse/ARROW-12175
> [2]:
> https://github.com/apache/arrow-adbc/blob/41daacca08db041b52b458503e713a80528ba65a/c/drivers/flight_sql/CMakeLists.txt#L28-L31
>
> -David
>
> On Fri, Jul 29, 2022, a
Also, if it is the google re2, is there a minimum version required?
Currently my system has re2 from 20201101.
On Fri, Jul 29, 2022 at 4:45 PM Li Jin wrote:
> Thanks David!
>
> I used the code in the sql flight Cmakelist. Unfortunately I hit another
> error, I wonder if you happ
(Nvm the libre2 error, it was my mistake)
On Fri, Jul 29, 2022 at 4:49 PM Li Jin wrote:
> Also, if it is the google re2, is there a minimum version required?
> Currently my system has re2 from 20201101.
>
> On Fri, Jul 29, 2022 at 4:45 PM Li Jin wrote:
>
>> Thanks David!
Hello!
We recently updated Arrow to 7.0.0 and hit an error with our old code
(details below). I wonder if there is a new way to do this with the current
version?
import pandas as pd
import pyarrow
import pyarrow.parquet as pq
df = pd.DataFrame({"aa": [1, 2, 3], "bb": [1, 2, 3]})
uri = "gs://amp_bucket_liao/tr
Thanks! Removing the "gs://" prefix indeed fixes it.
On Tue, Aug 2, 2022 at 4:01 PM Will Jones wrote:
> Hi Li Jin,
>
> I'm not sure yet what changed, but I believe you can fix that error simply
> by omitting the scheme prefix from the URI and just use the page when
>
Hi - Gently bump this. I suspect this is an upstream issue and wonder if
this is a known issue. Is there any other information we can provide? (I
think the repro is pretty straightforward but let us know otherwise)
On Mon, Aug 8, 2022 at 8:16 PM Alex Libman wrote:
> Hi,
>
> I've hit an issue in
>
> [1] https://issues.apache.org/jira/browse/ARROW-16072
> [2] https://issues.apache.org/jira/browse/ARROW-15732
>
> On Wed, Aug 10, 2022 at 1:15 PM Li Jin wrote:
> >
> > Hi - Gently bump this. I suspect this is an upstream issue and wonder if
> > this is a known i
Yaron, how long do the asof join tests normally take?
On Wed, Aug 17, 2022 at 6:13 AM Yaron Gvili wrote:
> Sorry, yes, C++. The failed job is
> https://github.com/apache/arrow/runs/7839062613?check_suite_focus=true
> and it timed out on code I wrote (in a PR, not merged). I'd like to avoid a
> time
Hi,
I have a Flight data source (effectively a Flight::StreamReader) and I'd
like to create an Acero source node from it. I wonder if something already
exists to do that or if not, perhaps some pointers for me to take a look
at?
Thanks,
Li
Correction: I have a flight::FlightStreamReader (not Flight::StreamReader)
On Wed, Aug 17, 2022 at 12:12 PM Li Jin wrote:
> Hi,
>
> I have a Flight data source (effectively a Flight::StreamReader) and I'd
> like to create an Acero source node from it. I wonder if something al
but just wanted to mention that I am going
> to
> > > try and figure this out quite a bit in the next week. I can try to
> create
> > > some relevant cookbook recipes as I plod along.
> > >
> > > Aldrin Montana
> > > Computer Science PhD Student
>
Hello!
I have recently started to look into integrating Flight RPC with Acero
source/sink node.
In Flight, the life cycle of a "read" request looks sth like:
- User specifies a URL (e.g. my_storage://my_path) and parameter (e.g.,
begin = "20220101", end = "20220201")
- Client issues GetF
g with various ways of getting the actual schema, depending on what
> exactly your service supports.) Once you have a Dataset, you can create an
> ExecPlan and proceed like normal.
>
> Of course, if you then want to get things into Python, R, Substrait,
> etc... that requires s
Hi,
I am trying to update Pyarrow from 7.0 to 9.0 and hit a couple of issues
that I believe are due to some API changes. In particular, the two issues I
saw seem to be:
(1) pyarrow.read_schema is removed
(2) pa.Table.to_batches no longer takes a keyword argument (chunksize)
What's the best way t
but just wondering in general where do
I look first if I hit this sort of issue in the future.
On Fri, Sep 9, 2022 at 12:20 PM Li Jin wrote:
> Hi,
>
> I am trying to update Pyarrow from 7.0 to 9.0 and hit a couple of issues
> that I believe are because of some API changes. In par
.0-release/
> [3]
> https://github.com/apache/arrow/blame/3eb5673597bf67246271b6c9a98e6f812d4e01a7/python/pyarrow/table.pxi#L1991
> [4]
> https://github.com/apache/arrow/blob/apache-arrow-7.0.0/python/pyarrow/__init__.py#L368
>
> On Fri, Sep 9, 2022 at 10:15 AM Li Jin wro
twork
> to get the schema on its own.
>
> Given the above, I agree with you that when the Acero node is created its
> schema would already be known.
>
>
> Yaron.
>
> From: Li Jin
> Sent: Thursday, September 1, 2022 2:49 PM
> To: dev
cept it. You would need to know the schema when configuring the
> SourceNode, but you won't need to derive from SourceNode.
>
>
> Yaron.
> ________
> From: Li Jin
> Sent: Tuesday, September 13, 2022 3:58 PM
> To: dev@arrow.apache.org
> Subje
; > and convert this into a record batch reader. Then it would create one
> > > of the node's that Yaron has contributed and return that.
> > >
> > > However, it might be nice if "open a connection to the flight
> > > endpoint" happened
Hi,
Recently I am working on adding a custom data source node to Acero and was
pointed to a few examples in the dataset code.
If I understand this correctly, the registration of the dataset exec node
currently happens when this is loaded:
https://github.com/apache/arrow/blob/master/python/pyarrow
Hello!
I am testing a custom data source node I added to Acero and found myself in
need of collecting the results from an Acero query into memory.
Searching the codebase, I found "StartAndCollect" is what many of the tests
and benchmarks are using, but I am not sure if that is the public API to d
>
> We could probably also add a DeclarationToReader method in the future.
>
> [1] https://github.com/apache/arrow/pull/13782
>
> On Wed, Sep 21, 2022 at 8:26 AM Li Jin wrote:
> >
> > Hello!
> >
> > I am testing a custom data source node I added to A
.pyx when the python module is loaded.
> I don't know cython well enough to know how exactly it triggers the
> datasets shared object to load.
>
> On Tue, Sep 20, 2022 at 11:01 AM Li Jin wrote:
> >
> > Hi,
> >
> > Recently I am working on adding a custom da
Hello!
I am working on adding a custom data source node in Acero. I have a few
previous threads related to this topic.
Currently, I am able to register my custom factory method with Acero and
create a Custom source node, i.e., I can register and execute this with
Acero:
MySourceNodeOptions sourc
is
later in favor of a more generic solution.
Thoughts?
Li
On Mon, Sep 26, 2022 at 10:58 AM Li Jin wrote:
> Hello!
>
> I am working on adding a custom data source node in Acero. I have a few
> previous threads related to this topic.
>
> Currently, I am able to register my cu
provide user configurable
> dispatching for named tables;
> if it doesn't address your use case then we might want to create a JIRA to
> extend it.
>
> On Tue, Sep 27, 2022 at 10:41 AM Li Jin wrote:
>
> > I did some more digging into this and have some ideas -
> >
>
own version of these files to build your Python module separately.
> This is where you would add a build flag for pulling in C++ header files
> for your Python module, under "python/pyarrow/include", and for making it.
>
>
> Yaron.
>
>
Hi,
I am testing integration between ibis-substrait and Acero but hit a
segmentation fault. I think this might be because the way I am
integrating these two libraries is wrong; here is my code:
class BasicTests(unittest.TestCase):
"""Test
ssed"
Looking at the plan produced by ibis-substrait, it looks like it doesn't match
the expected format of the Acero consumer. In particular, it looks like the
plan produced by ibis-substrait doesn't have a "relations" entry - any
thoughts on how this can be fixed? (I don't kno
For reference, this is the "relations" entry that I was referring to:
https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_substrait.py#L186
On Tue, Oct 4, 2022 at 3:28 PM Li Jin wrote:
> So I made some progress with updated code:
>
> t = ibis.table([
PM Will Jones wrote:
> Hi Li Jin,
>
> The original segfault seems to occur because you are passing a Python bytes
> object and not a PyArrow Buffer object. You can wrap the bytes object using
> pa.py_buffer():
>
> pa.substrait.run_query(pa.py_buffer(result_bytes), table_provide
name {names}")
reader =
pa.substrait.run_query(pa.py_buffer(result.SerializeToString()),
table_provider)
result_table = reader.read_all()
self.assertTrue(result_table == test_table_0)
First successful run with ibis/substrait/acero - Hooray
On Wed, Oct 5, 2
Disclaimer: Not ibis-substrait dev here
ibis-substrait has a "decompiler":
https://github.com/ibis-project/ibis-substrait/blob/main/ibis_substrait/tests/compiler/test_decompiler.py
that takes substrait and returns ibis expression, then you can run ibis
expression with ibis's pandas backend:
https:
Hello!
I have some questions about how "pyarrow.substrait.run_query" works.
Currently run_query returns a record batch reader. Since Acero is a
push-based model and the reader is pull-based, I'd assume the reader object
somehow accumulates the batches that are pushed to it. And I wonder
(1) Does
te batches in a queue (just like the sink node) but it is
> not handling backpressure. I've created [1] to track this.
>
> [1] https://issues.apache.org/jira/browse/ARROW-18025
>
> On Wed, Oct 12, 2022 at 9:02 AM Li Jin wrote:
> >
> > Hello!
> >
> > I have
r;
}
"""
And then calling `pa.substrait.run_query` should pick up the custom named
table provider.
Does that sound like a reasonable way to do this?
On Tue, Sep 27, 2022 at 1:59 PM Li Jin wrote:
> Thanks both. I think NamedTableProvider is close to what I want, and like
>
ate_my_custom_options())
>
> def table_provider(names):
> return custom_sources[names[0]]
>
> pa.substrait.run_query(my_plan, table_provider=table_provider)
> ```
>
> On Thu, Oct 13, 2022 at 8:24 AM Li Jin wrote:
> >
> > We did some work around this recently and
object should I return with create_my_custom_options()?
Currently I only have a C++ class for my custom option.
On Thu, Oct 13, 2022 at 12:58 PM Li Jin wrote:
> > I may be assuming here but I think your problem is more that there is
> no way to more flexibly describe a source in python and less
dFactory("my_custom_node",
MakeMyCustomNode)
...
"""
On Thu, Oct 13, 2022 at 1:32 PM Li Jin wrote:
> Weston - was trying the pyarrow approach you suggested:
>
> >def custom_source(endpoint):
> return pc.Declaration("my_custom_source", create_my_custom_o
't sound like the correct way, I am happy to do this
correctly but someone let me know the correct way :)
Li
On Thu, Oct 13, 2022 at 2:01 PM Li Jin wrote:
> Going back to the default_exec_factory_registry idea, I think ultimately
> maybe we want registration API that
Hello!
I am trying to implement an ExecNode in Acero that receives the input
batch, writes the batch to the FlightStreamWriter and then passes the batch
to the downstream node.
Looking at the API, I am thinking of doing sth like:
void InputReceived(ExecNode* input, ExecBatch batch) {
# turn
congrats!
On Thu, Oct 27, 2022 at 9:03 PM Matt Topol wrote:
> Congrats Will!
>
> On Thu, Oct 27, 2022 at 9:02 PM Ian Cook wrote:
>
> > Congratulations Will!
> >
> > On Thu, Oct 27, 2022 at 19:56 Sutou Kouhei wrote:
> >
> > > On behalf of the Arrow PMC, I'm happy to announce that Will Jones
> >
Hello,
I am working on converting some internal data sources to Arrow data. One
particularly sets of data we have contains many string columns that can be
dictionary-encoded (basically string enums)
The current internal C++ API I am using gives me an iterator of "row"
objects, for each string col
"
In this case though, it's just that we purposely hide symbols by default.
If there's a use case, we could unhide this specific symbol (we did it for
one other Protobuf symbol) which would let you externally generate and use
the headers (as long as you take care not to actually include the generat
Hello!
I have some questions about type casting memory usage with pyarrow Table.
Let's say I have a pyarrow Table with 100 columns.
(1) if I want to cast n columns to a different type (e.g., float to int).
What is the smallest memory overhead that I can do? (memory overhead of 1
column, n columns
Asking (2) because IIUC this is a metadata operation that could be zero
copy but I am not sure if this is actually the case.
On Wed, Feb 15, 2023 at 10:17 AM Li Jin wrote:
> Hello!
>
> I have some questions about type casting memory usage with pyarrow Table.
> Let's say I hav
00:00:00.09998,1970-01-01
00:00:00.0]]
On Wed, Feb 15, 2023 at 2:52 PM Rok Mihevc wrote:
> I'm not sure about (1) but I'm pretty sure for (2) doing a cast of tz-aware
> timestamp to tz-naive should be a metadata-only change.
>
> On Wed, Feb 15, 2023 at
Not sure if this is actually a bug or expected behavior - I filed
https://github.com/apache/arrow/issues/34210
On Wed, Feb 15, 2023 at 4:15 PM Li Jin wrote:
> Hmm..something feels off here - I did the following experiment on Arrow 11
> and casting timestamp-naive to int64 is much faste
Oh found this comment:
https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_cast_temporal.cc#L156
On Wed, Feb 15, 2023 at 4:23 PM Li Jin wrote:
> Not sure if this is actually a bug or expected behavior - I filed
> https://github.com/apache/arrow/issues/34210
he array is timezone aware.
>
> On Wed, Feb 15, 2023 at 10:37 PM Li Jin wrote:
>
> > Oh found this comment:
> >
> >
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_cast_temporal.cc#L156
> >
> >
> >
> > On Wed, Feb
Thanks Weston for the information.
On Thu, Feb 16, 2023 at 1:32 PM Weston Pace wrote:
> There is a little bit at the end-to-end level. One goal is to be able to
> repartition a very large dataset. This means we read from something bigger
> than memory and then write to it. This workflow is te
Hi,
I recently came across some limitations in expressing timestamp type with
Substrait in the Acero substrait consumer and am curious to hear what
people's thoughts are.
The particular issue that I have is that when specifying a timestamp type in
substrait, the unit is "microseconds" and there is no wa
Congratulations Will!
On Mon, Mar 13, 2023 at 3:27 PM Bryce Mecum wrote:
> Congratulations, Will!
>
rk
> here will be pretty easy. The trickier part might be adapting your
> producer (Ibis?)
>
> On Thu, Mar 9, 2023 at 9:43 AM Li Jin wrote:
>
> > Hi,
> >
> > I recently came across some limitations in expressing timestamp type with
> > Substrait in the Ace
Late to the party.
Thanks Weston for sharing the thoughts around Acero. We are actually a
pretty heavy Acero user right now and are trying to take part in Acero
maintenance and development. Internally we are using Acero for a time
series streaming data processing system.
I would +1 on many of Wes
Hi,
This might be a dumb question but when Arrow code raises an invalid status,
I observe that it usually pops up to the user without stack information. I
wonder if there are any tricks to show where the invalid status is coming
from?
Thanks,
Li
a rough
> stack trace (IIRC, if a function returns the status without using one of
> the macros, it won't add a line to the trace).
>
> [1]:
> https://github.com/apache/arrow/blob/1ba4425fab35d572132cb30eee6087a7dca89853/cpp/cmake_modules/DefineOptions.cmake#L608-L609
>
> On
Hello,
I recently found myself casting an int64 (nanos from epoch) into a nano
timestamp column with the C++ cast kernel (via Acero).
I expect this to be zero copy but I wonder if there is a way to check which
casts are zero copy and which are not?
Li
Thanks Rok!
The original question was asking for a way to "verify whether a cast is zero
copy by reading source code / documentation", not "verify whether a cast is
zero copy programmatically", but I noticed by reading the test file that int64 to
micro is indeed zero copy and I expect nanos to be the same
https:
his, std::move(batch))
/home/icexelloss/workspace/arrow/cpp/src/arrow/acero/hash_aggregate_test.cc:271
start_and_collect.MoveResult()
```
Is this because of the ARROW_EXTRA_ERROR_CONTEXT option?
On Fri, Mar 24, 2023 at 12:04 PM Li Jin wrote:
> Thanks David!
>
> On Tue, Mar 21, 2023 at 6:32
Thanks David!
On Tue, Apr 4, 2023 at 4:58 PM David Li wrote:
> Yes, that's what the ARROW_EXTRA_ERROR_CONTEXT option does.
>
> On Tue, Apr 4, 2023, at 11:13, Li Jin wrote:
> > Picking up this conversation again, I noticed when I hit an error in
> > test I
>
Hi,
Is there a github command to rerun CI checks? (instead of pushing a new
commit?)
Thanks,
Li
UI. If you want to avoid having
> to add small changes to be able to commit you can use empty commits via
> '--allow-empty'.
>
> On Mon, Apr 17, 2023 at 5:25 PM Li Jin wrote:
>
> > Hi,
> >
> > Is there a github command to rerun CI checks? (instead of pushing a new
> > commit?)
> >
> > Thanks,
> > Li
> >
>
>
> > The UI was recently updated:
> >
> >
> https://docs.github.com/en/actions/managing-workflow-runs/re-running-workflows-and-jobs#re-running-failed-jobs-in-a-workflow
> >
> > On Mon, Apr 17, 2023 at 7:57 PM Li Jin wrote:
> >
> >> Thanks!
r doing that, so you
> should be able to give that a try.
>
> We don't have a way of running PR checks as we do with the crossbow
> command. We could investigate if there is a way to do it via API.
>
> Thanks,
> Raúl
>
> El mar, 18 abr 2023 a las 14:47, Li Jin ()
>
Hello,
I am looking for the best way to convert Pandas DataFrame <-> Struct
Array.
Currently I have:
pa.RecordBatch.from_pandas(df).to_struct_array()
and
pa.RecordBatch.from_struct_array(s_array).to_pandas()
- I wonder if there is a direct way to go from DataFrame <-> Struct Array
withou
Gentle bump.
Not a big deal if I need to use the API above to do so, but bump in case
someone has a better way.
On Fri, Jun 9, 2023 at 4:34 PM Li Jin wrote:
> Hello,
>
> I am looking for the best ways for converting Pandas DataFrame <-> Struct
> Array.
>
dtype(df.dtypes[col])) for col in
> > df.columns]
> > pa_type = pa.struct(fields)
> > pa.array(df.itertuples(index=False), type=pa_type)
> >
> > But this seems like a classic XY problem. What is the root issue you're
> > trying to solve? Why avoid RecordBatch?
Hi,
I am trying to write a function that takes a stream of record batches
(where the last column is the group id), and produces k record batches, where
record batch k_i contains all the rows with group id == i.
Pseudocode is sth like:
def group_rows(batches, k) -> array[RecordBatch] {
builder
and I'm maybe a little uncertain what
> the difference is between this ask and the capabilities added in [1].
>
> [1] https://github.com/apache/arrow/pull/35514
>
> On Tue, Jun 13, 2023 at 8:23 AM Li Jin wrote:
>
> > Hi,
> >
> > I am trying to write a funct
(Admittedly, PR title of [1] doesn't reflect that only the scalar aggregate
UDF is implemented and not the hash one - that is an oversight on my part -
sorry)
On Tue, Jun 13, 2023 at 3:51 PM Li Jin wrote:
> Thanks Weston.
>
> I think I found what you pointed out to me before whi