Hi Li,
I've faced this issue before, and I ended up using a generic ArrayBuilder,
for example:
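```cpp
// The element type is only known at runtime.
auto type = int32();
std::vector<std::shared_ptr<Scalar>> scalars = {MakeScalar(1),
                                                MakeScalar(2)};
// Create a type-erased builder for the runtime type.
ARROW_ASSIGN_OR_RAISE(std::unique_ptr<ArrayBuilder> builder,
                      MakeBuilder(type));
// Append all scalars in one call.
ARROW_RETURN_NOT_OK(builder->AppendScalars(scalars));
```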
+ dev@parquet
On Fri, Jun 16, 2023 at 7:43 AM Jacob Wujciak-Jens
wrote:
> +1 on the update but also on properly communicating the change to avoid
> surprising issues :)
>
> On Thu, Jun 15, 2023 at 7:53 PM Joris Van den Bossche <
> jorisvandenboss...@gmail.com> wrote:
>
> > On Thu, 15 Jun 2023 at
Hi Ben,
The posted benchmark [1] looks pretty good to me. However, I want to
raise a possible issue from the perspective of parquet-cpp. Parquet-cpp
uses a customized parquet::ByteArray type [2] for string/binary, I would
expect some regression of conversions between parquet reader/writer
and the
Note that you can ask pyarrow how much memory it thinks it is using with
the pyarrow.total_allocated_bytes[1] function. This can be very useful for
tracking memory leaks.
I see that memory-profiler now has support for different backends. Sadly,
it doesn't look like you can register a custom backe
Hi,
Ah, sorry. I should have written it in the original e-mail.
If we can require CMake 3.16+:
* We can always use the precompiled headers feature that
reduces build time:
https://github.com/apache/arrow/pull/35921/files#diff-1bba462ab050e89360fd88110a689e85ee037749cea091a1848ab574381d3795L
Hello,
I would like to propose the following release candidate (RC0) of Apache Arrow
ADBC version 0.5.0. This is a release consisting of 36 resolved GitHub issues
[1].
This release candidate is based on commit:
ac0e0ef8bd83787f65e53d421fce6ad490d9a37d [2]
The source release rc0 is hosted at [
> Even if ListView is rarely used for interoperability (if it never gains
> wide adoption), some of the arrow implementations could use ListView to
> offer faster computation kernels, which I think has real value
This is an important point, thanks for the clear phrasing Andrew!
On Thu, Jun 15, 2023 a
+1 on the update but also on properly communicating the change to avoid
surprising issues :)
On Thu, Jun 15, 2023 at 7:53 PM Joris Van den Bossche <
jorisvandenboss...@gmail.com> wrote:
> On Thu, 15 Jun 2023 at 19:08, Ian Cook wrote:
> >
> > It will still be possible to write files using Parquet
+1 on 3.16 and dropping Amazon Linux 2 (as that is recommended by AWS).
@Antoine 3.14+ has a number of improvements to FetchContent that we could
use to vastly improve our bundled dependency system. There are also
improvements to precompiled headers etc. An overview of some of the changes
in each
Hello!
The benchmark executables are placed in the same directory as the other
test executables:
https://github.com/apache/arrow/blob/b4ac585ecb4da610cc64e346e564ca86594aec53/cpp/cmake_modules/BuildUtils.cmake#L614.
This means that if somebody builds the benchmarks with
`ARROW_BUILD_BENCHMARK=ON
Hi,
I'd ask the question differently: what do we gain from requiring 3.16
rather than 3.13?
On 15/06/2023 at 23:19, Sutou Kouhei wrote:
Hi,
We require CMake 3.5+ now because Ubuntu 18.04 ships 3.5.
We dropped support for Ubuntu 18.04 because it reached EOL.
Can we require CMake 3.16+ i
Hi,
I find myself in need of a function to turn a vector of Scalars into an Array
of the same data type. The data type is known at runtime, e.g.
shared_ptr<Array> concat_scalars(vector<shared_ptr<Scalar>> values,
                                 shared_ptr<DataType> type);
I wonder if I need to use something like Scalar::Accept(ScalarVisitor*) or is
there an easier/bett
Hi,
We require CMake 3.5+ now because Ubuntu 18.04 ships 3.5.
We dropped support for Ubuntu 18.04 because it reached EOL.
Can we require CMake 3.16+ in Apache Arrow C++ 13.0.0?
Here are CMake versions of our supported platforms:
* Ubuntu 20.04: CMake 3.16
* CentOS 7: CMake 3.17
* Debian GNU/Lin
Cool. Thanks for doing that!
On Thu, Jun 15, 2023 at 12:40 Benjamin Kietzman wrote:
> I've added https://github.com/apache/arrow/issues/36112 to track
> deduplication of buffers on write.
> I don't think it would require modification of the IPC format.
>
> Ben
>
> On Thu, Jun 15, 2023 at 1:30 PM
I've added https://github.com/apache/arrow/issues/36112 to track
deduplication of buffers on write.
I don't think it would require modification of the IPC format.
Ben
On Thu, Jun 15, 2023 at 1:30 PM Matt Topol wrote:
> Based on my understanding, in theory a buffer *could* be shared within a
> b
On Thu, 15 Jun 2023 at 19:08, Ian Cook wrote:
>
> It will still be possible to write files using Parquet 2.4 by
> explicitly specifying the 2.4 version to the Parquet writer, correct?
> If yes, that provides a simple workaround for users who encounter
> compatibility issues.
Indeed. Using the pya
Based on my understanding, in theory a buffer *could* be shared within a
batch since the flatbuffers message just uses an offset and length to
identify the buffers.
That said, I don't believe any current implementation actually does this or
takes advantage of this in any meaningful way.
--Matt
O
On 2023/06/15 16:24:44 Joris Van den Bossche wrote:
> Hi all,
>
> Bringing up https://github.com/apache/arrow/issues/35746 to the
> mailing list: this issue proposes to bump the default Parquet version
> we use for writing to Parquet files in the C++ library (and in the
> various bindings including
It will still be possible to write files using Parquet 2.4 by
explicitly specifying the 2.4 version to the Parquet writer, correct?
If yes, that provides a simple workaround for users who encounter
compatibility issues.
However we should take care to document this as a potentially breaking
change,
Hi Ben,
It's exciting to see this move along.
> The buffers will be duplicated. If buffer duplication becomes a concern,
> I'd prefer to handle that in the ipc writer. Then buffers which are
> duplicated could be detected by checking pointer identity and written
> only once.
Question: to be
Hi all,
Bringing up https://github.com/apache/arrow/issues/35746 to the
mailing list: this issue proposes to bump the default Parquet version
we use for writing to Parquet files in the C++ library (and in the
various bindings including pyarrow and R arrow) from the current
default of "2.4" to "2.6
Hi Alex,
I think you're misinterpreting the results. Yes, the RSS memory (as
reported by memory_profiler) doesn't seem to decrease. No, it doesn't
mean that Arrow doesn't release memory. It's actually common for memory
allocators (such as jemalloc, or the system allocator) to keep
deallocat
Hello again all,
The PR [1] to add string view to the format and the C++ implementation is
hovering around passing CI and has been undrafted. Furthermore, there is
now also a PR [2] to add string view to the Go implementation. Code review
is underway for each PR and I'd like to move toward a vote
Hi Experts,
I came across the memory pool configuration via the environment
variable *ARROW_DEFAULT_MEMORY_POOL* and tried it out.
I observed improvements on macOS with the *system* memory pool but no
change on Linux. I have captured more details on GH is
On Wed, Jun 14, 2023 at 5:07 PM Raphael Taylor-Davies
wrote:
> Even something relatively straightforward becomes a huge implementation
> effort when multiplied by a large number of codebases, users and
> datasets. Parquet is a great source of historical examples of the
> challenges of incremental
On Thu, Jun 15, 2023 at 7:19 AM Andy Grove wrote:
> The vote passes with 4 +1 votes (3 binding). Thanks, everyone.
>
> Source:
> https://dist.apache.org/repos/dist/release/arrow/arrow-datafusion-python-26.0.0
>
> PyPi: https://pypi.org/project/datafusion/26.0.0/
>
> On Mon, Jun 12, 2023 at 6:26 A
The vote passes with 4 +1 votes (3 binding). Thanks, everyone.
Source:
https://dist.apache.org/repos/dist/release/arrow/arrow-datafusion-python-26.0.0
PyPi: https://pypi.org/project/datafusion/26.0.0/
On Mon, Jun 12, 2023 at 6:26 AM Jeremy Dyer wrote:
> +1 (non-binding)
>
> Verified using veri
I want to be clear: insofar as ListView makes using the arrow libraries
more attractive to system developers, I am in favor of adding it.
Arrow the specification is focused on interoperability. Arrow the libraries
(specifically the compute kernels included in many implementations) also
offer fas