> 2. (Less likely) It could have something to do with whether exceptions
> have been enabled when compiling the parquet library (Arrow catches
> exceptions and translates them to statuses).
>
>
> -Micah
>
> On Sat, May 21, 2022 at 9:55 PM Rares Vernica wrote:
>
> > Hello,
> >
Hello,
We have a plugin that writes both Arrow and Parquet format files. We are
experiencing an issue when using Parquet format, while Arrow format works
just fine. More precisely, the process crashes in parquet::arrow::WriteTable.
Using gdb we identified the line where the process crashes:
https://g
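For reference, a minimal sketch of a Status-checked call to
parquet::arrow::WriteTable; the table, path, and chunk size are illustrative
placeholders, and a hard crash inside WriteTable usually means an exception
escaped rather than an error Status being returned:

#include <arrow/io/file.h>
#include <arrow/memory_pool.h>
#include <arrow/result.h>
#include <arrow/table.h>
#include <parquet/arrow/writer.h>

// A minimal sketch, assuming `table` is a valid arrow::Table.
arrow::Status WriteParquet(const std::shared_ptr<arrow::Table>& table,
                           const std::string& path) {
  ARROW_ASSIGN_OR_RAISE(auto sink, arrow::io::FileOutputStream::Open(path));
  return parquet::arrow::WriteTable(*table, arrow::default_memory_pool(),
                                    sink, /*chunk_size=*/65536);
}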
Hi Dragos,
It still fails after setting the environment variable. Here is the log.
Cheers,
Rares
Setup: centos:7 Docker container, R and related packages installed with yum
/> cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)
/> export ARROW_R_DEV=true
/> R
R version 3.6.0 (2019-04-2
Hello,
I'm trying to do install.packages("arrow") in R 3.6.0 on CentOS 7 and it
errors out like this:
$ cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)
$ R
R version 3.6.0 (2019-04-26) -- "Planting of a Tree"
> install.packages("arrow")
...
** building package indices
** installing
cmake#L20
>
> On Thu, Sep 23, 2021 at 6:18 PM Rares Vernica wrote:
> >
> > Hello,
> >
> > I managed to get Thrift 0.12.0 compiled and installed from source on my
> > CentOS 7 setup. I configured it like so, mimicking what
> > ThirdpartyToolchain
CentOS 7
On Mon, Sep 27, 2021 at 10:06 PM Benson Muite
wrote:
> Hi Rares,
> What operating system are you using?
> Benson
> On 9/28/21 7:38 AM, Rares Vernica wrote:
> > Hello,
> >
> > I'm still struggling to build Arrow with Parquet. I compiled Thrif
Hello,
I'm still struggling to build Arrow with Parquet. I compiled Thrift myself
but I'm running into dependency issues with Boost.
It looks like the Boost download URL provided in ThirdpartyToolchain.cmake
here
https://github.com/apache/arrow/blob/ef4e92982054fcc723729ab968296d799d3108dd/cpp/cm
a
-rwxr-xr-x. 1 root root 939 Sep 23 23:09 libthriftz.la
drwxr-xr-x. 2 root root 4096 Sep 23 23:09 pkgconfig
How could I make Arrow's cmake pick them up instead of trying to get
Thrift again?
Thank you!
Rares
On Wed, Sep 22, 2021 at 5:33 PM Rares Vernica wrote:
> Eduardo,
apache.org/jira/browse/THRIFT-2559
> [3]
> https://centos.pkgs.org/7/epel-x86_64/thrift-0.9.1-15.el7.x86_64.rpm.html
> [4] https://github.com/apache/arrow/pull/4558
>
> On Mon, Sep 20, 2021 at 11:56 PM Rares Vernica wrote:
>
> > Hello,
Hello,
I'm compiling the C++ library for Arrow 3.0.0 in CentOS 7. It works fine,
but it breaks if I set ARROW_PARQUET=ON. It stops while trying to build
thrift_ep
> scl enable devtoolset-3 "cmake3 ..
\
-DARROW_PARQUET=ON
Hello,
I'm storing RecordBatch objects in a local cache to improve performance. I
want to keep track of the memory usage to stay within bounds. The arrays
stored in the batch are not nested.
The best way I came up with to compute the size of a RecordBatch is:
size_t arrowSize = 0;
pos_buffers,
delta_data->null_count);
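The snippet above is truncated; a hedged sketch of the buffer-summing
approach it appears to describe (helper name is hypothetical; shared or
sliced buffers may be double-counted):

#include <arrow/array/data.h>
#include <arrow/buffer.h>
#include <arrow/record_batch.h>

// Sum the sizes of the buffers referenced by each column's ArrayData.
// For flat (non-nested) arrays this approximates resident memory.
int64_t RecordBatchBytes(const arrow::RecordBatch& batch) {
  int64_t total = 0;
  for (int i = 0; i < batch.num_columns(); ++i) {
    for (const auto& buffer : batch.column_data(i)->buffers) {
      if (buffer) total += buffer->size();
    }
  }
  return total;
}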
On Mon, Aug 2, 2021 at 5:45 PM Antoine Pitrou wrote:
> On Fri, 30 Jul 2021 18:55:33 +0200
> Rares Vernica wrote:
> > Hello,
> >
> > I have a RecordBatch that I read from an IPC file. I need to run a
> > cumulative sum
Hello,
I'm using RecordBatch::AddColumn to update a RecordBatch. Something like
this:
std::shared_ptr<arrow::RecordBatch> rb;
...
rb = rb->AddColumn(...)
Since AddColumn creates a new RecordBatch, is the memory taken by rb before
the assignment freed as expected?
Thanks!
Rares
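A sketch of the ownership behavior being asked about, using the Arrow 1.0+
Result-returning signature (helper name hypothetical):

#include <arrow/record_batch.h>
#include <arrow/result.h>

// AddColumn returns a new RecordBatch that shares the existing column
// buffers. Reassigning the shared_ptr drops the last reference to the old
// batch object, so it is freed as expected; the shared buffers live on
// inside the new batch.
arrow::Status AppendColumn(std::shared_ptr<arrow::RecordBatch>& rb,
                           const std::shared_ptr<arrow::Field>& field,
                           const std::shared_ptr<arrow::Array>& column) {
  ARROW_ASSIGN_OR_RAISE(rb, rb->AddColumn(rb->num_columns(), field, column));
  return arrow::Status::OK();
}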
Hello,
I have a RecordBatch that I read from an IPC file. I need to run a
cumulative sum on one of the int64 arrays in the batch. I tried to do:
std::shared_ptr<arrow::ArrayData> pos_data = batch->column_data(nAtts);
auto pos_values = pos_data->GetMutableValues<int64_t>(1);
for (auto i = 1; i < pos_data->length; i++)
    pos_values[i] += pos_values[i - 1];
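If mutating in place is unsafe (for example, when the batch is backed by a
memory-mapped or otherwise immutable IPC buffer), a non-mutating sketch that
builds a fresh array instead, assuming no nulls:

#include <arrow/array.h>
#include <arrow/builder.h>
#include <arrow/result.h>

// Compute the cumulative sum into a new Int64Array rather than writing
// through GetMutableValues, which is only safe when the batch uniquely
// owns a mutable data buffer.
arrow::Result<std::shared_ptr<arrow::Array>> CumulativeSum(
    const arrow::Int64Array& in) {
  arrow::Int64Builder builder;
  ARROW_RETURN_NOT_OK(builder.Reserve(in.length()));
  int64_t running = 0;
  for (int64_t i = 0; i < in.length(); ++i) {
    running += in.Value(i);  // assumes no nulls in the input column
    builder.UnsafeAppend(running);
  }
  return builder.Finish();
}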
PS: a is an integer array and b is an integer scalar.
On Wed, Jul 28, 2021 at 1:04 PM Rares Vernica wrote:
> Hello,
>
> I'm making use of the Compute Functions to do some basic arithmetic. One
> operation I need to perform is the modulo, i.e., a % b. I'm debating
> b
Hello,
I'm making use of the Compute Functions to do some basic arithmetic. One
operation I need to perform is the modulo, i.e., a % b. I'm debating
between two options:
1. Compute it with the available Compute Functions as a % b = a - a / b
* b, where / is integer division. I assume that
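A sketch of option 1 against the Arrow 3.0-era compute API, which had no
dedicated modulo kernel; "divide", "multiply", and "subtract" are the
generic kernels, and integer "divide" truncates (and errors on division by
zero):

#include <arrow/compute/api.h>
#include <arrow/result.h>

// a % b computed as a - (a / b) * b; helper name is hypothetical.
arrow::Result<arrow::Datum> Modulo(const arrow::Datum& a,
                                   const arrow::Datum& b) {
  namespace cp = arrow::compute;
  ARROW_ASSIGN_OR_RAISE(arrow::Datum quot, cp::CallFunction("divide", {a, b}));
  ARROW_ASSIGN_OR_RAISE(arrow::Datum prod, cp::CallFunction("multiply", {quot, b}));
  return cp::CallFunction("subtract", {a, prod});
}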
this could be fixed by adding default values:
>
> struct DayMilliseconds {
> int32_t days = 0;
> int32_t milliseconds = 0;
> ...
> };
>
> In the meantime, you would have to suppress the warning in the
> compiler where it's happening
>
> On Tue, Jul 27, 2
Hello,
I'm getting a handful of warnings when including arrow/builder.h. Is this
expected? Should I use the suggested -W flag?
In file included from
/opt/apache-arrow/include/arrow/array/builder_dict.h:29:0,
from /opt/apache-arrow/include/arrow/builder.h:26,
/opt/apache-arrow/inc
Hi,
I'm trying the example in the Compute Functions user guide
https://arrow.apache.org/docs/cpp/compute.html#invoking-functions
std::shared_ptr<arrow::Array> numbers_array = ...;
std::shared_ptr<arrow::Scalar> increment = ...;
arrow::Datum incremented_datum;
ARROW_ASSIGN_OR_RAISE(incremented_datum,
                      arrow::compute::CallFunction("add", {numbers_array, increment}));
Awesome! We would find C++ versions of these recipes very useful. In our
experience, the C++ API is much harder to deal with and more error-prone
than the R/Python ones.
Cheers,
Rares
On Wed, Jul 7, 2021 at 9:07 AM Alessandro Molina <
alessan...@ursacomputing.com> wrote:
> Yes, that was mostly w
> sudo sed -i'' -e 's,bintray.com,jfrog.io/artifactory,'
> /etc/apt/sources.list.d/apache-arrow.sources
> sudo apt install -y -V libarrow-dev
>
>
> Thanks,
> --
> kou
>
> In
> "Re: Xenial 3.0.0 packages | Bintray" on Tue, 6 Jul 2
. We don't provide newer packages for Xenial.
>
>
> Thanks,
> --
> kou
>
> In
> "Xenial 3.0.0 packages | Bintray" on Tue, 6 Jul 2021 13:11:33 -0700,
> Rares Vernica wrote:
>
> > Hello,
> >
> > I realize that newer packages are on jfrog.
Hello,
I realize that newer packages are on jfrog.io. Until last week, I was still
able to use bintray.com for Xenial packages of 3.0.0. Today
https://apache.bintray.com/arrow/ returns forbidden. Is this temporary? If
not, are these Xenial packages available somewhere else?
Thank you!
Rares
are trying to use the buffer level compression described
> by the specification? If so only LZ4_FRAME is currently allowed [1]
>
> [1] https://github.com/apache/arrow/blob/master/format/Message.fbs#L45
>
>
> On Tue, Jun 22, 2021 at 12:28 PM Rares Vernica wrote:
>
> > Hell
Hello,
Using Arrow 3.0.0 I tried to compress a stream with LZ4 and got this error
message:
NotImplemented: Streaming compression unsupported with LZ4 raw format. Try
using LZ4 frame format instead.
Is it because LZ4 raw was not enabled when the .so was compiled, or is it
actually not implemented?
Is L
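For reference, a sketch of the suggested workaround — the LZ4 frame codec,
unlike LZ4 raw, supports streaming compression (path and payload are
illustrative):

#include <arrow/io/compressed.h>
#include <arrow/io/file.h>
#include <arrow/result.h>
#include <arrow/util/compression.h>

// The Codec must stay alive while the stream is in use, so the stream is
// closed before `codec` goes out of scope.
arrow::Status WriteLz4(const std::string& path) {
  ARROW_ASSIGN_OR_RAISE(auto codec,
                        arrow::util::Codec::Create(arrow::Compression::LZ4_FRAME));
  ARROW_ASSIGN_OR_RAISE(auto sink, arrow::io::FileOutputStream::Open(path));
  ARROW_ASSIGN_OR_RAISE(auto stream,
                        arrow::io::CompressedOutputStream::Make(codec.get(), sink));
  ARROW_RETURN_NOT_OK(stream->Write("example payload", 15));
  return stream->Close();
}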
efault compiler for that CentOS version.
>
> Regards
>
> Antoine.
>
>
>
> If there were the package
> > maintainer bandwidth, having both devtoolset-gcc and system-gcc
> > pre-built RPMs would be potentially interesting (but there are so many
> > devtoolsets, whic
> Thanks,
> --
> kou
>
> In
> "Re: C++ Segmentation Fault RecordBatchReader::ReadNext in CentOS only"
> on Wed, 9 Jun 2021 21:39:04 -0700,
> Rares Vernica wrote:
>
> > I got the apache-arrow-4.0.1 source and compiled it with the Debug flag.
> No
>
rce location on segmentation fault.
>
> Thanks,
> --
> kou
>
> In
> "C++ Segmentation Fault RecordBatchReader::ReadNext in CentOS only" on
> Tue, 8 Jun 2021 12:01:27 -0700,
> Rares Vernica wrote:
>
> > Hello,
> >
> > We recently migrated our C+
Hello,
We recently migrated our C++ Arrow code from 0.16 to 3.0.0. The code works
fine on Ubuntu, but we get a segmentation fault in CentOS while reading
Arrow Record Batch files. We can successfully read the files from Python or
Ubuntu, so the files and the writer are fine.
We use Record Batch St
THROW_NOT_OK((result_name).status()); \
> > > lhs = std::move(result_name).ValueUnsafe();
> > > #define ASSIGN_OR_THROW(lhs, rexpr) \
> > > ASSIGN_OR_THROW_IMPL(_maybe ## __COUNTER__, lhs, rexpr)
> > >
> > > Then lines such as
> >
Hello,
We are trying to migrate from Arrow 0.16.0 to a newer version, hopefully up
to 4.0.0. The Arrow 0.17.0 change in AllocateBuffer from taking a
shared_ptr<Buffer> to returning a unique_ptr<Buffer> is making things very
difficult. We wonder if there is a strong reason behind the change from
shared_ptr to uniq
create the
> RecordBatch (from one thread) which will force the boxed columns to
> materialize.
>
> -Weston
>
> On Thu, May 20, 2021 at 11:40 AM Wes McKinney wrote:
> >
> > Also, is it possible that the field is not an Int64Array?
> >
> > On Wed, May 19, 202
Hello,
Just a clarifying question, when a CompressedOutputStream is used with
RecordBatchStreamWriter, are the composing Arrow arrays compressed
independently, or is the entire output file compressed at once?
For example if we use GZIP, is the resulting file a valid GZIP file that we
can uncompres
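To make the question concrete, a sketch of the wiring: the
CompressedOutputStream wraps the entire IPC byte stream, so the bytes on
disk form one continuous GZIP stream rather than independently compressed
arrays (helper name hypothetical; MakeStreamWriter is the modern spelling):

#include <arrow/io/compressed.h>
#include <arrow/io/file.h>
#include <arrow/ipc/writer.h>
#include <arrow/record_batch.h>
#include <arrow/result.h>
#include <arrow/util/compression.h>

arrow::Status WriteGzippedStream(const std::shared_ptr<arrow::RecordBatch>& batch,
                                 const std::string& path) {
  ARROW_ASSIGN_OR_RAISE(auto codec,
                        arrow::util::Codec::Create(arrow::Compression::GZIP));
  ARROW_ASSIGN_OR_RAISE(auto sink, arrow::io::FileOutputStream::Open(path));
  ARROW_ASSIGN_OR_RAISE(auto gzip,
                        arrow::io::CompressedOutputStream::Make(codec.get(), sink));
  ARROW_ASSIGN_OR_RAISE(auto writer,
                        arrow::ipc::MakeStreamWriter(gzip, batch->schema()));
  ARROW_RETURN_NOT_OK(writer->WriteRecordBatch(*batch));
  ARROW_RETURN_NOT_OK(writer->Close());
  return gzip->Close();
}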
Is there a better (safer) way of accessing a specific Int64 cell in a
RecordBatch? Currently I'm doing something like this:
std::static_pointer_cast<arrow::Int64Array>(batch->column(i))->raw_values()[j]
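One marginally safer sketch — a checked cast instead of a static one, plus
a null guard, at the cost of a dynamic_cast per access (helper name
hypothetical):

#include <cassert>
#include <memory>
#include <arrow/array.h>
#include <arrow/record_batch.h>

// dynamic_pointer_cast returns nullptr on a type mismatch instead of
// silently misreading memory; IsNull guards slots whose raw storage is
// undefined.
int64_t GetInt64(const arrow::RecordBatch& batch, int i, int64_t j) {
  auto array = std::dynamic_pointer_cast<arrow::Int64Array>(batch.column(i));
  assert(array != nullptr && !array->IsNull(j));
  return array->Value(j);
}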
On Wed, May 19, 2021 at 3:09 PM Rares Vernica wrote:
> > /opt/rh/devtoolset-3/root/usr/b
vial caching which
> uses std::atomic_load[1] which is not implemented properly on gcc < 5
> so our behavior is different depending on the compiler version.
>
> [1] https://en.cppreference.com/w/cpp/atomic/atomic_load
>
> On Wed, May 19, 2021 at 10:15 AM Rares Vernica wrote:
>
Hello,
I'm using Arrow for accessing data outside the SciDB database engine. It
generally works fine, but we are running into segmentation faults in a
multi-threaded corner case. I identified two threads that work on the same
Record Batch. I wonder if there is something internal about RecordBatch
t
", line 97, in pyarrow.lib.check_status
OSError: [Errno 14] Error writing bytes to file. Detail: [errno 14] Bad
address
Cheers,
Rares
On Mon, Dec 14, 2020 at 12:30 AM Antoine Pitrou wrote:
>
> Hello Rares,
>
> Is there a complete reproducer that we may try out?
>
> Re
Hello,
As part of a test, I'm reading a record batch from an Arrow file,
re-batching the data in smaller batches, and writing back the result to the
same file. I'm getting an unexpected Bad address error, and I wonder what
I'm doing wrong.
reader = pyarrow.open_stream(fn)
tbl = reader.read_all()
Hi Antoine,
On Tue, Nov 17, 2020 at 2:34 AM Antoine Pitrou wrote:
>
> Le 17/11/2020 à 03:34, Rares Vernica a écrit :
> >
> > I'm using an arrow::io::BufferReader and
> > arrow::ipc::RecordBatchStreamReader to read an arrow::RecordBatch from a
> > file. There i
Hello,
I'm using an arrow::io::BufferReader and
arrow::ipc::RecordBatchStreamReader to read an arrow::RecordBatch from a
file. There is only one batch in the file, so I do a single
RecordBatchStreamReader::ReadNext call. I store the populated RecordBatch
in memory for reuse (cache). The memory buffe
Hello,
I have a set of integer tuples that need to be collected and sorted at a
coordinator. Here is an example with tuples of length 2:
[(1, 10),
(1, 15),
(2, 10),
(2, 15)]
I am considering storing each column in an Arrow array, e.g., [1, 1, 2, 2]
and [10, 15, 10, 15], and have the Arrow arr
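A sketch of building one such column with Int64Builder (values match the
example above; helper name hypothetical):

#include <memory>
#include <vector>
#include <arrow/array.h>
#include <arrow/builder.h>
#include <arrow/result.h>

// One Arrow array per tuple position, built from a plain vector.
arrow::Result<std::shared_ptr<arrow::Array>> MakeColumn(
    const std::vector<int64_t>& values) {
  arrow::Int64Builder builder;
  ARROW_RETURN_NOT_OK(builder.AppendValues(values));
  return builder.Finish();
}

// Usage for the two-tuple example:
//   auto first = MakeColumn({1, 1, 2, 2});
//   auto second = MakeColumn({10, 15, 10, 15});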
the fact that '1' is one byte
> in Py2.7 and 'foo' is 3 bytes). Try passing an open file handle
> instead
>
> On Tue, Jun 16, 2020 at 11:28 AM Rares Vernica wrote:
> >
> > Thank you for your help in getting to the bottom of this. It seems that
>
Thanks!
Rares
On Mon, Jun 15, 2020 at 10:55 PM Micah Kornfield
wrote:
> Hi Rares,
> This last issue sounds like you are trying to write data from 0.16.0
> version of the library and read it from a pre-0.15.0 version of the python
> library. If you want to do this you need to set "bool
> wri
4, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Expected to read 1886221359 metadata bytes, but
only read 4
On Mon, Jun 15, 2020 at 10:08 PM Wes McKinney wrote:
> On Mon, Jun 15, 2020 at 11:24 PM Rares Vernica wrote:
> >
> > I was able to reproduce my issue in a small,
ional
Section: libdevel
Installed-Size: 38738
Maintainer: Apache Arrow Developers
Architecture: amd64
Multi-Arch: same
Source: apache-arrow
Version: 0.17.1-1
Depends: libarrow17 (= 0.17.1-1)
> g++ --version
g++ (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
Does this make sense?
Cheers,
Rares
you should set a breakpoint in this function
> and see if for some reason started_ is true on the first invocation
> (in which case it makes me wonder if there is something
> not-fully-C++11-compliant about your toolchain).
>
> Otherwise I'm a bit stumped since there are lots of
owStream->Close());
On Mon, Jun 15, 2020 at 6:26 AM Wes McKinney wrote:
> Can you show the code you are writing? The first thing the stream writer
> does before writing any record batch is write the schema. It sounds like
> you are using arrow::ipc::WriteRecordBatch somewhere.
>
Hello,
I have a RecordBatch that I would like to write to a file. I'm using
FileOutputStream::Open to open the file and RecordBatchStreamWriter::Open
to open the stream. I write a record batch with WriteRecordBatch. Finally,
I close the RecordBatchWriter and OutputStream.
The resulting file size
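A sketch of that open/write/close sequence in the Result-returning spelling
of newer releases (0.16 spelled the writer RecordBatchStreamWriter::Open,
but the shape is the same):

#include <arrow/io/file.h>
#include <arrow/ipc/writer.h>
#include <arrow/record_batch.h>
#include <arrow/result.h>

arrow::Status WriteBatchToFile(const std::shared_ptr<arrow::RecordBatch>& batch,
                               const std::string& path) {
  ARROW_ASSIGN_OR_RAISE(auto sink, arrow::io::FileOutputStream::Open(path));
  ARROW_ASSIGN_OR_RAISE(auto writer,
                        arrow::ipc::MakeStreamWriter(sink, batch->schema()));
  ARROW_RETURN_NOT_OK(writer->WriteRecordBatch(*batch));
  ARROW_RETURN_NOT_OK(writer->Close());  // emits the end-of-stream marker
  return sink->Close();
}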
hen you run
> ARROW_RETURN_NOT_OK(arrowBatch->Validate())?
>
> On Sun, Jun 14, 2020 at 2:09 PM Rares Vernica wrote:
> >
> > Hello,
> >
> > I'm porting a C++ program from Arrow 0.9.0 to 0.16.0. The *sender* uses
> > BufferOutputStream and RecordBatchWri
Hello,
I'm porting a C++ program from Arrow 0.9.0 to 0.16.0. The *sender* uses
BufferOutputStream and RecordBatchWriter to serialize a set of Arrow
arrays. The *receiver* uses BufferReader and RecordBatchReader to
deserialize them. I get the runtime error *Array length did not match
record batch l
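A round-trip sketch of such a sender/receiver pair (modern spellings;
pairing the stream writer with the matching stream reader ensures the
schema message is consumed before any batch is read):

#include <memory>
#include <arrow/io/memory.h>
#include <arrow/ipc/reader.h>
#include <arrow/ipc/writer.h>
#include <arrow/record_batch.h>
#include <arrow/result.h>

arrow::Status RoundTrip(const std::shared_ptr<arrow::RecordBatch>& in,
                        std::shared_ptr<arrow::RecordBatch>* out) {
  ARROW_ASSIGN_OR_RAISE(auto sink, arrow::io::BufferOutputStream::Create());
  ARROW_ASSIGN_OR_RAISE(auto writer,
                        arrow::ipc::MakeStreamWriter(sink, in->schema()));
  ARROW_RETURN_NOT_OK(writer->WriteRecordBatch(*in));
  ARROW_RETURN_NOT_OK(writer->Close());
  ARROW_ASSIGN_OR_RAISE(auto buffer, sink->Finish());
  auto source = std::make_shared<arrow::io::BufferReader>(buffer);
  ARROW_ASSIGN_OR_RAISE(auto reader,
                        arrow::ipc::RecordBatchStreamReader::Open(source));
  return reader->ReadNext(out);
}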
://issues.apache.org/jira/browse/ARROW-921
> some time ago about adding some tools to integration test one version
> versus another to obtain hard proof of this, but this work has not
> been completed yet (any takers?).
>
> Have you encountered any problems?
>
> Thanks,
Hello,
I have a C++ library using Arrow 0.9.0 to serialize data. The code looks
like this:
std::shared_ptr<arrow::RecordBatch> arrowBatch;
arrowBatch = arrow::RecordBatch::Make(_arrowSchema, nCells, _arrowArrays);
std::shared_ptr<arrow::PoolBuffer> arrowBuffer(new
arrow::PoolBuffer(_arrowPool));
arrow::io::BufferOutputStream arrowStre
gs and Regex issue" on Tue, 11 Dec 2018 22:53:58 -0800,
> Rares Vernica wrote:
>
> > Hi,
> >
> > Unfortunately we need to stay on CentOS 6 for now.
> >
> > We have a locally built libboost-devel-1.54 for CentOS 6 which installs
> in
> > a custom loca
on CentOS 6. Because system Boost
> is old. It's better that you upgrade to CentOS 7.
>
> Thanks,
> --
> kou
>
> In
> "Re: C++ buildings and Regex issue" on Tue, 11 Dec 2018 22:07:20 -0800,
> Rares Vernica wrote:
>
> > Wes,
> >
> > >
n each part of your application, and in the Arrow
> + Parquet libraries
>
> 0.9.0 is over 1000 patches ago. I'd recommend that you try to upgrade
>
> $ git hist apache-arrow-0.9.0..master | wc -l
> 1540
>
> - Wes
> On Tue, Dec 11, 2018 at 10:58 PM Rares Vernica wrote:
>
Hello,
We are using the C++ bindings of Arrow 0.9.0 on our system on CentOS. Once
we load the Arrow library, our regular regex calls (outside of Arrow)
misbehave and trigger some unknown crashes. We are still trying to figure
things out, but I was wondering if there are any known issues regarding re
on Linux?
>
> On Fri, Aug 17, 2018 at 11:47 PM, Rares Vernica
> wrote:
> > Hello,
> >
> > I see the latest 0.9.0 version of pyarrow is 0.9.0.post1
> > https://pypi.org/project/pyarrow/0.9.0.post1/#files but I can't convince
> > pip to install it. Do
Hello,
I see the latest 0.9.0 version of pyarrow is 0.9.0.post1
https://pypi.org/project/pyarrow/0.9.0.post1/#files but I can't convince
pip to install it. Do you have any clue on what might be going wrong?
> pip --version
pip 9.0.3 from /usr/lib/python2.7/site-packages (python 2.7)
> pip install
Hi,
The docs suggest that a RecordBatch is a collection of equal-length array
instances. It appears that this is not enforced and one could build a
RecordBatch from arrays of different lengths. Is this intentional?
Here is an example:
>>> b = pyarrow.RecordBatch.from_arrays(
[pyarrow.array([1,
ple.
>
> See https://gist.github.com/alendit/c6cdd1adaf7007786392731152d3b6b9
>
> Cheers,
> Dimitri.
>
> On Tue, Apr 17, 2018 at 3:52 AM, Rares Vernica wrote:
>
> > Hi,
> >
> > I'm writing a batch of records to a stream and I want to read them
> la
Hi,
I'm writing a batch of records to a stream and I want to read them later. I
notice that if I use the RecordBatchStreamWriter class to write them and
then ReadRecordBatch function to read them, I get a Segmentation Fault.
On the other hand, if I use the RecordBatchFileWriter class to write the
ne. We actually should build
> the packages with the system one:
> https://issues.apache.org/jira/browse/ARROW-2383
>
> On Tue, Apr 3, 2018, at 7:06 AM, Rares Vernica wrote:
> > Hi Wes,
> >
> > That did it! Thanks so much for the pointer.
> >
> > BTW, enjoy you
that's
> part of -DARROW_ORC=on
>
> Wes
>
> On Sun, Apr 1, 2018 at 9:48 PM, Rares Vernica wrote:
> > Hello,
> >
> > I'm using libarrow.so to build a simplified SciDB "plugin". The plugin
> gets
> > loaded dynamically into a running SciDB i
Hello,
I'm using libarrow.so to build a simplified SciDB "plugin". The plugin gets
loaded dynamically into a running SciDB instance. The plugin just does
arrow::default_memory_pool(). With Arrow 0.8.0, the plugin gets loaded
successfully and can be used in SciDB. With Arrow 0.9.0, SciDB crashes wh
ill update the issue if I have any
news.
> On Mon, Feb 26, 2018 at 10:22 AM, Rares Vernica
wrote:
>
>> - On the coordinator side, do I really need to read and write a record
>> batch? Could I copy the buffer directly somehow?
>
> No, you don't need to necessarily. The i
Rares Vernica created ARROW-2351:
Summary: [C++] StringBuilder::append(vector...) not
implemented
Key: ARROW-2351
URL: https://issues.apache.org/jira/browse/ARROW-2351
Project: Apache Arrow
Hello,
I am using the C++ API to serialize and centralize data over the network. I
am wondering if I am using the API in an efficient way.
I have multiple nodes and a coordinator communicating over the network. I
do not have fine control over the network communication. Individual nodes
write one
Rares Vernica created ARROW-2203:
Summary: [C++] StderrStream class
Key: ARROW-2203
URL: https://issues.apache.org/jira/browse/ARROW-2203
Project: Apache Arrow
Issue Type: Improvement
Rares Vernica created ARROW-2189:
Summary: [C++] Seg. fault on make_shared
Key: ARROW-2189
URL: https://issues.apache.org/jira/browse/ARROW-2189
Project: Apache Arrow
Issue Type: Bug
Hi,
This might be more a C++ question, but I'm trying to have one variable
store the output stream for both StdoutStream and FileOutputStream. I do
this:
shared_ptr<OutputStream> f;
if (fn == "stdout")
    f.reset(new StdoutStream());
else
    FileOutputStream::Open(fn, false, &f);
As is, the code does not wo
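A sketch of the usual fix for that era's Status-plus-out-parameter API:
Open() wants a shared_ptr of the exact derived type, so open into a
temporary and let the assignment upcast to the OutputStream base (0.8-era
headers kept StdoutStream in arrow/io/file.h; newer releases moved it to
arrow/io/stdio.h):

#include <memory>
#include <string>
#include <arrow/io/file.h>
#include <arrow/status.h>

arrow::Status OpenSink(const std::string& fn,
                       std::shared_ptr<arrow::io::OutputStream>* out) {
  if (fn == "stdout") {
    out->reset(new arrow::io::StdoutStream());
    return arrow::Status::OK();
  }
  // &f above fails because f is a shared_ptr to the base class; Open()
  // fills a std::shared_ptr<FileOutputStream>*.
  std::shared_ptr<arrow::io::FileOutputStream> file;
  ARROW_RETURN_NOT_OK(arrow::io::FileOutputStream::Open(fn, /*append=*/false, &file));
  *out = file;  // implicit upcast to the OutputStream base
  return arrow::Status::OK();
}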
Rares Vernica created ARROW-2179:
Summary: [C++] arrow/util/io-util.h missing from libarrow-dev
Key: ARROW-2179
URL: https://issues.apache.org/jira/browse/ARROW-2179
Project: Apache Arrow
Hi,
If I have multiple RecordBatchStreamReader inputs, what is the recommended
way to get all the RecordBatches from all the inputs together, maybe in a
Table? They all have the same schema. The source for the readers are
different files.
So, I do something like:
reader1 = pa.open_stream('foo')
ta
Rares Vernica created ARROW-1801:
Summary: [Docs] Update install instructions to use red-data-tools
repos
Key: ARROW-1801
URL: https://issues.apache.org/jira/browse/ARROW-1801
Project: Apache Arrow
Rares Vernica created ARROW-1676:
Summary: [C++] Feather inserts 0 in the beginning and trims one
value at the end
Key: ARROW-1676
URL: https://issues.apache.org/jira/browse/ARROW-1676
Project
Hi,
I have a question about the Array C++ API. BinaryArray has a
raw_value_offsets() public member. Should it also have a raw_values() public
member to give a pointer to the start of raw data? Or is this not feasible?
Thanks,
Rares
Hi,
I have a question about chunks in Feather files.
A TableReader can be used to read a Column. For a column, the data is in a
ChunkedArray. For a Feather file, what is the chunk size? Can the chunk
size be modified?
Thanks!
Rares
Rares Vernica created ARROW-1545:
Summary: Int64Builder should not need int64() as arg
Key: ARROW-1545
URL: https://issues.apache.org/jira/browse/ARROW-1545
Project: Apache Arrow
Issue Type
Hi,
During the life of the program, can/should the Buffer or BufferOutputStream
be reused?
If the data in them is no longer needed, can they be reset? Or should I not
worry about this as they go out of scope, and just create new instances?
What is the intended pattern?
Thanks!
Rares
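A sketch of the simplest pattern — one stream per payload, letting each
instance go out of scope after Finish(), which hands the accumulated Buffer
to the caller (modern Result-returning spelling; the 2017-era API was
Status-based):

#include <memory>
#include <arrow/buffer.h>
#include <arrow/io/memory.h>
#include <arrow/result.h>

// After Finish() the stream is spent; creating a fresh one per payload is
// the uncontroversial choice, and the returned Buffer is freed when its
// last shared_ptr goes away.
arrow::Result<std::shared_ptr<arrow::Buffer>> NextPayload() {
  ARROW_ASSIGN_OR_RAISE(auto stream,
                        arrow::io::BufferOutputStream::Create(/*initial_capacity=*/1024));
  ARROW_RETURN_NOT_OK(stream->Write("example payload", 15));
  return stream->Finish();
}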
Hi,
I am having trouble piping Feather structures between two processes. On the
receiving-process side, I get: pyarrow.lib.ArrowIOError: [Errno 29] Illegal
seek
I have process A and process B which communicate via pipes. Process A sends
the bytes of a Feather structure to process B. Process A co
Rares Vernica created ARROW-1520:
Summary: [Docs] PyArrow docs missing Feather documentation
Key: ARROW-1520
URL: https://issues.apache.org/jira/browse/ARROW-1520
Project: Apache Arrow
Issue
Rares Vernica created ARROW-1512:
Summary: [Docs] NumericArray has no member named 'raw_data'
Key: ARROW-1512
URL: https://issues.apache.org/jira/browse/ARROW-1512
Project: Ap
Rares Vernica created ARROW-1378:
Summary: whl is not a supported wheel on this platform on
Debian/Jessie
Key: ARROW-1378
URL: https://issues.apache.org/jira/browse/ARROW-1378
Project: Apache Arrow