Re: CPP : arrow symbols.map issue

2020-04-03 Thread Brian Bowman
everything with the gcc < 5 ABI; otherwise (if you want the new gcc ABI), ensure that the machine where you deploy has libstdc++ for gcc 7. Using Red Hat's devtoolset toolchain is also an option. On Fri, Apr 3, 2020, 10:01 AM Brian Bowman wrote: > Antoine/Wes, >

Re: CPP : arrow symbols.map issue

2020-04-03 Thread Brian Bowman
, ld 2.20.51.0.2-5.43.el6 Best, -Brian On 4/2/20, 1:22 PM, "Wes McKinney" wrote: EXTERNAL On Thu, Apr 2, 2020 at 12:06 PM Antoine Pitrou wrote: > > > Hi, > > On Thu, 2 Apr 2020 16:56:06 + > Brian Bowman wrote: >

CPP : arrow symbols.map issue

2020-04-02 Thread Brian Bowman
A new high-performance file system we are working with returns an error while writing a .parquet file. The following arrow symbol does not resolve properly and the error is masked. libparquet.so: undefined symbol: _ZNK5arrow6Status8ToStringB5cxx11Ev > nm libarrow.so* | grep -i ZNK5ar
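The `B5cxx11` fragment in the unresolved name is the `abi:cxx11` tag that GCC 5 and later attach to functions returning the new `std::string`, so the error usually points at a mix of old-ABI and new-ABI objects. A minimal sketch, assuming GCC's libstdc++, of how a translation unit can report which ABI it was compiled with; `DualAbiMode` is a hypothetical helper name, not part of Arrow or Parquet:

```cpp
#include <string>  // pulls in libstdc++'s configuration macros

// Hypothetical helper: reports which libstdc++ std::string ABI this
// translation unit was compiled against.
// Returns 1 for the new ("cxx11") ABI, 0 for the old ABI, and -1 when
// the macro is not defined (for example, a non-libstdc++ standard library).
constexpr int DualAbiMode() {
#if defined(_GLIBCXX_USE_CXX11_ABI)
  return _GLIBCXX_USE_CXX11_ABI ? 1 : 0;
#else
  return -1;
#endif
}
```

Compiling the same helper into each library and comparing the results is one way to spot a mismatch; building everything with the same `-D_GLIBCXX_USE_CXX11_ABI` value removes it.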

Re: Workaround for Thrift download ERRORs

2019-07-15 Thread Brian Bowman
ey" wrote: EXTERNAL hi Brian, Can you please open a JIRA issue? Does running the "get_apache_mirror.py" script work for you by itself? $ python cpp/build-support/get_apache_mirror.py https://www-eu.apache.org/dist/ - Wes

Workaround for Thrift download ERRORs

2019-07-15 Thread Brian Bowman
Is there a workaround for the following error? requests.exceptions.SSLError: hostname 'www.apache.org' doesn't match either of '*.openoffice.org', 'openoffice.org'/thrift/0.12.0/thrift-0.12.0.tar.gz I’ve inflated apache-arrow-0.14.0.tar and the thrift-0.12.0.tar.gz is not being found during cma

Re: Need 64-bit Integer length for Parquet ByteArray Type

2019-04-26 Thread Brian Bowman
made on a short time horizon. Collecting feedback and building consensus (if it is even possible) with stakeholders would take some time. The appropriate place to have the discussion is here on the mailing list, though Thanks On Mon, Apr 8, 2019 at 1:37 PM Bria

Re: [VOTE] Add 64-bit offset list, binary, string (utf8) types to the Arrow columnar format

2019-04-26 Thread Brian Bowman
Can non-Arrow PMC members/committers vote? If so, +1 -Brian On 4/25/19, 4:34 PM, "Wes McKinney" wrote: EXTERNAL In a recent mailing list discussion [1] Micah Kornfield has proposed to add new list and variable-size binary and unicode types to the Arrow columnar format

Re: Need 64-bit Integer length for Parquet ByteArray Type

2019-04-08 Thread Brian Bowman
ou expect good compression rates. > > On Fri, Apr 5, 2019 at 11:29 AM Brian Bowman wrote: > > > My hope is that these large ByteArray values will encode/compress to a > > fraction of their original size. FWIW, cpp/src/parquet/ > > column_w

Re: Need 64-bit Integer length for Parquet ByteArray Type

2019-04-05 Thread Brian Bowman
the end, it will be a lot more overhead, plus the work to make it possible. I think you'd be much better off compressing before storing in Parquet if you expect good compression rates. On Fri, Apr 5, 2019 at 11:29 AM Brian Bowman wrote: My hope is that t

Re: Need 64-bit Integer length for Parquet ByteArray Type

2019-04-05 Thread Brian Bowman
Stream* out) -Brian On 4/5/19, 1:32 PM, "Ryan Blue" wrote: EXTERNAL Hi Brian, This seems like something we should allow. What imposes the current limit? Is it in the thrift format, or just the implementations? On Fri,

Re: Need 64-bit Integer length for Parquet ByteArray Type

2019-04-05 Thread Brian Bowman
: EXTERNAL Hi Brian, This seems like something we should allow. What imposes the current limit? Is it in the thrift format, or just the implementations? On Fri, Apr 5, 2019 at 10:23 AM Brian Bowman wrote: > All, > > SAS requires support fo

Need 64-bit Integer length for Parquet ByteArray Type

2019-04-05 Thread Brian Bowman
All, SAS requires support for storing varying-length character and binary blobs with a 2^64 max length in Parquet. Currently, the ByteArray len field is a uint32_t. It looks like this will require incrementing the Parquet file format version and changing ByteArray len to uint64_t. Have there
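The size limit described above can be illustrated with a small sketch. The two structs below are hypothetical stand-ins that only mirror the 32-bit and proposed 64-bit length fields; they are not the actual parquet-cpp definitions, and `FitsIn32BitLength` and `MakeByteArray32` are illustrative names:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-ins, not the real parquet-cpp types.
struct ByteArray32 { uint32_t len; const uint8_t* ptr; };  // current-style length
struct ByteArray64 { uint64_t len; const uint8_t* ptr; };  // proposed-style length

// True if a value of `size` bytes can be described by a 32-bit length.
inline bool FitsIn32BitLength(uint64_t size) {
  return size <= std::numeric_limits<uint32_t>::max();
}

// Wraps a buffer in the 32-bit form; the caller must ensure that it fits.
inline ByteArray32 MakeByteArray32(const uint8_t* ptr, uint64_t size) {
  assert(FitsIn32BitLength(size));
  return ByteArray32{static_cast<uint32_t>(size), ptr};
}
```

With a 32-bit length, the largest single value is 4,294,967,295 bytes; anything larger needs either the wider field or splitting at the application level.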

Re: Passing File Descriptors in the Low-Level API

2019-03-16 Thread Brian Bowman
bmit a PR to add the requisite APIs that you need for your application. Antoine or I or others should be able to give prompt feedback since we know this code pretty well. Thanks Wes On Sat, Mar 16, 2019 at 11:40 AM Brian Bowman wrote: > > Hi Wes, >

Re: Passing File Descriptors in the Low-Level API

2019-03-16 Thread Brian Bowman
c/arrow/io/file.cc#L476 - Wes On Thu, Mar 14, 2019 at 1:47 PM Brian Bowman wrote: > > The ReadableFile class (arrow/io/file.cc) has utility methods where a FileDescriptor is either passed in or returned, but I don’t see how this surfaces through the API. >

Re: (Ab)using parquet files on S3 storage for a huge logging database

2018-09-19 Thread Brian Bowman
e. Thanks, - Paul On Wednesday, September 19, 2018, 12:30:36 PM PDT, Brian Bowman wrote: Gerlando, AFAIK Parquet does not yet support indexing. I believe it does store min/max values at the row batch (or maybe it's page) level which may hel

Re: (Ab)using parquet files on S3 storage for a huge logging database

2018-09-19 Thread Brian Bowman
Gerlando, AFAIK Parquet does not yet support indexing. I believe it does store min/max values at the row batch (or maybe it's page) level which may help eliminate large "swaths" of data depending on how actual data values corresponding to a search predicate are distributed across large Parquet
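The pruning idea in this message can be sketched as follows. This is a minimal illustration assuming each chunk of data exposes only a minimum and a maximum for the searched column; `ChunkStats`, `MayContain`, and `ChunksToRead` are hypothetical names, not Parquet APIs:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-chunk statistics; real Parquet metadata carries more.
struct ChunkStats {
  int64_t min;
  int64_t max;
};

// True if a chunk whose values lie in [stats.min, stats.max] could hold a
// value in the query range [lo, hi]; false means the chunk can be skipped.
inline bool MayContain(const ChunkStats& stats, int64_t lo, int64_t hi) {
  return stats.max >= lo && stats.min <= hi;
}

// Returns the indices of the chunks that still have to be read for [lo, hi].
inline std::vector<std::size_t> ChunksToRead(
    const std::vector<ChunkStats>& chunks, int64_t lo, int64_t hi) {
  assert(lo <= hi);
  std::vector<std::size_t> keep;
  for (std::size_t i = 0; i < chunks.size(); ++i) {
    if (MayContain(chunks[i], lo, hi)) keep.push_back(i);
  }
  return keep;
}
```

How much this saves depends on the data layout: sorted or clustered values give narrow, non-overlapping ranges and skip most chunks, while randomly distributed values give wide ranges and skip few.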

Re: (Ab)using parquet files on S3 storage for a huge logging database

2018-09-19 Thread Brian Bowman
Gerlando is correct that S3 Objects, once created, are immutable. They cannot be updated in place, appended to, or even renamed. However, S3 supports seeking to offsets within the object being read. The challenge is knowing where to read within the S3 object, which to perform well will require
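Reading at an offset within an S3 object is done with a standard HTTP `Range` header on the GET request. A minimal sketch of building that header value; `RangeHeaderValue` is a hypothetical helper, and issuing the request itself is left to whatever HTTP or S3 client is in use:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Builds the value of an HTTP Range header that requests `length` bytes
// starting at byte `offset`. HTTP byte ranges are inclusive on both ends,
// so the last byte requested is offset + length - 1.
inline std::string RangeHeaderValue(uint64_t offset, uint64_t length) {
  assert(length > 0);
  return "bytes=" + std::to_string(offset) + "-" +
         std::to_string(offset + length - 1);
}
```

For example, `RangeHeaderValue(0, 100)` yields `bytes=0-99`.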

Re: IDE Prefs for Arrow/Parquet C++ development/debugging

2018-09-02 Thread Brian Bowman
che Impala has a guide for using CLion with Impala so that's a reasonable starting point https://cwiki.apache.org/confluence/display/IMPALA/IntelliJ+and+CLion+Setup+for+Impala+Development - Wes On Sun, Sep 2, 2018 at 4:49 PM Brian Bowman wrote: Commun

IDE Prefs for Arrow/Parquet C++ development/debugging

2018-09-02 Thread Brian Bowman
Community, I’m curious what your preferred IDEs are for Arrow/Parquet C++ work. Is anyone using CLion from JetBrains? Thanks, Brian

Re: arrow-glib 0.10.0

2018-08-23 Thread Brian Bowman
Parquet data to Arrow data and writing Arrow data as Parquet data for now. Thanks, -- kou In <5beed7f6-78f9-474e-8aa3-e40d02e5b...@sas.com> "arrow-glib 0.10.0" on Wed, 22 Aug 2018 20:17:40 +, Brian Bowman wrote: > I hope th

Re: arrow-glib 0.10.0

2018-08-22 Thread Brian Bowman
Thanks Wes, Just discovered that! -Brian On 8/22/18, 5:20 PM, "Wes McKinney" wrote: EXTERNAL Hi Brian The C GLib library is a wrapper for the C++ library, so it's the same code executing under the hood. Wes On Wed, Aug 22, 2018

arrow-glib 0.10.0

2018-08-22 Thread Brian Bowman
I hope this is not too naïve a question. Is arrow-glib 0.10.0 as capable/robust as the Arrow C++ library, especially with regard to reading and ultimately writing the Parquet file format? Thanks, Brian

Re: Parquet Build issues

2018-08-20 Thread Brian Bowman
4.1 21768264 Aug 20 18:17 build/latest/libparquet.so.1.4.1 Best, Brian On 8/17/18, 1:16 PM, "Brian Bowman" wrote: Thanks for the quick reply Wes! Indeed, I need to set up a fresh Linux system with the correct tooling

Re: Parquet Build issues

2018-08-17 Thread Brian Bowman
're using GNU make. Can you detail your build environment / OS? - Wes On Fri, Aug 17, 2018 at 9:54 AM, Brian Bowman wrote: > All, > > It’s been 2-3 years since I joined this email list and I’ve not contributed yet. I’ve just begun working with Parquet/Arr

Parquet Build issues

2018-08-17 Thread Brian Bowman
Please point me to foundational reading on these tools if that’s what I’m missing. Thanks, Brian Brian Bowman Principal Software Developer Analytic Server R&D SAS Institute Inc. brian.bow...@sas.com ___ [ 14%] No update step for 'zstd_ep' [ 16%] No patch ste

Re: Arrow-Parquet integration location (Was: Arrow cpp travis-ci build broken)

2016-09-06 Thread Brian Bowman
. This is likely true for other storage layer providers as well. Brian Bowman (SAS) > On Sep 6, 2016, at 7:52 PM, Julien Le Dem wrote: > > Thanks Wes, > No worries, I know you are on top of those things. > On a side note, I was wondering if the arrow-parquet integration should