Re: [ANNOUNCE] New Arrow committer: Weston Pace

2021-07-11 Thread Ying Zhou
Congrats Weston! > On Jul 9, 2021, at 8:47 AM, Wes McKinney wrote: > > On behalf of the Arrow PMC, I'm happy to announce that Weston has accepted an > invitation to become a committer on Apache Arrow. Welcome, and thank you > for your contributions! > > Wes

[Python] ascii_trim bug & documentation

2021-06-30 Thread Ying Zhou
Hi, It seems that pyarrow.compute.ascii_trim cannot be used without a TrimOption. However, a TrimOption cannot be given as a keyword-only argument either. This looks like a bug since utf8_trim does not have this problem. Is my understanding correct? Also it seems that there is a lot of Python

Re: [C++] Maximum type code for union types

2021-06-23 Thread Ying Zhou
luding > 127 it's a bug. > > On Sun, Jun 20, 2021 at 3:06 PM Ying Zhou wrote: >> >> Moreover it seems that negative type_codes are banned due to type.cc:622. >> Moreover in type_test.cc and >> ar

[Format] Bounded numbers?

2021-06-21 Thread Ying Zhou
Hi, In data people use there are often bounded numbers, mostly integers with clear and fixed upper and lower bounds but also decimals and floats as well e.g. test scores, numerous codes in older databases, max temperature of a city, latitudes, longitudes, numerous IDs etc. I wonder whether we s

[C++] Maximum type code for union types

2021-06-20 Thread Ying Zhou
Hi, Due to the following in builder_union.cc (Line 67-70) type_id_to_children_.resize(union_type.max_type_code() + 1, nullptr); DCHECK_LT( type_id_to_children_.size(), static_cast(UnionType::kMaxTypeCode)); and type.cc (Line 640-644)
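
The off-by-one being asked about can be sketched in a few lines. This is only an illustration in plain Python, not Arrow code: it assumes UnionType::kMaxTypeCode is 127 (the largest int8_t value), and the helper names are made up for the example. Because the vector is resized to max_type_code + 1 and that size must be strictly less than kMaxTypeCode, the largest type code that passes the check would be 125 rather than 127 under this assumption.

```python
# Assumption: UnionType::kMaxTypeCode equals 127, the largest int8_t value.
K_MAX_TYPE_CODE = 127


def passes_builder_check(max_type_code: int, k_max: int = K_MAX_TYPE_CODE) -> bool:
    """Mimic resize(max_type_code + 1) followed by DCHECK_LT(size, k_max)."""
    size = max_type_code + 1  # size of type_id_to_children_ after the resize
    return size < k_max


def largest_accepted_type_code(k_max: int = K_MAX_TYPE_CODE) -> int:
    """Largest max_type_code for which the strict check still holds."""
    # size < k_max  <=>  max_type_code + 1 < k_max  <=>  max_type_code <= k_max - 2
    return k_max - 2
```

If the assumed constant is right, passes_builder_check(126) and passes_builder_check(127) are both False, which is the gap between "fits in int8_t" and "accepted by the builder" that the question points at.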

Re: [C++] Maximum type code for union types

2021-06-20 Thread Ying Zhou
d be allowed since type_codes are of type int8_t. Is this also intended? > On Jun 20, 2021, at 4:01 PM, Ying Zhou wrote: > > Hi, > > Due to the following in builder_union.cc (Line > 67-70) > > type_id_to_children_.resize(union_t

Re: [ANNOUNCE] New Arrow committer: Dominik Moritz

2021-06-04 Thread Ying Zhou
Congrats Dominik! > On Jun 2, 2021, at 5:19 PM, Wes McKinney wrote: > > On behalf of the Arrow PMC, I'm happy to announce that Dominik has accepted an > invitation to become a committer on Apache Arrow. Welcome, and thank you > for your contributions! > > Wes

Re: [ANNOUNCE] New Arrow PMC member: Benjamin Kietzman

2021-05-07 Thread Ying Zhou
Congrats Ben! > On May 7, 2021, at 5:50 PM, Benjamin Kietzman wrote: > > Thanks, all! > > On Thu, May 6, 2021, 22:23 Fan Liya wrote: > >> Congratulations, Ben! >> >> Best, >> Liya Fan >> >> On Fri, May 7, 2021 at 4:23 AM Bryan Cutler wrote: >> >>> Congrats Ben! >>> >>> On Thu, May 6, 202

Re: [ANNOUNCE] New Arrow committer: Jonathan Keane

2021-04-29 Thread Ying Zhou
Congrats Jonathan! > On Apr 28, 2021, at 5:20 PM, David Li wrote: > > Congrats Jonathan! > > -David > > On Wed, Apr 28, 2021, at 16:55, Jorge Cardoso Leitão wrote: >> Congratulations and thank you for your contributions :) >> >> On Wed, Apr 28, 2021 at 10:37 PM Neal Richardson < >> neal.p.ric

Re: [ANNOUNCE] New Arrow committer: Ian Cook

2021-04-29 Thread Ying Zhou
Congrats Ian! > On Apr 28, 2021, at 7:01 PM, paddy horan wrote: > > Congrats Ian! > > > > From: Jorge Cardoso Leitão > Sent: Wednesday, April 28, 2021 4:56:12 PM > To: dev@arrow.apache.org > Subject: Re: [ANNOUNCE] New Arrow committer: Ian Cook > > Congratul

Re: [ANNOUNCE] New Arrow committer: Daniël Heres

2021-04-28 Thread Ying Zhou
Congrats Daniël! > On Apr 28, 2021, at 9:24 AM, Andy Grove wrote: > > On behalf of the Arrow PMC, I'm happy to announce that Daniël has > > accepted an invitation to become a committer on Apache Arrow. > > Welcome, and thank you for your contributions!

Re: [Python] Who has been able to use PyArrow 4.0.0?

2021-04-28 Thread Ying Zhou
rovided by pyarrow and the two might not match. > > On Wed, Apr 28, 2021 at 10:05 AM Ying Zhou wrote: > >> Hi, >> >> It turns out that I haven’t been able to use PyArrow 4.0.0 either in Conda >> environments or python venvs. PyArrow does install using pip. Howev

Re: [Python] Who has been able to use PyArrow 4.0.0?

2021-04-28 Thread Ying Zhou
In case you guys wonder I’m on MacOS 10.15.7. Due to my environment being pretty dirty I didn’t announce it when my verification attempt failed back then. > On Apr 28, 2021, at 4:04 AM, Ying Zhou wrote: > > Hi, > > It turns out that I haven’t been able to use PyArrow 4.0.0

[Python] Who has been able to use PyArrow 4.0.0?

2021-04-28 Thread Ying Zhou
Hi, It turns out that I haven’t been able to use PyArrow 4.0.0 either in Conda environments or python venvs. PyArrow does install using pip. However this is what I get if I ever want to use it: >>> import pyarrow as pa Traceback (most recent call last): File "", line 1, in File "/Users/ka

Re: [VOTE] Release Apache Arrow 4.0.0 - RC3

2021-04-27 Thread Ying Zhou
nEv? > On Apr 27, 2021, at 11:23 PM, Micah Kornfield wrote: > > Oh, nice, I thought they just missed the cutoff. > > On Tue, Apr 27, 2021 at 8:19 PM Ying Zhou wrote: > >> They actually did. >> >> Ying >> >>> On Apr 27, 2021, at 11:11 PM, Mi

Re: [VOTE] Release Apache Arrow 4.0.0 - RC3

2021-04-27 Thread Ying Zhou
They actually did. Ying > On Apr 27, 2021, at 11:11 PM, Micah Kornfield wrote: > > Did the ORC additions actually make it into 4.0? > > On Tue, Apr 27, 2021 at 7:55 PM Ying Zhou wrote: > >> Sure. I just added some info about the ORC writer. I think we need to >>

Re: [VOTE] Release Apache Arrow 4.0.0 - RC3

2021-04-27 Thread Ying Zhou
Sure. I just added some info about the ORC writer. I think we need to update the documentation in both C++ and Python as well to include ORC. I will do it. Ying > On Apr 27, 2021, at 5:28 PM, Neal Richardson > wrote: > > 4.0 blog post is still pretty bare and could use some help filling in: >

[C++][Python] ORC in pyarrow wheels?

2021-04-19 Thread Ying Zhou
Hi, First of all I’d like to thank Antoine, Micah, Sutou and Uwe for reviewing and improving/helping me improve the Arrow2ORC adapter which Antoine merged into master earlier today! This community is really great! Now that we have the Arrow2ORC adapter ready I found that those who don’t use Co

Re: 4.0 release preparation

2021-04-14 Thread Ying Zhou
ut it might be tight if there is > substantial feedback. > > On Tue, Apr 13, 2021 at 5:29 PM Wes McKinney wrote: > >> I agree it would be good to get the ORC writer in this release. >> >> On Tue, Apr 13, 2021 at 7:20 PM Ying Zhou wrote: >>> >>

Re: 4.0 release preparation

2021-04-13 Thread Ying Zhou
What about this one? You know, the C++/Python ORC writer. https://github.com/apache/arrow/pull/8648 Ying > On Apr 13, 2021, at 12:52 PM, Neal Richardson > wrote: > > I think we're getting close to closing out 4.0. Let's give until the end of > Wedne

Re: [C++] Complex type traits

2021-04-12 Thread Ying Zhou
/arrow/blob/8e43f23dcc6a9e630516228f110c48b64d13cec6/cpp/src/arrow/type_traits.h#L491 > > On Sun, Apr 11, 2021 at 8:56 PM Ying Zhou wrote: > >> Hi, >> >> I would like to have a variant of arrow::enable_if_number good >> for numerical types, boolean as well as Date

[C++] Complex type traits

2021-04-11 Thread Ying Zhou
Hi, I would like to have a variant of arrow::enable_if_number good for numerical types, boolean as well as Date32 but not any other type so that I don’t have to repeat template specializations with essentially the same code. What’s the canonical way to achieve that? Ying

[C++] Obtain shared_ptr of an Array from a reference

2021-03-22 Thread Ying Zhou
Hi, I know this is a very silly question, but I would rather see it resolved than work on it for a day: how should I obtain a std::shared_ptr<Array> from an Array&? Just taking the address and constructing a shared_ptr from the pointer doesn’t work. Ying

[C++] Fastest method to create an Array based on an existing Array with null_bitmap changed?

2021-03-20 Thread Ying Zhou
Hi, I would like to generate an Array using an existing Array with data preserved with the exception of null_bitmap. Shall I use Array::SetData and ArrayData::Make (with dictionary and child_data)? Ying

[C++][Python] Do I have to make arrow::adapters::orc::WriterOptions and arrow::adapters::orc::ReaderOptions immutable?

2021-03-14 Thread Ying Zhou
Hi, I have a question about https://github.com/apache/arrow/pull/9702 (and another future PR) over WriterOptions and ReaderOptions that are basically code copied from the ORC project which then got Arrowized so that the names are acceptable. After go

[Python] Best practices when exposing options

2021-03-12 Thread Ying Zhou
Hi, Currently I’m working on ARROW-11297 (https://github.com/mathyingzhou/arrow/tree/ARROW-11297) which will be filed as soon as the current PR is merged. I managed to reimplement orc::WriterOptions in Arrow (with naming conventions Arr

[C++][CMake] How to add new .cc & .h files?

2021-03-08 Thread Ying Zhou
Hi, Right now I’m working on ARROW-11297 which adds WriterOptions to the ORCWriter (and will be a separate PR). After adding new files (adapter_options.h & adapter_options.cc) I found that src/arrow/CMakeFiles/arrow_objlib.dir/adapters/orc/adapter_options.cc.o doesn

[GLib][Ruby] Testing issues

2021-03-06 Thread Ying Zhou
Hi, As work in C++ inevitably affects C GLib and Ruby it is necessary for me to be able to test them locally. I followed instructions here for Macs. Arrow GLib for developers was installed. However I can not run GLib tests with bundle exec test/run-test.sh Looks like there might be some path pr

[C++] Generating random Date64 & Timestamp arrays

2021-03-03 Thread Ying Zhou
Hi, I’d like to generate random Date64 & Timestamp arrays with artificial max and mins. RandomArrayGenerator::ArrayOf in arrow/testing/random.h does not help. Currently the approach I’d like to take is using RandomArrayGenerator::Int64 to generate a random int64 array and then convert it to a d
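
The approach described here (draw bounded random 64-bit integers, then reinterpret them as dates) might look roughly like the sketch below. It uses Python's random module as a stand-in for RandomArrayGenerator::Int64; the function names are hypothetical, and it relies on the assumption that Date64 values are milliseconds since the UNIX epoch that should fall on whole days.

```python
import random

MS_PER_DAY = 86_400_000  # milliseconds in one day


def random_int64(rng: random.Random, n: int, lo: int, hi: int) -> list:
    """Stand-in for a bounded random int64 generator (inclusive bounds)."""
    return [rng.randint(lo, hi) for _ in range(n)]


def random_date64_ms(rng: random.Random, n: int, min_day: int, max_day: int) -> list:
    """Draw day numbers in [min_day, max_day] and scale them to milliseconds.

    Scaling whole days keeps every value a multiple of MS_PER_DAY, which is
    what this sketch assumes a Date64 value should be.
    """
    return [day * MS_PER_DAY for day in random_int64(rng, n, min_day, max_day)]


def random_timestamp_ms(rng: random.Random, n: int, min_ms: int, max_ms: int) -> list:
    """Timestamps need no alignment, so the bounded integers are used directly."""
    return random_int64(rng, n, min_ms, max_ms)
```

Seeding the generator, for example with random.Random(0), keeps the generated test data reproducible between runs.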

Re: [VOTE] Allow source-only release vote for patch releases

2021-02-27 Thread Ying Zhou
+1 (non-binding) > On Feb 27, 2021, at 11:19 AM, Neal Richardson > wrote: > > We've had some discussion about ways to reduce the cost of releasing and > ways to allow maintainers of subprojects to make more frequent maintenance > releases. In [1] we proposed allowing maintenance/patch releases

[C++] Breakpoints and VSCode integration

2021-02-25 Thread Ying Zhou
Hi, To facilitate faster debugging I’d like to integrate make unittest debugging into VSCode (on Mac) so that when I run a test that might show some bugs breakpoints can stop the execution so that I can dig around a bit. Does anyone know how that can be done? I know it is a stupid question but

[C++] The best method to pass null from struct to its children & visitors

2021-02-18 Thread Ying Zhou
Hi, Now I’m working on fixing the last concerns on my ORC writer https://github.com/apache/arrow/pull/8648 and have two questions. I have a need to standardize an Arrow Array so that it is fit for cheaper conversion into ORC by making sure that all

Re: [C++] Why are these two tables unequal?

2021-02-10 Thread Ying Zhou
Yup. That doesn’t change anything. I have just pushed this to https://github.com/apache/arrow/pull/8648. Please take a look. Thanks a lot! TEST(TestAdapterWriteNested, writeList) { std::shared_ptr<Schema> table_schema = schema({field("list", list(int32()))

Re: [C++] Why are these two tables unequal?

2021-02-10 Thread Ying Zhou
pe()->ToString()); RecordProperty("input_type", expected_array->type()->ToString()); RecordProperty("array_equality", actual_array->Equals(*expected_array)); } > On Feb 10, 2021, at 12:10 PM, Antoine Pitrou wrote: > > > Hmm, perhaps the t

Re: [C++] Why are these two tables unequal?

2021-02-10 Thread Ying Zhou
, actual_array->Equals(*expected_array)); } > On Feb 10, 2021, at 3:52 AM, Antoine Pitrou wrote: > > > Hi Ying, > > Hmm, yes, this may be related to the null bitmaps, or the offsets. > Can you try to inspect or pretty-print the offsets arrays for the

[C++] Why are these two tables unequal?

2021-02-09 Thread Ying Zhou
Hi, This is an extremely weird phenomenon. There are two 2*1 tables that are supposedly different when I got a confusing error message like this: [ RUN ] TestAdapterWriteNested.writeList /Users/karlkatzen/Documents/code/arrow-dev/arrow/cpp/src/arrow/testing/gtest_util.cc:459: Failure Faile

Re: [C++] RandomArrayGenerator::List bugs

2021-02-07 Thread Ying Zhou
A Jira ticket on this bug has been filed: https://issues.apache.org/jira/browse/ARROW-11548 > On Feb 7, 2021, at 3:29 PM, Ying Zhou wrote: > > Hi, > > Recently I found a weird bug in RandomArrayGenerator. > >

[C++] RandomArrayGenerator::List bugs

2021-02-07 Thread Ying Zhou
Hi, Recently I found a weird bug in RandomArrayGenerator. RandomArrayGenerator::List consistently produces ListArrays with their length 1 below what they should be according to their documentation. Moreover the bitmaps we have are weird. Here is some simple test: TEST(TestAdapterWriteNested,

Re: Computational Kernels: the project overview

2021-02-05 Thread Ying Zhou
Hi, Speaking of the computational kernels I found that Cast needs significant improvement. Right now it can not cast a FixedSizeBinary array to a Binary one which caused my ORC tests to be unusually long. I plan to significantly expand it within 2 months to include nested types and make ORC (an

[C++] Enhancements to random Array/ChunkedArray/Table generator as a separate PR?

2021-01-31 Thread Ying Zhou
Hi, As a part of the process of reducing test size in this pull request https://github.com/apache/arrow/pull/8648 which contains the ORC writer for C++ and Python I wrote a random chunked array generator and a random table generator. To reduce test s

Re: [C++] Shall we modify the ORC reader?

2021-01-28 Thread Ying Zhou
t 8:45 PM, Deepak Majeti wrote: > > Hi Ying, > > I can help review/merge any ORC C++ contributions. > > > On Thu, Jan 14, 2021 at 6:57 PM Ying Zhou wrote: > >> Well, I haven’t found any. Thankfully ORC does work and I can figure out >> how it works by testin

[C++] Random table generator and table converter

2021-01-27 Thread Ying Zhou
Hi, For the C++ tests for the ORC writer there are two functions I need which can significantly shorten the tests, namely a generic table generator and a table converter. For the former I know there is arrow/testing/random.h which can generate random arrays. Shall I generate random struct arr

Plasma C++ error in Travis CI

2021-01-24 Thread Ying Zhou
Hi, While refactoring my ORC writer so that Antoine and Uwe’s suggestions are implemented I found this weird Travis CI error caused by Plasma. Since Plasma is no longer maintained do we really need to have it in our Travis CI test? Thanks! Ying P.S. The job log is here: https://travis-ci.com/

Re: [VOTE] Release Apache Arrow 3.0.0 - RC2

2021-01-19 Thread Ying Zhou
>> /private/var/folders/yb/dc13kd1552vc_x61qzpgmtjhgn/T/arrow-3.0.0.X.00Y3BiKD/install/lib/libarrow.300.0.0.dylib >> Reason: image not found > > How did you install your Snappy? > > > Thanks, > -- > kou > > In > "Re: [VOTE] Release Apache Arrow

Re: [VOTE] Release Apache Arrow 3.0.0 - RC2

2021-01-19 Thread Ying Zhou
There are definitely dependencies issues in at least GLib. I’m going to turn off Glib and see whether other issues exist. + make -j4 /Library/Developer/CommandLineTools/usr/bin/make all-recursive Making all in arrow-glib GEN stamp-enums.h GEN stamp-enums.c touch stamp-enums.c touch

Re: [VOTE] Release Apache Arrow 3.0.0 - RC0

2021-01-19 Thread Ying Zhou
Oh I see. Yup. Thanks, Ying > On Jan 19, 2021, at 7:56 AM, Antoine Pitrou wrote: > > > Plasma is deprecated and unmaintained, I don't think we should hold the > release for that. > > Regards > > Antoine. > > > Le 19/01/2021 à 13:21, Ying Zhou a écr

Re: [VOTE] Release Apache Arrow 3.0.0 - RC0

2021-01-19 Thread Ying Zhou
24] Too many open files > > You need to increase the max number of files you can > open in a process. But I don't know how to do it on macOS... > (We can do it by /etc/security/limits.d/ on Linux.) > > > Thanks, > -- > kou > > > In > "Re: [VO
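
For a single Python process, one possible way to raise the open-file limit is the standard resource module. This is an assumption offered as a sketch, not something proposed in the thread, and the helper names are hypothetical. The soft limit can only be raised up to the hard limit, and some systems (macOS among them) may reject values above a kernel-wide cap, so the sketch falls back to the existing limit if the call is refused.

```python
import resource


def target_soft_limit(soft: int, hard: int, wanted: int,
                      infinity: int = resource.RLIM_INFINITY) -> int:
    """Compute the soft limit to request, never lowering it or exceeding hard."""
    if soft == infinity:
        return soft  # already unlimited, nothing to raise
    new_soft = max(soft, wanted)
    if hard != infinity:
        new_soft = min(new_soft, hard)  # the soft limit cannot exceed the hard limit
    return new_soft


def raise_open_file_limit(wanted: int) -> int:
    """Try to raise RLIMIT_NOFILE for this process; return the resulting soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = target_soft_limit(soft, hard, wanted)
    if new_soft != soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        except (ValueError, OSError):
            return soft  # the OS refused the request; keep the old limit
    return new_soft
```

The change only affects the calling process and its children, so it would have to run before the tests that open many files.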

Re: [VOTE] Release Apache Arrow 3.0.0 - RC0

2021-01-18 Thread Ying Zhou
that way so please open a Jira. I don't > think this is a blocking bug; if you'd like to verify without Plasma > you can disable it in the verification script. > > On Mon, Jan 18, 2021 at 8:46 PM Ying Zhou wrote: >> >> Hi, >> >> Thanks for help

Re: [VOTE] Release Apache Arrow 3.0.0 - RC0

2021-01-18 Thread Ying Zhou
raise RuntimeError("plasma_store exited unexpectedly with " "code %d" % (rc,)) yield plasma_store_name, proc finally: > if proc.poll() is None: E

Re: [VOTE] Release Apache Arrow 3.0.0 - RC0

2021-01-18 Thread Ying Zhou
Thanks! This works. I do need to mention that it is not in master yet. Moreover after the C# test succeeded we failed the python one on Line 376 (from master). Ying + python setup.py build_ext --inplace running build_ext creating /private/var/folders/yb/dc13kd1552vc_x61qzpgmtjhgn/T/arrow-3.

Re: [VOTE] Release Apache Arrow 3.0.0 - RC0

2021-01-15 Thread Ying Zhou
Hi, This is what happens when I’m following the procedure on my macOS 10.15. Is it because of some environmental issue? Is it because the verification failed? Thanks! 100 161M 100 161M0 0 13.3M 0 0:00:12 0:00:12 --:--:-- 9.8M + PATH=/var/folders/yb/dc13kd1552vc_x61qzpgmtjh0

Re: [C++] Shall we modify the ORC reader?

2021-01-14 Thread Ying Zhou
/orc.apache.org/specification/ORCv1/ but those docs don't seem > to explain the intent and constraints of each of the data types. > > Regards > > Antoine. > > > > > On Mon, 11 Jan 2021 21:15:05 -0500 > Ying Zhou wrote: >> Thanks! What about 3?

When will my PR be available in a release?

2021-01-13 Thread Ying Zhou
Hi, I have implemented the ORC writer in C++ and Python here: https://github.com/apache/arrow/pull/8648/ I’d like to know when will it be available in a release so that I can file a related PR in Pandas to use my ORC writer. Since it hasn’t been rev

Re: [C++] Shall we modify the ORC reader?

2021-01-11 Thread Ying Zhou
of users > reading 2gb strings or lists with 2B objects in them. Saying we just don't > support that pattern seems fine for now. I also believe the string and list > types have better cross-language support than the large variants. > > On Sun, Jan 10, 2021 at 8:49 AM Ying Zhou wrot

[C++] Shall we modify the ORC reader?

2021-01-10 Thread Ying Zhou
Hi, While finishing the ORC writer in C++ I found that the ORC reader treats certain types in rather awkward ways. Hence I filed this Jira ticket: https://issues.apache.org/jira/browse/ARROW-7 After starting to work on ORC tickets mostly

[C++][CI] openssl not installed in AMD64 Windows 2019 C++

2021-01-07 Thread Ying Zhou
Hi, Thanks Neal for fixing the numpy blocker (ARROW-11152)! It seems that there is another weird recent dependency error here: https://github.com/apache/arrow/pull/8648/checks?check_run_id=1660537561 Do we know what’s g

Re: Github Actions feedback time

2021-01-06 Thread Ying Zhou
Hi, Sorry for not noticing this thread earlier. Looks like in addition to unusually slow feedback time that did not happen last Sunday or earlier there are also weird installation errors such as ‘can not install numpy’ as well. Can these be due to some form of timeout? Here is my C++ PR: https

[C++] Weird Rust Linter error in CICD & Float/Double equality

2020-12-30 Thread Ying Zhou
Hi, When finalizing my Arrow2ORC C++ pull request I found a weird Rust-related and IPC-related error in the Linter that didn’t happen just 2 days ago despite my code having nothing to do with either Rust or IPC. Here is the check: https://github.com/apache/arrow/pull/8648/checks?check_run_id=16

[C++] Includes and failing checks in Python and C Glib & Ruby

2020-12-18 Thread Ying Zhou
Hi, While finalizing this pull request (https://github.com/apache/arrow/pull/8648) I found that a single necessary ORC include (liborc::WriterOptions) in arrow/adapters/orc/adapter.h broke one Python check and two C Glib & Ruby checks. Since ther

[C++] Are stream adapters necessary for the Arrow2ORC adapter?

2020-12-12 Thread Ying Zhou
to exclusively use classes in arrow/io to open files given how the Arrow integration with Parquet and the ORC2Arrow adapter work it seems that I should wrap arrow::io::OutputStream in an implementation of orc::OutputStream. Is it one of the right ways to do it? Thanks! Ying Zhou

[C++] Sparse Unions and CICD tests

2020-11-29 Thread Ying Zhou
Hi, Many thanks for the help you guys gave me in the past! Tonight I would like to ask two questions. First of all, it seems that in the C++ implementation of sparse unions it is possible to construct a union array of length 8 from two child arrays of length 4 with dense union-like behavior.

Re: [C++] 0x00 in Binary type

2020-11-18 Thread Ying Zhou
s 0 instead of 4 and data[i] are 255, 0, 0 and 0 respectively. My JIRA ID is yingzhou474. > On Nov 18, 2020, at 1:49 PM, Antoine Pitrou wrote: > > > Hello, > > Le 18/11/2020 à 19:06, Ying Zhou a écrit : >> >> According to the documentation BINARY is "Vari

[C++] 0x00 in Binary type

2020-11-18 Thread Ying Zhou
Hello, According to the documentation BINARY is "Variable-length bytes (no guarantee of UTF8-ness)". However in practice if I embed 0x00 in the middle of a char array and Append it to a BinaryBuilder the 0x00 is converted to 0xff, everything after it is not appended and the length is computed a

Re: [ANNOUNCE] New Arrow committer: Andrew Lamb

2020-11-10 Thread Ying Zhou
Congratulations Andrew! > On Nov 10, 2020, at 10:42 AM, Andy Grove wrote: > > On behalf of the Arrow PMC, I'm happy to announce that Andrew Lamb has > accepted an invitation to become a committer on Apache Arrow. > > Welcome, and thank you for your contributions!

[C++] Type_codes and child_ids for Unions & test time concerns

2020-11-08 Thread Ying Zhou
using type_codes alone. What’s the point of using child_ids? Secondly I would like to ask about the maximum amount of time permitted when running unit tests. I will definitely profile and speed up my tests prior to the pull request so I would like to know about the expectation first. Thanks, Ying

[C++] Arrow debug with ORC & unittest can not be built

2020-10-24 Thread Ying Zhou
Hi, I’m using the master version of Arrow. In order to test my Arrow2ORC feature I got a new copy of Arrow and tried to make it with debug on. It turns out that one ORC dependency, libhdfspp_static.a, can not be found which caused linking of arrow-orc-adapter-test to be impossible. Here is my

Re: [ANNOUNCE] New Arrow PMC chair: Wes McKinney

2020-10-24 Thread Ying Zhou
Congratulations Wes! :) Ying > On Oct 23, 2020, at 7:35 PM, Jacques Nadeau wrote: > > I am pleased to announce that we have a new PMC chair and VP as per our > newly started tradition of rotating the chair once a year. I have resigned > and Wes was duly elected by the PMC and approved unanimous

Re: [C++] AppendValues for numeric types with invalid slots omitted from source

2020-10-20 Thread Ying Zhou
>> deals specifically with this issue -- I'm not sure if we have a >> ready-made function that will efficiently append the "compressed" >> value efficiently to a builder, but we certainly have all the tools >> you need to do so (e.g. the BitRunReader is helpful h

[C++] AppendValues for numeric types with invalid slots omitted from source

2020-10-18 Thread Ying Zhou
zes which doesn’t seem to be reasonable. May I ask whether this is actually valid usage of AppendValues? Thanks! Best, Ying Zhou
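
The situation in the thread title (a source buffer that holds only the valid values, plus a separate validity mask) can be illustrated with a small sketch. It is plain Python rather than Arrow's builder API; expand_with_validity is a hypothetical helper that spreads the densely packed values back over the full-length slots.

```python
def expand_with_validity(dense_values, validity):
    """Place densely packed values into their slots, using None for invalid ones.

    dense_values holds one entry per True in validity, in order.
    """
    if sum(1 for v in validity if v) != len(dense_values):
        raise ValueError("number of valid slots must match number of dense values")
    values = iter(dense_values)
    # Consume the next dense value only where the slot is marked valid.
    return [next(values) if is_valid else None for is_valid in validity]
```

For instance, three dense values with a four-slot mask whose second slot is invalid produce a four-element result with None in the second position.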

[C++] Arrow to ORC type conversion

2020-10-18 Thread Ying Zhou
iborc::TypeKind::MAP Type::type::DENSE_UNION -> liborc::TypeKind::UNION Type::type::SPARSE_UNION -> liborc::TypeKind::UNION Type::type::DICTIONARY -> the ORC version of its value type There are some concerns particularly related to duration types which don’t exist for Apache ORC which I have to convert to integers. Is my current mapping reasonable? Thanks! Best, Ying Zhou
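
The visible part of this mapping can be written down as a lookup table with one recursive rule. The sketch below covers only the entries that appear in the excerpt (both union kinds map to UNION, and a dictionary type maps to whatever its value type maps to); the dict-based type description and the function name orc_kind are hypothetical, and the remaining rows of the mapping are deliberately left out.

```python
# Only the entries visible in the excerpt; other Arrow types are omitted.
ARROW_TO_ORC_KIND = {
    "DENSE_UNION": "UNION",
    "SPARSE_UNION": "UNION",
}


def orc_kind(arrow_type: dict) -> str:
    """Resolve a hypothetical {'id': ..., 'value_type': ...} description to an ORC kind."""
    type_id = arrow_type["id"]
    if type_id == "DICTIONARY":
        # A dictionary type maps to the ORC version of its value type.
        return orc_kind(arrow_type["value_type"])
    try:
        return ARROW_TO_ORC_KIND[type_id]
    except KeyError:
        raise NotImplementedError(f"no mapping sketched for {type_id}") from None
```

Keeping the dictionary rule recursive means a dictionary whose value type is itself dictionary-encoded still resolves to a concrete ORC kind.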

ORC writer

2020-08-29 Thread Ying Zhou
levant Python/Cython files, right? Moreover I would like to ask whether there is any existing branch with partly finished work on ORC writers. Thanks! Ying Zhou