Re: [DISCUSS] Semantics of extension types

2023-12-14 Thread Jin Shang
I'm in favor of Antoine's proposal of storage equivalence traits[1]. For the sake of clarity I'll paste it here: I would suggest we perhaps need a more general semantic description of > storage type equivalence. > Draft: > class ExtensionType { > public: > // Storage equivalence for equality testi

Re: [ANNOUNCE] New Arrow committer: Felipe Oliveira Carvalho

2023-12-07 Thread Jin Shang
Congrats! On Fri, Dec 8, 2023 at 12:23 AM Ian Cook wrote: > Congratulations Felipe!!! > > On Thu, Dec 7, 2023 at 10:43 AM Benjamin Kietzman > wrote: > > > > On behalf of the Arrow PMC, I'm happy to announce that Felipe Oliveira > > Carvalho > > has accepted an invitation to become a committer o

Re: [ANNOUNCE] New Arrow committer: Xuwei Fu

2023-10-22 Thread Jin Shang
Congrats! On Mon, Oct 23, 2023 at 1:35 PM vin jake wrote: > Congrats Xuwei! > > On Mon, Oct 23, 2023 at 11:28 AM Sutou Kouhei wrote: > > > On behalf of the Arrow PMC, I'm happy to announce that Xuwei Fu > > has accepted an invitation to become a committer on Apache > > Arrow. Welcome, and thank

Re: Apache Arrow file format

2023-10-19 Thread Jin Shang
Honestly I don't understand why this VLDB paper [1] chooses to include Feather in their evaluations. This paper studies OLAP DBMS file formats. Feather is clearly not optimized for the workload and performs badly in most of their benchmarks. This paper also has several inaccurate or outdated claims

Re: [DISCUSS][Gandiva] External function registry proposal

2023-09-26 Thread Jin Shang
I agree with Antoine that we don't need to define a JSON format or a directory structure for Gandiva. To support external functions, we essentially need two things: 1. Gandiva's function registry needs to be aware of the function metadata: We can achieve this by having a `FunctionRegistry::AddFunct

Re: [MATLAB] Using GitHub Projects for Project Planning

2023-08-22 Thread Jin Shang
Hi, I notice that this project can be seen directly from Apache's GitHub page[1], with no indication of Arrow. It seems that the GitHub Project is organization-level rather than repo-level. I fear the naming may cause confusion for people from other Apache projects. [1] https://github.com/orgs/apache/pr

Re: Need help on ArraySpan and writing C++ udf

2023-07-17 Thread Jin Shang
std::string_view v) { > > > > uint8_t hash[32]; > > sha256(v, hash); > > > > memcpy(*out++, hash, 32); > > > > return arrow::Status::OK(); > > } > > > > uint8_t ** out; > > }; > > > > arrow::Status Sha256Func(cp::Ker

Re: Need help on ArraySpan and writing C++ udf

2023-07-17 Thread Jin Shang
Hi Wenbo, > I'd like to know what the *three* `buffers` in ArraySpan are. What does > `1` mean when `GetValues` is called? The meaning of buffers in an ArraySpan depends on the layout of its data type. FixedSizeBinary is a fixed-size primitive type, so it has two buffers, one validity buffer and on

Re: Turn a vector of Scalar to an Array/ArrayData of the same datatype

2023-06-15 Thread Jin Shang
Hi Li, I've faced this issue before, and I ended up using a generic ArrayBuilder, for example: ```cpp auto type = int32(); std::vector<std::shared_ptr<Scalar>> scalars = {MakeScalar(1), MakeScalar(2)}; ARROW_ASSIGN_OR_RAISE(std::unique_ptr<ArrayBuilder> builder, MakeBuilder(type)); ARROW_RETURN_NOT_OK(builder->AppendScalars(scalars)

Re: [ANNOUNCE] New Arrow committer: Gang Wu

2023-05-15 Thread Jin Shang
Congrats! On Tue, May 16, 2023 at 9:51 AM Chao Sun wrote: > Congrats Gang! > > On Mon, May 15, 2023 at 6:08 PM Jacob Wujciak > wrote: > > > > Congrats! > > > > On Mon, May 15, 2023 at 6:53 PM Andrew Lamb > wrote: > > > > > Congratulations! > > > > > > On Mon, May 15, 2023 at 10:00 AM Matthew T

Re: [ANNOUNCE] New Arrow PMC member: Will Jones

2023-03-13 Thread Jin Shang
Congrats Will! > On Mar 14, 2023, at 11:06, Vibhatha Abeykoon wrote: > > Congratulations Will. > > On Tue, Mar 14, 2023 at 6:53 AM Gang Wu wrote: > >> Congrats, Will! >> >> Best, >> Gang >> >> On Tue, Mar 14, 2023 at 9:21 AM Junming Chen >> wrote: >> >>> Congrats, Will!😄 >>> __

Re: [VOTE] Release Apache Arrow nanoarrow 0.1.0 - RC1

2023-03-01 Thread Jin Shang
+1 (non-binding). Verified on macOS 12.5 aarch64 and Ubuntu 22.04 aarch64. Dependencies were installed via homebrew and apt. Everything went smoothly. On Thu, Mar 2, 2023 at 1:04 AM Dewey Dunnington wrote: > Hello, > > I would like to propose the following release candidate (RC1) of Apache > Arr

Re: Array::GetValue ?

2022-11-14 Thread Jin Shang
Hi John, In addition to Micah’s reply, does the member method Value(int64_t i)[1][2][3] satisfy your need? It is defined for all array types with a primitive value representation, i.e. all primitive arrays and binary arrays. [1] https://github.com/js8544/arrow/blob/master/cpp/src/arrow/array/a

Re: Parser for expressions

2022-10-09 Thread Jin Shang
xpressions. > > Sasha > >> On Oct 6, 2022, at 22:20, Jin Shang wrote: >> >> Hi Sasha and Weston, >> >> I'm the author of the mentioned Gandiva parser. I agree that having one >> unified syntax is ideal. I think one critical divergence be

Re: Parser for expressions

2022-10-06 Thread Jin Shang
Hi Sasha and Weston, I'm the author of the mentioned Gandiva parser. I agree that having one unified syntax is ideal. I think one critical divergence between Sasha's and my proposals is that mine uses a C++/Python imperative style (foo(x, y, z), a+b…) and Sasha's uses a Lisp functional style ((f

Re: nightly job failures

2022-09-26 Thread Jin Shang
Verify release candidate on macOS: https://github.com/ursacomputing/crossbow/actions/runs/3125257793/jobs/5069437237#step:7:2844 It’s the same issue as: https://github.com/apache/arrow/pull/14187#iss

Re: [C++][Gandiva] Proposal to Add A Parser Frontend for Gandiva

2022-09-19 Thread Jin Shang
r questions! Best regards, Jin > [1] https://www.postgresql.org/docs/current/sql-expressions.html > > On Sun, Sep 18, 2022 at 9:12 AM Antoine Pitrou wrote: > >> >> Hello, >> >> I would add that Gandiva does not seem to have a lot of active >> m

[C++][Gandiva] Proposal to Add A Parser Frontend for Gandiva

2022-09-16 Thread Jin Shang
/compare/master...js8544:arrow:jinshang/gandiva/type_inference

The main files are:
1. cpp/src/gandiva/grammar.yy: grammar rules for Bison.
2. cpp/src/gandiva/lex.ll: lex rules for Flex.
3. cpp/src/gandiva/typeinference.h/cc: type inference procedure.
4. cpp/src/gandiva/parser.cc: the driver class that combines the three components.
5. cpp/src/gandiva/parser_test.cc: unit tests containing examples of the proposed syntax and the resulting expression trees the parser generates.

You can run the tests by running `cmake .. --preset=ninja-debug-gandiva` and `ninja test-gandiva-tests`. Any suggestions or questions are appreciated!

Best regards, Jin Shang