Text data structures-optimized layout in Arrow

2019-03-02 Thread Edmon Begoli
Colleagues: A colleague and I are working on optimized structures for memory and disk layout for raw and pre-processed text using specialized data structures, with the goal of efficient I/O, inter-process transmissions, and media/memory storage of text-oriented data (e.g. clinical narratives, ra

Re: Text data structures-optimized layout in Arrow

2019-03-02 Thread Edmon Begoli
o tell if this is a good fit for > Arrow from your description. > > Thanks, > Micah > > On Sat, Mar 2, 2019 at 7:39 PM Edmon Begoli wrote: > > > Colleagues: > > > > A colleague and I are working on optimized structures for memory and disk > > layout fo

Re: Text data structures-optimized layout in Arrow

2019-03-03 Thread Edmon Begoli
" directory (either > cpp/src/arrow/contrib or cpp/contrib) for new things where we aren't > sure what is to become of the code. > > - Wes > > On Sat, Mar 2, 2019 at 10:33 PM Edmon Begoli wrote: > > > > Hi Micah, > > > > In short, we recogniz

Re: [C++] Failing constructors and internal state

2019-03-10 Thread Edmon Begoli
Do you guys have an example somewhere of this validated vs. unvalidated code, and suspected performance impacts, and has anyone benchmarked any of this? On Sun, Mar 10, 2019 at 5:45 PM Wes McKinney wrote: > I think having consistent methods for both validated and unvalidated > construction is

Re: [C++] Failing constructors and internal state

2019-03-11 Thread Edmon Begoli
curity re: unit testing > edge cases. > > By the way, we need more help with systematic and automated > benchmarking so we can use commit-by-commit numbers to assist in our > decision making. > > - Wes > > On Sun, Mar 10, 2019 at 6:29 PM Edmon Begoli wrote: >

Intel CPU architecture

2016-03-02 Thread Edmon Begoli
Hey folks, How could I get more details on what Arrow uses Intel CPUs for, and how, to gain a computational advantage? At JICS, we run very large experimental Intel HPC systems, and I would like to learn how we could run some interesting experiments with Arrow on Intel CPUs. Thank you, Edmon

Re: Intel CPU architecture

2016-03-02 Thread Edmon Begoli
rimentation tools available > for users to run on their hardware would also be great. > > best, > Wes > > On Wed, Mar 2, 2016 at 10:21 AM, Edmon Begoli > wrote: > > Hey folks, > > > > How could I get more details on what and how Arrow uses Intel CPUs for > >

Re: Intel CPU architecture

2016-03-02 Thread Edmon Begoli
little bit harder to access. It will soon be replaced with Summit anyway. On Wednesday, March 2, 2016, Venkat Krishnamurthy wrote: > Is JICS the Joint Institute for Comp Sciences at ORNL/UT? If so, is one of > the target platforms Titan@ORNL? > > On Wed, Mar 2, 2016 at 2:56 PM,

Re: I setup a slack team to have a live channel to discuss Arrow

2016-03-11 Thread Edmon Begoli
send me an invite to ebeg...@gmail.com please. On Fri, Mar 11, 2016 at 2:48 PM, Jacques Nadeau wrote: > I added a bunch of company domains but if you aren't in one of those and > want an invite, just let me know and I'll add you. > > http://apachearrow.slack.com > > thanks, > Jacques >

Roadmap

2016-03-11 Thread Edmon Begoli
Is there a development/feature roadmap yet for Arrow releases? If so, can we put it on GitHub or the project wiki?

Architectural similarity of Arrow and Parquet

2016-04-13 Thread Edmon Begoli
I am writing a research paper and making references to Arrow as it relates to future developments of efficient data placement structures. Can someone please comment on how similar or related Arrow and Parquet are architecturally? Thanks in advance, Edmon

Re: Architectural similarity of Arrow and Parquet

2016-04-13 Thread Edmon Begoli
the list > ( > http://mail-archives.apache.org/mod_mbox/arrow-dev/201602.mbox/%3CCAFy6k10JGE5nuPHnkXp24jg5tOxw%3DaeFZnqG%2Bv7bwQG6zRtVcw%40mail.gmail.com%3E > ) > > Cheers, > Micah > > On Wed, Apr 13, 2016 at 10:29 AM, Edmon Begoli wrote: > > I am writing a research pap

Arrow format, Rust implementation

2016-05-28 Thread Edmon Begoli
Two questions: 1) Has anyone shown interest in implementing support for Arrow in Rust? 2) Is there a complete and usable description of the Arrow format somewhere? Is this it: https://github.com/apache/arrow/blob/master/format/Layout.md Thank you, Edmon

Re: IO considerations for PyArrow

2016-06-03 Thread Edmon Begoli
Let me throw out a thought: what about supporting access to different systems (including Alluxio) through a common POSIX interface such as FUSE? Would there be a significant performance impact or a loss of control over the layout? On Fri, Jun 3, 2016 at 9:26 AM, Uwe Korn wrote: > Hello, > > I

A paper of interest - transactional data structures

2016-06-24 Thread Edmon Begoli
Dear colleagues, This paper on transactional data structures might be of interest to you. Although Arrow operates at a lower level (layout) and is not necessarily intended for transactional processing, the topic of concurrent data structures in a multi-core setting might still be of interest: https://classes.soe

[jira] [Created] (ARROW-4753) Support optionally, and as an extension, an encoding layout for text-optimized data structures

2019-03-03 Thread Edmon Begoli (JIRA)
Edmon Begoli created ARROW-4753: --- Summary: Support optionally, and as an extension, an encoding layout for text-optimized data structures Key: ARROW-4753 URL: https://issues.apache.org/jira/browse/ARROW-4753