Re: Text data structures-optimized layout in Arrow

2019-03-02 Thread Edmon Begoli
Hi Micah, In short, we recognize that storing text in Arrow is possible and easy if we store it as an array of bytes representing characters. What we are trying to do is to use Arrow as the format/carrier between high-performance text processing steps which like to operate on binary data st

Re: java/format: Windows build fails due to no flatc binary available

2019-03-02 Thread Micah Kornfield
I'm not very familiar with the issues involved in adding the Windows artifact to the POM, but if it is easy (and not hacky), I think it would be a good thing to do, if that is what is happening for Mac/Linux. I wonder why this hasn't been an issue for our CI (maybe we aren't running Java tests on window

Re: Text data structures-optimized layout in Arrow

2019-03-02 Thread Micah Kornfield
Hi Edmon, This sounds interesting. I'm not aware of any optimized text memory layout beyond our standard string layout. Are there more details about the work you are doing? It is a little bit hard to tell if this is a good fit for Arrow from your description. Thanks, Micah On Sat, Mar 2, 2019 a
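For context on the "standard string layout" mentioned above: Arrow's variable-size string type stores all values back to back in one UTF-8 data buffer, plus an offsets buffer of length n + 1 in which value i occupies bytes offsets[i] to offsets[i + 1]. The following is a minimal pure-Python sketch of that idea, not the Arrow library API; `to_string_layout` and `get_value` are hypothetical helper names, and the validity (null) bitmap is omitted.

```python
# Sketch of Arrow's variable-size string layout (hypothetical helpers, not pyarrow).
# Arrow itself uses 32-bit signed integer offsets; a plain list stands in here,
# and the validity bitmap for nulls is left out for brevity.

def to_string_layout(values):
    """Pack strings into (offsets, data) following the offsets + data buffer scheme."""
    offsets = [0]
    data = bytearray()
    for s in values:
        data.extend(s.encode("utf-8"))  # raw UTF-8 bytes, no separators
        offsets.append(len(data))       # end of value i == start of value i + 1
    return offsets, bytes(data)


def get_value(offsets, data, i):
    """Recover value i by slicing the data buffer between consecutive offsets."""
    return data[offsets[i]:offsets[i + 1]].decode("utf-8")


offsets, data = to_string_layout(["clinical", "note", ""])
print(offsets)                      # [0, 8, 12, 12]
print(data)                         # b'clinicalnote'
print(get_value(offsets, data, 1))  # note
```

Offsets count bytes rather than characters, so a single non-ASCII character such as "é" advances the offset by 2. Any text-specific layout would presumably need to improve on this contiguous-bytes-plus-offsets scheme.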

[C++] BUILD_WARNING_LEVEL=EVERYTHING?

2019-03-02 Thread Micah Kornfield
As part of trying to fix the mingw C++ build [1], I tried compiling with BUILD_WARNING_LEVEL=EVERYTHING and it seems like it highlights a lot of possible warnings that aren't in CHECKIN. Have we not turned on the additional warnings because there was too much to fix at the time this was added? O

Text data structures-optimized layout in Arrow

2019-03-02 Thread Edmon Begoli
Colleagues: A colleague and I are working on optimized structures for memory and disk layout for raw and pre-processed text using specialized data structures, with the goal of efficient I/O, inter-process transmission, and media/memory storage of text-oriented data (e.g. clinical narratives, ra

Re: [C++] Help with windows build failure

2019-03-02 Thread Micah Kornfield
Yeah, I can do that (or at least a close approximation to help the next person). Created ARROW-4745 [1] to track it. [1] https://issues.apache.org/jira/browse/ARROW-4745 On Sat, Mar 2, 2019 at 2:38 PM Wes McKinney wrote: > Would it be possible to round up this information and put it in the > w

[jira] [Created] (ARROW-4745) Document process for replicating static_crt builds on windows

2019-03-02 Thread Micah Kornfield (JIRA)
Micah Kornfield created ARROW-4745: -- Summary: Document process for replicating static_crt builds on windows Key: ARROW-4745 URL: https://issues.apache.org/jira/browse/ARROW-4745 Project: Apache Arrow

Re: Flaky Travis CI builds on master

2019-03-02 Thread Wes McKinney
I just gave you edit access. If any PMC member would like to be an admin on the Confluence space (and you are not already), please let me know and I'll add you so you can help with the wiki admin requests. On Fri, Mar 1, 2019 at 8:09 PM Francois Saint-Jacques wrote: > > Could someone give me writ

Re: [C++] Help with windows build failure

2019-03-02 Thread Wes McKinney
Would it be possible to round up this information and put it in the wiki or under https://github.com/apache/arrow/tree/master/docs/source/cpp somewhere for the next person who needs to debug the static CRT build on Windows? I haven't had to do this personally yet and I can imagine similarly losing

[jira] [Created] (ARROW-4744) [CI][C++] Mingw32 builds failing

2019-03-02 Thread Micah Kornfield (JIRA)
Micah Kornfield created ARROW-4744: -- Summary: [CI][C++] Mingw32 builds failing Key: ARROW-4744 URL: https://issues.apache.org/jira/browse/ARROW-4744 Project: Apache Arrow Issue Type: Bug

[jira] [Created] (ARROW-4743) Fix documentation in arrow memory module

2019-03-02 Thread Micah Kornfield (JIRA)
Micah Kornfield created ARROW-4743: -- Summary: Fix documentation in arrow memory module Key: ARROW-4743 URL: https://issues.apache.org/jira/browse/ARROW-4743 Project: Apache Arrow Issue Type:

Re: java/format: Windows build fails due to no flatc binary available

2019-03-02 Thread Sebastian Piu
Just to clarify: the current POM under java/format downloads it for Linux/OS X but fails on Windows, since there is no pre-packaged artifact and, from what I could see, it does not look for the standard binary on the PATH either. I'd be happy to contribute a fix in the same way that'

Re: java/format: Windows build fails due to no flatc binary available

2019-03-02 Thread Wes McKinney
I would be sort of inclined to expect Java users to have flatc installed on their system rather than try to maintain an automatic download. This project is intended for intermediate to advanced developers; to require a system-level package be installed in addition to the JDK does not seem unreasona

[jira] [Created] (ARROW-4742) [Java] Add checker framework to java build and enable clean run of null analysis

2019-03-02 Thread Micah Kornfield (JIRA)
Micah Kornfield created ARROW-4742: -- Summary: [Java] Add checker framework to java build and enable clean run of null analysis Key: ARROW-4742 URL: https://issues.apache.org/jira/browse/ARROW-4742 Pr

Re: [Discuss][Java] Codebase Housekeeping?

2019-03-02 Thread Micah Kornfield
I created issues for 1, 2 and 4. For #3 (IntelliJ hints/warnings), I'm not sure there is a good way to enforce this, so a one-time cleanup could be helpful, but I fear it would go stale. Thanks, Micah On Wed, Feb 27, 2019 at 1:16 PM Bryan Cutler wrote: > These all sound good to me Micah, thanks for

[jira] [Created] (ARROW-4741) [Java] Add documentation to all classes and enable checkstyle for class javadocs

2019-03-02 Thread Micah Kornfield (JIRA)
Micah Kornfield created ARROW-4741: -- Summary: [Java] Add documentation to all classes and enable checkstyle for class javadocs Key: ARROW-4741 URL: https://issues.apache.org/jira/browse/ARROW-4741 Pr

[jira] [Created] (ARROW-4740) [Java] Upgrade to JUnit 5

2019-03-02 Thread Micah Kornfield (JIRA)
Micah Kornfield created ARROW-4740: -- Summary: [Java] Upgrade to JUnit 5 Key: ARROW-4740 URL: https://issues.apache.org/jira/browse/ARROW-4740 Project: Apache Arrow Issue Type: New Feature

[jira] [Created] (ARROW-4739) [Rust] [DataFusion] It should be possible to share a logical plan between threads

2019-03-02 Thread Andy Grove (JIRA)
Andy Grove created ARROW-4739: - Summary: [Rust] [DataFusion] It should be possible to share a logical plan between threads Key: ARROW-4739 URL: https://issues.apache.org/jira/browse/ARROW-4739 Project: Ap

java/format: Windows build fails due to no flatc binary available

2019-03-02 Thread Sebastian Piu
Doing mvn install on arrow/java fails on a Windows machine because no suitable flatc dependency is available in Maven Central: com.github.icexelloss:flatc-windows-x86_64:exe:1.9.0 in central (https://repo.maven.apache.org/maven2). The solution from reading the pom is: 1) manually download flatc 1