[ 
https://issues.apache.org/jira/browse/CALCITE-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17318393#comment-17318393
 ] 

Julian Hyde edited comment on CALCITE-2040 at 4/10/21, 12:32 AM:
-----------------------------------------------------------------

As [~mmior] [pointed out on 
dev@calcite|https://lists.apache.org/thread.html/r56003ae9392e9b759f46a5d94b7571a887a38712134753f7c9b33514%40%3Cdev.calcite.apache.org%3E],
 [PR 2133|https://github.com/apache/calcite/pull/2133] is ready for review.

I plan to fix it up so that it builds and runs in CI (except in AppVeyor, due 
to issues noted in [Arrow/Gandiva dependency management in 
Java|https://lists.apache.org/thread.html/r93a4fedb499c746917ab8d62cf5a8db8c93a7f24bc9fac81f90bedaa%40%3Cuser.arrow.apache.org%3E]).

Assigning to myself, since I am reviewing and fixing up. My dev branch will be 
[julianhyde/2040-arrow|https://github.com/julianhyde/calcite/tree/2040-arrow].


> Create adapter for Apache Arrow
> -------------------------------
>
>                 Key: CALCITE-2040
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2040
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Julian Hyde
>            Assignee: Michael Mior
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Create an adapter for [Apache Arrow|http://arrow.apache.org/]. This would 
> allow people to execute SQL statements, via JDBC or ODBC, on data stored in 
> Arrow in-memory format.
> Since Arrow is an in-memory format, it is not as straightforward as reading, 
> say, CSV files using the file adapter: an Arrow data set does not have a URL. 
> (Unless we use Arrow's 
> [Feather|https://blog.cloudera.com/blog/2016/03/feather-a-fast-on-disk-format-for-data-frames-for-r-and-python-powered-by-apache-arrow/]
>  format, or use an in-memory file system such as Alluxio.) So we would need 
> to devise a way of addressing Arrow data sets.
> Also, since Arrow is an extremely efficient format for processing data, it 
> would be good to have Arrow as a calling convention. That is, we would 
> provide implementations of relational operators such as Filter, Project and 
> Aggregate, in addition to just TableScan.
> Lastly, when we have an Arrow convention, if we build adapters for file 
> formats (for instance the bioinformatics formats SAM, VCF, FASTQ discussed in 
> CALCITE-2025) it would make a lot of sense to translate those formats 
> directly into Arrow (applying simple projects and filters first if 
> applicable). Those adapters would fit better as a "contrib" module in the 
> Arrow project than in Calcite.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
