GitHub user fhueske opened a pull request: https://github.com/apache/flink/pull/5043

    [FLINK-2170] [connectors] Add OrcRowInputFormat and OrcTableSource.

## What is the purpose of the change

* Adds `OrcRowInputFormat` to read [ORC files](https://orc.apache.org) as `DataSet<Row>`. The input format supports projection and filter push-down.
* Adds `OrcTableSource` to read [ORC files](https://orc.apache.org) as a `Table` in a batch Table API or SQL query. The table source supports projection and filter push-down.

## Brief change log

* Creates a new module `flink-connectors/flink-orc`
* Add `OrcRowInputFormat`
* Add `OrcTableSource`
* Add tests for input format and table source
* Adjust cost model of batch table scans to favor table sources with pushed-down filters over those without pushed-down filters.
* Add static method to `RowTypeInfo` to project on fields.
* Improve translation of literals in `RexProgramExtractor`

## Verifying this change

* `OrcRowInputFormatTest` verifies
  * Correct configuration of ORC readers.
  * Results when reading ORC files
  * Schema evolution support
  * Computation of split boundaries
* `OrcTableSourceTest` verifies
  * Correct implementation of TableSource interface methods
  * Correct configuration of `OrcRowInputFormat` for test queries (predicate and filter push-down)
* `OrcTableSourceITCase` runs end-to-end tests with SQL queries.

## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): **yes**, adds a new Maven module `flink-orc` with a dependency on `org.apache.orc/orc-core`
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: **no**
- The serializers: **no**
- The runtime per-record code paths (performance sensitive): **no**
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: **no**
- The S3 file system connector: **no**

## Documentation

- Does this pull request introduce a new feature? **yes**
- If yes, how is the feature documented?
**yes**, documentation for `RowTableSource` was added.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/fhueske/flink table-ORC

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5043.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5043

----
commit 2f524dfa0c4f8468691151925a622ba7fee55f0f
Author: uybhatti <uybha...@gmail.com>
Date:   2017-03-03T22:55:22Z

    [FLINK-2170] [connectors] Add OrcRowInputFormat and OrcTableSource.

commit d80506e3785268f541457a69ade3118c634cf7e7
Author: Fabian Hueske <fhue...@apache.org>
Date:   2017-11-13T13:54:54Z

    [FLINK-2170] [connectors] Add OrcRowInputFormat and OrcTableSource.

----
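
For readers less familiar with the projection and filter push-down mentioned in the description, the idea can be sketched in plain Java. This is only an illustration under stated assumptions: `PushDownSketch`, `project`, and `scan` are hypothetical names and are not part of Flink or ORC, and rows are modelled as simple `Object[]` arrays rather than Flink `Row` objects. A real ORC reader would typically use file metadata to skip data instead of testing every row, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration only; none of these names are Flink or ORC APIs.
final class PushDownSketch {

    /** Projection: keeps only the fields at the given positions, in that order. */
    static Object[] project(Object[] row, int[] selectedFields) {
        Object[] out = new Object[selectedFields.length];
        for (int i = 0; i < selectedFields.length; i++) {
            out[i] = row[selectedFields[i]];
        }
        return out;
    }

    /**
     * A "source" that applies the filter and the projection while reading,
     * so that downstream operators only see matching rows and needed fields.
     */
    static List<Object[]> scan(List<Object[]> rows,
                               Predicate<Object[]> filter,
                               int[] selectedFields) {
        List<Object[]> result = new ArrayList<>();
        for (Object[] row : rows) {
            if (filter.test(row)) {
                result.add(project(row, selectedFields));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed example schema: (id INT, name STRING, age INT)
        List<Object[]> rows = Arrays.asList(
            new Object[] {1, "alice", 30},
            new Object[] {2, "bob", 17},
            new Object[] {3, "carol", 45});

        // Roughly corresponds to: SELECT name FROM t WHERE age > 18
        List<Object[]> result = scan(rows, r -> (Integer) r[2] > 18, new int[] {1});
        for (Object[] r : result) {
            System.out.println(Arrays.toString(r));
        }
    }
}
```

The point of pushing both operations into the source is that the query planner can then treat the scan as cheaper, which is what the cost-model adjustment in the change log is meant to reflect.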