GitHub user twalthr opened a pull request: https://github.com/apache/flink/pull/1127
[FLINK-2167] [table] Add fromHCat() to TableEnvironment

This PR introduces input format interfaces (so-called `TableSource`s) for the Table API. There are two types of TableSources:

- `AdaptiveTableSource`s can adapt their output to the requirements of the plan. Although the output schema stays the same, the TableSource can react to field resolution and/or predicates internally and can return adapted DataSet/DataStream versions in the "translate" step.
- `StaticTableSource`s are an easy way to provide the Table API with additional input formats without much implementation effort (e.g. for fromCsvFile()).

TableSources have been deeply integrated into the Table API. The TableEnvironment now requires the newly introduced `AbstractExecutionEnvironment` (common super class of all ExecutionEnvironments for DataSets and DataStreams).

An example of an AdaptiveTableSource can be found in `HCatTableSource`. HCatTableSource supports predicate pushdown as well as selection pushdown to HCatalog. Only predicates on partitioned columns are pushed to HCatalog. Unresolved fields will not be read from HCatalog and remain `null` within the Table API's rows.
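To make the two pushdown rules above concrete, here is a minimal, self-contained sketch. It is not the code in this PR: `PushdownSketch`, `Predicate`, `pushable` and `project` are hypothetical names invented for illustration, and predicates are simplified to plain `field === value` comparisons. It shows that only predicates on partition columns are handed to the source, and that fields the plan never resolved stay `null` in the produced row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these types exist in Flink or in this PR.
public class PushdownSketch {

    /** A simplified equality predicate of the form field === 'value'. */
    static final class Predicate {
        final String field;
        final String value;

        Predicate(String field, String value) {
            this.field = field;
            this.value = value;
        }

        @Override
        public String toString() {
            return field + "==='" + value + "'";
        }
    }

    /** Keeps only the predicates that reference a partition column. */
    static List<Predicate> pushable(List<Predicate> predicates, Set<String> partitionColumns) {
        List<Predicate> result = new ArrayList<>();
        for (Predicate p : predicates) {
            if (partitionColumns.contains(p.field)) {
                result.add(p);
            }
        }
        return result;
    }

    /** Builds a row with the full schema; fields that were never resolved stay null. */
    static Object[] project(String[] schema, Object[] stored, Set<String> resolvedFields) {
        Object[] row = new Object[schema.length];
        for (int i = 0; i < schema.length; i++) {
            row[i] = resolvedFields.contains(schema[i]) ? stored[i] : null;
        }
        return row;
    }

    public static void main(String[] args) {
        String[] schema = {"col1", "col2", "col3", "partCol"};
        Set<String> partitionColumns = new HashSet<>(Arrays.asList("partCol"));

        List<Predicate> predicates = Arrays.asList(
                new Predicate("partCol", "5"),  // on a partition column: pushed down
                new Predicate("col1", "x"));    // on a regular column: stays in the plan

        System.out.println(pushable(predicates, partitionColumns)); // prints [partCol==='5']

        // col3 is neither selected nor filtered on, so it is not read and remains null.
        Set<String> resolved = new HashSet<>(Arrays.asList("col1", "col2", "partCol"));
        Object[] stored = {"x", 42, "unused", "5"};
        System.out.println(Arrays.toString(project(schema, stored, resolved))); // prints [x, 42, null, 5]
    }
}
```

The row keeps its full width on purpose: the output schema of the source does not change, only the set of fields that are actually populated.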
A simple example looks like:

```
TableEnvironment t = new TableEnvironment(env);
t.fromHCat("database", "table")
  .select("col1, col2")
  .filter("partCol==='5'");
```

Here's what a TableSource can see from more complicated queries:

```
getTableJava(tableSource1)
  .filter("a===5 || a===6")
  .select("a as a4, b as b4, c as c4")
  .filter("b4===7")
  .join(getTableJava(tableSource2))
  .where("a===a4 && c==='Test' && c4==='Test2'")

// Result predicates for tableSource1:
//   List("a===5 || a===6", "b===7", "c==='Test2'")
// Result predicates for tableSource2:
//   List("c==='Test'")
// Result resolved fields for tableSource1 (true = filtering, false = selection):
//   Set(("a", true), ("a", false), ("b", true), ("b", false), ("c", false), ("c", true))
// Result resolved fields for tableSource2 (true = filtering, false = selection):
//   Set(("a", true), ("c", true))
```

HCatTableSource has no tests yet, but I will implement them soon. First, I would be happy to get some general feedback.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/twalthr/flink TableApiHcat

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1127.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1127

----
commit f245604caccd8f97c1d6eabf16968dab3aa47572
Author: twalthr <twal...@apache.org>
Date:   2015-07-09T09:57:05Z

    [FLINK-2167] [table] Add fromHCat() to TableEnvironment

----