GitHub user twalthr opened a pull request:

    https://github.com/apache/flink/pull/1127

    [FLINK-2167] [table] Add fromHCat() to TableEnvironment

    This PR introduces input format interfaces (so-called `TableSource`s) for 
the Table API. There are two types of TableSources:
    
    - `AdaptiveTableSource`s can adapt their output to the requirements of the 
plan. Although the output schema stays the same, the TableSource can react to 
field resolution and/or predicates internally and can return adapted 
DataSet/DataStream versions in the "translate" step.
    - `StaticTableSource`s are an easy way to provide the Table API with 
additional input formats without much implementation effort (e.g. for 
fromCsvFile()).
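
To make the distinction between the two kinds of sources concrete, here is a minimal sketch. All names in it (`TableSourceSketch`, `StaticSource`, `AdaptiveSource`, `notifyResolvedField`, `InMemoryAdaptiveSource`) are hypothetical and chosen for illustration only; the interfaces in this PR may look different. It only shows the idea that an adaptive source keeps its schema but can skip fields the plan never uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical shapes for illustration only; not the interfaces from the PR.
final class TableSourceSketch {

    // A static source always yields complete rows for a fixed schema.
    interface StaticSource {
        List<String> fieldNames();
        List<Object[]> read();
    }

    // An adaptive source is told which fields the plan resolves before it is read.
    interface AdaptiveSource extends StaticSource {
        void notifyResolvedField(String field);
    }

    // In-memory stand-in: same schema, but unresolved fields are returned as null.
    static final class InMemoryAdaptiveSource implements AdaptiveSource {
        private final List<String> fields;
        private final List<Object[]> rows;
        private final Set<String> resolved = new LinkedHashSet<>();

        InMemoryAdaptiveSource(List<String> fields, List<Object[]> rows) {
            this.fields = fields;
            this.rows = rows;
        }

        @Override
        public List<String> fieldNames() {
            return fields;
        }

        @Override
        public void notifyResolvedField(String field) {
            resolved.add(field);
        }

        @Override
        public List<Object[]> read() {
            List<Object[]> out = new ArrayList<>();
            for (Object[] row : rows) {
                Object[] copy = new Object[row.length];
                for (int i = 0; i < row.length; i++) {
                    // Keep the slot, but only fill it when the plan asked for the field.
                    copy[i] = resolved.contains(fields.get(i)) ? row[i] : null;
                }
                out.add(copy);
            }
            return out;
        }
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {1, "x", 2.0});
        InMemoryAdaptiveSource source =
            new InMemoryAdaptiveSource(Arrays.asList("a", "b", "c"), rows);
        source.notifyResolvedField("a");
        System.out.println(Arrays.toString(source.read().get(0)));
    }
}
```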
    
    TableSources have been deeply integrated into the Table API. 
    
    The TableEnvironment now requires the newly introduced 
`AbstractExecutionEnvironment` (the common superclass of all ExecutionEnvironments 
for DataSets and DataStreams).
    
    An example of an AdaptiveTableSource can be found in `HCatTableSource`. 
HCatTableSource supports predicate pushdown as well as selection pushdown to 
HCatalog. Only predicates on partitioned columns are pushed to HCatalog. 
Unresolved fields will not be read from HCatalog and remain `null` within the 
Table API's rows.
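
The partition-column rule can be sketched as follows. This is a simplified illustration under stated assumptions, not the code from `HCatTableSource`: the class and method names are hypothetical, and each predicate is assumed to test exactly one column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: these names are illustrative and are not taken from the PR.
final class PartitionPushdownSketch {

    // Returns the predicates whose tested column is a partition column.
    // The input maps each predicate's text to the single column it tests.
    static List<String> pushable(Map<String, String> predicateColumns,
                                 Set<String> partitionColumns) {
        List<String> pushed = new ArrayList<>();
        for (Map.Entry<String, String> entry : predicateColumns.entrySet()) {
            if (partitionColumns.contains(entry.getValue())) {
                pushed.add(entry.getKey());
            }
        }
        return pushed;
    }

    public static void main(String[] args) {
        Map<String, String> predicates = new LinkedHashMap<>();
        predicates.put("partCol==='5'", "partCol");
        predicates.put("col1===1", "col1");
        Set<String> partitions = new HashSet<>(Arrays.asList("partCol"));
        // Only the predicate on the partition column is handed to the source.
        System.out.println(pushable(predicates, partitions));
    }
}
```

Predicates that are not returned here would still have to be evaluated by the Table API itself after the rows have been read.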
    
    A simple example looks like this:
    ```
    TableEnvironment t = new TableEnvironment(env);
    t.fromHCat("database", "table")
      .select("col1, col2")
      .filter("partCol==='5'");
    ```
    
    Here's what a TableSource can see from more complicated queries:
    
    ```
    getTableJava(tableSource1)
      .filter("a===5 || a===6")
      .select("a as a4, b as b4, c as c4")
      .filter("b4===7")
      .join(getTableJava(tableSource2))
      .where("a===a4 && c==='Test' && c4==='Test2'")
    
    // Result predicates for tableSource1:
    //  List("a===5 || a===6", "b===7", "c==='Test2'")
    // Result predicates for tableSource2:
    //  List("c==='Test'")
    // Result resolved fields for tableSource1 (true = filtering, false = selection):
    //  Set(("a", true), ("a", false), ("b", true), ("b", false), ("c", false), ("c", true))
    // Result resolved fields for tableSource2 (true = filtering, false = selection):
    //  Set(("a", true), ("c", true))
    ```
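
In the query above, the source sees `b===7` even though the filter was written against the alias `b4`. One way to picture that translation is the sketch below. It is an assumption-laden illustration, not the resolution logic of this PR: the names are hypothetical, predicates are plain strings, and only a single `field===literal` comparison with the field on the left is rewritten.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: string-based alias resolution for illustration only.
final class AliasResolutionSketch {

    // Maps an alias back to its source field; unaliased names map to themselves.
    static String resolve(String field, Map<String, String> aliasToSource) {
        return aliasToSource.getOrDefault(field, field);
    }

    // Rewrites the field on the left of the first "===" in terms of source fields.
    // Anything without "===" is returned unchanged.
    static String rewrite(String predicate, Map<String, String> aliasToSource) {
        int op = predicate.indexOf("===");
        if (op < 0) {
            return predicate;
        }
        String field = predicate.substring(0, op).trim();
        return resolve(field, aliasToSource) + predicate.substring(op);
    }

    public static void main(String[] args) {
        // Aliases introduced by select("a as a4, b as b4, c as c4").
        Map<String, String> aliases = new HashMap<>();
        aliases.put("a4", "a");
        aliases.put("b4", "b");
        aliases.put("c4", "c");
        System.out.println(rewrite("b4===7", aliases));
        System.out.println(rewrite("c4==='Test2'", aliases));
    }
}
```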
    
    
    HCatTableSource has no tests yet, but I will implement them soon. First, 
I would be happy to get some general feedback.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/twalthr/flink TableApiHcat

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1127.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1127
    
----
commit f245604caccd8f97c1d6eabf16968dab3aa47572
Author: twalthr <twal...@apache.org>
Date:   2015-07-09T09:57:05Z

    [FLINK-2167] [table] Add fromHCat() to TableEnvironment

----

