Re: Partitioned tables in SQL client configuration.

Maciek Próchniak Thu, 03 Dec 2020 08:17:44 -0800

Hi Jark,

thanks for the answer. I'm a bit puzzled, because in my YAML I'm using "connector: filesystem" (not connector.type). I don't think I end up using this connector:

https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/connect.html#file-system-connector

as partitioning and the ORC format are handled correctly.

It's also not clear to me what the "not legacy" connector is for reading files directly from the filesystem (no Hive). I don't see any implementation of DynamicTableSourceFactory which would do this.

I assumed that using the DDL I wrote below also gives me FileSystemTableFactory, am I wrong?



thanks,

maciek



On 03.12.2020 16:26, Jark Wu wrote:

Only legacy connectors (`connector.type=kafka` instead of `connector=kafka`) are supported in the YAML at the moment. You can use regular DDL instead. There is a similar discussion in https://issues.apache.org/jira/browse/FLINK-20260 these days.
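
For illustration, a minimal sketch of the DDL route for the table from the original message below. It assumes the same columns, path and format as the `tst2` YAML entry, and that the `tst2` entry is removed from the YAML file so the names do not clash. It mirrors the hand-written `tst` definition quoted further down and is not taken from FLINK-20260:

```sql
-- Sketch only: same columns, path and format as the tst2 YAML entry,
-- declared through DDL with the 'connector' option rather than 'connector.type'.
CREATE TABLE tst2 (
  column1 STRING,
  startdate STRING
) PARTITIONED BY (startdate) WITH (
  'connector' = 'filesystem',
  'format' = 'orc',
  'path' = '/tmp/table1'
);

-- Check which partitions appear in 'readPartitions' in the plan.
EXPLAIN SELECT * FROM tst2 WHERE startdate = '2020-11-27';
```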


Best,
Jark

On Thu, 3 Dec 2020 at 00:52, Till Rohrmann <trohrm...@apache.org> wrote:


    Hi Maciek,

    I am pulling in Timo who might help you with this problem.

    Cheers,
    Till

    On Tue, Dec 1, 2020 at 6:51 PM Maciek Próchniak <m...@touk.pl> wrote:

        Hello,

        I am trying to configure the SQL Client to query partitioned ORC data on
        the local filesystem. I have a directory structure like this:

        /tmp/table1/startdate=2020-11-28

        /tmp/table1/startdate=2020-11-27

        etc.


        If I run a SQL Client session and create the table by hand:

        create table tst (column1 string, startdate string)
        partitioned by (startdate)
        with ('connector'='filesystem', 'format'='orc', 'path'='/tmp/table1');

        everything runs fine:

        explain select * from tst where startdate='2020-11-27'

        shows only one partition in 'readPartitions'.


        However, I struggle to configure the table in the .yaml config.

        I tried it like this (after some struggle, as the "partition.keys"
        setting doesn't seem to be documented...):

        tables:
           - name: tst2
             type: source-table
             connector: filesystem
             path: "/tmp/table1"
             format: orc
             partition.keys:
               - name: startdate
             schema:
               - name: column1
                 data-type: string
               - name: startdate
                 data-type: string

        and it more or less works: queries are executed properly. However,
        partitions are not pruned:

        explain select * from tst2 where startdate='2020-11-27'

        shows all partitions in 'readPartitions'.


        Any idea what could be wrong? I'm using Flink 1.11.2.


        thanks,

        maciek
