Hi Jark,
thanks for answer. I'm a bit puzzled, because in my yaml I'm using
"connector: filesystem" (not connector.type). I don't think I end up using
https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/connect.html#file-system-connector
- this connector as partitioning and orc format are handled correctly.
It's also not clear for me what is "not legacy" connector for reading
files directly from filesystem (no Hive). I don't see any implementation
of DynamicTableSourceFactory which would do this.
I assumed that using DDL I wrote below also gives me
FileSystemTableFactory, am I wrong?
thanks,
maciek
On 03.12.2020 16:26, Jark Wu wrote:
Only legacy connectors (`connector.type=kafka` instead of
`connector=kafka`) are supported in the YAML at the moment. You can
use regular DDL instead. There is a similar discussion in
https://issues.apache.org/jira/browse/FLINK-20260
<https://issues.apache.org/jira/browse/FLINK-20260> these days.
Best,
Jark
On Thu, 3 Dec 2020 at 00:52, Till Rohrmann <trohrm...@apache.org
<mailto:trohrm...@apache.org>> wrote:
Hi Maciek,
I am pulling in Timo who might help you with this problem.
Cheers,
Till
On Tue, Dec 1, 2020 at 6:51 PM Maciek Próchniak <m...@touk.pl
<mailto:m...@touk.pl>> wrote:
Hello,
I try to configure SQL Client to query partitioned ORC data on
local
filesystem. I have directory structure like that:
/tmp/table1/startdate=2020-11-28
/tmp/table1/startdate=2020-11-27
etc.
If I run SQL Client session and create table by hand:
create table tst (column1 string, startdate string)
partitioned by
(startdate) with ('connector'='filesystem', 'format'='orc',
'path'='/tmp/table1');
everything runs fine:
explain select * from tst where startdate='2020-11-27'
shows that only one partition in 'readPartitions'
However, I struggle to configure table in .yaml config.
I tried like this (after some struggle, as "partition.keys"
setting
doesn't seem to be documented...) :
tables:
- name: tst2
type: source-table
connector: filesystem
path: "/tmp/table1"
format: orc
partition.keys:
- name: startdate
schema:
- name: column1
data-type: string
- name: startdate
data-type: string
and it more or less works - queries are executed properly.
However,
partitions are not pruned:
explain select * from tst2 where startdate='2020-11-27'
show all partitions in 'readPartitions'
Any idea what can be wrong? I'm using Flink 1.11.2
thanks,
maciek