Re: Table API throws "No FileSystem for scheme: file" when loading local parquet

2020-07-11 Thread Leonard Xu
Hi, Leon

The exception comes from the Hadoop side; it looks like you are missing some Hadoop dependencies. Hadoop is needed for Parquet. Rather than adding Hadoop-related dependencies directly, it's recommended to set HADOOP_CLASSPATH or use the Flink shaded Hadoop uber jar [1].

Best,
Leonard Xu

[1] https://ci
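A minimal sketch of the HADOOP_CLASSPATH approach Leonard mentions, assuming a standalone cluster and a local Hadoop installation whose "hadoop" binary is on the PATH (nothing here is from the thread itself):

    # Make Hadoop's classes visible to Flink without bundling them,
    # then (re)start the cluster so the setting takes effect.
    export HADOOP_CLASSPATH=`hadoop classpath`
    ./bin/start-cluster.sh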

Re: Flink 1.11 Table API cannot process Avro

2020-07-11 Thread Leonard Xu
Hi, Jiang

> jobmanager_1 | Available factory identifiers are:
> jobmanager_1 |
> jobmanager_1 | csv
> jobmanager_1 | json
> jobmanager_1 | parquet

After adding the flink-avro dependency, did you restart your cluster/sql-client? It looks like the flink-avro dependency did not load
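A hedged sketch of the restart Leonard is asking about, assuming a standalone setup where the format jar is dropped into Flink's lib/ directory (artifact name and paths are illustrative; the thread itself runs under docker-compose):

    # make the new format jar visible, then restart so factories are re-discovered
    cp flink-avro-1.11.0.jar $FLINK_HOME/lib/
    $FLINK_HOME/bin/stop-cluster.sh && $FLINK_HOME/bin/start-cluster.sh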

Re: Flink 1.11 Table API cannot process Avro

2020-07-11 Thread 方盛凯
It seems that you didn't add the additional dependency:

    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro</artifactId>
      <version>1.8.2</version>
    </dependency>

Lian Jiang wrote on Sun, Jul 12, 2020 at 1:08 PM:
> I am using the flink playground as the base:
>
> https://github.com/apache/flink-playgrounds/blob/master/table-walkthrough/pom.xml
>
> I observed "PhysicalLegacyTableSourceScan"

Re: Flink 1.11 Table API cannot process Avro

2020-07-11 Thread Lian Jiang
I am using the flink playground as the base:
https://github.com/apache/flink-playgrounds/blob/master/table-walkthrough/pom.xml

I observed "PhysicalLegacyTableSourceScan". Not sure whether this is related. Thanks. Regards!

On Sat, Jul 11, 2020 at 3:43 PM Lian Jiang wrote:
> Thanks Jörn!
>
> I added

Re: Flink 1.11 Table API cannot process Avro

2020-07-11 Thread Lian Jiang
Thanks Jörn! I added the documented dependency to my pom.xml file:

    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-avro</artifactId>
      <version>1.11.0</version>
    </dependency>

The newly generated jar does have:

    $ jar tf target//spend-report-1.0.0.jar | grep FileSystemFormatFactory
    org/apache/flink/formats/parquet/ParquetFileSystemFormatFactory.class
    org/apache

Re: Savepoint fails due to RocksDB 2GiB limit

2020-07-11 Thread Rafi Aroch
Hi Ori,

In your code, are you using the process() API?

    .process(new MyProcessWindowFunction());

If you do, the ProcessWindowFunction gets as its argument an Iterable with ALL elements collected along the session. This will make the state per key potentially huge (like you're experiencing). As
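A minimal sketch of the usual fix, assuming the goal is something like a per-session count (Event and all class names here are hypothetical, not from the thread): combine an AggregateFunction with the ProcessWindowFunction so that RocksDB stores only the accumulator per key/window instead of every element.

    import org.apache.flink.api.common.functions.AggregateFunction;
    import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
    import org.apache.flink.util.Collector;

    // Incrementally folds each element into a single Long accumulator.
    class CountAgg implements AggregateFunction<Event, Long, Long> {
        @Override public Long createAccumulator() { return 0L; }
        @Override public Long add(Event e, Long acc) { return acc + 1; }
        @Override public Long getResult(Long acc) { return acc; }
        @Override public Long merge(Long a, Long b) { return a + b; }
    }

    // Receives only the pre-aggregated result, not the raw elements.
    class EmitCount extends ProcessWindowFunction<Long, Long, String, TimeWindow> {
        @Override
        public void process(String key, Context ctx, Iterable<Long> agg, Collector<Long> out) {
            out.collect(agg.iterator().next());
        }
    }

    // usage: .window(EventTimeSessionWindows.withGap(Time.minutes(30)))
    //        .aggregate(new CountAgg(), new EmitCount())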

RE: History Server Not Showing Any Jobs - File Not Found?

2020-07-11 Thread Hailu, Andreas
Thanks for the clarity. To this point you made:

(Note that by configuring "historyserver.web.tmpdir" to some permanent directory, subsequent (re)starts of the HistoryServer can re-use this directory; so you only have to download things once)

The HistoryServer process in fact deletes this local cache
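A hedged flink-conf.yaml sketch of the setting under discussion (the directory path is hypothetical):

    # Point the HistoryServer's local cache at a permanent directory so
    # (re)starts can re-use already-downloaded archives.
    historyserver.web.tmpdir: /var/lib/flink/historyserver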

Re: map JSON to scala case class & off-heap optimization

2020-07-11 Thread Georg Heiler
Hi,

Many thanks. So do I understand correctly that:
1) similarly to Spark, the Table API works on some optimized binary representation
2) this is only available in the SQL way of interaction - there is no programmatic API

This leads me then to some questions:
q1) I have read somewhere (I think
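On point 2), a hedged sketch of the programmatic (non-SQL) side of the Table API in Flink 1.11 (the "Orders" table, its columns, and the tableEnv variable are assumed, not from the thread):

    import static org.apache.flink.table.api.Expressions.$;
    import org.apache.flink.table.api.Table;

    // The same relational operations as SQL, expressed via the Java DSL;
    // both paths compile to the same optimized internal representation.
    Table result = tableEnv
        .from("Orders")
        .filter($("amount").isGreater(10))
        .select($("user"), $("amount"));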

Re: Avro from avrohugger still invalid

2020-07-11 Thread Georg Heiler
Hi,

Many thanks for the PR! However, I have tried to build an updated version of Flink and still run into issues.
https://issues.apache.org/jira/browse/FLINK-18478
I have documented in a GIST (see the last comment) how to replicate it.

Best,
Georg

On Fri., Jul 3, 2020 at 12:00 PM, Aljos

Re: Timeout when using RockDB to handle large state in a stream app

2020-07-11 Thread Congxian Qiu
Hi Felipe

I'm just wondering: does increasing heartbeat.timeout with the RocksDBStateBackend work for you? If not, what does the GC log say? Thanks.

Best,
Congxian

Felipe Gutierrez wrote on Tue, Jul 7, 2020 at 10:02 PM:
> I figured out that for my stream job the best was just to use the
> default MemoryStateBackend
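A hedged flink-conf.yaml sketch of the knob Congxian refers to (the value is illustrative; the default at the time was 50000 ms):

    # Give the TaskManager more headroom before it is declared dead,
    # e.g. during long GC pauses or heavy RocksDB work (milliseconds).
    heartbeat.timeout: 180000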

Re: Flink 1.11 Table API cannot process Avro

2020-07-11 Thread Jörn Franke
You are missing additional dependencies:
https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/connectors/formats/avro.html

On 11.07.2020 at 04:16, Lian Jiang wrote:
>
> Hi,
>
> According to
> https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/conne

Re: Savepoint fails due to RocksDB 2GiB limit

2020-07-11 Thread Congxian Qiu
Hi Ori

AFAIK, the 2GB limit is currently still there. As a workaround, maybe you can reduce the state size. If that cannot be done with the window operator, would the KeyedProcessFunction [1] work for you?

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/stream/operators/process
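A hedged sketch of what that migration could look like, assuming the session result is a simple per-key count (Event and all names here are hypothetical): a KeyedProcessFunction keeps a small ValueState and slides an event-time timer to emulate the session gap, so the stored state stays far below the 2 GiB per-entry limit.

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
    import org.apache.flink.util.Collector;

    class SessionCount extends KeyedProcessFunction<String, Event, Long> {
        private static final long GAP_MS = 30 * 60 * 1000; // session gap, illustrative
        private transient ValueState<Long> count; // running aggregate, not raw elements
        private transient ValueState<Long> timer; // currently registered timer, if any

        @Override
        public void open(Configuration parameters) {
            count = getRuntimeContext().getState(new ValueStateDescriptor<>("count", Long.class));
            timer = getRuntimeContext().getState(new ValueStateDescriptor<>("timer", Long.class));
        }

        @Override
        public void processElement(Event e, Context ctx, Collector<Long> out) throws Exception {
            Long c = count.value();
            count.update(c == null ? 1L : c + 1);
            // slide the "session end" timer forward on every element
            Long prev = timer.value();
            if (prev != null) {
                ctx.timerService().deleteEventTimeTimer(prev);
            }
            long next = ctx.timestamp() + GAP_MS;
            ctx.timerService().registerEventTimeTimer(next);
            timer.update(next);
        }

        @Override
        public void onTimer(long ts, OnTimerContext ctx, Collector<Long> out) throws Exception {
            out.collect(count.value()); // session expired: emit and clear
            count.clear();
            timer.clear();
        }
    }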

Re: Checkpoint failed because of TimeWindow cannot be cast to VoidNamespace

2020-07-11 Thread Congxian Qiu
Hi Si-li

Thanks for the notice. I just want to double-check: has the original problem been solved? I ask because the created issue FLINK-18464 has been closed with the reason "Cannot Reproduce". Am I missing something here?

Best,
Congxian

Si-li Liu wrote on Fri, Jul 10, 2020 at 6:06 PM:
> Sorry
>
> I can'

Re: Possible bug clean up savepoint dump into filesystem and low network IO starting from savepoint

2020-07-11 Thread Congxian Qiu
Hi David

Since you say the savepoint uses the local disk, I assume that you use the RocksDBStateBackend. Which Flink version are you using now? What do you mean by "The task manager did not clean up the state"? Does that mean the local disk space was not cleaned up? Did the task encounter a failover in this p