Re: Hello-world example of Flink Table API using an edited Calcite rule

2019-06-26 Thread Felipe Gutierrez
Hi JingsongLee, Sorry for not explaining this well; let me try to clarify my idea. 1 - I want to use InMemoryExternalCatalog to store some statistics which I create by listening to a stream. 2 - Then my core application will use the Table API to execute some aggregations/joins.

Re: Batch mode with Flink 1.8 unstable?

2019-06-26 Thread Biao Liu
Hi Ken again, Regarding the TimeoutException, I just realized that there is no akka.remote.OversizedPayloadException in your log file. Some other reason might have caused this. 1. Have you tried increasing the configuration "akka.ask.timeout"? 2. Have you checked the garbage collection…
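For reference, the first suggestion above is a flink-conf.yaml setting; a minimal sketch with an illustrative value (the Flink 1.8 default is 10 s):

```yaml
# flink-conf.yaml — raise the Akka ask timeout (value is illustrative, not a recommendation)
akka.ask.timeout: 120 s
```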

Re: Best Flink SQL length proposal

2019-06-26 Thread JingsongLee
Hi Simon, I hope you can wrap them simply. In our scenario there are also many jobs with a large number of columns; the huge generated code not only leads to compile exceptions, but also prevents the code from being optimized by the JIT. We are planning to introduce a Java code splitter (analyze Java code and…

Re: Hello-world example of Flink Table API using an edited Calcite rule

2019-06-26 Thread JingsongLee
Hi Felipe: Yeah, you can use InMemoryExternalCatalog and CalciteConfig, but I don't quite understand what you mean. InMemoryExternalCatalog provides methods to create, drop, and alter (sub-)catalogs or tables, while CalciteConfig is for defining a custom Calcite configuration. They are two separate…

Re: Apache Flink - Running application with different flink configurations (flink-conf.yml) on EMR

2019-06-26 Thread Xintong Song
Hi Singh, You can use the environment variable "FLINK_CONF_DIR" to specify the path to the directory of config files. You can also override config options with command-line arguments prefixed with -D (for yarn-session.sh) or -yD (for the 'flink run' command). Thank you~ Xintong Song On Wed, Jun 26, 2019 a…
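A minimal sketch of the approach described above; all paths and config values here are hypothetical, and the actual 'flink run' invocation is left commented out since it needs a cluster:

```shell
# Give each application its own config directory (paths are illustrative).
mkdir -p /tmp/app1-conf
# Copy the cluster's config if present; otherwise create a minimal one for illustration.
cp /etc/flink/conf/flink-conf.yaml /tmp/app1-conf/ 2>/dev/null || \
  echo "taskmanager.heap.size: 2048m" > /tmp/app1-conf/flink-conf.yaml
export FLINK_CONF_DIR=/tmp/app1-conf

# Per-job overrides on the command line: -D for yarn-session.sh, -yD for 'flink run'.
# flink run -m yarn-cluster -yD taskmanager.heap.size=4096m ./my-job.jar
```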

Re: Batch mode with Flink 1.8 unstable?

2019-06-26 Thread Biao Liu
Hi Ken, Regarding oversized input splits, this seems to be a rare case beyond my expectation. However, it should definitely be fixed, since input splits can be user-defined; we should not assume they must be small. I agree with Stephan that maybe there is something unexpected involved in the input s…

Re: Batch mode with Flink 1.8 unstable?

2019-06-26 Thread qi luo
Hi Stephan, We have met similar issues to those Ken described. Will all these issues hopefully be fixed in 1.9? Thanks, Qi > On Jun 26, 2019, at 10:50 PM, Stephan Ewen wrote: > > Hi Ken! > > Sorry to hear you are going through this experience. The major focus on > streaming so far means that the…

Re: Best Flink SQL length proposal

2019-06-26 Thread Simon Su
Hi JingsongLee, Thanks for your reply. It seems that wrapping fields is a feasible way for me now, and there is already another JIRA, FLINK-8921, that tries to improve this. Thanks, Simon On 06/26/2019 19:21, JingsongLee wrote: Hi Simon: Does your code include the PR[1]? If included: try set…

Re: HDFS checkpoints for rocksDB state backend:

2019-06-26 Thread Congxian Qiu
Hi Andrea, Regarding the NoClassDefFoundError, could you please verify that `org.apache.hadoop.hdfs.protocol.HdfsConstants` exists in your jar? Or you could use Arthas [1] to check whether the class is present while the job is running. [1] https://github.com/alibaba/arthas Best, Congxian Andrea Spina…

Re: open() setup method not being called for AggregateFunctions?

2019-06-26 Thread Piyush Narang
Circling back on this, as I was able to dig a bit more into our specific use-case (DataStream API, and we perform a window + groupBy). It seems that the planner is creating an AggregateAggFunction, which currently isn't a RichFunction. From what I understand, not allowing rich aggregation functions…

Any tutorial/example/blogpost/doc for Hive Source and Hive Sink with Flink streaming job?

2019-06-26 Thread Elkhan Dadashov
Hey Flink community, I'm just getting started with Flink. I wanted to ask if there is any tutorial/example/blogpost/doc for a Hive source and Hive sink with a Flink streaming job? Thanks.

HDFS checkpoints for rocksDB state backend:

2019-06-26 Thread Andrea Spina
Dear community, I'm trying to use HDFS checkpoints in flink-1.6.4 with the following configuration: state.backend: rocksdb state.checkpoints.dir: hdfs://rbl1.stage.certilogo.radicalbit.io:8020/flink/checkpoint state.savepoints.dir: hdfs://rbl1.stage.certilogo.radicalbit.io:8020/flink/savepoints…

Re: [ANNOUNCEMENT] June 2019 Bay Area Apache Flink Meetup

2019-06-26 Thread Xuefu Zhang
Hi all, As a gentle reminder, the meetup [1] will be held today at 6:30pm at Zendesk, 1019 Market Street, SF. Come on in for enlightening talks as well as food and drinks. See you there! Regards, Xuefu [1] https://www.meetup.com/Bay-Area-Apache-Flink-Meetup/events/262216929/ On Fri, Jun 21, 20…

Error when creating InMemoryExternalCatalog to populate using another stream.

2019-06-26 Thread Felipe Gutierrez
Hi, I am trying to use the InMemoryExternalCatalog to register a table using the Java Table API 1.8. I want to update this table with another stream that I will be reading. Then I plan to use the values in my InMemoryExternalCatalog to execute other queries. Is that a reasonable plan to exec…
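A minimal sketch of the registration step being asked about, assuming the Flink 1.8 Table API (the catalog name is made up, and the actual table creation is elided since it depends on the stream's schema):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.table.catalog.InMemoryExternalCatalog;

public class CatalogSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

        // An in-memory catalog named "stats" (name is illustrative).
        InMemoryExternalCatalog catalog = new InMemoryExternalCatalog("stats");
        tableEnv.registerExternalCatalog("stats", catalog);

        // Tables would then be created/altered via catalog.createTable(...)
        // and catalog.alterTable(...), and referenced from Table API queries.
    }
}
```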

Re: Batch mode with Flink 1.8 unstable?

2019-06-26 Thread Stephan Ewen
Hi Ken! Sorry to hear you are going through this experience. The major focus on streaming so far means that the DataSet API has stability issues at scale. So yes, batch mode in the current Flink version can be somewhat tricky. It is a big focus of Flink 1.9 to finally fix the batch mode, and by add…

Apache Flink - Running application with different flink configurations (flink-conf.yml) on EMR

2019-06-26 Thread M Singh
Hi: I have a single EMR cluster with Flink and want to run multiple applications on it with different Flink configurations. Is there a way to: 1. Pass the config file name for each application, or 2. Overwrite the config parameters via command-line arguments for the application? This is similar…

Re: Apache Flink - How to pass configuration params in the flink-config.yaml file to local execution environment

2019-06-26 Thread M Singh
Hey Folks: Just wanted to see if you have any advice on this issue of passing config parameters to the application. I've tried passing parameters by using: ParameterTool parameterTool = ParameterTool.fromMap(config); StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvir…
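A sketch of one way to combine the two pieces mentioned above for a *local* environment, assuming Flink 1.8 (the config key and map values are illustrative, not recommendations):

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LocalConfigSketch {
    public static void main(String[] args) throws Exception {
        // A local environment accepts a Configuration directly, so flink-conf.yaml
        // style options can be set programmatically (key is illustrative).
        Configuration conf = new Configuration();
        conf.setString("taskmanager.network.memory.fraction", "0.2");
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.createLocalEnvironment(1, conf);

        // ParameterTool carries application-level parameters to user functions.
        Map<String, String> config = new HashMap<>();
        config.put("window.size", "60");
        ParameterTool params = ParameterTool.fromMap(config);
        env.getConfig().setGlobalJobParameters(params);
    }
}
```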

Re: Hello-world example of Flink Table API using an edited Calcite rule

2019-06-26 Thread Felipe Gutierrez
Hi JingsongLee, it is still not very clear to me. I imagine that I can create an InMemoryExternalCatalog and insert some tuples there (which will be in memory). Then I can use Calcite to read the values of my InMemoryExternalCatalog and change my plan. Is that correct? Do you have an example of ho…

Re: Hello-world example of Flink Table API using an edited Calcite rule

2019-06-26 Thread JingsongLee
Hi Felipe: I think your approach is absolutely right. You can try to do some plan tests, just like [1]. You can find more CalciteConfigBuilder API tests in [2]. 1. https://github.com/apache/flink/blob/release-1.8/flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/api/batch/table/Co…

Re: Best Flink SQL length proposal

2019-06-26 Thread JingsongLee
Hi Simon: Does your code include the PR [1]? If it does: try setting TableConfig.setMaxGeneratedCodeLength smaller (the default is 64000). If it doesn't: can you wrap some fields into a nested Row field to reduce the field count? 1. https://github.com/apache/flink/pull/5613 …

Hello-world example of Flink Table API using an edited Calcite rule

2019-06-26 Thread Felipe Gutierrez
Hi, does someone have a simple example using the Table API and a Calcite rule which changes/optimizes the execution plan of a query in Flink? From the official documentation, I know that I have to create a CalciteConfig object [1]. Then I based my first tests on this Stack Overflow post [2] and…
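A sketch of the CalciteConfig wiring being asked about, assuming the Flink 1.8 Table API; FilterMergeRule is just an example of an existing Calcite rule standing in for a custom one, and `tableEnv` is assumed to be an already-created TableEnvironment:

```java
import org.apache.calcite.rel.rules.FilterMergeRule;
import org.apache.calcite.tools.RuleSets;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.calcite.CalciteConfig;
import org.apache.flink.table.calcite.CalciteConfigBuilder;

public class CalciteRuleSketch {
    static void applyCustomRule(TableEnvironment tableEnv) {
        CalciteConfig calciteConfig = new CalciteConfigBuilder()
            // add* extends Flink's default rule sets for a phase;
            // the replace* variants swap them out entirely.
            .addLogicalOptRuleSet(RuleSets.ofList(FilterMergeRule.INSTANCE))
            .build();
        tableEnv.getConfig().setCalciteConfig(calciteConfig);
    }
}
```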

Best Flink SQL length proposal

2019-06-26 Thread Simon Su
Hi all, Currently I am facing a problem caused by a long Flink SQL statement. My SQL is like "insert into tableA select a, b, c … from sourceTable", and I have more than 1000 columns in the select list, so that's the problem: the Flink code generator will generate a RichMapFunction class containing a map fun…

[FLINK-10868] job cannot be exited immediately if job manager is timed out for some reason

2019-06-26 Thread Anyang Hu
Hi ZhenQiu && Rohrmann: Currently I have backported FLINK-10868 to flink-1.5. Most of my jobs (all batch jobs) can now exit immediately after requesting failed containers up to the upper limit, but there are still some jobs that cannot exit immediately. Through the log, it is observed that these j…