Re: Flink TPC-DS 3TB BenchMark result is not good.

2021-06-22 Thread Jingsong Li
Thanks Yingjie for pinging me. Hi vtygoss, Leonard is right, maybe you are using the wrong statistics information. This caused the optimizer to select the **BROADCAST JOIN** incorrectly. Unfortunately, Flink needs to broadcast a huge amount of data, even gigabytes. This is really the performance

Re: multiple jobs in same flink app

2021-06-22 Thread Qihua Yang
Hi Robert, But I saw Flink doc shows application mode can run multiple jobs? Or I misunderstand it? https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/ *Compared to the Per-Job mode, the Application Mode allows the submission of applications consisting of multiple jobs.

Re: PoJo to Avro Serialization throw KryoException: java.lang.UnsupportedOperationException

2021-06-22 Thread Yun Tang
Hi Rommel, I wonder why avro type would use kryo as its serializer to serialize, could you check what kind of type information could get via TypeInformation.of(class) [1] [1] https://github.com/apache/flink/blob/cc3f85eb4cd3e5031a84321e62d01b3009a00577/flink-core/src/main/java/org/apache/flink

Re: multiple jobs in same flink app

2021-06-22 Thread Robert Metzger
Hi Qihua, Application Mode is meant for executing one job at a time, not multiple jobs on the same JobManager. If you want to do that, you need to use session mode, which allows managing multiple jobs on the same JobManager. On Tue, Jun 22, 2021 at 10:43 PM Qihua Yang wrote: > Hi Arvid, > > Do

PoJo to Avro Serialization throw KryoException: java.lang.UnsupportedOperationException

2021-06-22 Thread Rommel Holmes
My Unit test was running OK under Flink 1.11.2 with parquet-avro 1.10.0, once I upgrade to 1.12.0 with parquet-avro 1.12.0, my unit test will throw com.esotericsoftware.kryo.KryoException: java.lang.UnsupportedOperationException Serialization trace: reserved (org.apache.avro.Schema$Field) fieldMap

Issues when using a file system as a plugin

2021-06-22 Thread Yaroslav Tkachenko
Hi everyone, I need to add support for the GCS filesystem. I have a working example where I add two JARs to the */opt/flink/lib*/ folder: - GCS Hadoop connector - *Shaded* Hadoop using flink-shaded-hadoop-2-uber-2.8.3-10.0.jar Now I'm trying to follow the advice from https://ci.apache.org/projec

Re: Flink TPC-DS 3TB BenchMark result is not good.

2021-06-22 Thread Leonard Xu
Hi, vtygoss Thanks for the detail report, a quick reply as I wrote the org.apache.flink.table.tpcds.TpcdsTestProgram in community, I guess you missed table statistics information. The table statistics information used in the TPC-DS e2e tests is constant for 1GB verification data set, I wrote

High Flink checkpoint Size

2021-06-22 Thread Vijayendra Yadav
Hi Team, I have two flink Streaming Jobs 1) Flink streaming from KAFKA and writing to s3 2) Fling Streaming from KINESIS (KDS) and writing to s3 Both Jobs have similar checkpoint duration. Job #1 (KAFKA) checkpoint size is only 85KB Job #2 (KINESIS) checkpoint size is 18MB There are no checkpoi

Re: NoSuchMethodError - getColumnIndexTruncateLength after upgrading Flink from 1.11.2 to 1.12.1

2021-06-22 Thread Rommel Holmes
To give more information parquet-avro version 1.10.0 with Flink 1.11.2 and it was running fine. now Flink 1.12.1, the error msg shows up. Thank you for help. Rommel On Tue, Jun 22, 2021 at 2:41 PM Thomas Wang wrote: > Hi, > > We recently upgraded our Flink version from 1.11.2 to 1.12.1 a

Re: Flink Resource Management

2021-06-22 Thread Chesnay Schepler
They will consume some memory because there are still objects in memory to be kept around, and some CPU cycles during checkpointing, but overall the impact should be negligible. On 6/23/2021 12:39 AM, Jerome Li wrote: Thanks for the response! Let’s assume some of the Sink does not have incom

Re: Flink Resource Management

2021-06-22 Thread Jerome Li
Thanks for the response! Let’s assume some of the Sink does not have incoming message. Would idle Sink hurt the performance to the other instances in the same taskmanager? I did explore the CPU and memory usages for the entire pipeline. When there are not data flowing, the whole cpu and memory u

Use Flink to write a Kafka topic to s3 as parquet files

2021-06-22 Thread Thomas Wang
Hi, I'm trying to tail a Kafka topic and copy the data to s3 as parquet files. I'm using StreamingFileSink with ParquetAvroWriters. It works just fine. However, it looks like I have to generate the Avro schema and convert my POJO class to GenericRecord first (i.e. convert DataStream to DataStream)

NoSuchMethodError - getColumnIndexTruncateLength after upgrading Flink from 1.11.2 to 1.12.1

2021-06-22 Thread Thomas Wang
Hi, We recently upgraded our Flink version from 1.11.2 to 1.12.1 and one of our jobs that used to run ok, now sees the following error. This error doesn't seem to be related to any user code. Can someone help me take a look? Thanks. Thomas java.lang.NoSuchMethodError: org.apache.parquet.column.

Re: multiple jobs in same flink app

2021-06-22 Thread Qihua Yang
Hi Arvid, Do you know if I can start multiple jobs for a single flink application? Thanks, Qihua On Thu, Jun 17, 2021 at 12:11 PM Qihua Yang wrote: > Hi, > > I am using application mode. > > Thanks, > Qihua > > On Thu, Jun 17, 2021 at 12:09 PM Arvid Heise wrote: > >> Hi Qihua, >> >> Which ex

Re: Flink Resource Management

2021-06-22 Thread Chesnay Schepler
No; Flink does not cleanuo idle operators. On 6/22/2021 9:19 PM, Jerome Li wrote: Hi and Dear Flink users, I am new to Flink. My project is using Flink v1.12.4+. I am curious about how Flink manage the cpu and memory when an instance of a ProcessFunction/Sink is idle? Will Flink resource ma

Recommended way to submit a SQL job via code without getting tied to a Flink version?

2021-06-22 Thread Sonam Mandal
Hello, We've written a simple tool which takes SQL statements as input and uses a StreamTableEnvironment​ to eventually submit this to the Flink cluster. We've noticed that the Flink library versions we depend on must match the Flink version running in our Kubernetes cluster for the job submiss

Flink Resource Management

2021-06-22 Thread Jerome Li
Hi and Dear Flink users, I am new to Flink. My project is using Flink v1.12.4+. I am curious about how Flink manage the cpu and memory when an instance of a ProcessFunction/Sink is idle? Will Flink resource manager take deallocate cpu and memory from it? Because I am trying to expand the parall

How would Flink job react to change of partitions in Kafka topic?

2021-06-22 Thread Thomas Wang
Hi, I'm wondering if anyone has changed the number of partitions of a source Kafka topic. Let's say I have a Flink job read from a Kafka topic which used to have 32 partitions. If I change the number of partitions of that topic to 64, can the Flink job still guarantee the exactly-once semantics?

Flink Kubernetes HA

2021-06-22 Thread Ivan Yang
Hi Dear Flink users, We recently implemented enabled the zookeeper less HA in our kubernetes Flink deployment. The set up has high-availability.storageDir: s3://some-bucket/recovery Since we have a retention policy on the s3 bucket, relatively short 7 days. So the HA will fail if the submitte

Re: unsubscribe

2021-06-22 Thread Leonard Xu
You should send an email to user-unsubscr...@flink.apache.org if you want to unsubscribe the mails from user@flink.apache.org Best, Leonard > 在 2021年6月21日,15:33,steven chen 写道: > > unsubscribe > > >

Re: Re: Flink sql case when problem

2021-06-22 Thread houying910523
Thank a lot, best wishes 发自我的小米手机在 JING ZHANG ,2021年6月22日 下午5:59写道:Hi houyin,> I just thought flink engine will help me to do the type implicit conversion.I agree with you, it is a reasonable expectation. There is a FLIP about implicit type coercion [1]. Hope it helps.[1] https://cwiki.apache.or

Re: Re: Flink sql case when problem

2021-06-22 Thread JING ZHANG
Hi houyin, > I just thought flink engine will help me to do the type implicit conversion. I agree with you, it is a reasonable expectation. There is a FLIP about implicit type coercion [1]. Hope it helps. [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-154%3A+SQL+Implicit+Type+Coercion

Re:Re: Flink sql case when problem

2021-06-22 Thread 纳兰清风
Hi JING ZHANG, Thank for your reply. I got the points why the exception cause which is I use the b as varchar to compare when '' and '1' instead of intType. I just thought flink engine will help me to do the type implicit conversion. So for now, I'd better fix it in a right way such as select *

Re: Flink Protobuf serialization messages does not contain a setter for field bitField0_

2021-06-22 Thread Chesnay Schepler
It's not an error. It's just a bit of information from the type extractor; it still analyzes the type to check whether it is a POJO, because if it was then it wouldn't need to use Kryo. IOW, unless you experience issues with running your job, you can safely ignore it. On 6/21/2021 9:11 PM, Deb

Re: Monitoring Exceptions using Bugsnag

2021-06-22 Thread Chesnay Schepler
Are you only interested in exceptions that result in the job failing? If so, then https://issues.apache.org/jira/browse/FLINK-20833 may be of interest to you. On 6/18/2021 5:15 PM, Kevin Lam wrote: Hi all, I'm interested in instrumenting an Apache Flink application so that we can monitor exc