Re: flink sql client not able to read parquet format table

2020-04-08 Thread Jark Wu
Hi Lei, Are you using the newest 1.10 blink planner? I'm not familiar with Hive and parquet, but I know @Jingsong Li and @li...@apache.org are experts on this. Maybe they can help on this question. Best, Jark On Tue, 7 Apr 2020 at 16:17, wangl...@geekplus.com.cn < wangl...@geekplus.com.cn> wr

Re: upgrade flink from 1.9.1 to 1.10.0 on EMR

2020-04-08 Thread Robert Metzger
Hey Anuj, can you post the "header" of the jobmanager log, I'm interested in seeing the classpath of your jobmanager. Most likely, there's a mixup in your dependency versions in your classpath. On Tue, Apr 7, 2020 at 8:08 AM aj wrote: > Hello All, > > I am running Flink on AWS EMR, as currentl

Re: State size Vs keys number performance

2020-04-08 Thread Congxian Qiu
Hi, In the last email, I just wanted to express that the overall state size (and the access pattern, but I assume that the access pattern is the same between the two states) affects the final performance (which has to do with RocksDB's architecture), and if you use MapState and ValueState to end up

Re: Checkpoint Error Because "Could not find any valid local directory for s3ablock-0001"

2020-04-08 Thread Congxian Qiu
Hi Lu, I'm not familiar with the S3 file system; maybe others in the Flink community can help you in this case, or you can also reach out to the S3 teams/community for help. Best, Congxian On Wed, Apr 8, 2020 at 11:05 AM, Lu Niu wrote: > Hi, Congxiao > > Thanks for replying. Yeah, I also found those references. How

Re: Using multithreaded library within ProcessFunction with callbacks relying on the out parameter

2020-04-08 Thread Salva Alcántara
I agree with your point, Piotrek: AsyncIO would handle all the pending data for me. However, the reason I did not want to use it is that, in my case, the callbacks are not always called in response to new data being sent to the third party lib. Indeed, the callback will be called rather uncomm

Re: How to support the below Hive SQL in Flink

2020-04-08 Thread Jark Wu
Hi Xiaohua, I'm not very familiar with Hive SQL, but I will try to answer some of them: COALESCE => there is also a COALESCE built-in function in Flink [1]. From the documentation, I think they are identical. STR_TO_MAP => there is also a STR_TO_MAP built-in function in Flink's blink planner [1]. But t

Flink job didn't restart when a task failed

2020-04-08 Thread Hanson, Bruce
Hello Flink folks: We had a problem with a Flink job the other day that I haven’t seen before. One task encountered a failure and switched to FAILED (see the full exception below). After the failure, the task said it was notifying the Job Manager: 2020-04-06 08:21:04.329 [flink-akka.actor.defau

How to use a Scala object as input in Flink

2020-04-08 Thread Xiaohua
Hi, Can a Scala object be used as input in Flink? E.g., class AA; select AA.id,AA.name from table where AA.id >'10' Could you please give an example of using a Scala object as input in Flink, especially in the SELECT and WHERE cases? Thank you so much~ BR Xiaohua -- Sent from: http://apache-flink-user-mail

How to support the below Hive SQL in Flink

2020-04-08 Thread Xiaohua
Hi, We ran into some issues when migrating from Hive/Spark to Flink. Could you please help me? Below are the Hive SQL features we use: DISTRIBUTE BY, named_struct, COALESCE, LATERAL VIEW, row format delimited fields, STR_TO_MAP, OVERWRITE, FULL OUTER JOIN, Rlike, Array. How can we do this with Flink SQL? Thank you~ B

1. How to replace the following Hive SQL with Flink SQL or the Table API

2020-04-08 Thread Xiaohua
Hi, We ran into some issues when migrating from Hive/Spark to Flink. Could you please help me? Below are the Hive SQL features we use: DISTRIBUTE BY, named_struct, COALESCE, LATERAL VIEW, row format delimited fields, STR_TO_MAP, OVERWRITE, FULL OUTER JOIN, Rlike, Array. How can we do this with Flink SQL? Thank you~ BR Xiaohua --

Parquet S3 Sink Part files are not rolling over with checkpoint

2020-04-08 Thread Roshan Punnoose
Hi, I am trying to get the parquet writer to write to s3; however, the files do not seem to be rolling over. The same file "part-0-0.parquet" is being created each time. It is as if the "partCounter" is not being updated. Maybe the Bucket is being recreated each time? I don't really know... Here are some

Re: StateFun - Multiple modules example

2020-04-08 Thread Tzu-Li (Gordon) Tai
Hi Oytun! You can see here an example of how to package a StateFun application image that contains multiple modules: https://ci.apache.org/projects/flink/flink-statefun-docs-stable/deployment-and-operations/packaging.html#images Essentially, for each module you want to include in your application

StateFun - Multiple modules example

2020-04-08 Thread Oytun Tez
Hi there, Does anyone have any StateFun 2.0 examples with multiple modules? -- Oytun Tez M O T A W O R D | CTO & Co-Founder oy...@motaword.com

Re: Possible memory leak in JobManager (Flink 1.10.0)?

2020-04-08 Thread Yun Tang
Hi Marc, I think the occupied memory is due to the completed checkpoints awaiting removal, which are stored in the workQueue of io-executor [1] in ZooKeeperCompletedCheckpointStore [2]. One clue supporting this is that Executors#newFixedThreadPool would create a ThreadPoolExecutor with a LinkedBlockingQue
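The point about Executors#newFixedThreadPool can be checked with the JDK alone. The following is a minimal sketch, not code from Flink; the class and method names (FixedPoolQueueDemo, hasUnboundedWorkQueue) are made up for illustration. It only inspects the work queue of a fixed thread pool and shows that it has no practical capacity bound, which is why pending tasks can accumulate in memory.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;

// Hypothetical demo class; it does not touch any Flink code.
public class FixedPoolQueueDemo {

    /**
     * Returns true if a fixed thread pool of the given size is backed by a
     * LinkedBlockingQueue whose free capacity is Integer.MAX_VALUE, i.e. a
     * queue that is unbounded for practical purposes.
     */
    public static boolean hasUnboundedWorkQueue(int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // In the JDK, newFixedThreadPool returns a ThreadPoolExecutor directly,
            // so the cast gives access to the underlying work queue.
            BlockingQueue<Runnable> queue = ((ThreadPoolExecutor) pool).getQueue();
            return queue instanceof LinkedBlockingQueue
                    && queue.remainingCapacity() == Integer.MAX_VALUE;
        } finally {
            // No tasks were submitted, so shutting down is immediate.
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("unbounded work queue: " + hasUnboundedWorkQueue(4));
    }
}
```

If tasks are submitted faster than the pool's threads finish them, everything waiting sits in that queue, which would be consistent with the memory growth described in this thread.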

Re: Using multithreaded library within ProcessFunction with callbacks relying on the out parameter

2020-04-08 Thread Piotr Nowojski
Hi Salva, Can't you take into account the pending element that’s stuck somewhere in transit? Snapshot it as well and reprocess it during recovery? This is exactly what AsyncWaitOperator is doing. Piotrek > On 5 Apr 2020, at 15:00, Salva Alcántara wrote: > > Hi again Piotr, > > I hav

Re: ListState with millions of elements

2020-04-08 Thread Seth Wiesman
There is a limitation in RocksDB's JNI bridge that will cause applications to fail if list state exceeds 2GB [1]. I am not aware of anyone working on this issue. Seth. [1] https://github.com/facebook/rocksdb/issues/2383 On Wed, Apr 8, 2020 at 12:02 PM Aaron Levin wrote: > Hello friendly Flink comm
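To get a feel for how close a skewed list might come to that limit, here is a back-of-the-envelope sketch. It is an illustration only: it assumes the limit is Integer.MAX_VALUE bytes (roughly 2 GiB), ignores any per-element serialization overhead, and uses made-up names (ListStateSizeEstimate, exceedsLimit, maxElements) and made-up element sizes.

```java
// Hypothetical helper for rough sizing; the numbers are not taken from the thread.
public class ListStateSizeEstimate {

    // Assumption: the limit is the largest Java byte[] length, about 2 GiB.
    public static final long LIMIT_BYTES = Integer.MAX_VALUE;

    /** Estimated serialized size of one list, ignoring per-element overhead. */
    public static long estimatedBytes(long elements, long bytesPerElement) {
        return Math.multiplyExact(elements, bytesPerElement);
    }

    /** True if a single list of this shape would exceed the assumed limit. */
    public static boolean exceedsLimit(long elements, long bytesPerElement) {
        return estimatedBytes(elements, bytesPerElement) > LIMIT_BYTES;
    }

    /** Largest element count that stays within the assumed limit. */
    public static long maxElements(long bytesPerElement) {
        return LIMIT_BYTES / bytesPerElement;
    }

    public static void main(String[] args) {
        long bytesPerElement = 500; // made-up average serialized element size
        System.out.println("1M elements exceed limit: " + exceedsLimit(1_000_000L, bytesPerElement));
        System.out.println("5M elements exceed limit: " + exceedsLimit(5_000_000L, bytesPerElement));
        System.out.println("max elements at this size: " + maxElements(bytesPerElement));
    }
}
```

With 500-byte elements, a list of a few million entries is already in the danger zone under these assumptions, so it is the per-key list size under skew that matters here, not the total number of keys.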

job doesn't start via cli after migrating Flink from 1.8 to 1.10

2020-04-08 Thread Vitaliy Semochkin
Hi, I've recently migrated from Flink 1.8 to Flink 1.10. When I start the job using the YarnClusterDescriptor.deployJobCluster method, everything works fine. However, when I start the job from a shell script, the job fails with these messages: *Shell script reports:* Cluster specification: ClusterSpecificat

ListState with millions of elements

2020-04-08 Thread Aaron Levin
Hello friendly Flink community! I'm curious if anyone has operational experience with jobs that store ListState where occasionally, due to skew, some small number of lists stored in ListState (stored in RocksDB) will have millions of elements. Here are the stats: * millions of keys * p95 size of

Re: [ANNOUNCE] Apache Flink Stateful Functions 2.0.0 released

2020-04-08 Thread Yun Tang
Excited to see the stateful functions release! Thanks for the great work of manager Gordon and everyone who ever contributed to this. Best Yun Tang From: Till Rohrmann Sent: Wednesday, April 8, 2020 14:30 To: dev Cc: Oytun Tez ; user Subject: Re: [ANNOUNCE] Apa

Re: TCP streams to multiple clients

2020-04-08 Thread Robert Metzger
Hey Nick, Yes, Flink is able to send data to many destinations (we call them sinks). You don't need to manually replicate the data. In pseudocode, it would look like this: DataStream stream = env.source(ProxyReader()) // potentially do data processing such as filtering, cleaning, windowing, .. on

Re: State size Vs keys number performance

2020-04-08 Thread KristoffSC
Thanks Congxian Qiu, I'm aware of your second point. In ValueState I will keep a String or a very simple POJO, without any collections inside. I didn't get your third point, could you clarify it please? "disk read/write is somewhat about the whole state size" Actually what I will keep in Value s

Is it possible to emulate keyed state with operator state?

2020-04-08 Thread Salva Alcántara
Just for the sake of experimenting and learning, let's assume that we have a keyed process function using keyed state and we want to rewrite it using operator state. The question is, would it be possible to keep exactly the same behaviour? For example, one could use operator union list state and th