Re: savepoint failed for finished tasks

2020-01-17 Thread Fanbin Bu
Do you have a rough ETA when this issue would be resolved? Thanks Fanbin On Fri, Jan 17, 2020 at 12:32 AM Biao Liu wrote: > Hi Fanbin, > > Congxian is right. We can't support checkpoint or savepoint on finite > stream job now. > > Thanks, > Biao /'bɪ.aʊ/ > > > > On Fri, 17 Jan 2020 at 16:26, Co

Re: How to handle startup for mandatory config parameters?

2020-01-17 Thread John Smith
Ok, perfect. Thanks! On Fri, 17 Jan 2020 at 11:39, Seth Wiesman wrote: > Yes, the preferred method is to log and throw an exception prior to > calling `execute`. > > The logs will be on the flink dispatcher and the exception will be > returned wrapped in a failed deployment exception. You do not

ValueState with pure Java class keeping lists/map vs ListState/MapState, which one is a recommended way?

2020-01-17 Thread Elkhan Dadashov
Hi Flinkers, Was curious about if there is any performance(memory/speed) difference between these two options: in window process functions, when keeping state: *1) Create a single ValueState, and store state in pure Java objects* class MyClass { List listOtherClass; Map mapKeyToSomeValue;

Re: How to handle startup for mandatory config parameters?

2020-01-17 Thread John Smith
Hi, let me see if I can be more clear When the job is launched, before the 2 calls below in the main() we read some configs, regardless if it's Paramtools or file or what ever doesn't matter. Some of those params are mandatory. I'm guessing it's better to log and throw exception so the main()

Re: Table API: Joining on Tables of Complex Types

2020-01-17 Thread Timo Walther
Hi Andreas, if dataset.getType() returns a RowTypeInfo you can ignore this log message. The type extractor runs before the ".returns()" but with this method you override the old type. Regards, Timo On 15.01.20 15:27, Hailu, Andreas wrote: Dawid, this approach looks promising. I’m able to fl

Re: [DISCUSS] Change default for RocksDB timers: Java Heap => in RocksDB

2020-01-17 Thread Aaron Levin
+1. I personally found it a little confusing when I discovered I had to configure this after already choosing RocksDB as a backend. Also very strongly in favour of "safe and scalable" as the default. Best, Aaron Levin On Fri, Jan 17, 2020 at 4:41 AM Piotr Nowojski wrote: > +1 for making it con

RE: EXT :Flink solution for having shared variable between task managers

2020-01-17 Thread Martin, Nick J [US] (IS)
I think you’re looking for Broadcast State. Here’s a detailed guide. https://flink.apache.org/2019/06/26/broadcast-state.html From: Soheil Pourbafrani [mailto:soheil.i...@gmail.com] Sent: Friday, January 17, 2020 6:50 AM To: user Subject: EXT :Flink solution for having shared variable between ta

Location of flink-s3-fs-hadoop plugin (Flink 1.9.0 on EMR)

2020-01-17 Thread Senthil Kumar
Hello all, Newbie here! We are running in Amazon EMR with the following installed in the EMR Software Configuration Hadoop 2.8.5 JupyterHub 1.0.0 Ganglia 3.7.2 Hive 2.3.6 Flink 1.9.0 I am trying to get a Streaming job from one S3 bucket into an another S3 bucket using the StreamingF

Re: How to declare the Row object schema

2020-01-17 Thread Fabian Hueske
Hi, Which version are you using? I can't find the error message in the current code base. When writing data to a JDBC database, all Flink types must be correctly matched to a JDBC type. The problem is probably that Flink cannot match the 8th field of your Row to a JDBC type. What's the type of th

Re: Filter with large key set

2020-01-17 Thread Fabian Hueske
Hi Eleanore, A dynamic filter like the one you need, is essentially a join operation. There is two ways to do this: * partitioning the key set and the message on the attribute. This would be done with a KeyedCoProcessFunction. * broadcasting the key set and just locally forwarding the messages. T

Flink solution for having shared variable between task managers

2020-01-17 Thread Soheil Pourbafrani
Hi, According to the processing logic, I need to have a HashMap variable that should be shared between the taskmanagers. The scenario is the HashMap data will be continuously updated according to the incoming stream of data. What I observed is declaring the HashMap variable as a class attribute,

Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-17 Thread Dian Fu
Thanks everyone for your warm welcome. It's my pleasure to be part of the community and looking forward to contribute more in the future. Regards, Dian On Fri, Jan 17, 2020 at 4:03 PM Yang Wang wrote: > Congratulations! > > > Best, > Yang > > Terry Wang 于2020年1月17日周五 下午3:28写道: > >> Congratulat

Re: Flink ParquetAvroWriters Sink

2020-01-17 Thread Arvid Heise
Hi Anuj, I originally understood that you would like to store data in the same Kafka topic and also want to store it in the same parquet file. In the past, I mostly used schema registry with Kafka Streams, where you could only store a schema for a key and value respectively. To use different recor

Re: Taskmanager fails to connect to Jobmanager [Could not find any IPv4 address that is not loopback or link-local. Using localhost address.]

2020-01-17 Thread Yangze Guo
Hi, Harshith As a supplementary note to Yang, the issue seems to be that something went wrong when trying to connect to the ResourceManager. There may be two possibilities, the leader of ResourceManager does not write the znode or the TaskExecutor fails to connect to it. If you turn on the DEBUG l

Re: [DISCUSS] Change default for RocksDB timers: Java Heap => in RocksDB

2020-01-17 Thread Piotr Nowojski
+1 for making it consistent. When using X state backend, timers should be stored in X by default. Also I think any configuration option controlling that needs to be well documented in some performance tuning section of the docs. Piotrek > On 17 Jan 2020, at 09:16, Congxian Qiu wrote: > > +1

Re: Flink Dataset to ParquetOutputFormat

2020-01-17 Thread Arvid Heise
Hi Anuj, as far as I know, there is nothing like that on the Dataset side. Could you implement your query on Datastream with bounded inputs? In the long term, Dataset API should be completely replaced with Datastream API. Best, Arvid On Thu, Jan 16, 2020 at 12:35 PM aj wrote: > Hi Arvid, > >

Re: Taskmanager fails to connect to Jobmanager [Could not find any IPv4 address that is not loopback or link-local. Using localhost address.]

2020-01-17 Thread Yang Wang
Hi Kumar Bolar, Harshith, Could you please check the jobmanager log to find out what address the akka is listening? Also the address could be used to connected to the jobmanager on the taskmanger machine. BTW, if you could share the debug level logs of jobmanger and taskmanger. It will help a lot

Re: Why would indefinitely growing state an issue for Flink while doing stream to stream joins?

2020-01-17 Thread Fabian Hueske
Hi, Large state is mainly an issue for Flink's fault tolerance mechanism which is based on periodic checkpoints, which means that the state is copied to a remote storage system in regular intervals. In case of a failure, the state copy needs to be loaded which takes more time with growing state si

Re: Job cannot be deployed when use detached mode

2020-01-17 Thread Biao Liu
Ah, thanks Yang for the fixup. I misunderstood the original answer. Thanks, Biao /'bɪ.aʊ/ On Fri, 17 Jan 2020 at 16:39, Yang Wang wrote: > Hi sysuke, > > >> Why the Yarn per-job attach mode could work, but detach mode could not? > It is just becausein 1.9 and previous versions, the per-job ha

Re: How to handle startup for mandatory config parameters?

2020-01-17 Thread Biao Liu
Hi John, ParameterTools is just a utility to help user to handle arguments. I guess you are using ParameterTools in main method. If it is, it should be in client log file, like Yang said, it's under "{FLINK_HOME}/log". > Do I check someConfig for what ever requirement and just throw an exception

Re: Job cannot be deployed when use detached mode

2020-01-17 Thread Yang Wang
Hi sysuke, >> Why the Yarn per-job attach mode could work, but detach mode could not? It is just becausein 1.9 and previous versions, the per-job have very different code path for attach and detach mode. For attach mode, Flink client will start a session cluster, and then submit a job to the exist

Re: savepoint failed for finished tasks

2020-01-17 Thread Biao Liu
Hi Fanbin, Congxian is right. We can't support checkpoint or savepoint on finite stream job now. Thanks, Biao /'bɪ.aʊ/ On Fri, 17 Jan 2020 at 16:26, Congxian Qiu wrote: > Hi > > Currently, Checkpoint/savepoint only works if all operators/tasks are > still running., there is an issue[1] track

Re: Job cannot be deployed when use detached mode

2020-01-17 Thread Biao Liu
Hi Sysuke, Could you check the JM log (YARN AM container log) first? You might find the direct failure message there. Thanks, Biao /'bɪ.aʊ/ On Fri, 17 Jan 2020 at 12:02, sysuke Lee wrote: > Hi all, > We've got a jar with hadoop configuration files in it. > > Previously we use blocking mode

Re: Job Manager heap metrics

2020-01-17 Thread Biao Liu
Hi RKandoji, > Could someone please tell me what is the best way to check amount of heap consumed by Job Manager? I'm not sure if there is a best way to achieve this. However there are some workaround ways. 1. Through metrics [1] 2. Print GC in stdout. You could achieve it through config item "en

Re: savepoint failed for finished tasks

2020-01-17 Thread Congxian Qiu
Hi Currently, Checkpoint/savepoint only works if all operators/tasks are still running., there is an issue[1] tracking this [1] https://issues.apache.org/jira/browse/FLINK-2491 Best, Congxian Fanbin Bu 于2020年1月17日周五 上午6:49写道: > Hi, > > I couldn't make a savepoint for the following graph: > [

Re: [DISCUSS] Change default for RocksDB timers: Java Heap => in RocksDB

2020-01-17 Thread Congxian Qiu
+1 to store timers in RocksDB default. Store timers in Heap can encounter OOM problems, and make the checkpoint much slower, and store times in RocksDB can get ride of both. Best, Congxian Biao Liu 于2020年1月17日周五 下午3:10写道: > +1 > > I think that's how it should be. Timer should align with other

Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-17 Thread Yang Wang
Congratulations! Best, Yang Terry Wang 于2020年1月17日周五 下午3:28写道: > Congratulations! > > Best, > Terry Wang > > > > 2020年1月17日 14:09,Biao Liu 写道: > > Congrats! > > Thanks, > Biao /'bɪ.aʊ/ > > > > On Fri, 17 Jan 2020 at 13:43, Rui Li wrote: > >> Congratulations Dian, well deserved! >> >> On Thu,