Re: Unit / Integration Test Timer

2018-09-17 Thread Till Rohrmann
Hi Ashish, I think you are right. In the current master, the system should wait until all timers have completed before terminating. Could you check whether this is the case? If not, then this might indicate a problem. Which version of Flink are you using? I guess it would also be helpful to have a

Re: Flink 1.6.0 not allocating specified TMs in Yarn

2018-09-17 Thread Till Rohrmann
With Flink 1.6.0 it is no longer necessary to specify the number of started containers (-yn 145). Flink will dynamically allocate containers. That's also the reason why you don't see registered TMs without a running job. Moreover, it is recommended to start every container with a single slot (no -ys 5). Th
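Assuming a standard Flink 1.6 per-job YARN submission, the advice above could look like this on the command line (the memory value, parallelism, and jar path are illustrative):

```sh
# Flink 1.6+ on YARN: -yn (container count) is no longer required; containers
# are allocated dynamically to match the job's parallelism (-p). One slot per
# TaskManager is the recommended default, so -ys is omitted as well.
./bin/flink run -m yarn-cluster -p 145 -ytm 4096 ./examples/streaming/WordCount.jar
```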

Re: Window operator schema evolution - savepoint deserialization fail

2018-09-17 Thread tisonet
Hi Andrey, thanks for the answer. It seems that it is not possible to handle case class evolution in the version which I currently use (1.5.3). Do you have any recommendation on how to avoid such problems in the future? Adding a new field with a default value to an existing class seems to me like a common use case. Can

Re: [Kerberos] JAAS module content not generated? javax.security.auth.callback.UnsupportedCallbackException: Could not login: the client is being asked for a password, but the Kafka client code does n

2018-09-17 Thread Sebastien Pereira
The root cause was the property `sasl.mechanism`, which we had forced to `PLAIN`. For Kerberos authentication the value must be `GSSAPI` (the default value). FYI, from the source code it seems normal that the JAAS file has no content: the configuration is dynamically set, ie: https://github.com/apache/fl

Re: 1.5.1

2018-09-17 Thread Juho Autio
For the record, this hasn't been a problem for us any more. Successfully running Flink 1.6.0. We set "web.timeout: 12" in flink-conf.yaml, but so far I've gathered that this setting doesn't have anything to do with heartbeat timeouts (?). Most likely the heartbeat timeouts were caused by some
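For anyone tuning these, a sketch of the relevant flink-conf.yaml keys (values are illustrative; web.timeout governs REST/web client operations, while the heartbeat.* keys govern JobManager/TaskManager liveness):

```yaml
# flink-conf.yaml (illustrative values, all in milliseconds)
# Timeout for REST/web operations; unrelated to TM liveness checks:
web.timeout: 120000
# JobManager <-> TaskManager heartbeats are controlled separately by:
heartbeat.interval: 10000
heartbeat.timeout: 50000
```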

Re: Window operator schema evolution - savepoint deserialization fail

2018-09-17 Thread Andrey Zagrebin
Hi Zdenek, schema evolution can be tricky in general. I would suggest planning possible schema extensions in advance, using more custom serialisation, and making sure that it supports the required types of evolution. E.g. some custom Avro serialiser might better tolerate adding a field with a de
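To illustrate the Avro point: an Avro reader can fill in a field the writer did not know about, provided the new field declares a default. A sketch of such a schema (record and field names are made up):

```json
{
  "type": "record",
  "name": "MyEvent",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "newField", "type": ["null", "string"], "default": null}
  ]
}
```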

Re: Client failed to get cancel with savepoint response

2018-09-17 Thread Till Rohrmann
Hi Paul, your analysis is right. The JobManager does not wait for pending operation results to be properly served. See https://issues.apache.org/jira/browse/FLINK-10309 for more details. I think a way to solve it is to wait for some time if the RestServerEndpoint still has some responses to serv

Flink 1.3.2 RocksDB map segment fail if configured as state backend

2018-09-17 Thread Andrea Spina
Hi everybody, I run a Flink 1.3.2 installation on a Red Hat Enterprise Linux Server and I'm not able to set rocksdb as state.backend due to this error whenever I try to deploy any job: java.lang.IllegalStateException: Could not initialize keyed state backend. at org.apache.flink.strea

Lookup from state

2018-09-17 Thread Taher Koitawala
Hi All, as per my knowledge, all windowing operators in Flink are stateful. So let's say I have 2 streams, Stream1 and Stream2. Stream1 and Stream2 are aggregated over some key and then windowed on EventTime. So record X from Stream1 reaches Flink on time, however, record X' from

Re: Unit / Integration Test Timer

2018-09-17 Thread ashish pok
Hi Till, I am still on version 1.4.2 and will need some time before we can get a later version certified in our Prod env. Timers are definitely not completing in my tests with the 1.4.2 utils; I can see them being registered in the debugger though. Having said that, should I pull the latest test utils only a

Re: Flink 1.3.2 RocksDB map segment fail if configured as state backend

2018-09-17 Thread Kostas Kloudas
Hi Andrea, I think that Andrey and Stefan (cc’ed) may be able to help. Kostas

Re: Lookup from state

2018-09-17 Thread Kostas Kloudas
Hi Taher, to understand your use case: you have something like the following: stream1.keyBy(…) .connect(stream2.keyBy(…)) .window(…).apply(MyWindowFunction) and you want to access, from within MyWindowFunction, the state of a FIRED window when a late element arrives for that

Re: ListState - elements order

2018-09-17 Thread Kostas Kloudas
Hi all, Flink does not provide any guarantees about the order of the elements in a list and it leaves it to the state-backends. This means that semantics between different backends may differ, and even if something holds now for one of them, if RocksDB or a filesystem decides to change its sem

Re: ListState - elements order

2018-09-17 Thread Kostas Kloudas
Oops, sorry, but I lost part of the discussion that had already been made. Please ignore my previous answer. Kostas

[ANNOUNCE] Weekly community update #38

2018-09-17 Thread Till Rohrmann
Dear community, this is the weekly community update thread #38. Please post any news and updates you want to share with the community to this thread. # Release vote for Flink 1.6.1 and 1.5.4 The community is currently voting on the next bug fix releases Flink 1.6.1 [1] and Flink 1.5.4 [2]. Pleas

Re: Unit / Integration Test Timer

2018-09-17 Thread ashish pok
Hi Till, A quick update. I added a MockFlatMap function with the following logic: public MockStreamPause(int pauseSeconds) { this.pauseSeconds = pauseSeconds; } @Override public void flatMap(PlatformEvent event, Collector out) throws Exception { if(event.getSrc().startsWith(Even

Re: Flink 1.3.2 RocksDB map segment fail if configured as state backend

2018-09-17 Thread Stefan Richter
Hi, I think the exception pretty much says what is wrong: the native library cannot be mapped into the process because of an access rights problem. Please make sure that your /tmp path has the exec right. Best, Stefan
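A quick way to check for, and work around, a noexec-mounted /tmp might look like this (commands are a sketch; remounting needs root, and on Flink 1.3 the taskmanager tmp directories are configured via taskmanager.tmp.dirs):

```sh
# Check whether /tmp is mounted with the noexec flag:
mount | grep ' /tmp '

# Option 1: remount /tmp with exec rights (needs root):
mount -o remount,exec /tmp

# Option 2: point the JVM's temp dir at a directory that allows execution,
# e.g. via flink-conf.yaml:
#   env.java.opts: -Djava.io.tmpdir=/path/with/exec
```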

Potential bug in Flink SQL HOP and TUMBLE operators

2018-09-17 Thread John Stone
Hello, I'm checking if this is intentional or a bug in Apache Flink SQL (Flink 1.6.0). I am using processing time with a RocksDB backend. I have not checked if this issue is also occurring in the Table API. I have not checked if this issue also exists for event time (although I suspect it does)

Re: questions about YARN deployment and HDFS integration

2018-09-17 Thread Kostas Kloudas
Hi Chang, you can find some of the answers inline: > On Sep 17, 2018, at 3:47 PM, Chang Liu wrote: > > Dear All, > > I am helping my team setup a Flink cluster and we would like to have high > availability and easy to scale. > > We would like to setup a minimal cluster environment but can

Re: Question regarding state cleaning using timer

2018-09-17 Thread Kostas Kloudas
Hi Bhaskar, if you want different TTLs per key, then you should use timers with a process function as shown in [1]. This is an old presentation, though, so the RichProcessFunction is now a KeyedProcessFunction. Also please have a look at the training material in [2] and the process function doc
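The pattern Kostas describes can be sketched as follows (Java, using Flink's KeyedProcessFunction; the type parameters and the ttlForKey helper are illustrative assumptions, and the snippet is not compiled here):

```java
// Sketch only: clears this key's state after a per-key TTL.
public class PerKeyTtlFunction extends KeyedProcessFunction<String, Event, Event> {

    private transient ValueState<Event> state;

    @Override
    public void open(Configuration parameters) {
        state = getRuntimeContext().getState(
            new ValueStateDescriptor<>("latest", Event.class));
    }

    @Override
    public void processElement(Event value, Context ctx, Collector<Event> out)
            throws Exception {
        state.update(value);
        // ttlForKey(...) is a made-up helper: each key can choose its own TTL.
        long ttl = ttlForKey(ctx.getCurrentKey());
        ctx.timerService().registerProcessingTimeTimer(
            ctx.timerService().currentProcessingTime() + ttl);
        out.collect(value);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Event> out) {
        state.clear(); // drop this key's state once its TTL fires
    }

    private long ttlForKey(String key) {
        return 60_000L; // illustrative: look up a per-key TTL here
    }
}
```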

Re: Question regarding state cleaning using timer

2018-09-17 Thread Vijay Bhaskar
Thanks Kostas! Regards Bhaskar

Re: Setting a custom Kryo serializer in Flink-Python

2018-09-17 Thread Kostas Kloudas
Hi Joe, Probably Chesnay (cc’ed) may have a better idea on why this is happening. Cheers, Kostas > On Sep 14, 2018, at 7:30 PM, Joe Malt wrote: > > Hi, > > I'm trying to write a Flink job (with the Python streaming API) that handles > a custom type that needs a custom Kryo serializer. > > W

Re: Potential bug in Flink SQL HOP and TUMBLE operators

2018-09-17 Thread Fabian Hueske
Hi John, are you sure that the first rows of the first window are dropped? When a query with processing-time windows is terminated, the last window is not computed. This is in fact intentional and does not apply to event-time windows. Best, Fabian

Re: Potential bug in Flink SQL HOP and TUMBLE operators

2018-09-17 Thread Xingcan Cui
Hi John, I’ve not dug into this yet, but IMO, it shouldn’t be the case. I just wonder how you judged that the data in the first five seconds was not processed by the system? Best, Xingcan

Re: Potential bug in Flink SQL HOP and TUMBLE operators

2018-09-17 Thread elliotstone
Yes, I am certain events are being ignored or dropped during the first five seconds. Further investigation on my part reveals that the "ignore" period is exactly the first five seconds of the stream, regardless of the size of the window. Situation: I have a script which pushes an event into K

Expire records in FLINK dynamic tables?

2018-09-17 Thread burgesschen
Hi everyone, I'm trying out the SQL API in flink 1.6 I first convert a data stream to a table (let's call it tableA) using tableEnv.registerDataStream, then perform queries on tableA and finally convert the result to a retract stream. My question is: Is there a way to clean up/ remove records/exp

Re: Utilising EMR's master node

2018-09-17 Thread Gary Yao
Hi Averell, according to the AWS documentation [1], the master node only runs the YARN ResourceManager and the HDFS NameNode. Containers can only be launched on nodes that are running the YARN NodeManager [2]. Therefore, if you want TMs or JMs to be launched on your EMR master node, you have to st

Re: Flink 1.6.0 not allocating specified TMs in Yarn

2018-09-17 Thread Subramanya Suresh
Thanks Till, "That's also the reason why you don't registered TMs without a running job." > I am not sure what you mean. We see 0 TMs in Flink (attached earlier and also in the TaskManagers link) despite running/submitting the Job (the RM seems to show a lot of containers though, attached). > Also

Re: Expire records in FLINK dynamic tables?

2018-09-17 Thread Fabian Hueske
Hi Chen, yes, this is possible. Have a look at the configuration of the idle state retention time [1]. Best, Fabian [1] https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/table/streaming.html#idle-state-retention-time
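In the 1.6 Table API the retention time is configured per query via a StreamQueryConfig; a minimal sketch (retention times and column names are illustrative, not compiled here):

```java
// Sketch (Flink 1.6 Table API): state for an idle key is cleaned up after a
// minimum of 12 and a maximum of 24 hours without updates.
StreamQueryConfig qConfig = tableEnv.queryConfig();
qConfig.withIdleStateRetentionTime(Time.hours(12), Time.hours(24));

Table result = tableEnv.sqlQuery("SELECT userId, COUNT(*) FROM tableA GROUP BY userId");
DataStream<Tuple2<Boolean, Row>> retractStream =
    tableEnv.toRetractStream(result, Row.class, qConfig);
```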

Re: Potential bug in Flink SQL HOP and TUMBLE operators

2018-09-17 Thread Fabian Hueske
Hmm, that's interesting. HOP and TUMBLE window aggregations are directly translated into their corresponding DataStream counterparts (Sliding, Tumble). There should be no filtering of records. I assume you tried a simple query like "SELECT * FROM MyEventTable" and received all expected data? Fabi

Re: CombinableGroupReducer says The Iterable can be iterated over only once

2018-09-17 Thread Fabian Hueske
Hi Alejandro, asScala calls iterator() the first time and reduce() another time. These iterators can only be iterated once because they are possibly backed by multiple sorted files which have been spilled to disk and are merge-sorted while iterating. I'm actually surprised that you found this cod

Re: Flink 1.6.0 not allocating specified TMs in Yarn

2018-09-17 Thread Subramanya Suresh
I got these logs from one of the Yarn logs. Not sure what changed in 1.6.0; I couldn't find anything relevant in the release notes. Looking through the code I am not sure the JVM Heap Size is < 8GB. We start the TM with 20GB, so with the cutoff we should have totalJavaMemorySizeMB = 20GB - 5GB i.e. 1

Re: Potential bug in Flink SQL HOP and TUMBLE operators

2018-09-17 Thread Rong Rong
This is in fact a very strange behavior. To add to the discussion, when you mentioned: "raw Flink (windowed or not) nor when using Flink CEP", how were the comparisons being done? Also, were you able to get the results correct without the additional GROUP BY term of "foo" or "userId"? -- Rong On

Re: why same Sliding ProcessTime TimeWindow triggered twice

2018-09-17 Thread Rong Rong
I haven't dug too deep into the content, but it seems like this line was the reason: .keyBy(s => s.endsWith("FRI")) essentially you are creating two key partitions (True, False), where each one of them has its own sliding window, I believe. Can you print out the key space for each of th

Re: Potential bug in Flink SQL HOP and TUMBLE operators

2018-09-17 Thread Xingcan Cui
Hi John, I suppose that was caused by the groupBy field “timestamp”. You were actually grouping on two time fields simultaneously, the processing time and the time from your producer. As @Rong suggested, try removing the additional groupBy field “timestamp” and check the result again. Best, Xi
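The suggestion, made concrete: keep only the window expression (plus any genuine grouping key) in the GROUP BY, not the record's own timestamp column. A hedged sketch, assuming the table has a proctime attribute and made-up column names:

```sql
-- Group by the processing-time window (plus a real key such as userId),
-- not by the event's own "timestamp" column:
SELECT
  userId,
  COUNT(*) AS cnt,
  HOP_END(proctime, INTERVAL '1' SECOND, INTERVAL '5' SECOND) AS windowEnd
FROM MyEventTable
GROUP BY
  HOP(proctime, INTERVAL '1' SECOND, INTERVAL '5' SECOND),
  userId
```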

Re: How to get taskmanager hostname and port on runtime

2018-09-17 Thread 郑舒力
Hi, thanks for your reply. Port means the akka tcp port, because I want to uniquely identify a taskmanager. container_id is another choice. The Kafka metric reporter implements a Kafka interface, which is not a Flink MetricReporter, so I can't get Flink runtime info directly. > On Sep 13, 2018, at 4:25 PM, Dawid W

In which case the StreamNode has multiple output edges?

2018-09-17 Thread 徐涛
Hi All, I am reading the source code of Flink. When I read the StreamGraph generation part, I found that there is a property named outEdges in StreamNode. I know there is a case where a StreamNode has multiple inEdges, but in which case does a StreamNode have multiple outEdges? Thanks a lot.

Re: Utilising EMR's master node

2018-09-17 Thread Averell
Thank you Gary. Regarding the option to use a smaller server for the master node, when starting a Flink job, I would get an error like the following: Caused by: org.apache.flink.configuration.IllegalConfigurationException: The number of virtual cores per node were configured with 16 but Yarn on
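Assuming the failure comes from that vcore validation, one way around it is to cap the vcores Flink requests per container in flink-conf.yaml (the value below is illustrative):

```yaml
# flink-conf.yaml: request fewer vcores per TaskManager container than the
# 16 that were inherited, so smaller master/core nodes can satisfy the request.
yarn.containers.vcores: 4
```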

Re: In which case the StreamNode has multiple output edges?

2018-09-17 Thread vino yang
Hi Tao, the Dataflow abstraction of the Flink runtime is a DAG. In a graph, there may be more than one in-edge and one out-edge. A simple example of multiple out-edges is an operator that is followed by multiple sinks, for example, a sink to Kafka and a sink to Elasticsearch. Thanks, vino.

Migration to Flink 1.6.0, issues with StreamExecutionEnvironment.registerCachedFile

2018-09-17 Thread Subramanya Suresh
Hi, We are running into some trouble with StreamExecutionEnvironment.registerCachedFile (works perfectly fine in 1.4.2). - We register some CSV files in HDFS with executionEnvironment.registerCachedFile("hdfs:///myPath/myCsv", myCSV.csv) - In a UDF (ScalarFunction), in the open function

InpoolUsage & InpoolBuffers inconsistence

2018-09-17 Thread aitozi
Hi, I found that these two metrics are inconsistent: the inpoolQueueLength is positive, but the inpoolUsage is always zero. Is this a bug? cc @Chesnay Schepler

Re: InpoolUsage & InpoolBuffers inconsistence

2018-09-17 Thread aitozi
And my doubt about that comes from debugging a problem of checkpoint expiration. I encountered checkpoint expiration with no backpressure shown in the web UI. But after I added many logs, I found that the barrier was sent to the downstream, and the downstream may be set to autoread = false and block the

Re: InpoolUsage & InpoolBuffers inconsistence

2018-09-17 Thread Zhijiang(wangzhijiang999)
Hi, the inpoolQueueLength indicates how many buffers are received and queued. But if the buffers in the queue are events (like a barrier), they will not be counted in the inpoolUsage. So in your case it may be normal for these two metrics. If you monitored that autoread=false in downstr

Re: InpoolUsage & InpoolBuffers inconsistence

2018-09-17 Thread aitozi
Hi Zhijiang, thanks for your reply. But I still have a little question. Let me make my debug process clearer. I log in PartitionRequestQueue to ensure that the action of writing a barrier and the callback of writeAndFlush happen almost at the same time (which is done by checking whether the buffer to be sent