Re: Network PartitionNotFoundException when run on multi nodes

2018-07-23 Thread Steffen Wohlers
Hi Zhijiang, Minglei, all, both of your hints and explanations work well. Thank you very much! Thanks, Steffen > On 23. Jul 2018, at 08:10, Zhijiang(wangzhijiang999) > wrote: > > Hi Steffen, > > This exception indicates that when the downstream task requests partition > from the upstream task,

RE: FlinkCEP and scientific papers ?

2018-07-23 Thread Esa Heikkinen
Hi, Thank you. This was a very good paper for me ☺ How closely does the current FlinkCEP work like this (the paper was written in 2008)? Are there newer papers related to the current FlinkCEP? BR Esa From: Till Rohrmann Sent: Wednesday, July 18, 2018 9:38 AM To: vino yang Cc: Esa Heikkinen ; Chesnay Sche

Re: Flink job hangs using rocksDb as backend

2018-07-23 Thread Stefan Richter
Hi, let me first clarify what you mean by „stuck“, just because your job stops consuming events for some time does not necessarily mean that it is „stuck“. That is very hard to evaluate from the information we have so far, because from the stack trace you cannot conclude that the thread is „stu

Re: Events can overtake watermarks

2018-07-23 Thread Aljoscha Krettek
Hi, this happens only with Flink 1.5.1? I would expect that in a parallel setting it can happen that a watermark is delayed downstream because the watermark from the other inputs has not caught up yet to "watermark1" (because the watermark at an operator is the minimum of the input watermarks).
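
The rule mentioned above (an operator's watermark is the minimum of its input watermarks) can be illustrated with a minimal, Flink-free sketch. The class and method names here are hypothetical and are not part of Flink's API; the sketch only shows why one slow input holds back the combined watermark.

```java
import java.util.Arrays;

public class MinWatermark {
    // Hypothetical helper: the combined watermark is the minimum over all
    // input watermarks, so the slowest input decides the result.
    static long combine(long[] inputWatermarks) {
        return Arrays.stream(inputWatermarks).min().orElse(Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        long[] inputs = {1000L, 400L, 750L};
        // The input at 400 has not caught up, so the operator stays at 400.
        System.out.println(combine(inputs));
    }
}
```

With these assumed inputs the operator cannot advance past 400 until the lagging input sends a higher watermark.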

Re: Events can overtake watermarks

2018-07-23 Thread Stefan Richter
Hi, events overtaking watermarks doesn’t sound like a „wrong“ behaviour, only watermarks overtaking events would be bad. Do you think this only started from Flink 1.5? To me this does not sound like a problem, but I am not sure if it is intended. Looping in Aljoscha, just in case. Best, Stefan > Am

Re: Events can overtake watermarks

2018-07-23 Thread Gyula Fóra
Hi guys, Let me clarify. There is a single source with parallelism 1 and a single downstream operator with parallelism > 1. So the watermark is strictly controlled by the source. Also I am talking about calls to the processWatermark function of the downstream operator not about the watermark compu

Re: Memory Leak in ProcessingTimeSessionWindow

2018-07-23 Thread Stefan Richter
Hi, for most windows, all state is cleared through FIRE_AND_PURGE, except for windows that are subtypes of merging windows, such as session windows. Here, the state still remembers the window itself until the watermark passes the session timeout+allowed lateness. This is done so that elements t

Re: Events can overtake watermarks

2018-07-23 Thread Gyula Fóra
Hm, I wonder if it could be because the downstream operator is a 2 input operator and I do some filtering on the source elements to direct to one of the inputs. The filtering logic is chained but I guess in this case it can happen that the downstream operator reads 2 events from one input channel even

Re: Permissions to delete Checkpoint on cancel

2018-07-23 Thread Stefan Richter
Hi, I am wondering how this can even work properly if you are using a local fs for checkpoints instead of a distributed fs. First, what happens under node failures, if the SSD becomes unavailable or if a task gets scheduled to a different machine, and can no longer access the disk with the cor

Re: Events can overtake watermarks

2018-07-23 Thread Gyula Fóra
Yea, now that I think about it, that's probably the case. Sorry to bother :) Gyula Gyula Fóra wrote (on Mon, Jul 23, 2018, 11:04): > Hm I wonder it could be because the downstream operator is a 2 input > operator and I do some filtering on the source elements to direct to one of > the i

Re: org.apache.flink.streaming.runtime.streamrecord.LatencyMarker cannot be cast to org.apache.flink.streaming.runtime.streamrecord.StreamRecord

2018-07-23 Thread Aljoscha Krettek
Hi Greg, just making sure but is there any asynchrony in your user functions? Any Async I/O operator maybe? Best, Aljoscha > On 20. Jul 2018, at 21:53, Gregory Fee wrote: > > This is on Flink 1.4.2. I filed it as Flink-9905. Thanks! > > On Fri, Jul 20, 2018 at 2:51 AM, Dawid Wysakowicz

Re: Parallelism and keyed streams

2018-07-23 Thread Fabian Hueske
Hi, Flink guarantees order only within a partition. For example, if you have the program map_1 -> map_2 and both map functions run with parallelism 4, the order of records in each of the 4 partitions is not changed. In case of a shuffle (such as a keyBy or a change in parallelism) records are shipp

Re: When a jobmanager fails, it doesn't restart because it tries to restart non existing tasks

2018-07-23 Thread Gerard Garcia
Hi Till, I can't post the full log (as there is internal info in them) but I've found this. Is that what you are looking for? 11:29:17.351 [main] INFO org.apache.flink.client.cli.CliFrontend - 11:29:17.372 [main] I

Re: Re: Bootstrapping the state

2018-07-23 Thread Fabian Hueske
Hi Henkka, You might want to consider implementing a dedicated job for state bootstrapping that uses the same operator UUID and state names. That might be easier than integrating the logic into your regular job. I think you have to use the monitoring file source because AFAIK it won't be possible

Re: Flink job hangs/deadlocks (possibly related to out of memory)

2018-07-23 Thread Gerard Garcia
Thanks Zhijiang, Yes, I guess our best option right now is to just reduce the structure of the output record and see if that solves the problem. Gerard On Tue, Jul 17, 2018 at 4:56 PM Zhijiang(wangzhijiang999) < wangzhijiang...@aliyun.com> wrote: > Hi Gerard, > > From the jstack you provided, t

Re: IoT Use Case, Problem and Thoughts

2018-07-23 Thread Fabian Hueske
Thanks for the feedback Ashish, I think the closest solution to your problem could be to enable local recovery [1] and somehow avoid writing remote checkpoints. In that case, checkpoint data would be written to the local disk but not to a remote filesystem. In case of a recovery, all tasks are res

Re: Flink job hangs using rocksDb as backend

2018-07-23 Thread shishal singh
Thanks Stefan, You are correct, I learned the hard way that when timers fire, it stops processing new events until all timer callbacks complete. This is the point when I decided to isolate the problem by scheduling only 5-6K timers in total so that even if it's taking time in timers it s

Re: Permissions to delete Checkpoint on cancel

2018-07-23 Thread ashish pok
Stefan, I did have first point at the back of my mind. I was under the impression though for checkpoints, cleanup would be done by TMs as they are being taken by TMs. So for a standalone cluster with its own zookeeper for JM high availability, a NAS is a must have? We were going to go with local

Re: FlinkCEP and scientific papers ?

2018-07-23 Thread vino yang
Hi Esa, I think the core implementation pattern is still based on that paper; there is a package named "nfa"[1] that contains the main idea. The latest CEP module added more features and enhanced the old version. What's more, there is a FLIP-20 which has been accepted; it describes how to integrate wit

Re: Kinesis Producer in 1.4.2: testing locally with Kinesalite not working

2018-07-23 Thread vino yang
Hi Philipp, It sounds like this is a bug, can you create a JIRA issue for this problem? Thanks, vino. 2018-07-23 2:01 GMT+08:00 Philipp Bussche : > Hi there, > > when trying to use a KinesisProducer which has both aws.region and > aws.endpoint set I am receiving the following error message: > > 07/2

Re: Permissions to delete Checkpoint on cancel

2018-07-23 Thread ashish pok
Sorry, Just a follow-up. In absence of NAS then the best option to go with here is checkpoint and savepoints both on HDFS and StateBackend using local SSDs then? We were trying to not even hit HDFS other than for savepoints. - Ashish On Monday, July 23, 2018, 7:45 AM, ashish pok wrote: Stefan

Trying to implement UpsertStreamTableSink in Java

2018-07-23 Thread Porritt, James
I put this class together when trying to create my own upsertable table sink in Java: public class MyTableSink implements UpsertStreamTableSink { @Override public TableSink> configure(String[] fieldNames, TypeInformation[] fieldTypes) { return null; } @Override publi

Re: Flink job hangs using rocksDb as backend

2018-07-23 Thread Stefan Richter
Hi, yes, timers cannot easily fire in parallel to event processing for correctness reasons because they both manipulate the state and there should be a distinct order of operations. If it is literally stuck, then it is obviously a problem. From the stack trace it looks pretty clear that the cul

Modify EventTimeTrigger only for event-time session windows

2018-07-23 Thread Dongwon Kim
Hi all, I want to be sure about when the EventTimeTrigger.onEventTime() method is called with event-time session windows. It returns TriggerResult.FIRE only when the timestamp of the registered timer equals the max timestamp of the current window: @Override public TriggerResult onEventTime(long ti

Re: Modify EventTimeTrigger only for event-time session windows

2018-07-23 Thread Aljoscha Krettek
Hi, If you set an allowed lateness that is greater than zero you will get another call to onEventTime() on window.maxTimestamp + allowedLateness. Does that help answer your question? Best, Aljoscha > On 23. Jul 2018, at 15:40, Dongwon Kim wrote: > > Hi all, > > I want to be sure about when

Re: When a jobmanager fails, it doesn't restart because it tries to restart non existing tasks

2018-07-23 Thread Gerard Garcia
We have just started experiencing a different problem that could be related, maybe it helps to diagnose the issue. In the last 24h the jobmanager lost connection to Zookeeper a couple of times. Each time, a new jobmanager (in a different node) was elected leader correctly but the taskmanagers kept

Re: Modify EventTimeTrigger only for event-time session windows

2018-07-23 Thread 김동원
Hi Aljoscha, If that is the only case, I need to return TriggerResult.CONTINUE when time > window.maxTimestamp. It is very fortunate that onEventTime is not called when time < window.maxTimestamp except for my timer. Thanks a lot for your quick reply. Best, - Dongwon 2018. 7. 23. 10:58 PM, Aljo
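
The decision discussed in this exchange can be sketched without any Flink dependency. In the sketch below, `Result` is a hypothetical stand-in for Flink's TriggerResult, and the timestamps and the allowed lateness of 500 are made-up values. It only illustrates the rule stated in the thread: fire when the timer equals the window's max timestamp, and continue on the later timer at maxTimestamp + allowedLateness.

```java
public class SessionTriggerSketch {
    // Hypothetical stand-in for Flink's TriggerResult; only the two values
    // mentioned in the thread are modelled.
    enum Result { FIRE, CONTINUE }

    // Fire only for the timer registered at the window's max timestamp.
    static Result onEventTime(long time, long windowMaxTimestamp) {
        return time == windowMaxTimestamp ? Result.FIRE : Result.CONTINUE;
    }

    public static void main(String[] args) {
        long maxTimestamp = 9_999L;    // assumed end of the session window
        long allowedLateness = 500L;   // assumed lateness setting

        // Timer at the end of the window: fires.
        System.out.println(onEventTime(maxTimestamp, maxTimestamp));
        // Later timer at maxTimestamp + allowedLateness: does not fire again.
        System.out.println(onEventTime(maxTimestamp + allowedLateness, maxTimestamp));
    }
}
```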

Re: Behaviour of triggers in Flink

2018-07-23 Thread Harshvardhan Agrawal
Thanks for the response Hequn. I also see a weird behavior with purging trigger. It skips messages. Here is the repro: public class WindowTest { public static void main (String[] args) throws Exception { StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironmen

Re: Permissions to delete Checkpoint on cancel

2018-07-23 Thread Stefan Richter
Hi, ok, let me briefly explain the differences between local working directory, checkpoint directory, and savepoint directory and also outline their best practises/requirements/tradeoffs. First easy comment is that typically checkpoints and savepoints have similar requirements and most users wri

Re: ProcessFunction example from the documentation giving me error

2018-07-23 Thread anna stax
It is good now. Sorry, my fault. I had multiple applications running and both were using the socket stream . Thanks. On Sun, Jul 22, 2018 at 8:22 PM, vino yang wrote: > Hi anna, > > From the stack trace you provided, it's socket connect error not about > Flink. > > So, Have you start a socket se

Re: Trying to implement UpsertStreamTableSink in Java

2018-07-23 Thread Timo Walther
Hi James, the method `Table.writeToSink()` calls `configure(String[] fieldNames, TypeInformation[] fieldTypes)` internally. Since you return null, you are trying to register null instead of a table sink. I hope this helps. Regards, Timo On 23.07.18 at 14:33, Porritt, James wrote: I put
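
The point about `configure` returning null can be illustrated with a simplified, Flink-free sketch. The `Sink` interface and `MySink` class below are hypothetical stand-ins, not Flink's UpsertStreamTableSink API, and field types are modelled as plain strings. The idea shown is only that `configure` should hand back a usable sink carrying the given field names and types rather than null.

```java
import java.util.Arrays;

public class ConfigureSketch {
    // Hypothetical, simplified stand-in for a table sink interface.
    interface Sink {
        Sink configure(String[] fieldNames, String[] fieldTypes);
        String[] fieldNames();
    }

    static final class MySink implements Sink {
        private final String[] names;
        private final String[] types;

        MySink(String[] names, String[] types) {
            this.names = names;
            this.types = types;
        }

        @Override
        public Sink configure(String[] fieldNames, String[] fieldTypes) {
            // Return a configured copy instead of null, so the caller
            // receives a real sink it can register.
            return new MySink(fieldNames.clone(), fieldTypes.clone());
        }

        @Override
        public String[] fieldNames() {
            return names;
        }
    }

    // Builds an unconfigured sink and configures it with the given schema.
    static Sink configured(String[] fieldNames, String[] fieldTypes) {
        return new MySink(new String[0], new String[0]).configure(fieldNames, fieldTypes);
    }

    public static void main(String[] args) {
        Sink sink = configured(new String[]{"id", "count"}, new String[]{"LONG", "LONG"});
        System.out.println(Arrays.toString(sink.fieldNames()));
    }
}
```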

Re: Modify EventTimeTrigger only for event-time session windows

2018-07-23 Thread Aljoscha Krettek
Out of curiosity, why don't you want to keep it like this? @Override public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) { return time == window.maxTimestamp() ? TriggerResult.FIRE : Tri

Re: org.apache.flink.streaming.runtime.streamrecord.LatencyMarker cannot be cast to org.apache.flink.streaming.runtime.streamrecord.StreamRecord

2018-07-23 Thread Gregory Fee
Hi Aljoscha! I am not using any async i/o. I do use a trick similar to ContinuousFileReaderOperator where I use another thread to write

Re: org.apache.flink.streaming.runtime.streamrecord.LatencyMarker cannot be cast to org.apache.flink.streaming.runtime.streamrecord.StreamRecord

2018-07-23 Thread Aljoscha Krettek
Ha! Would you be able to share the code for that? If you don't acquire the "checkpoint lock" before writing this would explain the exception. > On 23. Jul 2018, at 17:37, Gregory Fee wrote: > > Hi Aljoscha! I am not using any async i/o. I do use a trick similar to > ContinuousFileReaderOperato
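
The "checkpoint lock" advice can be sketched in plain Java. Everything below is a hypothetical stand-in: `lock` plays the role of the checkpoint lock and `output` the role of the downstream collector; no Flink classes are used. The sketch shows a helper thread and the main thread both emitting records, each acquiring the same lock before writing.

```java
import java.util.ArrayList;
import java.util.List;

public class CheckpointLockSketch {
    private final Object lock = new Object();               // stand-in for the checkpoint lock
    private final List<String> output = new ArrayList<>();  // stand-in for the output collector

    // Every write goes through the shared lock, whichever thread calls it.
    void emit(String record) {
        synchronized (lock) {
            output.add(record);
        }
    }

    // Emits perThread records from a helper thread and from the calling
    // thread, then returns how many records arrived.
    static int run(int perThread) {
        CheckpointLockSketch sketch = new CheckpointLockSketch();
        Runnable writer = () -> {
            for (int i = 0; i < perThread; i++) {
                sketch.emit("record-" + i);
            }
        };
        Thread helper = new Thread(writer);
        helper.start();
        writer.run();
        try {
            helper.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        synchronized (sketch.lock) {
            return sketch.output.size();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(1000));
    }
}
```

Because both writers hold the lock while appending, no record is lost or interleaved with another write; moving the write outside the lock would remove that guarantee.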

Re: org.apache.flink.streaming.runtime.streamrecord.LatencyMarker cannot be cast to org.apache.flink.streaming.runtime.streamrecord.StreamRecord

2018-07-23 Thread Gregory Fee
Ah yes! This started happening after I moved the code to do the write outside the lock. The reason I did that is because I've run into some situations where checkpoints seem to stall indefinitely without progress. My suspicion was that there is a deadlock condition but putting more thought into it

Question regarding State in full outer join

2018-07-23 Thread Darshan Singh
Hi, I was looking at the new full outer join. This seems to be working fine for my use case; however, I have a question regarding the state size. I have 2 streams, each of which will have hundreds of millions of unique keys. Also, each of these will get updated values for its keys hundreds of times per day. As per my un

Re: Permissions to delete Checkpoint on cancel

2018-07-23 Thread ashish pok
Stefan,  Can’t thank you enough for this write-up. This is awesome explanation. I had misunderstood concepts of RocksDB working directory and Checkpoint FS. My main intent is to boost performance of RocksDB with SSD available locally. Recovery time from HDFS is not much of a concern but load on

Re: Modify EventTimeTrigger only for event-time session windows

2018-07-23 Thread 김동원
Sorry for the late response due to time difference between Berlin and Seoul :-) What you are asking is why I'm trying to modify the condition of the original ternary operator used in EventTimeTrigger.onEventTime? If that is your question, I've already answered it on the first in the thread as f

Flink 1.5 batch job fails to start

2018-07-23 Thread Alex Vinnik
Trying to migrate existing job that runs fine on Flink 1.3.1 to Flink 1.5 and getting a weird exception. Job reads json from s3a and writes parquet files to s3a with avro model. Job is uber jar file built with hadoop-aws-2.8.0 in order to have access to S3AFileSystem class. Fails here https://git

NoClassDefFoundError when running Twitter Example

2018-07-23 Thread syed
I am facing the *java.lang.NoClassDefFoundError: com/twitter/hbc/httpclient/auth/Authentication* error when running the Twitter example. The example works well with the sample data, but I am unable to run it with real tweet data. Please guide me on how to fix this issue. I am running Flink 1.3.2. Tha

Implement Joins with Lookup Data

2018-07-23 Thread Harshvardhan Agrawal
Hi, We are using Flink for financial data enrichment and aggregations. We have Positions data that we are currently receiving from Kafka. We want to enrich that data with reference data like Product and Account information that is present in a relational database. From my understanding of Flink so

Re: Implement Joins with Lookup Data

2018-07-23 Thread Jörn Franke
For the first one (lookup of single entries) you could use a NoSQL db (eg key value store) - a relational database will not scale. Depending on when you need to do the enrichment you could also first store the data and enrich it later as part of a batch process. > On 24. Jul 2018, at 05:25, Ha

S3 file source - continuous monitoring - many files missed

2018-07-23 Thread Averell
Good day everyone, I have a Flink job that has an S3 folder as a source, and we keep putting thousands of small (around 1KB each) gzip files into that folder, with the rate of about 5000 files per minute. Here is how I created that source in Scala: / val my_input_format = new TextInputFormat(n

Re: Question regarding State in full outer join

2018-07-23 Thread vino yang
Hi Darshan, In your use case, I think you can implement the outer join with the DataStream API (using State + ProcessFunction + Timer). Using suitable state, you can store 1 value per key and do not need to keep the full history of values for every key. And you can refer to Flink's implementation of Da

Re: S3 file source - continuous monitoring - many files missed

2018-07-23 Thread Jörn Franke
It could be related to S3, which seems to be configured for eventual consistency. Maybe it helps to configure strong consistency. However, I recommend replacing S3 with a NoSQL database (since you are on Amazon, DynamoDB would help + DynamoDB Streams, alternatively SNS or SQS). The small size and high