Re: API for delayed/scheduled interval input source for integration tests

2018-08-31 Thread Yee-Ning Cheng
I was able to use the AbstractStreamOperatorTestHarness to write more of a unit test for windowing operators. However, I'm still trying to figure out a way to have a "delayed iterator". I tried implementing an iterator that Thread.sleeps for the interval and passed it to the stream as an input, b

multiple input streams

2018-08-31 Thread Eric L Goodman
If I have a standalone cluster running flink, what is the best way to ingest multiple streams of the same type of data? For example, if I open a socket text stream, does the socket only get opened on the master node and then the stream is partitioned out to the worker nodes? DataStream text = env

Does Flink plan to support JDK 9 recently?

2018-08-31 Thread 陈梓立
Hi, Recently I see a PR mentions "for jdk9 compatibility", and I wonder if Flink considered to support JDK9 recently? If so, what is the plan? Best, tison.

Re: Low Performance in High Cardinality Big Window Application

2018-08-31 Thread Ning Shi
Hi Konstantin, > could you replace the Kafka Source by a custom SourceFunction-implementation, > which just produces the new events in a loop as fast as possible. This way we > can rule out that the ingestion is responsible for the performance jump or > the limit at 5000 events/s and can benchm

Re: Job goes down after 1/145 TMs is lost (NoResourceAvailableException)

2018-08-31 Thread Subramanya Suresh
Thanks TIll, I do not see any Akka related messages in that Taskmanager after the initial startup. It seemed like all is well. So after the remotewatcher detects it unreachable and the TaskManager unregisters it, I do not see any other activity in the JobManager with regards to reallocation etc. -

Re: akka.ask.timeout setting not honored

2018-08-31 Thread Gary Yao
Hi Greg, Unfortunately the environment information [1] is not logged. Can you set the log level for all Flink packages to DEBUG? Do you install Flink yourself on EMR, or do you use the pre-installed one? Can you show us the command with which you start the cluster/submit the job? I do not know i

API for delayed/scheduled interval input source for integration tests

2018-08-31 Thread Yee-Ning Cheng
Hi, I've been trying to write an integration test for my Flink application that has managed state with TTL expiration. However, I can't seem to find a good way to create a stream of elements that waits X amount of time before each element is sent in. I'm using the simple API: val stream = env.f

Re: akka.ask.timeout setting not honored

2018-08-31 Thread Greg Finch
Well ... that didn't take long. The next time I tried, I got the Akka timeout again. Attached are the logs from the last attempt. They're very similar to the other logs I sent. On Fri, Aug 31, 2018 at 2:04 PM Greg Finch wrote: > Thanks Gary. Attached is the jobmanager log. You are correct t

Re: akka.ask.timeout setting not honored

2018-08-31 Thread Greg Finch
Thanks Gary. Attached is the jobmanager log. You are correct that this is running on YARN. I changed web.timeout as you suggested - that seems to be working the few times I tested it. This problem comes and goes though - sometimes it starts before it times out. I'll keep the web.timeout settin

Re: akka.ask.timeout setting not honored

2018-08-31 Thread Gary Yao
Hi Greg, Can you describe the steps to reproduce the problem, or can you attach the full jobmanager logs? Because JobExecutionResultHandler appears in your log, I assume that you are starting a job cluster on YARN. Without seeing the complete logs, I cannot be sure what exactly happens. For now, y

akka.ask.timeout setting not honored

2018-08-31 Thread Greg Finch
I'm having a problem with akka timeout when starting my cluster. The error is "Ask timed out after 1 ms.". I have changed the akka.ask.timeout config setting to be 30 ms, but it still times out and fails after 10 seconds. I confirmed that the config is properly set by both checking the J

Get stream of rejected data from Elasticsearch6 sink

2018-08-31 Thread Nick Triller
Hi all, is it possible to further process data that could not be persisted by the Elasticsearch6 sink without breaking checkpointing? As I understand, the onFailure callback can't be used to forward rejected data into a separate stream. I would like to extend the sink if this use case is not co

Re: test windows

2018-08-31 Thread David Anderson
The flink training exercises have a simpler example of using a TwoInputStreamOperatorTestHarness from outside of the Flink code base that you can refer to. The two input test harness is more or less the same as the one input test harness. https://github.com/dataArtisans/flink-training-exercises/bl

Re: Usage of "onTime" in ProcessFunction

2018-08-31 Thread Boris Lublinsky
Thanks Andrey I do not have event time, dealing only with process time. My process gets 2 types of messages: 1. Start processing, which starts the timer, creates a GUID and outputs event to another stream for the actual processing. Lets say at time 45s and I want to make sure that my result will

Re: Using Managed Keyed State

2018-08-31 Thread Andrey Zagrebin
Hi, If you ask about keyed state, you probably mean ListState, because in any case List is just java object for a concrete value of state. ListState is also scoped by current record key as ValueState but adds some list specific functionality. They are state object handles. Keyed state is also d

Re: Usage of "onTime" in ProcessFunction

2018-08-31 Thread Andrey Zagrebin
Hi, the timers are scoped to the current key when you apply a processing function to a KeyedStream. If you register more than one timer for a particular key and timestamp, you will get only one onTimer callback, see also in docs [1]. Timers registered in a processing function will trigger only

Re: CompletedCheckpoints are getting Stale ( Flink 1.4.2 )

2018-08-31 Thread vino yang
Hi Laura: Perhaps this is possible because the path to the completed checkpoint on HDFS does not have a hierarchical relationship to identify which job it belongs to, it is just a fixed prefix plus a random string generated name. My personal advice: 1) Verify it with a clean cluster (clean up the

Usage of "onTime" in ProcessFunction

2018-08-31 Thread Boris Lublinsky
I am effectively trying to simulate processing windows - drop the results that are not complete in time and was trying to use onTimer method in my Processor implementation. I am not sure that I understand exactly how this works. When I start execution (in a different processor) I am executing ct

Using Managed Keyed State

2018-08-31 Thread Boris Lublinsky
Documentation https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/state/state.html#using-managed-keyed-state lists ValueState and List, but their semantics seem

Re: Job goes down after 1/145 TMs is lost (NoResourceAvailableException)

2018-08-31 Thread Till Rohrmann
Could you check whether akka.tcp:// fl...@blahabc.sfdc.net:123/user/taskmanager is still running? E.g. tries to reconnect to the JobManager? If this is the case, then the container is still running and the YarnFlinkResourceManager thinks that everything is alright. You can activate that a TaskManag

Re: Job goes down after 1/145 TMs is lost (NoResourceAvailableException)

2018-08-31 Thread Subramanya Suresh
Hi Till, Greatly appreciate your reply. We use version 1.4.2. I do not see nothing unusual in the logs for TM that was lost. Note: I have looked at many such failures and see the same below pattern. The JM logs above had most of what I had, but the below is what I have when I search for flink.yarn

Re: Table API: get Schema from TableSchema, TypeInformation[] or Avro

2018-08-31 Thread françois lacombe
Ok, looking forward reading your document. All the best François 2018-08-31 9:09 GMT+02:00 Timo Walther : > Thanks for your response. I think we won't need this utility in the near > future. As mentioned, I'm working on a design document that allows for > better abstraction. I think I will pub

Re: Configuring Ports for Job/Task Manager Metrics

2018-08-31 Thread Till Rohrmann
Hi Deirdre, Flink does not support to control where Yarn containers are placed. This is the responsibility of Yarn as the cluster manager. In Yarn 3.1.0 it is possible to specify placement constraints for containers but also this won't fully solve your problem. Imagine that you have a single Yarn

Re: CompletedCheckpoints are getting Stale ( Flink 1.4.2 )

2018-08-31 Thread Laura Uzcátegui
Hi Stephan and Vino, Thanks for the quick reply and hints. The configuration for the checkpoints that should remain is set to 1. Since this is a unbounded job run and I can't see it finishing, I suspect as we tear down the cluster every time we finish with the integration test being run, the com

Re: Configuring Ports for Job/Task Manager Metrics

2018-08-31 Thread Deirdre Kong
Hi Vino, Yeah, I mean node. Thanks, Deirdre On Fri, Aug 31, 2018 at 12:13 AM vino yang wrote: > Hi Deirdre, > > Usually, we don't recommend JM and TM in a container. @Chesnay, right? I > want to confirm, is your container here meaning node? > > Thanks, vino. > > Deirdre Kong 于2018年8月31日周五 下午3

Re: CompletedCheckpoints are getting Stale ( Flink 1.4.2 )

2018-08-31 Thread Stephan Ewen
Hi Laura! Vino had good pointers. There really should be no case in which this is not cleaned up. Is this a bounded job that ends? Is it always the last of the bounded job's checkpoints that remains? Best, Stephan On Fri, Aug 31, 2018 at 5:02 AM, vino yang wrote: > Hi Laura, > > First of all

Re: Configuring Ports for Job/Task Manager Metrics

2018-08-31 Thread vino yang
Hi Deirdre, Usually, we don't recommend JM and TM in a container. @Chesnay, right? I want to confirm, is your container here meaning node? Thanks, vino. Deirdre Kong 于2018年8月31日周五 下午3:03写道: > Or is there a way to specify in the command line to have the jm and tm run > in different containers o

Re: ElasticSearch 6 - error with UpdateRequest

2018-08-31 Thread Timo Walther
The problem is that BulkProcessorIndexer is located in flink-connector-elasticsearch-base which is compiled against a very old ES version. This old version is source code compatible but apparently not binary compatible with newer Elasticsearch classes. By copying this class you force to compile

Re: Table API: get Schema from TableSchema, TypeInformation[] or Avro

2018-08-31 Thread Timo Walther
Thanks for your response. I think we won't need this utility in the near future. As mentioned, I'm working on a design document that allows for better abstraction. I think I will publish it next week. Regards, Timo Am 31.08.18 um 08:36 schrieb françois lacombe: Hi Timo Yes it helps, thank y

Re: ElasticSearch 6 - error with UpdateRequest

2018-08-31 Thread Averell
Hi Timo, Thanks for your help. I don't get that error anymore after putting that file into my project. However, I don't understand how it could help. I have been using the Flink binary built on my same laptop, then how could it be different between having that java class in Flink project vs in my

Re: Configuring Ports for Job/Task Manager Metrics

2018-08-31 Thread Deirdre Kong
Or is there a way to specify in the command line to have the jm and tm run in different containers on YARN? On Thu, Aug 30, 2018 at 11:51 PM Deirdre Kong wrote: > @Chesnay, can you elaborate on how to map specific ports to a specific > JM/TM process? > > @Vino, I can only update Prometheus confi