Re: Jira issue Flink-11127

2019-02-21 Thread Konstantin Knauf
Hi Boris, the exact command depends on the docker-entrypoint.sh script and the image you are using. For the example contained in the Flink repository it is "task-manager", I think. The important thing is to pass "taskmanager.host" to the Taskmanager process. You can verify by checking the Taskmana

[flink :: connected-streams :: integration-tests]

2019-02-21 Thread Rinat
Hi mates, I got some troubles with the implementation of integration tests for the job, based on connected streams. It has the following logic: I got two streams, first one is a stream of rules, and another one is a stream of events to apply events on each rule, I’ve implemented a KeyedBroadcast

Re: How to debug difference between Kinesis and Kafka

2019-02-21 Thread Dawid Wysakowicz
Hi Stephen, Watermark for a single operator is the minimum of Watermarks received from all inputs, therefore if one of your shards/operators does not have incoming data it will not produce Watermarks thus the Watermark of WindowOperator will not progress. So this is sort of an expected behavior.

Re: StreamingFileSink causing AmazonS3Exception

2019-02-21 Thread Padarn Wilson
Thanks Kostas! On Mon, Feb 18, 2019 at 5:10 PM Kostas Kloudas wrote: > Hi Padarn, > > This is the jira issue: https://issues.apache.org/jira/browse/FLINK-11187 > and the fix, as you can see, was first included in version 1.7.2. > > Cheers, > Kostas > > On Mon, Feb 18, 2019 at 3:49 AM Padarn Wil

Re: Stream enrichment with static data, side inputs for DataStream

2019-02-21 Thread Averell
Hi Artur, Is that possible to make that "static" stream a keyedStream basing on that foreign key? If yes, then just connect the two streams, keyed on that foreign key. In the CoProcessFunction, for every single record from the static stream, you write its content into a ValueState; and for every r

Re: Broadcast state before events stream consumption

2019-02-21 Thread Averell
Hi Konstantin, The statement below is mentioned at the end of the page broadcast_state.html#important-considerations /"No RocksDB state backend: Broadcast state is kept in-me

Re: Broadcast state before events stream consumption

2019-02-21 Thread Dawid Wysakowicz
Hi Averell, BroadcastState is a special case of OperatorState. Operator state is always kept in-memory at runtime (must fit into memory), no matter what state backend you use. Nevertheless it is snapshotted and thus fault tolerant. Best, Dawid On 21/02/2019 11:50, Averell wrote: > Hi Konstantin

Re: How to debug difference between Kinesis and Kafka

2019-02-21 Thread Stephen Connolly
Yes, it was the "watermarks for event time when no events for that shard" problem. I am now investigating whether we can use a blended watermark of max(lastEventTimestamp - 1min, System.currentTimeMillis() - 5min) to ensure idle shards do not cause excessive data retention. Is that the best solut

Can I make an Elasticsearch Sink effectively exactly once?

2019-02-21 Thread Stephen Connolly
>From how I understand it: https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/connectors/elasticsearch.html#elasticsearch-sinks-and-fault-tolerance the Flink Elasticsearch Sink guarantees at-least-once delivery of action > requests to Elasticsearch clusters. It does so by waiting for

Re: How to debug difference between Kinesis and Kafka

2019-02-21 Thread Dawid Wysakowicz
It is definitely a solution ;) You should be aware of the downsides though: * you might get different results in case of reprocessing * you might drop some data as late, due to some delays in processing, if the events arrive later then the "ProcessingTime" threshold Best, Dawid On 21/0

Re: How to debug difference between Kinesis and Kafka

2019-02-21 Thread Stephen Connolly
On Thu, 21 Feb 2019 at 13:36, Dawid Wysakowicz wrote: > It is definitely a solution ;) > > You should be aware of the downsides though: > >- you might get different results in case of reprocessing >- you might drop some data as late, due to some delays in processing, >if the events ar

Re: Reduce one event under multiple keys

2019-02-21 Thread Stephen Connolly
Thanks! On Mon, 18 Feb 2019 at 12:36, Fabian Hueske wrote: > Hi Stephen, > > Sorry for the late response. > If you don't need to match open and close events, your approach of using a > flatMap to fan-out for the hierarchical folder structure and a window > operator (or two for open and close) fo

Re: Flink Standalone cluster - production settings

2019-02-21 Thread Hung
/ Each job has 3 asynch operators with Executors with thread counts of 20,20,100/ Flink handles parallelisms for you. If you want a higher parallelism of a operator, you can call setParallelism() for example, flatMap(new Mapper1()).setParallelism(20) flatMap(new Mapper2()).setParallelism(20) fla

Docker using flink socketwordcount example

2019-02-21 Thread Samet Yılmaz
I had a question about a topic. but I could not find a solution. Could you help? My question: https://stackoverflow.com/questions/54806830/docker-using-flink-socketwordcount-example-apache-flink -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: How to debug difference between Kinesis and Kafka

2019-02-21 Thread Dawid Wysakowicz
If an event arrived at WindowOperator before the Watermark, then it will be accounted for window aggregation and put in state. Once that state gets checkpointed this same event won't be processed again. In other words if a checkpoint succeeds elements that produced corresponding state won't be proc

Re: How to debug difference between Kinesis and Kafka

2019-02-21 Thread Stephen Connolly
On Thu, 21 Feb 2019 at 14:00, Dawid Wysakowicz wrote: > If an event arrived at WindowOperator before the Watermark, then it will > be accounted for window aggregation and put in state. Once that state gets > checkpointed this same event won't be processed again. In other words if a > checkpoint s

Re: Submitting job to Flink on yarn timesout on flip-6 1.5.x

2019-02-21 Thread Gary Yao
Hi, Beginning with Flink 1.7, you cannot use the legacy mode anymore [1][2]. I am currently working on removing references to the legacy mode in the documentation [3]. Is there any reason, you cannot use the "new mode"? Best, Gary [1] https://flink.apache.org/news/2018/11/30/release-1.7.0.html [

Re: Starting Flink cluster and running a job

2019-02-21 Thread Boris Lublinsky
The relevant dependencies are val flinkScala= "org.apache.flink" %% "flink-scala"% flinkVersion % "provided" val flinkStreamingScala = "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % "provided" val fl

Re: Can I make an Elasticsearch Sink effectively exactly once?

2019-02-21 Thread Dawid Wysakowicz
Hi, The procedure you described will not give you exactly once semantics. What the cited excerpt means is that a checkpoint will not be considered finished until pending requests are acknowledged. It does not mean that those requests are stored on the flink side. That said if an error occurs befor

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

2019-02-21 Thread Stephan Ewen
Hi Rong Rong! I would add the security / kerberos threads to the roadmap. They seem to be advanced enough in the discussions so that there is clarity what will come. For the window operator with slicing, I would personally like to see the discussion advance and have some more clarity and consensu

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

2019-02-21 Thread Stephan Ewen
Hi Shaoxuan! I think adding the web UI improvements makes sense - there is not much open to discuss there. Will do that. For the machine learning improvements - that is a pretty big piece and I think the discussions are still ongoing. I would prefer this to advance a bit before adding it to the r

Re: FLIP-16, FLIP-15 Status Updates?

2019-02-21 Thread Stephan Ewen
Hi John! I know some committers are working on iterations, but on a bigger update. That might subsume the FLIPs 15 and 16 eventually. I believe they will share some part of that soon (in a few weeks). Best, Stephan On Tue, Feb 19, 2019 at 5:45 PM John Tipper wrote: > Hi Timo, > > That’s great

Why don't Tuple types implement Comparable?

2019-02-21 Thread Frank Grimes
Hi, I've recently started to evaluate Flink and have found it odd that its Tuple types, while Serializable, don't implement java.lang.Comparable.This means that I either need to provide an KeySelector for many operations or subtype the Tuple types and provide my own implementation of compareTo f

Is there a Flink DataSet equivalent to Spark's RDD.persist?

2019-02-21 Thread Frank Grimes
Hi, I'm trying to port an existing Spark job to Flink and have gotten stuck on the same issue brought up here: https://stackoverflow.com/questions/46243181/cache-and-persist-datasets Is there some way to accomplish this same thing in Flink?i.e. avoid re-computing a particular DataSet when multipl

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

2019-02-21 Thread Rong Rong
Hi Stephan, Yes. I completely agree. Jincheng & Jark gave some very valuable feedbacks and suggestions and I think we can definitely move the conversation forward to reach a more concrete doc first before we put in to the roadmap. Thanks for reviewing it and driving the roadmap effort! -- Rong O

Re: FLIP-16, FLIP-15 Status Updates?

2019-02-21 Thread Paris Carbone
I created these FLIPs a while back, sorry for being late to this discussion but I can try to elaborate. The idea from FLIP-16 is proven to be correct [1] (see chapter 3) and I think it is the only way to go but I have been in favour of providing channel implementations with checkpoint behaviour

Re: Jira issue Flink-11127

2019-02-21 Thread Boris Lublinsky
Konstantin, it still does not quite work The IP is still in place, but… Here is Job manager log metrics.reporters: prom metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter metrics.reporter.prom.port: 9249 Starting Job Manager config file: jobmanager.rest.address:

Re: Metrics for number of "open windows"?

2019-02-21 Thread Rong Rong
Hi Andrew, I am assuming you are actually using customized windowAssigner, trigger and process function. I think the best way for you to keep in-flight, not-yet-triggered windows is to emit metrics in these 3 pieces. Upon looking at the window operator, I don't think there's a a metrics (guage) t

Re: Jira issue Flink-11127

2019-02-21 Thread Boris Lublinsky
Boris Lublinsky FDP Architect boris.lublin...@lightbend.com https://www.lightbend.com/ > On Feb 21, 2019, at 2:05 AM, Konstantin Knauf > wrote: > > Hi Boris, > > the exact command depends on the docker-entrypoint.sh script and the image > you are using. For the example contained in the Flin

Re: Jira issue Flink-11127

2019-02-21 Thread Boris Lublinsky
Adding metric-query port makes it a bit better, but there is still an error 019-02-22 00:03:56,173 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@maudlin-ibis-fdp-flink-jobmanager:6123/user/resourcemanager, retry

SinkFunction.Context

2019-02-21 Thread Durga Durga
HI Folks, Was following the documentation for https://ci.apache.org/projects/flink/flink-docs-stable/api/java/org/apache/flink/streaming/api/functions/sink/SinkFunction.Context.html long currentProcessingTime

Re: SinkFunction.Context

2019-02-21 Thread Rong Rong
Hi Durga, 1. currentProcessingTime: refers to this operator(SinkFunction)'s system time at the moment of invoke 1a. the time you are referring to as "flink window got the message" is the currentProcessingTime() invoked at the window operator (which provided by the WindowContext similar to this one

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

2019-02-21 Thread Hequn Cheng
Hi Stephan, Thanks for summarizing the great roadmap! It is very helpful for users and developers to track the direction of Flink. +1 for putting the roadmap on the website and update it per release. Besides, would be great if the roadmap can add the UpsertSource feature(maybe put it under 'Batch