Custom Sink Checkpointing errors

2017-10-19 Thread vipul singh
Hello all, I am working on a custom sink implementation, but having weird issues with checkpointing. I am using a custom ListState to checkpoint, and it looks like this: private var checkpointMessages: ListState[Bucket] =_ My snapshot function looks like: @throws[IOException] def snapshotStat

flink can't read hdfs namenode logical url

2017-10-19 Thread 邓俊华
hi, I start yarn-ssession.sh on yarn, but it can't read hdfs logical url. It always connect to hdfs://master:8020, it should be 9000, my hdfs defaultfs is hdfs://master.I have config the YARN_CONF_DIR and HADOOP_CONF_DIR, it didn't work.Is it a bug? i use flink-1.3.0-bin-hadoop27-scala_2.10 2017

[no subject]

2017-10-19 Thread Navneeth Krishnan
Hello All, I have an in-memory cache created inside a user function and I need to assign the max capacity for it. Since the program can be run on any hardware, I'm thinking if I cloud assign based on flink's allocated managed memory. Is there a way to get the flink managed memory size inside a us

Testing GlobalWindows

2017-10-19 Thread Philip Doctor
I have a GlobalWindow with a custom trigger (I leave windows open for a variable length of time depending on how much data I have vs the expected amount, so I’m manipulating triggerContext.registerProcessingTimeTimer()). When I emit data into my data stream, the flink execution environment appea

Re: java.lang.NoSuchMethodError and dependencies problem

2017-10-19 Thread r. r.
Thanks, Piotr but my app code is self-contained in a fat-jar with maven-shade, so why would the class path affect this? by shade commons-compress do you mean : it doesn't have effect either as a last resort i may try to rebuild Flink to use 1.14, but don't want to go there yet =/ Best regard

Re: problem scale Flink job on YARN

2017-10-19 Thread Lei Chen
Hi Aljoscha, I'm using version 1.3.0 and changing job-wide parallelism. Lei On Thu, Oct 19, 2017 at 9:47 AM, Aljoscha Krettek wrote: > Hi Lei, > > Which version of Flink would that be? I'm guessing >= 1.3.x? In Flink 1.1 > the hash of an operator was tied to the parallelism but starting with 1

Re: java.lang.NoSuchMethodError and dependencies problem

2017-10-19 Thread Piotr Nowojski
I’m not 100% sure, so treat my answer with a grain of salt. I think when you start the cluster this way, dependencies (some? all?) are being loaded to the class path before loading user’s application. At that point, it doesn’t matter whether you have excluded commons-compress 1.4.1 in yours app

Re: Watermark on connected stream

2017-10-19 Thread Piotr Nowojski
With aitozi we have a hat trick oO > On 19 Oct 2017, at 17:08, Tzu-Li (Gordon) Tai wrote: > > Ah, sorry for the duplicate answer, I didn’t see Piotr’s reply. Slight delay > on the mail client. > > > On 19 October 2017 at 11:05:01 PM, Tzu-Li (Gordon) Tai (tzuli...@apache.org >

Re: problem scale Flink job on YARN

2017-10-19 Thread Aljoscha Krettek
Hi Lei, Which version of Flink would that be? I'm guessing >= 1.3.x? In Flink 1.1 the hash of an operator was tied to the parallelism but starting with 1.2 that shouldn't happen anymore. Also, are you changing the parallelism job-wide or are there operators with differing parallelism? For exam

Re: Watermark on connected stream

2017-10-19 Thread aitozi
Hi, You can see the field in AbstractStreamOperator // We keep track of watermarks from both inputs, the combined input is the minimum // Once the minimum advances we emit a new watermark for downstream operators private long combinedWatermark = Long.MIN_VALUE; it will chose the Min watermark

Re: java.lang.NoSuchMethodError and dependencies problem

2017-10-19 Thread r. r.
flink is started with bin/start-local.sh there is no classpath variable in the environment; flink/lib/flink-dist_2.11-1.3.2.jar contains commons-compress, still it should be overridden by the dependencyManagement directive here is the stacktrace: The program finished with the following excepti

Re: Flink Streaming example: Kafka010Example.scala doesn't work

2017-10-19 Thread Tzu-Li (Gordon) Tai
Hi Michal, I can’t seem to access the link you provided for the logs. As for confirming whether or not some data was read / written, how exactly did you test that? In the procedure you laid out, it seems like you only performed some consumer group offset checks using the Kafka CLI. AFAIK, since

Re: Flink CEP State Change Pattern

2017-10-19 Thread Tzu-Li (Gordon) Tai
Hi Philip! I’m looping in Kostas to this thread. He might be able to provide some insights for your question. Cheers, Gordon On 14 October 2017 at 8:54:45 PM, Philip Limbeck (philiplimb...@gmail.com) wrote: Hi! I am quite new to Flink CEP and try to define a state change pattern with it.

Re: Watermark on connected stream

2017-10-19 Thread Tzu-Li (Gordon) Tai
Ah, sorry for the duplicate answer, I didn’t see Piotr’s reply. Slight delay on the mail client. On 19 October 2017 at 11:05:01 PM, Tzu-Li (Gordon) Tai (tzuli...@apache.org) wrote: Hi Kien, The watermark of an operator with multiple inputs will be determined by the current minimum watermark

Re: Accumulator with Elasticsearch Sink

2017-10-19 Thread Tzu-Li (Gordon) Tai
Hi Sendoh, That sounds like a reasonable metric to add directly to the Elasticsearch connector. Could you perhaps write a comment on that in  https://issues.apache.org/jira/browse/FLINK-7697? Cheers, Gordon On 19 October 2017 at 8:57:23 PM, Sendoh (unicorn.bana...@gmail.com) wrote: Hi Flink us

Re: Watermark on connected stream

2017-10-19 Thread Tzu-Li (Gordon) Tai
Hi Kien, The watermark of an operator with multiple inputs will be determined by the current minimum watermark across all inputs. Cheers, Gordon On 19 October 2017 at 8:06:11 PM, Kien Truong (duckientru...@gmail.com) wrote: Hi,  If I connect two stream with different watermark, how are the w

Re: Watermark on connected stream

2017-10-19 Thread Piotr Nowojski
Hi, As you can see in org.apache.flink.streaming.api.operators.AbstractStreamOperator#processWatermark1 it takes a minimum of both of the inputs. Piotrek > On 19 Oct 2017, at 14:06, Kien Truong wrote: > > Hi, > > If I connect two stream with different watermark, how are the watermark of >

Re: java.lang.NoSuchMethodError and dependencies problem

2017-10-19 Thread Piotr Nowojski
Hi, What is the full stack trace of the error? Are you sure that there is no commons-compresss somewhere in the classpath (like in the lib directory)? How are you running your Flink cluster? Piotrek > On 19 Oct 2017, at 13:34, r. r. wrote: > > Hello > I have a job that runs an Apache Tika pip

Execution Failed (cluster setup Flink+Hadoop), Task Manager was lost/killed

2017-10-19 Thread Oleksandra Levchenko
Hi, I am running Flink batch job on Standalone Cluster (16 nodes), on top of Hadoop. The chain looks like: DataSet1 = env.readTextFile (csv on hdfs) .map .flatMap .groupBy .reduce .map .writeAsCsv (DataSet 1) DataSet2 = env.readTextFile .map .flatMap env.readCsvFile (DataSet1) DataSet1.flat

Accumulator with Elasticsearch Sink

2017-10-19 Thread Sendoh
Hi Flink users, Did someone use accumulator with Elasticsearch Sink? So we can better compare the last timestamps in the sink and the last timestamps in Elasticsearch, in order to see how long does it take from the Elasticsearch sink to Elasticsearch. Best, Sendoh -- Sent from: http://apache-

Watermark on connected stream

2017-10-19 Thread Kien Truong
Hi, If I connect two stream with different watermark, how are the watermark of the resulting stream determined ? Best regards, Kien

java.lang.NoSuchMethodError and dependencies problem

2017-10-19 Thread r. r.
Hello I have a job that runs an Apache Tika pipeline and it fails with "Caused by: java.lang.NoSuchMethodError: org.apache.commons.compress.archivers.ArchiveStreamFactory.detect(Ljava/io/InputStream;)Ljava/lang/String;" Flink includes commons-compress 1.4.1, while Tika needs 1.14. I also have A

Re: FlinkCEP: pattern application on a KeyedStream

2017-10-19 Thread Kostas Kloudas
Hi Federico, If I understand your question correctly, then yes, the application of a Pattern on a keyed stream is similar to the application of a map function. It will search for the pattern on each per-key stream of data. So there will be state (buffer with partial matches, queued elements, et

FlinkCEP: pattern application on a KeyedStream

2017-10-19 Thread Federico D'Ambrosio
Hi all, I was wondering if it is correct to assume the application of a pattern on a KeyedStream similar to the application, e.g., of a MapFunction when it comes to state. For example, the following val pattern = ... val keyedStream = stream.keyBy("id") val patternKeyedStream = CEP.pattern(patt

Re: Set heap size

2017-10-19 Thread Piotr Nowojski
Hi, Just log into the machine and check it’s memory consumption using htop or a similar tool under the load. Remember about subtracting Flink’s memory usage and and file system cache. Piotrek > On 19 Oct 2017, at 10:15, AndreaKinn wrote: > > About task manager heap size Flink doc says: > >

Re: SLF4j logging system gets clobbered?

2017-10-19 Thread Piotr Nowojski
Hi, What versions of Flink/logback are you using? Have you read this: https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/best_practices.html#use-logback-when-running-flink-out-of-the-ide--from-a-java-application

Set heap size

2017-10-19 Thread AndreaKinn
About task manager heap size Flink doc says: ... If the cluster is exclusively running Flink, the total amount of available memory per machine minus some memory for the operating system (maybe 1-2 GB) is a good value But my nodes have 2GB of ram each. There isn't an empirical count to set ram

Re: Flink REST API async?

2017-10-19 Thread Francisco Gonzalez Barea
Hello, Going back on this thread, quick question: Will this be supported in next Flink version? If not, when is it expected to be included? Regards On 8 Aug 2017, at 15:46, Aljoscha Krettek mailto:aljos...@apache.org>> wrote: I quickly talked to Till about this. The new JobManager, once FLIP