DynamoStreams Consumer millisBehindLatest metric

2019-10-28 Thread Vinay Patil
Hi, I am currently using FlinkDynamoStreamsConsumer in production. For monitoring the lag I am relying on the millisBehindLatest metric, but it always returns -1 even when the DynamoDB stream contains a million records upfront. Also, it would be great if we could add documentation mentioning that Flink su…
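
For reference, a minimal consumer wiring, assuming the class shipped in flink-connector-kinesis (there it is named FlinkDynamoDBStreamsConsumer) and a placeholder region and stream ARN; the millisBehindLatest gauge is exposed through the consumer's metric group, so a metrics reporter must be configured to observe it:

```
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kinesis.FlinkDynamoDBStreamsConsumer;
import org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants;

public class DynamoStreamsJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty(ConsumerConfigConstants.AWS_REGION, "us-east-1"); // placeholder region

        DataStream<String> records = env.addSource(
                new FlinkDynamoDBStreamsConsumer<>(
                        "arn:aws:dynamodb:...",   // placeholder stream ARN, left elided
                        new SimpleStringSchema(),
                        props));

        records.print();
        env.execute("DynamoDB Streams consumer");
    }
}
```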

Re: Issue with writeAsText() to S3 bucket

2019-10-28 Thread Nguyen, Michael
Hi Fabian, Thank you for the response. I am currently using .writeAsText() to print out 9 different datastreams in one Flink job, as I am printing my original datastream with various filters applied to it. I usually see around 6-7 of my datastreams successfully list the JSON file in my S3 buc…
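
writeAsText() is intended mainly for debugging and gives no delivery guarantees, which can explain files that never appear. The usual recommendation for S3 is StreamingFileSink, which commits part files on checkpoints. A minimal sketch, with a placeholder bucket and inline sample data standing in for one of the nine filtered streams:

```
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class S3SinkSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Part files are finalized on checkpoints, so checkpointing must be on.
        env.enableCheckpointing(60_000);

        DataStream<String> filtered = env.fromElements("{\"a\":1}", "{\"b\":2}");

        StreamingFileSink<String> sink = StreamingFileSink
                .forRowFormat(new Path("s3://my-bucket/filtered"), // placeholder bucket
                        new SimpleStringEncoder<String>("UTF-8"))
                .build();

        filtered.addSink(sink);
        env.execute("S3 sink sketch");
    }
}
```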

Complex SQL Queries on Java Streams

2019-10-28 Thread Mohammed Tabreaz
Recently we moved from Oracle to Cassandra. In Oracle we were heavily using advanced analytical functions such as LAG, LEAD, and MATCH_RECOGNIZE. I was trying to identify equivalent functionality in Java and came across Apache Flink; however, I'm not sure if I should use that library in stand-alone…
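
Worth noting: Flink SQL has supported MATCH_RECOGNIZE since 1.7, so the Oracle pattern queries have a fairly direct equivalent. A rough sketch against the 1.8/1.9-era Table API (requires the flink-cep dependency; the ticker table, its columns, and the V-shaped pattern are illustrative only):

```
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class MatchRecognizeSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // Made-up price ticks; proctime.proctime appends a processing-time attribute.
        DataStream<Tuple2<String, Double>> ticks = env.fromElements(
                Tuple2.of("ACME", 12.0), Tuple2.of("ACME", 11.0),
                Tuple2.of("ACME", 10.0), Tuple2.of("ACME", 13.0));
        tEnv.registerDataStream("Ticker", ticks, "symbol, price, proctime.proctime");

        // Detect a falling run followed by a rise (a V shape).
        Table result = tEnv.sqlQuery(
                "SELECT * FROM Ticker MATCH_RECOGNIZE (" +
                " PARTITION BY symbol ORDER BY proctime" +
                " MEASURES A.price AS startPrice, LAST(B.price) AS bottomPrice" +
                " ONE ROW PER MATCH AFTER MATCH SKIP PAST LAST ROW" +
                " PATTERN (A B+ C)" +
                " DEFINE B AS B.price < PREV(B.price)," +
                "        C AS C.price > PREV(C.price))");

        tEnv.toAppendStream(result, Row.class).print();
        env.execute("MATCH_RECOGNIZE sketch");
    }
}
```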

Testing AggregateFunction() and ProcessWindowFunction() on KeyedDataStream

2019-10-28 Thread Nguyen, Michael
Hello everybody, Has anyone tried testing AggregateFunction() and ProcessWindowFunction() on a KeyedDataStream? I have reviewed the testing page on Flink’s official website (https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/testing.html) and I am not quite sure how I could utiliz…
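
One low-friction starting point, before reaching for harnesses: an AggregateFunction is a plain interface, so its logic can be unit-tested without any Flink runtime. A sketch with a made-up averaging aggregate:

```
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.java.tuple.Tuple2;

public class AverageAggregateTest {

    // Hypothetical aggregate: running average of Long values (sum, count).
    static class AverageAggregate implements AggregateFunction<Long, Tuple2<Long, Long>, Double> {
        @Override public Tuple2<Long, Long> createAccumulator() { return Tuple2.of(0L, 0L); }
        @Override public Tuple2<Long, Long> add(Long value, Tuple2<Long, Long> acc) {
            return Tuple2.of(acc.f0 + value, acc.f1 + 1);
        }
        @Override public Double getResult(Tuple2<Long, Long> acc) {
            return acc.f1 == 0 ? 0.0 : ((double) acc.f0) / acc.f1;
        }
        @Override public Tuple2<Long, Long> merge(Tuple2<Long, Long> a, Tuple2<Long, Long> b) {
            return Tuple2.of(a.f0 + b.f0, a.f1 + b.f1);
        }
    }

    public static void main(String[] args) {
        AverageAggregate agg = new AverageAggregate();
        Tuple2<Long, Long> acc = agg.createAccumulator();
        acc = agg.add(2L, acc);
        acc = agg.add(4L, acc);
        System.out.println(agg.getResult(acc)); // 3.0, verified without a cluster
    }
}
```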

Re: Complex SQL Queries on Java Streams

2019-10-28 Thread Jörn Franke
Flink is purely a stream-processing framework. I would not use it in a synchronous web call. Moreover, I would not expose any complex analytic function through a synchronous web service. I would deploy a messaging bus (RabbitMQ, ZeroMQ, etc.) and send the request there (if the source is a web app, potentially…

Re: Complex SQL Queries on Java Streams

2019-10-28 Thread Mohammed Tabreaz
Thanks for the feedback; the point about non-blocking interfaces is clear. However, could you point me to any other libraries that can be used with Java collections for complex analytics? On Mon, Oct 28, 2019, 11:29 Jörn Franke wrote: > Flink is purely a stream-processing framework. I would not use it in a syn…

Re: Cannot modify parallelism (rescale job) more than once

2019-10-28 Thread vino yang
Hi Pankaj, It seems it is a bug. You can report it by opening a Jira issue. Best, Vino Pankaj Chand wrote on Mon, Oct 28, 2019 at 10:51 AM: > Hello, > > I am trying to modify the parallelism of a streaming Flink job (wiki-edits > example) multiple times on a standalone cluster (one local machine) having > t…

Re: Watermark won't advance in ProcessFunction

2019-10-28 Thread 杨力
It seems to be the case. But when I use timeWindow or CEP with fromCollection, it works well. For example,

```
sEnv.fromCollection(Seq[Long](1, 1002, 2002, 3002))
  .assignAscendingTimestamps(identity[Long])
  .keyBy(_ % 2)
  .timeWindow(Time.seconds(1))
  .sum(0)
  .print()
```

prints

```
1
1002
2002
300…
```

Re: Watermark won't advance in ProcessFunction

2019-10-28 Thread Dian Fu
Before a program closes, it emits Long.MaxValue as the watermark, and that watermark will trigger all the windows. This is the reason why your `timeWindow` program could work. However, for the first program, you have not registered an event-time timer (though context.timerService.registerEven…
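
To illustrate, a minimal sketch of registering an event-time timer in a KeyedProcessFunction so that an advancing watermark (including the final Long.MaxValue one) has something to fire; the +1 offset and the output string are arbitrary, and timestamps are assumed to be assigned upstream:

```
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class TimerFunction extends KeyedProcessFunction<Long, Long, String> {

    @Override
    public void processElement(Long value, Context ctx, Collector<String> out) {
        // Without this registration, nothing in the function reacts to
        // the watermark, matching the behaviour described above.
        ctx.timerService().registerEventTimeTimer(ctx.timestamp() + 1);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) {
        out.collect("timer fired at " + timestamp);
    }
}
```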

Re: Testing AggregateFunction() and ProcessWindowFunction() on KeyedDataStream

2019-10-28 Thread vino yang
Hi Michael, You may want to look at the `KeyedOneInputStreamOperatorTestHarness` test class. You can consider `WindowTranslationTest#testAggregateWithWindowFunctionEventTime` or `WindowTranslationTest#testAggregateWithWindowFunctionProcessingTime` [1] (both of them call `processElementAndEnsureOutput`) as…
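
For readers following along, a condensed sketch of the harness pattern, using a KeyedProcessOperator rather than a full WindowOperator to keep it short; it assumes the flink-streaming-java test-jar is on the classpath, and the pass-through function is just a stand-in for the code under test:

```
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.streaming.api.operators.KeyedProcessOperator;
import org.apache.flink.streaming.api.watermark.Watermark;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;
import org.apache.flink.streaming.util.KeyedOneInputStreamOperatorTestHarness;
import org.apache.flink.util.Collector;

public class HarnessSketch {

    static class PassThrough extends KeyedProcessFunction<Long, Long, Long> {
        @Override
        public void processElement(Long value, Context ctx, Collector<Long> out) {
            out.collect(value);
        }
    }

    public static void main(String[] args) throws Exception {
        KeyedOneInputStreamOperatorTestHarness<Long, Long, Long> harness =
                new KeyedOneInputStreamOperatorTestHarness<>(
                        new KeyedProcessOperator<>(new PassThrough()),
                        value -> value % 2,            // key selector
                        BasicTypeInfo.LONG_TYPE_INFO); // key type

        harness.open();
        harness.processElement(new StreamRecord<>(42L, 10L)); // element at timestamp 10
        harness.processWatermark(new Watermark(100L));        // advance event time
        System.out.println(harness.getOutput());              // records + watermark to assert on
        harness.close();
    }
}
```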

Re: Testing AggregateFunction() and ProcessWindowFunction() on KeyedDataStream

2019-10-28 Thread Nguyen, Michael
Hi Vino, This is a great example – thank you! It looks like I need to instantiate a StreamExecutionEnvironment in order to get my OneInputStreamOperator. Would I need to set up a local Flink cluster using MiniClusterWithClientResource in order to use StreamExecutionEnvironment? Best, Michael

PreAggregate operator with timeout trigger

2019-10-28 Thread Felipe Gutierrez
Hi all, I have my own stream operator which triggers an aggregation based on the number of items received (OneInputStreamOperator#processElement(StreamRecord)). However, the aggregation may never be triggered if my operator does not receive the maximum number of items that has been set. So, I need a timeo…
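
One possible shape for this, sketched under assumptions: arm a processing-time timer when the first element of a batch arrives, via the operator's ProcessingTimeService, and flush from either path. Threshold, timeout, and the summing logic are placeholders, and checkpointing of the partial batch is omitted for brevity:

```
import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;
import org.apache.flink.streaming.runtime.tasks.ProcessingTimeCallback;

public class PreAggregateOperator extends AbstractStreamOperator<Long>
        implements OneInputStreamOperator<Long, Long>, ProcessingTimeCallback {

    private final int maxCount = 100;          // hypothetical batch size
    private final long timeoutMillis = 1_000L; // hypothetical timeout

    private long sum;
    private int count;

    @Override
    public void processElement(StreamRecord<Long> element) throws Exception {
        if (count == 0) {
            // arm the timeout when a new batch starts
            long now = getProcessingTimeService().getCurrentProcessingTime();
            getProcessingTimeService().registerTimer(now + timeoutMillis, this);
        }
        sum += element.getValue();
        if (++count >= maxCount) {
            flush();
        }
    }

    @Override
    public void onProcessingTime(long timestamp) {
        flush(); // the timeout fired before maxCount was reached
    }

    private void flush() {
        if (count > 0) {
            output.collect(new StreamRecord<>(sum));
            sum = 0;
            count = 0;
        }
    }
}
```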

Re: Testing AggregateFunction() and ProcessWindowFunction() on KeyedDataStream

2019-10-28 Thread vino yang
Hi Michael, From the WindowTranslationTest, I did not see anything about the initialization of a mini-cluster. Here we are testing the operator, and the operator test harness seems to provide the necessary infrastructure. You can try it and see if anything is missing. Best, Vino Nguyen, Michael wrote on 2019…

Re: Watermark won't advance in ProcessFunction

2019-10-28 Thread David Anderson
The reason why the watermark is not advancing is that assignAscendingTimestamps is a periodic watermark generator. This style of watermark generator is called at regular intervals to create watermarks -- by default, this is done every 200 msec. With only a tiny bit of data to process, the job doesn…
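
A small sketch of tuning that interval, in case it helps (the 50 ms value is arbitrary):

```
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class WatermarkIntervalSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        // Periodic generators such as assignAscendingTimestamps emit
        // watermarks on this interval; the default is 200 ms.
        env.getConfig().setAutoWatermarkInterval(50);
    }
}
```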

Flink 1.8.1 HDFS 2.6.5 issue

2019-10-28 Thread V N, Suchithra (Nokia - IN/Bangalore)
Hi, I am trying to execute Wordcount.jar on Flink 1.8.1 with Hadoop version 2.6.5. HDFS is enabled with Kerberos+SSL. While writing output to HDFS, I am facing the exception below and the job fails. Please let me know if you have any suggestions for debugging this issue. Caused by: org.apache.flink.runtime.…

Re: Watermark won't advance in ProcessFunction

2019-10-28 Thread 杨力
Thank you for your response. Registering a timer at Long.MaxValue works. And I have found the mistake in my original code: when a timer fires and there are elements in the priority queue with timestamps greater than the current watermark, they do not get processed. A new timer should be registered for…
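
A sketch of that fix, assuming a KeyedProcessFunction that buffers timestamps in a priority queue; a real implementation would keep the queue in keyed state rather than a plain field:

```
import java.util.PriorityQueue;

import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class BufferedEmitter extends KeyedProcessFunction<Long, Long, Long> {

    private final PriorityQueue<Long> queue = new PriorityQueue<>();

    @Override
    public void processElement(Long value, Context ctx, Collector<Long> out) {
        queue.add(ctx.timestamp());
        ctx.timerService().registerEventTimeTimer(ctx.timestamp());
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Long> out) {
        long watermark = ctx.timerService().currentWatermark();
        // drain everything the watermark has already passed
        while (!queue.isEmpty() && queue.peek() <= watermark) {
            out.collect(queue.poll());
        }
        // re-register for the earliest remaining element; without this,
        // elements ahead of the watermark would never be processed
        if (!queue.isEmpty()) {
            ctx.timerService().registerEventTimeTimer(queue.peek());
        }
    }
}
```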

Re: Flink 1.8.1 HDFS 2.6.5 issue

2019-10-28 Thread Dian Fu
From the exception stack trace it seems that the CryptoCodec is null. This may occur when "hadoop.security.crypto.codec.classes.aes.ctr.nopadding" is misconfigured. You could change the log level to "DEBUG" and it will show more detailed information about why the CryptoCodec is null. > On Oct 28, 2019…

RE: Flink 1.8.1 HDFS 2.6.5 issue

2019-10-28 Thread V N, Suchithra (Nokia - IN/Bangalore)
Hi, From the debug logs I could see the lines below in the taskmanager. Please have a look. org.apache.hadoop.ipc.ProtobufRpcEngine Call: addBlock took 372ms"} org.apache.hadoop.ipc.ProtobufRpcEngine Call: addBlock took 372ms"} org.apache.hadoop.hdfs.DFSClient pipeline = 10.76.113.216:1044"} org.apache.hadoop…

Re: Flink 1.5+ performance in a Java standalone environment

2019-10-28 Thread Jakub Danilewicz
Thanks for your replies. We use Flink from within a standalone Java 8 application (no Hadoop, no clustering), so it basically boils down to running simple code like this: import java.util.*; import org.apache.flink.api.java.ExecutionEnvironment; import org.apache.flink.graph.*; import org.ap…
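
Since the snippet is cut off, here is a hedged reconstruction of that kind of standalone setup: a local ExecutionEnvironment plus a tiny Gelly graph, runnable inside one JVM with no cluster (the vertex and edge data are made up):

```
import java.util.Arrays;
import java.util.List;

import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.graph.Edge;
import org.apache.flink.graph.Graph;
import org.apache.flink.graph.Vertex;

public class StandaloneGelly {
    public static void main(String[] args) throws Exception {
        // Runs Flink inside the current JVM; no cluster or Hadoop needed.
        ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment();

        List<Vertex<Long, String>> vertices = Arrays.asList(
                new Vertex<>(1L, "a"), new Vertex<>(2L, "b"));
        List<Edge<Long, Double>> edges = Arrays.asList(new Edge<>(1L, 2L, 1.0));

        Graph<Long, String, Double> graph = Graph.fromCollection(vertices, edges, env);
        System.out.println(graph.numberOfVertices());
    }
}
```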

Re: Flink 1.8.1 HDFS 2.6.5 issue

2019-10-28 Thread Dian Fu
I guess this is a bug in Hadoop 2.6.5 that has been fixed in Hadoop 2.8.0 [1]. You can work around it by explicitly setting the configuration "hadoop.security.crypto.codec.classes.aes.ctr.nopadding" to "org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec, org.apache.hadoop.crypto.JceAesCtrCryptoCod…
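
The property would normally go into core-site.xml; for completeness, a sketch of setting it programmatically through the Hadoop Configuration API (both codec class names are real Hadoop 2.x classes):

```
import org.apache.hadoop.conf.Configuration;

public class CodecConfigSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Equivalent to setting the property in core-site.xml.
        conf.set("hadoop.security.crypto.codec.classes.aes.ctr.nopadding",
                "org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec,"
                        + "org.apache.hadoop.crypto.JceAesCtrCryptoCodec");
    }
}
```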

RE: Flink 1.8.1 HDFS 2.6.5 issue

2019-10-28 Thread V N, Suchithra (Nokia - IN/Bangalore)
Thanks for the information. Without setting that parameter explicitly, is there any possibility that it may work intermittently? From: Dian Fu Sent: Tuesday, October 29, 2019 7:12 AM To: V N, Suchithra (Nokia - IN/Bangalore) Cc: user@flink.apache.org Subject: Re: Flink 1.8.1 HDFS 2.6.5 issue I…

Add custom fields into Json

2019-10-28 Thread srikanth flink
Hi there, I'm querying JSON data and it is working fine. I would like to add custom fields alongside the query result. My query looks like: select ROW(`source`), ROW(`destination`), ROW(`dns`), organization, cnt from (select (source.`ip`, source.`isInternalIP`) as source, (destination.`ip`, destinatio…
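
Assuming the goal is to add constant or computed columns alongside the query result, literals in the SELECT list are usually enough. A sketch against a hypothetical registered table `events` with made-up field names:

```
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;

public class CustomFieldsSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // `label` and `ingestedAt` are the custom fields added to the result.
        Table result = tEnv.sqlQuery(
                "SELECT 'security-scan' AS label," +
                "       CURRENT_TIMESTAMP AS ingestedAt," +
                "       organization, cnt " +
                "FROM events");
    }
}
```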