Re: Running on AWS/EMR/Yarn - where is the WebUI?

2016-08-15 Thread Chiwan Park
Hi Jon, You can connect Flink Web UI via clicking ApplicationMaster link in YARN administrator UI. Regards, Chiwan Park > On Aug 15, 2016, at 2:24 PM, Jon Yeargers wrote: > > Working with a 3 node cluster. Started via YARN. > > If I go to port 8080 I see the Tomcat start screen. 8088 has th

Running on AWS/EMR/Yarn - where is the WebUI?

2016-08-15 Thread Jon Yeargers
Working with a 3 node cluster. Started via YARN. If I go to port 8080 I see the Tomcat start screen. 8088 has the Yarn screen. Didn't see anything obvious to start the UI in the bin folder.

Performance issues - is my topology not setup properly?

2016-08-15 Thread Jon Yeargers
Flink 1.1.1 is running on AWS / EMR. 3 boxes - total 24 cores and 90Gb of RAM. Job is submitted via yarn. Topology: read csv files from SQS -> parse files by line and create object for each line -> pass through 'KeySelector' to pair entries (by hash) over 60 second window -> write original and

Error joining with Python API

2016-08-15 Thread davis k
I've got an issue performing joins using Python API in flink-1.1.1. With this example code get an NPE (below). However, the NPE disappears when the filter is removed. Is there an error I'm making in this brief example or is this a Flink bug? env = get_environment() env.set_parallelism(1)

Enriching events with data from external http resources

2016-08-15 Thread Maciek Próchniak
Hi, Our data streams do some filtering based on data from external http resources (not maintained by us, they're really fast with redis as storage). So far we did that by just invoking synchronously some http client in map/flatMap operations. It works without errors but it seems somehow inef

Re: Does Flink DataStreams using combiners?

2016-08-15 Thread Stephan Ewen
I think combiners would be a great addition to "aligned windows". On Fri, Aug 12, 2016 at 11:11 AM, Aljoscha Krettek wrote: > Hi, > Sameer is right that Flink currently does not combine for any combination > of assigner, trigger and window function. > > Technically, it would be possible to use a

Re: 1.1.1: JobManager config endpoint no longer supplies port

2016-08-15 Thread Shannon Carey
Thanks Ufuk. For now, we will use the Yarn AM proxy. About uploading JARs: the JobManager UI that is exposed via the Yarn AM proxy does not allow manually uploading Flink job jars for execution on the cluster (look for "Yarn's AM proxy doesn't allow file uploads." in the code). As I understand

Re: ValueState is missing

2016-08-15 Thread Dong-iL, Kim
Hi. I use ingestion time. I didn’t use timing window. I've used a GlobalWindow with custom Trigger as below. My apply() logic is same as before and no complaint. Thanks. class HandTrigger extends Trigger[(String, String, String, String, Long), GlobalWindow] { override def onElement(t

Re: ValueState is missing

2016-08-15 Thread Stephan Ewen
Hi! Concerning your latest questions - There should not be multiple threads accessing the same state. - With "using a regular Java Map" I mean keeping everything as it is, except instead of using "ValueState" in the RichFlatMapFunction, you use a java.util.HashMap - If the program works wit

Re: Kafka topic with an empty parition : parallelism issue and Timestamp monotony violated

2016-08-15 Thread Yassine Marzougui
I think I also figured out the reason of the behavior I described when one Kafka partition is empty. According to the JavaDocs

Re: 1.1.1: JobManager config endpoint no longer supplies port

2016-08-15 Thread Ufuk Celebi
I've verified this. I think this has likely accidentally changed with the refactoring of the YARN setup for Flink 1.1. We probably wrote the web monitor port explicitly to the config in 1.0 whereas we don't do it in 1.1 anymore. I think this should be addressed with the next bugfix release 1.1.2.

Re: Get minimum or maximum value from a Dataset

2016-08-15 Thread Ufuk Celebi
https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/dataset_transformations.html On Wed, Aug 10, 2016 at 11:06 PM, Punit Naik wrote: > Hi > > I have a dataset like this: > > val x : Dataset[Long]… > > I wanted to get the minimum or the maximum Long value. How do I do it?

Re: 1.1.1: JobManager config endpoint no longer supplies port

2016-08-15 Thread Ufuk Celebi
Hey Shannon! I just took a look at the code and it looks like the Flink REST handler for the config did _not_ change since last year. It could be that somehow the config is loaded differently. Can you verify that using the same config with Flink 1.0 and Flink 1.1 the port is printed correctly and a

Re: Unit tests failing, losing stream contents

2016-08-15 Thread Ciar, David B.
Hello again Stephan, I tried modifying the way I call the stream job, while still using the DataStreamUtils.collect(), and I found that it behaves as expected when suites are run individually, or a small number of suites are run as a group. My different suites work when calling `test` within

Re: Kafka topic with an empty parition : parallelism issue and Timestamp monotony violated

2016-08-15 Thread Yassine Marzougui
I think I figured out the explanation of the first part. Looks like the stream gets distributed and merged between the source and the map operator because their parallelisms are different, and therefore the messages resulting from the map operator become out of order. The "Timestamp monotony violat

Kafka topic with an empty parition : parallelism issue and Timestamp monotony violated

2016-08-15 Thread Yassine Marzougui
Hi all, I have a Kafka topic with two partitions, messages within each partition are ordered in ascending timestamps. The following code works correctly (I'm running this on my local machine, the default parallelism is the number of cores=8): stream = env.addSource(myFlinkKafkaConsumer09) stream

Re: No output when using event time with multiple Kafka partitions

2016-08-15 Thread Yassine Marzougui
Hi Aljoscha, Sorry for the late response, I was busy and couldn't make time to work on this again again until now. Indeed, it turns out only one of the partitions is not receiving elements. The reason is that the producer will stick to a partition for topic.metadata.refresh.interval.ms (defaults t

Re: ValueState is missing

2016-08-15 Thread Dong-iL, Kim
Hi. Stephan. do you mean using map on local excution? I’ve tested it but not works at all. Thanks. > On Aug 15, 2016, at 4:56 PM, Dong-iL, Kim wrote: > > Hi. > I've tested the program with window function(keyBy->window->collect). it has > no problem. > > my old program. (keyBy-> state process

Re: ValueState is missing

2016-08-15 Thread Dong-iL, Kim
Hi. I've tested the program with window function(keyBy->window->collect). it has no problem. my old program. (keyBy-> state processing). can it be processed by multiple thread within a key? Thank you. > On Aug 12, 2016, at 8:27 PM, Stephan Ewen wrote: > > Hi! > > So far we are not aware of

Re: Within interval for CEP - Wall Clock based or Event Timestamp based?

2016-08-15 Thread Till Rohrmann
Hi, at the moment the CEP operator using EventTime only processes elements after it has received a watermark which is larger than the timestamp of the elements. In case of late arrivals it might make sense to directly process these events. This should decrease the latency a bit (for this case). C

Re: Unit tests failing, losing stream contents

2016-08-15 Thread Ciar, David B.
Hello Maximilian, Stephan, Thanks for the help/information. For the race condition, that makes sense - I'll stop using DataStreamUtils. FYI it is still listed on the documentation page: https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/index.html#iterator-data-sink For