Re: Multiple consumers and custom triggers

2016-12-15 Thread Jamie Grier
To be more clear... A single source in a Flink program is a logical concept. Flink jobs are run with some level of parallelism meaning that multiple copies of your source (and all other) functions are run distributed across a cluster. So if you have a streaming program with two sources and you r

Re: Multiple consumers and custom triggers

2016-12-15 Thread Jamie Grier
All streams can be parallelized in Flink even with only one source. You can have multiple sinks as well. On Thu, Dec 15, 2016 at 7:56 AM, Meghashyam Sandeep V < vr1meghash...@gmail.com> wrote: > 1. If we have multiple sources, can the streams be parallelized ? > 2. Can we have multiple sinks as

Flink 1.1.3 web UI is loading very slowly

2016-12-15 Thread Yury Ruchin
Hi, I'm seeing an issue with the load speed of Flink Web UI when running in YARN session. Initial load takes several minutes or even more, although according to the browser console there are only a couple of MBs to download. When the loading is complete, the UI itself is quite responsive. I don't

Re: more complex patterns for CEP (was: CEP two transitions to the same state)

2016-12-15 Thread Dima Arbuzin
Hey there, I was investigating CEP functionality and realized that I'm missing some, which are discussed here, especially: accessing fields from previous events. Any progress regarding this question? I'm working with streaming car location data trying to analyze different traffic patterns. Consi

benchmarking flink streaming

2016-12-15 Thread Meghashyam Sandeep V
Hi There, We are evaluating Flink streaming for real time data analysis. I have my flink job running in EMR with Yarn. What are the possible benchmarking tools that work best with Flink? I couldn't find this information in the Apache website. Thanks, Sandeep

Re: Multiple consumers and custom triggers

2016-12-15 Thread Meghashyam Sandeep V
1. If we have multiple sources, can the streams be parallelized ? 2. Can we have multiple sinks as well? On Dec 14, 2016 10:46 PM, wrote: > Got it. Thanks! > > On Dec 15, 2016, at 02:58, Jamie Grier wrote: > > Ahh, sorry, for #2: A single Flink job can have as many sources as you > like. They c

Re: Jar hell when running streaming job in YARN session

2016-12-15 Thread Yury Ruchin
Hi Kidong, Stephan, First of all, you've saved me days of investigation - thanks a lot! The problem is solved now. More details follow. I use the official Flink 1.1.3 + Hadoop 2.7 distribution. My problem was indeed caused by clash of classes under "com.google" in my fat jar and in the YARN libra

Re: Question about Memory Manage in the Streaming mode

2016-12-15 Thread Stephan Ewen
Hi! I do slightly disagree with Timo. Custom memory management is always useful, also in the Streaming API. It makes execution more robust. If you use RocksDB as a state backend, you get memory management from RocksDB - effectively all your program key/value state is off-heap. Flink's own state

Re: Jar hell when running streaming job in YARN session

2016-12-15 Thread Stephan Ewen
Hi Yuri! Flink should hide Hadoop's Guava, to avoid this issue. Did you build Flink yourself from source? Maybe you are affected by this issue: https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/building.html#dependency-shading Stephan On Thu, Dec 15, 2016 at 11:18 AM, Kidong Le

Re: Flink 1.1.3 RollingSink - understanding output blocks/parallelism

2016-12-15 Thread Aljoscha Krettek
Hi Dominik, I think having a single output file is only possible if you set the parallelism of the sink to 1. AFAIK it is not possible to concurrently write to a single HDFS file from multiple clients. Cheers, Aljoscha On Wed, 14 Dec 2016 at 20:57 Dominik Safaric wrote: > Hi everyone, > > altho

Re: In 1.2-SNAPSHOT, EventTimeSessionWindows are not firing untill the whole stream is processed

2016-12-15 Thread Aljoscha Krettek
Hi, right now, the only way of shutting down a running pipeline is to cancel it. You can do that in the JobManager dashboard or using the bin/flink command. And the watermark extraction period does not depend on the watch interval. It can be configured using env.getConfig().setAutoWatermarkInterval

Re: Jar hell when running streaming job in YARN session

2016-12-15 Thread Kidong Lee
To avoid guava conflict, I use maven shade plugin to package my fat jar. If you use maven, the shade plugin looks like this: ... org.apache.maven.plugins maven-shade-plugin 2.4.2 false true flink-job com.google yourpackage.sh

Jar hell when running streaming job in YARN session

2016-12-15 Thread Yury Ruchin
Hi, I have run into a classpath issue when running Flink streaming job in YARN session. I package my app into a fat jar with all the dependencies needed. One of them is Google Guava. I then submit the jar to the session. The task managers pre-created by the session build their classpath from the F

Re: Question about Memory Manage in the Streaming mode

2016-12-15 Thread Tao Meng
Thanks a lot.![](https://link.nylas.com/open/f0mvqfd5d2e8i8vyg0632ikp4/local- af73d072-e4ca?r=dXNlckBmbGluay5hcGFjaGUub3Jn) On 12月 15 2016, at 5:39 δΈ‹εˆ, Timo Walther wrote: > Hi Tao, no, streaming jobs do not use managed memory yet. Managed memory is useful for sorting, joining and grou

Re: Question about Memory Manage in the Streaming mode

2016-12-15 Thread Timo Walther
Hi Tao, no, streaming jobs do not use managed memory yet. Managed memory is useful for sorting, joining and grouping bounded data. Unbounded stream do not need that. It could be used in the future e.g. to store state or for new operators, but is this is not on the roadmap so far. Regards,

Question about Memory Manage in the Streaming mode

2016-12-15 Thread Tao Meng
Hi all, I have some questions about memory management in the Streaming mode. Do the Streaming jobs use the memory management module ? If they don't, for what considerations do not ? Because Data exchange is too frequent ? Is there a plan to let streaming job use it ? Thanks a

RE: Standalone cluster layout

2016-12-15 Thread Avihai Berkovitz
Thank you for the answers. The cluster will run in Azure, so I will be using HDFS over Azure Blob Store, as outlined in http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Azure-Blob-Storage-Connector-td8536.html I got pretty good performance in my tests, and it should handle scal