To be more clear...
A single source in a Flink program is a logical concept. Flink jobs are
run with some level of parallelism, meaning that multiple copies of your
source function (and all other functions) run distributed across a
cluster. So if you have a streaming program with two sources and you r
All streams can be parallelized in Flink even with only one source. You
can have multiple sinks as well.
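To make that concrete, here is a minimal sketch of a job with two sources and two sinks (host names, ports, and paths are placeholders; this assumes the Flink DataStream API is on the classpath):

```java
// Sketch only: names are illustrative, not from the original thread.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Two logical sources; each runs as parallel instances across the cluster.
DataStream<String> a = env.socketTextStream("host-a", 9000);
DataStream<String> b = env.socketTextStream("host-b", 9001);

// A single job can fan out to multiple sinks.
DataStream<String> merged = a.union(b);
merged.print();                  // sink 1
merged.writeAsText("/tmp/out");  // sink 2

env.execute("two-sources-two-sinks");
```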
On Thu, Dec 15, 2016 at 7:56 AM, Meghashyam Sandeep V <
vr1meghash...@gmail.com> wrote:
> 1. If we have multiple sources, can the streams be parallelized ?
> 2. Can we have multiple sinks as
Hi,
I'm seeing an issue with the load speed of Flink Web UI when running in
YARN session. Initial load takes several minutes or even more, although
according to the browser console there are only a couple of MBs to
download. When the loading is complete, the UI itself is quite responsive.
I don't
Hey there,
I was investigating CEP functionality and realized that I'm missing some
features which are discussed here, especially: accessing fields from
previous events.
Any progress regarding this question?
I'm working with streaming car location data trying to analyze different
traffic patterns.
Consi
Hi There,
We are evaluating Flink streaming for real time data analysis. I have my
flink job running in EMR with Yarn. What are the possible benchmarking
tools that work best with Flink? I couldn't find this information on the
Apache website.
Thanks,
Sandeep
1. If we have multiple sources, can the streams be parallelized ?
2. Can we have multiple sinks as well?
On Dec 14, 2016 10:46 PM, wrote:
> Got it. Thanks!
>
> On Dec 15, 2016, at 02:58, Jamie Grier wrote:
>
> Ahh, sorry, for #2: A single Flink job can have as many sources as you
> like. They c
Hi Kidong, Stephan,
First of all, you've saved me days of investigation - thanks a lot! The
problem is solved now. More details follow.
I use the official Flink 1.1.3 + Hadoop 2.7 distribution. My problem was
indeed caused by a clash of classes under "com.google" in my fat jar and in
the YARN libra
Hi!
I slightly disagree with Timo: custom memory management is always
useful, in the Streaming API as well. It makes execution more robust.
If you use RocksDB as a state backend, you get memory management from
RocksDB - effectively all your program key/value state is off-heap.
Flink's own state
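For reference, switching to the RocksDB backend is a one-line configuration change (a sketch, assuming the flink-statebackend-rocksdb dependency is on the classpath and noting that the constructor may throw IOException; the checkpoint path is a placeholder):

```java
// Sketch: with this backend, keyed state lives off-heap in RocksDB.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints"));
```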
Hi Yuri!
Flink should hide Hadoop's Guava, to avoid this issue.
Did you build Flink yourself from source? Maybe you are affected by this
issue:
https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/building.html#dependency-shading
Stephan
On Thu, Dec 15, 2016 at 11:18 AM, Kidong Le
Hi Dominik,
I think having a single output file is only possible if you set the
parallelism of the sink to 1. AFAIK it is not possible to concurrently
write to a single HDFS file from multiple clients.
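For example, a sketch of a sink pinned to parallelism 1, so all records go through one writer instance (the path is a placeholder):

```java
// Sketch: a single sink instance produces one output file.
stream.writeAsText("hdfs:///out/result").setParallelism(1);
```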
Cheers,
Aljoscha
On Wed, 14 Dec 2016 at 20:57 Dominik Safaric
wrote:
> Hi everyone,
>
> altho
Hi,
right now, the only way of shutting down a running pipeline is to cancel
it. You can do that in the JobManager dashboard or using the bin/flink
command. The watermark extraction period does not depend on the watch
interval; it can be configured using
env.getConfig().setAutoWatermarkInterval
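For example (a configuration sketch; the 500 ms value is arbitrary):

```java
// Ask Flink to invoke watermark extraction every 500 ms.
env.getConfig().setAutoWatermarkInterval(500);
```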
To avoid guava conflict, I use maven shade plugin to package my fat jar.
If you use maven, the shade plugin looks like this:
...
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.4.2</version>
    <configuration>
        <createDependencyReducedPom>false</createDependencyReducedPom>
        <shadedArtifactAttached>true</shadedArtifactAttached>
        <finalName>flink-job</finalName>
        <relocations>
            <relocation>
                <pattern>com.google</pattern>
                <shadedPattern>yourpackage.sh...</shadedPattern>
            </relocation>
        </relocations>
    </configuration>
</plugin>
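As a sanity check after `mvn package`, you can list the shaded jar and confirm the Guava classes were relocated (a sketch, assuming the jar lands at target/flink-job.jar per the finalName above):

```shell
# No output from grep means no unrelocated com.google classes remain.
jar tf target/flink-job.jar | grep '^com/google'
```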
Hi,
I have run into a classpath issue when running Flink streaming job in YARN
session. I package my app into a fat jar with all the dependencies needed.
One of them is Google Guava. I then submit the jar to the session. The task
managers pre-created by the session build their classpath from the
F
Thanks a lot.
On Dec 15, 2016, at 5:39 PM, Timo Walther wrote:
> Hi Tao,
> no, streaming jobs do not use managed memory yet. Managed memory is useful for
> sorting, joining and grou
Hi Tao,
no, streaming jobs do not use managed memory yet. Managed memory is
useful for sorting, joining and grouping bounded data. Unbounded streams
do not need that.
It could be used in the future, e.g. to store state or for new operators,
but this is not on the roadmap so far.
Regards,
Hi all,
I have some questions about memory management in streaming mode.
Do streaming jobs use the memory management module?
If they don't, what is the reason? Is it because data exchange is too
frequent?
Is there a plan to let streaming jobs use it?
Thanks a
Thank you for the answers. The cluster will run in Azure, so I will be using
HDFS over Azure Blob Store, as outlined in
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Azure-Blob-Storage-Connector-td8536.html
I got pretty good performance in my tests, and it should handle scal
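For reference, wiring up the wasb:// filesystem comes down to a core-site.xml entry like the following (a sketch; "myaccount" is a placeholder account name, and the hadoop-azure and azure-storage jars must be on the classpath):

```xml
<!-- Access key for the storage account backing the blob container. -->
<property>
  <name>fs.azure.account.key.myaccount.blob.core.windows.net</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
```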