Hello Stefano
Sorry for the late reply. Many thanks for taking the effort to write and share
an example code snippet.
I have been playing with the countWindow behaviour for some weeks now and I
am generally aware of the functionality of countWindowAll(). For my
use case, where I _have to observe_ the
Thanks, I'll check this.
Saliya
On Mon, Feb 15, 2016 at 4:08 PM, Fabian Hueske wrote:
> I would have a look at the example programs in our code base:
>
>
> https://github.com/apache/flink/tree/master/flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java
>
> Best, Fabian
I would have a look at the example programs in our code base:
https://github.com/apache/flink/tree/master/flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java
Best, Fabian
2016-02-15 22:03 GMT+01:00 Saliya Ekanayake:
> Thank you, Fabian.
>
> Any chance you might have an example on how to define a data flow with Flink?
Thank you, Fabian.
Any chance you might have an example on how to define a data flow with
Flink?
On Mon, Feb 15, 2016 at 3:58 PM, Fabian Hueske wrote:
> It is not possible to "pin" data sets in memory, yet.
> However, you can stream the same data set through two different mappers at
> the same time.
It is not possible to "pin" data sets in memory, yet.
However, you can stream the same data set through two different mappers at
the same time.
For instance you can have a job like:
           /---> Map 1 --> Sink 1
Source --<
           \---> Map 2 --> Sink 2
and execute it at once.
Fabian
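For reference, a minimal self-contained sketch of such a job with the DataSet API (the input/output paths and the map bodies are made up for illustration):

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class ForkExample {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Read the (expensive) input only once; the path is illustrative.
        DataSet<String> source = env.readTextFile("hdfs:///path/to/input");

        // Branch 1: Map 1 --> Sink 1
        source.map(new MapFunction<String, String>() {
            @Override
            public String map(String value) {
                return "map1: " + value;
            }
        }).writeAsText("hdfs:///path/to/out1");

        // Branch 2: Map 2 --> Sink 2
        source.map(new MapFunction<String, String>() {
            @Override
            public String map(String value) {
                return "map2: " + value;
            }
        }).writeAsText("hdfs:///path/to/out2");

        // One execute() call runs both branches as a single job,
        // so the source is read only once.
        env.execute("Fork example");
    }
}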
Fabian,
count() was just an example. What I would like to do is, say, run two map
operations on the dataset (ds). Each map will have its own reduction, so
is there a way to avoid creating two jobs for such a scenario?
The reason is, reading these binary matrices is expensive. In our current
MPI imp
Hi,
it looks like you are executing two distinct Flink jobs.
DataSet.count() triggers the execution of a new job. If you have an
execute() call in your program, this will lead to two Flink jobs being
executed.
It is not possible to share state among these jobs.
Maybe you should add a custom count
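One way a custom count could look, assuming it means counting records with an accumulator inside a pass-through mapper so that no extra job is triggered (the class and accumulator names are illustrative):

import org.apache.flink.api.common.accumulators.LongCounter;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

// Pass-through mapper that counts records via an accumulator,
// so no separate job (as triggered by DataSet.count()) is needed.
public class CountingMapper<T> extends RichMapFunction<T, T> {

    private final LongCounter counter = new LongCounter();

    @Override
    public void open(Configuration parameters) {
        getRuntimeContext().addAccumulator("record-count", counter);
    }

    @Override
    public T map(T value) {
        counter.add(1L);
        return value;
    }
}

After env.execute(), the count can be read from the returned JobExecutionResult via getAccumulatorResult("record-count").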
Hi,
I see that an InputFormat's open() and nextRecord() methods get called for
each terminal operation on a given dataset using that particular
InputFormat. Is it possible to avoid this - possibly using some caching
technique in Flink?
For example, I've some code like below and I see for both the
Hi,
I've now created a "preview RC" for the upcoming 1.0.0 release.
There are still some blocking issues and important pull requests to be
merged but nevertheless I would like to start testing Flink for the release.
In past major releases, we needed to create many release candidates, often
for fi
Hi Zach,
Here you go:
http://flink.apache.org/contribute-code.html#snapshots-nightly-builds
Cheers,
Max
On Mon, Feb 15, 2016 at 6:29 PM, Zach Cox wrote:
> Hi - are there binary downloads of the Flink 1.0-SNAPSHOT tarballs, like
> there are for 0.10.2 [1]? I'm testing out an application built a
Hi - are there binary downloads of the Flink 1.0-SNAPSHOT tarballs, like
there are for 0.10.2 [1]? I'm testing out an application built against the
1.0-SNAPSHOT dependencies from Maven central, and want to make sure I run
them on a Flink 1.0-SNAPSHOT cluster that matches up with those jars.
Thanks
Hi,
Thanks Aljoscha for the details!
The warning about performance and evictors is useful, but I am not sure how it
can always be put into practice. Take, for example, a GlobalWindow that you would
use to aggregate data from multiple partitions. A GlobalWindow does not come
with a trigger - would
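For context, a minimal sketch of wiring an explicit trigger and evictor onto a GlobalWindow (assuming the 1.0-era keyed windowing API; the element counts are arbitrary):

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.GlobalWindows;
import org.apache.flink.streaming.api.windowing.evictors.CountEvictor;
import org.apache.flink.streaming.api.windowing.triggers.CountTrigger;

public class GlobalWindowSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Tuple2<String, Integer>> input = env.fromElements(
                Tuple2.of("a", 1), Tuple2.of("a", 2), Tuple2.of("b", 3));

        // A GlobalWindow never fires on its own, so an explicit trigger is required;
        // the evictor then decides how many buffered elements to discard.
        input.keyBy(0)
             .window(GlobalWindows.create())
             .trigger(CountTrigger.of(2))
             .evictor(CountEvictor.of(2))
             .sum(1)
             .print();

        env.execute("GlobalWindow sketch");
    }
}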
You can find the log of the recovering job manager here:
https://gist.github.com/stefanobaghino/ae28f00efb6bdd907b42
Basically, what Ufuk said happened: the job manager tried to fill in for
the lost one but couldn't find the actual data because it looked it up
locally whereas due to my configurati
Hi Stefano,
A correction from my side: You don't need to set the execution retries
for HA because a new JobManager will automatically take over and
resubmit all jobs which were recovered from the storage directory you
set up. The number of execution retries applies only to jobs which are
restarted
But isn't that a normal stack trace which you see when you submit a job to
the cluster via the CLI and somewhere in the compilation process something
fails?
Anyway, it would be helpful to see the program which causes this problem.
Cheers,
Till
On Mon, Feb 15, 2016 at 12:25 PM, Fabian Hueske wro
Stephan Ewen wrote
> Do you know why you are getting conflicts on the FashHashMap class, even
> though the core Flink dependencies are "provided"? Does adding the Kafka
> connector pull in all the core Flink dependencies?
Yes, the core Flink dependencies are being pulled in transitively from the
K
Hi!
Looks like that experience should be improved.
Do you know why you are getting conflicts on the FashHashMap class, even
though the core Flink dependencies are "provided"? Does adding the Kafka
connector pull in all the core Flink dependencies?
Concerning the Kafka connector: We did not inclu
> On 15 Feb 2016, at 13:40, Stefano Baghino wrote:
>
> Hi Ufuk, thanks for replying.
>
> Regarding the masters file: yes, I've specified all the masters and checked
> out that they were actually running after the start-cluster.sh. I'll gladly
> share the logs as soon as I get to see them.
Hi Stefano,
That is true. The documentation doesn't mention that. Just wanted to
point you to the documentation if anything else needs to be
configured. We will update it.
Instead of setting the number of execution retries on the
StreamExecutionEnvironment, you may also set
"execution-retries.def
Hi Maximilian,
thank you for the reply. I had checked out the documentation before running
my tests (I'm not expert enough to not read the docs ;)) but it doesn't
mention any specific requirement regarding the execution retries. I'll
check it out, thanks!
On Mon, Feb 15, 2016 at 12:51 PM, Maximili
Hi Ufuk, thanks for replying.
Regarding the masters file: yes, I've specified all the masters and checked
out that they were actually running after the start-cluster.sh. I'll gladly
share the logs as soon as I get to see them.
Regarding the state backend: how does having a non-distributed storage
org.apache.flink.streaming.connectors.twitter.TwitterFilterSource - Initializing Twitter Streaming API connection
12:27:32,134 INFO  com.twitter.hbc.httpclient.BasicClient - New connection executed: twitterSourceClient, endpoint: /1.1/statuses/filter.json?delimited=length
12:
Hi Stefano,
The job should stop temporarily but then be resumed by the new
JobManager. Have you increased the number of execution retries? AFAIK,
it is set to 0 by default. This will not re-run the job, even in HA
mode. You can enable it on the StreamExecutionEnvironment.
Otherwise, you have prob
Using the local file system as state backend only works if all job
managers run on the same machine. Is that the case?
Have you specified all job managers in the masters file? With the
local file system state backend, only something like
host-X
host-X
host-X
will be a valid masters configuration.
Hi Javier,
Keys is an internal class and was recently moved to a different package.
So it appears like your Flink dependencies are not aligned to the same
version.
We also added Scala version identifiers to all our dependencies which
depend on Scala 2.10.
For instance, flink-scala became flink-sc
Hello everyone,
last week I ran some tests with Apache ZooKeeper to get a grip on Flink HA
features. My tests have gone badly so far and I can't sort out the reason.
My latest tests involved Flink 0.10.2, ran as a standalone cluster with 3
masters and 4 slaves. The 3 masters are also the ZooKeeper (3
Hi Michal,
If I got your requirements right, you could try to solve this issue by
serving the updates through a regular DataStream.
You could add a SourceFunction which periodically emits a new version of
the cache and a CoFlatMap operator which receives on the first input the
regular streamed inp
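A minimal sketch of that pattern, with a made-up source that refreshes the cache once a minute and a broadcast so every parallel instance receives the updates (all names, the socket input, and the refresh interval are illustrative):

import java.util.HashMap;
import java.util.Map;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.util.Collector;

public class CacheUpdateSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Regular input stream (the socket source is just a stand-in).
        DataStream<String> events = env.socketTextStream("localhost", 9999);

        // Periodically refreshed cache snapshots.
        DataStream<Map<String, String>> cacheUpdates = env.addSource(new CacheSource());

        // Broadcast the updates so every parallel instance holds the full cache.
        events.connect(cacheUpdates.broadcast())
              .flatMap(new CoFlatMapFunction<String, Map<String, String>, String>() {

                  private Map<String, String> cache = new HashMap<>();

                  @Override
                  public void flatMap1(String event, Collector<String> out) {
                      // Enrich the regular stream with the latest cache version.
                      out.collect(event + " -> " + cache.getOrDefault(event, "unknown"));
                  }

                  @Override
                  public void flatMap2(Map<String, String> newCache, Collector<String> out) {
                      // Replace the cache whenever the source emits a new snapshot.
                      cache = newCache;
                  }
              })
              .print();

        env.execute("Cache update sketch");
    }

    // Made-up source that emits a fresh cache snapshot once a minute.
    public static class CacheSource implements SourceFunction<Map<String, String>> {

        private volatile boolean running = true;

        @Override
        public void run(SourceContext<Map<String, String>> ctx) throws Exception {
            while (running) {
                Map<String, String> snapshot = new HashMap<>();
                snapshot.put("some-key", "value-" + System.currentTimeMillis());
                ctx.collect(snapshot);
                Thread.sleep(60_000);
            }
        }

        @Override
        public void cancel() {
            running = false;
        }
    }
}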
Hi,
This stacktrace looks really suspicious.
It includes classes from the submission client (CLIClient), optimizer
(JobGraphGenerator), and runtime (KryoSerializer).
Is it possible that you try to start a new Flink job inside another job?
This would not work.
Best, Fabian
Hi Srikanth,
DataSet.partitionBy() will partition the data on the declared partition
fields.
If you append a DataSink with the same parallelism as the partition
operator, the data will be written out with the defined partitioning.
It should be possible to achieve the behavior you described using
D
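A minimal sketch of that setup, using partitionByHash as the concrete partitioning variant (the field position and output path are illustrative):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class PartitionedSinkSketch {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Tuple2<String, Integer>> data = env.fromElements(
                Tuple2.of("a", 1), Tuple2.of("b", 2), Tuple2.of("a", 3));

        // Hash-partition on field 0. The sink keeps the same (default) parallelism
        // as the partition operator, so the defined partitioning is preserved
        // in the written output files.
        data.partitionByHash(0)
            .writeAsCsv("file:///tmp/partitioned-output");

        env.execute("Partitioned sink sketch");
    }
}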
Hi,
you are right, the logic is in EvictingNonKeyedWindowOperator.emitWindow() for
non-parallel (non-keyed) windows and in EvictingWindow.processTriggerResult()
in the case of keyed windows.
You are also right about the contract of the Evictor: it returns the number of
elements to be evicted fr
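Under that contract, a custom evictor might be sketched like this (the class name and the drop-half policy are made up; this assumes the Evictor interface of the Flink version discussed here, where evict() returns the eviction count):

import org.apache.flink.streaming.api.windowing.evictors.Evictor;
import org.apache.flink.streaming.api.windowing.windows.Window;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;

// Made-up evictor that drops the oldest half of the buffered elements.
public class DropOldestHalfEvictor<T, W extends Window> implements Evictor<T, W> {

    @Override
    public int evict(Iterable<StreamRecord<T>> elements, int size, W window) {
        // The return value is the number of elements removed from the beginning
        // (oldest end) of the window buffer before the window function runs.
        return size / 2;
    }
}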
Hello,
I am looking over the mechanisms for evicting events in Flink. I saw that,
whether using a default evictor or building a custom one, the logic is that the
evictor will provide the number of events to be discarded.
Could you please provide me with some additional pointers regarding the
mechan
Hi guys,
I'm running a small test with the SNAPSHOT version in order to be able to
use Kafka 0.9 and I'm getting the following error:
*cannot access org.apache.flink.api.java.operators.Keys*
*[ERROR] class file for org.apache.flink.api.java.operators.Keys not found*
The code I'm using is as foll