Hi Gwenhaël,
good to hear that you could resolve the problem.
When you run multiple HA Flink jobs in the same cluster, you don't have to
adjust Flink's configuration. It should work out of the box.
However, if you run multiple HA Flink clusters, then you have to set for
each cluster a d
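A rough sketch of what a per-cluster setting could look like (this assumes the
recovery.zookeeper.path.root option from the 0.10 high-availability docs; the
key name and the values below are illustrative, not taken from this thread):

# flink-conf.yaml of the first HA cluster
recovery.zookeeper.path.root: /flink/cluster-one

# flink-conf.yaml of the second HA cluster
recovery.zookeeper.path.root: /flink/cluster-two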
Agreed,
and a stateful streaming operator instance in Flink looks natural compared
to Apache Spark.
On Thu, Nov 19, 2015 at 10:54 AM, Liang Chen
wrote:
> Two aspects are attracting them:
> 1.Flink is using java, it is easy for most of them to start Flink, and be
> more easy to maintain in compar
Two aspects are attracting them:
1. Flink is written in Java, so it is easy for most of them to get started
with Flink, and it is easier to maintain in comparison to Storm (as Clojure is
difficult to maintain and fewer people know it).
2. Users really want a unified system supporting streaming and batch
processing.
Hi rmetzger0,
Thanks for the response. I didn't know that I had to register before I could
receive responses for my posts.
Now I am registered, but the problem is not resolved yet. I know it might
not be intuitive to get execution time from a long-running streaming job, but
it is possible to get to
Hi,
I'm hitting a CompilerException with some of my data sets, but not all of
them.
Exception in thread "main" org.apache.flink.optimizer.CompilerException: No
plan meeting the requirements could be created @ Bulk Iteration (Bulk
Iteration) (1:null). Most likely reason: Too restrictive plan hints.
Granted, both are presented with the same example in the docs. They are
modeled after reduce and fold in functional programming. Perhaps we should
have some more enlightening examples.
On Wed, Nov 18, 2015 at 6:39 PM, Fabian Hueske wrote:
> Hi Ron,
>
> Have you checked:
> https://ci.apache.org/
It was long ago, but if I remember correctly they were about 50k.
On 18 Nov 2015 19:22, "Stephan Ewen" wrote:
> Okay, let me take a step back and make sure I understand this right...
>
> With many small files it takes longer to start the job, correct? How much
> time did it actually take and how m
Okay, let me take a step back and make sure I understand this right...
With many small files it takes longer to start the job, correct? How much
time did it actually take and how many files did you have?
On Wed, Nov 18, 2015 at 7:18 PM, Flavio Pompermaier
wrote:
> in my test I was using the lo
The new KafkaConsumer for Kafka 0.9 should be able to handle this, as the
Kafka client code itself has support for this in that version.
For 0.8.x, we would need to implement support for recovery inside the
consumer ourselves, which is why we decided to initially let the Job
Recovery take care of that.
If th
Please see the above gist: my test makes no assertions until after the
env.execute() call. Adding setParallelism(1) to my sink appears to
stabilize my test. Indeed, very helpful. Thanks a lot!
-n
On Wed, Nov 18, 2015 at 9:15 AM, Stephan Ewen wrote:
> Okay, I think I misunderstood your problem.
in my test I was using the local fs (ext4)
On 18 Nov 2015 19:17, "Stephan Ewen" wrote:
> The JobManager does not read all files, but it has to query HDFS for
> all file metadata (size, blocks, block locations), which can take a bit.
> There is a separate call to the HDFS Namenode for each fil
The JobManager does not read all files, but it has to query HDFS for
all file metadata (size, blocks, block locations), which can take a bit.
There is a separate call to the HDFS Namenode for each file. The more
files, the more metadata has to be collected.
On Wed, Nov 18, 2015 at 7:15 PM, Fl
So why does it take so long to start the job? Is it because the job
manager has to read all the lines of the input files before generating the
splits?
On 18 Nov 2015 17:52, "Stephan Ewen" wrote:
> Late answer, sorry:
>
> The splits are created in the JobManager, so the job submission should not
Hi Ron,
Have you checked:
https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html#transformations
?
Fold is like reduce, except that you define a start element (possibly of a
different type than the input type) and the result type is the type of the initial
value. In reduce,
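To make the difference concrete, here is a minimal sketch against the 0.10-era
DataStream API (the single-key KeySelector and the example data are made up for
illustration; exact signatures may differ slightly between versions):

import org.apache.flink.api.common.functions.FoldFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ReduceVsFoldSketch {

  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    DataStream<Integer> values = env.fromElements(1, 2, 3, 4);

    // Key everything to a single group, just to get a keyed stream to work on.
    KeySelector<Integer, Integer> singleKey = new KeySelector<Integer, Integer>() {
      @Override
      public Integer getKey(Integer value) {
        return 0;
      }
    };

    // reduce: input and output share the same type (Integer in, Integer out).
    DataStream<Integer> runningSum = values.keyBy(singleKey)
        .reduce(new ReduceFunction<Integer>() {
          @Override
          public Integer reduce(Integer a, Integer b) {
            return a + b;
          }
        });

    // fold: the accumulator starts from an initial value, here a String, and
    // the result type is the type of that initial value.
    DataStream<String> concatenated = values.keyBy(singleKey)
        .fold("", new FoldFunction<Integer, String>() {
          @Override
          public String fold(String accumulator, Integer value) {
            return accumulator + value;
          }
        });

    runningSum.print();
    concatenated.print();
    env.execute("reduce vs fold sketch");
  }
}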
Nevermind,
Looking at the logs I saw that it was having issues trying to connect to ZK.
To make it short, it had the wrong port.
It is now starting.
Tomorrow I’ll try to kill some JobManagers *evil*.
Another question: if I have multiple HA Flink jobs, are there some points to
check in order to
Is there a succinct description of the distinction between these transforms?
Ron
—
Ron Crocker
Principal Engineer & Architect
( ( •)) New Relic
rcroc...@newrelic.com
M: +1 630 363 8835
Okay, I think I misunderstood your problem.
Usually you can simply execute tests one after another by waiting until
"env.execute()" returns. The streaming jobs terminate by themselves once
the sources reach end of stream (finite streams are supported that way) but
make sure all records flow throug
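Roughly what such a test can look like, as a sketch (the collecting sink with a
static list is a made-up test helper, and setParallelism(1) keeps a single sink
instance so all results land in one place):

import java.util.ArrayList;
import java.util.List;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;

public class FiniteStreamTestSketch {

  // Made-up test sink: collects everything into a static list so the results
  // can be inspected after env.execute() returns.
  public static class CollectSink implements SinkFunction<Integer> {
    public static final List<Integer> VALUES = new ArrayList<Integer>();

    @Override
    public void invoke(Integer value) {
      synchronized (VALUES) {
        VALUES.add(value);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // A finite source: the job terminates once all elements have been emitted.
    DataStream<Integer> input = env.fromElements(1, 2, 3);

    input.addSink(new CollectSink()).setParallelism(1);

    // execute() blocks until the finite stream is fully processed, so the
    // check below only runs after all records have flowed through the sink.
    env.execute("finite stream test sketch");

    if (CollectSink.VALUES.size() != 3) {
      throw new AssertionError("expected 3 results, got " + CollectSink.VALUES.size());
    }
  }
}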
Hi Gwenhaël,
do you have access to the yarn logs?
Cheers,
Till
On Wed, Nov 18, 2015 at 5:55 PM, Gwenhael Pasquiers <
gwenhael.pasqui...@ericsson.com> wrote:
> Hello,
>
>
>
> We’re trying to set up high availability using an existing zookeeper
> quorum already running in our Cloudera cluster.
>
Hello,
We're trying to set up high availability using an existing zookeeper quorum
already running in our Cloudera cluster.
So, as per the doc we've changed the max attempts in YARN's config as well as
the flink.yaml.
recovery.mode: zookeeper
recovery.zookeeper.quorum: host1:3181,host2:3181,hos
Late answer, sorry:
The splits are created in the JobManager, so the job submission should not
be affected by that.
The assignment of splits to workers is very fast, so many splits with small
data are not very different from few splits with large data.
Lines are never materialized and the operato
Hi Guido!
If you use Scala, I would use an Option to represent nullable fields. That
is a very explicit solution that marks which fields can be null, and also
forces the program to handle this carefully.
We are looking to add support for Java 8's Optional type as well for
exactly that purpose.
G
Thank you indeed for presenting there.
It looks like a very large audience!
Greetings,
Stephan
On Mon, Oct 26, 2015 at 11:24 AM, Maximilian Michels wrote:
> Hi Liang,
>
> We greatly appreciate you introduced Flink to the Chinese users at CNCC!
> We would love to hear how people like Flink.
>
Yes, that does make sense! Thank you for explaining. Have you made the
change yet? I couldn't find it on the master.
On Wed, Nov 18, 2015 at 5:16 PM, Stephan Ewen wrote:
> That makes sense...
>
> On Mon, Oct 26, 2015 at 12:31 PM, Márton Balassi
> wrote:
>>
>> Hey Max,
>>
>> The solution I am pro
That makes sense...
On Mon, Oct 26, 2015 at 12:31 PM, Márton Balassi
wrote:
> Hey Max,
>
> The solution I am proposing is not flushing on every record, but it makes
> sure to forward the flushing from the SinkFunction to the OutputFormat
> whenever it is triggered. Practically this means that th
Sorry Stephan but I don't follow how global order applies in my case. I'm
merely checking the size of the sink results. I assume all tuples from a
given test invocation have reached the sink before the next test begins, which is
clearly not the case. Is there a way I can place a barrier in my tests to
ensure o
The option was accepted using the yaml file and it looks like it solved our
issue.
Thanks again.
From: Robert Metzger [mailto:rmetz...@apache.org]
Sent: Tuesday, November 17, 2015 12:04
To: user@flink.apache.org
Subject: Re: MaxPermSize on yarn
You can also put the configuration option into the fl
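For the record, a rough sketch of what that yaml entry can look like (assuming
the env.java.opts key is the option meant here; the value is only an example):

# flink-conf.yaml
env.java.opts: -XX:MaxPermSize=256m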
Hi!
If you go with the Batch API, then any failed task (like a sink trying to
insert into the database) will be completely re-executed. That makes sure
no data is lost in any way, no extra effort needed.
It may insert a lot of duplicates, though, if the task is re-started after
half the data was
There is no global order in parallel streams, it is something that
applications need to work with. We are thinking about adding operations to
introduce event-time order (at the cost of some delay), but those are only
plans at this point.
What I do in my tests is run the test streams in parallel, bu
Hi,
I wrote a little example that could be what you are looking for:
https://github.com/dataArtisans/query-window-example
It basically implements a window operator with a modifiable window size that
also allows querying the current accumulated window contents using a second
input stream.
There
Hi Aljoscha,
thanks, that's what I thought. Just wanted to verify that keyBy +
SessionWindow() works with intermingled events.
Cheers,
Konstantin
On 18.11.2015 11:14, Aljoscha Krettek wrote:
> Hi Konstantin,
> you are right, if the stream is keyed by the session-id then it works.
>
> I was ref
Hey Vasia,
I think a very common workload would be an event stream from web servers of
an online shop. Usually, these shops have multiple servers, so events
arrive out of order.
I think there are plenty of different use cases that you can build around
that data:
- Users perform different actions t
Hi Konstantin,
you are right, if the stream is keyed by the session-id then it works.
I was referring to the case where you have, for example, some interactions with
timestamps and you want to derive the sessions from this. In that case, it can
happen that events that should belong to one session
We were also trying to address session windowing but took a slightly different
approach as to which window we place the event into.
We did not want the "triggering event" to be purged as part of the window it
triggered, but instead to create a new window for it and have the old window
fire and pu