Hello,
As far as I've seen, there are a lot of projects using Flink and Kafka
together, but I don't see the point of that combination. Let me know what you
think about this.
1. If I'm not wrong, Kafka provides basically two things: storage (record
retention) and fault tolerance in case of failure, while
Most of the ValueStates I am using are of type Long or Boolean, except one,
which is a map from Long to a Scala case class:
ValueState[Map[Long, AnScalaCaseClass]]
Does this serialization happen only for the value state members of
operators, or also other private fields?
Thanks
+satish
On Mon, Nov 14, 201
Hi Stephan,
I faced a similar issue. The way I implemented this (though a workaround) is
by making the input to the fold function, i.e. the initial value to fold,
symmetric to what goes into the window function.
I made the initial value to the fold function a tuple with all non
required/available index valu
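A minimal, Flink-free sketch of that workaround (all names and placeholder values here are illustrative, not from the original message): the fold accumulator is given the same shape as what the window function consumes, with placeholder values for fields that are not yet known.

```java
public class FoldShape {
    // Accumulator shaped like what the window function consumes: (key, count, sum).
    // The key field starts as a placeholder (-1) and is filled in during the fold.
    record Acc(long key, long count, long sum) {}

    static Acc fold(Acc acc, long key, long value) {
        return new Acc(key, acc.count() + 1, acc.sum() + value);
    }

    public static void main(String[] args) {
        Acc acc = new Acc(-1L, 0L, 0L); // "non available" index values as placeholders
        for (long v : new long[]{3, 4, 5}) {
            acc = fold(acc, 42L, v);
        }
        // The window function now receives exactly the shape it expects.
        System.out.println(acc);
    }
}
```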
Aljoscha,
thanks for your response. The use-case I'm after is basically providing
"early" (inaccurate) results to downstream consumers. Suppose we're
running aggregations for daily time windows, but we don't want to wait a
whole day to see results. The idea is to fire the windows continuously
Hey There,
I'm trying to calculate the betweenness in a graph with Flink and Gelly, the
way I tried this was by calculating the shortest path from every node to the
rest of the nodes. This results in a Dataset of vertices which all have
Datasets of their own with all the other vertices and their p
Hi Stephan,
I was going to suggest that using a flatMap and tracking the timestamp of
each key yourself is a bit like having a per-key watermark. I wanted to
wait a bit before answering because I'm currently working on a new type of
Function that will be released with Flink 1.2: ProcessFunction. Thi
Hi Gyula,
I've typically added external library dependencies to my own application
JAR as shaded dependencies. This ensures that all dependencies are included
with my application while being distributed to Flink Job Manager & Task
Manager instances.
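For reference, a minimal maven-shade-plugin stanza that bundles dependencies into the application JAR could look like the following, assuming a Maven build (the stanza is illustrative, not from the original message):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```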
Another approach is to place these external JAR
Hi Ufuk,
The master instance of the cluster was also a m3.xlarge instance with 15 GB
RAM, which I would've expected to be enough. I have gotten the program to
run successfully on a personal virtual cluster where each node has 8 GB RAM
and where the master node was also a worker node, so the proble
Hi,
I have been trying to use the -C flag to add external jars with user code
and I have observed some strange behaviour.
What I am trying to do is the following:
I have 2 jars, JarWithMain.jar contains the main class and UserJar.jar
contains some classes that the main method will eventually exec
Hi,
could it be that the job has restarted due to failures a large number of
times?
Cheers,
Aljoscha
On Mon, 7 Nov 2016 at 09:21 Dominique Rondé
wrote:
> First of all, thanks for the explanation. That sounds reasonable.
>
> But I started the flink routes 3 days ago and went out for the weekend.
bin/flink run -c com.att.flink.poc.NifiTest jars/flinkpoc-0.0.1-SNAPSHOT.jar
I have another entry point in this jar that uses readFileStream and that works
fine.
From: Robert Metzger [mailto:rmetz...@apache.org]
Sent: Sunday, November 13, 2016 12:53 AM
To: user@flink.apache.org
Subject: Re: Flin
Hi,
in addition to what you mentioned (having a very large allowed lateness)
you can also try another approach: adding a custom operator in front of the
window operation and splitting the stream by normal elements and very late
elements. Then, in the stream of very late elements you have some cus
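A Flink-free sketch of the splitting idea (the thresholds, names, and the use of plain collections instead of streams are illustrative): elements are routed into a "normal" branch or a "very late" branch depending on how far behind the current watermark they are.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LatenessSplit {
    record Event(long timestamp, String payload) {}

    // Split events into "normal" and "veryLate" relative to the current
    // watermark and an allowed-lateness bound.
    static Map<String, List<Event>> split(List<Event> events, long watermark, long allowedLateness) {
        Map<String, List<Event>> out = new HashMap<>();
        out.put("normal", new ArrayList<>());
        out.put("veryLate", new ArrayList<>());
        for (Event e : events) {
            boolean veryLate = e.timestamp() < watermark - allowedLateness;
            out.get(veryLate ? "veryLate" : "normal").add(e);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(new Event(100, "a"), new Event(10, "b"));
        // With watermark 100 and allowed lateness 50, "b" is very late.
        System.out.println(split(events, 100, 50));
    }
}
```

In the very-late branch one could then apply custom handling (e.g. direct updates to an external store) instead of the normal window operation.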
Hey Fabian,
thank you very much.
- yes, I would window by event time and fire/purge by processing time
- Cheaper in the end meant that having too much state in the Flink cluster
would be more expensive, as we store all data in Cassandra too. I think the
fault tolerance would be okay, as we wou
Hi,
this is a known bug: https://issues.apache.org/jira/browse/FLINK-3869.
I'm still hoping that we can get a workaround in for Flink 1.2. See my last
comment in the Jira Issue.
Cheers,
Aljoscha
On Mon, 14 Nov 2016 at 14:49 Stephan Epping
wrote:
> Hello,
>
> I wondered if there is a particular
Yup, we're currently working on getting ProcessWindowFunction into master.
Then we would work on getting additional information available, such as the
current watermark or a firing reason.
On Mon, 14 Nov 2016 at 11:07 Ufuk Celebi wrote:
> I don't think that this is possible right now.
>
> There
Hi,
event-time windows are being processed in the order of their end timestamp.
If several windows have the same end timestamp then no ordering across
those windows is guaranteed.
Cheers,
Aljoscha
On Mon, 14 Nov 2016 at 10:01 Ufuk Celebi wrote:
> I think there are no ordering guarantees for thi
Hi Stephan,
I'm skeptical about two things:
- using processing time will result in inaccurately bounded aggregates (or
do you want to group by event time in a processing time window?)
- writing to and reading from Cassandra might be expensive (not sure what
you mean by cheaper in the end) and it i
Hi,
I'm afraid the ContinuousEventTimeTrigger is a bit misleading and should
probably be removed. The problem is that a watermark T signals that we
won't see elements with a timestamp < T in the future. It does not signal
that we haven't already seen elements with a timestamp > T. So this cannot
be
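To make the distinction concrete, here is a tiny Flink-free sketch (names are invented for illustration) of what a watermark does and does not promise:

```java
public class WatermarkSemantics {
    // A watermark T promises: no element with timestamp < T will arrive later.
    // It does NOT promise that elements with timestamp > T have not already
    // been seen, which is what makes ContinuousEventTimeTrigger misleading.
    static boolean isLate(long watermark, long elementTimestamp) {
        return elementTimestamp < watermark;
    }

    public static void main(String[] args) {
        long watermark = 100;
        System.out.println(isLate(watermark, 90));  // true: behind the watermark
        System.out.println(isLate(watermark, 150)); // false: ahead of the watermark,
                                                    // yet it may already be buffered
    }
}
```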
Hello,
I wondered if there is a particular reason for the window function to
explicitly have the same input/output type?
public <R> SingleOutputStreamOperator<R> apply(R initialValue, FoldFunction<T, R> foldFunction, WindowFunction<R, R, K, W> function)
for example (the following does not work):
DataStream aggregate
On 14 November 2016 at 11:30:13, Lorenzo Affetti (lorenzo.affe...@polimi.it)
wrote:
> Thank you for the pointers and the clarification!
>
> One question, when you say: "There is currently no way to be explicitly
> notified about
> an incoming barrier”, isn’t `snapshotOperatorState` invoked w
Thank you for the pointers and the clarification!
One question, when you say: "There is currently no way to be explicitly
notified about an incoming barrier”, isn’t `snapshotOperatorState` invoked when
a barrier approaches the operator?
Lorenzo Affetti
---
MD in computer engineering
PhD St
There seems to be an Exception happening when Flink tries to serialize the
state of your operator (see the stack trace).
What are you trying to store via the ValueState? Maybe you can share a code
excerpt?
– Ufuk
On 14 November 2016 at 10:51:06, Satish Chandra Gupta (scgupt...@gmail.com)
wrot
Ah, sorry. I thought it was something related to Flink. ;)
On 14 November 2016 at 10:59:44, Gyula Fóra (gyula.f...@gmail.com) wrote:
> What I mean is the logs coming from org.apache.hadoop.ipc.Client if you
> look at my original email (at JM logs)
>
> Gyula
>
> Ufuk Celebi wrote (at: 2
The Python API is in alpha state currently, so we would have to check if it is
related specifically to that. Looping in Chesnay who worked on that.
The JVM GC error happens on the client side as that's where the optimizer runs.
How much memory does the client submitting the job have?
How do you
Sounds good to me if possible without breaking anything :-)
On 14 November 2016 at 10:57:26, Andrey Melentyev (andrey.melent...@gmail.com)
wrote:
> Ufuk,
>
> do you think it's still worth fixing the build errors and maybe adding a
> Travis configuration to build and test against 1.8? Or do you
I don't think that this is possible right now.
There is a proposal and a discussion about extending the window function metadata
here:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-2+Extending+Window+Function+Metadata
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FL
What I mean is the logs coming from org.apache.hadoop.ipc.Client if you
look at my original email (at JM logs)
Gyula
Ufuk Celebi wrote (at: 2016. nov. 14., Mon, 10:52):
> What was the log message shown on DEBUG level?
>
> Maybe it makes sense to promote it to INFO. ;)
>
> I guess there is
Ufuk,
do you think it's still worth fixing the build errors and maybe adding a
Travis configuration to build and test against 1.8? Or do you think that
would introduce too much additional overhead? I mean without deprecating
1.7 support of course.
Andrey
On Mon, Nov 14, 2016 at 10:51 AM, Ufuk C
What was the log message shown on DEBUG level?
Maybe it makes sense to promote it to INFO. ;)
I guess there is no easy way to verify the version, right Max or Robert?
On 14 November 2016 at 10:45:52, Gyula Fóra (gyula.f...@gmail.com) wrote:
> Hi,
>
> The main problem was that whatever was goin
As far as I know, there were no concrete discussions about this yet. I would
not expect it to happen this year, but maybe/probably in the course of next
year. We would have to have a release where we explicitly say that this will be
the last major release with 1.7 support, so users running old v
Hi,
I am using ValueState, backed by FsStateBackend on HDFS, as follows:
env.setStateBackend(new FsStateBackend(stateBackendPath))
env.enableCheckpointing(checkpointInterval)
It is a non-iterative job running on Flink/YARN. The job restarts at
checkpointInterval, I have tried interval varying fro
Hi,
The main problem was that whatever was going wrong was not apparent in the
Flink application master runner but was only shown in the YarnClient
debug log.
If you run with the default INFO log level, all you see is that the YARN
client is trying to fail over again and again, as if something was
You can specify a custom trigger that extends the default ProcessingTimeTrigger
(if you are working with processing time) or EventTimeTrigger (if you are
working with event time).
You do it like this:
stream.timeWindow(Time.of(1, SECONDS)).trigger(new MyTrigger())
Check out the Trigger impleme
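For illustration, the firing decision such a custom trigger makes could be sketched Flink-free like this (the real Trigger API has a different shape, with onElement/onEventTime/onProcessingTime returning a TriggerResult; all names below are invented):

```java
public class EarlyCountTrigger {
    // Simplified stand-in for Flink's TriggerResult.
    enum Action { CONTINUE, FIRE, FIRE_AND_PURGE }

    private final int maxCount;
    private int count = 0;

    EarlyCountTrigger(int maxCount) { this.maxCount = maxCount; }

    // FIRE early every maxCount elements; FIRE_AND_PURGE once the window ends.
    Action onElement(long elementTime, long windowEnd) {
        if (elementTime >= windowEnd) {
            return Action.FIRE_AND_PURGE;
        }
        if (++count >= maxCount) {
            count = 0;
            return Action.FIRE;
        }
        return Action.CONTINUE;
    }

    public static void main(String[] args) {
        EarlyCountTrigger t = new EarlyCountTrigger(2);
        System.out.println(t.onElement(1, 100));   // CONTINUE
        System.out.println(t.onElement(2, 100));   // FIRE (early result)
        System.out.println(t.onElement(150, 100)); // FIRE_AND_PURGE (window done)
    }
}
```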
Not sure, because there are going to be two WindowOperator classes on the
class path when deserialising this on the cluster.
On Mon, 14 Nov 2016 at 10:07 Ufuk Celebi wrote:
> On 11 November 2016 at 18:19:30, Aljoscha Krettek (aljos...@apache.org)
> wrote:
> > Hi,
> > I think the jar files in the l
What do the TaskManager logs say with regard to the allocation of managed memory?
Something like:
Limiting managed memory to ... of the currently free heap space ..., memory
will be allocated lazily.
What else did you configure in flink-conf?
Looping in Greg and Vasia who maintain Gelly and are most-famil
The Flink docs show how to set up Flink's internal file system operations to use
the S3FileSystem (the StackOverflow question actually shows that it is working,
see the answer there).
This configuration is independent of what you are doing in your user code.
If you want to use your own S3 based
On 11 November 2016 at 18:19:30, Aljoscha Krettek (aljos...@apache.org) wrote:
> Hi,
> I think the jar files in the lib folder are checked first so shipping the
> WindowOperator with the job should not work.
The WindowOperator is instantiated on the client side and shipped as user code
to the clu
Hey Lorenzo,
internally Flink is able to abort checkpoints, but this is not possible from
the user code. There is currently no way to be explicitly notified about an
incoming barrier.
You can check out this PR (https://github.com/apache/flink/pull/2629) and see
whether it addresses your questi
I think there are no ordering guarantees for this. @Aljoscha: is this correct?
On 11 November 2016 at 19:57:43, Saiph Kappa (saiph.ka...@gmail.com) wrote:
> Hi,
>
> I have a streaming application based on event time. When I issue a
> watermark that will close more than 1 window (and trigger thei
Looping in Kostas and Aljoscha, who should know what the expected behaviour
is here ;)
On 11 November 2016 at 16:17:23, Petr Novotnik (petr.novot...@firma.seznam.cz)
wrote:
> Hello,
>
> I'm struggling to understand the following behaviour of the
> `WindowOperator` and would appreciate some insig
Good to know that you solved this. :) Do you think there is something we can do
to help users notice this situation faster?
– Ufuk
On 13 November 2016 at 00:23:21, Gyula Fóra (gyula.f...@gmail.com) wrote:
> Hi,
>
> What happened is that I compiled Flink with the wrong hadoop version...
>
>
I think this is independent of streaming. If you want to compute the aggregate
over all keys and data you need to do this in a single task, e.g. use a
(flat)map with parallelism 1, do the aggregation there and then broadcast to
downstream operators. Does this make sense or am I overlooking somet
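A minimal sketch of that idea outside Flink (names are illustrative): all records funnel through one aggregation step, as they would through a (flat)map with parallelism 1, and the single result could then be broadcast downstream.

```java
import java.util.List;
import java.util.Map;

public class GlobalAggregate {
    // Single-task aggregation over all keys and data; in Flink this would run
    // with parallelism 1 so that one task sees everything.
    static long aggregateAll(Map<String, List<Long>> byKey) {
        long sum = 0;
        for (List<Long> values : byKey.values()) {
            for (long v : values) {
                sum += v;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, List<Long>> data = Map.of("a", List.of(1L, 2L), "b", List.of(3L));
        System.out.println(aggregateAll(data)); // 6
    }
}
```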
Hey Max!
Thanks for reporting this issue. Can you give more details about how you are
running your job? If you are doing checkpoints to HDFS, could you please report
how many checkpoints you find in your configured directory? Is everything
properly cleaned up there?
– Ufuk
On 12 November 2016