Hi!
The most common cause for "NoSuchMethodError" in Java is a mixup between
code versions. You may have compiled against a different version of the
code than the one you are running.
The fix is usually simply to make sure that the version you program
against and the version you run are the same, and re
This looks to me like a bug where type registrations are not properly
forwarded to all Serializers.
Can you open a JIRA ticket for this?
On Thu, Oct 1, 2015 at 6:46 PM, Stefano Bortoli wrote:
> Hi guys,
>
> I hit a Kryo exception while running a process 'crossing' POJOs datasets.
> I am using t
@Lydia Did you create your POM files for your job with a 0.8.x quickstart?
Can you try to simply re-create your project's POM files with a new
quickstart?
I think that the POMs between 0.8-incubating-SNAPSHOT and 0.10-SNAPSHOT may
not be quite compatible any more...
On Fri, Oct 2, 2015 at 12:0
The delay you see happens when the TaskManager allocates the memory for its
memory manager. Allocating that much memory in a JVM can take a bit,
although 40 seconds seems like a lot to me...
How do you start the JVM? Are Xmx and Xms set to the same value? If not,
the JVM incrementally grows through multiple g
Is that a new observation that it takes so long, or has it always taken so
long?
On Fri, Oct 2, 2015 at 5:40 PM, Robert Schmidtke
wrote:
> I figured the JM would be waiting for the TMs. Each of my nodes has 64G of
> memory available.
>
> On Fri, Oct 2, 2015 at 5:38 PM, Maximilian Michels wrote:
just looked into an old
>> log (well, from last Friday) and it took about 1 minute for 31 TMs to
>> connect to 1 JM. They each had -Xms and -Xmx6079m though.
>>
>> On Fri, Oct 2, 2015 at 5:44 PM, Stephan Ewen wrote:
>>
>>> Is that a new observation that i
Matthias' solution should work in most cases.
In cases where you do not control the source (or the source can never be
finite, like the Kafka source), we often use a trick in the tests: throwing
a special type of exception (a SuccessException).
You can catch this exception on env.execute
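A minimal sketch of that trick (the class name, the test data, and the success condition below are made up for illustration; Flink's own tests use their own variants):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;

public class SuccessExceptionTest {

    /** Marker exception thrown by a test sink once the expected result is seen. */
    static class SuccessException extends RuntimeException {}

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(1L, 2L, 3L)
           .addSink(new SinkFunction<Long>() {
               @Override
               public void invoke(Long value) {
                   if (value == 3L) {              // the condition the test waits for
                       throw new SuccessException();
                   }
               }
           });

        try {
            env.execute();
        } catch (Exception e) {
            // the SuccessException arrives wrapped in the job failure's cause chain
            if (!containsSuccess(e)) {
                throw e;                           // a real failure
            }
            // otherwise: the expected result was seen, treat the run as passed
        }
    }

    private static boolean containsSuccess(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof SuccessException) {
                return true;
            }
        }
        return false;
    }
}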
I think this is yet another problem caused by Akka's overly strict message
routing.
An actor system bound to a certain URL can only receive messages that are
sent to that exact URL. All other messages are dropped.
This has many problems:
- Proxy routing (as described here, send to the proxy UR
I assume this concerns the streaming API?
Can you share your program and/or the custom input format code?
On Mon, Oct 5, 2015 at 12:33 PM, Pieter Hameete wrote:
> Hello Flinkers!
>
> I run into some strange behavior when reading from a folder of input files.
>
> When the number of input files i
> The Custom Input Format at
> https://github.com/PHameete/dawn-flink/blob/development/src/main/scala/wis/dawnflink/parsing/xml/XML2DawnInputFormat.java
>
> Cheers!
>
> 2015-10-05 12:38 GMT+02:00 Stephan Ewen :
>
>> I assume this concerns the streaming API?
>>
>>
file no records were parsed.
>
> Thanks alot for your help!
>
> - Pieter
>
> 2015-10-05 12:50 GMT+02:00 Stephan Ewen :
>
>> If you have more files than task slots, then some tasks will get multiple
>> files. That means that open() and close() are called multiple times o
Hi!
Are you on the SNAPSHOT master version?
You can pass the configuration to the constructor of the execution
environment, or create one via
ExecutionEnvironment.createLocalEnvironment(config) or via
createRemoteEnvironment(host, port, configuration, jarFiles);
The change of the signature was p
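For illustration, a minimal sketch of the two variants mentioned above (the configuration key, host, port, and jar name are just placeholders):

import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;

public class ConfiguredEnvironment {
    public static void main(String[] args) throws Exception {
        Configuration config = new Configuration();
        config.setInteger("taskmanager.numberOfTaskSlots", 4);   // example entry

        // local execution with a custom configuration
        ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment(config);

        // remote variant, as mentioned above:
        // ExecutionEnvironment env =
        //     ExecutionEnvironment.createRemoteEnvironment("jobmanager-host", 6123, config, "myJob.jar");

        // count() triggers execution, so no separate env.execute() is needed here
        System.out.println(env.fromElements(1, 2, 3).count());
    }
}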
ck support!
>>
>> Best,
>> Flavio
>>
>> On Tue, Oct 6, 2015 at 10:48 AM, Stephan Ewen wrote:
>>
>>> Hi!
>>>
>>> Are you on the SNAPSHOT master version?
>>>
>>> You can pass the configuration to the constructor of the
The split functionality is in the FileInputFormat and the functionality
that takes care of lines across splits is in the DelimitedInputFormat.
On Wed, Oct 7, 2015 at 3:24 PM, Fabian Hueske wrote:
> I'm sorry there is no such documentation.
> You need to look at the code :-(
>
> 2015-10-07 15:19
Yes, sinks in Flink are lazy and do not trigger execution automatically. We
made this choice to allow multiple concurrent sinks (splitting the streams
and writing to many outputs concurrently). That requires explicit execution
triggers (env.execute()).
The exceptions are, as mentioned, the "eager"
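As a small sketch of that pattern (output paths are placeholders): two sinks in one program, run by a single explicit trigger.

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class TwoSinks {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<String> data = env.fromElements("a", "b", "c");

        // two concurrent sinks on the same data set; nothing runs yet
        data.writeAsText("/tmp/out-copy-1");
        data.writeAsText("/tmp/out-copy-2");

        // the explicit trigger: both sinks are written in one job execution
        env.execute("two sinks, one program");
    }
}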
Can you give us a bit more background? What exactly is your program doing?
- Are you running a DataSet program, or a DataStream program?
- Is it one simple source that reads from S3, or are there multiple
sources?
- What operations do you apply on the CSV file?
- Are you using Flink's S3
ot Emr) as far as Flink's s3 connector
> doesn't work at all.
>
> Currently I have 3 taskmanagers each 5k MB, but I tried different
> configurations and all leads to the same exception
>
> *Sent from my ZenFone
> On Oct 8, 2015 12:05 PM, "Stephan Ewen" wrote:
>
at
> org.apache.flink.api.common.io.DelimitedInputFormat.nextRecord(DelimitedInputFormat.java:453)
> at
> org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:176)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
> at java.lang.Threa
> Konstantin Kudryavtsev
>
> On Thu, Oct 8, 2015 at 12:35 PM, Stephan Ewen wrote:
>
>> Ah, that makes sense!
>>
>> The problem is not in the core runtime, it is in the delimited input
>> format. It probably looks for the line split character and never finds it,
On Tue, Oct 6, 2015 at 11:31 AM, Flavio Pompermaier
> wrote:
>
>> That makes sense: what can be configured should be differentiated between
>> local and remote envs (obviously this is a minor issue/improvement)
>>
>> Thanks again,
>> Flavio
>>
>>
Hi!
As Gyula mentioned an upcoming Pull Request will make the state backend
pluggable. We would like to add the following state holders into Flink:
(1) Small state in memory (local execution / debugging): State maintained
in a heap hash map, checkpoints to JobManager. This is in there now.
(2)
@Konstantin (2) : Can you try the workaround described by Robert, with the
"s3n" file system scheme?
We are removing the custom S3 connector now, simply reusing Hadoop's S3
connector for all cases.
@Kostia:
You are right, there should be no broken stuff that is not clearly marked
as "beta". For t
@Jakob: If you use Flink standalone (not through YARN), one thing to be
aware of is that the relevant change is in the bash scripts that start the
cluster, not the code. If you upgraded Flink by copying a newer JAR file,
you missed the update to the bash scripts, and with it the fix for that issue.
Hi Andrew!
TL;DR: There is no out-of-the-box (de)serializer for Flink with Kafka, but
it should not be very hard to add one.
Here is a gist that basically does it. Let me know if that works for you,
I'll add it to the Flink source then:
https://gist.github.com/StephanEwen/d515e10dd1c609f70bed
Greeti
Hi!
Two comments:
(1) The iterate() statement is probably wrong, as noticed by Anwar.
(2) Which version of Flink are you using? In 0.9.x, the Union operator is
not lock-safe, in 0.10, it should work well. The 0.10 release is coming up
shortly, you can try the 0.10-SNAPSHOT version already.
Gree
@Andrew Flink should work with Scala classes that follow the POJO style
(public fields), so you should be able to use the Java Avro Library just
like that.
If that does not work in your case, please file a bug report!
On Wed, Oct 21, 2015 at 9:41 AM, Till Rohrmann wrote:
> What was your proble
Hey!
These files are the spilled data from a sort, a hash table, or a cache,
when memory runs short.
If you have some very big files and some 0 sized, I would guess you are
running a Hash Join, and have heavy skew in the distribution of the keys.
Greetings,
Stephan
On Wed, Oct 21, 2015 at 12:5
I think the most crucial question is still whether you are running 0.9.1 or
0.10-SNAPSHOT, because the 0.9.1 union has known issues...
If you are running 0.9.1 there is not much you can do except upgrade the
version ;-)
On Wed, Oct 21, 2015 at 5:19 PM, Aljoscha Krettek
wrote:
> Hi,
> first of al
This is actually not a bug, or a POJO or Avro problem. It is simply a
limitation in the functionality, as the exception message says: "Specifying
fields by name is only supported on Case Classes (for now)."
Try this with a regular reduce function that selects the max and it should
work fine...
Gr
Hi!
The bottom of this page also has an illustration of how tasks map to task slots.
https://ci.apache.org/projects/flink/flink-docs-release-0.9/setup/config.html
There are two optimizations involved:
(1) Chaining:
Here, sources, mappers, and filters are chained together. This is pretty
classic; most systems
Hi!
You are checking for equality / inequality with "!=" - can you check with
"equals()" ?
The key objects will most certainly be different in each record (as they
are deserialized individually), but they should be equal.
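As a generic illustration of the point (not the code from this thread):

public class KeyEquality {
    public static void main(String[] args) {
        // two keys deserialized from two records are two distinct objects
        Long key1 = new Long(42);
        Long key2 = new Long(42);

        System.out.println(key1 == key2);        // false: different object identity
        System.out.println(key1.equals(key2));   // true: equal value
    }
}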
Stephan
On Thu, Oct 22, 2015 at 12:20 PM, LINZ, Arnaud
wrote:
> Hello,
Aljoscha Krettek
> wrote:
>
> Hi,
> but he’s comparing it to a primitive long, so shouldn’t the Long key be
> unboxed and the comparison still be valid?
>
> My question is whether you enabled object-reuse-mode on the
> ExecutionEnvironment?
>
> Cheers,
> Aljoscha
>
Hi Johann!
You can try and use the Table API, it has logical tuples that you program
with, rather than tuple classes.
Have a look here:
https://ci.apache.org/projects/flink/flink-docs-master/libs/table.html
Stephan
On Thu, Oct 29, 2015 at 6:53 AM, Fabian Hueske wrote:
> Hi Johann,
>
> I see
Actually, sortPartition(col1).sortPartition(col2) results in a single sort
that sorts primarily by col1 and secondarily by col2, so it is the same as
in SQL when you state "ORDER BY col1, col2".
The SortPartitionOperator created with the first "sortPartition(col1)" call
appends further
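A short sketch of that on the DataSet API (the example data and output path are made up):

import org.apache.flink.api.common.operators.Order;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class SortPartitionExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Tuple2<Integer, String>> data = env.fromElements(
                new Tuple2<>(2, "b"), new Tuple2<>(1, "c"), new Tuple2<>(1, "a"));

        // one combined sort per partition: primarily by field 0, secondarily by field 1
        data.sortPartition(0, Order.ASCENDING)
            .sortPartition(1, Order.ASCENDING)
            .writeAsText("/tmp/sorted-output");

        env.execute("sortPartition example");
    }
}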
Hi!
You are probably missing some jars in your classpath. Usually Maven/SBT resolve
that automatically, so I assume you are manually constructing the classpath
here?
For this particular error, you are probably missing "flink-streaming-core"
(0.9.1) / "flink-streaming-java" (0.10) in your classpath.
I woul
Hi!
What kind of setup are you using, YARN or standalone?
In both modes, you should be able to pass your flags via the config entry
"env.java.opts" in the flink-conf.yaml file. See here
https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#other
We have never passed library pa
Ah, okay, I confused the issue.
The environment variables would need to be defined or exported in the
environment that spawns TaskManager processes. I think there is nothing for
that in Flink yet, but it should not be hard to add that.
Can you open an issue for that in JIRA?
Thanks,
Stephan
On
*From:* ewenstep...@gmail.com [mailto:ewenstep...@gmail.com] *On Behalf
> Of *Stephan Ewen
> *Sent:* Monday, November 02, 2015 4:35 PM
> *To:* user@flink.apache.org
> *Subject:* [IE] Re: passing environment variables to flink program
>
>
>
> Ah, okay, I confused the issue.
>
>
You can also try and make the decision on the client. Imagine a program
like this:

long count = env.readFile(...).filter(...).count();

if (count > 5) {
    env.readFile(...).map().join(...).reduce(...);
} else {
    env.readFile(...).filter().coGroup(...).map(...);
}
On Mon, Nov 2, 2015 at 1:35 AM
Hey!
There is also a collect() sink in the "flink-streaming-contrib" project,
see here:
https://github.com/apache/flink/blob/master/flink-contrib/flink-streaming-contrib/src/main/java/org/apache/flink/contrib/streaming/DataStreamUtils.java
It should work well locally for testing. In that case you
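Roughly, usage looks like this (a sketch; the utility lives in the flink-streaming-contrib module, so that dependency is needed):

import java.util.Iterator;

import org.apache.flink.contrib.streaming.DataStreamUtils;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CollectExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Long> stream = env.generateSequence(1, 10);

        // pulls the stream's elements back to the client as an iterator
        // (the utility triggers execution itself, so no explicit env.execute() here)
        Iterator<Long> results = DataStreamUtils.collect(stream);
        while (results.hasNext()) {
            System.out.println(results.next());
        }
    }
}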
Hi Niels!
Usually, you simply build the binaries by invoking "mvn -DskipTests clean
package" in the root flink directory. The resulting program should be in
the "build-target" directory.
If the program gets stuck, let us know where it gets stuck and what the last
message on the command line is.
Please be awar
> -- Sachin Goel
> Computer Science, IIT Delhi
> m. +91-9871457685
>
> On Sun, Nov 8, 2015 at 2:05 AM, Niels Basjes wrote:
>
>> How long should this take if you have HDD and about 8GB of RAM?
>> Is that 10 minutes? 20?
>>
>> Niels
>>
>> On Sat, Nov 7,
Hi!
Sorry, I do not really understand your question. Can you paste the skeleton
of your program code here?
Thanks,
Stephan
On Fri, Nov 6, 2015 at 6:55 PM, Hajira Jabeen
wrote:
> Hi all,
>
> I am writing an evolutionary computing application with Flink.
> Each object is a particle with multidim
Hi Hector!
I know of users that use camel to produce a stream into Apache Kafka and
then use Flink to consume and process the Kafka stream. That pattern works
well.
Greetings,
Stephan
On Sun, Nov 8, 2015 at 1:33 PM, rmetzger0 wrote:
> Hi Hector,
>
> I'm sorry that nobody replied to your messag
Hi!
If you want to work on subsets of streams, the answer is usually to use
windows, "stream.keyBy(...).timeWindow(Time.of(1, MINUTE))".
The transformations that you want to make, do they fit into a window
function?
There are thoughts to introduce something like global time windows across
the en
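For example, a per-key one-minute window (a sketch; exact class locations differ slightly between 0.10 and later versions):

import java.util.concurrent.TimeUnit;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;

public class WindowedSubsets {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Tuple2<String, Long>> events = env.fromElements(
                new Tuple2<>("a", 1L), new Tuple2<>("b", 2L), new Tuple2<>("a", 3L));

        // each transformation then works on a per-key, per-minute subset of the stream
        events.keyBy(0)
              .timeWindow(Time.of(1, TimeUnit.MINUTES))
              .sum(1)
              .print();

        env.execute("windowed subsets");
    }
}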
The distributed "start-cluster.sh" script only works if the code is
accessible under the same path on all machines, which must be the same path
as on the machine where you invoke the script.
Otherwise the paths for remote shell commands will be wrong, and the
classpaths will be wrong as a result.
> Kerberos secured cluster.
> Cleaning up the patch so I can submit it in a few days.
>
> On Sat, Nov 7, 2015 at 10:01 PM, Stephan Ewen wrote:
>
>> The single shading step on my machine (SSD, 10 GB RAM) takes about 45
>> seconds. HDD may be significantly longer, but should rea
Hi!
Rather than taking an 0.10-SNAPSHOT, you could also take a 0.10 release
candidate.
The latest is for example in
https://repository.apache.org/content/repositories/orgapacheflink-1053/
Greetings,
Stephan
On Mon, Nov 9, 2015 at 5:45 PM, Maximilian Michels wrote:
> Hi Brian,
>
> We are curr
Hi Cory!
There is no flag to define the BlobServer port right now, but we should
definitely add this: https://issues.apache.org/jira/browse/FLINK-2996
If your setup is such that the firewall problem is only between client and
master node (and the workers can reach the master on all ports), then y
Hi!
Flink requires at least Java 1.7, so one of the reasons could also be that
the classes are rejected for an incompatible version (class format 1.7, JVM
does not understand it since it is only version 1.6).
That could explain things...
Greetings,
Stephan
On Wed, Nov 11, 2015 at 9:01 AM, Came
I would encourage you to use the 0.10 version of Flink. Streaming has seen
some major improvements there.
The release is voted on now, you can refer to these repositories for the
release candidate code:
http://people.apache.org/~mxm/flink-0.10.0-rc8/
https://repository.apache.org/content/reposit
Hi!
Can you explain a little more what you want to achieve? Maybe then we can
give a few more comments...
I briefly read through some of the articles you linked, but did not quite
understand their train of thought.
For example, letting Tomcat write to Cassandra directly, and to Kafka,
might just
Hi!
In general, if you can keep state in Flink, you get better
throughput/latency/consistency and have one less system to worry about
(external k/v store). Keeping state outside means that the Flink processes
can be slimmer and need fewer resources, and as such recover a bit faster.
There are use cases for
Hi!
You can use both the DataSet API or the DataStream API for that. In case of
failures, they would behave slightly differently.
DataSet:
Fault tolerance for the DataSet API works by restarting the job and redoing
all of the work. In some sense, that is similar to what happens in
MapReduce, onl
I wrong.
>
> Moreover it should be possible to link the streams by next request with
> other filtering criteria. That is create new data transformation chain
> after running of env.execute("WordCount Example"). Is it possible now? If
> not, is it possible with minimal chan
iced that OperatorState
> is not implemented for ConnectedStream, which is quite the opposite of what
> you said below.
>
> Or maybe I misunderstood your sentence here ?
>
> Thanks,
> Anwar.
>
>
> On Wed, Nov 11, 2015 at 10:49 AM, Stephan Ewen wrote:
>
>> Hi!
Hi Nick!
The errors outside your UDF (such as network problems) will be handled by
Flink and cause the job to go into recovery. They should be transparently
handled.
Just make sure you activate checkpointing for your job!
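Activating it is a one-liner (the interval below is just an example value):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointedJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // draw a checkpoint every 5 seconds; after a failure the job restarts
        // from the latest completed checkpoint instead of losing its state
        env.enableCheckpointing(5000);

        env.generateSequence(1, 1000).print();
        env.execute("checkpointed job");
    }
}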
Stephan
On Mon, Nov 16, 2015 at 6:18 PM, Nick Dimiduk wrote:
> I have
oscha
> > On 16 Nov 2015, at 19:22, Stephan Ewen wrote:
> >
> > Hi Nick!
> >
> > The errors outside your UDF (such as network problems) will be handled
> by Flink and cause the job to go into recovery. They should be
> transparently handled.
> >
> > Just mak
It is actually very important that the CoGroup in delta iterations works
like that.
If the CoGroup touched every element in the solution set, the "decreasing
work" effect would not happen.
The delta iterations are designed for cases where specific updates to the
solution are made, driven by the w
Hi!
Can you give us a bit more context? For example, share the structure of the
program (which streams get windowed and connected in what way)?
I would guess that the following is the problem:
When you connect one stream to another, then partition n of the first
stream connects with partition n of
out.collect(new
> Features(tuple.getField(0), tuple.getField(2), tuple.getField(1), start_ts,
> end_ts, size, dwell_time, Boolean.FALSE));
>
>
>
> On Tuesday, November 17, 2015 12:59 PM, Stephan Ewen
> wrote:
>
>
> Hi!
>
> Can you give us a bit more context? For
>
>
>
> On Tuesday, November 17, 2015 1:49 PM, Vladimir Stoyak
> wrote:
>
>
> Perfect! It does explain my problem.
>
> Thanks a lot
>
>
>
> On Tuesday, November 17, 2015 1:43 PM, Stephan Ewen
> wrote:
>
>
> Is the CoFlatMapFunction intended to be exe
Hi Arnaud!
Java direct-memory is tricky to debug. You can turn on the memory logging
or check the TaskManager tab in the web dashboard - both report on direct
memory consumption.
One thing you can look for is forgetting to close streams. That means the
streams consume native resources until the J
I think this pattern may be common, so some tools that share such a table
across multiple tasks may make sense.
Would be nice to add a handler to which you give an "initializer" that reads
the data and builds the shared lookup map. The first to acquire the handler
actually initializes the data set (re
dicates to run
>> over every stream event. Is the java client API rich enough to express such
>> a flow, or should I examine something lower than DataStream?
>>
>> Thanks a lot, and sorry for all the newb questions.
>> -n
>>
>>
>> On Thursday, November 5,
en to have further input on how this or a similar
> approach (e.g. using a timestamp) could be automated, perhaps by
> customizing the output format as well?
>
> Cheers,
> Max
>
> Am 11.11.2015 um 11:35 schrieb Stephan Ewen :
>
> Hi!
>
> You can use both the Da
That makes sense...
On Mon, Oct 26, 2015 at 12:31 PM, Márton Balassi
wrote:
> Hey Max,
>
> The solution I am proposing is not flushing on every record, but it makes
> sure to forward the flushing from the sinkfunction to the outputformat
> whenever it is triggered. Practically this means that th
Thank you indeed for presenting there.
It looks like a very large audience!
Greetings,
Stephan
On Mon, Oct 26, 2015 at 11:24 AM, Maximilian Michels wrote:
> Hi Liang,
>
> We greatly appreciate you introduced Flink to the Chinese users at CNCC!
> We would love to hear how people like Flink.
>
Hi Guido!
If you use Scala, I would use an Option to represent nullable fields. That
is a very explicit solution that marks which fields can be null, and also
forces the program to handle this carefully.
We are looking to add support for Java 8's Optional type as well for
exactly that purpose.
G
s doesn't increase.
> What is going on underneath? Is it normal?
>
> Thanks in advance,
> Flavio
>
>
>
> On Wed, Oct 7, 2015 at 3:27 PM, Stephan Ewen wrote:
>
>> The split functionality is in the FileInputFormat and the functionality
>> that takes care
before the next test begins?
>
> What is "sink in DOP 1"?
>
> Thanks,
> Nick
>
>
> On Wednesday, November 18, 2015, Stephan Ewen wrote:
>
>> There is no global order in parallel streams, it is something that
>> applications need to work with. W
, Flavio Pompermaier
wrote:
> So why it takes so much to start the job?because in any case the job
> manager has to read all the lines of the input files before generating the
> splits?
> On 18 Nov 2015 17:52, "Stephan Ewen" wrote:
>
>> Late answer, sorry:
>
The new KafkaConsumer for Kafka 0.9 should be able to handle this, as the
Kafka client code itself has support for this then.
For 0.8.x, we would need to implement support for recovery inside the
consumer ourselves, which is why we decided to initially let the Job
Recovery take care of that.
If th
the local fs (ext4)
> On 18 Nov 2015 19:17, "Stephan Ewen" wrote:
>
>> The JobManager does not read all files, but it has to query the HDFS for
>> all file metadata (size, blocks, block locations), which can take a bit.
>> There is a separate call to the HDFS Name
The KafkaAvroDecoder is not serializable, and Flink uses serialization to
distribute the code to the TaskManagers in the cluster.
I think you need to "lazily" initialize the decoder, in the first
invocation of "deserialize()". That should do it.
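A sketch of that pattern (assuming Confluent's KafkaAvroDecoder with a VerifiableProperties constructor; the DeserializationSchema methods follow the 0.10-era interface and may differ slightly in other versions):

import java.util.Properties;

import io.confluent.kafka.serializers.KafkaAvroDecoder;
import kafka.utils.VerifiableProperties;

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.TypeExtractor;
import org.apache.flink.streaming.util.serialization.DeserializationSchema;

public class LazyAvroDeserializationSchema implements DeserializationSchema<Object> {

    private final Properties decoderProps;        // serializable, shipped to the TaskManagers

    // the decoder itself is not serializable, so keep it transient and build it lazily
    private transient KafkaAvroDecoder decoder;

    public LazyAvroDeserializationSchema(Properties decoderProps) {
        this.decoderProps = decoderProps;
    }

    @Override
    public Object deserialize(byte[] message) {
        if (decoder == null) {
            // first call happens on the TaskManager, after this object was deserialized there
            decoder = new KafkaAvroDecoder(new VerifiableProperties(decoderProps));
        }
        return decoder.fromBytes(message);
    }

    @Override
    public boolean isEndOfStream(Object nextElement) {
        return false;
    }

    @Override
    public TypeInformation<Object> getProducedType() {
        return TypeExtractor.getForClass(Object.class);
    }
}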
Stephan
On Thu, Nov 19, 2015 at 12:10 PM, Madhuka
Hi Ron!
You are right, there is a copy/paste error in the docs, it should be a
FoldFunction that is passed to fold(), not a ReduceFunction.
In Flink-0.10, the FoldFunction is only available on
- KeyedStream (
https://ci.apache.org/projects/flink/flink-docs-release-0.10/api/java/org/apache/flin
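As a quick sketch of the 0.10-style usage (types and values made up; fold was deprecated in later releases):

import org.apache.flink.api.common.functions.FoldFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FoldExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(new Tuple2<>("a", 1L), new Tuple2<>("a", 2L), new Tuple2<>("b", 3L))
           .keyBy(0)
           // a FoldFunction (not a ReduceFunction): folds each element into a running String
           .fold("", new FoldFunction<Tuple2<String, Long>, String>() {
               @Override
               public String fold(String accumulator, Tuple2<String, Long> value) {
                   return accumulator + value.f1 + ";";
               }
           })
           .print();

        env.execute("fold example");
    }
}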
bit. Looks like another one of those
> API changes that I'll be struggling with for a little bit.
>
> On Thu, Nov 19, 2015 at 10:40 AM, Stephan Ewen wrote:
>
>> Hi Ron!
>>
>> You are right, there is a copy/paste error in the docs, it should be a
>>
Hi Stefania!
I think there is no hook for that right now. If I understand you correctly
(assuming you run on YARN or similar), you want to give the sources a set of
hostnames and have the scheduler prefer those nodes when placing the
sources.
Within a dataflow program (job), Flink will attempt to co-locat
Hi Arnaud!
In 0.10, we renamed the dependency to "flink-streaming-java" (and
"flink-streaming-scala"), to be more in line with the structure of the
dependencies on the batch side.
Just replace "flink-streaming-core" with "flink-streaming-java"...
Greetings,
Stephan
On Mon, Nov 23, 2015 at 9:07
One addition: You can set the system to use "ingestion time", which gives
you event time with auto-generated timestamps and watermarks, based on the
time that the events are seen in the sources.
That way you have the same simplicity as processing time, and you get the
window alignment that Aljosch
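Setting it is a single call on the environment (sketch, 0.10-era API):

import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class IngestionTimeSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // event-time windows, with timestamps and watermarks generated
        // automatically at the sources from their ingestion wall-clock time
        env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime);

        // ... define and execute the streaming program as usual
        env.generateSequence(1, 10).print();
        env.execute("ingestion time example");
    }
}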
For streaming, I am a bit torn on whether reading a file should have so
many such prominent functions. Most streaming programs work on message
queues, or on monitored directories.
Not saying no, but not sure DataSet/DataStream parity is the main goal -
they are for different use cases after all.
Hi Javier!
You can solve this both using windows, or using manual state.
What is better depends a bit on when you want to have the result (the sum).
Do you want a result emitted after each update (or do some other operation
with that value) or do you want only the final sum after a certain time?
text().getKeyValueState("myCounter",
> Long.class, 0L);
> }
>
> }
>
>
> We are using a Tuple4 because we want to calculate the sum and the average
> (So our Tuple is ID, SUM, Count, AVG). Do we need to add another step to
> get a single value out of i
Hi!
Yes, looks like quite a graph problem. The best way to get started with
that is to have a look at Gelly:
https://ci.apache.org/projects/flink/flink-docs-release-0.10/libs/gelly_guide.html
Beware: The problem you describe (all possible paths between all pairs of
points) results in an exponenti
Hi Niels!
Currently, state is released by setting the value for the key to null. If
you are tracking web sessions, you can try and send an "end of session"
element that sets the value to null.
To be on the safe side, you probably want state that is automatically
purged after a while. I would look
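A sketch of what that can look like with the 0.10 partitioned key/value state (the field names and the "end of session" marker are made up; newer versions use ValueState and state descriptors instead):

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.OperatorState;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// use after keyBy(...), so the state is scoped to the session/user key
public class SessionTracker extends RichFlatMapFunction<Tuple2<String, String>, String> {

    private OperatorState<Long> visitCount;

    @Override
    public void open(Configuration parameters) {
        visitCount = getRuntimeContext().getKeyValueState("visitCount", Long.class, 0L);
    }

    @Override
    public void flatMap(Tuple2<String, String> event, Collector<String> out) throws Exception {
        if ("end-of-session".equals(event.f1)) {
            // release the state for this key by setting the value back to null
            visitCount.update(null);
        } else {
            visitCount.update(visitCount.value() + 1);
            out.collect(event.f0 + ": " + visitCount.value());
        }
    }
}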
>
> Niels
>
>
>
>
>
>
> On Fri, Nov 27, 2015 at 11:51 AM, Stephan Ewen wrote:
>
>> Hi Niels!
>>
>> Currently, state is released by setting the value for the key to null. If
>> you are tracking web sessions, you can try and send a "end of se
The reason why the binary distribution does not contain all connectors is
that this would add all libraries used by the connectors into the binary
distribution jar.
These libraries partly conflict with each other, and often conflict with
the libraries used by the user's programs. Not including the
Hi!
The reason why trigger state is purged right now with the window is to make
sure that no memory is occupied any more after the purge.
Otherwise, memory consumption would just grow indefinitely, holding state
of old triggers.
Greetings,
Stephan
On Fri, Nov 27, 2015 at 4:05 PM, Fabian Hueske
We wanted to combine the accumulators and aggregators for a while, but have
not gotten to it so far (there is a pending PR which needs some more work).
You can currently work around this by using the accumulators together with
the aggregators.
- Aggregators: Within an iteration across s
Hi Anton!
That you can do!
You can look at the interfaces "Checkpointed" and "CheckpointNotifier".
There you will get a call at every checkpoint (and can look at what records
are before that checkpoint). You also get a call once the checkpoint is
complete, which corresponds to the point when ever
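A sketch of a function implementing both interfaces (0.10-era API; later versions use CheckpointListener instead of CheckpointNotifier):

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.checkpoint.CheckpointNotifier;
import org.apache.flink.streaming.api.checkpoint.Checkpointed;

public class CountingMapper implements MapFunction<String, String>,
        Checkpointed<Long>, CheckpointNotifier {

    private long recordsSeen;

    @Override
    public String map(String value) {
        recordsSeen++;
        return value;
    }

    @Override
    public Long snapshotState(long checkpointId, long checkpointTimestamp) {
        // called when the checkpoint barrier reaches this operator:
        // everything counted so far is "before" this checkpoint
        return recordsSeen;
    }

    @Override
    public void restoreState(Long state) {
        recordsSeen = state;
    }

    @Override
    public void notifyCheckpointComplete(long checkpointId) {
        // called once the checkpoint is complete everywhere
        System.out.println("checkpoint " + checkpointId + " done, records so far: " + recordsSeen);
    }
}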
kpoint and catch it at the end of the DAG that would solve my problem
> (well, I also need to somehow distinguish my checkpoint from Flink's
> auto-generated ones).
>
> Sorry for being too chatty, this is the topic where I need expert opinion,
> can't find out the answer by
duced?
>
> On Mon, Nov 30, 2015 at 2:13 PM, Stephan Ewen wrote:
>
>> Hi!
>>
>> If you implement the "Checkpointed" interface, you get the function calls
>> to "snapshotState()" at the point when the checkpoint barrier arrives at an
>> operat
as recorded).
>
> Now in my case all events are in Kafka (for say 2 weeks).
> When something goes wrong I want to be able to 'reprocess' everything from
> the start of the queue.
> Here the matter of 'event time' becomes a big question for me; In those
> 'replay&
realtime version I currently break down the Window to get the single events
> and after that I have to recreate the same Window again.
>
> I'm looking forward to the implementation direction you are referring to.
> I hope you have a better way of doing this.
>
> Niels Basjes
>
Just for clarification: The real-time results should also contain the
visitId, correct?
On Tue, Dec 1, 2015 at 12:06 PM, Stephan Ewen wrote:
> Hi Niels!
>
> If you want to use the built-in windowing, you probably need two window:
> - One for ID assignment (that immediately pi
Hope you can work with this!
Greetings,
Stephan
On Tue, Dec 1, 2015 at 1:00 PM, Stephan Ewen wrote:
> Just for clarification: The real-time results should also contain the
> visitId, correct?
>
> On Tue, Dec 1, 2015 at 12:06 PM, Stephan Ewen wrote:
>
>> Hi Niels!
>>
perator.snapshotOperatorState(SessionizingOperator.java:162)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:440)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask$1.onEvent(StreamTask.java:574)
> ... 8 more
>
>
> Nie
> Niels
>
>
>
> On Tue, Dec 1, 2015 at 4:41 PM, Niels Basjes wrote:
>
>> Thanks!
>> I'm going to study this code closely!
>>
>> Niels
>>
>> On Tue, Dec 1, 2015 at 2:50 PM, Stephan Ewen wrote:
>>
There is an overview of which guarantees the different sources can give you:
https://ci.apache.org/projects/flink/flink-docs-master/apis/fault_tolerance.html#fault-tolerance-guarantees-of-data-sources-and-sinks
On Wed, Dec 2, 2015 at 9:19 AM, Till Rohrmann
wrote:
> Just a small addition. Your sources have
Mihail!
The Flink windows are currently in-memory only. There are plans to relax
that, but for the time being, having enough memory in the cluster is
important.
@Gyula: I think window state is currently also limited when using the
SqlStateBackend, by the size of a row in the database (because win