Hi,
How do I serialize Guava immutable collections in Flink?
I am getting the error
Caused by: java.io.NotSerializableException:
de.javakaffee.kryoserializers.guava.ImmutableMapSerializer
when I register ImmutableMap to be serialized by the
ImmutableMapSerializer. I am using the latest versi
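For reference, a rough sketch of one way this registration is often done (assuming the de.javakaffee kryo-serializers artifact is on the classpath). Registering the serializer class instead of an instance usually avoids the NotSerializableException, because an instance would have to be java.io.Serializable itself:

import com.google.common.collect.ImmutableMap;
import de.javakaffee.kryoserializers.guava.ImmutableMapSerializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class GuavaKryoRegistration {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Register the serializer class rather than an instance; an instance would need
        // to be Serializable, which ImmutableMapSerializer is not.
        env.getConfig().addDefaultKryoSerializer(ImmutableMap.class, ImmutableMapSerializer.class);

        // ... build and execute the job as usual ...
    }
}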
Hi Stephan,
- Did you submit any job to the cluster, or is the memory just growing even
on an idle TaskManager?
I have a streaming job running.
- If you are running a job, do you use the RocksDB state backend, or the
FileSystem state backend?
The FileSystem state backend; I use S3.
- Does it grow infinitely, o
Hi,
If watermarks are arriving from multiple sources, how long does the event time
trigger wait for the slower source to send its watermarks before triggering
only from the faster source? I have seen that if one of the sources is really
slow, then the elements of the faster source fire and when the
Hi David,
You are right, the events in the window are not sorted according to their
event time, hence the processing is not done in increasing order of
timestamp.
As you said, you will have to do the sorting yourself in your window
function to make sure that you are processing the events in order.
Hi all,
In Flink, after setting the time characteristic to event time and properly
assigning timestamps/watermarks, time-based windows will be created based upon
event time. If we need to process events within a window in event time order,
we can sort the windowed values and process as necessar
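For reference, a minimal sketch of sorting inside a window function, assuming the elements are (key, event-timestamp) pairs; the types here are placeholders for illustration, not code from this thread:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

// Elements are assumed to be (key, event timestamp) pairs for illustration.
public class SortedWindowFunction
        implements WindowFunction<Tuple2<String, Long>, Tuple2<String, Long>, String, TimeWindow> {

    @Override
    public void apply(String key, TimeWindow window,
                      Iterable<Tuple2<String, Long>> input,
                      Collector<Tuple2<String, Long>> out) {
        // Buffer the window contents and sort them by event timestamp before emitting.
        List<Tuple2<String, Long>> buffer = new ArrayList<>();
        for (Tuple2<String, Long> element : input) {
            buffer.add(element);
        }
        Collections.sort(buffer, new Comparator<Tuple2<String, Long>>() {
            @Override
            public int compare(Tuple2<String, Long> a, Tuple2<String, Long> b) {
                return Long.compare(a.f1, b.f1);
            }
        });
        for (Tuple2<String, Long> element : buffer) {
            out.collect(element);
        }
    }
}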
Thanks Milind & Till,
This is what I thought from my reading of the documentation, but it is nice to
have it confirmed by people more knowledgeable.
Supplementary to this question is whether Flink is the best choice for batch
processing at this point in time or whether I would be better to look
Hi,
I want to test my Flink streaming code, and thus I want to run Flink
streaming jobs with different parameters one by one.
So, when one job finishes after it hasn't received new data points for
some time, the next job with a different set of parameters should start.
For this, I am already
Hi!
I agree that shading is tedious. It seems to be a pretty fundamental Java
problem that exists in all those Java-based frameworks.
The only way I know how to solve this is having fewer dependencies in the
framework code.
Right now, the JVMs that execute the user code have for example a
depend
Hi!
In order to answer this, we need a bit more information. Here are some
followup questions:
- Did you submit any job to the cluster, or is the memory just growing
even on an idle TaskManager?
- If you are running a job, do you use the RocksDB state backend, or the
FileSystem state backend?
Hi Flavio,
As Aljoscha pointed out, the problem should be solved now.
The changes are already in master.
If there is any issue, let us know.
Kostas
> On Jul 20, 2016, at 6:29 PM, Aljoscha Krettek wrote:
>
> Hi,
> the configuration has to be passed using
> env.readFile(...).withParameters(ifConf
Good catch! That should do it if you have access to the local storage
of the JobManager.
On Wed, Jul 20, 2016 at 5:25 PM, Aljoscha Krettek wrote:
> Hi,
> in the JobManager log there should be a line like this:
> 2016-07-20 17:19:00,552 INFO
> org.apache.flink.runtime.webmonitor.WebRuntimeMonitor
Hi,
the configuration has to be passed using
env.readFile(...).withParameters(ifConf). The InputFormat will then be
properly configured at runtime.
However, Kostas just enhanced the FileInputFormats to allow setting the
parameters directly on the input format. In 1.1-SNAPSHOT and the upcoming
1.1
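For illustration, a small batch-API sketch of the withParameters() route; the input path is a placeholder and the recursive.file.enumeration flag is an assumed example setting, not something confirmed in this thread:

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.fs.Path;

public class RecursiveReadSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        String inputDir = "file:///tmp/input";  // placeholder path

        TextInputFormat inputFormat = new TextInputFormat(new Path(inputDir));

        Configuration ifConf = new Configuration();
        ifConf.setBoolean("recursive.file.enumeration", true);

        // withParameters() hands the Configuration to the InputFormat,
        // which picks it up when it is configured at runtime.
        DataSet<String> lines = env.readFile(inputFormat, inputDir).withParameters(ifConf);

        lines.print();
    }
}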
I've set up a cluster (standalone).
The TaskManager consumes memory beyond the Xmx property and it grows
continuously.
I saw this link(
http://mail-archives.apache.org/mod_mbox/flink-dev/201606.mbox/%3ccak2vtervsw4muboc4swix0mr6y9bijznjuypf6_f9f0g9-_...@mail.gmail.com%3E
).
So I set the taskmanager.memo
Oh, my Flink version is 1.0.3.
-- Forwarded message --
From: 김동일
Date: Thu, Jul 21, 2016 at 12:52 AM
Subject: taskmanager memory leak
To: user@flink.apache.org
I've set up a cluster (standalone).
The TaskManager consumes memory beyond the Xmx property and it grows
continuously.
I sa
Hi to all,
in my job I'm doing the following to recursively read the files inside a
dir:
TextInputFormat inputFormat = new TextInputFormat(new Path(inputDir));
org.apache.flink.configuration.Configuration ifConf =
new org.apache.flink.configuration.Configuration();
ifConf.setBool
Hi,
in the JobManager log there should be a line like this:
2016-07-20 17:19:00,552 INFO
org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Using directory
/some/dir for web frontend JAR file uploads
if you manually delete the offending jar file from that directory it could
solve your problem
Hi Stephan,
IMO the platform needs better job isolation. All this shading shouldn't
be required at all.
Michal
On 20.07.2016 16:18, Stephan Ewen wrote:
Hi Michael!
The only safe way in Java to isolate the user code from the platform
would be to completely run them in different JVMs.
Which is of course only available in 1.1-SNAPSHOT or the upcoming 1.1
release. :-)
On Tue, 19 Jul 2016 at 22:32 Till Rohrmann wrote:
> Hi Dominique,
>
> your problem sounds like a good use case for session windows [1, 2]. If
> you know that there is only a maximum gap between your request and re
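For illustration, a small sketch of the session-window suggestion against the 1.1-line API; the elements, gap, and aggregation are toy placeholders, not code from this thread:

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.AscendingTimestampExtractor;
import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class SessionWindowSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        // Toy elements: (correlation id, event-time timestamp in ms).
        DataStream<Tuple2<String, Long>> events = env
                .fromElements(
                        new Tuple2<>("req-1", 1000L),    // request
                        new Tuple2<>("req-1", 4000L),    // response, 3s later -> same session
                        new Tuple2<>("req-2", 60000L))   // unrelated session
                .assignTimestampsAndWatermarks(new AscendingTimestampExtractor<Tuple2<String, Long>>() {
                    @Override
                    public long extractAscendingTimestamp(Tuple2<String, Long> element) {
                        return element.f1;
                    }
                });

        events
                .keyBy(0)
                // A session closes once no element arrives for the configured gap,
                // so a request and its matching response land in the same window.
                .window(EventTimeSessionWindows.withGap(Time.seconds(10)))
                .max(1)
                .print();

        env.execute("Session window sketch");
    }
}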
I think we can simply add this behavior when we use the TypeComparator in
the keyBy() function. It can implement hashCode() as a deep hashCode on
array types.
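Until something like that is built in, a possible workaround (a sketch with made-up types, not code from this thread) is to derive the key from the array contents with a KeySelector instead of keying on the array itself:

import java.util.Arrays;

import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;

public class ArrayKeyWorkaround {

    // Derive a key with value semantics (here a String) from the array contents,
    // because the array's default hashCode() is identity-based.
    public static KeyedStream<Tuple2<int[], Long>, String> keyByArrayContents(
            DataStream<Tuple2<int[], Long>> input) {

        return input.keyBy(new KeySelector<Tuple2<int[], Long>, String>() {
            @Override
            public String getKey(Tuple2<int[], Long> value) {
                // Arrays.toString() depends on the contents, unlike identity hashCode().
                return Arrays.toString(value.f0);
            }
        });
    }
}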
On Mon, Jun 13, 2016 at 12:30 PM, Ufuk Celebi wrote:
> Would make sense to update the Javadocs for the next release.
>
> On Mon, Jun 1
Hi Max,
Yeah, I tried that and it's definitely better. Only a few points go missing
compared to a huge amount in the beginning. For now, it's good for me and my
work.
Thanks a lot for the workaround.
-Biplob
The connector doesn't cover this use case. Through the API you need to
use the IndicesAdminClient:
https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-admin-indices.html
Otherwise Elasticsearch creates an index with shards automatically. We
could add support for configuring
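For reference, a rough sketch of creating the index up front with the IndicesAdminClient, written against the Elasticsearch 2.x transport client (the API differs between versions; host, index name, and settings are placeholders):

import java.net.InetAddress;

import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public class CreateIndexBeforeJob {
    public static void main(String[] args) throws Exception {
        Client client = TransportClient.builder().build()
                .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("localhost"), 9300));

        // Create the index up front with the desired shard/replica layout;
        // otherwise Elasticsearch creates it on first write with default settings.
        client.admin().indices().prepareCreate("my-index")
                .setSettings(Settings.settingsBuilder()
                        .put("index.number_of_shards", 5)
                        .put("index.number_of_replicas", 1))
                .get();

        client.close();
    }
}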
Hello Maximilian,
Then, as you said, in the starting question of the thread:
Iterator<Centroid> iter = DataStreamUtils.collect(centroids);
Collection<Centroid> testCentroids = Lists.newArrayList(iter);
for (Centroid c : testCentroids) {
    System.out.println(c);
}
The above waits until the iterator doesn't have data anymore, and
Hi Michael!
The only safe way in Java to isolate the user code from the platform would
be to completely run them in different JVMs.
Other than that, Max's method to ensure the correct instantiation should
work in most cases.
We also continuously try to have fewer dependencies in Flink, to that t
Everything works as expected. The while loop blocks until the iterator
doesn't have data anymore (=the program has ended). All data will end up in
the ArrayList.
The latter exception comes from a duplicate call to execute(). Actually,
collect() internally calls execute() because the job has to run
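For reference, a minimal sketch of that pattern (assuming the flink-streaming-contrib dependency for DataStreamUtils):

import java.util.Iterator;

import org.apache.flink.contrib.streaming.DataStreamUtils;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CollectSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<Integer> source = env.fromElements(1, 2, 3, 4);

        // collect() starts the job itself, so there must be no separate env.execute()
        // afterwards; a second execute() is what raises the reported exception.
        Iterator<Integer> iter = DataStreamUtils.collect(source);
        while (iter.hasNext()) {
            System.out.println(iter.next());
        }
    }
}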
Sure, no problem.
The issue is a bit more involved. You're right, the user classes have
precedence over the Flink classpath, so your classes were probably
loaded fine. However, the user code also calls Flink code, which can
use a library version different from the one in the job jar. And boom, it
cras
Hi Gary,
That is a bug. The main method might actually be there but it fails to
load a class:
> Caused by: java.lang.ClassNotFoundException:
> org.shaded.apache.flink.streaming.api.functions.source.SourceFunction
It looks like internal Flink classes have been shaded but not included
in the job j
Hello Maximilian,
Thanks! I learned a new thing today :). But my problem still exists. Your
example has little data and it works fine. But in my datastream I have set
the timeWindow to Time.seconds(5). What I found out is, if I print as below,
as in your example:
Iterator iter = DataStreamUtils.collect(cen
Hi all,
I accidentally packaged a Flink Job for which the main method could not
be looked up. This breaks the Flink Dashboard's job submission page
(no jobs are displayed). I opened a ticket:
https://issues.apache.org/jira/browse/FLINK-4236
Is there a way to recover from this without restartin
Thanks for the prompt reply.
You are right. The conflict was between "com.fasterxml.jackson.core" libs.
I am just wondering: if the jobs were separated from the platform,
the job's libs should have precedence and no versioning problem should
have happened.
Regards,
Michal
On 20.07.2016
Max is right, Flink uses Calcite rules for optimization. The following
rules are applied so far:
https://github.com/apache/flink/blob/master/flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/plan/rules/FlinkRuleSets.scala
The filter condition you mentioned will not be optimize
Ah, now I see where the problem lies. You're reusing the Iterator
which you have already used in the for loop. You can only iterate over
the elements once! This is the nature of the Java Iterator and
DataStreamUtils.collect(..) returns an iterator.
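A tiny illustration of that point (plain Java, nothing Flink-specific): buffer the elements on the first pass and traverse the copy afterwards.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IteratorReuseSketch {
    public static void main(String[] args) {
        Iterator<Integer> iter = Arrays.asList(1, 2, 3).iterator();

        // The first traversal consumes the iterator ...
        List<Integer> buffered = new ArrayList<>();
        while (iter.hasNext()) {
            buffered.add(iter.next());
        }

        // ... so any further access has to go through the buffered copy,
        // not the exhausted iterator.
        for (Integer value : buffered) {
            System.out.println(value);
        }
    }
}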
On Wed, Jul 20, 2016 at 1:11 PM, subash basnet w
Hi Michal,
I couldn't find Joda in flink-dist. Possibly there is some other clash?
There are two potential issues here:
1) Flink shades some libraries (Guava) but not all. If you use a
version of a library in your Flink job which doesn't match the one in
flink-dist, you're bound for trouble.
2)
@Paris Thanks for the prompt feedback! I really have to check out your PR :)
@Biplob: If I understand correctly, a possible workaround in the
meantime seems to be to use `setBufferTimeout(0)` on your
StreamExecutionEnvironment. Could you try that?
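For illustration, the suggested workaround boils down to one call on the environment (toy job, just a sketch):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BufferTimeoutWorkaround {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // 0 ms buffer timeout: flush each record downstream immediately instead of
        // batching it, which is the workaround suggested above for the dropped records.
        env.setBufferTimeout(0);

        env.fromElements(1, 2, 3).print();
        env.execute("Buffer timeout workaround sketch");
    }
}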
On Wed, Jul 20, 2016 at 12:30 PM, Paris Carbone
Hi,
It's stating that you can't use a DataStream which was not part of the
iteration. It works with `newCentroids` because it is part of the
loop.
The only way to get the centroids DataStream in, is to union/join it
with the `newCentroids` stream.
Cheers,
Max
On Wed, Jul 20, 2016 at 11:33 AM, s
Hi Gallenvara,
As far as I know, the Table API is now translated into a Calcite plan
which is then optimized according to Calcite's optimization rules.
Cheers,
Max
On Wed, Jul 20, 2016 at 7:24 AM, gallenvara wrote:
>
> Hello, everyone. I'm new to Calcite and have some problems with it. Flink
>
Hello Maximilian,
Thanks for the update. Yup, it works in the example you gave. I checked
with a collection and it also works. But not in my datastream case after the
collection.
DataStream<Centroid> centroids = newCentroidDataStream.map(new
TupleCentroidConverter());
Iterator<Centroid> iter = DataStreamUtils.collect(cen
Hi all,
We had a class versioning problem within a Flink job.
The job uses Joda 2.6, but flink-dist 1.0.3 packages Joda 2.5.
The problem was solved by relocating job classes with the shade plug-in.
Does Flink separate jobs from each other to avoid class conflicts between them
and the platform?
Is job
This is possibly related to the way the queue between StreamIterationTail and
Head is currently implemented.
I think this part is a bit prone to record loss when things get wacky and
backpressure kicks in (but at least it avoids deadlocks, right?).
I don’t have the time availability to look int
Just tried the following and it worked:
public static void main(String[] args) throws IOException {
StreamExecutionEnvironment env =
StreamExecutionEnvironment.getExecutionEnvironment();
final DataStreamSource source = env.fromElements(1, 2, 3, 4);
source.print();
final Iterator iter
CC Gyula and Paris in case they might want to help out.
On Tue, Jul 19, 2016 at 11:43 AM, Biplob Biswas
wrote:
> Hi Ufuk,
>
> Thanks for the update, is there any known way to fix this issue? Any
> workaround that you know of, which I can try?
>
>
>
Hello all,
When I execute the below streaming code:
DataStream<Centroid> centroids = newCentroidDataStream.map(new
TupleCentroidConverter());
ConnectedIterativeStreams loop =
points.iterate().withFeedbackType(Centroid.class);
DataStream newCentroids = loop.flatMap(new
SelectNearestCenter(10)).map(new
At the moment there is also no batch source for Kafka. I'm also not so sure
how you would define a batch given a Kafka stream. Only reading till a
certain offset? Or maybe until one has read n messages?
I think it's best to write the batch data to HDFS or another batch data
store.
Cheers,
Till
O