Guava immutable collection kryo serialization

2016-07-20 Thread Shaosu Liu
Hi, how do I serialize Guava immutable collections in Flink? I am getting the error Caused by: java.io.NotSerializableException: de.javakaffee.kryoserializers.guava.ImmutableMapSerializer when I register ImmutableMap to be serialized by the ImmutableMapSerializer. I am using the latest versi
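A registration sketch, not from the thread itself (class names taken from the de.javakaffee serializers and Flink's ExecutionConfig API): registering the serializer *class* rather than an instance avoids shipping a non-Serializable serializer object to the TaskManagers, which is what the NotSerializableException usually points at.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import com.google.common.collect.ImmutableMap;

import de.javakaffee.kryoserializers.guava.ImmutableMapSerializer;

public class GuavaKryoRegistration {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Register by class, not by instance: a serializer *instance* is stored
        // in the ExecutionConfig and must itself be java.io.Serializable to be
        // shipped to the TaskManagers, which ImmutableMapSerializer is not.
        env.getConfig().addDefaultKryoSerializer(
                ImmutableMap.class, ImmutableMapSerializer.class);
    }
}
```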

Re: taskmanager memory leak

2016-07-20 Thread 김동일
Hi Stephan. - Did you submit any job to the cluster, or is the memory just growing even on an idle TaskManager? I have some streaming jobs. - If you are running a job, do you use the RocksDB state backend, or the FileSystem state backend? The FileSystem state backend; I use S3. - Does it grow infinitely, o

Re: Processing windows in event time order

2016-07-20 Thread Sameer Wadkar
Hi, if watermarks are arriving from multiple sources, how long does the Event Time Trigger wait for the slower source to send its watermarks before triggering only from the faster source? I have seen that if one of the sources is really slow then the elements of the faster source fire and when the

Re: Processing windows in event time order

2016-07-20 Thread Vishnu Viswanath
Hi David, You are right, the events in the window are not sorted according to the EventTime hence the processing is not done in an increasing order of timestamp. As you said, you will have to do the sorting yourself in your window function to make sure that you are processing the events in order.

Processing windows in event time order

2016-07-20 Thread David Desberg
Hi all, In Flink, after setting the time characteristic to event time and properly assigning timestamps/watermarks, time-based windows will be created based upon event time. If we need to process events within a window in event time order, we can sort the windowed values and process as necessar

Re: Using Kafka and Flink for batch processing of a batch data source

2016-07-20 Thread Leith Mudge
Thanks Milind & Till, This is what I thought from my reading of the documentation but it is nice to have it confirmed by people more knowledgeable. Supplementary to this question is whether Flink is the best choice for batch processing at this point in time or whether I would be better to look

Running multiple Flink Streaming Jobs, one by one

2016-07-20 Thread Biplob Biswas
Hi, I want to test my Flink streaming code, and thus I want to run Flink streaming jobs with different parameters one by one. So, when one job finishes after it doesn't receive new data points for some time, the next job with a different set of parameters should start. For this, I am already

Re: Class loading and job versioning

2016-07-20 Thread Stephan Ewen
Hi! I agree, that shading is tedious. It seems to be a pretty fundamental Java problem that exists in all those Java-based frameworks. The only way I know how to solve this is having fewer dependencies in the framework code. Right now, the JVMs that execute the user code have for example a depend

Re: taskmanager memory leak

2016-07-20 Thread Stephan Ewen
Hi! In order to answer this, we need a bit more information. Here are some follow-up questions: - Did you submit any job to the cluster, or is the memory just growing even on an idle TaskManager? - If you are running a job, do you use the RocksDB state backend, or the FileSystem state backend?

Re: env.readFile with enumeratenestedFields

2016-07-20 Thread Kostas Kloudas
Hi Flavio, as Aljoscha pointed out, the problem should be fixed now. The changes are already in the master. If there is any issue let us know. Kostas > On Jul 20, 2016, at 6:29 PM, Aljoscha Krettek wrote: > > Hi, > the configuration has to be passed using > env.readFile(...).withParameters(ifCo

Re: Flink Dashboard stopped showing list of uploaded jars

2016-07-20 Thread Maximilian Michels
Good catch! That should do it if you have access to the local storage of the JobManager. On Wed, Jul 20, 2016 at 5:25 PM, Aljoscha Krettek wrote: > Hi, > in the JobManager log there should be a line like this: > 2016-07-20 17:19:00,552 INFO > org.apache.flink.runtime.webmonitor.WebRuntimeMonitor

Re: env.readFile with enumeratenestedFields

2016-07-20 Thread Aljoscha Krettek
Hi, the configuration has to be passed using env.readFile(...).withParameters(ifConf). The InputFormat will then be properly configured at runtime. However, Kostas just enhanced the FileInputFormats to allow setting the parameters directly on the input format. In 1.1-SNAPSHOT and the upcoming 1.1
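For reference, a minimal sketch of the `withParameters(...)` route on the 1.0.x batch API (variable names borrowed from Flavio's snippet below; `inputDir` is an assumed placeholder):

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.fs.Path;

public class RecursiveReadSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        String inputDir = args[0]; // assumed: the directory to scan recursively

        TextInputFormat inputFormat = new TextInputFormat(new Path(inputDir));
        Configuration ifConf = new Configuration();
        ifConf.setBoolean("recursive.file.enumeration", true);

        // On 1.0.x the flag only takes effect when handed to the operator
        // via withParameters(...), not when set on the format directly.
        DataSet<String> lines = env.readFile(inputFormat, inputDir)
                                   .withParameters(ifConf);
        lines.print();
    }
}
```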

taskmanager memory leak

2016-07-20 Thread 김동일
I've set up a cluster (standalone). The TaskManager consumes memory beyond the Xmx property and it grows continuously. I saw this link( http://mail-archives.apache.org/mod_mbox/flink-dev/201606.mbox/%3ccak2vtervsw4muboc4swix0mr6y9bijznjuypf6_f9f0g9-_...@mail.gmail.com%3E ). So I set the taskmanager.memo

Fwd: taskmanager memory leak

2016-07-20 Thread 김동일
Oh, my Flink version is 1.0.3. -- Forwarded message -- From: 김동일 Date: Thu, Jul 21, 2016 at 12:52 AM Subject: taskmanager memory leak To: user@flink.apache.org I've set up a cluster (standalone). The TaskManager consumes memory beyond the Xmx property and it grows continuously. I sa

env.readFile with enumeratenestedFields

2016-07-20 Thread Flavio Pompermaier
Hi to all, in my job I'm doing the following to recursively read the files inside a dir: TextInputFormat inputFormat = new TextInputFormat(new Path(inputDir)); org.apache.flink.configuration.Configuration ifConf = new org.apache.flink.configuration.Configuration(); ifConf.setBool

Re: Flink Dashboard stopped showing list of uploaded jars

2016-07-20 Thread Aljoscha Krettek
Hi, in the JobManager log there should be a line like this: 2016-07-20 17:19:00,552 INFO org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Using directory /some/dir for web frontend JAR file uploads if you manually delete the offending jar file from that directory it could solve your problem

Re: Class loading and job versioning

2016-07-20 Thread Michal Budzyn
Hi Stephan, IMO the platform needs better job isolation. All this shading shouldn't be required at all. Michal On 20.07.2016 16:18, Stephan Ewen wrote: Hi Michael! The only safe way in Java to isolate the user code from the platform would be to completely run them in different JVMs.

Re: Aggregate events in time window

2016-07-20 Thread Aljoscha Krettek
Which is of course only available in 1.1-SNAPSHOT or the upcoming 1.1 release. :-) On Tue, 19 Jul 2016 at 22:32 Till Rohrmann wrote: > Hi Dominique, > > your problem sounds like a good use case for session windows [1, 2]. If > you know that there is only a maximum gap between your request and re

Re: Arrays values in keyBy

2016-07-20 Thread Stephan Ewen
I think we can simply add this behavior when we use the TypeComparator in the keyBy() function. It can implement the hashCode() as a deepHashCode on array types. On Mon, Jun 13, 2016 at 12:30 PM, Ufuk Celebi wrote: > Would make sense to update the Javadocs for the next release. > > On Mon, Jun 1
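Until something like that lands, the usual workaround is to key on a content-based hash yourself (e.g. a KeySelector returning `Arrays.hashCode(...)`). The identity-vs-content hashing difference Stephan alludes to is plain-JDK behavior:

```java
import java.util.Arrays;

public class ArrayHashDemo {
    public static void main(String[] args) {
        int[] a = {1, 2, 3};
        int[] b = {1, 2, 3};

        // Object.hashCode() on arrays is identity-based, so two arrays with
        // equal contents typically hash differently -- bad as a keyBy() key.
        System.out.println(a.hashCode() == b.hashCode()); // typically false

        // Content-based hashing is stable across equal arrays:
        System.out.println(Arrays.hashCode(a) == Arrays.hashCode(b)); // prints "true"

        // For nested arrays, Arrays.deepHashCode recurses into elements:
        System.out.println(Arrays.deepHashCode(new int[][]{a})
                == Arrays.deepHashCode(new int[][]{b})); // prints "true"
    }
}
```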

Re: Data point goes missing within iteration

2016-07-20 Thread Biplob Biswas
Hi Max, yeah I tried that and it's definitely better. Only a few points go missing compared to a huge amount in the beginning. For now, it's good enough for me and my work. Thanks a lot for the workaround. -Biplob -- View this message in context: http://apache-flink-user-mailing-list-archive.233605

Re: Elasticsearch connector and number of shards

2016-07-20 Thread Maximilian Michels
The connector doesn't cover this use case. Through the API you need to use the IndicesAdminClient: https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-admin-indices.html Otherwise Elasticsearch creates an index with shards automatically. We could add support for configuring
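A sketch against the Elasticsearch 2.x Java client described in the linked admin-indices docs (index name and shard/replica counts here are placeholders, and the client is assumed to be already built):

```java
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.Settings;

public class CreateIndexSketch {
    // Create the index with an explicit shard count *before* the Flink
    // Elasticsearch sink first writes to it; otherwise Elasticsearch
    // auto-creates the index with default shards on first write.
    static void createIndex(Client client) {
        client.admin().indices().prepareCreate("my-index")
            .setSettings(Settings.settingsBuilder()
                .put("index.number_of_shards", 5)
                .put("index.number_of_replicas", 1))
            .get();
    }
}
```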

Re: DataStreamUtils not working properly

2016-07-20 Thread subash basnet
Hello Maximilian, then as you said, in the starting question of the thread: Iterator iter = DataStreamUtils.collect(centroids); Collection testCentroids = Lists.newArrayList(iter); for(Centroid c: testCentroids){ System.out.println(c); } The above waits until the iterator doesn't have data anymore, and

Re: Class loading and job versioning

2016-07-20 Thread Stephan Ewen
Hi Michael! The only safe way in Java to isolate the user code from the platform would be to completely run them in different JVMs. Other than that, Max's method to ensure the correct instantiation should work in most cases. We also continuously try to have fewer dependencies in Flink, to that t

Re: DataStreamUtils not working properly

2016-07-20 Thread Maximilian Michels
Everything works as expected. The while loop blocks until the iterator doesn't have data anymore (=the program has ended). All data will end up in the ArrayList. The latter exception comes from a duplicate call to execute(). Actually, collect() internally calls execute() because the job has to run

Re: Class loading and job versioning

2016-07-20 Thread Maximilian Michels
Sure. No Problem. The issue is a bit more involved. You're right, the user classes have precedence over the Flink classpath. So your classes were probably loaded fine. However, the user code also calls Flink code which can use a library version different from the job jar library. And boom, it cras

Re: Flink Dashboard stopped showing list of uploaded jars

2016-07-20 Thread Maximilian Michels
Hi Gary, That is a bug. The main method might actually be there but it fails to load a class: > Caused by: java.lang.ClassNotFoundException: > org.shaded.apache.flink.streaming.api.functions.source.SourceFunction It looks like internal Flink classes have been shaded but not included in the job j

Re: DataStreamUtils not working properly

2016-07-20 Thread subash basnet
Hello Maximilian, thanks! I learned a new thing today :). But my problem still exists. Your example has little data and it works fine. But in my datastream I have set a timeWindow of Time.seconds(5). What I found out is, if I print as below as in your example: Iterator iter = DataStreamUtils.collect(cen

Flink Dashboard stopped showing list of uploaded jars

2016-07-20 Thread Gary Yao
Hi all, I accidentally packaged a Flink Job for which the main method could not be looked up. This breaks the Flink Dashboard's job submission page (no jobs are displayed). I opened a ticket: https://issues.apache.org/jira/browse/FLINK-4236 Is there a way to recover from this without restartin

Re: Class loading and job versioning

2016-07-20 Thread Michal Budzyn
Thanks for the prompt reply. You are right. The conflict was between "com.fasterxml.jackson.core" libs. I am just wondering: if the jobs were separated from the platform, the job libs should have precedence and no versioning problem should have happened. Regards, Michal On 20.07.2016

Re: Logical plan optimization with Calcite

2016-07-20 Thread Timo Walther
Max is right, Flink uses Calcite rules for optimization. The following rules are applied so far: https://github.com/apache/flink/blob/master/flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/plan/rules/FlinkRuleSets.scala The filter condition you mentioned will not be optimize

Re: DataStreamUtils not working properly

2016-07-20 Thread Maximilian Michels
Ah, now I see where the problem lies. You're reusing the Iterator which you have already used in the for loop. You can only iterate over the elements once! This is the nature of the Java Iterator and DataStreamUtils.collect(..) returns an iterator. On Wed, Jul 20, 2016 at 1:11 PM, subash basnet w
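The single-pass behavior Max describes is plain java.util.Iterator semantics, independent of Flink; a minimal stdlib illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IteratorOnePass {
    public static void main(String[] args) {
        Iterator<Integer> iter = Arrays.asList(1, 2, 3).iterator();

        int firstPassCount = 0;
        while (iter.hasNext()) {   // first traversal consumes the iterator
            iter.next();
            firstPassCount++;
        }

        List<Integer> secondPass = new ArrayList<>();
        iter.forEachRemaining(secondPass::add); // nothing left to see

        System.out.println(firstPassCount);       // prints "3"
        System.out.println(secondPass.isEmpty()); // prints "true"
    }
}
```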

Re: Class loading and job versioning

2016-07-20 Thread Maximilian Michels
Hi Michal, I couldn't find Joda in flink-dist. Possibly there is some other clash? There are two potential issues here: 1) Flink shades some libraries (Guava) but not all. If you use a version of a library in your Flink job which doesn't match the one in flink-dist, you're bound for trouble. 2)

Re: Data point goes missing within iteration

2016-07-20 Thread Maximilian Michels
@Paris Thanks for the prompt feedback! I really have to check out your PR :) @Biplob: If I understand correctly, a possible workaround in the meantime seems to be to use `setBufferTimeout(0)` on your StreamExecutionEnvironment. Could you try that? On Wed, Jul 20, 2016 at 12:30 PM, Paris Carbone
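The suggested knob, as a sketch (Flink streaming API; everything beyond the one call is boilerplate):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BufferTimeoutSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // A timeout of 0 flushes every record immediately instead of letting
        // it sit in an output buffer until the default timeout elapses --
        // trades throughput for no records lingering in partially full buffers.
        env.setBufferTimeout(0);
    }
}
```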

Re: Understanding iteration error message

2016-07-20 Thread Maximilian Michels
Hi, It's stating that you can't use a DataStream which was not part of the iteration. It works with `newCentroids` because it is part of the loop. The only way to get the centroids DataStream in, is to union/join it with the `newCentroids` stream. Cheers, Max On Wed, Jul 20, 2016 at 11:33 AM, s

Re: Logical plan optimization with Calcite

2016-07-20 Thread Maximilian Michels
Hi Gallenvara, As far as I know, the Table API is now translated into a Calcite plan which is then optimized according to Calcite's optimization rules. Cheers, Max On Wed, Jul 20, 2016 at 7:24 AM, gallenvara wrote: > > Hello, everyone. I'm new to Calcite and have some problems with it. Flink >

Re: DataStreamUtils not working properly

2016-07-20 Thread subash basnet
Hello Maximilian, thanks for the update. Yup, it works in the example you gave. I checked with a collection also and it works. But not in my datastream case after the collection. DataStream centroids = *newCentroidDataStream*.map(new TupleCentroidConverter()); Iterator iter = DataStreamUtils.collect(cen

Class loading and job versioning

2016-07-20 Thread Michal Budzyn
Hi all, we had a class versioning problem within a Flink job. The job uses Joda 2.6, but flink-dist 1.0.3 packages 2.5. The problem was solved by relocating job classes with the shade plug-in. Does Flink separate jobs from each other to avoid class conflicts between them and the platform? Is job
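For context, the relocation Michal describes is typically a maven-shade-plugin snippet along these lines (a sketch, not the thread's actual pom; the shaded package prefix is a placeholder):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <!-- Move the job's Joda 2.6 out of the way of flink-dist's 2.5 -->
      <relocation>
        <pattern>org.joda.time</pattern>
        <shadedPattern>my.job.shaded.org.joda.time</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
    </execution>
  </executions>
</plugin>
```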

Re: Data point goes missing within iteration

2016-07-20 Thread Paris Carbone
This is possibly related to the way the queue between StreamIterationTail and Head is currently implemented. I think this part is a bit prone to record loss when things get wacky and backpressure kicks in (but at least it avoids deadlocks, right?). I don’t have the time availability to look int

Re: DataStreamUtils not working properly

2016-07-20 Thread Maximilian Michels
Just tried the following and it worked: public static void main(String[] args) throws IOException { StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); final DataStreamSource source = env.fromElements(1, 2, 3, 4); source.print(); final Iterator iter

Re: Data point goes missing within iteration

2016-07-20 Thread Maximilian Michels
CC Gyula and Paris in case they might want to help out. On Tue, Jul 19, 2016 at 11:43 AM, Biplob Biswas wrote: > Hi Ufuk, > > Thanks for the update, is there any known way to fix this issue? Any > workaround that you know of, which I can try? > > > > -- > View this message in context: > http://a

Understanding iteration error message

2016-07-20 Thread subash basnet
Hello all, When I execute the below streaming code: DataStream *centroids* = newCentroidDataStream.map(new TupleCentroidConverter()); ConnectedIterativeStreams loop = *points*.iterate().withFeedbackType(Centroid.class); DataStream *newCentroids* = loop.flatMap(new SelectNearestCenter(10)).map(new

Re: Using Kafka and Flink for batch processing of a batch data source

2016-07-20 Thread Till Rohrmann
At the moment there is also no batch source for Kafka. I'm also not so sure how you would define a batch given a Kafka stream. Only reading till a certain offset? Or maybe until one has read n messages? I think it's best to write the batch data to HDFS or another batch data store. Cheers, Till O