Re: How to preserve KeyedDataStream

2015-11-03 Thread Martin Neumann
Here is some code of my current solution; if there is a better way of doing it, let me know. KeyedStream hostStream = inStream .keyBy(t -> t.getHost()); KeyedStream,String> freqStream = hostStream .timeWindow(Time.of(WINDOW_SIZE, TimeUnit.MILLISECONDS)) .fold(new Tuple2(0, ""), new Count()
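
For reference, a cleaned-up sketch of what the snippet above appears to do, assuming a Flink 1.x-style DataStream API, a hypothetical LogEntry type with a getHost() accessor, and a WINDOW_SIZE constant. It is a reconstruction, not the poster's exact code; the fold accumulates a (count, host) tuple per host and window (window fold was later deprecated in favor of aggregate).

    import java.util.concurrent.TimeUnit;
    import org.apache.flink.api.common.functions.FoldFunction;
    import org.apache.flink.api.java.functions.KeySelector;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.windowing.time.Time;

    // inStream: DataStream<LogEntry>; LogEntry and WINDOW_SIZE are hypothetical
    DataStream<Tuple2<Integer, String>> freqStream = inStream
        // key the stream by host so every host gets its own windows
        .keyBy(new KeySelector<LogEntry, String>() {
            @Override
            public String getKey(LogEntry t) {
                return t.getHost();
            }
        })
        .timeWindow(Time.of(WINDOW_SIZE, TimeUnit.MILLISECONDS))
        // fold every element into a running (count, host) tuple, i.e. the "Count" function
        .fold(new Tuple2<Integer, String>(0, ""),
              new FoldFunction<LogEntry, Tuple2<Integer, String>>() {
            @Override
            public Tuple2<Integer, String> fold(Tuple2<Integer, String> acc, LogEntry value) {
                return new Tuple2<>(acc.f0 + 1, value.getHost());
            }
        });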

Re: Running continuously on yarn with kerberos

2015-11-03 Thread Niels Basjes
Update on the status so far: I suspect I found a problem in a secure setup. I have created a very simple Flink topology consisting of a streaming Source (that outputs the timestamp a few times per second) and a Sink (that puts that timestamp into a single record in HBase). Running this on a non-
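
For context, a minimal sketch of the kind of topology described above: a source that emits the current timestamp a few times per second and a sink that writes it into a single HBase row. This is an illustrative reconstruction, not Niels's actual code; the table name, column family, and row key are made up, and the Kerberos-specific setup is not shown.

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
    import org.apache.flink.streaming.api.functions.source.SourceFunction;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // source: emit the current timestamp a few times per second
    DataStream<Long> timestamps = env.addSource(new SourceFunction<Long>() {
        private volatile boolean running = true;

        @Override
        public void run(SourceContext<Long> ctx) throws Exception {
            while (running) {
                ctx.collect(System.currentTimeMillis());
                Thread.sleep(250);
            }
        }

        @Override
        public void cancel() {
            running = false;
        }
    });

    // sink: overwrite a single HBase record with the latest timestamp
    timestamps.addSink(new RichSinkFunction<Long>() {
        private transient Connection connection;
        private transient Table table;

        @Override
        public void open(Configuration parameters) throws Exception {
            connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
            table = connection.getTable(TableName.valueOf("flink_heartbeat")); // hypothetical table
        }

        @Override
        public void invoke(Long value) throws Exception {
            Put put = new Put(Bytes.toBytes("heartbeat"));                     // fixed row key
            put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("ts"), Bytes.toBytes(value));
            table.put(put);
        }

        @Override
        public void close() throws Exception {
            if (table != null) { table.close(); }
            if (connection != null) { connection.close(); }
        }
    });

    env.execute("timestamp-to-hbase");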

Re: How to preserve KeyedDataStream

2015-11-03 Thread Aljoscha Krettek
Hi, where are you storing the results of each window computation? Maybe you could also store them from inside a custom WindowFunction where you just count the elements and then store the results. On the other hand, adding a (1) field and doing a window reduce (à la WordCount) is going to be wa
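
A minimal sketch of the WindowFunction approach described above, counting the elements of each (host, window) pair without adding an artificial (1) field. It assumes a Flink 1.x-style API and the same hypothetical LogEntry type and WINDOW_SIZE constant as in the sketch above; storing the result (or updating a per-host model) could happen inside apply() instead of emitting a tuple.

    import java.util.concurrent.TimeUnit;
    import org.apache.flink.api.java.functions.KeySelector;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
    import org.apache.flink.util.Collector;

    DataStream<Tuple2<String, Long>> counts = inStream
        .keyBy(new KeySelector<LogEntry, String>() {
            @Override
            public String getKey(LogEntry t) {
                return t.getHost();
            }
        })
        .timeWindow(Time.of(WINDOW_SIZE, TimeUnit.MILLISECONDS))
        .apply(new WindowFunction<LogEntry, Tuple2<String, Long>, String, TimeWindow>() {
            @Override
            public void apply(String host, TimeWindow window, Iterable<LogEntry> values,
                              Collector<Tuple2<String, Long>> out) throws Exception {
                long count = 0;
                for (LogEntry ignored : values) {   // just count the elements of the window
                    count++;
                }
                out.collect(new Tuple2<>(host, count));
            }
        });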

Re: Could not load the task's invokable class.

2015-11-03 Thread Saleh
Hi Stephan, Thanks for the response. Actually I am using Maven, and below are all the Flink dependencies I have in my pom.xml file: org.apache.flink:flink-core:0.9.1 (scope: provided)

Re: Running on a firewalled Yarn cluster?

2015-11-03 Thread Niels Basjes
Great! I'll watch the issue and give it a test once I see a working patch. Niels Basjes On Tue, Nov 3, 2015 at 1:03 PM, Maximilian Michels wrote: > Hi Niels, > > Thanks a lot for reporting this issue. I think it is a very common setup > in corporate infrastructure to have restrictive firewall

Re: Running on a firewalled Yarn cluster?

2015-11-03 Thread Maximilian Michels
Hi Niels, Thanks a lot for reporting this issue. I think it is a very common setup in corporate infrastructure to have restrictive firewall settings. For Flink 1.0 (and probably in a minor 0.10.X release) we will have to address this issue to ensure proper integration of Flink. I've created a JIR

How to preserve KeyedDataStream

2015-11-03 Thread Martin Neumann
Hi, I want to do the following:
1. Split a stream of incoming logs by host address.
2. For each key, create time-based windows.
3. Count the number of items in the window.
4. Feed it into a statistical model that is maintained for each host.
Since I don't have fields to sum upon, I use a (win
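
One possible shape for step 4, sketched under the assumption of a Flink 1.x-style keyed-state API and a hypothetical, serializable HostModel class (with made-up update() and score() methods) standing in for the statistical model. It consumes a (host, count) stream such as the ones sketched earlier in this thread and keeps one model instance per host key.

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.util.Collector;

    // hostCounts: DataStream<Tuple2<String, Long>> of (host, windowCount)
    DataStream<String> scored = hostCounts
        .keyBy(0)   // key by host again so the state below is scoped per host
        .flatMap(new RichFlatMapFunction<Tuple2<String, Long>, String>() {
            private transient ValueState<HostModel> model;   // one model per host key

            @Override
            public void open(Configuration parameters) {
                model = getRuntimeContext().getState(
                        new ValueStateDescriptor<>("host-model", HostModel.class));
            }

            @Override
            public void flatMap(Tuple2<String, Long> hostCount, Collector<String> out) throws Exception {
                HostModel m = model.value();
                if (m == null) {
                    m = new HostModel();                     // hypothetical statistical model
                }
                m.update(hostCount.f1);                      // feed the window count into the model
                model.update(m);
                out.collect(hostCount.f0 + " score=" + m.score(hostCount.f1));
            }
        });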

Re: Running on a firewalled Yarn cluster?

2015-11-03 Thread Niels Basjes
Hi, I forgot to answer your other question: On Mon, Nov 2, 2015 at 4:34 PM, Robert Metzger wrote: > so the problem is that you can not submit a job to Flink using the > "/bin/flink" tool, right? > I assume Flink and its TaskManagers properly start and connect to each > other (the number of Task

Re: Long ids from String

2015-11-03 Thread Martin Junghanns
Hi, sounds like RDF problems to me :) To build an index you could do the following:
triplet := <source, edge, target>
(0) build the set of all triplets (with strings): triplets = triplets1.union(triplets2)
(1) assign unique long ids to each vertex: vertices = triplets.flatMap() => [<source>, <target>, ...].distinct(); vertexWithID = vertic
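
A rough DataSet sketch of the procedure above, assuming the triplets are Tuple3<String, String, String> with the source string in f0 and the target string in f2; it also covers the follow-up question about translating both sides of the reference so that f2 == f0 joins keep working on the Long ids. This is only an illustration of the union / distinct / zipWithUniqueId / join pattern, not code from the thread.

    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.api.common.functions.JoinFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.api.java.tuple.Tuple3;
    import org.apache.flink.api.java.utils.DataSetUtils;
    import org.apache.flink.util.Collector;

    // (0) build the set of all (string) triplets
    DataSet<Tuple3<String, String, String>> triplets = triplets1.union(triplets2);

    // (1) collect the distinct vertex strings (source f0 and target f2 of every triplet)
    DataSet<String> vertices = triplets
        .flatMap(new FlatMapFunction<Tuple3<String, String, String>, String>() {
            @Override
            public void flatMap(Tuple3<String, String, String> t, Collector<String> out) {
                out.collect(t.f0);
                out.collect(t.f2);
            }
        })
        .distinct();

    // (2) assign a unique Long id to every vertex string: (id, vertexString)
    DataSet<Tuple2<Long, String>> vertexWithId = DataSetUtils.zipWithUniqueId(vertices);

    // (3) replace the source string (f0) with its Long id ...
    DataSet<Tuple3<Long, String, String>> sourceTranslated = triplets
        .join(vertexWithId).where(0).equalTo(1)
        .with(new JoinFunction<Tuple3<String, String, String>, Tuple2<Long, String>,
                               Tuple3<Long, String, String>>() {
            @Override
            public Tuple3<Long, String, String> join(Tuple3<String, String, String> t,
                                                     Tuple2<Long, String> v) {
                return new Tuple3<>(v.f0, t.f1, t.f2);
            }
        });

    // ... and then the target string (f2) as well
    DataSet<Tuple3<Long, String, Long>> translated = sourceTranslated
        .join(vertexWithId).where(2).equalTo(1)
        .with(new JoinFunction<Tuple3<Long, String, String>, Tuple2<Long, String>,
                               Tuple3<Long, String, Long>>() {
            @Override
            public Tuple3<Long, String, Long> join(Tuple3<Long, String, String> t,
                                                   Tuple2<Long, String> v) {
                return new Tuple3<>(t.f0, t.f1, v.f0);
            }
        });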

Re: Long ids from String

2015-11-03 Thread Fabian Hueske
Converting String ids into Long ids can be quite expensive, so you should make sure it pays off. The safe way to do it is to get all unique String ids (project, distinct), do zipWithUniqueId, and join all DataSets that have the String id with the new long id. So it is a full sort for the unique an

Re: Long ids from String

2015-11-03 Thread Flavio Pompermaier
Hi Martin, thanks for the suggestion, but unfortunately in my use case I have another problem: I have to join triplets when f2==f0. Is there any way to also translate the references? I.e. if I have 2 tuples, when I apply that function I obtain something like <1,>, <2,>. I'd like to be able to join the

Re: Long ids from String

2015-11-03 Thread Martin Junghanns
Hi Flavio, If you just want to assign a unique Long identifier to each element in your dataset, you can use the DataSetUtils.zipWithUniqueId() method [1]. Best, Martin [1] https://github.com/apache/flink/blob/master/flink-java/src/main/java/org/apache/flink/api/java/utils/DataSetUtils.java#L131
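
For reference, the call shape of that method (a fragment, assuming a hypothetical DataSet<String> named stringIds): it pairs every element with a generated Long, so a join is still needed to push the new ids back into the original tuples, as in the sketch earlier in this thread.

    // withIds: f0 = generated unique Long id, f1 = the original element
    DataSet<Tuple2<Long, String>> withIds = DataSetUtils.zipWithUniqueId(stringIds);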

Long ids from String

2015-11-03 Thread Flavio Pompermaier
Hi to all, I was reading the thread about the Neo4j connector and an old question came to my mind. In my Flink job I have Tuples with String ids that I use to join on, and I'd like to convert them to long (because Flink should improve the memory usage and the processing time quite a lot, if I'm not wro