Re: Creating Graphs from DataStream in Flink Gelly

2015-11-02 Thread UFUOMA IGHOROJE
Hi Vasia & Andra, Thanks for the pointers. I guess I have to implement basic support for dynamic streaming graphs. I’ll share my solutions or issues as I go on. Best, Ufuoma

Re: Create triggers

2015-11-02 Thread Stephan Ewen
You can also try and make the decision on the client. Imagine a program like this. long count = env.readFile(...).filter(...).count(); if (count > 5) { env.readFile(...).map().join(...).reduce(...); } else { env.readFile(...).filter().coGroup(...).map(...); }
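A minimal, self-contained version of the pattern Stephan sketches above. The file paths, the threshold, and the follow-up transformations are made-up placeholders; only the overall shape (count on the client, then build and submit one of two programs) comes from the message.

    import org.apache.flink.api.common.functions.FilterFunction;
    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.ExecutionEnvironment;

    public class ClientSideDecision {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // count() triggers a first job and ships the result back to the client
            long count = env.readTextFile("hdfs:///input/data")      // hypothetical path
                    .filter(new FilterFunction<String>() {
                        public boolean filter(String line) { return line.contains("ERROR"); }
                    })
                    .count();

            // the client decides which follow-up program to build and submit
            if (count > 5) {
                env.readTextFile("hdfs:///input/data")
                        .map(new MapFunction<String, String>() {
                            public String map(String line) { return line.toLowerCase(); }
                        })
                        .writeAsText("hdfs:///output/branch-a");     // hypothetical sink
            } else {
                env.readTextFile("hdfs:///input/data")
                        .filter(new FilterFunction<String>() {
                            public boolean filter(String line) { return !line.isEmpty(); }
                        })
                        .writeAsText("hdfs:///output/branch-b");
            }
            env.execute("conditional follow-up job");
        }
    }

The condition itself runs in the client JVM; each branch only becomes a Flink job when count() or env.execute() is reached.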

Re: [IE] Re: passing environment variables to flink program

2015-11-02 Thread Stephan Ewen
Thanks! Let's see if we can get that feature added soon...

RE: [IE] Re: passing environment variables to flink program

2015-11-02 Thread Jian Jiang
1. Sure. I have created FLINK-2954. Thanks jackie

Re: Running on a firewalled Yarn cluster?

2015-11-02 Thread Niels Basjes
My take on those 3 options: a) Bad idea; people need to be able to automate their jobs and run them from the command line (e.g. bash, cron). b) Bad idea; same reason you gave. In addition, I do not want to reserve an open 'flink port' for every user who wants to run a job. c) From my perspective thi

Re: passing environment variables to flink program

2015-11-02 Thread Stephan Ewen
Ah, okay, I confused the issue. The environment variables would need to be defined or exported in the environment that spawns TaskManager processes. I think there is nothing for that in Flink yet, but it should not be hard to add that. Can you open an issue for that in JIRA? Thanks, Stephan

Re: passing environment variables to flink program

2015-11-02 Thread Jian Jiang
This has less to do with JNI and more to do with how to pass custom environment variables. We are using YARN and the data is in HDFS. I have run the JNI program in local mode within Eclipse with no problem, since I can set up the environment variables easily by using run configurations. Just I don't kno

Re: passing environment variables to flink program

2015-11-02 Thread Stephan Ewen
Hi! What kind of setup are you using, YARN or standalone? In both modes, you should be able to pass your flags via the config entry "env.java.opts" in the flink-conf.yaml file. See here https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#other We have never passed library pa
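For reference, a hedged flink-conf.yaml fragment along the lines Stephan describes; the library path below is a made-up example, adjust it to wherever your native libraries live.

    # flink-conf.yaml
    # JVM options passed to the JobManager/TaskManager JVMs
    env.java.opts: -Djava.library.path=/opt/myapp/native-libs

Note that env.java.opts sets JVM flags, not shell environment variables; as the rest of this thread discusses, exporting something like LD_LIBRARY_PATH into the TaskManager environment was not yet supported and was filed as FLINK-2954.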

Re: Could not load the task's invokable class.

2015-11-02 Thread Stephan Ewen
Hi! You are probably missing some jars in your classpath. Usually Maven/SBT resolves that automatically, so I assume you are manually constructing the classpath here? For this particular error, you are probably missing "flink-streaming-core" (0.9.1) / "flink-streaming-java" (0.10) in your classpath. I woul
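If the classpath is built via Maven, the missing pieces Stephan names would be declared roughly like this. This is a sketch: pick only the artifact matching your Flink version, and the exact patch versions shown here are assumptions.

    <!-- Flink 0.9.x -->
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-streaming-core</artifactId>
      <version>0.9.1</version>
    </dependency>

    <!-- Flink 0.10.x -->
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-streaming-java</artifactId>
      <version>0.10.0</version>
    </dependency>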

passing environment variables to flink program

2015-11-02 Thread youarehow
I am doing a Flink evaluation against Spark. My program is a JNI application and requires LD_LIBRARY_PATH to be set among other needed variables. In Spark I can do that multiple ways. One way is to use --conf 'spark.executorEnv.XXX=blah'. It works great. How do I do that in Flink? thanks jacki

Could not load the task's invokable class.

2015-11-02 Thread Saleh
I am trying to run a Java flink streaming job (basic word count with Kafka) locally. But I am stuck with the following error. Any ideas on the cause of the error? Job execution switched to status RUNNING. 10/29/2015 11:15:54 Custom Source -> Flat Map -> Map(1/8) switched to SCHEDULED 10/29/

Re: Hi, question about orderBy two columns more

2015-11-02 Thread Philip Lee
You are welcome. I am wondering if there is a way to be notified when you update the RC to solve the *sortPartition* problem, and then how we could apply the new version, e.g. by just downloading the newly released Flink version? Thanks, Phil

Re: Creating Graphs from DataStream in Flink Gelly

2015-11-02 Thread Vasiliki Kalavri
Hi Ufuoma, Gelly doesn't support dynamic streaming graphs yet. The project Andra has linked to is a prototype for *one-pass* streaming graph analytics, i.e. no graph state is maintained. If you would like to keep and maintain the graph state in your streaming program, you would have to implement
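A minimal sketch of what keeping the graph state yourself could look like, assuming an edge stream of (source, target) pairs. The state here is a plain in-memory map, so it is not fault tolerant; it only illustrates the idea Vasia describes, not a supported Gelly feature.

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.util.Collector;

    public class StreamingGraphStateSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // hypothetical edge stream: (sourceVertexId, targetVertexId)
            DataStream<Tuple2<Long, Long>> edges = env.fromElements(
                    new Tuple2<>(1L, 2L), new Tuple2<>(1L, 3L), new Tuple2<>(2L, 3L));

            edges.keyBy(0)  // all edges of one source vertex go to the same parallel instance
                    .flatMap(new RichFlatMapFunction<Tuple2<Long, Long>, Tuple2<Long, Integer>>() {
                        // adjacency lists for the vertices handled by this instance
                        private final Map<Long, Set<Long>> adjacency = new HashMap<>();

                        @Override
                        public void flatMap(Tuple2<Long, Long> edge, Collector<Tuple2<Long, Integer>> out) {
                            adjacency.computeIfAbsent(edge.f0, k -> new HashSet<Long>()).add(edge.f1);
                            out.collect(new Tuple2<>(edge.f0, adjacency.get(edge.f0).size()));  // current out-degree
                        }
                    })
                    .print();

            env.execute("streaming graph state sketch");
        }
    }

In a real job the plain map would be replaced by Flink's partitioned, checkpointed state so the graph survives failures.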

Re: How best to deal with wide, structured tuples?

2015-11-02 Thread Johann Kovacs
Hi, thanks once again for those pointers. I did a bit of experimenting over the past couple of days and came to the following conclusions: 1. Unfortunately, I don't think I can get away with option 1 (generating POJOs at runtime). At least not without generating lots of boilerplate code, because I'd li

Re: Running on a firewalled Yarn cluster?

2015-11-02 Thread Robert Metzger
Hi Niels, so the problem is that you cannot submit a job to Flink using the "/bin/flink" tool, right? I assume Flink and its TaskManagers start properly and connect to each other (the number of TaskManagers is shown correctly in the web interface). I see the following solutions for the problem a

Running on a firewalled Yarn cluster?

2015-11-02 Thread Niels Basjes
Hi, Here at work our security guys chose (a long time ago) to have the firewalls open only the ports that are actually needed (I say: good call!). For the Yarn cluster this includes things like the proxy to see the application manager of an application. For everything we've done so far (i.e. mr/pi

Re: Creating Graphs from DataStream in Flink Gelly

2015-11-02 Thread Andra Lungu
Hi, There is a separate project related to graph streaming. It's called gelly-streaming. And, if you look here: https://github.com/vasia/gelly-streaming/blob/master/src/main/java/org/apache/flink/graph/streaming/GraphStream.java You can find a constructor which creates a graph from a DataStream o

Creating Graphs from DataStream in Flink Gelly

2015-11-02 Thread UFUOMA IGHOROJE
Is there some support for creating Graphs from DataStream in Flink Gelly? My use case is to create dynamic graphs, where the graph topology can be updated from a stream of data. From the API, all I can find is creating graphs from DataSets. Best, Ufuoma

Re: Hi, question about orderBy two columns more

2015-11-02 Thread Fabian Hueske
Hi Philip, thanks for reporting the issue. I just verified the problem. It is working correctly for the Java API, but is broken in Scala. I will work on a fix and include it in the next RC for 0.10.0. Thanks, Fabian

Re: Hi, question about orderBy two columns more

2015-11-02 Thread Philip Lee
Thanks for your reply, Stephan. You said this is the same as in SQL, but I got this result from the following code. This is not what we expected, right? val inputTuple = Seq((2,5),(2,3),(2,4),(3,2),(3,6)) val outputTuple = env.fromCollection(inputTuple) .sortPartition(0,Order.DESCENDING) //.sortPartitio
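For comparison, a hedged Java-API version of the same experiment, which according to Fabian's reply above sorts on both fields as expected. Parallelism is forced to 1 here so the single partition shows the full order; the data is the same sample as in Philip's Scala snippet.

    import org.apache.flink.api.common.operators.Order;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.tuple.Tuple2;

    public class SortPartitionTwoFields {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
            env.setParallelism(1);  // one partition, so the printed order is the sorted order

            DataSet<Tuple2<Integer, Integer>> input = env.fromElements(
                    new Tuple2<>(2, 5), new Tuple2<>(2, 3), new Tuple2<>(2, 4),
                    new Tuple2<>(3, 2), new Tuple2<>(3, 6));

            input.sortPartition(0, Order.DESCENDING)  // primary sort key: field 0
                 .sortPartition(1, Order.ASCENDING)   // secondary sort key: field 1
                 .print();
        }
    }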

Re: Consistent (hashing) keyBy over multiple time or streaming windows

2015-11-02 Thread Leonard Wolters
Hi Aljoscha, Thanks for the quick response. I've seen the Google Dataflow presentation at Flink Forward and understand the concepts behind it (which are also supported by Flink). I will look further into Stack Overflow and let you know if I have some further questions. Once again, thanks,

Re: Create triggers

2015-11-02 Thread Fabian Hueske
Hi Giacomo, there is no direct support for use cases like yours. The main issue is that it is not possible to modify the execution of a submitted program. Once it is running, it cannot be adapted. It is also not possible to inject a condition into the data flow logic, e.g., if this happens follow thi

Re: Consistent (hashing) keyBy over multiple time or streaming windows

2015-11-02 Thread Aljoscha Krettek
Hi Leonard, I’m afraid you might be thinking about windows as they are supported by Spark Streaming. There, windows are quite limited. In Flink you don’t necessarily have to window elements by time, since Flink does not collect data in mini-batches before processing. Everything is continuously pro
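A small sketch of the keyed-stream point: keyBy hash-partitions by key once, and every window for that key is evaluated on the same parallel instance, so there is no re-shuffling between windows. The key, the sample events, and the window size are made up, and the window API names are from a later Flink release than the one discussed in this thread.

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class KeyedWindowSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            DataStream<Tuple2<String, Long>> events = env.fromElements(
                    new Tuple2<>("sessionA", 1L), new Tuple2<>("sessionB", 1L), new Tuple2<>("sessionA", 1L));

            events.keyBy(0)                                                     // consistent partitioning by session key
                  .window(TumblingProcessingTimeWindows.of(Time.seconds(10)))   // windows are evaluated per key
                  .sum(1)                                                       // events per session per window
                  .print();

            env.execute("keyed window sketch");
        }
    }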

Consistent (hashing) keyBy over multiple time or streaming windows

2015-11-02 Thread Leonard Wolters
Hi, I was wondering if Flink already has implemented some sort of consistent keyBy mapping over multiple windows. The underlying idea is to 'sessionize' incoming events over time (i.e. multiple streaming windows) on the same partitions. As one can understand I want to avoid heavy shuffling over