Re: Apache Flink transactions

2015-06-05 Thread Hawin Jiang
Thanks all. Actually, I want to know more about Flink SQL and Flink performance. Here is the Spark benchmark. Maybe you already saw it before. https://amplab.cs.berkeley.edu/benchmark/ Thanks. Best regards Hawin On Fri, Jun 5, 2015 at 1:35 AM, Fabian Hueske wrote: > If you want to appe

Re: Cannot instantiate Mysql connection

2015-06-05 Thread Stephan Ewen
Can you manually load the driver class, with "Class.forName(...)", or does that yield a "ClassNotFoundException" ? On Fri, Jun 5, 2015 at 11:10 PM, Flavio Pompermaier wrote: > in the fat jar > On 5 Jun 2015 19:28, "Stephan Ewen" wrote: > >> In which way is the driver in the classpath? >> >> -

Re: Cannot instantiate Mysql connection

2015-06-05 Thread Flavio Pompermaier
in the fat jar On 5 Jun 2015 19:28, "Stephan Ewen" wrote: > In which way is the driver in the classpath? > > - fat jar? > - in the nested /out folder in the slim jar? > > > > On Fri, Jun 5, 2015 at 7:23 PM, Flavio Pompermaier > wrote: > >> Actually I just need to load it in the main method (jo

Re: scaling flink

2015-06-05 Thread Stephan Ewen
I think the main settings for the network are the number of network buffers, to make sure the shuffle runs smoothly. Flink uses the netty library for the network stack. It starts 2*cores network threads by default, which is mostly good. If you have many containers on each machine, and the containers

Re: scaling flink

2015-06-05 Thread Bill Sparks
I did the translation, thanks. The code is rerunning. Are there any network settings that can affect performance? As my email suggests, we are running on a Cray system; its bigger brother uses a proprietary network, and any tuning hints might make a difference. Thanks again. Bill -- Jonathan (

Re: scaling flink

2015-06-05 Thread Stephan Ewen
It was supposed to mean "please PING us" ;-) On Fri, Jun 5, 2015 at 7:21 PM, Stephan Ewen wrote: > Hi Bill! > > For the WordCount case, these numbers are not unexpected. Flink does not > yet use a hash aggregator for the "reduce(v1, v2)" call, but uses a > sort-based aggregation for that. Flink'

Re: Cannot instantiate Mysql connection

2015-06-05 Thread Stephan Ewen
In which way is the driver in the classpath? - fat jar? - in the nested /out folder in the slim jar? On Fri, Jun 5, 2015 at 7:23 PM, Flavio Pompermaier wrote: > Actually I just need to load it in the main method (job manager) before > calling any flink operation, I retrieve the records in a

Re: Cannot instantiate Mysql connection

2015-06-05 Thread Flavio Pompermaier
Actually I just need to load it in the main method (job manager) before calling any flink operation. I retrieve the records from a mysql table because they contain the paths of the files I'll need to read. Nothing more, nothing less. On 5 Jun 2015 19:06, "Robert Metzger" wrote: > Sure. > > So the DriverMa

Re: scaling flink

2015-06-05 Thread Stephan Ewen
Hi Bill! For the WordCount case, these numbers are not unexpected. Flink does not yet use a hash aggregator for the "reduce(v1, v2)" call, but uses a sort-based aggregation for that. Flink's sort aggregations are very reliable and very scalable compared to many hash aggregations, but often more ex

Re: Cannot instantiate Mysql connection

2015-06-05 Thread Robert Metzger
Sure. So the DriverManager has a static variable called "registeredDrivers". When DriverManager.getConnection() is called, the method looks up whether a registered driver for that connection (in this case "mysql") is available. For drivers to be in that list, they have to register themselves usi
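A minimal sketch of the registration step described above; the driver class name and the JDBC URL, user, and password below are placeholders, not values taken from this thread:

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class MySqlConnect {
        public static void main(String[] args) throws Exception {
            // Loading the driver class runs its static initializer, which calls
            // DriverManager.registerDriver(...) and adds it to registeredDrivers.
            Class.forName("com.mysql.jdbc.Driver");

            // Now getConnection() can find a driver that accepts the "mysql" sub-protocol.
            Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/mydb", "user", "password");
            // ... use the connection ...
            conn.close();
        }
    }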

Re: Cannot instantiate Mysql connection

2015-06-05 Thread Flavio Pompermaier
Hi Robert, In the main method I connect to a mysql table that acts as a data-source repository that I use to know which dataset I need to load. All mysql classes are present in the shaded jar. Could you explain the solution to this problem in a little more detail, please? Sorry but I didn't

Re: Cannot instantiate Mysql connection

2015-06-05 Thread Robert Metzger
Hi Stefano, I doubt that there are conflicting dependencies because Flink does not contain MySQL dependencies. Are you using Flink's JDBCInputFormat or custom code? For drivers to register at java.sql's DriverManager, their classes need to be loaded first. To load a class, you need to call Class.

Re: Cannot instantiate Mysql connection

2015-06-05 Thread Stefano Bortoli
Hi Robert, I answer on behalf of Flavio. He told me the driver jar was included. Smells like a class-loading issue due to 'conflicting' dependencies. Is it possible? Saluti, Stefano 2015-06-05 16:24 GMT+02:00 Robert Metzger : > Hi, > > is the MySQL driver part of the Jar file that you've built? >

scaling flink

2015-06-05 Thread Bill Sparks
Hi. I'm running some comparisons between flink, MRv2, and spark(1.3), using the new Intel HiBench suite. I've started with the stock wordcount example and I'm seeing some numbers which are not where I thought they'd be. So the question I have is, what are the configuration parameters which can aff

Re: Cannot instantiate Mysql connection

2015-06-05 Thread Robert Metzger
Hi, is the MySQL driver part of the Jar file that you've built? On Fri, Jun 5, 2015 at 4:11 PM, Flavio Pompermaier wrote: > Hi to all, > > I'm using a fresh build of flink-0.9-SNAPSHOT and in my flink job I set up > a mysql connection. > When I run the job from Eclipse everything is fine, > whi

Cannot instantiate Mysql connection

2015-06-05 Thread Flavio Pompermaier
Hi to all, I'm using a fresh build of flink-0.9-SNAPSHOT and in my flink job I set up a mysql connection. When I run the job from Eclipse everything is fine, while when running the job from the Web UI I get the following exception: java.sql.SQLException: No suitable driver found for jdbc:mysql:/l

Re: Working with data locality in streaming using groupBy?

2015-06-05 Thread Gyula Fóra
Hey, As for the second part of your question: If you want to apply transformations such as reduce on windows, you need to create a windowed datastream and apply your groupBy and reduce transformations on this WindowedDataStream before calling .flatten(). stream.window(..).flatten() has no effect as

Re: Working with data locality in streaming using groupBy?

2015-06-05 Thread Stephan Ewen
Hi! What you can do is have a static "broker", which is essentially a map from subtaskIndex to your concurrent hash map. Then all tasks in a pipeline (who have the same subtaskIndex) will grab the same ConcurrentHashMap. You can grab the subtask index if you have a RichFunction (such as RichMapFu
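A rough sketch of the "broker" idea described above, assuming a RichMapFunction with a hypothetical per-key counter as the shared state; the subtask index comes from the runtime context:

    import java.util.concurrent.ConcurrentHashMap;
    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;

    public class BrokeredMapper extends RichMapFunction<String, String> {

        // Static "broker": one shared map per JVM, keyed by subtask index.
        private static final ConcurrentHashMap<Integer, ConcurrentHashMap<String, Long>> BROKER =
                new ConcurrentHashMap<>();

        private transient ConcurrentHashMap<String, Long> localState;

        @Override
        public void open(Configuration parameters) {
            int subtaskIndex = getRuntimeContext().getIndexOfThisSubtask();
            // All tasks in the pipeline with the same subtask index grab the same map.
            localState = BROKER.computeIfAbsent(subtaskIndex, k -> new ConcurrentHashMap<>());
        }

        @Override
        public String map(String value) {
            localState.merge(value, 1L, Long::sum);
            return value;
        }
    }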

Working with data locality in streaming using groupBy?

2015-06-05 Thread William Saar
Hi, How can I maintain a local state, for instance a ConcurrentHashMap, across different steps in a streaming chain on a single machine/process? Static variable? (This doesn't seem to work well when running locally as it gets shared across multiple instances, a common "pipeline" store would be h

Re: Exception with hbase

2015-06-05 Thread Hilmi Yildirim
Hi, I found the reason for that exception. In the TableInputFormat there is the method "configure". This method creates a connection to an hbase table with "this.table = new HTable(hConf, getTableName());". This method is deprecated in the newer versions of hbase-client. You have to use Conn
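As a sketch, the deprecated HTable construction versus the newer hbase-client connection API mentioned above might look like this; the table name is a placeholder and the caller is responsible for closing the connection:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Table;

    public class HBaseTableAccess {
        public static Table openTable(String tableName) throws Exception {
            Configuration hConf = HBaseConfiguration.create();
            // Old, deprecated style used by TableInputFormat#configure:
            //   HTable table = new HTable(hConf, tableName);
            // Newer hbase-client style:
            Connection connection = ConnectionFactory.createConnection(hConf);
            return connection.getTable(TableName.valueOf(tableName));
        }
    }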

Exception with hbase

2015-06-05 Thread Hilmi Yildirim
Hi, I am using the SNAPSHOT version of flink to read from hbase and it is working. But we are using a newer version of hbase and if I set the hbase-client dependency to a newer version, then I get the following exception: Caused by: java.util.concurrent.RejectedExecutionException: Task org.a

Re: NotSerializableException: jdk.nashorn.api.scripting.NashornScriptEngine

2015-06-05 Thread Robert Metzger
That should not happen. Which function (map, join, ..) are you using? Are you using the batch or the streaming API? Which version of Flink are you using? On Fri, Jun 5, 2015 at 11:35 AM, Ashutosh Kumar wrote: > Thanks Robert.I tried this . It does not throw NotSerializableException > excepti

Re: NotSerializableException: jdk.nashorn.api.scripting.NashornScriptEngine

2015-06-05 Thread Ashutosh Kumar
Thanks Robert. I tried this. It does not throw the NotSerializableException. But the open method is not getting called. On Fri, Jun 5, 2015 at 1:58 PM, Robert Metzger wrote: > Hi, > > I guess you have a user function with a field for the scripting engine. > Can you change your user function

Re: Apache Flink transactions

2015-06-05 Thread Fabian Hueske
If you want to append data to a data set that is stored as files (e.g., on HDFS), you can go for a directory structure as follows:

    dataSetRootFolder
      - part1
        - 1
        - 2
        - ...
      - part2
        - 1
        - ...
      - partX

Flink's file format supports recursive directory scans such that you can ad
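A minimal sketch of the recursive scan mentioned above, assuming a batch job reading text files; the root and output paths are placeholders, and "recursive.file.enumeration" is the input-format parameter that enables recursive traversal of nested directories:

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.configuration.Configuration;

    public class RecursiveReadJob {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // Enable recursive enumeration so nested part1/1, part1/2, ... folders are read.
            Configuration parameters = new Configuration();
            parameters.setBoolean("recursive.file.enumeration", true);

            DataSet<String> lines = env
                    .readTextFile("hdfs:///dataSetRootFolder")  // placeholder path
                    .withParameters(parameters);

            lines.writeAsText("hdfs:///output");  // placeholder path
            env.execute("recursive read");
        }
    }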

Re: NotSerializableException: jdk.nashorn.api.scripting.NashornScriptEngine

2015-06-05 Thread Robert Metzger
Hi, I guess you have a user function with a field for the scripting engine. Can you change your user function into a Rich* function, initialize the scripting engine in the open() method and make the field transient? That should resolve it. On Fri, Jun 5, 2015 at 10:25 AM, Ashutosh Kumar wrote:
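A minimal sketch of the suggested fix, assuming a RichMapFunction that evaluates a JavaScript snippet per record; the rule script itself is a placeholder:

    import javax.script.ScriptEngine;
    import javax.script.ScriptEngineManager;
    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;

    public class ScriptRuleMapper extends RichMapFunction<String, String> {

        // transient: the engine is not serialized with the function;
        // it is created on the worker in open() instead.
        private transient ScriptEngine engine;

        @Override
        public void open(Configuration parameters) throws Exception {
            engine = new ScriptEngineManager().getEngineByName("nashorn");
        }

        @Override
        public String map(String value) throws Exception {
            engine.put("input", value);
            // Placeholder rule: upper-case the input via JavaScript.
            return (String) engine.eval("input.toUpperCase()");
        }
    }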

NotSerializableException: jdk.nashorn.api.scripting.NashornScriptEngine

2015-06-05 Thread Ashutosh Kumar
I am trying to use the JavaScript engine to execute some rules on a data set. But it is throwing a NotSerializableException for jdk.nashorn.api.scripting.NashornScriptEngine. Not sure how to resolve this. Thanks Caused by: java.io.NotSerializableException: jdk.nashorn.api.scripting.NashornScriptEngine a

Re: Apache Flink transactions

2015-06-05 Thread Aljoscha Krettek
Hi, I think the example could be made more concise by using the Table API. http://ci.apache.org/projects/flink/flink-docs-master/libs/table.html Please let us know if you have questions about it; it is still quite new. On Fri, Jun 5, 2015 at 9:03 AM, hawin wrote: > Hi Aljoscha > > Thanks for y
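As a rough illustration of the Table API pointer above (the API was still evolving at the time), a word-count style query in the 0.9-era Java Table API might look roughly like this; the WC pojo, the input data, and the exact class and method names are assumptions based on the documentation of that period, not code from this thread:

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.table.TableEnvironment;
    import org.apache.flink.api.table.Table;

    public class TableApiWordCount {

        // Simple POJO assumed for the input and result type.
        public static class WC {
            public String word;
            public long frequency;
            public WC() {}
            public WC(String word, long frequency) { this.word = word; this.frequency = frequency; }
        }

        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
            DataSet<WC> input = env.fromElements(
                    new WC("hello", 1), new WC("flink", 1), new WC("hello", 1));

            TableEnvironment tableEnv = new TableEnvironment();
            Table table = tableEnv.fromDataSet(input);
            // Group by word and sum the frequencies.
            Table result = table.groupBy("word").select("word, frequency.sum as frequency");

            DataSet<WC> out = tableEnv.toDataSet(result, WC.class);
            out.print();
        }
    }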

Re: Apache Flink transactions

2015-06-05 Thread hawin
Hi Aljoscha Thanks for your reply. Do you have any tips for Flink SQL? I know that Spark supports the ORC format. How about Flink SQL? BTW, for the TPCHQuery10 example, you implemented it in 231 lines of code. How can that be made as simple as possible with Flink? I am going to use Flink in my future