Re: Flink Scala performance

2015-07-15 Thread Vinh June
I ran it locally, from the terminal. And it's the Word Count example, so the input is small. -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-Scala-performance-tp2065p2074.html Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

Re: Flink Kafka example in Scala

2015-07-15 Thread Anwar Rizal
The compilation error occurs because you haven't declared a dependency on flink-streaming-scala. In SBT, you define something like: libraryDependencies += "org.apache.flink" % "flink-streaming-scala" % "0.9.0" On Thu, Jul 16, 2015 at 6:36 AM, Wendong wrote: > I tried, but got error: > > [error] Test
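Anwar's one-liner would sit in a build.sbt roughly like the following (a minimal sketch for a Flink 0.9 Scala project; the flink-scala and flink-clients entries are assumptions added for completeness, not taken from the thread):

```scala
// build.sbt -- minimal sketch for a Flink 0.9 Scala streaming project (hypothetical layout)
scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.flink" % "flink-scala"           % "0.9.0",
  "org.apache.flink" % "flink-streaming-scala" % "0.9.0", // provides org.apache.flink.streaming.api.scala._
  "org.apache.flink" % "flink-clients"         % "0.9.0"
)
```

With flink-streaming-scala on the classpath, the `object scala is not a member of package org.apache.flink.streaming.api` error reported later in the thread should disappear.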

Re: Flink Scala performance

2015-07-15 Thread Aljoscha Krettek
Hi, that depends. How are you executing the program? Inside an IDE? By starting a local cluster? And then, how big is your input data? Cheers, Aljoscha On Wed, 15 Jul 2015 at 23:45 Vinh June wrote: > I just realized that Flink program takes a lot of time to run, for example, > just the simple w

Re: Flink Kafka example in Scala

2015-07-15 Thread Wendong
I tried, but got error: [error] TestKafka.scala:11: object scala is not a member of package org.apache.flink.streaming.api [error] import org.apache.flink.streaming.api.scala._ So I switched back to my original import statements. Now I changed SimpleStringSchema to JavaDefaultStringSchema in add

Re: Flink Kafka example in Scala

2015-07-15 Thread Anwar Rizal
Have you tried replacing import org.apache.flink.streaming.api.environment._, import org.apache.flink.streaming.connectors.kafka, import org.apache.flink.streaming.connectors.kafka.api._, and import org.apache.flink.streaming.util.serialization._ with import org.apache.flink.streaming.api.scala._ and imp
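The gist of Anwar's suggestion, written out as an import sketch (the message is cut off, so the lines after the Scala-API import are an assumption about how it continues, not a quote):

```scala
// Before (Java-flavoured streaming API):
//   import org.apache.flink.streaming.api.environment._
//   import org.apache.flink.streaming.util.serialization._
// After: use the Scala streaming API instead
import org.apache.flink.streaming.api.scala._
// Presumably the Kafka connector imports stay as in the original program:
import org.apache.flink.streaming.connectors.kafka.api._
```

The Scala API package brings in the implicit TypeInformation machinery that the Scala examples rely on, which the Java-environment imports do not provide.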

Flink Kafka example in Scala

2015-07-15 Thread Wendong
Hello, does anyone have a simple example of Flink Kafka written in Scala? I've been struggling to get my test program working. Below is my program, which has an error in addSink (the KafkaWordCountProducer part is copied from a Spark sample program): import java.util.HashMap import org.apache.kaf

Re: How can handles Exist ,not Exist query on flink

2015-07-15 Thread hagersaleh
Please help. I would like an example. -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/How-can-handles-Exist-not-Exist-query-on-flink-tp1939p2068.html

Re: how can handles Any , All query on flink

2015-07-15 Thread hagersaleh
Many thanks. -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/how-can-handles-Any-All-query-on-flink-tp1997p2067.html

Re: open multiple file from list of uri

2015-07-15 Thread Michele Bertoni
Hmm, it doesn't seem to work: it calls the configure() method, which checks whether filePath is null and throws an exception. Actually, I set that field only in createInputSplits(), which happens some steps later. On 15 Jul 2015, at 13:16, Stephan Ewen <se...@apache.org> wrote:

Flink Scala performance

2015-07-15 Thread Vinh June
I just realized that my Flink program takes a lot of time to run. For example, just the simple word count example in 0.9 takes 18s on my laptop (MacBook Pro, OS X 10.9, i5, 8 GB RAM, SSD). Can anyone explain this or suggest a workaround? -- View this message in context: http://apache-flink-user-m

Re: Flink Kafka runtime error

2015-07-15 Thread Wendong
Just found a workaround. I downloaded kafka_2.10-0.8.2.1.jar and flink-connector-kafka-0.9.0.jar and put them into $FLINK_HOME/lib/. Now the runtime error is gone, but this is just a workaround; I believe there is a better solution. Wendong -- View this message in context: http://apache-flink

Flink Kafka runtime error

2015-07-15 Thread Wendong
Hello, I am using Flink 0.9, Scala 2.10.4, Kafka 0.8.2.1 and trying to consume Kafka messages in Flink. Here is the build.sbt: scalaVersion := "2.10.4" libraryDependencies += "org.apache.flink" % "flink-connector-kafka" % "0.9.0" exclude("org.apache.kafka", "kafka_${scala.binary.version}") li
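One likely problem in this build.sbt (an observation of ours, not something stated in the thread): inside an ordinary double-quoted sbt/Scala string, "kafka_${scala.binary.version}" is not interpolated, so the exclude targets a literal module name that does not exist. A plain-Scala sketch of the difference:

```scala
// Plain Scala (script-style) demo: ${...} is only substituted in an s-interpolated string
val scalaBinaryVersion = "2.10"

val literal      = "kafka_${scala.binary.version}"   // no interpolation: kept verbatim
val interpolated = s"kafka_$scalaBinaryVersion"      // the s-interpolator substitutes the value

println(literal)      // prints kafka_${scala.binary.version}
println(interpolated) // prints kafka_2.10
```

In the build file itself, the fix would be to spell the artifact out, e.g. exclude("org.apache.kafka", "kafka_2.10") (hypothetical, matching the Scala 2.10.4 declared in the message).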

Re: How to cancel a Flink DataSource from the driver code?

2015-07-15 Thread Stephan Ewen
Hi! You can also cancel jobs via the command line. See here: https://ci.apache.org/projects/flink/flink-docs-release-0.9/apis/cli.html There is also a way to do that programmatically, from Java or Scala. Greetings, Stephan On Wed, Jul 15, 2015 at 4:58 PM, LINZ, Arnaud wrote: > Hi Roger, > >

RE: How to cancel a Flink DataSource from the driver code?

2015-07-15 Thread LINZ, Arnaud
Hi Roger, In fact, I am implementing a different use case from the one you know about, with more sources than Kafka: we now also use Flink in the BI team (to which I belong). The problem with the web interface is that it is not easily scriptable, and to my understanding it does not allow cleaning co

Re: problem with union

2015-07-15 Thread Maximilian Michels
I was able to reproduce this problem. It turns out, this has already been fixed in the snapshot version: https://issues.apache.org/jira/browse/FLINK-2229 The fix will be included in the upcoming 0.9.1 release. Thank you again for reporting! Kind regards, Max On Wed, Jul 15, 2015 at 11:33 AM, Max

Re: Order groups by their keys

2015-07-15 Thread Fabian Hueske
Yes, going to parallelism 1 is another option, but you don't have to use a fake reduce to enforce sorting. You can simply do: DataSet<...> result = ...; result.sortPartition(1, Order.ASCENDING).setParallelism(1) // sort on first String field .output(...); Fabian 2015-07-15 15:32 GMT+02:00 Matthia
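Fabian's pattern, written out as a self-contained Scala sketch (an uncompiled illustration assuming the Flink 0.9-era batch Scala API on the classpath; the sample data and field index are illustrative, not from the thread):

```scala
import org.apache.flink.api.scala._
import org.apache.flink.api.common.operators.Order

object GloballySortedOutput {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val result: DataSet[(Int, String)] = env.fromElements((2, "b"), (1, "a"), (3, "c"))

    result
      .sortPartition(1, Order.ASCENDING) // sort within the partition on the String field
      .setParallelism(1)                 // one partition, so the output is totally ordered
      .print()
  }
}
```

The setParallelism(1) call is what turns the per-partition sort into a global order: with a single partition there is nothing left to merge.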

Re: Order groups by their keys

2015-07-15 Thread Matthias J. Sax
Hi Robert, global sorting of the final output is currently not supported by Flink out of the box. The reason is that a global sort requires all data to be processed by a single node (which contradicts data parallelism). For small output, you could use a final "reduce" with no key (i.e., all data go

Re: Order groups by their keys

2015-07-15 Thread Fabian Hueske
Hi Robert, there are two issues involved here. 1) Flink does not support totally ordered parallel output out of the box. Fully sorting data in parallel requires range partitioning, which requires some knowledge of the data (the distribution of the key values) to produce balanced partitions. Flink doe

Order groups by their keys

2015-07-15 Thread Robert Schmidtke
Hey everyone, I'm currently trying to implement TPC-H Q1, which involves ordering of results. I'm not too familiar with the transformations yet, and for the life of me I cannot figure out how to get this to work. Consider the following toy example: final ExecutionEnvironment env = Executi

Re: open multiple file from list of uri

2015-07-15 Thread Stephan Ewen
If you want to work without the placeholder, simply do: env.createInput(new myDelimitedInputFormat(parser)(paths)) The createInputSplits() method looks good. Greetings, Stephan On Tue, Jul 14, 2015 at 11:42 PM, Michele Bertoni <michele1.bert...@mail.polimi.it> wrote: > Ok thank you, now I

Re: problem with union

2015-07-15 Thread Maximilian Michels
Hi Michele, Thanks for reporting the problem. It seems like we changed the way we compare generic types like your GValue type. I'm debugging that now. We can get a fix in for the 0.9.1 release. Cheers, Max On Tue, Jul 14, 2015 at 5:35 PM, Michele Bertoni < michele1.bert...@mail.polimi.it> wrote:

Re: Sort Benchmark infrastructure

2015-07-15 Thread Hawin Jiang
Hi George and Mike, thanks for your information. Did you use 186 i2.8xlarge servers for testing? The total one-hour cost = 186 * 6.82 = $1,268.52. Do you know any person or company that could sponsor this? For our test approach, I have checked the industry-standard BigDataBench (http://prof.ict.ac.cn/