Re: data flow example on cluster

2015-09-30 Thread Till Rohrmann
It's described here: https://ci.apache.org/projects/flink/flink-docs-release-0.9/quickstart/setup_quickstart.html#run-example Cheers, Till On Wed, Sep 30, 2015 at 8:24 AM, Lydia Ickler wrote: > Hi all, > > I want to run the data-flow Wordcount example on a Flink Cluster. > The local execution w

Re: data flow example on cluster

2015-10-02 Thread Till Rohrmann
Hi Lydia, I think the APIs of the versions 0.8-incubating-SNAPSHOT and 0.10-SNAPSHOT are not compatible. Thus, it’s not simply a matter of setting the dependencies to 0.10-SNAPSHOT. You also have to fix the API changes, which might not be trivial. Therefore, I’d recommend you simply use the ALS impleme

Re: Flink program compiled with Janino fails

2015-10-05 Thread Till Rohrmann
I’m not a Janino expert but it might be related to the fact that Janino does not fully support generic types (see http://unkrig.de/w/Janino under limitations). Maybe it works if you use the untyped MapFunction type. Cheers, Till On Sat, Oct 3, 2015 at 8:04 PM, Giacomo Licari wrote: > Hi guys, > I

Re: kryo exception due to race condition

2015-10-06 Thread Till Rohrmann
Hi Stefano, we'll definitely look into it once Flink Forward is over and we've finished the current release work. Thanks for reporting the issue. Cheers, Till On Tue, Oct 6, 2015 at 9:21 AM, Stefano Bortoli wrote: > Hi guys, I could manage to complete the process crossing byte arrays I > deser

Re: Scala Code Generation

2015-10-14 Thread Till Rohrmann
If you're using Scala, then you're bound to a maximum of 22 fields in a tuple, because the Scala library does not provide larger tuples. You could generate your own case classes which have more than the 22 fields, though. On Oct 14, 2015 11:30 AM, "Ufuk Celebi" wrote: > > > On 13 Oct 2015, at 16:

Re: data sink stops method

2015-10-15 Thread Till Rohrmann
Could you post a minimal example of your code where the problem is reproducible? I assume that there has to be another problem because env.execute should actually trigger the execution. Cheers, Till ​ On Thu, Oct 8, 2015 at 8:58 PM, Florian Heyl wrote: > Hey Stephan and Pieter, > That was the

Re: reduce error

2015-10-20 Thread Till Rohrmann
Hi Michele, I will look into the problem. As Ufuk said, it would be really helpful, if you could provide us with the data set. If it's problematic to share the data via the mailing list, then you could also send me the data privately. Thanks a lot for your help. Cheers, Till On Fri, Oct 16, 2015

Re: Flink+avro integration

2015-10-21 Thread Till Rohrmann
What was your problem with using Java POJOs with the Scala API? According to https://issues.apache.org/jira/browse/AVRO-1105, the progress on adding a Scala API to Avro is kind of stalling. Cheers, Till On Tue, Oct 20, 2015 at 9:06 PM, aawhitaker wrote: > One more follow up: > > There doesn't a

Re: Flink Data Stream Union

2015-10-21 Thread Till Rohrmann
Can it be that you forgot to call unionMessageStreams in your main method? Cheers, Till ​ On Wed, Oct 21, 2015 at 3:02 PM, flinkuser wrote: > Here is the strange behavior. > > Below code works in one box but not in the other. I had it working in my > laptop the whole of yesterday, but strangely

Re: Zeppelin Integration

2015-10-21 Thread Till Rohrmann
Hi Trevor, in order to use Zeppelin with a different Flink version in local mode, meaning that Zeppelin starts a LocalFlinkMiniCluster when executing your jobs, you have to build Zeppelin and change the flink.version property in the zeppelin/flink/pom.xml file to the version you want to use. If y

Re: Reading multiple datasets with one read operation

2015-10-22 Thread Till Rohrmann
Hi Pieter, at the moment there is no support for partitioning a `DataSet` into multiple subsets with one pass over it. If you really want to have distinct data sets for each path, then you have to filter, afaik. Cheers, Till On Thu, Oct 22, 2015 at 11:38 AM, Pieter Hameete wrote: > Good morning!
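A minimal Java sketch of that filter approach, assuming a line-oriented input where each record carries a path-like prefix (all paths and names are placeholders):

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;

    public class SplitByFilter {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // One read, two logical subsets: each filter consumes the same source;
            // the intermediate result is not materialized.
            DataSet<String> input = env.readTextFile("hdfs:///data/records");

            DataSet<String> setA = input.filter(line -> line.startsWith("/A/"));
            DataSet<String> setB = input.filter(line -> line.startsWith("/B/"));

            setA.writeAsText("hdfs:///out/a");
            setB.writeAsText("hdfs:///out/b");
            env.execute("split by filter");
        }
    }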

Re: Reading multiple datasets with one read operation

2015-10-22 Thread Till Rohrmann
I fear that the filter operations are not chained because there are at least two of them which have the same DataSet as input. However, it's true that the intermediate results are not materialized. It is also correct that the filter operators are deployed colocated to the data sources. Thus, there

Re: Multiple keys in reduceGroup ?

2015-10-22 Thread Till Rohrmann
If not, could you provide us with the program and test data to reproduce the error? Cheers, Till On Thu, Oct 22, 2015 at 12:34 PM, Aljoscha Krettek wrote: > Hi, > but he’s comparing it to a primitive long, so shouldn’t the Long key be > unboxed and the comparison still be valid? > > My question

Re: Multiple keys in reduceGroup ?

2015-10-22 Thread Till Rohrmann
modify my objects, why object reuse isn’t working? > > > > Best regards, > > Arnaud > > > > > > *From:* Till Rohrmann [mailto:trohrm...@apache.org] > *Sent:* Thursday, October 22, 2015 12:36 > *To:* user@flink.apache.org > *Subject:* Re: Multiple keys

Re: Zeppelin Integration

2015-10-22 Thread Till Rohrmann
or Grant > Data Scientist > https://github.com/rawkintrevo > http://stackexchange.com/users/3002022/rawkintrevo > > *"Fortunate is he, who is able to know the causes of things." -Virgil* > > > On Wed, Oct 21, 2015 at 11:57 AM, Till Rohrmann > wrote: > >&

Re: Flink+avro integration

2015-10-22 Thread Till Rohrmann
In the Java API, we only support the `max` operation for tuple types where you reference the fields via indices. Cheers, Till On Thu, Oct 22, 2015 at 4:04 PM, aawhitaker wrote: > Stephan Ewen wrote > > This is actually not a bug, or a POJO or Avro problem. It is simply a > > limitation in the f

Re: Zeppelin Integration

2015-11-04 Thread Till Rohrmann
-big-data-recipe-for-the-whole-family/ > Thanks Trevor for the great tutorial! > > On Thu, Oct 22, 2015 at 4:23 PM, Till Rohrmann > wrote: > >> Hi Trevor, >> >> that’s actually my bad since I only tested my branch against a remote >> cluster

Re: Published test artifacts for flink streaming

2015-11-06 Thread Till Rohrmann
Hi Nick, I think a flatMap operation which is instantiated with your list of predicates should do the job. Thus, there shouldn’t be a need to dig deeper than the DataStream for the first version. Cheers, Till ​ On Fri, Nov 6, 2015 at 3:58 AM, Nick Dimiduk wrote: > Thanks Stephan, I'll check th
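A sketch of such a predicate-driven flatMap, assuming a hypothetical SerializablePredicate interface for the user-supplied rules:

    import java.io.Serializable;
    import java.util.List;
    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.util.Collector;

    public class PredicateFlatMap implements FlatMapFunction<String, Tuple2<Integer, String>> {

        // Hypothetical predicate type; it must be serializable because it is shipped
        // inside the function's closure.
        public interface SerializablePredicate extends Serializable {
            boolean test(String value);
        }

        private final List<SerializablePredicate> predicates;

        public PredicateFlatMap(List<SerializablePredicate> predicates) {
            this.predicates = predicates;
        }

        @Override
        public void flatMap(String value, Collector<Tuple2<Integer, String>> out) {
            // Emit one (ruleIndex, value) record for every predicate that matches.
            for (int i = 0; i < predicates.size(); i++) {
                if (predicates.get(i).test(value)) {
                    out.collect(new Tuple2<>(i, value));
                }
            }
        }
    }

Applied as stream.flatMap(new PredicateFlatMap(rules)), the matching rule index travels with each record and can be used to route or filter records downstream.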

Re: Published test artifacts for flink streaming

2015-11-06 Thread Till Rohrmann
select and where operators from within such a flatMap? > > -n > > On Fri, Nov 6, 2015 at 6:19 AM, Till Rohrmann > wrote: > > Hi Nick, > > > > I think a flatMap operation which is instantiated with your list of > > predicates should do the job. Thus, ther

Re: YARN High Availability

2015-11-18 Thread Till Rohrmann
Hi Gwenhaël, do you have access to the yarn logs? Cheers, Till On Wed, Nov 18, 2015 at 5:55 PM, Gwenhael Pasquiers < gwenhael.pasqui...@ericsson.com> wrote: > Hello, > > > > We’re trying to set up high availability using an existing zookeeper > quorum already running in our Cloudera cluster. >

Re: YARN High Availability

2015-11-18 Thread Till Rohrmann
nother question : if I have multiple HA flink jobs, are there some points > to check in order to be sure that they won’t collide on hdfs or ZK ? > > > > B.R. > > > > Gwenhaël PASQUIERS > > > > *From:* Till Rohrmann [mailto:till.rohrm...@gmail.com] > *Sent:* merc

Re: Compiler Exception

2015-11-19 Thread Till Rohrmann
Hi Kien Truong, could you share the problematic code with us? Cheers, Till On Nov 18, 2015 9:54 PM, "Truong Duc Kien" wrote: > Hi, > > I'm hitting Compiler Exception with some of my data set, but not all of > them. > > Exception in thread "main" org.apache.flink.optimizer.CompilerException: > N

Re: YARN High Availability

2015-11-19 Thread Till Rohrmann
hat installation. > > On Thu, Nov 19, 2015 at 10:45 AM, Aljoscha Krettek > wrote: > >> I think we should find a way to randomize the paths where the HA stuff >> stores data. If users don’t realize that they store data in the same paths >> this could lead to problems. &

Re: YARN High Availability

2015-11-19 Thread Till Rohrmann
Thu, Nov 19, 2015, 11:24 Till Rohrmann wrote: > >> I agree that this would make the configuration easier. However, it >> entails also that the user has to retrieve the randomized path from the >> logs if he wants to restart jobs after the cluster has crashed or >> intention

Re: Kafka + confluent schema registry Avro parsing

2015-11-19 Thread Till Rohrmann
The constructor of Java classes after deserialization is not necessarily called. Thus, you should move the check if (this.decoder == null) { this.decoder = new KafkaAvroDecoder(vProps); } into the deserialize method of MyAvroDeserializer. Cheers, Till ​ On Thu, Nov 19, 2015 at 1:50 PM, Madh
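A condensed sketch of that fix, assuming the Confluent KafkaAvroDecoder from the thread and leaving out the surrounding DeserializationSchema plumbing; field and class names are placeholders:

    import java.io.Serializable;
    import io.confluent.kafka.serializers.KafkaAvroDecoder;
    import kafka.utils.VerifiableProperties;

    public class MyAvroDeserializer implements Serializable {

        private final VerifiableProperties vProps;   // assumed-serializable configuration
        private transient KafkaAvroDecoder decoder;  // rebuilt lazily on the task manager

        public MyAvroDeserializer(VerifiableProperties vProps) {
            this.vProps = vProps;
        }

        public Object deserialize(byte[] message) {
            // The constructor is not re-run after Flink ships this object to the cluster,
            // so the decoder is created on first use instead.
            if (this.decoder == null) {
                this.decoder = new KafkaAvroDecoder(vProps);
            }
            return this.decoder.fromBytes(message);
        }
    }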

Re: Compiler Exception

2015-11-20 Thread Till Rohrmann
alue]], >collector: Collector[Vertex[Long, Long]]) => { > if (first.hasNext) { > collector.collect(first.next) > } > } > } > } > } > println(data.collect()) > } > } > >

Re: YARN High Availability

2015-11-23 Thread Till Rohrmann
is essentially a shortcut for > configuring the root path. > >> > >> In any case, it is orthogonal to Till’s proposals. That one we need to > address as well (see FLINK-2929). The motivation for the current behaviour > was to be rather defensive when removing state in order to

Re: Adding TaskManager on Cluster

2015-11-24 Thread Till Rohrmann
Hi Welly, you can always start a new TaskManager by simply calling taskmanager.sh start [streaming|batch], depending on whether you are running a streaming cluster or a batch cluster. You can find the script in the bin/ directory of your Flink installation. Cheers, Till On Tue, Nov 24, 2015 at 10:27 AM, Welly Tambunan wrote: > What i'

Re: LDBC Graph Data into Flink

2015-11-24 Thread Till Rohrmann
Nice blog post Martin! On Tue, Nov 24, 2015 at 3:14 PM, Vasiliki Kalavri wrote: > Great, thanks for sharing Martin! > > On 24 November 2015 at 15:00, Martin Junghanns > wrote: > >> Hi, >> >> I wrote a short blog post about the ldbc-flink tool including a short >> overview of Flink and a Gelly e

Re: Standalone Cluster vs YARN

2015-11-25 Thread Till Rohrmann
Hi Welly, at the moment Flink only supports HA via ZooKeeper. However, there is no fundamental limitation preventing the use of another system. The only requirement is that this system allows you to reach a consensus among multiple participants and to retrieve the agreed-upon decision. If this is possible, then it can be integ

Re: key

2015-11-30 Thread Till Rohrmann
Hi Radu, if you want to use custom types as keys, then these custom types have to implement the Key interface. Cheers, Till On Mon, Nov 30, 2015 at 5:28 PM, Radu Tudoran wrote: > Hi, > > > > I want to apply a “keyBy operator on a stream”. > > The stream is of type MyEvent. This is a simple t
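If implementing the Key interface is too heavyweight, a commonly used alternative is to key the stream by one of the event's already key-capable fields through a KeySelector; a sketch with a placeholder MyEvent type:

    import org.apache.flink.api.java.functions.KeySelector;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.KeyedStream;

    public class KeyByCustomType {

        // Placeholder for the custom event type from the thread.
        public static class MyEvent {
            public String id;
            public double value;
            public MyEvent() { }
            public String getId() { return id; }
        }

        // Keying by a String field avoids having to turn MyEvent itself into a key type.
        public static KeyedStream<MyEvent, String> keyById(DataStream<MyEvent> events) {
            return events.keyBy(new KeySelector<MyEvent, String>() {
                @Override
                public String getKey(MyEvent event) {
                    return event.getId();
                }
            });
        }
    }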

Re: Working with protobuf wrappers

2015-12-01 Thread Till Rohrmann
Hi Kryzsztof, it's true that we once added the Protobuf serializer automatically. However, due to versioning conflicts (see https://issues.apache.org/jira/browse/FLINK-1635), we removed it again. Now you have to register the ProtobufSerializer manually: https://ci.apache.org/projects/flink/flink-d
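A minimal sketch of the manual registration, assuming the chill-protobuf ProtobufSerializer is on the classpath and MyProtoMessage stands in for a generated protobuf class:

    import com.twitter.chill.protobuf.ProtobufSerializer;
    import org.apache.flink.api.java.ExecutionEnvironment;

    public class RegisterProtobuf {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // Tell Kryo to serialize the generated protobuf type with the chill-protobuf
            // serializer instead of falling back to its default handling.
            // MyProtoMessage is a placeholder for your generated message class.
            env.getConfig().registerTypeWithKryoSerializer(
                    MyProtoMessage.class, ProtobufSerializer.class);

            // ... build and execute the job as usual ...
        }
    }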

Re: Question about flink message processing guarantee

2015-12-02 Thread Till Rohrmann
Just a small addition. Your sources have to be replayable to some extent. By replayable I mean that they can continue from some kind of offset. Otherwise the checkpointing won't help you. The Kafka source supports that, for example. Cheers, Till On Dec 1, 2015 11:55 PM, "Márton Balassi" wrote:

Re: Including option for starting job and task managers in the foreground

2015-12-02 Thread Till Rohrmann
Hi Brian, as far as I know this is at the moment not possible with our scripts. However it should be relatively easy to add by simply executing the Java command in flink-daemon.sh in the foreground. Do you want to add this? Cheers, Till On Dec 1, 2015 9:40 PM, "Brian Chhun" wrote: > Hi All, > >

Re: HA Mode and standalone containers compatibility ?

2015-12-03 Thread Till Rohrmann
Hi Arnaud, as long as you don't have HA activated for your batch jobs, HA shouldn't have an influence on the batch execution. If it interferes, then you should see additional task managers connected to the streaming cluster when you execute the batch job. Could you check that? Furthermore, could yo

Re: Using Kryo for serializing lambda closures

2015-12-08 Thread Till Rohrmann
Hi Nick, at the moment Flink uses Java serialization to ship the UDFs to the cluster. Therefore, the closures must only contain Serializable objects. The serializer registration only applies to the data which is processed by the Flink job. Thus, for the moment I would try to get rid of the ColumnI
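A sketch of the usual workaround: keep a non-serializable helper out of the closure by declaring it transient and creating it in open() on the worker. The helper class here is hypothetical.

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;

    public class EnrichMapper extends RichMapFunction<String, String> {

        // Hypothetical helper that is not Serializable and therefore must not be
        // part of the serialized closure.
        public static class NonSerializableHelper {
            public String resolve(String value) { return value.toUpperCase(); }
        }

        private transient NonSerializableHelper helper;

        @Override
        public void open(Configuration parameters) {
            // Runs on the task manager after the UDF has been deserialized there.
            helper = new NonSerializableHelper();
        }

        @Override
        public String map(String value) {
            return helper.resolve(value);
        }
    }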

Re: Using memory logging in Flink

2015-12-09 Thread Till Rohrmann
I assume you're looking in the taskmanager log file for the memory usage logging statements, right? Cheers, Till On Wed, Dec 9, 2015 at 12:15 AM, Filip Łęczycki wrote: > Hi, > > Thank you for your reply! > > I have made sure I restarted the TaskManager after changing config, but it > didn't res

Re: Using memory logging in Flink

2015-12-09 Thread Till Rohrmann
lse just used the memory logging with the exact described >> settings - it worked. >> >> There is probably some mixup, you may be looking into the wrong log file, >> or may setting the a value in a different config... >> >> Stephan >> >> >> On

Re: Problems with using ZipWithIndex

2015-12-14 Thread Till Rohrmann
I just tested the zipWithIndex method with Flink 0.10.1 and it worked. I used the following code: import org.apache.flink.api.scala._ import org.apache.flink.api.scala.utils._ object Job { def main(args: Array[String]): Unit = { val env = ExecutionEnvironment.getExecutionEnvironment v
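Till's test above is against the Scala API; for reference, the Java API exposes the same utility through DataSetUtils — a minimal sketch:

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.api.java.utils.DataSetUtils;

    public class ZipWithIndexJava {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
            DataSet<String> input = env.fromElements("a", "b", "c");

            // Assigns a unique, consecutive Long index to every element.
            DataSet<Tuple2<Long, String>> indexed = DataSetUtils.zipWithIndex(input);
            indexed.print();
        }
    }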

Re: global watermark across multiple kafka consumers

2015-12-16 Thread Till Rohrmann
Hi Andrew, as far as I know, there is no prescribed way of handling this kind of situation. If you want to synchronize watermark generation across a set of KafkaConsumers, you need some kind of ground truth. This could be, for example, a central registry such as ZooKeeper in whic

Re: Problem to show logs in task managers

2015-12-17 Thread Till Rohrmann
Hi Ana, you can simply modify the `log4j.properties` file in the `conf` directory. It should be automatically included in the Yarn application. Concerning your logging problem, it might be that you have set the logging level too high. Could you share the code with us? Cheers, Till On Thu, Dec 1

Re: Problem to show logs in task managers

2015-12-18 Thread Till Rohrmann
Serializable { > private static final long serialVersionUID = -2932037991574118651L; > > static Logger loggerTestClass = > LoggerFactory.getLogger("WordCountExample.TestClass"); > > List integerList; > public TestClass(List integerList

Re: Configure log4j with XML files

2015-12-21 Thread Till Rohrmann
Hi Gwenhaël, as far as I know, there is no direct way to do so. You can either adapt the flink-daemon.sh script in line 68 to use a different configuration or you can test whether the dynamic property -Dlog4j.configurationFile:CONFIG_FILE overrides the -Dlog4j.configuration property. You can set th

Re: Monitoring single-run job statistics

2016-01-04 Thread Till Rohrmann
Hi Filip, at the moment it is not possible to retrieve the job statistics after the job has finished with flink run -m yarn-cluster. The reason is that the YARN cluster is only alive as long as the job is executed. Thus, I would recommend you to execute your jobs with a long running Flink cluster

Re: 2015: A Year in Review for Apache Flink

2016-01-04 Thread Till Rohrmann
Happy New Year :-) Hope everyone had a great start into the new year. On Thu, Dec 31, 2015 at 12:57 PM, Slim Baltagi wrote: > Happy New Year to you and your families! > Let’s make 2016 the year of Flink: General Availability, faster growth, > wider industry adoption, … > Slim Baltagi > Chicago,

Re: Scala API and sources with timestamp

2016-01-04 Thread Till Rohrmann
Hi Don, yes that's exactly how you use an anonymous function as a source function. Cheers, Till On Tue, Dec 22, 2015 at 3:16 PM, Don Frascuchon wrote: > Hello, > > There is a way for define a EventTimeSourceFunction with anonymous > functions from the scala api? Like that: > > env.addSource

Re: Problem to show logs in task managers

2016-01-04 Thread Till Rohrmann
am not restarting the Flink > JobManager and TaskManagers as I should… Any idea? > > Thanks, > Ana > > On 18 Dec 2015, at 16:29, Till Rohrmann wrote: > > In which log file are you exactly looking for the logging statements? And > on what machine? You have to look on the mach

Re: kafka integration issue

2016-01-05 Thread Till Rohrmann
Hi Alex, this is a bug in the `0.10` release. Is it possible for you to switch to version `1.0-SNAPSHOT`? With this version, the error should no longer occur. Cheers, Till On Tue, Jan 5, 2016 at 1:31 AM, Alex Rovner wrote: > Hello Flinkers! > > The below program produces the following error wh

Re: flink kafka scala error

2016-01-06 Thread Till Rohrmann
Hi Madhukar, could you check whether your Flink installation contains the flink-dist-0.10.1.jar in the lib folder? This file contains the necessary scala-library.jar which you are missing. You can also remove the line org.scala-lang:scala-library which excludes the scala-library dependency to be i

Re: Problem to show logs in task managers

2016-01-08 Thread Till Rohrmann
2250761414_0005 finished with state > FINISHED and final state SUCCEEDED at 1452259876445 > 13:31:16,747 INFO org.apache.flink.yarn.FlinkYarnCluster > - YARN Client is shutting down > > > You can see the log messages from the WordCountExample and TestClass > classes. But I

Re: Problem to show logs in task managers

2016-01-11 Thread Till Rohrmann
; tell me how to check if there are some Yarn containers running after the > Flink job has finished? I have tried: > hadoop job -list > but I cannot see any jobs there, although I am not sure that it means that > there are not containers running... > > Thanks, > Ana > > On

Re: Machine Learning on Apache Fink

2016-01-11 Thread Till Rohrmann
Hi Ashutosh, Flink’s ML library FlinkML may not be as extensive as Spark’s MLlib. However, Flink has native support for iterations, which makes them blazingly fast. Iterations in Flink are a distinct operator, so they don’t have to communicate with the client after each iteration. Furthermore

Re: Problem to show logs in task managers

2016-01-11 Thread Till Rohrmann
onsidered? (I > have also tried with ./bin/yarn-session.sh and then ./bin/flink run without > success…). > > I am not sure if this is related to flink anymore, should I move my > problem to the yarn community instead? > > Thanks, > Ana > > On 11 Jan 2016, at 10:37, Till Rohr

Re: eigenvalue solver

2016-01-12 Thread Till Rohrmann
Hi Lydia, there is no Eigenvalue solver included in FlinkML yet. But if you want to, then you can give it a try :-) [1] http://www.cs.newpaltz.edu/~lik/publications/Ruixuan-Li-CCPE-2015.pdf Cheers, Till On Tue, Jan 12, 2016 at 9:47 AM, Lydia Ickler wrote: > Hi, > > I wanted to know if there a

Re: eigenvalue solver

2016-01-12 Thread Till Rohrmann
t from? > Best regards, > Lydia > > > Von meinem iPhone gesendet > Am 12.01.2016 um 10:46 schrieb Till Rohrmann : > > Hi Lydia, > > there is no Eigenvalue solver included in FlinkML yet. But if you want to, > then you can give it a try :-) > > [1] http://w

Re: Exception using flink-connector-elasticsearch

2016-01-12 Thread Till Rohrmann
Hi Javier, it seems as if you are either missing the lucene-codecs jar in your classpath or have a wrong version (not 4.10.4). Could you check in your job jar whether it includes lucene-codecs? If so, could you run mvn dependency:tree in the root directory of your project? There you shou

Re: flink 1.0-SNAPSHOT scala 2.11 compilation error

2016-01-15 Thread Till Rohrmann
Hi David, this is definitely an error on our side which might be caused by the latest changes to the project structure (removing flink-staging directory). I’ve filed a JIRA issue https://issues.apache.org/jira/browse/FLINK-3241. It should be fixed soon. In the meantime it should work if you build

Re: akka.pattern.AskTimeoutException

2016-01-15 Thread Till Rohrmann
You can set Flink’s log level to DEBUG in the log4j.properties file. Furthermore, you can activate logging of Akka’s life cycle events via akka.log.lifecycle.events: true which you specify in flink-conf.yaml. Cheers, Till ​ On Fri, Jan 15, 2016 at 12:41 PM, Frederick Ayala wrote: > Hi Stephan,

Re: InvalidTypesException - Input mismatch: Basic type 'Integer' expected but was 'Long'

2016-01-18 Thread Till Rohrmann
Hi Biplob, which version of Flink are you using? With version 1.0-SNAPSHOT, I cannot reproduce your problem. Cheers, Till ​ On Sun, Jan 17, 2016 at 4:56 PM, Biplob Biswas wrote: > Hi, > > I am getting the following exception when i am using the map function > > Exception in thread "main" >> or

Re: InvalidTypesException - Input mismatch: Basic type 'Integer' expected but was 'Long'

2016-01-18 Thread Till Rohrmann
wrong it corresponds to the > 1.0-Snapshot you mentioned. > > [image: Inline image 1] > > If wrong, please suggest what should I do to fix it. > > Thanks & Regards > Biplob Biswas > > On Mon, Jan 18, 2016 at 11:23 AM, Till Rohrmann > wrote: > >> Hi B

Re: DataSet in Streaming application under Flink

2016-01-19 Thread Till Rohrmann
Hi Sylvain, what you could do for example is to load a static data set, e.g. from HDFS, in the open method of your comparator and cache it there. The open method is called for each task once when it is created. The comparator could then be a RichMapFunction implementation. By making the field stor
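A sketch of that pattern, assuming a line-oriented reference file on HDFS (path and types are placeholders):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.util.HashSet;
    import java.util.Set;
    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.core.fs.FileSystem;
    import org.apache.flink.core.fs.Path;

    public class CompareAgainstStaticSet extends RichMapFunction<String, Boolean> {

        private transient Set<String> reference;

        @Override
        public void open(Configuration parameters) throws Exception {
            // Read the static data set once per task instance and keep it in memory.
            reference = new HashSet<>();
            Path path = new Path("hdfs:///data/reference.txt");
            FileSystem fs = path.getFileSystem();
            try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(fs.open(path)))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    reference.add(line);
                }
            }
        }

        @Override
        public Boolean map(String value) {
            // Cheap lookup against the cached reference data for every element.
            return reference.contains(value);
        }
    }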

Re: Cannot start FlinkMiniCluster.WebServer using different port in FlinkMiniCluster

2016-01-20 Thread Till Rohrmann
It seems that the web server could not be instantiated. The reason for this problem should be in your logs. Could you look it up and post the reason here? Additionally, we should build in a sanity check to avoid the NPE. Cheers, Till On Wed, Jan 20, 2016 at 5:06 PM, HungChang wrote: > The or

Re: Unexpected out of bounds error in UnilateralSortMerger

2016-01-20 Thread Till Rohrmann
You could change the version of Stephan’s branch via mvn versions:set -DnewVersion=MyCustomBuildVersion and then mvn versions:commit. Now after you install the Flink binaries you can reference them in your project by setting the version of your Flink dependencies to MyCustomBuildVersion. That way,

Re: Cannot start FlinkMiniCluster.WebServer using different port in FlinkMiniCluster

2016-01-20 Thread Till Rohrmann
I guess it’s easiest to simply enable logging and see what the problem is. If you run it from the IDE then you can also set a breakpoint in WebMonitorUtils.startWebRuntimeMonitor and see what the exception is. Cheers, Till ​ On Wed, Jan 20, 2016 at 6:04 PM, HungChang wrote: > Yea I'm wondering

Re: Reading Binary Data (Matrix) with Flink

2016-01-20 Thread Till Rohrmann
With readHadoopFile you can use all of Hadoop’s FileInputFormats and thus you can also do everything with Flink, what you can do with Hadoop. Simply take the same Hadoop FileInputFormat which you would take for your MapReduce job. Cheers, Till ​ On Wed, Jan 20, 2016 at 3:16 PM, Saliya Ekanayake
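A hedged sketch of that, assuming a SequenceFile of IntWritable/Text pairs, Flink's Hadoop compatibility on the classpath, and the readHadoopFile helper on the ExecutionEnvironment of that Flink generation (path and types are placeholders):

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.SequenceFileInputFormat;

    public class ReadHadoopFileSketch {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // The same Hadoop FileInputFormat you would hand to a MapReduce job;
            // every record arrives as a Tuple2<key, value>.
            DataSet<Tuple2<IntWritable, Text>> input = env.readHadoopFile(
                    new SequenceFileInputFormat<IntWritable, Text>(),
                    IntWritable.class, Text.class, "hdfs:///data/matrix.seq");

            input.first(10).print();
        }
    }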

Re: Cannot start FlinkMiniCluster.WebServer using different port in FlinkMiniCluster

2016-01-21 Thread Till Rohrmann
leaderRetrievalService will retrieve the leading JobManager. Take a look at LeaderRetrievalUtils in order to see how it is created and what options are supported. actorSystem is the ActorSystem which is used to resolve the leader’s Akka URL into an ActorRef. You can simply create one or use an exis

Re: Cannot start FlinkMiniCluster.WebServer using different port in FlinkMiniCluster

2016-01-21 Thread Till Rohrmann
Could you add flink-runtime-web to your dependencies of your project? It seems as if it is missing in your project. Cheers, Till ​ On Thu, Jan 21, 2016 at 4:45 PM, HungChang wrote: > The following message is obtained after putting > BasicConfigurator.configure() > in main(); > But I don't under

Re: Cannot start FlinkMiniCluster.WebServer using different port in FlinkMiniCluster

2016-01-21 Thread Till Rohrmann
Great to hear :-) On Thu, Jan 21, 2016 at 4:55 PM, HungChang wrote: > After adding the dependency it totally works! Thank you a lot! > > > > -- > View this message in context: > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Cannot-start-FlinkMiniCluster-WebServer-using-diff

Re: rowmatrix equivalent

2016-01-24 Thread Till Rohrmann
Hi Lydia, Flink does not come with a distributed matrix implementation as Spark does it with the RowMatrix, yet. However, you can easily implement it yourself. This would also be a good contribution to the project if you want to tackle the problem Cheers, Till On Sun, Jan 24, 2016 at 4:03 PM, Ly

Re: MatrixMultiplication

2016-01-25 Thread Till Rohrmann
Hi Lydia, Since matrix multiplication is O(n^3), I would assume that it would simply take 1000 times longer than the multiplication of the 100 x 100 matrix. Have you waited so long to see whether it completes or is there another problem? Cheers, Till On Mon, Jan 25, 2016 at 2:13 PM, Lydia Ickler

Re: continous time triger

2016-01-26 Thread Till Rohrmann
Hi Radu, you can register processing-time and event-time timers using the TriggerContext which is given to the onElement, onProcessingTime and onEventTime methods of Trigger. In case you register a processing-time timer, the onProcessingTime method will be called once the system clock has passed
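A rough outline of such a trigger, written against the 1.0-style Trigger API; the exact signatures have shifted between releases, so treat this as a sketch rather than drop-in code:

    import org.apache.flink.streaming.api.windowing.triggers.Trigger;
    import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
    import org.apache.flink.streaming.api.windowing.windows.Window;

    // Fires the window `interval` milliseconds (processing time) after each element.
    public class PeriodicProcessingTimeTrigger<W extends Window> extends Trigger<Object, W> {

        private final long interval;

        public PeriodicProcessingTimeTrigger(long interval) {
            this.interval = interval;
        }

        @Override
        public TriggerResult onElement(Object element, long timestamp, W window, TriggerContext ctx) {
            // Register a processing-time timer; onProcessingTime is called when it fires.
            ctx.registerProcessingTimeTimer(System.currentTimeMillis() + interval);
            return TriggerResult.CONTINUE;
        }

        @Override
        public TriggerResult onProcessingTime(long time, W window, TriggerContext ctx) {
            return TriggerResult.FIRE;
        }

        @Override
        public TriggerResult onEventTime(long time, W window, TriggerContext ctx) {
            // No event-time timers are registered in this sketch.
            return TriggerResult.CONTINUE;
        }
    }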

Re: maxtime / watermark for GlobaWindow

2016-01-26 Thread Till Rohrmann
Hi Radu, If I’m not mistaken, then it’s not possible with the current GlobalWindow implementation. However, you could extend the GlobalWindow which adds a new field into which the timestamp of the triggering element is stored. This field can then be read from within the WindowFunction to retrieve

Re: Task Manager metrics per job on Flink 0.9.1

2016-01-27 Thread Till Rohrmann
Hi Pieter, you're right that it would be nice to record the metrics for a later analysis. However, at the moment this is not supported. You could use the REST interface to obtain the JSON representation of the shown data in the web interface. By doing this repeatedly and parsing the metric data yo

Re: Imbalanced workload between workers

2016-01-27 Thread Till Rohrmann
Could it be that your data is skewed? This could lead to different loads on different task managers. With the latest Flink version, the web interface should show you how many bytes each operator has written and received. There you could see if one operator receives more elements than the others.

Re: about blob.storage.dir and .buffer files

2016-01-28 Thread Till Rohrmann
Hi Gwenhael, in theory the blob storage files can be any binary data. At the moment, however, this is only used to distribute the user code jars. The jars are kept around as long as the job is running. Every library-cache-manager.cleanup.interval, the files are checked and those which are n

Re: cluster execution

2016-01-28 Thread Till Rohrmann
Hi Lydia, what do you mean by master? Usually, when you submit a program to the cluster and don’t specify the parallelism in your program, it will be executed with the parallelism.default value as its parallelism. You can specify the value in your cluster configuration file flink-conf.yaml. Al

Re: Issue on watermark meaning

2016-01-29 Thread Till Rohrmann
Hi Lorenzo, you're right that we should stick to the same terminology between the online documentation and the code, otherwise it's confusing. In this case, though, a lower numeric timestamp is equivalent to an older event. The older an element is, the lower its timestamp. However, there is a

Re: Left join with unbalanced dataset

2016-01-31 Thread Till Rohrmann
Hi Arnaud, the unmatched elements of A will only end up on the same worker node if they all share the same key. Otherwise, they will be evenly spread out across your cluster. However, I would also recommend you to use Flink's leftOuterJoin. Cheers, Till On Sun, Jan 31, 2016 at 5:27 AM, Chiwan Pa

Re: cluster execution

2016-02-01 Thread Till Rohrmann
not be created. > at > org.apache.flink.api.common.io.FileOutputFormat.initializeGlobal(FileOutputFormat.java:295) > at > org.apache.flink.runtime.jobgraph.OutputFormatVertex.initializeOnMaster(OutputFormatVertex.java:84) > at > org.apache.flink.runtime.jobmanager.JobManager

Re: join with no element appearing in multiple join-pairs

2016-02-01 Thread Till Rohrmann
Hi Fridtjof, I might miss something, but can’t you assign the ids once before starting the iteration and then reuse them throughout the iterations? Of course you would have to add another field to your input data but then you don’t have to run the zipWithIndex for every iteration. Cheers, Till ​

Re: join with no element appearing in multiple join-pairs

2016-02-01 Thread Till Rohrmann
are 0 and 2, resulting in equal joins > flags. Sequential elements always have to have alternating flags, which > gets violated here. > > Best > Fridtjof > > Am 01.02.16 um 12:26 schrieb Till Rohrmann: > > Hi Fridtjof, > > I might miss something, but can’t you assign t

Re: Change #TaskSlots in web interface

2016-02-02 Thread Till Rohrmann
Hi Sendoh, in order to change the configuration you have to modify `conf/flink-conf.yaml` and then restart the cluster. Cheers, Till On Tue, Feb 2, 2016 at 10:14 AM, HungChang wrote: > Hi, > > I remember there is a web interface (port: 6XXX) that can change > configuration of Job Manager. > e.

Re: Understanding code of CountTrigger

2016-02-03 Thread Till Rohrmann
Hi Nirmalya, the CountTrigger always works together with the CountEvictor which will make sure that only count elements are kept in the window. Evictors can evict elements from the window after the trigger event. That is the reason why the CountTrigger does not have to purge the window explicitly.
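A minimal Java sketch of that CountTrigger/CountEvictor pairing on a global window (the numbers and the keying are arbitrary):

    import org.apache.flink.api.java.functions.KeySelector;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.windowing.assigners.GlobalWindows;
    import org.apache.flink.streaming.api.windowing.evictors.CountEvictor;
    import org.apache.flink.streaming.api.windowing.triggers.CountTrigger;

    public class CountTriggerEvictorSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            DataStream<Long> values = env.generateSequence(0, 9999);

            // The trigger fires every 100 elements; the evictor keeps only the newest 100
            // in the (never purged) global window, which is why the trigger itself
            // does not have to purge anything.
            values.keyBy(new KeySelector<Long, Long>() {
                        @Override
                        public Long getKey(Long value) {
                            return value % 10;
                        }
                    })
                  .window(GlobalWindows.create())
                  .trigger(CountTrigger.of(100))
                  .evictor(CountEvictor.of(100))
                  .reduce((a, b) -> a + b)
                  .print();

            env.execute("count trigger and evictor");
        }
    }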

Re: Distribution of sinks among the nodes

2016-02-03 Thread Till Rohrmann
Hi Gwenhäel, if you set the number of slots for each TaskManager to 4, then all of your mappers will be evenly spread out. The sources should also be evenly spread out. However, for the sinks, since they depend on all mappers, it will most likely be random where they are deployed. So you might end u

Re: Checkpoints and event ordering

2016-02-04 Thread Till Rohrmann
Hi Shikhar, the currently open windows are also part of the operator state. Whenever a window operator receives a barrier it will checkpoint the state of the user function and additionally all uncompleted windows. This also means that the window operator does not buffer the barriers. Once it has t

Re: FlinkML 0.10.1 - Using SparseVectors with MLR does not work

2016-02-04 Thread Till Rohrmann
Hi Sourigna, it turned out to be a bug in the GradientDescent implementation which cannot handle sparse gradients. That is not so problematic by itself, because the sum of gradient vectors is usually dense even if the individual gradient vectors are sparse. We simply forgot to initialize the initi

Re: release of task slot

2016-02-04 Thread Till Rohrmann
Hi Radu, what does the log of the TaskManager 10.204.62.80:57910 say? Cheers, Till ​ On Wed, Feb 3, 2016 at 6:00 PM, Radu Tudoran wrote: > Hello, > > > > > > I am facing an error which for which I cannot figure the cause. Any idea > what could cause such an error? > > > > > > > > java.lang.Exc

Re: DistributedMatrix in Flink

2016-02-04 Thread Till Rohrmann
Hi Lydia, Spark and Flink are not identical. Thus, you’ll find concepts in both systems which won’t have a corresponding counterpart in the other system. For example, rows.context.broadcast(v) broadcasts the value v so that you can use it on all Executors. Flink follows a slightly different concept whe
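The DataSet-API counterpart Till is most likely pointing to is a broadcast set; a minimal Java sketch (values and names are placeholders):

    import java.util.List;
    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.configuration.Configuration;

    public class BroadcastSetSketch {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            DataSet<Double> v = env.fromElements(0.5, 1.5, 2.5);    // the value to broadcast
            DataSet<Double> rows = env.fromElements(1.0, 2.0, 3.0);

            rows.map(new RichMapFunction<Double, Double>() {
                    private List<Double> broadcastV;

                    @Override
                    public void open(Configuration parameters) {
                        // Every parallel task instance sees the full broadcast set.
                        broadcastV = getRuntimeContext().getBroadcastVariable("v");
                    }

                    @Override
                    public Double map(Double row) {
                        return row * broadcastV.get(0);
                    }
                })
                .withBroadcastSet(v, "v")
                .print();
        }
    }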

Re: release of task slot

2016-02-05 Thread Till Rohrmann
in task resource cleanup >>>>>> >>>>>> java.lang.OutOfMemoryError: GC overhead limit exceeded >>>>>> >>>>>> 11:21:55,160 ERROR >>>>>> org.apache.flink.runtime.taskmanager.Task >>>>>> - FATAL -

Re: Error while reading binary file

2016-02-08 Thread Till Rohrmann
Hi Saliya, in order to set the file path for the SerializedInputFormat you first have to create it and then explicitly call setFilePath. final SerializedInputFormat inputFormat = new SerializedInputFormat(); inputFormat.setFilePath(PATH_TO_FILE); env.createInput(inputFormat, myTypeInfo); Cheers
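Completing that into a sketch: MyRecord is a placeholder, and SerializedInputFormat expects a type implementing Flink's IOReadableWritable.

    import org.apache.flink.api.common.io.SerializedInputFormat;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.typeutils.TypeExtractor;

    public class SerializedInputSketch {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // Create the format first, then set the path explicitly.
            final SerializedInputFormat<MyRecord> inputFormat = new SerializedInputFormat<>();
            inputFormat.setFilePath("hdfs:///data/points.bin");

            TypeInformation<MyRecord> typeInfo = TypeExtractor.getForClass(MyRecord.class);
            DataSet<MyRecord> data = env.createInput(inputFormat, typeInfo);

            data.first(5).print();
        }
    }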

Re: Quick question about enableObjectReuse()

2016-02-09 Thread Till Rohrmann
Yes, you're right Arnaud. Cheers, Till On Tue, Feb 9, 2016 at 10:42 AM, LINZ, Arnaud wrote: > Hi, > > > > I just want to be sure : when I set enableObjectReuse, I don’t need to > create copies of objects that I get as input and return as output but which > I don’t keep inside my user function ?

Re: Join two Datasets --> InvalidProgramException

2016-02-09 Thread Till Rohrmann
Could you share the code for your types SourceA and SourceB? It seems as if Flink does not recognize them as POJOs because it assigned them the GenericType type. Either there is something wrong with the type extractor or your implementation does not fulfil the requirements for POJOs, as indicate
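For reference, a sketch of the shape the type extractor expects before it treats a class as a POJO instead of a GenericType (field names are illustrative):

    // Public class, public no-argument constructor, and every field either public
    // or reachable through conventional getters/setters.
    public class SourceA {

        public Integer sessionId;     // public field: fine as-is

        private String username;      // private field: needs getter and setter below

        public SourceA() { }          // mandatory public no-arg constructor

        public String getUsername() {
            return username;
        }

        public void setUsername(String username) {
            this.username = username;
        }
    }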

Re: Join two Datasets --> InvalidProgramException

2016-02-09 Thread Till Rohrmann
nds Parent{ > private Integer id; > private String username; > > public SourceB () { > super(); > } > //GETTER & SETTER > > } > > Am 09.02.2016 um 12:06 schrieb Till Rohrmann: > > Could you share the code for your types SourceA and SourceB. It seem

Re: Join two Datasets --> InvalidProgramException

2016-02-09 Thread Till Rohrmann
).equalTo("sessionId").print(); > > Thanks a lot! > Dominique > > > Am 09.02.2016 um 14:36 schrieb Till Rohrmann: > > Could you post the complete example code (Flink job including the type > definitions). For example, if the data sets are of type DataSet, > then

Re: Merge or minus Dataset API missing

2016-02-12 Thread Till Rohrmann
Why don’t you simply use a fullOuterJoin to do that? Cheers, Till ​ On Fri, Feb 12, 2016 at 4:48 PM, Flavio Pompermaier wrote: > Hi to all, > > I have a use case where I have to merge 2 datasets but I can't find a > direct dataset API to do that. > I want to execute some function when there's a
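A minimal Java sketch of the fullOuterJoin-based merge; the null-handling branches are where the "missing on one side" logic would go:

    import org.apache.flink.api.common.functions.JoinFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.tuple.Tuple2;

    public class FullOuterJoinMerge {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            DataSet<Tuple2<Integer, String>> left = env.fromElements(
                    new Tuple2<>(1, "a"), new Tuple2<>(2, "b"));
            DataSet<Tuple2<Integer, String>> right = env.fromElements(
                    new Tuple2<>(2, "B"), new Tuple2<>(3, "C"));

            left.fullOuterJoin(right)
                .where(0)
                .equalTo(0)
                .with(new JoinFunction<Tuple2<Integer, String>, Tuple2<Integer, String>, String>() {
                    @Override
                    public String join(Tuple2<Integer, String> l, Tuple2<Integer, String> r) {
                        // Exactly one side is null for unmatched keys.
                        if (l == null) { return "only right: " + r.f1; }
                        if (r == null) { return "only left: " + l.f1; }
                        return "both: " + l.f1 + "/" + r.f1;
                    }
                })
                .print();
        }
    }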

Re: Regarding Concurrent Modification Exception

2016-02-15 Thread Till Rohrmann
But isn't that a normal stack trace which you see when you submit a job to the cluster via the CLI and somewhere in the compilation process something fails? Anyway, it would be helpful to see the program which causes this problem. Cheers, Till On Mon, Feb 15, 2016 at 12:25 PM, Fabian Hueske wro

Re: Kyro Intermittent Exception for Large Data

2016-02-19 Thread Till Rohrmann
Thanks for the pointer Ken. As far as I know, we’re using the StdInstantiatorStrategy as the fallback instantiator strategy for our Kryo instances. Cheers, Till ​ On Fri, Feb 19, 2016 at 12:39 AM, Ken Krugler wrote: > I've seen this type of error when using Kryo with a Cascading scheme >

Re: Finding the average temperature

2016-02-21 Thread Till Rohrmann
Hi Nirmalya, if you want to calculate the running average over all measurements independent of the probe ID, then you cannot parallelize the computation. In this case you have to use a global window. Cheers, Till On Feb 19, 2016 6:30 PM, "Nirmalya Sengupta" wrote: > Hello Aljoscha , > > My sin

Re: Flink packaging makes life hard for SBT fat jar's

2016-02-22 Thread Till Rohrmann
Hi Shikhar, I just wanted to let you know that we've found the problem with the failing assembly plugin. It was caused by incompatible classes [1, 2]. Once these PRs are merged, the merge problems should be resolved. By the way, we've also added now a SBT template for Flink projects using giter8

Re: Batch Processing Fault Tolerance (DataSet API)

2016-02-22 Thread Till Rohrmann
Hi Ovidiu, at the moment Flink's batch fault tolerance restarts the whole job in case of a failure. However, parts of the logic to do partial backtracking such as intermediate result partitions and the backtracking algorithm are already implemented or exist as a PR [1]. So we hope to complete the
