Monitoring REST API

2016-12-21 Thread Lydia Ickler
Hi all, I have a question regarding the Monitoring REST API: I want to analyze the behavior of my program with regard to I/O MiB/s, Network MiB/s and CPU %, as the authors of this paper did (https://hal.inria.fr/hal-01347638v2/document). From the

Monitoring Flink on Yarn

2016-12-19 Thread Lydia Ickler
Hi all, I am using Flink 1.1.3 on Yarn and I wanted to ask how I can save the monitoring logs, e.g. for I/O or network, to HDFS or local FS? Since Yarn closes the Flink session after finishing the job I can't access the log via REST API. I am looking forward to your answer! Best regards, Lydia

multiple k-means in parallel

2016-11-27 Thread Lydia Ickler
Hi, I want to run k-means with different k in parallel, so each worker should calculate its own k-means. Is that possible? If I do a map on a list of integers and then apply k-means, I get the following error: "Task not serializable". I am looking forward to your answers! Lydia

Write matrix/vector

2016-05-29 Thread Lydia Ickler
Hi, I would like to know how to write a Matrix or Vector (Dense/Sparse) to file? Thanks in advance! Best regards, Lydia

sparse matrix

2016-05-29 Thread Lydia Ickler
Hi all, I have two questions regarding sparse matrices: 1. I have a sparse Matrix: val sparseMatrix = SparseMatrix.fromCOO(row, col, csvInput.collect()) and now I would like to extract all values that are in a specific row X. How would I tackle that? flatMap() and filter() do not seem to be sup

Re: Scatter-Gather Iteration aggregators

2016-05-13 Thread Lydia Ickler
e getPreviousIterationAggregate() > method. > Let me know if that clears things up! > > -Vasia. > > On 13 May 2016 at 08:57, Lydia Ickler <ickle...@googlemail.com> wrote: > Hi Vasia, > > yes, but only independently within each Function or not? > >

Re: Scatter-Gather Iteration aggregators

2016-05-12 Thread Lydia Ickler
016 at 08:04, Vasiliki Kalavri wrote: > > Hi Lydia, > > registered aggregators through the ScatterGatherConfiguration are accessible > both in the VertexUpdateFunction and in the MessageFunction. > > Cheers, > -Vasia. > > On 12 May 2016 at 20:08, Lydia Ickler

Scatter-Gather Iteration aggregators

2016-05-12 Thread Lydia Ickler
Hi, I have a question regarding the Aggregators of a Scatter-Gather Iteration. Is it possible to have a global aggregator that is accessible in VertexUpdateFunction() and MessagingFunction() at the same time? Thanks in advance, Lydia

normalize vertex values

2016-05-12 Thread Lydia Ickler
Hi all, If I have a Graph g: Graph g and I would like to normalize all vertex values by the absolute max of all vertex values -> what API function would I choose? Thanks in advance! Lydia
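The arithmetic behind this question can be sketched in plain Java, without any Flink or Gelly dependency. The class name `NormalizeSketch`, the method `normalizeByAbsMax`, and the use of a `Map` from vertex ID to value are assumptions made purely for illustration; in Gelly the same two steps would presumably be a reduction over the vertex values followed by a per-vertex map, but that is not shown here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a graph's vertex values: vertex ID -> value.
class NormalizeSketch {

    /** Divides every value by the largest absolute value found in the map. */
    static Map<Long, Double> normalizeByAbsMax(Map<Long, Double> vertexValues) {
        // Step 1: find the absolute maximum over all vertex values.
        double absMax = 0.0;
        for (double v : vertexValues.values()) {
            absMax = Math.max(absMax, Math.abs(v));
        }
        // Step 2: divide each value by that maximum.
        Map<Long, Double> result = new HashMap<>();
        for (Map.Entry<Long, Double> e : vertexValues.entrySet()) {
            // If every value is zero, keep the values as they are to avoid dividing by zero.
            double normalized = absMax == 0.0 ? e.getValue() : e.getValue() / absMax;
            result.put(e.getKey(), normalized);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Long, Double> values = new HashMap<>();
        values.put(1L, 2.0);
        values.put(2L, -4.0);
        values.put(3L, 1.0);
        System.out.println(normalizeByAbsMax(values));
    }
}
```

Note that the maximum has to be known before any value can be divided, so the computation is inherently two passes over the data.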

Re: Find differences

2016-04-07 Thread Lydia Ickler
Never mind! I figured it out with groupBy and reduceGroup. Sent from my iPhone > On 07.04.2016 at 11:51, Lydia Ickler wrote: > > Hi, > > If I have 2 DataSets A and B of type Tuple3, how would > I get a subset of A (based on the fields (0,1)) that does not occur in B

Find differences

2016-04-07 Thread Lydia Ickler
Hi, If I have 2 DataSets A and B of type Tuple3, how would I get a subset of A (based on the fields (0,1)) that does not occur in B? Is there maybe an already implemented method? Best regards, Lydia Sent from my iPhone
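What is being asked for is an anti-join on the first two fields. The following plain-Java sketch shows only that logic on in-memory lists; it does not use the Flink API. The class `AntiJoinSketch`, the method `notInB`, and the representation of a Tuple3 as an `Object[]` of length 3 are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory version of "rows of A whose (field 0, field 1) key is absent from B".
class AntiJoinSketch {

    /** Returns the rows of a whose first two fields do not occur together in any row of b. */
    static List<Object[]> notInB(List<Object[]> a, List<Object[]> b) {
        // Collect the composite keys (field 0, field 1) of B.
        Set<List<Object>> keysOfB = new HashSet<>();
        for (Object[] row : b) {
            keysOfB.add(Arrays.asList(row[0], row[1]));
        }
        // Keep only those rows of A whose key was not seen in B.
        List<Object[]> result = new ArrayList<>();
        for (Object[] row : a) {
            if (!keysOfB.contains(Arrays.asList(row[0], row[1]))) {
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object[]> a = new ArrayList<>();
        a.add(new Object[] {1, 2, "x"});
        a.add(new Object[] {3, 4, "y"});
        List<Object[]> b = new ArrayList<>();
        b.add(new Object[] {1, 2, "z"});
        for (Object[] row : notInB(a, b)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

The third field plays no role in the comparison, so a row of A is dropped even when the matching row of B carries a different third value.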

varying results: local VS cluster

2016-04-04 Thread Lydia Ickler
Hi all, I have an issue regarding execution on 1 machine VS 5 machines. If I execute the following code the results are not the same though I would expect them to be since the input file is the same. Do you have any suggestions? Thanks in advance! Lydia ExecutionEnvironment env = ExecutionEnviro

Re: wait until BulkIteration finishes

2016-03-31 Thread Lydia Ickler
> starts, by overriding "open()" and "close()" from the RichFunction interface. > > Stephan > > > On Thu, Mar 31, 2016 at 4:45 PM, Till Rohrmann <trohrm...@apache.org> wrote: > I think I don't completely understand your question. >

Re: wait until BulkIteration finishes

2016-03-31 Thread Lydia Ickler
all downstream operators which depend on the bulk iteration will wait > implicitly until data from the iteration operator is available. > > Cheers, > Till > > On Thu, Mar 31, 2016 at 9:39 AM, Lydia Ickler <ickle...@googlemail.com> wrote: > Hi all, > > is

wait until BulkIteration finishes

2016-03-31 Thread Lydia Ickler
Hi all, is there a way to tell the program that it should wait until the BulkIteration finishes before the rest of the program is executed? Best regards, Lydia

BulkIteration and BroadcastVariables

2016-03-30 Thread Lydia Ickler
Hi all, I have a question regarding the BulkIteration and BroadcastVariables: The BulkIteration by default has one input variable and sends one variable into the next iteration, right? What if I need to collect some intermediate results in each iteration? How would I do that? For example in my c

for loop slow

2016-03-26 Thread Lydia Ickler
Hi, I have an issue with a for-loop. If I set the maximal iteration number i to more than 3 it gets stuck and I cannot figure out why. With 1, 2 or 3 it runs smoothly. I attached the code below and marked the loop with //PROBLEM. Thanks in advance! Lydia package org.apache.flink.contrib.lifesci

Re: normalizing DataSet with cross()

2016-03-22 Thread Lydia Ickler
s > well, I would assume. > > > On Tue, Mar 22, 2016 at 3:15 PM, Lydia Ickler <ickle...@googlemail.com> wrote: > Hi Till, > > maybe it is doing so because I rewrite the ds in the next step again and then > the working steps get mixed? > I am reading

Re: normalizing DataSet with cross()

2016-03-22 Thread Lydia Ickler
ram or do you read the > data from a source with varying data? Maybe you could send us a compilable > and complete program which reproduces your problem. > > Cheers, > Till > > On Tue, Mar 22, 2016 at 2:21 PM, Lydia Ickler <ickle...@googlemail.com> wrot

normalizing DataSet with cross()

2016-03-22 Thread Lydia Ickler
Hi all, I have a question. If I have a DataSet DataSet> ds and I want to normalize all values (at position 2) in it by the maximum of the DataSet (ds.aggregate(Aggregations.MAX, 2)). How do I tackle that? If I use the cross operator my result changes every time I run the program (see code bel

Help with DeltaIteration

2016-03-19 Thread Lydia Ickler
Hi, I have a question regarding the Delta Iteration. I basically want to iterate as long as the former and the newly calculated set are different, and stop once they are the same. Right now I get a result set that has entries with duplicate "row" indices, which should not be the case. I guess I am doing

MatrixMultiplication

2016-03-14 Thread Lydia Ickler
Hi, I wrote to you before about the MatrixMultiplication in Flink … Unfortunately, the multiplication of a pair of 1000 x 1000 matrices already takes almost a minute. Would you please take a look at my attached code? Maybe you can suggest something to make it faster? Or would it be better

filter dataset

2016-02-29 Thread Lydia Ickler
Hi all, I have a DataSet and I want to apply a filter to only get back all entries with e.g. first Integer in tuple == 0. With a normal filter I do not have the possibility to pass an additional argument, but have to set that parameter inside the filter function. Is there a possibility to

DistributedMatrix in Flink

2016-02-04 Thread Lydia Ickler
Hi all, as mentioned before I am trying to import the RowMatrix from Spark to Flink… In the code I already ran into a dead end… In the function multiplyGramianMatrixBy() (see end of mail) there is the line: rows.context.broadcast(v) (rows is a DataSet[Vector]). What exactly is this line doing? Do

Re: cluster execution

2016-02-01 Thread Lydia Ickler
xD… a simple "hdfs dfs -chmod -R 777 /users" fixed it! > On 01.02.2016 at 12:17, Till Rohrmann wrote: > > Hi Lydia, > > It looks like that. I guess you should check your hdfs access rights. > > Cheers, > Till > > On Mon, Feb 1, 2016

Re: cluster execution

2016-02-01 Thread Lydia Ickler
your cluster configuration flink-conf.yaml file. Alternatively > you can always specify the parallelism via the CLI client with the -p option. > > Cheers, > Till > > > On Thu, Jan 28, 2016 at 9:53 AM, Lydia Ickler <ickle...@googlemail.com> wrote: > Hi al

cluster execution

2016-01-28 Thread Lydia Ickler
Hi all, I am doing some operations on a DataSet> … (see code below) When I run my program on a cluster with 3 machines I can see within the web client that only my master is executing the program. Do I have to specify somewhere that all machines have to participate? Usually the cluster execute

Re: rowmatrix equivalent

2016-01-26 Thread Lydia Ickler
yet. However, you can easily implement it yourself. This > would also be a good contribution to the project if you want to tackle the > problem > > Cheers, > Till > > On Sun, Jan 24, 2016 at 4:03 PM, Lydia Ickler <ickle...@googlemail.com> wrote: > Hi

Re: MatrixMultiplication

2016-01-25 Thread Lydia Ickler
ication of the 100 x 100 matrix. Have > you waited so long to see whether it completes or is there another problem? > > Cheers, > Till > > On Mon, Jan 25, 2016 at 2:13 PM, Lydia Ickler <ickle...@googlemail.com> wrote: > Hi, > > I want to do a simpl

MatrixMultiplication

2016-01-25 Thread Lydia Ickler
Hi, I want to do a simple MatrixMultiplication and use the following code (see bottom). For matrices of 50x50 or 100x100 it is no problem. But with matrices of 1000x1000 it no longer works and gets stuck in the joining part. What am I doing wrong? Best regards, Lydia package de.tube

rowmatrix equivalent

2016-01-24 Thread Lydia Ickler
Hi all, this is maybe a stupid question, but what within Flink is the equivalent to Spark's RowMatrix? Thanks in advance, Lydia

Re: eigenvalue solver

2016-01-12 Thread Lydia Ickler
uded in FlinkML yet. But if you want to, > then you can give it a try :-) > > [1] http://www.cs.newpaltz.edu/~lik/publications/Ruixuan-Li-CCPE-2015.pdf > > Cheers, > Till > >> On Tue, Jan 12, 2016 at 9:47 AM, Lydia Ickler >> wrote: >> Hi, >> >>

eigenvalue solver

2016-01-12 Thread Lydia Ickler
Hi, I wanted to know if there are any implementations yet within the Machine Learning Library or generally that can efficiently solve eigenvalue problems in Flink? Or if not do you have suggestions on how to approach a parallel execution maybe with BLAS or Breeze? Thanks in advance! Lydia

Re: writeAsCsv

2015-10-07 Thread Lydia Ickler
ok, thanks! :) I will try that! > On 07.10.2015 at 21:35, Lydia Ickler wrote: > > Hi, > > stupid question: Why is this not saved to file? > I want to transform an array to a DataSet but the Graph stops at collect(). > > //Transform Spectrum to DataSet > List

writeAsCsv

2015-10-07 Thread Lydia Ickler
Hi, stupid question: Why is this not saved to file? I want to transform an array to a DataSet but the Graph stops at collect(). //Transform Spectrum to DataSet List> dataList = new LinkedList>(); double[][] arr = filteredSpectrum.getAs2DDoubleArray(); for (int i=0;i

source binary file

2015-10-06 Thread Lydia Ickler
Hi, how would I read a BinaryFile from HDFS with the Flink Java API? I can only find the Scala way… All the best, Lydia

Re: data flow example on cluster

2015-10-02 Thread Lydia Ickler
> > java -Xmx2g -cp target/youruberjar.jar yourclass arg1 arg2 > > hope it helps, > Stefano > > 2015-10-02 12:21 GMT+02:00 Lydia Ickler <ickle...@googlemail.com>: > Hi, > > I did not create anything by myself. > I just downloaded the file

Re: data flow example on cluster

2015-10-02 Thread Lydia Ickler
APSHOT? > Otherwise, I would recommend to use the latest stable release (0.9.1) for > your flink job and on the cluster. > > On Fri, Oct 2, 2015 at 11:55 AM, Lydia Ickler <ickle...@googlemail.com> wrote: > Hi, > > but inside the pom of flink-job is the flink

Re: data flow example on cluster

2015-10-02 Thread Lydia Ickler
> On 02.10.2015 at 11:55, Lydia Ickler wrote: > > 0.10-SNAPSHOT

Re: data flow example on cluster

2015-10-02 Thread Lydia Ickler
atest stable release (0.9.1) for > your flink job and on the cluster. > > On Fri, Oct 2, 2015 at 11:55 AM, Lydia Ickler <ickle...@googlemail.com> wrote: > Hi, > > but inside the pom of flink-job is the flink version set to 0.8 > > 0.8-incuba

Re: data flow example on cluster

2015-10-02 Thread Lydia Ickler
tween the Flink version you've used to > compile your job and the Flink version installed on the cluster. > > Maven automagically pulls newer 0.10-SNAPSHOT versions every time you're > building your job. > > On Fri, Oct 2, 2015 at 11:45 AM, Lydia Ickler <mailto:ic

Re: data flow example on cluster

2015-10-02 Thread Lydia Ickler
Hi Till, I want to execute your Matrix Completion program "ALSJoin". Locally it works perfectly. Now I want to execute it on the cluster with: run -c com.github.projectflink.als.ALSJoin -cp /tmp/icklerly/flink-jobs-0.1-SNAPSHOT.jar 0 2 0.001 10 1 1 but I get the following error: java.lang.NoSuchM

DataSet transformation

2015-10-01 Thread Lydia Ickler
Hi all, so I have a case class Spectrum(mz: Float, intensity: Float) and a DataSet[Spectrum] to read my data in. Now I want to know if there is a smart way to transform my DataSet into a two-dimensional Array? Thanks in advance, Lydia
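Once the records are available on the client as an ordinary list, the conversion itself is a simple copy, one row per record. The sketch below is plain Java and assumes exactly that: the `Spectrum` class is a hypothetical Java counterpart of the Scala case class from the question, and `SpectrumArraySketch` and `toArray` are illustrative names. How the list is obtained from the DataSet is deliberately left out.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical conversion of a list of (mz, intensity) records into a float[n][2] array.
class SpectrumArraySketch {

    /** Illustrative Java counterpart of the Scala case class Spectrum(mz, intensity). */
    static final class Spectrum {
        final float mz;
        final float intensity;

        Spectrum(float mz, float intensity) {
            this.mz = mz;
            this.intensity = intensity;
        }
    }

    /** Copies each record into one row: column 0 holds mz, column 1 holds intensity. */
    static float[][] toArray(List<Spectrum> spectra) {
        float[][] out = new float[spectra.size()][2];
        for (int i = 0; i < spectra.size(); i++) {
            out[i][0] = spectra.get(i).mz;
            out[i][1] = spectra.get(i).intensity;
        }
        return out;
    }

    public static void main(String[] args) {
        List<Spectrum> spectra = Arrays.asList(new Spectrum(1.5f, 10f), new Spectrum(2.5f, 20f));
        System.out.println(Arrays.deepToString(toArray(spectra)));
    }
}
```

This only makes sense when the data fits into the memory of a single machine, since the whole array lives on the client.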

error message

2015-09-30 Thread Lydia Ickler
Hi, which jar am I missing? The error is: Exception in thread "main" java.lang.NoSuchMethodError: org.apache.flink.api.scala.ExecutionEnvironment.readCsvFile$default$4()Z

data flow example on cluster

2015-09-29 Thread Lydia Ickler
Hi all, I want to run the data-flow Wordcount example on a Flink Cluster. The local execution with "mvn exec:exec -Dinput=kinglear.txt -Doutput=wordcounts.txt" is already working. What is the command to execute it on the cluster? Best regards, Lydia

Re: HBase issue

2015-09-24 Thread Lydia Ickler
rowse/HBASE-10304 >> >> In your log I see the same exception. Anyone has any idea what we could do >> about this? >> >> >>> On Tue, 22 Sep 2015 at 22:40 Lydia Ickler wrote: >>> Hi, >>> >>> I am trying to get the HBaseReadExampl

Re: HBase issue

2015-09-24 Thread Lydia Ickler
cha Krettek <aljos...@apache.org> wrote: > It might be that this is causing the problem: > https://issues.apache.org/jira/browse/HBASE-10304 > > In your log I see the same exception. Anyone has any idea wh

Re: no valid hadoop home directory can be found

2015-09-23 Thread Lydia Ickler
oop.home.dir’ (-Dhadoop.home.dir=…) > > – Ufuk > >> On 23 Sep 2015, at 12:43, Lydia Ickler wrote: >> >> Hi all, >> >> I get the following error message that no valid hadoop home directory can be >> found when trying to initialize the HBase configuration. >

no valid hadoop home directory can be found

2015-09-23 Thread Lydia Ickler
Hi all, I get the following error message that no valid hadoop home directory can be found when trying to initialize the HBase configuration. Where would I specify that path? 12:41:02,043 INFO org.apache.flink.addons.hbase.TableInputFormat - Initializing HBaseConfiguration 12:41

HBase issue

2015-09-22 Thread Lydia Ickler
Hi, I am trying to get the HBaseReadExample to run. I have filled a table with the HBaseWriteExample and purposely split it over 3 regions. Now when I try to read from it, the first split seems to be scanned (170 rows) fine, and after that the connections of Zookeeper and RPC are suddenly closed

Re: Job stuck at "Assigning split to host..."

2015-07-27 Thread Lydia Ickler
Hi Ufuk, yes, I figured out that the HMaster of hbase did not start properly! Now everything is working :) Thanks for your help! Best regards, Lydia > On 27.07.2015 at 11:45, Ufuk Celebi wrote: > > Any update on this Lydia? > > On 23 Jul 2015, at 16:38, Ufuk Celebi wrote: > >> Unfortuna

Re: Job stuck at "Assigning split to host..."

2015-07-23 Thread Lydia Ickler
rows are read etc.). > > Do you mind setting the log level to DEBUG and then posting the logs again? > > – Ufuk > > On 23 Jul 2015, at 14:12, Lydia Ickler wrote: > >> Hi, >> >> I am trying to read data from a HBase Table via the HBaseReadExample.ja

Job stuck at "Assigning split to host..."

2015-07-23 Thread Lydia Ickler
Hi, I am trying to read data from a HBase Table via the HBaseReadExample.java. Unfortunately, my run always gets stuck at the same position. Do you guys have any suggestions? In the master node it says: 14:05:04,239 INFO org.apache.flink.runtime.jobmanager.JobManager - Received job bb9

Re: HBase on 4 machine cluster - OutOfMemoryError

2015-07-18 Thread Lydia Ickler
ently the RPC message is very > large. > > Is the data that you request in one row? > > On 18.07.2015 00:50, "Lydia Ickler" <ickle...@googlemail.com> wrote: > Hi all, > > I am trying to read a data set from HBase within a cluster application

HBase on 4 machine cluster - OutOfMemoryError

2015-07-17 Thread Lydia Ickler
Hi all, I am trying to read a data set from HBase within a cluster application. The data is about 90MB big. When I run the program on a cluster consisting of 4 machines (8GB RAM) I get the following error on the head-node: 16:57:41,572 INFO org.apache.flink.api.common.io.LocatableInputSplitAs

DataSet Conversion

2015-07-13 Thread Lydia Ickler
Hi guys, is it possible to convert a Java DataSet to a Scala DataSet? Right now I get the following error: Error:(102, 29) java: incompatible types: 'org.apache.flink.api.java.DataSet cannot be converted to org.apache.flink.api.scala.DataSet' Thanks in advance, Lydia

HBase & Machine Learning

2015-07-11 Thread Lydia Ickler
Dear Sir or Madam, I would like to use the Flink-HBase addon to read out data that then serves as an input for the machine learning algorithms, respectively the SVM and MLR. Right now I first write the extracted data to a temporary file and then read it in via the libSVM method...but I guess t