Don’t we need to set the number of slots to 24 (4 sources + 16 mappers + 4
sinks)?
Or is there a way to set the number of slots per TaskManager instead of
globally, so that they are at least equally dispatched among the nodes?
As for the sink deployment: that’s not good news; I mean we w
Dear all,
I want to convert the data from each window of a stream to a DataSet. What is
the best way to do that? So, while streaming, at the end of each window I
want to convert that data to a DataSet and possibly apply DataSet
transformations to it.
Any suggestions?
-best
-sane
Fabian,
Thank a lot for your response. Really appreciated. I've some additional
questions (please see inline)
On Wed, Feb 3, 2016 at 2:42 PM, Fabian Hueske wrote:
> Hi,
>
> 1) At the moment, state is kept on the JVM heap in a regular HashMap.
>
Is this state replicated across JVMs in a cluster
Hi Sane,
Currently, the DataSet and DataStream APIs are strictly separated. Thus, this
is not possible at the moment.
What kind of operation do you want to perform on the data of a window?
Why do you want to convert the data into a data set?
-Matthias
On 02/04/2016 10:11 AM, Sane Lee wrote:
> Dear all
I had that problem/question some time ago, too.
The quick fix is to just put the line number in the line itself. Go for it.
However, we worked out a solution for another distributed processing
system that did the following:
Read each partition, count the lines, broadcast a map
"partition->lin
Hi Matthias,
This does not necessarily need to be in API functions. I just want to get a
roadmap to add this functionality. Should I save each window's data to
disk and create a new dataset environment in parallel? Or change trigger
functionality maybe?
I have large windows. As I asked in a previous q
I also have a similar scenario. Any suggestion would be appreciated.
On Thu, Feb 4, 2016 at 10:29 AM Jeyhun Karimov wrote:
> Hi Matthias,
>
> This need not to be necessarily in api functions. I just want to get a
> roadmap to add this functionality. Should I save each window's data into
> disk an
Hi,
I’ve got two more questions on a different topic…
First one:
Is there a way to monitor the buffers’ status? In order to find bottlenecks in
our application, we thought it could be useful to be able to have a look at the
different exchange buffers’ status, to know if they are full (or as an exa
Hi Gwen,
let me answer the second question: There is a JIRA to reintroduce the
configuration parameter: https://issues.apache.org/jira/browse/FLINK-2213.
I will try to get a fix for this into the 1.0 release.
I think I removed it back then because users were unable to define the number
of vcores ind
Concerning the first question:
What you are looking for is backpressure monitoring. If a task cannot push
its data to the next task, it is backpressured.
This pull request adds a first version of backpressure monitoring:
https://github.com/apache/flink/pull/1578
We will try and get it merged soo
Hi Soumya,
Operator state is partitioned across JVMs and not replicated. However, it
is checkpointed (e.g., to HDFS) at regular intervals to guarantee
fault-tolerance with exactly-once semantics. In case of a failure, all
operator states are recovered from checkpoints.
The state backend is respon
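A minimal sketch of enabling that behavior (the interval and HDFS path below are illustrative assumptions, not taken from this thread):

import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Checkpoint all operator state every 5 seconds.
env.enableCheckpointing(5000);
// Keep checkpoints on HDFS rather than on the JobManager's heap.
env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints"));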
Hi!
If I understand you correctly, what you are looking for is a kind of
periodic batch job, where the input data for each batch is a large window.
We have actually thought about this kind of application before. It is not
on the short-term roadmap that we shared a few weeks ago, but I think it
w
Anyone looking into this? Java 7 reached its end of life in April 2015 with
its last public update (number 80), and the ability to run Java 8 jobs will
be more and more important in the future. IMHO, the default target of the
Maven compiler plugin should be set to 1.8 in the 1.0 release. In most of
Hi all,
I am new to Flink. I wrote a simple program and I want it to output to a CSV
file.
timeWindowAll(Time.of(3, TimeUnit.MINUTES))
.apply(new Function1())
.writeAsCsv("file:///user/someuser/Documents/somefile.csv");
When I change the sink to .print(), it works and outputs some results.
I w
Hi!
I have been running Java 8 for a year without an issue. The code is compiled for
target Java 7, but can be run with Java 8.
User code that is targeted for Java 8 can be run if Flink is run with Java
8.
The initial error you got was because you probably compiled with Java 8 as
the target, and ran it
Hey Radu,
As you are using the streaming API I assume that you call env.execute() in
both cases. Is that the case?
Do you see any errors appearing? My first guess would be that if your data
type is not a tuple type, writeAsCsv does not work by default.
Best,
Marton
On Thu, Feb 4, 2016 at 11:36 A
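To illustrate Marton's point, a hedged sketch: writeAsCsv works out of the box on tuple types, so mapping the window output to a tuple first should help (the Result type and its fields are invented for illustration):

stream
    .timeWindowAll(Time.of(3, TimeUnit.MINUTES))
    .apply(new Function1())
    // writeAsCsv requires a tuple type, so convert the POJO first.
    .map(new MapFunction<Result, Tuple2<String, Long>>() {
        @Override
        public Tuple2<String, Long> map(Result r) {
            return new Tuple2<>(r.key, r.count);
        }
    })
    .writeAsCsv("file:///user/someuser/Documents/somefile.csv");
env.execute(); // without this, the sink never runs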
I've tested several configurations (also changing my compilation to 1.7, but
then sesame 4 was causing the error [1]):
1. Flink compiled with Java 1.7 (default), run within Eclipse with
Java 8: OK
2. Flink compiled with Java 1.7 (default), running the cluster with Java
8: not able to
Hi Marton,
Thanks to your comment I managed to get it working. At least it outputs the
results. However, what I need is to output each window's result separately.
Now, it outputs the results of parallel working windows (I think) and
appends the new results to them. For example, if I have parallelism
Ok thanks !
All that’s left is to wait then.
B.R.
From: ewenstep...@gmail.com [mailto:ewenstep...@gmail.com] On Behalf Of Stephan
Ewen
Sent: jeudi 4 février 2016 11:19
To: user@flink.apache.org
Subject: Re: Internal buffers supervision and yarn vCPUs
Concerning the first question:
What you ar
> On 04 Feb 2016, at 12:02, Gwenhael Pasquiers
> wrote:
>
> Ok thanks !
>
> All that’s left is to wait then.
If you have spare time and are working with the current snapshot version, it
would be great to get some feedback on the pull request. :-)
– Ufuk
I found that the merge method for DataStream does not exist in the latest
version. What is its equivalent? Shall I use union or join?
Thanks
Ashutosh
Very good, you are absolutely right. :D
> On 04 Feb 2016, at 05:07, Nirmalya Sengupta
> wrote:
>
> on
Hi,
sorry for the inconvenience. The new name of merge is union.
Regards,
Aljoscha
> On 04 Feb 2016, at 13:01, Ashutosh Kumar wrote:
>
> I found that merge method for datastream does not exist in latest version .
> What is the equivalent for it ? Shall I use union or join ?
>
> Thanks
>
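For reference, a minimal sketch of the renamed API (stream names invented):

DataStream<String> a = env.socketTextStream("localhost", 9000);
DataStream<String> b = env.socketTextStream("localhost", 9001);
// union replaces merge: it combines streams of the same type.
DataStream<String> merged = a.union(b);

join, in contrast, pairs elements of two streams on a key within a window, so union is the drop-in replacement for merge.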
Hi Radu,
It is indeed interesting to know how each window could be registered separately
- I am not sure if any of the existing mechanisms in Flink supports this.
I think you need to create your own output sink. It is a bit tricky to pass the
window sequence number (actually I do not think such
You can get the end time of a window from the TimeWindow object which is
passed to the AllWindowFunction. This is basically a window ID / index.
I would go for a custom output sink which writes records to files based on
their timestamp.
IMO, this would be cleaner & easier than implementing the file
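A hedged sketch of that suggestion, using the AllWindowFunction signature of the 1.x line (the Event type and the output shape are invented):

stream
    .timeWindowAll(Time.of(3, TimeUnit.MINUTES))
    .apply(new AllWindowFunction<Event, Tuple2<Long, Event>, TimeWindow>() {
        @Override
        public void apply(TimeWindow window, Iterable<Event> values,
                          Collector<Tuple2<Long, Event>> out) {
            // Tag each record with the window end time, which acts as a
            // window ID that a custom sink can use to route records to files.
            for (Event e : values) {
                out.collect(new Tuple2<>(window.getEnd(), e));
            }
        }
    });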
I'm wondering what kind of transformations you want to apply to the window
that you cannot apply with the DataStream API.
Would it be sufficient for you to have the windows as files in HDFS and
then run batch jobs against the windows on disk? If so, you could use our
filesystem sink, which creates fil
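If that fits, a rough sketch with the rolling filesystem sink (class names from flink-connector-filesystem as far as I recall; the path and bucket format are assumptions):

RollingSink<String> sink = new RollingSink<>("hdfs:///windows");
// One bucket (directory) per minute, so finished buckets can be
// picked up by periodic batch jobs.
sink.setBucketer(new DateTimeBucketer("yyyy-MM-dd--HHmm"));
stream.addSink(sink);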
Sorry, I was confused about the number of slots; it's good now.
However, is disableChaining or disableOperatorChaining working properly?
I called those methods everywhere I could, but it still seems that some of my
operators are being chained together: I can't go over 16 used slots where I
should be
Hi Shikhar,
the currently open windows are also part of the operator state. Whenever a
window operator receives a barrier it will checkpoint the state of the user
function and additionally all uncompleted windows. This also means that the
window operator does not buffer the barriers. Once it has t
Hi Gwen!
You actually do not need 24 slots, but only as many as the highest parallelism
(16). Slots do not hold individual tasks, but "pipelines".
Here is an illustration of how that works.
https://ci.apache.org/projects/flink/flink-docs-release-0.10/setup/config.html#configuring-taskmanager-process
To your other question, there are two things in Flink:
(1) Chaining. Tasks are folded together into one task, run by one thread.
(2) Resource groups: Tasks stay separate, have separate threads, but share
a slot (which means share memory resources). See the link in my previous
mail for an explanat
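For illustration, a hedged sketch of the chaining controls (operator names invented):

// Per operator: give a heavy operator its own thread, or cut the chain.
stream
    .map(new HeavyMapper()).disableChaining()
    .map(new LightMapper()).startNewChain();

// Or globally, for the whole job:
env.disableOperatorChaining();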
Hi all,
as mentioned before I am trying to import the RowMatrix from Spark to Flink…
In the code I already ran into a dead end… In the function
multiplyGramianMatrixBy() (see end of mail) there is the line:
rows.context.broadcast(v) (rows is a DataSet[Vector])
What exactly is this line doing? Do
Hi Flavio,
To address your points:
1) It runs. That's fine.
2) It doesn't work to run a Java 8 compiled Flink job on a Java 7
Flink cluster if you use Java 8 non-backwards-compatible features in
your job.
3) I compile Flink daily with Java 8. Also, we have Travis CI tests
which use OpenJDK and O
Okay;
Then I guess that the best we can do is to disable chaining (we really want one
thread per operator since they are doing long operations) and have the same
parallelism for sinks as for mapping: that way each map will have its own sink
and there will be no exchanges between Flink instances.
Hi Sourigna,
it turned out to be a bug in the GradientDescent implementation which
cannot handle sparse gradients. That is not so problematic by itself,
because the sum of gradient vectors is usually dense even if the individual
gradient vectors are sparse. We simply forgot to initialize the initi
Flink compiles correctly using Java 8 as long as you leave Java 1.7 source
and target in the Maven Java compiler plugin.
If you change them to 1.8, flink-core doesn't compile anymore.
On Thu, Feb 4, 2016 at 4:23 PM, Maximilian Michels wrote:
> Hi Flavio,
>
> To address your points:
>
> 1) It runs. That'
Hi Radu,
what does the log of the TaskManager 10.204.62.80:57910 say?
Cheers,
Till
On Wed, Feb 3, 2016 at 6:00 PM, Radu Tudoran
wrote:
> Hello,
>
> I am facing an error for which I cannot figure out the cause. Any idea
> what could cause such an error?
>
> java.lang.Exc
Hey,
I am actually facing a similar issue lately, where the JobManager releases
the task slots as it cannot contact the TaskManager.
Meanwhile the TaskManager is also trying to connect to the JobManager and
fails multiple times. This happens on multiple TaskManagers seemingly
randomly. So the TM
Hi,
Well… yesterday when I looked into it there was no additional info beyond the one
I have sent. Today I reproduced the problem and I could see the following in the log file:
akka.actor.ActorInitializationException: exception during creation
at akka.actor.ActorInitializationException$.apply(Actor.scal
@Gyula Do you see log messages about quarantined actor systems?
There may be an issue with Akka Death watches that once the connection is
lost, it cannot be re-established unless the TaskManager is restarted
http://doc.akka.io/docs/akka/current/scala/remoting.html#Lifecycle_and_Failure_Recovery_M
Yes exactly, it says it is quarantined.
Gyula
On Thu, Feb 4, 2016 at 4:09 PM Stephan Ewen wrote:
> @Gyula Do you see log messages about quarantined actor systems?
>
> There may be an issue with Akka Death watches that once the connection is
> lost, it cannot be re-established unless the
Okay, here are the docs for the Akka version we are using:
http://doc.akka.io/docs/akka/2.3.14/scala/remoting.html#Lifecycle_and_Failure_Recovery_Model
It says that after a remote deathwatch trigger, the actor system must be
restarted before it can connect again.
We probably need to do the follow
Hi Lydia,
Spark and Flink are not identical. Thus, you'll find concepts in both systems
which won't have a corresponding counterpart in the other system. For
example, rows.context.broadcast(v) broadcasts the value v so that you can
use it on all Executors. Flink follows a slightly different concept whe
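The closest Flink counterpart on the DataSet API is a broadcast set; a minimal sketch (the row type and the dot-product body are invented for illustration):

DataSet<double[]> v = env.fromElements(new double[]{1.0, 2.0});
DataSet<Double> result = rows
    .map(new RichMapFunction<double[], Double>() {
        private List<double[]> broadcastV;
        @Override
        public void open(Configuration parameters) {
            // Fetched once per parallel task, by name.
            broadcastV = getRuntimeContext().getBroadcastVariable("v");
        }
        @Override
        public Double map(double[] row) {
            double[] vec = broadcastV.get(0);
            double dot = 0.0;
            for (int i = 0; i < row.length; i++) {
                dot += row[i] * vec[i];
            }
            return dot;
        }
    })
    // Attach the broadcast set under the name used in open().
    .withBroadcastSet(v, "v");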
We should probably add to the TaskManager a "restart on quarantined"
strategy anyways.
We can detect it as follows:
http://stackoverflow.com/questions/32471088/akka-cluster-detecting-quarantined-state
On Thu, Feb 4, 2016 at 5:18 PM, Stephan Ewen wrote:
> Okay, here are the docs for the Akka ve
I see. Did you perform a full "mvn clean package -DskipTests" after
you changed the source level to 1.8?
On Thu, Feb 4, 2016 at 4:33 PM, Flavio Pompermaier wrote:
> Flink compiles correctly using java 8 as long as you leave java 1.7 source
> and target in the maven java compiler.
> If you change
yes I did
On Thu, Feb 4, 2016 at 5:44 PM, Maximilian Michels wrote:
> I see. Did you perform a full "mvn clean package -DskipTests" after
> you changed the source level to 1.8?
>
> On Thu, Feb 4, 2016 at 4:33 PM, Flavio Pompermaier
> wrote:
> > Flink compiles correctly using java 8 as long as y
I added a new Ticket: https://issues.apache.org/jira/browse/FLINK-3336
This will implement the data shipping pattern that you mentioned in your
initial mail. I also have the pull request almost ready.
> On 04 Feb 2016, at 16:25, Gwenhael Pasquiers
> wrote:
>
> Okay ;
>
> Then I guess that t
Hi,
I'm new to Flink. I was working with Flink and Kafka before, and it was working
fine.
But now Flink is not able to consume data from Kafka, even with a newly
created topic.
This is the log:
15:12:59,684 INFO
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer - No prior
offsets found f
Hi,
I don't understand what you are trying to achieve. If you want to read the
topic from the beginning, use a different group.id.
Flink should consume data from the topic when you produce something into it.
As you can see from the log statement, it's at the "INFO" log level, hence
it's not an issue
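A minimal consumer sketch with a fresh group.id (the connector class matches the Kafka 0.8 connector of that era as far as I recall; addresses and names are assumptions):

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092");
props.setProperty("zookeeper.connect", "localhost:2181");
// A group.id with no committed offsets yet...
props.setProperty("group.id", "my-fresh-group");
// ...plus starting from the earliest offset (Kafka 0.8 syntax).
props.setProperty("auto.offset.reset", "smallest");

DataStream<String> stream = env.addSource(
    new FlinkKafkaConsumer082<>("my-topic", new SimpleStringSchema(), props));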
Hello,
I have a few questions regarding the garbage collector’s stats on TaskManagers, and
any help or further documentation would be great.
I have collected stats at a 1-second polling interval on 7 TaskManagers, through
the relative request (/taskmanagers//) of the Monitoring REST
API while a job, t
For example, I will do aggregate operations with other windows (n-window
aggregations) that are already outputted.
I tried your suggestion and used the filesystem sink, outputting to HDFS.
I got k files in the HDFS directory, where k is the parallelism (I
used a single machine).
These files get bigg