Re: Documentation Error

2015-06-30 Thread Stephan Ewen
+1 for moving the FAQ to the website.

Re: OutOfMemoryException: unable to create native thread

2015-06-30 Thread Stephan Ewen
I agree with Aljoscha and Ufuk. As said, it will be hard for the system (currently) to handle 1500 sources, but handling a single parallel source with 1500 files will be very efficient. This works if all sources (files) deliver the same data type and are unioned. If that is true, you can …

Re: OutOfMemoryException: unable to create native thread

2015-06-30 Thread Ufuk Celebi
Hey Chan, the problem is that all sources are scheduled at once for pipelined execution mode (default). There is work in progress to support your workload better in batch execution mode, e.g. run each source one after the other and materialize intermediate results. This will hopefully be in the

Re: OutOfMemoryException: unable to create native thread

2015-06-30 Thread Aljoscha Krettek
Hi Chan, Flink sources support giving a directory as an input path. If you do this, the source will read each of the files in that directory. The way you do it leads to a very big plan, because the plan is replicated 1500 times; this could lead to the OutOfMemoryException. Is there a spec…
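Aljoscha's suggestion can be sketched roughly as follows, using Flink's Java DataSet API. The path and the trailing operations are made up for illustration; the point is that one source pointed at a directory replaces 1500 per-file sources, so the plan stays small:

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class SingleDirectorySource {
    public static void main(String[] args) throws Exception {
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // One logical source that reads every file under the directory;
        // Flink hands the individual files out as input splits to the
        // parallel instances of this single source.
        DataSet<String> lines = env.readTextFile("hdfs:///path/to/input-dir");

        // ... continue with Map -> GroupBy -> GroupReduce on the unioned data
        lines.first(10).print();
    }
}
```

Compared to building the same DataSource -> Map -> ... chain 1500 times and unioning the results, this keeps the optimizer's plan to a single pipeline.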

OutOfMemoryException: unable to create native thread

2015-06-30 Thread chan fentes
Hello, how many data sources can I use in one Flink plan? Is there any limit? I get a java.lang.OutOfMemoryException: unable to create native thread when using approx. 1500 files. What I basically do is the following: DataSource -> Map -> Map -> GroupBy -> GroupReduce per file and then Union -> G…

Re: Execution graph

2015-06-30 Thread Maximilian Michels
Yes, the web client always shows parallelism 1. That is a bug, but it does not affect the execution of your program. If you specify the default parallelism in your Flink config, you don't have to set it in your program or via the command-line argument (-p). However, if you leave it at its default a…

Re: Documentation Error

2015-06-30 Thread Robert Metzger
+1, let's remove the FAQ from the source repo and put it on the website only. On Thu, Jun 25, 2015 at 3:14 PM, Ufuk Celebi wrote: > On 25 Jun 2015, at 14:31, Maximilian Michels wrote: > Thanks for noticing, Chiwan. I have the feeling this problem arose when the website was updated. The pr…

Re: Datasets union CompilerException

2015-06-30 Thread Stephan Ewen
I don't think it is related, but another bug…

Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

2015-06-30 Thread Maximilian Michels
Hi Mihail, thank you for your question. Do you have a short example that reproduces the problem? It is hard to find the cause without an error message or some example code. I wonder how your loop works without WriteMode.OVERWRITE, because it should throw an exception in that case. Or do you change…

Re: Execution graph

2015-06-30 Thread Michele Bertoni
Hi everybody, and thanks for the answers. So if I understood correctly, you said that, apart from some operations, most of them are executed at the default parallelism value (which is what I expected), but the viewer will always show 1 if something different is not set via setParallelism. Is that right? I don’t ha…

Re: Datasets union CompilerException

2015-06-30 Thread Michele Bertoni
Hi, yesterday on the union I faced another problem: at runtime it was saying something like “Union cannot work with datasets of two different types”, then it was showing the types and they were exactly the same (Tuple5…). I solved it by changing one field of the tuple from a custom object (MyClass) that…

Re: The slot in which the task was scheduled has been killed (probably loss of TaskManager)

2015-06-30 Thread Andra Lungu
Hey Till, I managed to reproduce the bug; the logs are in the corresponding JIRA [hopefully I got the right ones :)]: FLINK-2299. As a side note: guys, these two issues (FLINK-2299 and FLINK-2293…

Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

2015-06-30 Thread Mihail Vieru
I think my problem is related to a loop in my job. Before the loop, the writeAsCsv method works fine, even in overwrite mode. In the loop, in the first iteration, it writes an empty folder containing empty files to HDFS. Even though the DataSet it is supposed to write contains elements. Need

Re: Apache Flink and serious streaming stateful processing

2015-06-30 Thread Stephan Ewen
Hi Krzysztof, thanks for the kind words! I think that Flink is to a good extent already set up to provide what you are looking for. The remaining gaps are WIP. Let me elaborate a bit on Gyula's answers: 1) Backpressure is very much there; it has always worked well, also better than in Storm, as f…

Re: Apache Flink and serious streaming stateful processing

2015-06-30 Thread Ufuk Celebi
On 30 Jun 2015, at 14:23, Gyula Fóra wrote: > 2. We have support for stateful processing in Flink in many ways you have > described in your question. Unfortunately the docs are down currently, but you > should check out the 'Stateful processing' section in the 0.10 docs (once it's > back online)…

Re: Apache Flink and serious streaming stateful processing

2015-06-30 Thread Gyula Fóra
Hi Krzysztof, thank you for your questions; we are happy to help you get started. Regarding your questions: 1. There is backpressure for the streams, so if the downstream operators cannot keep up, the sources will slow down. 2. We have support for stateful processing in Flink in many ways yo…

Apache Flink and serious streaming stateful processing

2015-06-30 Thread Krzysztof Zarzycki
Greetings! I'm extremely interested in Apache Flink; I think you're doing a really great job! But please allow me to share two things that I would require from Apache Flink to consider it groundbreaking (this is what I need from a streaming framework): 1. Stream backpressure. When stream processing…

Re: Flink documentation is offline

2015-06-30 Thread Maximilian Michels
Cool. Thanks!

Re: Flink documentation is offline

2015-06-30 Thread Ufuk Celebi
I've pushed a manual build of the docs. If everything w…

Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

2015-06-30 Thread Mihail Vieru
Hi Till, thank you for your reply. I have the following code snippet: intermediateGraph.getVertices().writeAsCsv(tempGraphOutputPath, "\n", ";", WriteMode.OVERWRITE); When I remove the WriteMode parameter, it works. So I can reason that the DataSet contains data elements. Cheers, Mihail
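For readers hitting similar "empty output" symptoms with the DataSet API: one frequent cause is that writeAsCsv only declares a lazy sink, so nothing is materialized until the program is executed. This is general advice, not necessarily Mihail's root cause; the path and data below are made up for illustration:

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.core.fs.FileSystem.WriteMode;

public class OverwriteCsvSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Tuple2<Long, String>> vertices =
                env.fromElements(new Tuple2<>(1L, "a"), new Tuple2<>(2L, "b"));

        // writeAsCsv only *declares* the sink: row delimiter, field delimiter,
        // and OVERWRITE to replace existing output. Nothing is written yet.
        vertices.writeAsCsv("hdfs:///tmp/vertices-out", "\n", ";", WriteMode.OVERWRITE);

        // The job only runs (and the files only appear) once execute() is called.
        env.execute("write vertices as CSV");
    }
}
```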

Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

2015-06-30 Thread Till Rohrmann
Hi Mihail, have you checked that the DataSet you want to write to HDFS actually contains data elements? You can try calling collect, which retrieves the data to your client, to see what's in there. Cheers, Till
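Till's debugging suggestion can be sketched as follows; the example data and types are hypothetical stand-ins for the vertices DataSet in question:

```java
import java.util.List;

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class InspectDataSet {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Tuple2<Long, String>> vertices =
                env.fromElements(new Tuple2<>(1L, "a"), new Tuple2<>(2L, "b"));

        // collect() runs the job and ships the elements back to the client
        // program, so you can verify the DataSet is not empty before
        // wiring it to a file sink.
        List<Tuple2<Long, String>> elements = vertices.collect();
        System.out.println("got " + elements.size() + " elements: " + elements);
    }
}
```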

writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

2015-06-30 Thread Mihail Vieru
Hi, the writeAsCsv method is not writing anything to HDFS (version 1.2.1) when the WriteMode is set to OVERWRITE. A file is created but it's empty, and there is no trace of errors in the Flink or Hadoop logs on any node in the cluster. What could cause this issue? I really, really need this feature…

Re: Flink documentation is offline

2015-06-30 Thread Chiwan Park
Hi, we already know about this issue. There are some problems in the Apache Infrastructure; the Infra team is working on it. You can see progress via a blog post [1]. It will be okay soon. Regards, Chiwan Park [1] https://blogs.apache.org/infra/entry/buildbot_master_currently_off_line

Flink documentation is offline

2015-06-30 Thread LINZ, Arnaud
Hello, you are probably aware of the issue, but currently every access to the documentation from https://flink.apache.org (http://ci.apache.org/projects/flink) leads to a “No Such Resource” page. Best regards, Arnaud

Re: Datasets union CompilerException

2015-06-30 Thread Fabian Hueske
Also, can you open a JIRA for the issue? Otherwise it might get lost on the mailing list. Thank you!

Re: Datasets union CompilerException

2015-06-30 Thread Fabian Hueske
Hi, is it possible to get a smaller version of the program that reproduces the bug, or a few more details about the structure of the job? Without any hints, it is very hard to reproduce and fix the bug. 2015-06-24 18:23 GMT+02:00 Flavio Pompermaier: > Unfortunately not in public.. moreover t…

Re: The slot in which the task was scheduled has been killed (probably loss of TaskManager)

2015-06-30 Thread Till Rohrmann
Do you have the JobManager and TaskManager logs of the corresponding TM, by any chance? On Mon, Jun 29, 2015 at 8:12 PM, Andra Lungu wrote: > Something similar in flink-0.10-SNAPSHOT: > > 06/29/2015 10:33:46 CHAIN Join(Join at main(TriangleCount.java:79)) -> > Combine (Reduce at main(Triangl

Re: Execution graph

2015-06-30 Thread Fabian Hueske
As an addition: some operators can only run with a parallelism of 1, for example data sources based on collections and (un-grouped) all-reduces. In some cases, the parallelism of the following operators will also be set to 1 to avoid a network shuffle. If you do: env.fromCollection(myCollec…
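A minimal sketch of the pattern Fabian describes, with an illustrative fix: a collection source runs with parallelism 1, and an explicit rebalance() forces a shuffle so the downstream operator can use the configured parallelism. The numbers and operations here are made up for illustration:

```java
import java.util.Arrays;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class CollectionSourceParallelism {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(4);

        // A collection-based source always runs with parallelism 1.
        DataSet<Integer> nums = env.fromCollection(Arrays.asList(1, 2, 3, 4));

        // rebalance() inserts a round-robin redistribution, so the map below
        // can run with the configured parallelism instead of staying at 1.
        DataSet<Integer> doubled = nums
                .rebalance()
                .map(new MapFunction<Integer, Integer>() {
                    @Override
                    public Integer map(Integer value) {
                        return value * 2;
                    }
                });

        doubled.print();
    }
}
```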

Re: Execution graph

2015-06-30 Thread Ufuk Celebi
The web client currently does not support configuring the parallelism. There is an open issue for it, so it will be fixed soon. --- What you can do right now: 1) Either configure the following key in flink-conf.yaml: parallelism.default: PARALLELISM 2) Or set it via the environment: final Executi…
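Ufuk's first option is a one-line change to the cluster configuration; the value 8 below is just an example, pick whatever fits your setup:

```yaml
# flink-conf.yaml -- default parallelism applied to jobs that do not
# set one themselves (example value)
parallelism.default: 8
```

The second option, setting it per job on the ExecutionEnvironment (e.g. env.setParallelism(8)), takes precedence over the config file for that program.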

Re: Execution graph

2015-06-30 Thread Maximilian Michels
Hi Michele, If you don't set the parallelism, the default parallelism is used. For the visualization in the web client, a parallelism of one is used. When you run your example from your IDE, the default parallelism is set to the number of (virtual) cores of your CPU. Moreover, Flink will currentl