Re: Flink application does not scale as expected, please help!

2018-06-18 Thread Ovidiu-Cristian MARCU
Hi all, Allow me to add some comments/questions on this very interesting issue. According to the documentation [1], the pipeline example assumes the source runs with the same parallelism as the successive map operator, and the workflow is optimized to collocate source and map tasks if possible.

Re: parallelism for window operations

2017-01-27 Thread Ovidiu-Cristian MARCU
10:43, Ovidiu-Cristian MARCU > wrote: > > Thank you, Fabian! > > It works, what I did and results, as an example for other users: > Total slots occupied are 7 (not sure how to check that Source + Flat Map are > in the same slot, assumed slot S1 will be that; also

Re: Monitoring REST API

2016-12-21 Thread Ovidiu-Cristian MARCU
Hi Lydia, I have used sar monitoring (sar -u -n DEV -p -d -r 1) and plotted the average over multiple nodes. 1) So for each node you can collect the sar output, and obtain for example: Linux 3.2.0-4-amd64 (parasilo-4.rennes.grid5000.fr) 2016-01-27 _x86_64_ (16 CPU) 12:54:09

Re: Parameters to Control Intra-node Parallelism

2016-07-13 Thread Ovidiu-Cristian MARCU
s case) > than what's suggested in Flink (#slots-per-TM^2 * #TMs * 4, which would be > 12*12*32*4 = 18432). Otherwise, it would throw me the not enough buffers > error. > > Thank you, > Saliya > > > > On Tue, Jul 12, 2016 at 7:39 AM, Ovidiu-Cristian MARCU
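As a quick check of the arithmetic quoted in the snippet above, the suggested buffer count (#slots-per-TM^2 * #TMs * 4) can be computed as follows. This is only a sketch: the function name and its parameters are hypothetical, and the values 12 and 32 are the ones Saliya gives for his cluster. The result is presumably the number one would put in taskmanager.network.numberOfBuffers in Flink releases of that period.

```python
def suggested_network_buffers(slots_per_tm: int, num_tms: int, factor: int = 4) -> int:
    """Rule of thumb quoted in the thread: #slots-per-TM^2 * #TMs * 4.

    The function name and signature are illustrative, not part of Flink.
    """
    return slots_per_tm ** 2 * num_tms * factor


if __name__ == "__main__":
    # 12 slots per TaskManager, 32 TaskManagers (values from the message)
    print(suggested_network_buffers(12, 32))  # 12*12*32*4 = 18432
```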

Re: Parameters to Control Intra-node Parallelism

2016-07-12 Thread Ovidiu-Cristian MARCU
Hi, Can you post your configuration parameters (exclude default settings) and cluster description? Best, Ovidiu > On 11 Jul 2016, at 17:49, Saliya Ekanayake wrote: > > Thank you Greg, I'll check if this was the cause for my TMs to disappear. > > On Mon, Jul 11, 2016 at 11:34 AM, Greg Hogan <

Re: Optimizations not performed - please confirm

2016-06-29 Thread Ovidiu-Cristian MARCU
s are done in the > Table API/SQL that will be released in an updated version in 1.1. > > Cheers, > Aljoscha > > +Timo, Explicitly adding Timo > > On Tue, 28 Jun 2016 at 21:41 Ovidiu-Cristian MARCU wrote: > Hi

Optimizations not performed - please confirm

2016-06-28 Thread Ovidiu-Cristian MARCU
Hi, The optimizer internals described in this document [1] are probably not up-to-date. Can you please confirm if this is still valid: “The following optimizations are not performed: Join reordering (or operator reordering in general): Joins / Filters / Reducers are not re-ordered in Flink. This

Re: Flink Version 1.1

2016-05-18 Thread Ovidiu-Cristian MARCU
Hi, We are also very interested in future SQL (SQL on Streaming) support in the next release (even if it is partial work that works :) ) Thank you! Best, Ovidiu > On 18 May 2016, at 14:42, Stephan Ewen wrote: > > Hi! > > That question is coming up more and more. > I think we should start

What / Where / When / How questions in Spark 2.0 ?

2016-05-16 Thread Ovidiu-Cristian MARCU
Hi, We can see in [2] many interesting (and expected!) improvements (promises) like extended SQL support, unified API (DataFrames, DataSets), improved engine (Tungsten relates to ideas from modern compilers and MPP databases - similar to Flink [3]), structured streaming etc. It seems we somehow

Hash tables - joins, cogroup, deltaIteration

2016-04-18 Thread Ovidiu-Cristian MARCU
Hi, Can you please confirm if there is any update regarding the hash tables use cases, as in [1] it is specified that Hash tables are used in Joins and for the Solution set in iterations (pending work to use them for grouping/aggregations)? I am interested in the pending work progress and also

Re: Flink performance pre-packaged vs. self-compiled

2016-04-14 Thread Ovidiu-Cristian MARCU
Hi, Your assumption regarding the TeraSort use case for eastcirclek's implementation may be incorrect. How many times did you run your program? It would be helpful to give more details about your experiment, in terms of configuration and dataset size. Best, Ovidiu > On 14 Apr 2016, at 17:14, Rob

Re: Not enough free slots to run the job

2016-03-21 Thread Ovidiu-Cristian MARCU
: It depends. In the example above, the job would restart. As long as there > are enough slots available, jobs will restart. > > > On Mon, Mar 21, 2016 at 3:30 PM, Ovidiu-Cristian MARCU wrote: > Hi Robert, > > I am not

Re: Not enough free slots to run the job

2016-03-21 Thread Ovidiu-Cristian MARCU
remaining slots. > That's why the spare slots approach is currently the only way to go. > > Regards, > Robert > > On Fri, Mar 18, 2016 at 1:30 PM, Ovidiu-Cristian MARCU wrote: > Hi, > > For the situation w

Re: off-heap size feature request

2016-03-19 Thread Ovidiu-Cristian MARCU
> the parameters to configure the amount of managed memory > (taskmanager.memory.size, taskmanager.memory.fraction) are valid for on and > off-heap memory. > > Have you tried these parameters and didn't they work as expected? > > Best, Fabian > >

Re: off-heap size feature request

2016-03-18 Thread Ovidiu-Cristian MARCU
the overall process size will be roughly > 4GB. The parameter name "taskmanager.heap.mb" is a bit confusing in case of > off-heap memory usage, because it does not define this size of the heap but > of the overall process. > > Hope this helps, > Fabian > > > > 2016

Not enough free slots to run the job

2016-03-18 Thread Ovidiu-Cristian MARCU
Hi, For the situation where a program specifies a maximum parallelism (so it is supposed to use all available task slots), it is possible that one of the task managers is not registered for various reasons. In this case the job will fail because there are not enough free slots to run it. For m

off-heap size feature request

2016-03-16 Thread Ovidiu-Cristian MARCU
Hi, Is it possible to add a parameter off-heap.size for the task manager off-heap memory [1]? It is not possible to limit the off-heap memory size, at least I found nothing in the documentation. [1] https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/config.html#managed-memory

Re: Memory ran out PageRank

2016-03-16 Thread Ovidiu-Cristian MARCU
ely. Can you confirm this? > > The solution set for delta iterations is currently implemented as an > in-memory hash table that works on managed memory segments, but is not > spillable. > > – Ufuk > > On Mon, Mar 14, 2016 at 6:30 PM, Ovidiu-Cristian MARCU > wrote: >

Re: Memory ran out PageRank

2016-03-14 Thread Ovidiu-Cristian MARCU
Correction: the CC run that succeeded was on top of your friend, Spark :) Best, Ovidiu > On 14 Mar 2016, at 20:38, Ovidiu-Cristian MARCU > wrote: > > Yes, largely different. I was expecting the solution set to be spillable. > This is a rather hard limitation, the lay

Re: Memory ran out PageRank

2016-03-14 Thread Ovidiu-Cristian MARCU
h table that works on managed memory segments, but is not > spillable. > > – Ufuk > > On Mon, Mar 14, 2016 at 6:30 PM, Ovidiu-Cristian MARCU > wrote: >> >> This problem is surprising as I was able to run PR and CC on a larger graph >> (2bil edges) but with thi

Re: Memory ran out PageRank

2016-03-14 Thread Ovidiu-Cristian MARCU
that? > > Cheers, > Martin > > > On 14.03.2016 17:55, Ovidiu-Cristian MARCU wrote: >> Thank you for this alternative. >> I don’t understand how the workaround will fix this on systems with limited >> memory and maybe a larger graph. >> >> Running Conn

Re: Memory ran out PageRank

2016-03-14 Thread Ovidiu-Cristian MARCU
x/flink-user/201508.mbox/%3CCAELUF_ByPAB%2BPXWLemPzRH%3D-awATeSz4sGz4v9TmnvFku3%3Dx3A%40mail.gmail.com%3E > > On 14.03.2016 16:55, Ovidiu-Cristian MARCU wrote: >> Hi, >> >> While running PageRank on a synthetic graph I run into this problem: >&

Memory ran out PageRank

2016-03-14 Thread Ovidiu-Cristian MARCU
Hi, While running PageRank on a synthetic graph I ran into the problem below. Any advice on how I should proceed to overcome this memory issue? IterationHead(Vertex-centric iteration (org.apache.flink.graph.library.PageRank$VertexRankUpdater@7712cae0 | org.apache.flink.graph.library.PageRank$RankMe

Re: Batch Processing Fault Tolerance (DataSet API)

2016-02-22 Thread Ovidiu-Cristian MARCU
d or exist as a PR [1]. So we hope to complete the partial > backtracking soon. > > [1] https://github.com/apache/flink/pull/640 > > Cheers, > Till > > On Mon, Feb 22, 2016 at 6:00 PM, Ovidiu-Cristian MARCU

Batch Processing Fault Tolerance (DataSet API)

2016-02-22 Thread Ovidiu-Cristian MARCU
Hi, In case of a node failure, what does 'Fault tolerance for programs in the DataSet API works by retrying failed executions' [1] mean? - work already done by the rest of the nodes is not lost, only the work of the lost node is recomputed, and job execution will continue, or - entire job execution is

Re: Apache Flink Web Dashboard - Completed Job history

2015-12-16 Thread Ovidiu-Cristian MARCU
correct me if I am wrong. > > -Matthias > > On 12/16/2015 03:16 PM, Ufuk Celebi wrote: >> >>> On 16 Dec 2015, at 15:00, Ovidiu-Cristian MARCU >>> wrote: >>> >>> Hi >>> >>> If I restart the Flink I don’t see anymore the hi

Apache Flink Web Dashboard - Completed Job history

2015-12-16 Thread Ovidiu-Cristian MARCU
Hi, If I restart Flink, I no longer see the history of the completed jobs. Is this a missing feature, or what should I do to see the completed job list history? Best regards, Ovidiu

Features with major priority/future release/s

2015-12-07 Thread Ovidiu-Cristian MARCU
Hi, Can you try to describe what is planned for the future releases and eventually link the Jira issues/bugs to it? Some very important features have a Major priority, like: [1] Add a SQL API (on top of Table API) [2] Add KMeans clustering algorithm to ML Library (kmeans ++ & ||) [3] Create eva

Re: flink connectors

2015-11-27 Thread Ovidiu-Cristian MARCU
Hi, The main question here is why the distribution release doesn’t contain the connector dependencies. It is fair to say that it does not have to (which connectors to include, or all of them?). So just like Spark does, Flink offers a binary distribution for Hadoop only, without considering other dependencies

Re: Apache Flink on Hadoop YARN using a YARN Session

2015-11-20 Thread Ovidiu-Cristian MARCU
asticity you mentioned. > > Yes, resource elasticity in Flink will mitigate such issues. We would be able > to respond to YARN's preemption requests if jobs with higher priorities are > requesting additional resources. > > On Fri, Nov 20, 2015 at 2:07 PM, Ovidiu-Cristian

Re: Apache Flink on Hadoop YARN using a YARN Session

2015-11-20 Thread Ovidiu-Cristian MARCU
'll > see what we can do. > > Regards, > Robert > > > > On Fri, Nov 20, 2015 at 1:24 PM, Ovidiu-Cristian MARCU wrote: > Hi, > > The link to FAQ > (https://ci.apache.org/projects/flink/f

Re: Apache Flink on Hadoop YARN using a YARN Session

2015-11-20 Thread Ovidiu-Cristian MARCU
> In general, we recommend starting a YARN session per program. You can also > directly submit a Flink program to YARN. > > Where did you find the link to the FAQ? The link on the front page is > working: http://flink.apache.org/faq.html

Apache Flink on Hadoop YARN using a YARN Session

2015-11-20 Thread Ovidiu-Cristian MARCU
Hi, I am currently interested in experimenting with Flink on Hadoop YARN. I am working from the documentation we have here: https://ci.apache.org/projects/flink/flink-docs-release-0.10/setup/yarn_setup.html

Re: Creating a representative streaming workload

2015-11-16 Thread Ovidiu-Cristian MARCU
Regarding Flink vs Spark / Storm you can check here: http://www.sparkbigdata.com/102-spark-blog-slim-baltagi/14-results-of-a-benchmark-between-apache-flink-and-apache-spark