Re: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1?

2019-12-03 Thread Piotr Nowojski
Hi, Yes, it is only related to **batch** jobs, but not necessarily only to DataSet API jobs. If you are using, for example, Blink SQL/Table API to process some bounded data streams (tables), it could also be affected there. If not, I would suggest starting a new user mailing list question

Re: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1?

2019-12-02 Thread Victor Wong
Hi, We encountered a similar issue: the task manager kept being killed by YARN. - Flink 1.9.1 - heap usage is low. But our job is a **streaming** job, so I want to ask whether this issue is only related to **batch** jobs or not? Thanks! Best, Victor On Thu, Nov 28, 2019 at 11:43 AM, yingjie wrote: > Piotr is

Re: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1?

2019-11-27 Thread yingjie
Piotr is right, that depends on the data size you are reading and the memory pressure. The memory occupied by mmapped regions can be recycled and used by other processes if memory pressure is high, that is, other processes or services on the same node won't be affected because the OS will recycle the

Re: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1?

2019-11-26 Thread Piotr Nowojski
> using mmap for data reading? > > > Best, > Jiayi Liao > > Original Message > Sender: yingjie > Recipient: user > Date: Tuesday, Nov 26, 2019 18:10 > Subject: Re: CoGroup SortMerger performance degradatio

Re: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1?

2019-11-26 Thread bupt_ljy
right about this. @yingjie Do you have any idea how much memory will be stolen from OS when using mmap for data reading? Best, Jiayi Liao Original Message Sender: yingjie Recipient: user Date: Tuesday, Nov 26, 2019 18:10 Subject: Re: CoGroup SortMerger performance degradation from 1.6.4

Re: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1?

2019-11-26 Thread Piotr Nowojski
Thanks for the confirmation, I’ve created a Jira ticket to track this issue [1] https://issues.apache.org/jira/browse/FLINK-14952 Piotrek > On 26 Nov 2019, at 11:10, yingjie wrote: > > The new BlockingSubpartition implementation in 1.9 uses mm

Re: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1?

2019-11-26 Thread yingjie
The new BlockingSubpartition implementation in 1.9 uses mmap for data reading by default, which means it steals memory from the OS. The mmapped region memory is managed by JVM, so there should be no OutOfMemory problem reported by JVM and the OS memory is also not exhausted, so there should be no kernel
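
For readers unfamiliar with mmap-based reading, here is a minimal sketch of the general technique in plain Java NIO. It is not Flink's BlockingSubpartition code; the class and method names (`MmapReadSketch`, `readViaMmap`, `roundTrip`) are made up for illustration. The mapped pages belong to the OS page cache rather than the Java heap, which fits the low heap usage reported in this thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative only: hypothetical names, not taken from Flink.
final class MmapReadSketch {

    // Reads a whole file through a read-only memory mapping.
    static String readViaMmap(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping is backed by the OS page cache, not by the Java heap.
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            byte[] bytes = new byte[buffer.remaining()];
            buffer.get(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    // Writes text to a temporary file and reads it back through the mapping.
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("mmap-sketch", ".txt");
            file.toFile().deleteOnExit();
            Files.write(file, text.getBytes(StandardCharsets.UTF_8));
            return readViaMmap(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello mmap"));
    }
}
```

Because the mapped pages count toward the process's resident memory, a container manager that enforces a physical memory limit may still react to them even when the heap is nearly empty.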

Re: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1?

2019-11-25 Thread Piotr Nowojski
rs which fed into a CoGroup? > > // ah > > From: Zhijiang > Sent: Thursday, November 21, 2019 9:48 PM > To: Hailu, Andreas [Engineering] ; Piotr > Nowojski > Cc: user@flink.apache.org > Subject: Re: CoGroup SortMerger performance degradation from 1.6.4 - 1.

RE: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1?

2019-11-22 Thread Hailu, Andreas
? This is why we were seeing failures in our pipelines which had operators which fed into a CoGroup? // ah From: Zhijiang Sent: Thursday, November 21, 2019 9:48 PM To: Hailu, Andreas [Engineering] ; Piotr Nowojski Cc: user@flink.apache.org Subject: Re: CoGroup SortMerger performance

Re: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1?

2019-11-21 Thread Zhijiang
. // ah From: Piotr Nowojski On Behalf Of Piotr Nowojski Sent: Thursday, November 21, 2019 10:14 AM To: Hailu, Andreas [Engineering] Cc: Zhijiang ; user@flink.apache.org Subject: Re: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1? Hi, I would suspect this: https://issues.apache.org

RE: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1?

2019-11-21 Thread Hailu, Andreas
Thanks, Piotr. We’ll rerun our apps today with this and get back to you. // ah From: Piotr Nowojski On Behalf Of Piotr Nowojski Sent: Thursday, November 21, 2019 10:14 AM To: Hailu, Andreas [Engineering] Cc: Zhijiang ; user@flink.apache.org Subject: Re: CoGroup SortMerger performance

Re: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1?

2019-11-21 Thread Piotr Nowojski
From: Zhijiang > Sent: Wednesday, November 20, 2019 11:32 PM > To: Hailu, Andreas [Engineering] ; > user@flink.apache.org > Subject: Re: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1? > > Hi Andreas, > > You are running a batch job, so the

RE: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1?

2019-11-21 Thread Hailu, Andreas
6.65GB, so it sounds like the problem lies somewhere in the changes around mapped memory. // ah From: Zhijiang Sent: Wednesday, November 20, 2019 11:32 PM To: Hailu, Andreas [Engineering] ; user@flink.apache.org Subject: Re: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1? Hi Andreas

Re: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1?

2019-11-20 Thread Zhijiang
Hi Andreas, You are running a batch job, so there should be no native memory used by the RocksDB state backend. Then I guess it is either heap memory or direct memory being overused. The heap managed memory is mainly used by batch operators and direct memory is used by network shuffle. Can you further ch

RE: CoGroup SortMerger performance degradation from 1.6.4 - 1.9.1?

2019-11-20 Thread Hailu, Andreas
Going through the release notes today - we tried fiddling with the taskmanager.memory.fraction option, going as low as 0.1 with unfortunately no success. It still leads to the container running beyond physical memory limits. // ah From: Hailu, Andreas [Engineering] Sent: Tuesday, November 19, 2

Re: coGroup exception or something else in Gelly job

2017-06-22 Thread Kaepke, Marc
Hi Greg, if you have an idea, I'm still interested. In case you don't, please give me feedback too. Best, Marc Sent from my iPhone On 15. Jun 2017, at 15:19, Kaepke, Marc <marc.kae...@haw-hamburg.de> wrote: Hi Greg, I wanna ask if there was any news about the implementation or oppo

Re: coGroup exception or something else in Gelly job

2017-06-15 Thread Kaepke, Marc
Hi Greg, I wanna ask if there was any news about the implementation or opportunities? Thanks and best regards, Marc On 12.06.2017 at 19:28, Kaepke, Marc <marc.kae...@haw-hamburg.de> wrote: I’m working on an implementation of SemiClustering [1]. I used two graph models (Pregel aka. vert

Re: coGroup exception or something else in Gelly job

2017-06-12 Thread Kaepke, Marc
I’m working on an implementation of SemiClustering [1]. I used two graph models (Pregel, a.k.a. vertex-centric iteration, and gather-scatter). Short description of the algorithm * input is a weighted, undirected graph * output is a set of greedy clusters * Each vertex V maintains a list contain

Re: coGroup exception or something else in Gelly job

2017-06-12 Thread Greg Hogan
I don't think it is possible for anyone to debug your exception without the source code. Storing the adjacency list within the Vertex value is not scalable. Can you share a basic description of the algorithm you are working to implement? On Mon, Jun 12, 2017 at 5:47 AM, Kaepke, Marc wrote: > It

Re: coGroup exception or something else in Gelly job

2017-06-12 Thread Kaepke, Marc
It seems Flink used a different exception graph outside of my IDE (IntelliJ). The job anatomy is: load data from csv and build an initial graph => reduce that graph (remove loops and combine multi edges) => extend the modified graph by a new vertex value => run a gather-scatter iteration I have

Re: coGroup exception or something else in Gelly job

2017-06-09 Thread Greg Hogan
Have you looked at org.apache.flink.gelly.GraphExtension.CustomVertexValue.createInitSemiCluster(CustomVertexValue.java:51)? > On Jun 9, 2017, at 4:53 PM, Kaepke, Marc wrote: > > Hi everyone, > > I don’t have any exceptions if I execute my Gelly job in my IDE (local) > directly. > The next s

Re: Cogroup hints/performance

2017-02-07 Thread Fabian Hueske
Hi Billy, A CoGroup does not have any freedom in its execution strategy. It requires that both inputs are partitioned on the grouping keys and then performs a local sort-merge join, i.e., both inputs are sorted. Existing partitioning or sort orders can be reused. Since there is only one execut
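
To make the sort-merge strategy described above concrete, here is a rough single-machine sketch in plain Java. It is not Flink's implementation (there is no partitioning or spilling here), and the names `SortMergeCoGroupSketch` and `Rec` are hypothetical. Both inputs are sorted by key, then merged so that each key yields one pair of groups, either of which may be empty.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: hypothetical names, not taken from Flink.
final class SortMergeCoGroupSketch {

    // A keyed record; the shape is an assumption made for this example.
    static final class Rec {
        final int key;
        final String value;

        Rec(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Returns one "key:[left values]|[right values]" entry per distinct key.
    static List<String> coGroup(List<Rec> leftIn, List<Rec> rightIn) {
        // Sort copies of both inputs by key (List.sort is stable).
        List<Rec> left = new ArrayList<>(leftIn);
        List<Rec> right = new ArrayList<>(rightIn);
        Comparator<Rec> byKey = Comparator.comparingInt(r -> r.key);
        left.sort(byKey);
        right.sort(byKey);

        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() || j < right.size()) {
            // The next key is the smaller head key of the two sorted inputs.
            int key;
            if (i >= left.size()) {
                key = right.get(j).key;
            } else if (j >= right.size()) {
                key = left.get(i).key;
            } else {
                key = Math.min(left.get(i).key, right.get(j).key);
            }
            // Collect the run of records with that key from each side.
            List<String> leftGroup = new ArrayList<>();
            while (i < left.size() && left.get(i).key == key) {
                leftGroup.add(left.get(i++).value);
            }
            List<String> rightGroup = new ArrayList<>();
            while (j < right.size() && right.get(j).key == key) {
                rightGroup.add(right.get(j++).value);
            }
            out.add(key + ":" + leftGroup + "|" + rightGroup);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rec> left = new ArrayList<>();
        left.add(new Rec(2, "b"));
        left.add(new Rec(1, "a"));
        List<Rec> right = new ArrayList<>();
        right.add(new Rec(2, "x"));
        right.add(new Rec(3, "y"));
        System.out.println(coGroup(left, right));
    }
}
```

If an input already arrives sorted on the key, the sort step is redundant, which is the intuition behind reusing existing sort orders.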

Re: cogroup

2015-06-29 Thread Michele Bertoni
OK, thanks! Then for now I will use it until true outer join is ready. On 29 Jun 2015, at 18:22, Fabian Hueske <fhue...@gmail.com> wrote: Yes, if you need outer join semantics you have to go with CoGroup. Some members of the Flink community are working on true outer joins

Re: cogroup

2015-06-29 Thread Fabian Hueske
Yes, if you need outer join semantics you have to go with CoGroup. Some members of the Flink community are working on true outer joins for Flink, but I don't know what the progress is. Best, Fabian 2015-06-29 18:05 GMT+02:00 Michele Bertoni : > thanks both for answering, > that’s what i expect

Re: cogroup

2015-06-29 Thread Michele Bertoni
Thanks both for answering, that’s what I expected. I was using join at first, but sadly I had to move from join to cogroup because I need an outer join. The alternative to the cogroup is to “complete” the inner join by extracting from the original dataset what did not match in the cogroup by differenc

Re: cogroup

2015-06-29 Thread Fabian Hueske
If you just want to do the pairwise comparison try join(). Join is an inner join and will give you all pairs of elements with matching keys. For CoGroup, there is no other way than collecting one side in memory. Best, Fabian 2015-06-29 17:42 GMT+02:00 Matthias J. Sax : > Why do you not use a joi
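
A minimal sketch of what "collecting one side in memory" can look like inside a group, written in plain Java without Flink. The class name `PairwiseInGroupSketch` and the `l < r` comparison are placeholders picked for illustration. The assumption is that each iterable can only be walked once, so one side is buffered into a list before the other side is scanned against it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: hypothetical names, not taken from Flink.
final class PairwiseInGroupSketch {

    // Compares every left element with every right element of one group.
    static List<String> pairs(Iterable<Integer> left, Iterable<Integer> right) {
        // Buffer the right side so it can be scanned once per left element.
        List<Integer> buffered = new ArrayList<>();
        for (Integer r : right) {
            buffered.add(r);
        }
        List<String> out = new ArrayList<>();
        for (Integer l : left) {
            for (Integer r : buffered) {
                if (l < r) { // placeholder comparison
                    out.add(l + "<" + r);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(pairs(Arrays.asList(1, 5), Arrays.asList(3, 7)));
    }
}
```

Buffering the smaller side, when it is known which one that is, keeps the memory cost per group down.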

Re: cogroup

2015-06-29 Thread Matthias J. Sax
Why do you not use a join? CoGroup seems not to be the right operator. -Matthias On 06/29/2015 05:40 PM, Michele Bertoni wrote: > Hi I have a question on cogroup > > when I cogroup two dataset is there a way to compare each element on the left > with each element on the right (inside a group) w

Re: coGroup Iterator NoSuchElement

2015-06-04 Thread Mustafa Elbehery
@Stephan, I was trying to follow the concept of *Nest Join*. In other words, I wanted to follow a certain implementation to achieve my goal. @Fabian, Well, solving the exception this way will lead to an incorrect result, as the key will always exist on one side, the iterator of the other side will co

Re: coGroup Iterator NoSuchElement

2015-06-04 Thread Fabian Hueske
I am not sure if I got your question right. You can easily prevent the NoSuchElementException by calling next() only if hasNext() returns true. 2015-06-04 11:18 GMT+02:00 Mustafa Elbehery : > Yes, It's working now .. But my assumption is that I want to join different > datasets on the common ke
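
The guard suggested above could look roughly like this. It is a sketch in plain Java that mimics the shape of a coGroup body without depending on Flink; `CoGroupGuardSketch` and its output format are invented for the example. The point is only that `next()` is called after `hasNext()` has returned true.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Illustrative only: hypothetical names, not taken from Flink.
final class CoGroupGuardSketch {

    // Emits "firstLeft:rightCount" for a group, or nothing if the left side is empty.
    static List<String> coGroup(Iterable<String> left, Iterable<String> right) {
        List<String> out = new ArrayList<>();
        Iterator<String> leftIt = left.iterator();
        if (!leftIt.hasNext()) {
            // The key exists only on the right side: skip instead of calling next().
            return out;
        }
        String first = leftIt.next();
        int matches = 0;
        for (String ignored : right) {
            matches++;
        }
        out.add(first + ":" + matches);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(coGroup(Arrays.asList("alice"), Arrays.asList("math", "physics")));
        System.out.println(coGroup(Collections.<String>emptyList(), Arrays.asList("math")));
    }
}
```

Whether an empty side should be skipped, as here, or emitted with a default value depends on the join semantics the job needs.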

Re: coGroup Iterator NoSuchElement

2015-06-04 Thread Stephan Ewen
The regular JOIN has the semantics of an inner join, filtering out cases where no matching tuple is found on one side. CoGroup follows the semantics of an outer join on groups, delivering also empty groups on some sides. On Thu, Jun 4, 2015 at 11:18 AM, Mustafa Elbehery wrote: > Yes, Its workin
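
To illustrate the difference in semantics, here is a small sketch in plain Java of a full outer join built on co-grouped inputs. It does not use Flink's API; `OuterJoinViaCoGroupSketch` and the pre-grouped `Map` inputs are assumptions made to keep the example self-contained. Keys present on only one side still produce output, padded with `null`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: hypothetical names, not taken from Flink.
final class OuterJoinViaCoGroupSketch {

    // Each map holds the values of one input, already grouped by key.
    static List<String> fullOuterJoin(Map<String, List<String>> left,
                                      Map<String, List<String>> right) {
        // Visit the union of keys in sorted order for deterministic output.
        Set<String> keys = new TreeSet<>(left.keySet());
        keys.addAll(right.keySet());

        List<String> out = new ArrayList<>();
        for (String key : keys) {
            List<String> l = left.getOrDefault(key, Collections.emptyList());
            List<String> r = right.getOrDefault(key, Collections.emptyList());
            if (l.isEmpty()) {
                // Group exists only on the right: an inner join would drop it.
                for (String rv : r) {
                    out.add(key + "=(null," + rv + ")");
                }
            } else if (r.isEmpty()) {
                // Group exists only on the left.
                for (String lv : l) {
                    out.add(key + "=(" + lv + ",null)");
                }
            } else {
                // Both sides present: same pairs an inner join would produce.
                for (String lv : l) {
                    for (String rv : r) {
                        out.add(key + "=(" + lv + "," + rv + ")");
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> left = new HashMap<>();
        left.put("a", Arrays.asList("1"));
        left.put("b", Arrays.asList("2"));
        Map<String, List<String>> right = new HashMap<>();
        right.put("b", Arrays.asList("x", "y"));
        right.put("c", Arrays.asList("z"));
        System.out.println(fullOuterJoin(left, right));
    }
}
```

Dropping the two one-sided branches turns this back into an inner join.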

Re: coGroup Iterator NoSuchElement

2015-06-04 Thread Mustafa Elbehery
Yes, it's working now. But my assumption is that I want to join different datasets on the common key, so it will be normal to have many tuples on one side which do not exist on the other side. How do I fix that? On Thu, Jun 4, 2015 at 11:00 AM, Fabian Hueske wrote: > Hi, > > one of the itera

Re: coGroup Iterator NoSuchElement

2015-06-04 Thread Fabian Hueske
Hi, one of the iterables of a CoGroup function can be empty. Calling iterator.next() on an empty iterator raises the NoSuchElementException. This is the expected behavior of the function. Are you sure your assumptions about your data are correct, i.e., that the iterator should always have (at leas

Re: coGroup Iterator NoSuchElement

2015-06-04 Thread Mustafa Elbehery
Hi, public static class ComputeStudiesProfile implements CoGroupFunction { Person person; @Override public void coGroup(Iterable iterable, Iterable iterable1, Collector collector) throws Exception { Iterator iterator = iterable.iterator(); person = iterator.next();

Re: coGroup Iterator NoSuchElement

2015-06-03 Thread Stephan Ewen
Hi! The code snippet is not very revealing. Can you also show the implementations of the CoGroupFunctions? Thanks! On Wed, Jun 3, 2015 at 3:50 PM, Mustafa Elbehery wrote: > Code Snippet :) > > DataSet updatedPersonOne = inPerson.coGroup(inStudent) >.where("name").eq

Re: coGroup Iterator NoSuchElement

2015-06-03 Thread Mustafa Elbehery
Code Snippet :) DataSet updatedPersonOne = inPerson.coGroup(inStudent) .where("name").equalTo("name") .with(new ComputeStudiesProfile()); DataSet updatedPersonTwo = updatedPersonOne.coGroup(inJobs) .where("name").equ