Re: [DISCUSS] FLIP-18: Code Generation for improving sorting performance

2017-03-23 Thread Gábor Gévay
Hello, > Second, additional classes will turn performance critical callsites > megamorphic. Yes, this is a completely valid point, thanks for raising this issue Greg. We were planning to have an offline discussion tomorrow with Pattarawat about this. We have a few options: 1. We could fuse the Q

Re: FLINK-5734 : Code Generation for NormalizedKeySorter

2017-02-14 Thread Gábor Gévay
Hello, Pat, the table in your email is somehow not visible in my gmail, but it is visible here: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/FLINK-5734-Code-Generation-for-NormalizedKeySorter-tt15804.html#a15936 Maybe the problem is caused by the formatting. > FLINK-3722 > appro

Re: FLINK-5734 : Code Generation for NormalizedKeySorter

2017-02-08 Thread Gábor Gévay
Hello Till, > Why did you decide to generate the code on the TMs? If we generated on the client side, then we would need to serialize instances of the generated classes when shipping the job to the TMs, but we would really like to avoid serializing instances of the generated classes. In the other

Backpressure rationale

2017-01-12 Thread Gábor Gévay
Hello, I would like to ask about the rationale behind the backpressure mechanism in Flink. As I understand it, backpressure is for handling the problem of one operator (or source) producing records faster than the next operator can consume them. However, an alternative solution would be to have a

Re: [DISCUSS] Add Side Input/Broadcast Set For Streaming API

2016-12-20 Thread Gábor Gévay
Hello, I am also interested in this feature for a paper that I'm writing. I have the "slowly evolving side input" case with a complicated custom "update precondition" that would be expressible by a stateful UDF that makes its decisions from looking at the elements of the main stream. Best, Gábor

Re: [DISCUSS] FLIP-14: Loops API and Termination

2016-12-15 Thread Gábor Gévay
Hi Paris and Fouad, I finally had some time to delve into this. Thanks for the nice proposal! +1 for also having a CoLoopFunction. That might be useful even if the input and feedback have the same type, as it might happen that I want to treat the elements coming on the feedback in a different way

Re: [DISCUSS] Make FieldAccessor logic consistent with remaining API

2016-10-27 Thread Gábor Gévay
Hello, Thanks Fabian for starting this discussion. I would just like to add a few thoughts about why the FieldAccessors even exist. One could say that we should instead just re-use ExpressionKeys for the aggregations, and we are done. As far as I can see, the way ExpressionKeys is often used

Re: Contribution

2016-09-09 Thread Gábor Gévay
Hi Hasan, Welcome! There is the "starter" label on some Jiras, which means that the issue is good for getting started. Best, Gábor 2016-09-09 13:46 GMT+02:00 Hasan Gürcan : > Hi devs, > > i contributed to Stratosphere as I was studying computer science at the FU > Berlin. Currently i am work

Nested field expressions PR

2016-08-14 Thread Gábor Gévay
Hello, I have a PR for making FieldAccessors support nested field expressions (like .sum("f1.foo.bar")) that has been open for quite a long time: https://github.com/apache/flink/pull/2094 Although it looks like a lot of code, most of it is actually fairly straightforward, so it shouldn'

Re: N-ary stream operators - status

2016-08-10 Thread Gábor Gévay
ers, > Aljoscha > > On Tue, 9 Aug 2016 at 17:54 Gábor Gévay wrote: > >> Hello, >> >> There is this Google Doc about adding n-ary stream operators to Flink: >> >> https://docs.google.com/document/d/1ZFzL_0xGuUEnBsFyEiHwWcmCcjhd9ArWsmhrgnt05RI/edit >>

N-ary stream operators - status

2016-08-09 Thread Gábor Gévay
Hello, There is this Google Doc about adding n-ary stream operators to Flink: https://docs.google.com/document/d/1ZFzL_0xGuUEnBsFyEiHwWcmCcjhd9ArWsmhrgnt05RI/edit I would like to ask what the plans are for when this feature will be available. Best, Gábor

Re: Iteration Intermediate Output

2016-05-30 Thread Gábor Gévay
Hello, > Would the best way be to extend the iteration operators to support > intermediate outputs or revisit the idea of caching intermediate results > and thus allow efficient for-loop iterations? Caching intermediate results would also be a big help to projects that are targeting Flink as a backe

Re: Dataset split/demultiplex

2016-05-13 Thread Gábor Gévay
;> > > >> Collector[URLOutputData]) { >> > > > >> > > > iter.foreach { >> > > > >> > > > i => >> > > > >> > > > try { >> > > > >> > > > if (predicate(i)) >> > > > >> >

Re: Dataset split/demultiplex

2016-05-12 Thread Gábor Gévay
Hello, You can split a DataSet into two DataSets with two filters: val xs: DataSet[A] = ... val split1: DataSet[A] = xs.filter(f1) val split2: DataSet[A] = xs.filter(f2) where f1 and f2 are true for those elements that should go into the first and second DataSets respectively. So far, the splits
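
The two-filter split described above can be sketched in plain Java. This is only an illustration under stated assumptions: an ordinary java.util.List stands in for a Flink DataSet, and the class name SplitSketch, the helper filter, and the predicates f1 and f2 are hypothetical names chosen for this example, not Flink API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical stand-in: a List plays the role of a DataSet here.
class SplitSketch {
    // Returns the elements of xs for which the predicate f holds, in their original order.
    static <A> List<A> filter(List<A> xs, Predicate<A> f) {
        return xs.stream().filter(f).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 2, 3, 4, 5, 6);

        // f1 is true for elements of the first split, f2 for elements of the second.
        Predicate<Integer> f1 = x -> x % 2 == 0;
        Predicate<Integer> f2 = x -> x % 2 != 0;

        List<Integer> split1 = filter(xs, f1);
        List<Integer> split2 = filter(xs, f2);

        System.out.println(split1 + " " + split2); // prints "[2, 4, 6] [1, 3, 5]"
    }
}
```

Note that each filter makes its own pass over the input, and nothing forces f1 and f2 to be complementary: an element can land in both splits or in neither, depending on how the predicates are written.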

Re: [PROPOSAL] Structure the Flink Open Source Development

2016-05-12 Thread Gábor Gévay
Hello, There are at least three Gábors in the Flink community, :) so assuming that the Gábor in the list of maintainers of the DataSet API is referring to me, I'll be happy to do it. :) Best, Gábor G. 2016-05-10 11:24 GMT+02:00 Stephan Ewen : > Hi everyone! > > We propose to establish some li

Re: [DISCUSS] Macro-benchmarking for performance tuning and regression detection

2016-04-09 Thread Gábor Gévay
Hello, I think that creating a macro-benchmarking module would be a very good idea. It would make doing performance-related changes much easier and safer. I have also used Peel, and can confirm that it would be a good fit for this task. > I've also been looking recently at some of the hot code a

Re: Guarantees for object reuse modes and documentation

2016-02-20 Thread Gábor Gévay
Thanks, Ken! I was wondering how other systems handle these issues. Fortunately, the deep copy - shallow copy problem doesn't arise in Flink: when we copy an object, it is always a deep copy (at least, I hope so :)). Best, Gábor 2016-02-19 22:29 GMT+01:00 Ken Krugler : > Not sure how useful th

Re: Memory manager behavior in iterative jobs

2016-02-02 Thread Gábor Gévay
s the original behavior and does not have > any downside effects for batch programs. > > The effect of the switch on the performance of iterative jobs is > interesting and it sounds like it should be improved. > > Best, Fabian > > 2016-01-30 14:04 GMT+01:00 Gábor Gévay : >

Memory manager behavior in iterative jobs

2016-01-30 Thread Gábor Gévay
Hello! We have a strangely behaving iterative Flink job: when we give it more memory, it gets much slower (more than 10 times). The problem seems to be mostly caused by GCs. Enabling object reuse didn’t help. With some profiling and debugging, we traced the problem to the operators requesting new

Re: Object reuse documentation should be improved

2015-12-14 Thread Gábor Gévay
tricky to know whether stuff is chained or not > (for users, and even for us developers…). > > >> On 13 Dec 2015, at 19:24, Gábor Gévay wrote: >> >> Hello, >> >> I find the documentation about object reuse [1] very confusing. I >> started a Google

Object reuse documentation should be improved

2015-12-13 Thread Gábor Gévay
Hello, I find the documentation about object reuse [1] very confusing. I started a Google Doc [2] about clarifying/rewriting it. First, it states four questions that I think should have answers stated explicitly in the documentation, and then lists some concrete problems (ambiguities) in the curr

Hash-based aggregation

2015-10-01 Thread Gábor Gévay
Hello, I would really like to see FLINK-2237 solved. I would implement this feature over the weekend, if the CompactingHashTable can be used to solve it (see my comment there). Could you please give me some advice on whether this is a viable approach, or whether you perhaps see some difficulties that I'm

Re: Streaming KV store abstraction

2015-09-08 Thread Gábor Gévay
Hello, As for use cases, in my old job at Ericsson we were building a streaming system that was processing data from telephone networks, and it was using key-value stores a LOT. For example, keeping track of various state info of the users (which cell they are currently connected to, what bearers

Re: Flink Forward webpage down

2015-08-14 Thread Gábor Gévay
It works for me now too. Best regards, Gabor 2015-08-14 16:10 GMT+02:00 Chiwan Park : > Currently, the site looks okay. > > Regards, > Chiwan Park > > >> On Aug 14, 2015, at 6:05 PM, Gábor Gévay wrote: >> >> Hello, >> >> I would like to subm

Flink Forward webpage down

2015-08-14 Thread Gábor Gévay
Hello, I would like to submit an abstract to Flink Forward, but the webpage of the conference (flink-forward.org) seems to be down. It prints "Error establishing a database connection" for me. It worked yesterday. Best regards, Gabor

Re: A soft reminder

2015-07-30 Thread Gábor Gévay
this CompactingHashTable and this parameter > deactivates its usage. I am very curious what happens in your case. If you > could tell us the outcome, I'd greatly appreciate it. > > > On Thu, Jul 30, 2015 at 7:17 PM, Gábor Gévay wrote: > >> Yes, in a VertexCentricItera

Re: A soft reminder

2015-07-30 Thread Gábor Gévay
Yes, in a VertexCentricIteration with a few million nodes, running locally on my laptop with about 10 GB of memory given to java. Best, Gabor 2015-07-30 18:32 GMT+02:00 Andra Lungu : > Hi Gabor, > > Within a delta iteration right? > > On Thu, Jul 30, 2015 at 6:31 PM, Gáb

Re: A soft reminder

2015-07-30 Thread Gábor Gévay
Hi, I have also run into this problem just now. It only happens with a lot of data. Best regards, Gabor 2015-07-27 11:35 GMT+02:00 Felix Neutatz : > Hi, > > I also encountered the EOF exception for a delta iteration with "more > data". With less data it works ... > > Best regards, > Felix > Am 27.

Re: Failing tests on Windows

2015-07-17 Thread Gábor Gévay
> BlobUtilsTest.before:45 null > BlobUtilsTest.before:45 null > BlobServerDeleteTest.testDeleteFails:291 null > BlobLibraryCacheManagerTest.testRegisterAndDownload:196 Could not > remove write permissions from cache directory > BlobServerPutTest.testPutBufferFails:224 null > BlobServerP

Failing tests on Windows

2015-07-17 Thread Gábor Gévay
Hello! I tried to set up a development environment on Windows, but several tests are failing: 1. The setWritable problem. This will be worked around by [1] 2. The tryCleanupOnError before close problem [2]. This could be half-fixed by doing fix 2. from the comment I wrote there, but I think that

Re: Thoughts About Streaming

2015-06-25 Thread Gábor Gévay
uple 0,6847279255682044995) > (Tuple 0,6847279255682044995) > (Tuple 0,-5390528042713628318) > (Tuple 0,-5390528042711551780) > (Tuple 0,-5390528042711551780) > > So at some point the pre-reducer seems to go haywire and does not recover > from it. The good thing is that it do

Re: Thoughts About Streaming

2015-06-25 Thread Gábor Gévay
Hello, Aljoscha, can you please try the performance test of Current/Reduce with the InversePreReducer in PR 856? (If you just call sum, it will use an InversePreReducer.) It would be an interesting test, because the inverse function optimization really depends on the stream being ordered, and I th
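
To illustrate what an inverse-function optimization for a windowed sum can look like, here is a minimal sketch. It is not the InversePreReducer code from PR 856; it assumes a count-based sliding window, and the class name SlidingSum and its method add are hypothetical. The running sum is updated by adding the arriving element and subtracting the evicted one (the inverse of addition), instead of re-reducing the whole window.

```java
import java.util.ArrayDeque;

// Hypothetical sketch: sliding-window sum over the last `size` elements.
class SlidingSum {
    private final int size;
    private final ArrayDeque<Long> window = new ArrayDeque<>();
    private long sum = 0;

    SlidingSum(int size) {
        this.size = size;
    }

    // Adds a value, evicts the oldest one if the window is full,
    // and returns the sum of the current window contents.
    long add(long value) {
        window.addLast(value);
        sum += value;
        if (window.size() > size) {
            // Inverse step: undo the contribution of the element that leaves.
            sum -= window.removeFirst();
        }
        return sum;
    }

    public static void main(String[] args) {
        SlidingSum s = new SlidingSum(3);
        for (long v = 1; v <= 5; v++) {
            System.out.println(s.add(v)); // prints 1, 3, 6, 9, 12 on separate lines
        }
    }
}
```

The sketch evicts strictly in arrival order, which is the simplest case. If elements could leave the window in a different order than they entered, the bookkeeping above would no longer be enough, which is one way to read the remark about the optimization depending on the stream being ordered.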

Re: Question: SourceFunction

2015-06-22 Thread Gábor Gévay
Hi, There is one more tricky issue here if the variable is not volatile, which can cause a problem on any architecture: If the compiler determines that the code inside the loop will never modify isRunning, then it might "optimize" the exit condition into just while(true). And this can actually ha
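
A minimal sketch of the pattern under discussion, assuming a worker thread that spins on a flag which another thread clears. The class StoppableLoop and its methods are hypothetical names for this example and are not Flink's SourceFunction. Declaring the flag volatile obliges the loop to re-read it on every iteration; without volatile, the Java memory model gives no guarantee that the write made in cancel() ever becomes visible to the loop, so the JIT is free to hoist the read out of the loop.

```java
// Hypothetical sketch of a cancellable loop driven by a volatile flag.
class StoppableLoop implements Runnable {
    // volatile: every loop iteration must observe the latest written value.
    private volatile boolean isRunning = true;
    private long iterations = 0;

    @Override
    public void run() {
        while (isRunning) {
            iterations++; // the loop body never writes isRunning
        }
    }

    // Called from another thread to request termination.
    void cancel() {
        isRunning = false;
    }

    // Starts the loop, cancels it shortly after, and reports whether the worker exited.
    static boolean stopsAfterCancel() {
        StoppableLoop loop = new StoppableLoop();
        Thread worker = new Thread(loop);
        worker.setDaemon(true); // do not keep the JVM alive if something goes wrong
        worker.start();
        try {
            Thread.sleep(50);
            loop.cancel();
            worker.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + stopsAfterCancel());
    }
}
```

Removing the volatile keyword from this sketch does not reliably reproduce a hang; whether the loop gets stuck depends on the JVM and on what the JIT decides to do, which is part of what makes this kind of bug hard to notice.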

Re: Adding flink-scala as a dependency to flink-streaming-core

2015-06-10 Thread Gábor Gévay
an me can estimate how likely this is? Best regards, Gabor 2015-06-10 15:55 GMT+02:00 Gábor Gévay : >> "it does not feel right to add an API package to a core package > > Yes, that makes sense. I just tried removing the flink-java dependency > from flink-streaming to see wha

Re: Adding flink-scala as a dependency to flink-streaming-core

2015-06-10 Thread Gábor Gévay
hed some light on the required dependencies. >> >> Cheers, >> Till >> >> >> On Wed, Jun 10, 2015 at 2:13 PM Gábor Gévay wrote: >> >>> Hello, >>> >>> I would like to ask if it would be OK if I added flink-scala as a >>> dep

Adding flink-scala as a dependency to flink-streaming-core

2015-06-10 Thread Gábor Gévay
Hello, I would like to ask if it would be OK if I added flink-scala as a dependency to flink-streaming-core. An alternative would be to move the Scala typeutils to flink-core (to where the Java typeutils are). Why I need this: While I am implementing the fast median calculation for windows as par

Re: Memleak in the SessionWindowing example

2015-06-08 Thread Gábor Gévay
at 10:01 PM, Gábor Gévay wrote: > >> > Let's not get all dramatic :D >> >> Ok, sorry :D >> >> > If we don't call any methods on the empty groups we can still keep them >> > off-memory in a persistent storage with a lazy checkpoint/state-access

Re: Memleak in the SessionWindowing example

2015-05-28 Thread Gábor Gévay
ll keep them > off-memory in a persistent storage with a lazy checkpoint/state-access > logic with practically 0 memory overhead. > > Automatically dropping everything will break a lot of programs without > people noticing. > > On Thu, May 28, 2015 at 7:48 PM, Gábor Gévay wrote

Re: Memleak in the SessionWindowing example

2015-05-28 Thread Gábor Gévay
there are already empty >> windows for a group why not drop the previous states? >> >> On Thu, May 28, 2015 at 3:01 PM, Gábor Gévay wrote: >> >> > Hi, >> > >> > At Ericsson, we are implementing something similar to what the >> > Sessi

Memleak in the SessionWindowing example

2015-05-28 Thread Gábor Gévay
Hi, At Ericsson, we are implementing something similar to what the SessionWindowing example does: There are events belonging to phone calls (sessions), and every event has a call_id, which tells us which call it belongs to. At the end of every call, a large event has to be emitted that contains s

Re: GSoC proposal

2015-03-26 Thread Gábor Gévay
e should also add some practical >> stuff like top-k, distinct etc). >> >> Here is a list of interesting papers that seems to be related to this >> project >> >> https://gist.github.com/debasishg/8172796 >> >> Cheers, >> Gyula >> >> On

GSoC proposal

2015-03-26 Thread Gábor Gévay
Hello, I will be applying to the Google Summer of Code, and I wrote most of the proposal: http://compalg.inf.elte.hu/~ggevay/Proposal.pdf I would appreciate it if you could comment on it. Gyula Fora, git blame is telling me that you wrote most of the relevant parts of the windowing code, so I wou