Re: Performance tuning

2017-02-24 Thread Dmitry Golubets
'll just serialize an integer id. So the > amount of data being transferred goes down drastically. > > The disableAutoTypeRegistration flag is ignored in the DataStream API at > the moment. > > > > > > > > On Thu, Feb 23, 2017 at 7:00 PM, Dmitry Golubets > wrote: > >> Hi

Re: Performance tuning

2017-02-23 Thread Dmitry Golubets
e classes? If so, what are > you using for doing that? > > Regards, > Robert > > On Fri, Feb 17, 2017 at 9:17 PM, Dmitry Golubets > wrote: > >> Hi Daniel, >> >> I've implemented a macro that generates message pack serializers in our >> codebase. >

Re: Performance tuning

2017-02-17 Thread Dmitry Golubets
izer(..) > . > > I'm interested in knowing what you have done there for a boost of about > 50%. > > A small or simple example would be very nice. > > Thank you very much in advance. > > Kind Regards, > > Daniel Santos > > On 02/17/2017 12:43 PM, Dmitry Go

Re: How important is 'registerType'?

2017-02-17 Thread Dmitry Golubets
> Till > > > On Fri, Feb 17, 2017 at 12:38 PM, Dmitry Golubets > wrote: > >> Hi, >> >> I was using ```cs.knownDirectSubclasses``` recursively to find and >> register subclasses, which may have resulted in order mess. >> Later I changed that to >

Performance tuning

2017-02-17 Thread Dmitry Golubets
Hi, Unfortunately, my streaming job cannot benefit much from parallelization, so I'm looking for things I can tune in Flink to make it process a sequential stream faster. So far, our current engine based on Akka Streams (non-distributed, ofc) handles 20k msg/sec. Ported to Flink, I'm getting 14k so

Re: How important is 'registerType'?

2017-02-17 Thread Dmitry Golubets
ing the savepoint? Flink upgrade, Job upgrade, changing Kryo version, > changing order in which you register Kryo serialisers? > > Best, > Aljoscha > > On Fri, 10 Feb 2017 at 18:26 Dmitry Golubets wrote: > >> The docs say that it may improve performance. >> >>

Akka 2.4

2017-02-16 Thread Dmitry Golubets
Hi, Can I force Flink to use Akka 2.4 (recompile if needed)? Is it going to misbehave in a subtle way? Best regards, Dmitry

Re: A way to control redistribution of operator state?

2017-02-14 Thread Dmitry Golubets
> implementation as default. > > However, I’m not sure of the plans in exposing this to the user and making > it configurable. > Looping in Stefan (in cc) who mostly worked on this part and see if he can > provide more info. > > - Gordon > > On February 14, 2017 at 2:3

A way to control redistribution of operator state?

2017-02-13 Thread Dmitry Golubets
Hi, It looks impossible to implement a keyed state with operator state now. I know it sounds like "just use a keyed state", but the latter requires updating it on every value change, as opposed to operator state, and thus can be expensive (especially if you have to deal with mutable structures inside w

How important is 'registerType'?

2017-02-10 Thread Dmitry Golubets
The docs say that it may improve performance. How true is that when custom serializers are provided? There is also a 'disableAutoTypeRegistration' method in the config class, implying that Flink registers types automatically. So, given that I have a hierarchy: trait A class B extends A class C extends A

Where to put "pre-start" logic and how to detect recovery?

2017-02-09 Thread Dmitry Golubets
Hi, I need to re-create a Kafka topic when a job is started in "clean" mode. I can do it, but I'm not sure if I do it in the right place. Is it fine to put this kind of code in the "main"? Then it's called on every job submit. But.. how to detect if a job is being started from a savepoint? Or is

Re: logback

2017-02-08 Thread Dmitry Golubets
Update: I've now used 1.1.3 versions as in the example in the docs and it works! Looks like there is an incompatibility with the latest logback. Best regards, Dmitry On Wed, Feb 8, 2017 at 3:20 PM, Dmitry Golubets wrote: > Hi Robert, > > After reading that link I've adde

Re: logback

2017-02-08 Thread Dmitry Golubets
ink/flink-docs-release-1.2/monitoring/best_practices.html#use-logback-when-running-flink-on-a-cluster > > On Tue, Feb 7, 2017 at 1:07 PM, Dmitry Golubets > wrote: > >> Hi, >> >> documentation says: "Users willing to use logback instead of log4j can >

logback

2017-02-07 Thread Dmitry Golubets
Hi, documentation says: "Users willing to use logback instead of log4j can just exclude log4j (or delete it from the lib/ folder)." But then Flink just doesn't start. I added logback-classic 1.10 to its lib folder, but still get NoClassDefFoundError: ch/qos/logback/core/joran/spi/JoranException

Re: Parallelism and max-parallelism

2017-02-06 Thread Dmitry Golubets
ree to report it here. > > The PRs will be merged later today. > > > On Mon, Feb 6, 2017 at 4:41 PM, Dmitry Golubets > wrote: > > Hi guys, > > > > I would appreciate if someone could explain to me what's the difference > > between those two. > > >

Parallelism and max-parallelism

2017-02-06 Thread Dmitry Golubets
Hi guys, I would appreciate it if someone could explain to me the difference between those two. The current description refers to "dynamic scaling", and yet I can't find anything about it in Flink's docs. Best regards, Dmitry

Re: User configuration

2017-01-26 Thread Dmitry Golubets
> passing-them-around-in-your-flink-application > > On Thu, Jan 26, 2017 at 5:38 PM, Dmitry Golubets > wrote: > >> Hi, >> >> Is there a place for user defined configuration settings? >> How to read them? >> >> Best regards, >> Dmitry >> > >

User configuration

2017-01-26 Thread Dmitry Golubets
Hi, Is there a place for user-defined configuration settings? How do I read them? Best regards, Dmitry

Re: Flink dependencies shading

2017-01-26 Thread Dmitry Golubets
aybe you can resolve the > issue on your side for now. > I've filed a JIRA for this issue: https://issues.apache.org/jira/browse/FLINK-5661 > > > > On Wed, Jan 25, 2017 at 8:24 PM, Dmitry Golubets > wrote: > >> I've built the latest Flink from sources and it

Flink dependencies shading

2017-01-25 Thread Dmitry Golubets
I've built the latest Flink from sources and it seems that the httpclient dependency from flink-mesos is not shaded. It causes trouble with the latest AWS SDK. Am I building it wrong, or is it a known problem? Best regards, Dmitry

Why is IdentityObjectIntMap.get called so often?

2017-01-24 Thread Dmitry Golubets
Hi, I've just added my custom MsgPack serializers, hoping to see a performance increase. I covered all data types in between chains. However, this Kryo method still takes a lot of CPU: IdentityObjectIntMap.get Is there something else that should be configured? Or is there no way to get away from Kryo ove

Count window on partition

2017-01-23 Thread Dmitry Golubets
Hi, I'm looking for the right way to implement the following scheme: 1. Read data 2. Split it into partitions for parallel processing 3. In every partition, group data into batches of N elements 4. Process these batches My first attempt was: *dataStream.keyBy(_.key).countWindow(..)* But countWindow groups by

Re: Three input stream operator and back pressure

2017-01-17 Thread Dmitry Golubets
> Overall, that seemed the more scalable design to us. > Can your use case follow a similar approach? > > Stephan > > > > On Tue, Jan 17, 2017 at 10:57 AM, Dmitry Golubets > wrote: > >> Hi Timo, >> >> I don't have any key to join on, so I'm

Re: Three input stream operator and back pressure

2017-01-17 Thread Dmitry Golubets
> your own operator. That depends on your use case though. > > You can maintain backpressure by using Flink's operator state. But did you > also think about a Window Join instead? > > I hope that helps. > > Timo > > > > > Am 17/01/17 um 00:20 s

Three input stream operator and back pressure

2017-01-16 Thread Dmitry Golubets
Hi, there are only *two* interfaces defined at the moment: *OneInputStreamOperator* and *TwoInputStreamOperator*. Is there any way to define an operator with an arbitrary number of inputs? My other concern is how to maintain *backpressure* in the operator. Let's say I read events from two Kafka s

Re: Can serialization be disabled between chains?

2017-01-16 Thread Dmitry Golubets
a is > serialized into a fixed number of buffers instead of being put on the JVM > heap. > > Best, Fabian > > 2017-01-16 14:21 GMT+01:00 Dmitry Golubets : > >> Hi Ufuk, >> >> Do you know what's the reason for serialization of data between different >>

Re: Can serialization be disabled between chains?

2017-01-16 Thread Dmitry Golubets
Hi Ufuk, Do you know what's the reason for serialization of data between different threads? Also, thanks for the link! Best regards, Dmitry On Mon, Jan 16, 2017 at 1:07 PM, Ufuk Celebi wrote: > Hey Dmitry, > > this is not possible if I'm understanding you correctly. > > A task chain is execut

Can serialization be disabled between chains?

2017-01-13 Thread Dmitry Golubets
Hi, Let's say we have multiple subtask chains and all of them are executing in the same task manager slot (i.e. in the same JVM). What's the point in serializing data between them? Can it be disabled? The reason I want to keep different chains is that some subtasks should be executed in parallel to