Re: Resource under-utilization when using RocksDb state backend [SOLVED]

2016-12-08 Thread Aljoscha Krettek
Great to hear! On Fri, 9 Dec 2016 at 01:02 Cliff Resnick wrote: > It turns out that most of the time in RocksDBFoldingState was spent on > serialization/deserialization. RocksDb read/write was performing well. By > moving from Kryo to custom serialization we were able to increase > throughput dramatically.

Re: Parallelism and stateful mapping with Flink

2016-12-08 Thread Aljoscha Krettek
I commented on the issue with a way that should work. On Fri, 9 Dec 2016 at 01:00 Chesnay Schepler wrote: > Done. https://issues.apache.org/jira/browse/FLINK-5299 > > On 08.12.2016 16:50, Ufuk Celebi wrote: > > Would you like to open an issue for this for starters, Chesnay? Would be > good to fix for the upcoming release even.

Re: separation of JVMs for different applications

2016-12-08 Thread Manu Zhang
If there is no existing JIRA for standalone v2.0, may I open a new one? Thanks, Manu On Wed, Dec 7, 2016 at 12:39 PM Manu Zhang wrote: > Good to know that. > > Is it the "standalone setup v2.0" section? The wiki page has no > Google-Doc-like change history. > Any jiras opened for that

Re: Serializers and Schemas

2016-12-08 Thread Matt
Hi people, This is what I was talking about regarding a generic de/serializer for POJO classes [1]. The Serde class in [2] can be used in both Kafka [3] and Flink [4], and it works out of the box for any POJO class. Do you see anything wrong in this approach? Any way to improve it? Cheers, Matt

Re: dataartisans flink training maven build error

2016-12-08 Thread Conny Gu
Hi all, I think I found the solution: I used a new version of Ubuntu and installed the latest version of Maven, and then everything worked. best regards, Conny 2016-12-08 17:51 GMT+01:00 Conny Gu [via Apache Flink User Mailing List archive.] : > Hi all, > > I am trying to follow the

Re: Partitioning operator state

2016-12-08 Thread Dominik Safaric
Dear Stefan, Thanks for the clarification. How is the state recovered in the case of a task failure, though? Assume there is a topology of 10 workers and a worker dies. How exactly will the state be distributed across the workers after restarting the entire execution? Dominik

dataartisans flink training maven build error

2016-12-08 Thread Conny Gu
Hi all, I am trying to follow the dataartisans GitHub training tutorial, http://dataartisans.github.io/flink-training/devEnvSetup.html but I got errors while setting up the development environment. The errors happened at the step "Clone and build the flink-training-exercises project": git clone

Re: Resource under-utilization when using RocksDb state backend [SOLVED]

2016-12-08 Thread Cliff Resnick
It turns out that most of the time in RocksDBFoldingState was spent on serialization/deserialization. RocksDb read/write was performing well. By moving from Kryo to custom serialization we were able to increase throughput dramatically. Load is now where it should be. On Mon, Dec 5, 2016 at 1:15 PM,
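The thread does not show Cliff's actual serializer, but the gist of the Kryo-to-custom move can be sketched with plain `java.io` streams: a hand-written binary format with fixed-width fields and no class metadata or reflection, which is what a custom Flink TypeSerializer would do internally. The `Event` POJO and its fields are invented for illustration.

```java
import java.io.*;

// Hypothetical event POJO; the class and field names are illustrative,
// not taken from the original thread.
class Event {
    final long timestamp;
    final int userId;
    final double value;
    Event(long timestamp, int userId, double value) {
        this.timestamp = timestamp; this.userId = userId; this.value = value;
    }
}

public class CustomSerde {
    // Fixed-width fields, no class metadata or reflection -- the kind of
    // hand-rolled format that can beat a generic framework like Kryo.
    static byte[] serialize(Event e) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeLong(e.timestamp);
        out.writeInt(e.userId);
        out.writeDouble(e.value);
        return buf.toByteArray();   // always exactly 8 + 4 + 8 = 20 bytes
    }

    static Event deserialize(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        return new Event(in.readLong(), in.readInt(), in.readDouble());
    }

    public static void main(String[] args) throws IOException {
        Event e = new Event(1481212800000L, 42, 3.14);
        Event back = deserialize(serialize(e));
        System.out.println(back.userId + " @ " + back.timestamp);
    }
}
```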

Re: Parallelism and stateful mapping with Flink

2016-12-08 Thread Chesnay Schepler
Done. https://issues.apache.org/jira/browse/FLINK-5299 On 08.12.2016 16:50, Ufuk Celebi wrote: Would you like to open an issue for this for starters, Chesnay? Would be good to fix for the upcoming release even. On 8 December 2016 at 16:39:58, Chesnay Schepler (ches...@apache.org) wrote: It would be neat if we could support arrays as keys directly

Re: Parallelism and stateful mapping with Flink

2016-12-08 Thread Ufuk Celebi
Would you like to open an issue for this for starters, Chesnay? Would be good to fix for the upcoming release even. On 8 December 2016 at 16:39:58, Chesnay Schepler (ches...@apache.org) wrote: > It would be neat if we could support arrays as keys directly; it should > boil down to checking the key type

Re: conditional dataset output

2016-12-08 Thread Chesnay Schepler
Hello Lars, The only other way I can think of doing this is to wrap the used outputformat in a custom format, which calls open on the wrapped outputformat when you receive the first record. This should work, but it is quite hacky as it interferes with the format life-cycle
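The lazy-open wrapper pattern Chesnay describes can be sketched with a made-up `SimpleFormat` interface standing in for Flink's `OutputFormat` (in Flink the wrapper would also delegate `configure()` and use `writeRecord()`); all names here are illustrative:

```java
// Invented minimal stand-in for Flink's OutputFormat interface.
interface SimpleFormat {
    void open();
    void write(String record);
    void close();
}

// Wrapper that only opens the real format once the first record
// arrives, so an empty dataset never creates an (empty) output file.
class LazyOpenFormat implements SimpleFormat {
    private final SimpleFormat wrapped;
    private boolean opened = false;

    LazyOpenFormat(SimpleFormat wrapped) { this.wrapped = wrapped; }

    @Override public void open() {
        // Deliberately a no-op: defer opening until the first record.
    }

    @Override public void write(String record) {
        if (!opened) {          // first record: open the real format now
            wrapped.open();
            opened = true;
        }
        wrapped.write(record);
    }

    @Override public void close() {
        if (opened) {           // no records -> real format never opened,
            wrapped.close();    // so nothing to close and no file created
        }
    }
}
```

This is the "hacky" part of the thread's suggestion: `open()` no longer happens where the framework expects it, which is why it interferes with the format life-cycle.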

Re: Parallelism and stateful mapping with Flink

2016-12-08 Thread Chesnay Schepler
It would be neat if we could support arrays as keys directly; it should boil down to checking the key type and in case of an array injecting a KeySelector that calls Arrays.hashCode(array). This worked for me when I ran into the same issue while experimenting with some stuff. The batch API can
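The underlying problem is that Java arrays use identity-based `equals`/`hashCode`, so two arrays with the same contents hash to different key groups. A minimal self-contained sketch of the fix Chesnay describes, with a wrapper key that restores value semantics (the class name is invented):

```java
import java.util.Arrays;

public class ArrayKeyDemo {
    // Wrapper key with value-based equals/hashCode -- the effect a
    // KeySelector calling Arrays.hashCode(array) achieves in Flink.
    static final class ByteArrayKey {
        final byte[] bytes;
        ByteArrayKey(byte[] bytes) { this.bytes = bytes; }
        @Override public boolean equals(Object o) {
            return o instanceof ByteArrayKey
                && Arrays.equals(bytes, ((ByteArrayKey) o).bytes);
        }
        @Override public int hashCode() { return Arrays.hashCode(bytes); }
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        byte[] b = {1, 2, 3};
        // Raw arrays use identity semantics: same contents, "different" keys.
        System.out.println(a.equals(b));                                     // false
        // Wrapped, both records land in the same key group.
        System.out.println(new ByteArrayKey(a).equals(new ByteArrayKey(b))); // true
    }
}
```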

conditional dataset output

2016-12-08 Thread lars . bachmann
Hi, let's assume I have a dataset that, depending on the input data and different filter operations, can be empty. Now I want to output the dataset to HD, but I want files to be created only if the dataset is not empty. If the dataset is empty I don't want any files. The default

Re: Parallelism and stateful mapping with Flink

2016-12-08 Thread Ufuk Celebi
@Aljoscha: I remember that someone else ran into this, too. Should we address arrays as keys specifically in the API? Prohibit? Document this? – Ufuk On 7 December 2016 at 17:41:40, Andrew Roberts (arobe...@fuze.com) wrote: > Sure! > > (Aside, it turns out that the issue was using an `Array[By

Re: Replace Flink job while cluster is down

2016-12-08 Thread Ufuk Celebi
With HA enabled, Flink checks the configured ZooKeeper node for pre-existing jobs and checkpoints when starting. What Stefan meant was that you can configure a different ZooKeeper node, which will start the cluster with a clean state. You can check the available config options here:  https://ci
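A hedged sketch of what "configure a different ZooKeeper node" could look like in `flink-conf.yaml`; the key names follow the `recovery.*` naming used by Flink 1.1 and the quorum/path values are placeholders, so check them against the configuration docs for your version:

```yaml
# flink-conf.yaml (Flink 1.1-era key names; values are illustrative)
recovery.mode: zookeeper
recovery.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
# Pointing the root path at a fresh ZooKeeper node starts the cluster
# with a clean state instead of recovering previously submitted jobs.
recovery.zookeeper.path.root: /flink-fresh
```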

RE: Collect() freeze on yarn cluster on strange recover/deserialization error

2016-12-08 Thread Ufuk Celebi
Great! :) On 8 December 2016 at 15:28:05, LINZ, Arnaud (al...@bouyguestelecom.fr) wrote: > -yjm works, and suits me better than a global flink-conf.yaml parameter. I've > looked for > a command line parameter like that, but I missed it in the doc, my mistake. > Thanks, > Arnaud > > -Mes

RE: Collect() freeze on yarn cluster on strange recover/deserialization error

2016-12-08 Thread LINZ, Arnaud
-yjm works, and suits me better than a global flink-conf.yaml parameter. I've looked for a command line parameter like that, but I missed it in the doc, my mistake. Thanks, Arnaud -Original Message- From: Ufuk Celebi [mailto:u...@apache.org] Sent: Thursday, 8 December 2016 14:43 To: LIN

Re: Replace Flink job while cluster is down

2016-12-08 Thread Al-Isawi Rami
Hi Stefan, Yes, a cluster of 3 machines. Version 1.1.1 I did not understand the difference between "remove entry from zookeeper" and "using flink zookeeper namespaces feature". Eventually, I started the cluster and it did recover the old program. However, I was fast enough to click Cancel in

RE: Collect() freeze on yarn cluster on strange recover/deserialization error

2016-12-08 Thread Ufuk Celebi
Good point with the collect() docs. Would you mind opening a JIRA issue for that? I'm not sure whether you can specify it via that key for YARN. Can you try to use -yjm 8192 when submitting the job? Looping in Robert who knows best whether this config key is picked up or not for YARN. – Ufuk
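The `-yjm` suggestion amounts to setting the JobManager container memory per submission instead of globally; a hedged example invocation (jar path and `-ytm` value are illustrative):

```shell
# -yjm sets the YARN JobManager container memory in MB for this
# submission, overriding the global setting in flink-conf.yaml;
# -ytm does the same for the TaskManager containers.
./bin/flink run -m yarn-cluster -yjm 8192 -ytm 4096 ./examples/MyJob.jar
```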

RE: Collect() freeze on yarn cluster on strange recover/deserialization error

2016-12-08 Thread LINZ, Arnaud
Hi Ufuk, Yes, I have a large set of data to collect for a data science job that cannot be distributed easily. Increasing akka.framesize did get rid of the collect hang (maybe you should highlight this parameter in the collect() documentation, 10 MB is not that big), thanks. However my j

Re: Recursive directory traversal with TextInputFormat

2016-12-08 Thread Ufuk Celebi
Looping in Kostas who recently worked on the continuous file inputs. @Kostas: do you have an idea what's happening here? – Ufuk On 8 December 2016 at 08:43:32, Lukas Kircher (lukas.kirc...@uni-konstanz.de) wrote: > Hi Stefan, > > thanks for your answer. > > > I think there is a field in Fil

RE: Collect() freeze on yarn cluster on strange recover/deserialization error

2016-12-08 Thread Ufuk Celebi
I also don't get why the job is recovering, but the oversized message is very likely the cause of the freezing collect, because the data set is gathered via Akka. You can configure the frame size via "akka.framesize", which defaults to 10485760b (10 MB). Is the collected result larger than that
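Raising the frame size is a one-line change in `flink-conf.yaml`; the 100 MB value below is an arbitrary example, not a recommendation from the thread:

```yaml
# flink-conf.yaml -- raise the Akka frame size so large collect()
# results fit in a single message (default is 10485760b, i.e. 10 MB).
# 104857600b = 100 * 1024 * 1024 bytes.
akka.framesize: 104857600b
```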

Re: [DISCUSS] "Who's hiring on Flink" monthly thread in the mailing lists?

2016-12-08 Thread Robert Metzger
Thank you for speaking up Kanstantsin. I really don't want to downgrade the experience on the user@ list. I wonder if jobs@flink would be a too narrowly-scoped mailing list. Maybe we could also start a community@flink (alternatively also general@) mailing list for everything relating to the broade

Re: Partitioning operator state

2016-12-08 Thread Stefan Richter
Hi Dominik, as Gordon’s response only covers keyed-state, I will briefly explain what happens for non-keyed operator state. In contrast to Flink 1.1, Flink 1.2 checkpointing does not write a single blackbox object (e.g. ONE object that is a set of all kafka offsets is emitted), but a list of bl
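The benefit of checkpointing a list instead of one blackbox blob is that the elements can be redistributed across subtasks when the parallelism changes. A minimal self-contained sketch of a round-robin redistribution in that spirit; the offset strings and the exact assignment scheme are illustrative, not Flink's actual implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class StateRedistribution {
    // Distribute checkpointed state elements (e.g. Kafka partition
    // offsets) round-robin over the subtasks of the restored job.
    static List<List<String>> redistribute(List<String> checkpointed, int newParallelism) {
        List<List<String>> subtasks = new ArrayList<>();
        for (int i = 0; i < newParallelism; i++) {
            subtasks.add(new ArrayList<>());
        }
        for (int i = 0; i < checkpointed.size(); i++) {
            subtasks.get(i % newParallelism).add(checkpointed.get(i));
        }
        return subtasks;
    }

    public static void main(String[] args) {
        // Five hypothetical "partition@offset" entries, restored with
        // a parallelism of 2 instead of whatever wrote the checkpoint.
        List<String> offsets = List.of("p0@17", "p1@42", "p2@7", "p3@99", "p4@3");
        System.out.println(redistribute(offsets, 2));
    }
}
```

With a single blackbox object none of this splitting would be possible, which is the contrast Stefan draws between Flink 1.1 and 1.2 checkpointing.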