Re: [VOTE] Release Apache Flink 0.8.0 (RC1)
Based on the requests listed in this thread, Stephan and I cherry-picked the following commits:

7634310 [FLINK-1266] Generalize DistributedFileSystem implementation (rmetzger)
40a2705 [FLINK-1266] Update mongodb link and address pull request comments (rmetzger)
cd66ced [FLINK-1266] Properly pass the fs.defaulFS setting when initializing … (rmetzger)
3bf2d21 [FLINK-1266] More dependency exclusions (rmetzger)
e8ab5b7 [FLINK-1266] Backport fix to 0.8 (rmetzger)
7cd0f47 [docs] Prepare documentation for 0.8 release (rmetzger)
e83ccd0 [FLINK-1385] Print warning if not resources are _currently_ available… (rmetzger)

4f9dcae [FLINK-1378] [scala] Fix type extraction for nested type parameters (aljoscha)
fced2eb [FLINK-1378] Add support for Throwables in KryoSerializer (aljoscha)
ada35eb [FLINK-1378] [scala] Add support for Try[A] (Success/Failure) (aljoscha)

a64abe1 [docs] Update README and internals (scheduling) for graduation and fi… (StephanEwen)
4f95ce7 Fix typo in README.md (uce)

Both of us committed some other fixes, mainly for the dependencies and licenses, the distribution, and the quickstarts. We will push the results as soon as the tests pass.

Does anyone have any other issue that has to be included in the next release candidate?

On Sat, Jan 10, 2015 at 7:34 PM, Robert Metzger wrote:
> I've tested it on an empty YARN cluster, allocating more containers than
> available. Flink will then allocate as many containers as possible.
>
> On Sat, Jan 10, 2015 at 7:31 PM, Stephan Ewen wrote:
>
>> Seems reasonable. Have you tested it on a cluster with concurrent YARN jobs?
>>
>> On Sat, Jan 10, 2015 at 7:28 PM, Robert Metzger wrote:
>>
>>> I would really like to include this commit into the 0.8 release as well:
>>> https://github.com/apache/flink/commit/ec2bb573d185429f8b3efe111850b8f0e67f2704
>>> A user is affected by this issue. If you agree, I can merge it.
>>>
>>> On Sat, Jan 10, 2015 at 7:25 PM, Stephan Ewen wrote:
>>>
>>>> I have gone through the code, cleaned up dependencies and made sure
>>>> that all licenses are correctly handled. The changes are in the public
>>>> release-0.8 branch. From that side, the code is now good to go in my opinion.
>>>>
>>>> Stephan
>>>>
>>>> On Sat, Jan 10, 2015 at 12:30 PM, Robert Metzger wrote:
>>>>
>>>>> I've updated the docs/_config.yml variables to reflect that hadoop2
>>>>> is now the default profile: https://github.com/apache/flink/pull/294
>>>>>
>>>>> On Fri, Jan 9, 2015 at 8:52 PM, Stephan Ewen wrote:
>>>>>
>>>>>> Just as a heads up: I am almost through with checking dependencies
>>>>>> and licenses. Will commit that later tonight or tomorrow.
>>>>>>
>>>>>> On Fri, Jan 9, 2015 at 7:09 PM, Stephan Ewen wrote:
>>>>>>
>>>>>>> I vote to include it as well. It is sort of vital for advanced use
>>>>>>> of the Scala API. It is also an isolated change that does not
>>>>>>> affect other components, so it should be testable very well.
>>>>>>>
>>>>>>> On Fri, Jan 9, 2015 at 7:05 PM, Robert Metzger wrote:
>>>>>>>
>>>>>>>> I'd say yes, because it's affecting a user.
>>>>>>>>
>>>>>>>> On Fri, Jan 9, 2015 at 7:00 PM, Aljoscha Krettek wrote:
>>>>>>>>
>>>>>>>>> I have a fix for this issue:
>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-1378?jql=project%20%3D%20FLINK
>>>>>>>>> and I think this should also make it into the 0.8.0 release.
>>>>>>>>> What do you think?
>>>>>>>>>
>>>>>>>>> On Fri, Jan 9, 2015 at 3:28 PM, Márton Balassi wrote:
>>>>>>>>>
>>>>>>>>>> Sure, thanks Ufuk.
>>>>>>>>>>
>>>>>>>>>> On Fri, Jan 9, 2015 at 3:15 PM, Ufuk Celebi wrote:
>>>>>>>>>>
>>>>>>>>>>> Marton, could you also cherry-pick 7f659f6 and 7e08fa1 for the
>>>>>>>>>>> next RC? It's a minor update to the README describing the IDE
>>>>>>>>>>> setup. I will close the respective issue FLINK-1109.
>>>>>>>>>>>
>>>>>>>>>>> On 08 Jan 2015, at 23:50, Henry Saputra wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Marton, could you close this VOTE thread by replying to the
>>>>>>>>>>>> original email and appending [CANCEL] in the subject line.
>>>>>>>>>>>>
>>>>>>>>>>>> - Henry
Re: [VOTE] Release Apache Flink 0.8.0 (RC1)
Thanks, Marton. Yes, I think https://issues.apache.org/jira/browse/FLINK-1376 should be included.

Till has a fix coming up for it. Should we wait for this or postpone it to 0.8.1?

– Ufuk

On 12 Jan 2015, at 10:16, Márton Balassi wrote:
> Based on the requests listed in this thread, Stephan and I cherry-picked
> the following commits: […]
Re: [VOTE] Release Apache Flink 0.8.0 (RC1)
I only have to fix one last deadlock, and then the fix should be working.

On Mon, Jan 12, 2015 at 11:06 AM, Ufuk Celebi wrote:
> Thanks, Marton. Yes, I think
> https://issues.apache.org/jira/browse/FLINK-1376 should be included.
>
> Till has a fix coming up for it. Should we wait for this or postpone it
> to 0.8.1?
>
> – Ufuk […]
Re: [VOTE] Release Apache Flink 0.8.0 (RC1)
It would be good to have the patch, but it is also a very tricky patch, so pushing it hastily may be problematic. We could put out 0.8 and very soon a 0.8.1 with that fix.

On 12.01.2015 11:12, Till Rohrmann wrote:
> I only have to fix one last deadlock, and then the fix should be working. […]
Re: [VOTE] Release Apache Flink 0.8.0 (RC1)
On Mon, Jan 12, 2015 at 11:22 AM, Stephan Ewen wrote:
> It would be good to have the patch, but it is also a very tricky patch,
> so pushing it hastily may be problematic.

I agree. @Till: would that be OK with you? If yes, I think we are good to go for the next RC.
Re: [VOTE] Release Apache Flink 0.8.0 (RC1)
Yeah I agree with that.

On Mon, Jan 12, 2015 at 11:30 AM, Ufuk Celebi wrote:
> I agree. @Till: would that be OK with you? If yes, I think we are good
> to go for the next RC. […]
Re: [VOTE] Release Apache Flink 0.8.0 (RC1)
Thanks, guys. FLINK-1376 goes to 0.8.1. I'm building the next release candidate for 0.8.0 then.

On Mon, Jan 12, 2015 at 11:36 AM, Till Rohrmann wrote:
> Yeah I agree with that. […]
Gather a distributed dataset
Hi there,

I wished for intermediate datasets, and Santa Ufuk made my wishes come true (thank you, Santa)! Now that FLINK-986 is in the mainline, I want to ask some practical questions.

In Spark, there is a way to put a value from the local driver to the distributed runtime via

val x = env.parallelize(...) // expose x to the distributed runtime
val y = dataflow(env, x)     // y is produced by a dataflow which reads from x

and also to get a value from the distributed runtime back to the driver:

val z = env.collect("y")

As far as I know, in Flink we have an equivalent for parallelize,

val x = env.fromCollection(...)

but not for collect. Is this still the case? If yes, how hard would it be to add this feature at the moment? Can you give me some pointers?

Regards,
Alexander
Re: Type Hints in the Java API
Thanks for the doc! I will add a complete list of all currently supported types (including Enums, Writables, etc. not mentioned in the doc) and types that will be supported in the near future (e.g. multi-dimensional arrays).

On 09.01.2015 17:37, Stephan Ewen wrote:
> Here is a doc that details type handling/extraction for both the Java API
> and the Scala API, including the type hints.
>
> https://github.com/apache/flink/blob/master/docs/internal_types_serialization.md
>
> Enjoy :-)
>
> On Fri, Jan 9, 2015 at 12:26 PM, Vasiliki Kalavri wrote:
>
>> Hi, thanks for the nice explanation and the great work! This will
>> simplify our Graph API-lives a lot ^^
>>
>> Cheers, V.
>>
>> On 9 January 2015 at 11:59, Stephan Ewen wrote:
>>
>>> I am adding a derivative of that text to the docs right now.
>>>
>>> On Fri, Jan 9, 2015 at 11:54 AM, Robert Metzger wrote:
>>>
>>>> Thank you! It would be amazing if you or somebody else could copy
>>>> paste this into our documentation.
>>>>
>>>> On Fri, Jan 9, 2015 at 11:44 AM, Stephan Ewen wrote:
>>>>
>>>>> Hi everyone!
>>>>>
>>>>> We recently introduced type hints for the Java API. Since that is a
>>>>> pretty useful feature, I wanted to quickly explain what it is. Kudos
>>>>> to Timo Walther, who did a large part of this work.
>>>>>
>>>>> *Background*
>>>>>
>>>>> Flink tries to figure out as much information about what types enter
>>>>> and leave user functions as possible.
>>>>> - For the POJO API (where one refers to field names), we need that
>>>>>   information to make checks (for typos and type compatibility)
>>>>>   before the job is executed.
>>>>> - For the upcoming logical programs (see roadmap draft) we need this
>>>>>   to know the "schema" of functions.
>>>>> - The more we know, the better serialization and data layout schemes
>>>>>   the compiler/optimizer can develop. That is quite important for
>>>>>   the memory usage paradigm in Flink (work on serialized data
>>>>>   inside/outside the heap and make serialization very cheap).
>>>>> - Finally, it also spares users having to worry about serialization
>>>>>   frameworks and having to register types at those frameworks.
>>>>>
>>>>> *Problem*
>>>>>
>>>>> Scala is an easy case, because it preserves generic type information
>>>>> (ClassTags / Type Manifests), but Java erases generic type info in
>>>>> most cases. We do reflection analysis on the user function classes
>>>>> to get the generic types. This logic also contains some simple type
>>>>> inference in case the functions have type variables (such as a
>>>>> MapFunction<T, Tuple2<T, Long>>).
>>>>>
>>>>> Not in all cases can we figure out the data types of functions
>>>>> reliably in Java. Some issues remain with generic lambdas (we are
>>>>> trying to solve this with the Java community, see below) and with
>>>>> generic type variables that we cannot infer.
>>>>>
>>>>> *Solution: Type Hints*
>>>>>
>>>>> To make these cases work easily, a recent addition to the
>>>>> 0.9-SNAPSHOT master introduced type hints. They allow you to tell
>>>>> the system types that it cannot infer. You can write code like
>>>>>
>>>>> DataSet<SomeType> result =
>>>>>     dataSet.map(new MyGenericNonInferrableFunction()).returns(SomeType.class);
>>>>>
>>>>> To make specification of generic types easier, it also comes with a
>>>>> parser for simple string representations of generic types:
>>>>>
>>>>> .returns("Tuple2<Integer, SomeType>")
>>>>>
>>>>> We suggest to use this instead of the "ResultTypeQueryable"
>>>>> workaround that has been used in some cases.
>>>>>
>>>>> *Improving Type Information in Java*
>>>>>
>>>>> One Flink committer (Timo Walther) has actually become active in the
>>>>> Eclipse JDT compiler community and in the OpenJDK community to try
>>>>> and improve the way type information is available for lambdas.
>>>>>
>>>>> Greetings,
>>>>> Stephan
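To make the quoted explanation concrete, here is a small self-contained sketch. The generic Wrap mapper is made up for illustration; the .returns(String) call is the string-parser variant of the hint mentioned above, and in the 0.8/0.9-era API print() is a lazy sink that needs a subsequent execute():

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class TypeHintExample {

    // A generic mapper: after erasure, Flink cannot reliably infer the
    // concrete Tuple2<Integer, Long> output type on its own.
    public static class Wrap<T> implements MapFunction<T, Tuple2<T, Long>> {
        @Override
        public Tuple2<T, Long> map(T value) {
            return new Tuple2<T, Long>(value, 1L);
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Tuple2<Integer, Long>> result = env.fromElements(1, 2, 3)
                .map(new Wrap<Integer>())
                .returns("Tuple2<Integer, Long>"); // string form of the type hint

        result.print();
        env.execute();
    }
}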
Fwd: The Apache Software Foundation Announces Apache™ Flink™ as a Top-Level Project
---------- Forwarded message ----------
From: Sally Khudairi
Date: Monday, January 12, 2015
Subject: The Apache Software Foundation Announces Apache™ Flink™ as a Top-Level Project
To: Apache Announce List

>> this announcement is available online at http://s.apache.org/YrZ

Open Source distributed Big Data system for expressive, declarative, and efficient batch and streaming data processing and analysis

Forest Hill, MD – 12 January 2015 – The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache™ Flink™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.

Apache Flink is an Open Source distributed data analysis engine for batch and streaming data. It offers programming APIs in Java and Scala, as well as specialized APIs for graph processing, with more libraries in the making.

"I am very happy that the ASF has become the home for Flink," said Stephan Ewen, Vice President of Apache Flink. "For a community-driven effort, I can think of no better umbrella. It is great to see the project is maturing and many new people are joining the community."

Flink uses a unique combination of streaming/pipelining and batch processing techniques to create a platform that covers and unifies a broad set of batch and streaming data analytics use cases. The project has put significant efforts into making a system that runs reliably and fast in a wide variety of scenarios. For that reason, Flink contained its own type serialization, memory management, and cost-based query optimization components from the early days of the project.

Apache Flink has its roots in the Stratosphere research project that started in 2009 at TU Berlin together with the Berlin and later the European data management communities, including HU Berlin, Hasso Plattner Institute, KTH (Stockholm), ELTE (Budapest), and others. Several Flink committers recently started data Artisans, a Berlin-based startup committed to growing Flink both in code and community as 100% Open Source. More than 70 people have by now contributed to Flink.

"Becoming a Top-Level Project in such short time is a great milestone for Flink and reflects the speed with which the community has been growing," said Kostas Tzoumas, co-founder and CEO of data Artisans. "The community is currently working on some exciting new features that make Flink even more powerful and accessible to a wider audience, and several companies around the world are including Flink in their data infrastructure."

"We use Apache Flink as part of our production data infrastructure," said Ijad Madisch, co-founder and CEO of ResearchGate. "We are happy all around and excited that Flink provides us with the opportunity for even better developer productivity and testability, especially for complex data flows. It’s with good reason that Flink is now a top-level Apache project."

"I have been experimenting with Flink, and we are very excited to hear that Flink is becoming a top-level Apache project," said Anders Arpteg, Analytics Machine Learning Manager at Spotify.

Denis Arnaud, Head of Data Science Development of Travel Intelligence at Amadeus, said, "At Amadeus, we continually seek for better improvement in our analytic platform and our experiments with Apache Flink for analytics on our travel data show a lot of potential in the system for our production use."

"Flink was a pleasure to mentor as a new Apache project," said Alan Gates, Apache Flink Incubator champion at the ASF, and architect/co-founder at Hortonworks. "The Flink team learned The Apache Way very quickly. They worked hard at being open in their decision making and including new contributors. Those of us mentoring them just needed to point them in the right direction and then let them get to work."

Availability and Oversight
As with all Apache products, Apache Flink software is released under the Apache License v2.0, and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For documentation and ways to become involved with Apache Flink, visit http://flink.apache.org/ and @ApacheFlink on Twitter.

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 500 individual Members and 4,500 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and th
RE: Gather a distributed dataset
Hello Alexander,

Intermediate results are indeed looking promising, also for finally implementing a proper flink-shell for exploratory data analysis.

We are also currently looking at how to implement a collect() for the flink-streaming Scala API that returns a Seq that can be consumed at the client side, as part of FLINK-1344 [1]. It looks like intermediate results support will help: basically, I would like to be able to initiate a stream endpoint at the client side, perhaps via the JobClient, referencing an intermediate result id for example. For streaming, this is a feature that Spark doesn't explicitly have (one has to use foreach and collect on a DStream, which is quite inefficient), so I guess it would be nice to add.

Paris

[1] https://issues.apache.org/jira/browse/FLINK-1344

From: Alexander Alexandrov [alexander.s.alexand...@gmail.com]
Sent: Monday, January 12, 2015 11:42 AM
To: dev@flink.apache.org
Subject: Gather a distributed dataset

> Hi there,
>
> I wished for intermediate datasets, and Santa Ufuk made my wishes come
> true (thank you, Santa)! […]
Re: Gather a distributed dataset
Hey Alexander,

On 12 Jan 2015, at 11:42, Alexander Alexandrov wrote:
> As far as I know, in Flink we have an equivalent for parallelize
>
> val x = env.fromCollection(...)
>
> but not for collect. Is this still the case?

Yes, but this will change soon.

> If yes, how hard would it be to add this feature at the moment? Can you
> give me some pointers?

There is an "alpha" version/hack of this using accumulators. See https://github.com/apache/flink/pull/210. The problem is that each collect call results in a new program being executed from the sources. I think Stephan is working on the scheduling to support this kind of program. From the runtime perspective, it is not a problem to transfer the produced intermediate results back to the job manager. The job manager can basically use the same mechanism that the task managers use. Even the accumulator version should be fine as an initial version.
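For reference, a sketch of what the accumulator-based approach looks like from user code, assuming a ListAccumulator (or an equivalent hand-rolled list accumulator) is available; the names here are illustrative and this is not the API of the linked pull request:

import java.util.List;

import org.apache.flink.api.common.JobExecutionResult;
import org.apache.flink.api.common.accumulators.ListAccumulator;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;

public class AccumulatorCollect {

    // Identity mapper that mirrors every record into an accumulator, which
    // the job manager merges and hands back to the client after execution.
    public static class CollectHelper<T> extends RichMapFunction<T, T> {
        private final ListAccumulator<T> acc = new ListAccumulator<T>();

        @Override
        public void open(Configuration parameters) {
            getRuntimeContext().addAccumulator("collected", acc);
        }

        @Override
        public T map(T value) {
            acc.add(value);
            return value;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Integer> data = env.fromElements(1, 2, 3, 4);
        data.map(new CollectHelper<Integer>()).print(); // a sink is needed so the job runs

        JobExecutionResult result = env.execute();
        List<Integer> collected = result.getAccumulatorResult("collected");
        System.out.println(collected); // the "collect()" result, back at the driver
    }
}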
[jira] [Created] (FLINK-1387) Integrate website index.html into jekyll layout.
Robert Metzger created FLINK-1387:
-------------------------------------

             Summary: Integrate website index.html into jekyll layout.
                 Key: FLINK-1387
                 URL: https://issues.apache.org/jira/browse/FLINK-1387
             Project: Flink
          Issue Type: Task
          Components: Project Website
            Reporter: Robert Metzger

The index.html of the Flink website is not integrated into the jekyll layout. This makes any change harder (in particular version updates / releases) because updates need to be done multiple times. Also, markdown parsing is not available to the web front page.

Further items:
- Add Twitter account to front page
- Fix broken Privacy Policy link

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
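For reference, integrating a page into the jekyll layout typically amounts to replacing its duplicated HTML chrome with front matter; a minimal sketch (the layout name "default" is an assumption about the site's setup):

{code}
---
layout: default
title: Apache Flink
---

<!-- Page content written in markdown or HTML goes here; the shared
     header, footer and navigation come from the jekyll layout instead
     of being duplicated into index.html. -->
{code}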
Re: Gather a distributed dataset
Thanks, I am currently looking at the new ExecutionEnvironment API.

> I think Stephan is working on the scheduling to support this kind of program.

@Stephan: is there a feature branch for that somewhere?

2015-01-12 12:05 GMT+01:00 Ufuk Celebi:
> Hey Alexander,
>
> On 12 Jan 2015, at 11:42, Alexander Alexandrov wrote: […]
[jira] [Created] (FLINK-1388) POJO support for writeAsCsv
Timo Walther created FLINK-1388:
-----------------------------------

             Summary: POJO support for writeAsCsv
                 Key: FLINK-1388
                 URL: https://issues.apache.org/jira/browse/FLINK-1388
             Project: Flink
          Issue Type: New Feature
          Components: Java API
            Reporter: Timo Walther
            Priority: Minor

It would be great if one could simply write out POJOs in CSV format.

{code}
public class MyPojo {
	String a;
	int b;
}
{code}

to:

{code}
# CSV file of org.apache.flink.MyPojo: String a, int b
"Hello World", 42
"Hello World 2", 47
...
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
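Until that feature exists, a sketch of the usual workaround: flatten the POJO into a tuple by hand and write the tuple data set as CSV (field order and output path are arbitrary here):

{code}
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class PojoCsvWorkaround {

	public static class MyPojo {
		public String a;
		public int b;

		public MyPojo() {}
		public MyPojo(String a, int b) { this.a = a; this.b = b; }
	}

	public static void main(String[] args) throws Exception {
		ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

		DataSet<MyPojo> pojos = env.fromElements(
				new MyPojo("Hello World", 42),
				new MyPojo("Hello World 2", 47));

		// Flatten each POJO into a tuple, because writeAsCsv currently
		// only understands tuple data sets.
		pojos.map(new MapFunction<MyPojo, Tuple2<String, Integer>>() {
				@Override
				public Tuple2<String, Integer> map(MyPojo p) {
					return new Tuple2<String, Integer>(p.a, p.b);
				}
			})
			.writeAsCsv("/tmp/pojos.csv");

		env.execute();
	}
}
{code}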
[jira] [Created] (FLINK-1389) Allow setting custom file extensions for files created by the FileOutputFormat
Robert Metzger created FLINK-1389:
-------------------------------------

             Summary: Allow setting custom file extensions for files created by the FileOutputFormat
                 Key: FLINK-1389
                 URL: https://issues.apache.org/jira/browse/FLINK-1389
             Project: Flink
          Issue Type: New Feature
            Reporter: Robert Metzger
            Priority: Minor

A user requested the ability to name avro files with the "avro" extension.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
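A sketch of one possible direction, not the eventual implementation: subclass the concrete output format and override the hook that names the per-task files. The getDirectoryFileName hook is an assumption about the FileOutputFormat API; an actual fix would more likely add a configurable suffix setting:

{code}
import org.apache.flink.api.java.io.TextOutputFormat;
import org.apache.flink.core.fs.Path;

// Hypothetical sketch: append a fixed extension to the per-task output files.
// getDirectoryFileName(int) is assumed to be the FileOutputFormat hook that
// names the files created inside the output directory.
public class SuffixedTextOutputFormat<T> extends TextOutputFormat<T> {

	private final String suffix;

	public SuffixedTextOutputFormat(Path outputPath, String suffix) {
		super(outputPath);
		this.suffix = suffix;
	}

	@Override
	protected String getDirectoryFileName(int taskNumber) {
		return super.getDirectoryFileName(taskNumber) + suffix;
	}
}
{code}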
Flink @ Heise
FYI http://www.heise.de/newsticker/meldung/Big-Data-Apache-Flink-wird-Top-Level-Projekt-2516177.html
Re: Flink @ Heise
Hey Andreas, thanks for sharing. Due to the press announcement by the ASF today, there is quite some attention in the news.

For all non-Germans: Heise is one of the biggest IT news sites here.

On Mon, Jan 12, 2015 at 3:10 PM, Andreas Kunft wrote:
> FYI
>
> http://www.heise.de/newsticker/meldung/Big-Data-Apache-Flink-wird-Top-Level-Projekt-2516177.html
[jira] [Created] (FLINK-1390) java.lang.ClassCastException: X cannot be cast to X
Robert Metzger created FLINK-1390:
-------------------------------------

             Summary: java.lang.ClassCastException: X cannot be cast to X
                 Key: FLINK-1390
                 URL: https://issues.apache.org/jira/browse/FLINK-1390
             Project: Flink
          Issue Type: Bug
          Components: YARN Client
    Affects Versions: 0.8
            Reporter: Robert Metzger

A user is affected by an issue which is probably caused by different classloaders being used for loading user classes.

Current state of the investigation:
- the error happens in YARN sessions (there is only a YARN environment available)
- the error doesn't happen the first time the job is executed; it only happens on subsequent executions.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
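For context, a standalone sketch (not Flink code) of how this failure mode arises; the path and class name are placeholders. The same class file loaded through two independent classloaders yields two distinct runtime classes, so a cast between them fails with exactly this message:

{code}
import java.net.URL;
import java.net.URLClassLoader;

public class ClassLoaderClash {

	public static void main(String[] args) throws Exception {
		// Placeholder location containing a compiled class X.
		URL[] classpath = { new URL("file:/tmp/user-classes/") };

		// Two independent loaders with a null parent: neither delegates
		// the lookup of X to the other, so each defines its own class X.
		ClassLoader a = new URLClassLoader(classpath, null);
		ClassLoader b = new URLClassLoader(classpath, null);

		Class<?> xFromA = a.loadClass("X");
		Class<?> xFromB = b.loadClass("X");

		Object instance = xFromA.newInstance();

		System.out.println(xFromA == xFromB);            // false: distinct runtime classes
		System.out.println(xFromB.isInstance(instance)); // false: a cast would fail

		// xFromB.cast(instance) throws:
		// java.lang.ClassCastException: X cannot be cast to X
	}
}
{code}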
[jira] [Created] (FLINK-1391) Kryo fails to properly serialize avro collection types
Robert Metzger created FLINK-1391:
-------------------------------------

             Summary: Kryo fails to properly serialize avro collection types
                 Key: FLINK-1391
                 URL: https://issues.apache.org/jira/browse/FLINK-1391
             Project: Flink
          Issue Type: Improvement
    Affects Versions: 0.8, 0.9
            Reporter: Robert Metzger

Before FLINK-610, Avro was the default generic serializer. Now, special types coming from Avro are handled by Kryo, which seems to cause errors like:

{code}
Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: java.lang.NullPointerException
	at org.apache.avro.generic.GenericData$Array.add(GenericData.java:200)
	at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:116)
	at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:22)
	at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
	at org.apache.flink.api.java.typeutils.runtime.KryoSerializer.deserialize(KryoSerializer.java:143)
	at org.apache.flink.api.java.typeutils.runtime.KryoSerializer.deserialize(KryoSerializer.java:148)
	at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:244)
	at org.apache.flink.runtime.plugable.DeserializationDelegate.read(DeserializationDelegate.java:56)
	at org.apache.flink.runtime.io.network.serialization.AdaptiveSpanningRecordDeserializer.getNextRecord(AdaptiveSpanningRecordDeserializer.java:71)
	at org.apache.flink.runtime.io.network.channels.InputChannel.readRecord(InputChannel.java:189)
	at org.apache.flink.runtime.io.network.gates.InputGate.readRecord(InputGate.java:176)
	at org.apache.flink.runtime.io.network.api.MutableRecordReader.next(MutableRecordReader.java:51)
	at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:53)
	at org.apache.flink.runtime.operators.DataSinkTask.invoke(DataSinkTask.java:170)
	at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:257)
	at java.lang.Thread.run(Thread.java:744)
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
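A sketch of one possible workaround, not necessarily the fix this issue will get: a Kryo serializer that writes the Avro schema along with the elements. The NPE above comes from Kryo's default CollectionSerializer re-creating the GenericData.Array without its schema; how such a serializer gets registered with Flink's KryoSerializer is version-dependent, and the class name here is illustrative:

{code}
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.Serializer;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;

// Serializes the Avro schema together with the elements, so the array can
// be rebuilt with a valid (non-null) schema on read.
public class GenericDataArraySerializer extends Serializer<GenericData.Array<Object>> {

	@Override
	public void write(Kryo kryo, Output output, GenericData.Array<Object> array) {
		output.writeString(array.getSchema().toString());
		output.writeInt(array.size(), true);
		for (Object element : array) {
			kryo.writeClassAndObject(output, element);
		}
	}

	@Override
	public GenericData.Array<Object> read(Kryo kryo, Input input, Class<GenericData.Array<Object>> type) {
		Schema schema = new Schema.Parser().parse(input.readString());
		int size = input.readInt(true);
		GenericData.Array<Object> array = new GenericData.Array<Object>(size, schema);
		for (int i = 0; i < size; i++) {
			array.add(kryo.readClassAndObject(input));
		}
		return array;
	}
}
{code}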
[VOTE] Release Apache Flink 0.8.0 (RC2)
Please vote on releasing the following candidate as Apache Flink version 0.8.0. This release will be the first major release for Flink as a top-level project.

-------------------------------------------------------------
The commit to be voted on is in the branch "release-0.8.0-rc2" (commit 9ee74f3):
https://git-wip-us.apache.org/repos/asf/flink/commit/9ee74f3

The release artifacts to be voted on can be found at:
http://people.apache.org/~mbalassi/flink-0.8.0-rc2/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/mbalassi.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapacheflink-1022
-------------------------------------------------------------

Please vote on releasing this package as Apache Flink 0.8.0.

The vote is open for the next 72 hours and passes if a majority of at least three +1 PMC votes are cast.

[ ] +1 Release this package as Apache Flink 0.8.0
[ ] -1 Do not release this package because ...
Re: Flink @ Heise
Thanks for sharing =)

On Mon, Jan 12, 2015 at 6:10 AM, Andreas Kunft wrote:
> FYI
>
> http://www.heise.de/newsticker/meldung/Big-Data-Apache-Flink-wird-Top-Level-Projekt-2516177.html
[jira] [Created] (FLINK-1392) Serializing Protobuf - issue 1
Felix Neutatz created FLINK-1392:
------------------------------------

             Summary: Serializing Protobuf - issue 1
                 Key: FLINK-1392
                 URL: https://issues.apache.org/jira/browse/FLINK-1392
             Project: Flink
          Issue Type: Bug
            Reporter: Felix Neutatz
            Priority: Minor

Hi,

I started to experiment with Parquet using Protobuf. I use the standard Protobuf class com.twitter.data.proto.tutorial.AddressBookProtos. The code which I run can be found here:

[https://github.com/FelixNeutatz/incubator-flink/blob/ParquetAtFlink/flink-addons/flink-hadoop-compatibility/src/main/java/org/apache/flink/hadoopcompatibility/mapreduce/example/ParquetProtobufOutput.java]

I get the following exception:

{code:xml}
Exception in thread "main" java.lang.Exception: Deserializing the InputFormat (org.apache.flink.api.java.io.CollectionInputFormat) failed: Could not read the user code wrapper: Error while deserializing element from collection
	at org.apache.flink.runtime.jobgraph.InputFormatVertex.initializeOnMaster(InputFormatVertex.java:60)
	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$5.apply(JobManager.scala:179)
	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$5.apply(JobManager.scala:172)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:172)
	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
	at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:34)
	at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:27)
	at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
	at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:27)
	at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
	at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:52)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
	at akka.actor.ActorCell.invoke(ActorCell.scala:487)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
	at akka.dispatch.Mailbox.run(Mailbox.scala:221)
	at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.flink.runtime.operators.util.CorruptConfigurationException: Could not read the user code wrapper: Error while deserializing element from collection
	at org.apache.flink.runtime.operators.util.TaskConfig.getStubWrapper(TaskConfig.java:285)
	at org.apache.flink.runtime.jobgraph.InputFormatVertex.initializeOnMaster(InputFormatVertex.java:57)
	... 25 more
Caused by: java.io.IOException: Error while deserializing element from collection
	at org.apache.flink.api.java.io.CollectionInputFormat.readObject(CollectionInputFormat.java:108)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
	at org.apa
{code}
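A possible direction for this issue and FLINK-1393: have Kryo delegate Protobuf-generated classes to a Protobuf-aware serializer, for example the one from Twitter's chill-protobuf. The sketch assumes a Flink version whose ExecutionConfig exposes registerTypeWithKryoSerializer (present in later releases) and the com.twitter:chill-protobuf dependency on the classpath:

{code}
import org.apache.flink.api.java.ExecutionEnvironment;

import com.twitter.chill.protobuf.ProtobufSerializer;
import com.twitter.data.proto.tutorial.AddressBookProtos.AddressBook;

public class ProtobufRegistration {

	public static void main(String[] args) throws Exception {
		ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

		// Let Kryo (de)serialize AddressBook via Protobuf's own wire format
		// instead of field-by-field reflection, which breaks on generated
		// message classes (no usable no-arg constructor, builder pattern).
		env.getConfig().registerTypeWithKryoSerializer(AddressBook.class, ProtobufSerializer.class);

		// ... build and execute the dataflow as before ...
	}
}
{code}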
[jira] [Created] (FLINK-1393) Serializing Protobuf - issue 2
Felix Neutatz created FLINK-1393:
------------------------------------

             Summary: Serializing Protobuf - issue 2
                 Key: FLINK-1393
                 URL: https://issues.apache.org/jira/browse/FLINK-1393
             Project: Flink
          Issue Type: Bug
            Reporter: Felix Neutatz
            Priority: Minor

In addition to [FLINK-1392|https://issues.apache.org/jira/browse/FLINK-1392], I also tried another Protobuf class, which throws a different exception:

{code:xml}
Exception in thread "main" java.lang.Exception: Deserializing the InputFormat ([(null,url: "www.test.de" archiveTime: 123 scripts: "script.de" scripts: "script1.de" iframes: "iframe.de" iframes: "iframe1.de" links: "link.de" links: "link1.de" images: "img.de" images: "img1.de" )]) failed: unread block data
	at org.apache.flink.runtime.jobgraph.InputFormatVertex.initializeOnMaster(InputFormatVertex.java:60)
	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$5.apply(JobManager.scala:179)
	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$5.apply(JobManager.scala:172)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:172)
	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
	at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:34)
	at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:27)
	at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
	at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:27)
	at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
	at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:52)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
	at akka.actor.ActorCell.invoke(ActorCell.scala:487)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
	at akka.dispatch.Mailbox.run(Mailbox.scala:221)
	at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.IllegalStateException: unread block data
	at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
	at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:274)
	at org.apache.flink.util.InstantiationUtil.readObjectFromConfig(InstantiationUtil.java:236)
	at org.apache.flink.runtime.operators.util.TaskConfig.getStubWrapper(TaskConfig.java:281)
	at org.apache.flink.runtime.jobgraph.InputFormatVertex.initializeOnMaster(InputFormatVertex.java:57)
	... 25 more
{code}

The program which causes this exception can be found here:

[https://github.com/FelixNeutatz/incubator-flink/blob/ParquetAtFlink/flink-addons/flink-hadoop-compatibility/src/main/java/org/apache/flink/hadoopcompatibility/mapreduce/example/ParquetProtobufOutput2.java]

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)