Re: [VOTE] Release Apache Flink 0.8.0 (RC1)

2015-01-12 Thread Márton Balassi
Based on the requests listed in this thread, Stephan and I cherry-picked
the following commits:

7634310 [FLINK-1266] Generalize DistributedFileSystem implementation
(rmetzger)
40a2705 [FLINK-1266] Update mongodb link and address pull request comments
(rmetzger)
cd66ced [FLINK-1266] Properly pass the fs.defaulFS setting when
initializing … (rmetzger)
3bf2d21 [FLINK-1266] More dependency exclusions (rmetzger)
e8ab5b7 [FLINK-1266] Backport fix to 0.8 (rmetzger)
7cd0f47 [docs] Prepare documentation for 0.8 release (rmetzger)
e83ccd0 [FLINK-1385] Print warning if not resources are _currently_
available… (rmetzger)

4f9dcae [FLINK-1378] [scala] Fix type extraction for nested type parameters
(aljoscha)
fced2eb [FLINK-1378] Add support for Throwables in KryoSerializer (aljoscha)
ada35eb [FLINK-1378] [scala] Add support for Try[A]
(Success/Failure) (aljoscha)

a64abe1 [docs] Update README and internals (scheduling) for graduation and
fi… (SephanEwen)
4f95ce7 Fix typo in README.md (uce)

Both of us committed some other fixes, mainly for the dependencies and
licences, the distribution, and the quickstarts. We will push the results
as soon as the tests pass.

Does anyone have any other issue that has to be included in the next
release candidate?


On Sat, Jan 10, 2015 at 7:34 PM, Robert Metzger wrote:

> I've tested it on an empty YARN cluster, allocating more containers than
> available. Flink will then allocate as many containers as possible.
>
> On Sat, Jan 10, 2015 at 7:31 PM, Stephan Ewen wrote:
>
> > Seems reasonable. Have you tested it on a cluster with concurrent YARN
> > jobs?
> >
> > On Sat, Jan 10, 2015 at 7:28 PM, Robert Metzger wrote:
> >
> > > I would really like to include this commit into the 0.8 release as
> > > well:
> > > https://github.com/apache/flink/commit/ec2bb573d185429f8b3efe111850b8f0e67f2704
> > > A user is affected by this issue. If you agree, I can merge it.
> > >
> > > On Sat, Jan 10, 2015 at 7:25 PM, Stephan Ewen wrote:
> > >
> > > > I have gone through the code, cleaned up dependencies and made sure
> > > > that all licenses are correctly handled. The changes are in the
> > > > public release-0.8 branch. From that side, the code is now good to
> > > > go in my opinion.
> > > >
> > > > Stephan
> > > >
> > > > On Sat, Jan 10, 2015 at 12:30 PM, Robert Metzger wrote:
> > > >
> > > > > I've updated the docs/_config.yml variables to reflect that
> > > > > hadoop2 is now the default profile:
> > > > > https://github.com/apache/flink/pull/294
> > > > >
> > > > > On Fri, Jan 9, 2015 at 8:52 PM, Stephan Ewen wrote:
> > > > >
> > > > > > Just as a heads up: I am almost through with checking
> > > > > > dependencies and licenses. Will commit that later tonight or
> > > > > > tomorrow.
> > > > > >
> > > > > > On Fri, Jan 9, 2015 at 7:09 PM, Stephan Ewen wrote:
> > > > > >
> > > > > > > I vote to include it as well. It is sort of vital for
> > > > > > > advanced use of the Scala API. It is also an isolated change
> > > > > > > that does not affect other components, so it should test
> > > > > > > very well.
> > > > > > >
> > > > > > > On Fri, Jan 9, 2015 at 7:05 PM, Robert Metzger wrote:
> > > > > > >
> > > > > > > > I'd say yes, because it's affecting a user.
> > > > > > > >
> > > > > > > > On Fri, Jan 9, 2015 at 7:00 PM, Aljoscha Krettek wrote:
> > > > > > > >
> > > > > > > > > I have a fix for this issue:
> > > > > > > > > https://issues.apache.org/jira/browse/FLINK-1378?jql=project%20%3D%20FLINK
> > > > > > > > > and I think this should also make it into the 0.8.0
> > > > > > > > > release. What do you think?
> > > > > > > > >
> > > > > > > > > On Fri, Jan 9, 2015 at 3:28 PM, Márton Balassi wrote:
> > > > > > > > >
> > > > > > > > > > Sure, thanks Ufuk.
> > > > > > > > > >
> > > > > > > > > > On Fri, Jan 9, 2015 at 3:15 PM, Ufuk Celebi wrote:
> > > > > > > > > >
> > > > > > > > > > > Marton, could you also cherry-pick 7f659f6 and
> > > > > > > > > > > 7e08fa1 for the next RC? It's a minor update to the
> > > > > > > > > > > README describing the IDE setup. I will close the
> > > > > > > > > > > respective issue FLINK-1109.
> > > > > > > > > > >
> > > > > > > > > > > On 08 Jan 2015, at 23:50, Henry Saputra wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Marton, could you close this VOTE thread by
> > > > > > > > > > > > replying to the original email and appending
> > > > > > > > > > > > [CANCEL] to the subject line.
> > > > > > > > > > > >
> > > > > > > > > > > > - Henry
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu

Re: [VOTE] Release Apache Flink 0.8.0 (RC1)

2015-01-12 Thread Ufuk Celebi
Thanks, Marton. Yes, I think that's https://issues.apache.org/jira/browse/FLINK-1376.

Till has a fix coming up for it. Should we wait for this or postpone it to 
0.8.1?

– Ufuk

On 12 Jan 2015, at 10:16, Márton Balassi  wrote:

> […]

Re: [VOTE] Release Apache Flink 0.8.0 (RC1)

2015-01-12 Thread Till Rohrmann
I only have to fix one last deadlock, and then the fix should be working.

On Mon, Jan 12, 2015 at 11:06 AM, Ufuk Celebi  wrote:

> […]

Re: [VOTE] Release Apache Flink 0.8.0 (RC1)

2015-01-12 Thread Stephan Ewen
It would be good to have the patch, but it is also a very tricky patch, so
pushing it hastily may be problematic.

We could put out 0.8 and very soon a 0.8.1 with that fix.
On 12 Jan 2015 at 11:12, "Till Rohrmann" wrote:

> […]

Re: [VOTE] Release Apache Flink 0.8.0 (RC1)

2015-01-12 Thread Ufuk Celebi
On Mon, Jan 12, 2015 at 11:22 AM, Stephan Ewen  wrote:

> It would be good to have the patch, but it is also a very tricky patch, so
> pushing it hastily may be problematic.
>

I agree. @Till: would that be OK with you? If yes, I think we are good to
go for the next RC.


Re: [VOTE] Release Apache Flink 0.8.0 (RC1)

2015-01-12 Thread Till Rohrmann
Yeah I agree with that.

On Mon, Jan 12, 2015 at 11:30 AM, Ufuk Celebi  wrote:

> On Mon, Jan 12, 2015 at 11:22 AM, Stephan Ewen  wrote:
>
> > It would be good to have the patch, but it is also a very tricky patch,
> so
> > pushing it hastily may be problematic.
> >
>
> I agree. @Till: would that be OK with you? If yes, I think we are good to
> go for the next RC.
>


Re: [VOTE] Release Apache Flink 0.8.0 (RC1)

2015-01-12 Thread Márton Balassi
Thanks, guys. FLINK-1376 goes to 0.8.1. I'm building the next release
candidate for 0.8.0 then.

On Mon, Jan 12, 2015 at 11:36 AM, Till Rohrmann 
wrote:

> […]


Gather a distributed dataset

2015-01-12 Thread Alexander Alexandrov
Hi there,

I wished for intermediate datasets, and Santa Ufuk made my wishes come true
(thank you, Santa)!

Now that FLINK-986 is in the mainline, I want to ask some practical
questions.

In Spark, there is a way to put a value from the local driver to the
distributed runtime via

val x = env.parallelize(...) // expose x to the distributed runtime
val y = dataflow(env, x) // y is produced by a dataflow which reads from x

and also to get a value from the distributed runtime back to the driver

val z = env.collect("y")

As far as I know, in Flink we have an equivalent for parallelize

val x = env.fromCollection(...)

but not for collect. Is this still the case?

If yes, how hard would it be to add this feature at the moment? Can you
give me some pointers?

Regards,

Alexander


Re: Type Hints in the Java API

2015-01-12 Thread Timo Walther

Thanks for the doc!

I will add a complete list of all currently supported types (including
Enums, Writables, etc. not mentioned in the doc) and of the types that will
be supported in the near future (e.g. multi-dimensional arrays).


On 09.01.2015 17:37, Stephan Ewen wrote:

Here is a doc that details type handling/extraction for both the Java API
and the Scala API, including the type hints.

https://github.com/apache/flink/blob/master/docs/internal_types_serialization.md

Enjoy :-)

On Fri, Jan 9, 2015 at 12:26 PM, Vasiliki Kalavri wrote:

Hi,

thanks for the nice explanation and the great work!
This will simplify our Graph API-lives a lot ^^

Cheers,
V.

On 9 January 2015 at 11:59, Stephan Ewen wrote:

I am adding a derivative of that text to the docs right now.

On Fri, Jan 9, 2015 at 11:54 AM, Robert Metzger wrote:

Thank you!

It would be amazing if you or somebody else could copy paste this into our
documentation.

On Fri, Jan 9, 2015 at 11:44 AM, Stephan Ewen wrote:

Hi everyone!

We recently introduced type hints for the Java API. Since that is a pretty
useful feature, I wanted to quickly explain what it is.

Kudos to Timo Walther, who did a large part of this work.

*Background*

Flink tries to figure out as much information about what types enter and
leave user functions as possible.

  - For the POJO API (where one refers to field names), we need that
information to make checks (for typos and type compatibility) before the
job is executed.

  - For the upcoming logical programs (see roadmap draft) we need this to
know the "schema" of functions.

  - The more we know, the better the serialization and data layout schemes
the compiler/optimizer can develop. That is quite important for the memory
usage paradigm in Flink (work on serialized data inside/outside the heap
and make serialization very cheap).

  - Finally, it also spares users having to worry about serialization
frameworks and having to register types at those frameworks.

*Problem*

Scala is an easy case, because it preserves generic type information
(ClassTags / Type Manifests), but Java erases generic type info in most
cases.

We do reflection analysis on the user function classes to get the generic
types. This logic also contains some simple type inference in case the
functions have type variables (such as a MapFunction<T, Tuple2<T, Long>>).

Not in all cases can we figure out the data types of functions reliably in
Java. Some issues remain with generic lambdas (we are trying to solve this
with the Java community, see below) and with generic type variables that we
cannot infer.

*Solution: Type Hints*

To make these cases work easily, a recent addition to the 0.9-SNAPSHOT
master introduced type hints. They allow you to tell the system the types
that it cannot infer.

You can write code like

DataSet<SomeType> result = dataSet
    .map(new MyGenericNonInferrableFunction<Long, SomeType>())
    .returns(SomeType.class);

To make specification of generic types easier, it also comes with a parser
for simple string representations of generic types:

    .returns("Tuple2<Integer, SomeType>")

We suggest using this instead of the "ResultTypeQueryable" workaround that
has been used in some cases.

*Improving Type Information in Java*

One Flink committer (Timo Walther) has actually become active in the
Eclipse JDT compiler community and in the OpenJDK community to try and
improve the way type information is available for lambdas.

Greetings,
Stephan
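The reflection analysis described above can be illustrated without any
Flink dependency. The sketch below uses a simplified stand-in for Flink's
MapFunction interface (not the real Flink API): type arguments that a
subclass supplies to its generic superinterface survive erasure in the
class-file metadata and can be read back reflectively, while a lambda
carries no such metadata, which is why generic lambdas need an explicit
hint.

```java
import java.lang.reflect.ParameterizedType;

// Simplified stand-in for Flink's function interface; not the real API.
interface MapFunction<I, O> {
    O map(I value);
}

public class ErasureDemo {
    public static void main(String[] args) {
        // The type arguments of an anonymous subclass's generic
        // superinterface are retained in the class file...
        MapFunction<String, Long> lengths = new MapFunction<String, Long>() {
            public Long map(String value) { return (long) value.length(); }
        };
        // ...so reflection can recover them:
        ParameterizedType iface = (ParameterizedType)
                lengths.getClass().getGenericInterfaces()[0];
        System.out.println(iface.getActualTypeArguments()[0]); // class java.lang.String
        System.out.println(iface.getActualTypeArguments()[1]); // class java.lang.Long

        // A lambda, by contrast, only implements the raw interface in its
        // synthetic class, so no type arguments can be recovered here.
        MapFunction<String, Long> lambda = v -> (long) v.length();
        System.out.println(lambda.getClass().getGenericInterfaces()[0]);
    }
}
```

Flink's actual extractor is considerably more involved (type variables,
class hierarchies, POJO fields), but this loophole in erasure is the core
idea behind it.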





Fwd: The Apache Software Foundation Announces Apache™ Flink™ as a Top-Level Project

2015-01-12 Thread Kostas Tzoumas
-- Forwarded message --
From: *Sally Khudairi* 
Date: Monday, January 12, 2015
Subject: The Apache Software Foundation Announces Apache™ Flink™ as a
Top-Level Project
To: Apache Announce List 


>> this announcement is available online at http://s.apache.org/YrZ

Open Source distributed Big Data system for expressive, declarative, and
efficient batch and streaming data processing and analysis

Forest Hill, MD – 12 January 2015 – The Apache Software Foundation (ASF), the
all-volunteer developers, stewards, and incubators of more than 350 Open
Source projects and initiatives, announced today that Apache™ Flink™ has
graduated from the Apache Incubator to become a Top-Level Project (TLP),
signifying that the project's community and products have been
well-governed under the ASF's meritocratic process and principles.

Apache Flink is an Open Source distributed data analysis engine for batch
and streaming data. It offers programming APIs in Java and Scala, as well
as specialized APIs for graph processing, with more libraries in the making.

"I am very happy that the ASF has become the home for Flink," said Stephan
Ewen, Vice President of Apache Flink. "For a community-driven effort, I can
think of no better umbrella. It is great to see the project is maturing and
many new people are joining the community."

Flink uses a unique combination of streaming/pipelining and batch
processing techniques to create a platform that covers and unifies a broad
set of batch and streaming data analytics use cases. The project has put
significant efforts into making a system that runs reliably and fast in a
wide variety of scenarios. For that reason, Flink contained its own type
serialization, memory management, and cost-based query optimization
components from the early days of the project.

Apache Flink has its roots in the Stratosphere research project that
started in 2009 at TU Berlin together with the Berlin and later the
European data management communities, including HU Berlin, Hasso Plattner
Institute, KTH (Stockholm), ELTE (Budapest), and others. Several Flink
committers recently started data Artisans, a Berlin-based startup committed
to growing Flink both in code and community as 100% Open Source. More than
70 people have by now contributed to Flink.

"Becoming a Top-Level Project in such short time is a great milestone for
Flink and reflects the speed with which the community has been growing,"
said Kostas Tzoumas, co-founder and CEO of data Artisans. "The community is
currently working on some exciting new features that make Flink even more
powerful and accessible to a wider audience, and several companies around
the world are including Flink in their data infrastructure."

"We use Apache Flink as part of our production data infrastructure," said
Ijad Madisch, co-founder and CEO of ResearchGate. "We are happy all around
and excited that Flink provides us with the opportunity for even better
developer productivity and testability, especially for complex data flows.
It’s with good reason that Flink is now a top-level Apache project."

"I have been experimenting with Flink, and we are very excited to hear that
Flink is becoming a top-level Apache project," said Anders Arpteg,
Analytics Machine Learning Manager at Spotify.

Denis Arnaud, Head of Data Science Development of Travel Intelligence at
Amadeus said, "At Amadeus, we continually seek for better improvement in
our analytic platform and our experiments with Apache Flink for analytics
on our travel data show a lot of potential in the system for our production
use."

"Flink was a pleasure to mentor as a new Apache project," said Alan Gates,
Apache Flink Incubator champion at the ASF, and architect/co-founder at
Hortonworks. "The Flink team learned The Apache Way very quickly. They
worked hard at being open in their decision making and including new
contributors. Those of us mentoring them just needed to point them in the
right direction and then let them get to work."

Availability and Oversight
As with all Apache products, Apache Flink software is released under the
Apache License v2.0, and is overseen by a self-selected team of active
contributors to the project. A Project Management Committee (PMC) guides
the Project's day-to-day operations, including community development and
product releases. For documentation and ways to become involved with Apache
Flink, visit http://flink.apache.org/ and @ApacheFlink on Twitter.

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350
leading Open Source projects, including Apache HTTP Server --the world's
most popular Web server software. Through the ASF's meritocratic process
known as "The Apache Way," more than 500 individual Members and 4,500
Committers successfully collaborate to develop freely available
enterprise-grade software, benefiting millions of users worldwide:
thousands of software solutions are distributed under the Apache License;
and th

RE: Gather a distributed dataset

2015-01-12 Thread Paris Carbone
Hello Alexander,

Intermediate results are indeed looking promising, also for finally
implementing a proper Flink shell for exploratory data analysis.
We are also currently looking at how to implement a collect() for the
flink-streaming Scala API that returns a Seq which can be consumed on the
client side, as part of FLINK-1344 [1]. It looks like intermediate result
support will help: basically, I would like to be able to initiate a stream
endpoint on the client side, perhaps via the JobClient, referencing an
intermediate result id. For streaming, this is a feature that Spark doesn't
explicitly have (one has to use foreach and collect on a DStream, which is
quite inefficient), so I guess it would be nice to add.

Paris

[1] https://issues.apache.org/jira/browse/FLINK-1344

From: Alexander Alexandrov [alexander.s.alexand...@gmail.com]
Sent: Monday, January 12, 2015 11:42 AM
To: dev@flink.apache.org
Subject: Gather a distributed dataset

[…]

Re: Gather a distributed dataset

2015-01-12 Thread Ufuk Celebi
Hey Alexander,

On 12 Jan 2015, at 11:42, Alexander Alexandrov wrote:

> […]
> As far as I know, in Flink we have an equivalent for parallelize
> 
> val x = env.fromCollection(...)
> 
> but not for collect. Is this still the case?

Yes, but this will change soon.

> If yes, how hard would it be to add this feature at the moment? Can you
> give me some pointers?

There is an "alpha" version/hack of this using accumulators. See 
https://github.com/apache/flink/pull/210. The problem is that each collect call 
results in a new program being executed from the sources. I think Stephan is 
working on the scheduling to support this kind of program. From the runtime 
perspective, it is not a problem to transfer the produced intermediate results 
back to the job manager. The job manager can basically use the same mechanism 
that the task managers use. Even the accumulator version should be fine as an 
initial version.
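
A toy illustration of the drawback (plain Scala, not Flink API; all names are made up): the sink writes into a job-wide accumulator that the client reads after execution, but since nothing is cached, every collect() re-runs the whole dataflow from the sources.

```scala
object AccumulatorCollectSketch {
  var sourceReads = 0 // counts how often the source is re-read

  def source(): Seq[Int] = { sourceReads += 1; 1 to 3 }

  // the "job": runs the dataflow and fills an accumulator with the sink data
  def runJob(): Seq[Int] = {
    val accumulator = scala.collection.mutable.ArrayBuffer.empty[Int]
    source().map(_ * 10).foreach(accumulator += _) // sink writes to accumulator
    accumulator.toList
  }

  // no caching of intermediate results: re-executes from the source each time
  def collect(): Seq[Int] = runJob()

  def main(args: Array[String]): Unit = {
    println(collect()) // prints List(10, 20, 30)
    println(collect()) // prints List(10, 20, 30) again
    println(s"source executed $sourceReads times") // 2, not 1
  }
}
```

With pinned intermediate results, the second collect() could be served from the cached result partition instead of re-reading the source.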

[jira] [Created] (FLINK-1387) Integrate website index.html into jekyll layout.

2015-01-12 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-1387:
-

 Summary: Integrate website index.html into jekyll layout.
 Key: FLINK-1387
 URL: https://issues.apache.org/jira/browse/FLINK-1387
 Project: Flink
  Issue Type: Task
  Components: Project Website
Reporter: Robert Metzger


The index.html of the Flink website is not integrated into the jekyll layout.
This makes any change harder (in particular version updates / releases) because 
updates need to be done multiple times.
Also, markdown parsing is not available for the web front page.



Also:
- Add twitter account to front page
- Fix broken Privacy Policy link



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Gather a distributed dataset

2015-01-12 Thread Alexander Alexandrov
Thanks, I am currently looking at the new ExecutionEnvironment API.

> I think Stephan is working on the scheduling to support this kind of
programs.

@Stephan: is there a feature branch for that somewhere?

2015-01-12 12:05 GMT+01:00 Ufuk Celebi :

> Hey Alexander,
>
> On 12 Jan 2015, at 11:42, Alexander Alexandrov <
> alexander.s.alexand...@gmail.com> wrote:
>
> > Hi there,
> >
> > I wished for intermediate datasets, and Santa Ufuk made my wishes come
> true
> > (thank you, Santa)!
> >
> > Now that FLINK-986 is in the mainline, I want to ask some practical
> > questions.
> >
> > In Spark, there is a way to put a value from the local driver to the
> > distributed runtime via
> >
> > val x = env.parallelize(...) // expose x to the distributed runtime
> > val y = dataflow(env, x) // y is produced by a dataflow which reads from
> x
> >
> > and also to get a value from the distributed runtime back to the driver
> >
> > val z = env.collect("y")
> >
> > As far as I know, in Flink we have an equivalent for parallelize
> >
> > val x = env.fromCollection(...)
> >
> > but not for collect. Is this still the case?
>
> Yes, but this will change soon.
>
> > If yes, how hard would it be to add this feature at the moment? Can you
> > give me some pointers?
>
> There is an "alpha" version/hack of this using accumulators. See
> https://github.com/apache/flink/pull/210. The problem is that each
> collect call results in a new program being executed from the sources. I
> think Stephan is working on the scheduling to support this kind of
> program. From the runtime perspective, it is not a problem to transfer the
> produced intermediate results back to the job manager. The job manager can
> basically use the same mechanism that the task managers use. Even the
> accumulator version should be fine as an initial version.


[jira] [Created] (FLINK-1388) POJO support for writeAsCsv

2015-01-12 Thread Timo Walther (JIRA)
Timo Walther created FLINK-1388:
---

 Summary: POJO support for writeAsCsv
 Key: FLINK-1388
 URL: https://issues.apache.org/jira/browse/FLINK-1388
 Project: Flink
  Issue Type: New Feature
  Components: Java API
Reporter: Timo Walther
Priority: Minor


It would be great if one could simply write out POJOs in CSV format.

{code}
public class MyPojo {
   String a;
   int b;
}
{code}

to:

{code}
# CSV file of org.apache.flink.MyPojo: String a, int b
"Hello World", 42
"Hello World 2", 47
...
{code}
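
A rough sketch of what such a feature could do for the MyPojo example above (plain Scala with reflection; writeAsCsvLines is a hypothetical helper, not the proposed Flink API; fields are sorted by name here only to make the order deterministic, and the quoting is naive):

```scala
object PojoCsvSketch {
  class MyPojo(var a: String, var b: Int)

  // Reflect over the declared fields and emit a header comment plus one
  // CSV line per object, quoting String values as in the example above.
  def writeAsCsvLines[T](items: Seq[T]): Seq[String] = {
    if (items.isEmpty) return Seq.empty
    val fields = items.head.getClass.getDeclaredFields.sortBy(_.getName)
    fields.foreach(_.setAccessible(true))
    val header = "# CSV file of " + items.head.getClass.getName + ": " +
      fields.map(f => f.getType.getSimpleName + " " + f.getName).mkString(", ")
    val rows = items.map { item =>
      fields.map { f =>
        f.get(item) match {
          case s: String => "\"" + s + "\""
          case other     => String.valueOf(other)
        }
      }.mkString(", ")
    }
    header +: rows
  }

  def main(args: Array[String]): Unit = {
    val pojos = Seq(new MyPojo("Hello World", 42), new MyPojo("Hello World 2", 47))
    writeAsCsvLines(pojos).foreach(println)
  }
}
```

A real implementation would of course reuse Flink's PojoTypeInfo instead of raw reflection, and would need proper CSV escaping.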





[jira] [Created] (FLINK-1389) Allow setting custom file extensions for files created by the FileOutputFormat

2015-01-12 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-1389:
-

 Summary: Allow setting custom file extensions for files created by 
the FileOutputFormat
 Key: FLINK-1389
 URL: https://issues.apache.org/jira/browse/FLINK-1389
 Project: Flink
  Issue Type: New Feature
Reporter: Robert Metzger
Priority: Minor


A user requested the ability to name avro files with the "avro" extension.





Flink @ Heise

2015-01-12 Thread Andreas Kunft

FYI

http://www.heise.de/newsticker/meldung/Big-Data-Apache-Flink-wird-Top-Level-Projekt-2516177.html


Re: Flink @ Heise

2015-01-12 Thread Robert Metzger
Hey Andreas,

thanks for sharing. Due to the press announcement by the ASF today, there
is quite some attention in the news.

For all non-Germans: Heise is one of the biggest IT news sites here.

On Mon, Jan 12, 2015 at 3:10 PM, Andreas Kunft 
wrote:

> FYI
>
> http://www.heise.de/newsticker/meldung/Big-Data-
> Apache-Flink-wird-Top-Level-Projekt-2516177.html
>


[jira] [Created] (FLINK-1390) java.lang.ClassCastException: X cannot be cast to X

2015-01-12 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-1390:
-

 Summary:  java.lang.ClassCastException: X cannot be cast to X
 Key: FLINK-1390
 URL: https://issues.apache.org/jira/browse/FLINK-1390
 Project: Flink
  Issue Type: Bug
  Components: YARN Client
Affects Versions: 0.8
Reporter: Robert Metzger


A user is affected by an issue which is probably caused by different 
classloaders being used for loading user classes.

Current state of investigation:
- the error happens in YARN sessions (there is only a YARN environment 
available)
- the error doesn't happen the first time the job is executed; it only 
happens on subsequent executions.







[jira] [Created] (FLINK-1391) Kryo fails to properly serialize avro collection types

2015-01-12 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-1391:
-

 Summary: Kryo fails to properly serialize avro collection types
 Key: FLINK-1391
 URL: https://issues.apache.org/jira/browse/FLINK-1391
 Project: Flink
  Issue Type: Improvement
Affects Versions: 0.8, 0.9
Reporter: Robert Metzger


Before FLINK-610, Avro was the default generic serializer.
Now, special types coming from Avro are handled by Kryo, which seems to cause 
errors like:

{code}
Exception in thread "main" 
org.apache.flink.runtime.client.JobExecutionException: 
java.lang.NullPointerException
at org.apache.avro.generic.GenericData$Array.add(GenericData.java:200)
at 
com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:116)
at 
com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:22)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
at 
org.apache.flink.api.java.typeutils.runtime.KryoSerializer.deserialize(KryoSerializer.java:143)
at 
org.apache.flink.api.java.typeutils.runtime.KryoSerializer.deserialize(KryoSerializer.java:148)
at 
org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:244)
at 
org.apache.flink.runtime.plugable.DeserializationDelegate.read(DeserializationDelegate.java:56)
at 
org.apache.flink.runtime.io.network.serialization.AdaptiveSpanningRecordDeserializer.getNextRecord(AdaptiveSpanningRecordDeserializer.java:71)
at 
org.apache.flink.runtime.io.network.channels.InputChannel.readRecord(InputChannel.java:189)
at 
org.apache.flink.runtime.io.network.gates.InputGate.readRecord(InputGate.java:176)
at 
org.apache.flink.runtime.io.network.api.MutableRecordReader.next(MutableRecordReader.java:51)
at 
org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:53)
at 
org.apache.flink.runtime.operators.DataSinkTask.invoke(DataSinkTask.java:170)
at 
org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:257)
at java.lang.Thread.run(Thread.java:744)
{code}





[VOTE] Release Apache Flink 0.8.0 (RC2)

2015-01-12 Thread Márton Balassi
Please vote on releasing the following candidate as Apache Flink version
0.8.0

This release will be the first major release for Flink as a top level
project.

-
The commit to be voted on is in the branch "release-0.8.0-rc2"
(commit 9ee74f3):
https://git-wip-us.apache.org/repos/asf/flink/commit/9ee74f3

The release artifacts to be voted on can be found at:
http://people.apache.org/~mbalassi/flink-0.8.0-rc2/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/mbalassi.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapacheflink-1022
-


Please vote on releasing this package as Apache Flink 0.8.0.

The vote is open for the next 72 hours and passes if a majority of at least
three +1 PMC votes are cast.

[ ] +1 Release this package as Apache Flink 0.8.0
[ ] -1 Do not release this package because ...


Re: Flink @ Heise

2015-01-12 Thread Henry Saputra
Thanks for sharing =)

On Mon, Jan 12, 2015 at 6:10 AM, Andreas Kunft
 wrote:
> FYI
>
> http://www.heise.de/newsticker/meldung/Big-Data-Apache-Flink-wird-Top-Level-Projekt-2516177.html


[jira] [Created] (FLINK-1392) Serializing Protobuf - issue 1

2015-01-12 Thread Felix Neutatz (JIRA)
Felix Neutatz created FLINK-1392:


 Summary: Serializing Protobuf - issue 1
 Key: FLINK-1392
 URL: https://issues.apache.org/jira/browse/FLINK-1392
 Project: Flink
  Issue Type: Bug
Reporter: Felix Neutatz
Priority: Minor


Hi, I started to experiment with Parquet using Protobuf.

When I use the standard Protobuf class: 
com.twitter.data.proto.tutorial.AddressBookProtos

The code I run can be found here: 
[https://github.com/FelixNeutatz/incubator-flink/blob/ParquetAtFlink/flink-addons/flink-hadoop-compatibility/src/main/java/org/apache/flink/hadoopcompatibility/mapreduce/example/ParquetProtobufOutput.java]

I get the following exception:

{code:xml}
Exception in thread "main" java.lang.Exception: Deserializing the 
InputFormat (org.apache.flink.api.java.io.CollectionInputFormat) failed: Could 
not read the user code wrapper: Error while deserializing element from 
collection
at 
org.apache.flink.runtime.jobgraph.InputFormatVertex.initializeOnMaster(InputFormatVertex.java:60)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$5.apply(JobManager.scala:179)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$5.apply(JobManager.scala:172)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:172)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at 
org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:34)
at 
org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:27)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at 
org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:27)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at 
org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:52)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
at akka.dispatch.Mailbox.run(Mailbox.scala:221)
at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: 
org.apache.flink.runtime.operators.util.CorruptConfigurationException: Could 
not read the user code wrapper: Error while deserializing element from 
collection
at 
org.apache.flink.runtime.operators.util.TaskConfig.getStubWrapper(TaskConfig.java:285)
at 
org.apache.flink.runtime.jobgraph.InputFormatVertex.initializeOnMaster(InputFormatVertex.java:57)
... 25 more
Caused by: java.io.IOException: Error while deserializing element from 
collection
at 
org.apache.flink.api.java.io.CollectionInputFormat.readObject(CollectionInputFormat.java:108)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at 
org.apa

[jira] [Created] (FLINK-1393) Serializing Protobuf - issue 2

2015-01-12 Thread Felix Neutatz (JIRA)
Felix Neutatz created FLINK-1393:


 Summary: Serializing Protobuf - issue 2
 Key: FLINK-1393
 URL: https://issues.apache.org/jira/browse/FLINK-1393
 Project: Flink
  Issue Type: Bug
Reporter: Felix Neutatz
Priority: Minor


In addition to [FLINK-1392|https://issues.apache.org/jira/browse/FLINK-1392], I 
also tried another Protobuf class, which throws a different exception:

{code:xml}
Exception in thread "main" java.lang.Exception: Deserializing the InputFormat 
([(null,url: "www.test.de"
archiveTime: 123
scripts: "script.de"
scripts: "script1.de"
iframes: "iframe.de"
iframes: "iframe1.de"
links: "link.de"
links: "link1.de"
images: "img.de"
images: "img1.de"
)]) failed: unread block data
at 
org.apache.flink.runtime.jobgraph.InputFormatVertex.initializeOnMaster(InputFormatVertex.java:60)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$5.apply(JobManager.scala:179)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$5.apply(JobManager.scala:172)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:172)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at 
org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:34)
at 
org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:27)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at 
org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:27)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at 
org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:52)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
at akka.dispatch.Mailbox.run(Mailbox.scala:221)
at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.IllegalStateException: unread block data
at 
java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at 
org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:274)
at 
org.apache.flink.util.InstantiationUtil.readObjectFromConfig(InstantiationUtil.java:236)
at 
org.apache.flink.runtime.operators.util.TaskConfig.getStubWrapper(TaskConfig.java:281)
at 
org.apache.flink.runtime.jobgraph.InputFormatVertex.initializeOnMaster(InputFormatVertex.java:57)
... 25 more
{code}

The program which causes this exception can be found here: 
[https://github.com/FelixNeutatz/incubator-flink/blob/ParquetAtFlink/flink-addons/flink-hadoop-compatibility/src/main/java/org/apache/flink/hadoopcompatibility/mapreduce/example/ParquetProtobufOutput2.java]


