date:20160512

Re: [QUESTION] thread model in Flink makes me confused

2016-05-12 Thread Flavio Pompermaier

That would be definitely awesome (and useful also for us)! +1


On Thu, May 12, 2016 at 7:38 AM, Aljoscha Krettek wrote:

> I favor the one-cluster-per-job approach. If this becomes the dominant
> approach to doing things, we could also think about introducing a separate
> component that would allow monitoring the jobs in these per-job clusters, as
> is now possible when running multiple jobs in a single cluster.
>
> On Thu, 12 May 2016 at 01:59 Wright, Eron  wrote:
>
> > One option is to use a separate cluster (JobManager + TaskManagers) for
> > each job. This is fairly straightforward with the YARN support: "flink
> > run" can launch a cluster for a job and tear it down afterwards.
> >
> > Of course this means you must deploy YARN. That doesn’t necessarily
> > imply HDFS, though a Hadoop-compatible filesystem (HCFS) is needed to
> > support the YARN staging directory.
> >
> > This approach also facilitates richer scheduling and multi-user
> > scenarios.
> >
> > One downside is the loss of a unified web UI to view all jobs.
> >
> >
> > > On May 11, 2016, at 8:32 AM, Jark Wu wrote:
> > >
> > > As far as I know, Flink uses a thread model, which means one TaskManager
> > > process may run many operator threads from different jobs. Tasks from
> > > different jobs therefore compete for memory and CPU within the same
> > > process. In the worst case, a bad job eats most of the CPU and memory,
> > > which may lead to an OOM, and then the well-behaved jobs die too. There
> > > is another problem: tasks from different jobs write their logs into the
> > > same file (the TaskManager log file), which makes debugging harder.
> > >
> > > As far as I know, Storm spawns workers for every job, and the tasks in
> > > one worker all belong to the same job. So I'm confused about the purpose
> > > or advantages of Flink's design. One more question: are there any tips
> > > for solving the issues above, or any suggestions for implementing a
> > > design similar to Storm's?
> > >
> > > Thank you in advance for any answers!
> > >
> > > Regards,
> > > Jark Wu

Re: [QUESTION] thread model in Flink makes me confused

2016-05-12 Thread Wright, Eron

Funny you should say that, because in a recent discussion with Stephan and
Jamie, we talked about reworking the web UI to talk to numerous job managers.
I’ve been looking into it as part of the Mesos work (FLINK-1984). I’ll start a
new thread about it soon.

> On May 11, 2016, at 10:38 PM, Aljoscha Krettek  wrote:
> 
> I favor the one-cluster-per-job approach. If this becomes the dominant
> approach to doing things, we could also think about introducing a separate
> component that would allow monitoring the jobs in these per-job clusters, as
> is now possible when running multiple jobs in a single cluster.

Re: [RESULT] [VOTE] Release Apache Flink 1.0.3 (RC3)

2016-05-12 Thread Till Rohrmann

Thanks Ufuk :-)

On Wed, May 11, 2016 at 5:16 PM, Stephan Ewen  wrote:

> Thanks for pushing this release Ufuk!
>
> On Wed, May 11, 2016 at 5:12 PM, Fabian Hueske  wrote:
>
> > Thanks Ufuk!
> >
> > 2016-05-11 16:39 GMT+02:00 Ufuk Celebi :
> >
> > > This vote has passed with 3 binding +1 votes. Thanks to everyone who
> > > contributed and tested the release candidate.
> > >
> > > +1s:
> > > Gyula Fora (binding)
> > > Fabian Hueske (binding)
> > > Ufuk Celebi (binding)
> > >
> > > There are no 0s or -1s.
> > >
> > > I'll go ahead and finalize and package this release.
> > >
> > > On Mon, May 9, 2016 at 10:24 AM, Ufuk Celebi wrote:
> > > > Dear Flink community,
> > > >
> > > > Please vote on releasing the following candidate as Apache Flink
> > > > version 1.0.3.
> > > >
> > > > The commit to be voted on:
> > > > f3a6b5f1e8d85d10e1449e2f96291408b781
> > > >
> > > > Branch:
> > > > release-1.0.3-rc3 (see
> > > > https://git1-us-west.apache.org/repos/asf/flink/?p=flink.git;a=shortlog;h=refs/heads/release-1.0.3-rc3)
> > > >
> > > > The release artifacts to be voted on can be found at:
> > > > http://home.apache.org/~uce/flink-1.0.3-rc3/
> > > >
> > > > The release artifacts are signed with the key with fingerprint
> > > > 9D403309:
> > > > http://www.apache.org/dist/flink/KEYS
> > > >
> > > > The staging repository for this release can be found at:
> > > > https://repository.apache.org/content/repositories/orgapacheflink-1096
> > > >
> > > > -
> > > >
> > > > The vote is open for the next 48 hours and passes if a majority of at
> > > > least three +1 PMC votes are cast.
> > > >
> > > > The vote ends on Wednesday May 11, 2016.
> > > >
> > > > [ ] +1 Release this package as Apache Flink 1.0.3
> > > > [ ] -1 Do not release this package because ...
> > > >
> > > > ===
> > > >
> > > > The following commits have been added since the 1.0.2 release
> > > > (excluding docs):
> > > >
> > > > * 4d3dcb1 - [FLINK-3860] [connector-wikiedits] Add retry loop to
> > > >   WikipediaEditsSourceTest (5 days ago)
> > > > * f1d34b1 - [FLINK-3790] [streaming] Use proper hadoop config in
> > > >   rolling sink (12 hours ago)
> > > > * 4a34f6f - [FLINK-3835] [optimizer] Add input id to JSON plan to
> > > >   resolve ambiguous input names. (2 days ago)
> > > > * d8feb15 - [hotfix] OptionSerializer.duplicate to respect stateful
> > > >   element serializer (3 days ago)
> > > > * 7062b0a - [FLINK-3803] [runtime] Pass CheckpointStatsTracker to
> > > >   ExecutionGraph (3 days ago)
> > > > * f80f6d6 - [FLINK-3678] [dist, docs] Make Flink logs directory
> > > >   configurable (4 days ago)
> > > > * 344a55e - [hotfix] [cep] Make cep window border treatment consistent
> > > >   (9 days ago)

Re: How to specify dependencies for an application that needs to use modified version of Flink

2016-05-12 Thread Jark

Hi Saiph,
You can go into the flink directory and run `mvn clean install -DskipTest=true`
to install all the modules (including flink-streaming-java) into your local .m2
repository. After that, change the version of your app's Flink dependencies to
the version of your Flink build, such as “1.1-SNAPSHOT”. Finally, reimport your
app project.
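
For an sbt-based Scala app, the dependency change could look roughly like the
sketch below. It is only an illustration under assumptions: the Scala version,
the choice of artifacts (flink-streaming-scala and flink-clients), and the
"1.1-SNAPSHOT" version string are guesses that should be adjusted to match the
local Flink build. The one part that matters is `Resolver.mavenLocal`, which
lets sbt pick up artifacts that Maven installed into the local .m2 repository.

```scala
// build.sbt (sketch, not a verified build for any particular Flink checkout)

// Assumption: use the Scala version your local Flink build was compiled with.
scalaVersion := "2.10.6"

// Assumption: the version of the locally installed Flink artifacts.
val flinkVersion = "1.1-SNAPSHOT"

// Also resolve dependencies from the local Maven repository (~/.m2/repository),
// which is where `mvn install` places the locally built modules.
resolvers += Resolver.mavenLocal

libraryDependencies ++= Seq(
  // %% appends the Scala binary version (for example _2.10) to the artifact name.
  "org.apache.flink" %% "flink-streaming-scala" % flinkVersion,
  "org.apache.flink" %% "flink-clients" % flinkVersion
)
```

With this in place there should be no need to copy jars into the app's lib
folder by hand; re-running the Maven install after each Flink change and then
rebuilding the app should be enough.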
  
- Jark Wu

> On May 12, 2016, at 2:33 AM, Saiph Kappa wrote:
> 
> Hi,
> 
> I'm performing some modifications on Flink (current trunk version). I want
> a Scala app (sbt based) to use that modified version. I'm only modifying
> the flink-streaming-java module. What is the typical way to specify the
> dependencies for my application in this case? Should I copy all jars to the
> lib folder of my app, or build a big fat jar? How do the devs here do it?
> 
> Thanks.

Re: How to specify dependencies for an application that needs to use modified version of Flink

2016-05-12 Thread Jark

Sorry, I mistyped the command (the flag should be `-DskipTests`). You can go
into flink/flink-streaming-java and run
`mvn clean package install -DskipTests=true`.
It will install only the flink-streaming-java module.


[ANNOUNCE] Flink 1.0.3 Released

2016-05-12 Thread Ufuk Celebi

The Flink PMC is pleased to announce the availability of Flink 1.0.3.

The official release announcement:
http://flink.apache.org/news/2016/05/11/release-1.0.3.html

Release binaries:
http://apache.openmirror.de/flink/flink-1.0.3/

Please update your Maven dependencies to the new 1.0.3 version and
update your binaries.

On behalf of the Flink PMC, I would like to thank everybody who
contributed to the release.

Re: How to specify dependencies for an application that needs to use modified version of Flink

2016-05-12 Thread Flavio Pompermaier

Since FLINK-1827 was merged, you can also skip test compilation with
-Dmaven.test.skip=true if you don't want to waste time and resources :)

Re: [PROPOSAL] Structure the Flink Open Source Development

2016-05-12 Thread Gábor Gévay

Hello,

There are at least three Gábors in the Flink community,  :) so
assuming that the Gábor in the list of maintainers of the DataSet API
is referring to me, I'll be happy to do it. :)

Best,
Gábor G.



2016-05-10 11:24 GMT+02:00 Stephan Ewen :
> Hi everyone!
>
> We propose to establish some lightweight structures in the Flink open
> source community and development process,
> to help us better handle the increased interest in Flink (mailing list and
> pull requests), while not overwhelming the
> committers, and giving users and contributors a good experience.
>
> This proposal is triggered by the observation that we are reaching the
> limits of where the current community can support
> users and guide new contributors. The below proposal is based on
> observations and ideas from Till, Robert, and me.
>
> 
> Goals
> 
>
> We try to achieve the following
>
>   - Pull requests get handled in a timely fashion
>   - New contributors are better integrated into the community
>   - The community feels empowered on the mailing list.
>     But questions that need the attention of someone that has deep
>     knowledge of a certain part of Flink get their attention.
>   - At the same time, the committers that are knowledgeable about many core
>     parts do not get completely overwhelmed.
>   - We don't overlook threads that report critical issues.
>   - We always have a pretty good overview of what the status of certain
>     parts of the system are.
>       -> What are often encountered known issues
>       -> What are the most frequently requested features
>
>
> 
> Problems
> 
>
> Looking into the process, there are two big issues:
>
> (1) Up to now, we have been relying on the fact that everything just
> "organizes itself", driven by best effort. That assumes
> that everyone feels equally responsible for every part, question, and
> contribution. At the current state, this is impossible
> to maintain, it overwhelms the committers and contributors.
>
> Example: Pull requests are picked up by whoever wants to pick them up. Pull
> requests that are a lot of work, have little
> chance of getting in, or relate to less active components are sometimes not
> picked up. When contributors are pretty
> loaded already, it may happen that no one eventually feels responsible to
> pick up a pull request, and it falls through the cracks.
>
> (2) There is no good overview of what are known shortcomings, efforts, and
> requested features for different parts of the system.
> This information exists in various peoples' heads, but is not easily
> accessible for new people. The Flink JIRA is not well
> maintained, it is not easy to draw insights from that.
>
>
> ===
> The Proposal
> ===
>
> Since we are building a parallel system, the natural solution seems to be:
> partition the workload ;-)
>
> We propose to define a set of components for Flink. Each component is
> maintained or tracked by one or more
> people - let's call them maintainers. It is important to note that we don't
> suggest the maintainers as an authoritative role, but
> simply as committers or contributors that visibly step up for a certain
> component, and mainly track and drive the efforts
> pertaining to that component.
>
> It is also important to realize that we do not want to suggest that people
> get less involved with certain parts and components, because
> they are not the maintainers. We simply want to make sure that each pull
> request or question or contribution has in the end
> one person (or a small set of people) responsible for catching and tracking
> it, if it was not worked on by the pro-active
> community.
>
> For some components, having multiple maintainers will be helpful. In that
> case, one maintainer should be the "chair" or "lead"
> and make sure that no issue of that component gets lost between the
> multiple maintainers.
>
>
> A maintainers' role is:
> -
>
>   - Have an overview of which of the open pull requests relate to their
> component
>   - Drive the pull requests relating to the component to resolution
>       => Moderate the decision whether the feature should be merged
>       => Make sure the pull request gets a shepherd.
>          In many cases, the maintainers would shepherd themselves.
>       => In case the shepherd becomes inactive, the maintainers need to
>          find a new shepherd.
>
>   - Have an overview of what are the known issues of their component
>   - Have an overview of what are the frequently requested features of their
> component
>
>   - Have an overview of which contributors are doing very good work in
> their component,
> would be candidates for committers, and should be mentored towards that.
>
>   - Resolve email threads that have been brought to their attention,
> because deeper
> component knowledge is required for that thread.
>
> A maintainers' role is NOT:
> --
>
>   - Review all pull requests of that c

Re: [PROPOSAL] Structure the Flink Open Source Development

2016-05-12 Thread Márton Balassi

+1 for the proposal
@ggevay: I do think that it refers to you. :)


Re: Intellij code style

2016-05-12 Thread Flavio Pompermaier

If you're interested, I created an Eclipse version that should follow the
Flink coding rules. Should I create a new JIRA for it?

On Thu, May 5, 2016 at 6:02 PM, Dawid Wysakowicz wrote:

> I opened JIRA: https://issues.apache.org/jira/browse/FLINK-3870. and
> created PR both to flink and flink-web.
>
> https://github.com/apache/flink/pull/1963
> https://github.com/apache/flink-web/pull/20
>
> I would be thankful for a review.
>
> 2016-05-04 11:00 GMT+02:00 Fabian Hueske:
>
> > Yes, please open a JIRA. Thanks!
> >
> > 2016-05-04 10:16 GMT+02:00 Dawid Wysakowicz:
> >
> > > Sure, Will open PR shortly. Shall I create any JIRA issue?
> > >
> > > 2016-05-04 9:28 GMT+02:00 Fabian Hueske:
> > >
> > > > +1 for adding a template to the tools folder and linking it from the
> > > > coding guide lines!
> > > >
> > > > 2016-05-04 6:08 GMT+02:00 Henry Saputra:
> > > >
> > > > > We could actually put this in the tools directory of the source and
> > > > > repo and refer it from contribution guide.
> > > > >
> > > > > @Dawid want to try to send Pull request for it?
> > > > >
> > > > > On Thursday, April 28, 2016, Theodore Vasiloudis
> > > > > <theodoros.vasilou...@gmail.com> wrote:
> > > > >
> > > > > > Do we plan to include something like this in the contribution
> > > > > > guide as well?
> > > > > >
> > > > > > On Thu, Apr 28, 2016 at 3:16 PM, Stefano Baghino
> > > > > > <stefano.bagh...@radicalbit.io> wrote:
> > > > > >
> > > > > > > Awesome Dawid! Thanks for taking the time to do this. :)
> > > > > > >
> > > > > > > On Thu, Apr 28, 2016 at 1:45 PM, Dawid Wysakowicz
> > > > > > > <wysakowicz.da...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I tried to create a code style that would follow Flink
> > > > > > > > code-style. It may be not "production" ready, but I think it
> > > > > > > > can be a good start.
> > > > > > > > Hope it will be useful for someone. Also I will be glad for
> > > > > > > > any comments on that.
> > > > > > > >
> > > > > > > > 2016-04-10 13:59 GMT+02:00 Stephan Ewen:
> > > > > > > >
> > > > > > > >> I don't know how close Phoenix' code style is to Flink's
> > > > > > > >> de-facto code style.
> > > > > > > >> I would create one that reflects Flink's de-facto code style,
> > > > > > > >> so that the formatter does not change everything...
> > > > > > > >>
> > > > > > > >> On Sun, Apr 10, 2016 at 4:40 AM, Naveen Madhire
> > > > > > > >> <vmadh...@umail.iu.edu> wrote:
> > > > > > > >>
> > > > > > > >> > Apache Phoenix has one code template which contributors use.
> > > > > > > >> > Do you think one can use the same for Flink or maybe with
> > > > > > > >> > some more modifications?
> > > > > > > >> >
> > > > > > > >> > https://github.com/apache/phoenix/blob/master/dev/PhoenixCodeTemplate.xml
> > > > > > > >> >
> > > > > > > >> > On Sat, Apr 9, 2016 at 11:00 AM, Stephan Ewen
> > > > > > > >> > <se...@apache.org> wrote:
> > > > > > > >> >
> > > > > > > >> > > Actually, It would be amazing to create a code style
> > > > > > > >> > > profile for download, so that all contributors would use
> > > > > > > >> > > that.
> > > > > > > >> > >
> > > > > > > >> > > Same thing actually for IntelliJ inspections: A set of
> > > > > > > >> > > inspections we want to have active and where we strive for
> > > > > > > >> > > zero warnings.
> > > > > > > >> > >
> > > > > > > >> > > On Sat, Apr 9, 2016 at 10:00 AM, Robert Metzger
> > > > > > > >> > > <rmetz...@apache.org> wrote:
> > > > > > > >> > >
> > > > > > > >> > > > Hi Dawid,
> > > > > > > >> > > >
> > > > > > > >> > > > we don't have an automated formatter for intelliJ.
> > > > > > > >> > > > However, you can use the "Checkstyle" plugin of IntelliJ
> > > > > > > >> > > > to mark checkstyle violations in the IDE.
> > > > > > > >> > > >
> > > > > > > >> > > > On Fri, Apr 8, 2016 at 12:30 PM, Dawid Wysakowicz
> > > > > > > >> > > > <wysakowicz.da...@gmail.com> wrote:
> > > > > > > >> > > >
> > > > > > > >> > > > > Hi all,
> > > > > > > >> > > > >
> > > > > > > >> > > > > I am currently working on some issues and been
> > > > > > > >> > > > > wondering if you have settings for Intellij code style
> > > > > > > >> > > > > that would follow your coding guidelines available (I
> > > > > > > >> > > > > tried to look on wikis but could not find it). If not
> > > > > > > >> > > > > could someone share its own? I would be grateful.
> > > > > > > >> > > > >
> > > > > > > >> > > > > Regards
> > > > > > > >> > > > > Dawid Wysakow

Re: Intellij code style

2016-05-12 Thread Stephan Ewen

Yes, please open a pull request for that.

On Thu, May 12, 2016 at 11:40 AM, Flavio Pompermaier wrote:

> If you're interested, I created an Eclipse version that should follow the
> Flink coding rules. Should I create a new JIRA for it?

Re: [PROPOSAL] Structure the Flink Open Source Development

2016-05-12 Thread Stephan Ewen

Yes, Gabor Gevay, that did refer to you!

Sorry for the ambiguity...

On Thu, May 12, 2016 at 10:46 AM, Márton Balassi wrote:

> +1 for the proposal
> @ggevay: I do think that it refers to you. :)
>
> On Thu, May 12, 2016 at 10:40 AM, Gábor Gévay  wrote:
>
> > Hello,
> >
> > There are at least three Gábors in the Flink community,  :) so
> > assuming that the Gábor in the list of maintainers of the DataSet API
> > is referring to me, I'll be happy to do it. :)
> >
> > Best,
> > Gábor G.
> >
> >
> >
> > 2016-05-10 11:24 GMT+02:00 Stephan Ewen :
> > > Hi everyone!
> > >
> > > We propose to establish some lightweight structures in the Flink open
> > > source community and development process,
> > > to help us better handle the increased interest in Flink (mailing list
> > and
> > > pull requests), while not overwhelming the
> > > committers, and giving users and contributors a good experience.
> > >
> > > This proposal is triggered by the observation that we are reaching the
> > > limits of where the current community can support
> > > users and guide new contributors. The below proposal is based on
> > > observations and ideas from Till, Robert, and me.
> > >
> > > 
> > > Goals
> > > 
> > >
> > > We try to achieve the following
> > >
> > >   - Pull requests get handled in a timely fashion
> > >   - New contributors are better integrated into the community
> > >   - The community feels empowered on the mailing list.
> > > But questions that need the attention of someone that has deep
> > > knowledge of a certain part of Flink get their attention.
> > >   - At the same time, the committers that are knowledgeable about many
> > core
> > > parts do not get completely overwhelmed.
> > >   - We don't overlook threads that report critical issues.
> > >   - We always have a pretty good overview of what the status of certain
> > > parts of the system is.
> > >   -> What are often encountered known issues
> > >   -> What are the most frequently requested features
> > >
> > >
> > > 
> > > Problems
> > > 
> > >
> > > Looking into the process, there are two big issues:
> > >
> > > (1) Up to now, we have been relying on the fact that everything just
> > > "organizes itself", driven by best effort. That assumes
> > > that everyone feels equally responsible for every part, question, and
> > > contribution. In the current state, this is impossible
> > > to maintain; it overwhelms the committers and contributors.
> > >
> > > Example: Pull requests are picked up by whoever wants to pick them up.
> > Pull
> > > requests that are a lot of work, have little
> > > chance of getting in, or relate to less active components are sometimes
> > not
> > > picked up. When contributors are pretty
> > > loaded already, it may happen that no one eventually feels responsible
> to
> > > pick up a pull request, and it falls through the cracks.
> > >
> > > (2) There is no good overview of what are known shortcomings, efforts,
> > and
> > > requested features for different parts of the system.
> > > This information exists in various people's heads, but is not easily
> > > accessible for new people. The Flink JIRA is not well
> > > maintained, so it is not easy to draw insights from it.
> > >
> > >
> > > ===
> > > The Proposal
> > > ===
> > >
> > > Since we are building a parallel system, the natural solution seems to
> > be:
> > > partition the workload ;-)
> > >
> > > We propose to define a set of components for Flink. Each component is
> > > maintained or tracked by one or more
> > > people - let's call them maintainers. It is important to note that we
> > don't
> > > suggest the maintainers as an authoritative role, but
> > > simply as committers or contributors that visibly step up for a certain
> > > component, and mainly track and drive the efforts
> > > pertaining to that component.
> > >
> > > It is also important to realize that we do not want to suggest that
> > people
> > > get less involved with certain parts and components, because
> > > they are not the maintainers. We simply want to make sure that each
> pull
> > > request or question or contribution has in the end
> > > one person (or a small set of people) responsible for catching and
> > tracking
> > > it, if it was not worked on by the pro-active
> > > community.
> > >
> > > For some components, having multiple maintainers will be helpful. In
> that
> > > case, one maintainer should be the "chair" or "lead"
> > > and make sure that no issue of that component gets lost between the
> > > multiple maintainers.
> > >
> > >
> > > A maintainer's role is:
> > > -
> > >
> > >   - Have an overview of which of the open pull requests relate to their
> > > component
> > >   - Drive the pull requests relating to the component to resolution
> > >   => Moderate the decision whether the feature should be merged
> > >   => Make sure the pull request gets a shepherd.
> > >In many cases, t

Re: Intellij code style

2016-05-12 Thread Flavio Pompermaier

Do I also need to open a JIRA, or just the PR?

On Thu, May 12, 2016 at 12:03 PM, Stephan Ewen  wrote:

> Yes, please open a pull request for that.
>
> On Thu, May 12, 2016 at 11:40 AM, Flavio Pompermaier  >
> wrote:
>
> > If you're interested, I created an Eclipse version that should follow
> > Flink coding rules. Should I create a new JIRA for it?
> >
> > On Thu, May 5, 2016 at 6:02 PM, Dawid Wysakowicz <
> > wysakowicz.da...@gmail.com
> > > wrote:
> >
> > > I opened a JIRA: https://issues.apache.org/jira/browse/FLINK-3870 and
> > > created PRs for both flink and flink-web.
> > >
> > > https://github.com/apache/flink/pull/1963
> > > https://github.com/apache/flink-web/pull/20
> > >
> > > I would be thankful for a review.
> > >
> > > 2016-05-04 11:00 GMT+02:00 Fabian Hueske :
> > >
> > > > Yes, please open a JIRA. Thanks!
> > > >
> > > > 2016-05-04 10:16 GMT+02:00 Dawid Wysakowicz <
> > wysakowicz.da...@gmail.com
> > > >:
> > > >
> > > > > Sure, will open a PR shortly. Shall I create a JIRA issue?
> > > > >
> > > > > 2016-05-04 9:28 GMT+02:00 Fabian Hueske :
> > > > >
> > > > > > +1 for adding a template to the tools folder and linking it from
> > the
> > > > > coding
> > > > > > guide lines!
> > > > > >
> > > > > > 2016-05-04 6:08 GMT+02:00 Henry Saputra  >:
> > > > > >
> > > > > > > We could actually put this in the tools directory of the source
> > > > > > > repo and refer to it from the contribution guide.
> > > > > > >
> > > > > > > @Dawid, want to try sending a pull request for it?
> > > > > > >
> > > > > > > On Thursday, April 28, 2016, Theodore Vasiloudis <
> > > > > > > theodoros.vasilou...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Do we plan to include something like this in the contribution
> > > guide
> > > > > as
> > > > > > > > well?
> > > > > > > >
> > > > > > > > On Thu, Apr 28, 2016 at 3:16 PM, Stefano Baghino <
> > > > > > > > stefano.bagh...@radicalbit.io > wrote:
> > > > > > > >
> > > > > > > > > Awesome Dawid! Thanks for taking the time to do this. :)
> > > > > > > > >
> > > > > > > > > On Thu, Apr 28, 2016 at 1:45 PM, Dawid Wysakowicz <
> > > > > > > > > wysakowicz.da...@gmail.com > wrote:
> > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > I tried to create a code style that would follow the Flink
> > > > > > > > > > code style. It may not be "production" ready, but I think
> > > > > > > > > > it can be a good start. Hope it will be useful for someone.
> > > > > > > > > > I will also be glad for any comments on it.
> > > > > > > > > >
> > > > > > > > > > 2016-04-10 13:59 GMT+02:00 Stephan Ewen <
> se...@apache.org
> > > > > > > > >:
> > > > > > > > > >
> > > > > > > > > >> I don't know how close Phoenix' code style is to Flink's
> > > > > de-facto
> > > > > > > code
> > > > > > > > > >> style.
> > > > > > > > > >> I would create one that reflects Flink's de-facto code
> > > style,
> > > > so
> > > > > > > that
> > > > > > > > > the
> > > > > > > > > >> formatter does not change everything...
> > > > > > > > > >>
> > > > > > > > > >> On Sun, Apr 10, 2016 at 4:40 AM, Naveen Madhire <
> > > > > > > > vmadh...@umail.iu.edu >
> > > > > > > > > >> wrote:
> > > > > > > > > >>
> > > > > > > > > > >> > Apache Phoenix has one code template which contributors use. Do you
> > > > > > > > > > >> > think one can use the same for Flink, maybe with some more modifications?
> > > > > > > > > >> >
> > > > > > > > > > >> > https://github.com/apache/phoenix/blob/master/dev/PhoenixCodeTemplate.xml
> > > > > > > > > >> >
> > > > > > > > > >> > On Sat, Apr 9, 2016 at 11:00 AM, Stephan Ewen <
> > > > > se...@apache.org
> > > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > >> >
> > > > > > > > > >> > > Actually, It would be amazing to create a code style
> > > > profile
> > > > > > for
> > > > > > > > > >> > download,
> > > > > > > > > >> > > so that all contributors would use that.
> > > > > > > > > >> > >
> > > > > > > > > >> > > Same thing actually for IntelliJ inspections: A set
> of
> > > > > > > inspections
> > > > > > > > > we
> > > > > > > > > >> > want
> > > > > > > > > >> > > to have active and where we strive for zero
> warnings.
> > > > > > > > > >> > >
> > > > > > > > > >> > > On Sat, Apr 9, 2016 at 10:00 AM, Robert Metzger <
> > > > > > > > > rmetz...@apache.org >
> > > > > > > > > >> > > wrote:
> > > > > > > > > >> > >
> > > > > > > > > >> > > > Hi Dawid,
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > we don't have an automated formatter for intelliJ.
> > > > > However,
> > > > > > > you
> > > > > > > > > can
> > > > > > > > > >> use
> > > > > > > > > >> > > the
> > > > > > > > > >> > > > "Checkstyle" plugin of IntelliJ to mark checkstyle
> > > > > > violations
> > > > > >

Re: [PROPOSAL] Structure the Flink Open Source Development

2016-05-12 Thread Till Rohrmann

+1 for the proposal
On May 12, 2016 12:13 PM, "Stephan Ewen"  wrote:

> Yes, Gabor Gevay, that did refer to you!
>
> Sorry for the ambiguity...
>
> On Thu, May 12, 2016 at 10:46 AM, Márton Balassi  >
> wrote:
>
> > +1 for the proposal
> > @ggevay: I do think that it refers to you. :)
> >
> > On Thu, May 12, 2016 at 10:40 AM, Gábor Gévay  wrote:
> >
> > > Hello,
> > >
> > > There are at least three Gábors in the Flink community,  :) so
> > > assuming that the Gábor in the list of maintainers of the DataSet API
> > > is referring to me, I'll be happy to do it. :)
> > >
> > > Best,
> > > Gábor G.
> > >
> > >
> > >
> > > 2016-05-10 11:24 GMT+02:00 Stephan Ewen :
> > > > Hi everyone!
> > > >
> > > > We propose to establish some lightweight structures in the Flink open
> > > > source community and development process,
> > > > to help us better handle the increased interest in Flink (mailing
> list
> > > and
> > > > pull requests), while not overwhelming the
> > > > committers, and giving users and contributors a good experience.
> > > >
> > > > This proposal is triggered by the observation that we are reaching
> the
> > > > limits of where the current community can support
> > > > users and guide new contributors. The below proposal is based on
> > > > observations and ideas from Till, Robert, and me.
> > > >
> > > > 
> > > > Goals
> > > > 
> > > >
> > > > We try to achieve the following
> > > >
> > > >   - Pull requests get handled in a timely fashion
> > > >   - New contributors are better integrated into the community
> > > >   - The community feels empowered on the mailing list.
> > > > But questions that need the attention of someone that has deep
> > > > knowledge of a certain part of Flink get their attention.
> > > >   - At the same time, the committers that are knowledgeable about
> many
> > > core
> > > > parts do not get completely overwhelmed.
> > > >   - We don't overlook threads that report critical issues.
> > > >   - We always have a pretty good overview of what the status of
> certain
> > > > parts of the system are.
> > > >   -> What are often encountered known issues
> > > >   -> What are the most frequently requested features
> > > >
> > > >
> > > > 
> > > > Problems
> > > > 
> > > >
> > > > Looking into the process, there are two big issues:
> > > >
> > > > (1) Up to now, we have been relying on the fact that everything just
> > > > "organizes itself", driven by best effort. That assumes
> > > > that everyone feels equally responsible for every part, question, and
> > > > contribution. At the current state, this is impossible
> > > > to maintain, it overwhelms the committers and contributors.
> > > >
> > > > Example: Pull requests are picked up by whoever wants to pick them
> up.
> > > Pull
> > > > requests that are a lot of work, have little
> > > > chance of getting in, or relate to less active components are
> sometimes
> > > not
> > > > picked up. When contributors are pretty
> > > > loaded already, it may happen that no one eventually feels
> responsible
> > to
> > > > pick up a pull request, and it falls through the cracks.
> > > >
> > > > (2) There is no good overview of what are known shortcomings,
> efforts,
> > > and
> > > > requested features for different parts of the system.
> > > > This information exists in various peoples' heads, but is not easily
> > > > accessible for new people. The Flink JIRA is not well
> > > > maintained, it is not easy to draw insights from that.
> > > >
> > > >
> > > > ===
> > > > The Proposal
> > > > ===
> > > >
> > > > Since we are building a parallel system, the natural solution seems
> to
> > > be:
> > > > partition the workload ;-)
> > > >
> > > > We propose to define a set of components for Flink. Each component is
> > > > maintained or tracked by one or more
> > > > people - let's call them maintainers. It is important to note that we
> > > don't
> > > > suggest the maintainers as an authoritative role, but
> > > > simply as committers or contributors that visibly step up for a
> certain
> > > > component, and mainly track and drive the efforts
> > > > pertaining to that component.
> > > >
> > > > It is also important to realize that we do not want to suggest that
> > > people
> > > > get less involved with certain parts and components, because
> > > > they are not the maintainers. We simply want to make sure that each
> > pull
> > > > request or question or contribution has in the end
> > > > one person (or a small set of people) responsible for catching and
> > > tracking
> > > > it, if it was not worked on by the pro-active
> > > > community.
> > > >
> > > > For some components, having multiple maintainers will be helpful. In
> > that
> > > > case, one maintainer should be the "chair" or "lead"
> > > > and make sure that no issue of that component gets lost between the
> > > > multiple maintainers.
> > > >
> > > >
> > > > A maintainers' role is:
> > > > -

Re: [PROPOSAL] Structure the Flink Open Source Development

2016-05-12 Thread Matthias J. Sax

+1 from my side.

Happy to be the maintainer for Storm-Compatibility (at least I guess
it's me, even though the correct spelling would be with two 't's :P)

-Matthias

On 05/12/2016 12:56 PM, Till Rohrmann wrote:
> +1 for the proposal
> On May 12, 2016 12:13 PM, "Stephan Ewen"  wrote:
> 
>> Yes, Gabor Gevay, that did refer to you!
>>
>> Sorry for the ambiguity...
>>
>> On Thu, May 12, 2016 at 10:46 AM, Márton Balassi >>
>> wrote:
>>
>>> +1 for the proposal
>>> @ggevay: I do think that it refers to you. :)

Re: [PROPOSAL] Structure the Flink Open Source Development

2016-05-12 Thread Kostas Tzoumas

Big +1 from my side, I think this will help the community grow and prosper
big time!

On Thu, May 12, 2016 at 1:27 PM, Matthias J. Sax  wrote:

> +1 from my side.
>
> Happy to be the maintainer for Storm-Compatibility (at least I guess
> it's me, even though the correct spelling would be with two 't's :P)
>
> -Matthias
>
> On 05/12/2016 12:56 PM, Till Rohrmann wrote:
> > +1 for the proposal
> > On May 12, 2016 12:13 PM, "Stephan Ewen"  wrote:
> >
> >> Yes, Gabor Gevay, that did refer to you!
> >>
> >> Sorry for the ambiguity...
> >>
> >> On Thu, May 12, 2016 at 10:46 AM, Márton Balassi <
> balassi.mar...@gmail.com
> >>>
> >> wrote:
> >>
> >>> +1 for the proposal
> >>> @ggevay: I do think that it refers to you. :)

Re: [PROPOSAL] Structure the Flink Open Source Development

2016-05-12 Thread Stephan Ewen

Yes, Matthias, that was supposed to be you.
Sorry from another guy who frequently has his name misspelled ;-)

On Thu, May 12, 2016 at 1:27 PM, Matthias J. Sax  wrote:

> +1 from my side.
>
> Happy to be the maintainer for Storm-Compatibility (at least I guess
> it's me, even though the correct spelling would be with two 't's :P)
>
> -Matthias
>
> On 05/12/2016 12:56 PM, Till Rohrmann wrote:
> > +1 for the proposal
> > On May 12, 2016 12:13 PM, "Stephan Ewen"  wrote:
> >
> >> Yes, Gabor Gevay, that did refer to you!
> >>
> >> Sorry for the ambiguity...
> >>
> >> On Thu, May 12, 2016 at 10:46 AM, Márton Balassi <
> balassi.mar...@gmail.com
> >>>
> >> wrote:
> >>
> >>> +1 for the proposal
> >>> @ggevay: I do think that it refers to you. :)

Re: [PROPOSAL] Structure the Flink Open Source Development

2016-05-12 Thread Ufuk Celebi

Hey Stephan!

Thanks to you and the others who started this. I really like the
proposal and I'm happy to see my name on some components.

So, +1.

I'd say let's wait until the end of the week/beginning of next week to
see if there is any disagreement with the proposal in the community
(doesn't look like it so far ;-)). Then we can continue to execute
this. :-)

– Ufuk


On Thu, May 12, 2016 at 1:52 PM, Stephan Ewen  wrote:
> Yes, Matthias, that was supposed to be you.
> Sorry from another guy who frequently has his name misspelled ;-)
>
> On Thu, May 12, 2016 at 1:27 PM, Matthias J. Sax  wrote:
>
>> +1 from my side.
>>
>> Happy to be the maintainer for Storm-Compatibility (at least I guess
>> it's me, even though the correct spelling would be with two 't's :P)
>>
>> -Matthias
>>
>> On 05/12/2016 12:56 PM, Till Rohrmann wrote:
>> > +1 for the proposal
>> > On May 12, 2016 12:13 PM, "Stephan Ewen"  wrote:
>> >
>> >> Yes, Gabor Gevay, that did refer to you!
>> >>
>> >> Sorry for the ambiguity...
>> >>
>> >> On Thu, May 12, 2016 at 10:46 AM, Márton Balassi <
>> balassi.mar...@gmail.com
>> >>>
>> >> wrote:
>> >>
>> >>> +1 for the proposal
>> >>> @ggevay: I do think that it refers to you. :)
>> >>>
>> >>> On Thu, May 12, 2016 at 10:40 AM, Gábor Gévay 
>> wrote:
>> >>>
>>  Hello,
>> 
>>  There are at least three Gábors in the Flink community,  :) so
>>  assuming that the Gábor in the list of maintainers of the DataSet API
>>  is referring to me, I'll be happy to do it. :)
>> 
>>  Best,
>>  Gábor G.
>> 
>> 
>> 
>>  2016-05-10 11:24 GMT+02:00 Stephan Ewen :
>> > Hi everyone!
>> >
>> > We propose to establish some lightweight structures in the Flink open
>> > source community and development process,
>> > to help us better handle the increased interest in Flink (mailing
>> >> list
>>  and
>> > pull requests), while not overwhelming the
>> > committers, and giving users and contributors a good experience.
>> >
>> > This proposal is triggered by the observation that we are reaching
>> >> the
>> > limits of where the current community can support
>> > users and guide new contributors. The below proposal is based on
>> > observations and ideas from Till, Robert, and me.
>> >
>> > 
>> > Goals
>> > 
>> >
>> > We try to achieve the following
>> >
>> >   - Pull requests get handled in a timely fashion
>> >   - New contributors are better integrated into the community
>> >   - The community feels empowered on the mailing list.
>> > But questions that need the attention of someone that has deep
>> > knowledge of a certain part of Flink get their attention.
>> >   - At the same time, the committers that are knowledgeable about
>> >> many
>>  core
>> > parts do not get completely overwhelmed.
>> >   - We don't overlook threads that report critical issues.
>> >   - We always have a pretty good overview of what the status of
>> >> certain
>> > parts of the system are.
>> >   -> What are often encountered known issues
>> >   -> What are the most frequently requested features
>> >
>> >
>> > 
>> > Problems
>> > 
>> >
>> > Looking into the process, there are two big issues:
>> >
>> > (1) Up to now, we have been relying on the fact that everything just
>> > "organizes itself", driven by best effort. That assumes
>> > that everyone feels equally responsible for every part, question, and
>> > contribution. At the current state, this is impossible
>> > to maintain, it overwhelms the committers and contributors.
>> >
>> > Example: Pull requests are picked up by whoever wants to pick them up. Pull
>> > requests that are a lot of work, have little
>> > chance of getting in, or relate to less active components are sometimes not
>> > picked up. When contributors are pretty
>> > loaded already, it may happen that no one eventually feels responsible to
>> > pick up a pull request, and it falls through the cracks.
>> >
>> > (2) There is no good overview of what are known shortcomings, efforts, and
>> > requested features for different parts of the system.
>> > This information exists in various peoples' heads, but is not easily
>> > accessible for new people. The Flink JIRA is not well
>> > maintained, it is not easy to draw insights from that.
>> >
>> >
>> > ===
>> > The Proposal
>> > ===
>> >
>> > Since we are building a parallel system, the natural solution seems to be:
>> > partition the workload ;-)
>> >
>> > We propose to define a set of components for Flink. Each component is
>> > maintained or tracked by one or more
>> > people - let's call them maintainers. It is important to note that we
>> >

Re: [PROPOSAL] Structure the Flink Open Source Development

2016-05-12 Thread Maximilian Michels

+1 for the initiative. With a better process we will improve the
quality of the Flink development and have more time to focus.

Could we have another category "Infrastructure"? This would concern
things like CI, nightly deployment of snapshots/documentation, ASF
Infra communication. Robert and I could be the initial maintainers
for that.

On Thu, May 12, 2016 at 1:52 PM, Stephan Ewen  wrote:
> Yes, Matthias, that was supposed to be you.
> Sorry from another guy who frequently has his name misspelled ;-)
>
> On Thu, May 12, 2016 at 1:27 PM, Matthias J. Sax  wrote:
>
>> +1 from my side.
>>
>> Happy to be the maintainer for Storm-Compatibility (at least I guess
>> it's me, even though the correct spelling would be with two 't' :P)
>>
>> -Matthias
>>
>> On 05/12/2016 12:56 PM, Till Rohrmann wrote:
>> > +1 for the proposal
>> > On May 12, 2016 12:13 PM, "Stephan Ewen"  wrote:
>> >
>> >> Yes, Gabor Gevay, that did refer to you!
>> >>
>> >> Sorry for the ambiguity...
>> >>
>> >> On Thu, May 12, 2016 at 10:46 AM, Márton Balassi <balassi.mar...@gmail.com>
>> >> wrote:
>> >>
>> >>> +1 for the proposal
>> >>> @ggevay: I do think that it refers to you. :)
>> >>>
>> >>> On Thu, May 12, 2016 at 10:40 AM, Gábor Gévay wrote:
>> >>>
>> >>>> Hello,
>> >>>>
>> >>>> There are at least three Gábors in the Flink community,  :) so
>> >>>> assuming that the Gábor in the list of maintainers of the DataSet API
>> >>>> is referring to me, I'll be happy to do it. :)
>> >>>>
>> >>>> Best,
>> >>>> Gábor G.
>> >>>>
>> >>>> 2016-05-10 11:24 GMT+02:00 Stephan Ewen :

Dataset split/demultiplex

2016-05-12 Thread CPC

Hi folks,

Is there any way in the DataSet API to split a DataSet[A] into a DataSet[A] and
a DataSet[B]? The use case is a custom filter component that we want to
implement. We want to direct input elements for which the predicate returns
false to a second output. Actually, we also want to direct input elements
that throw an exception to another output (a demultiplexer-like
component).

Thank you in advance...

[jira] [Created] (FLINK-3899) Document window processing with Reduce/FoldFunction + WindowFunction

2016-05-12 Thread Fabian Hueske (JIRA)

Fabian Hueske created FLINK-3899:


 Summary: Document window processing with Reduce/FoldFunction + WindowFunction
 Key: FLINK-3899
 URL: https://issues.apache.org/jira/browse/FLINK-3899
 Project: Flink
  Issue Type: Improvement
  Components: Documentation, Streaming
Affects Versions: 1.1.0
Reporter: Fabian Hueske


The streaming documentation does not describe how windows can be processed with 
a FoldFunction or ReduceFunction and a subsequent WindowFunction. This 
combination allows for eager window aggregation (only a single element is kept 
in the window) and access to the Window object, e.g., to read the 
window's start and end time.
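
The idea can be illustrated without Flink. The following is only a conceptual sketch in plain Scala, not the Flink API: the names Window, aggregate and emit are hypothetical, and tumbling windows over (timestamp, value) pairs are an assumption made for the illustration. The reduce step keeps a single aggregate per window, and the final step sees that aggregate together with the window's start and end.

```scala
object EagerWindowSketch {
  // Minimal stand-in for a time window; this is not Flink's TimeWindow class.
  final case class Window(start: Long, end: Long)

  // Assign each (timestamp, value) pair to a tumbling window and reduce eagerly:
  // the map holds one aggregate per window instead of all of its elements.
  def aggregate(events: Seq[(Long, Int)], size: Long)(reduce: (Int, Int) => Int): Map[Window, Int] =
    events.foldLeft(Map.empty[Window, Int]) { case (acc, (ts, v)) =>
      val start = ts - (ts % size)
      val w = Window(start, start + size)
      acc.updated(w, acc.get(w).map(reduce(_, v)).getOrElse(v))
    }

  // The "window function" step: it receives the window metadata plus the single reduced value.
  def emit(result: Map[Window, Int]): Seq[String] =
    result.toSeq.sortBy(_._1.start).map { case (w, sum) => s"[${w.start}, ${w.end}) -> $sum" }

  def main(args: Array[String]): Unit = {
    val events = Seq((1L, 10), (4L, 5), (12L, 7))
    emit(aggregate(events, 10L)(_ + _)).foreach(println)
  }
}
```

The point of the split is that memory per window stays constant, while the last step can still attach window metadata to the result.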



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Re: Dataset split/demultiplex

2016-05-12 Thread Gábor Gévay

Hello,

You can split a DataSet into two DataSets with two filters:

val xs: DataSet[A] = ...
val split1: DataSet[A] = xs.filter(f1)
val split2: DataSet[A] = xs.filter(f2)

where f1 and f2 are true for those elements that should go into the
first and second DataSets respectively. So far, the splits will just
contain elements from the input DataSet, but you can of course apply
some map after one of the filters.
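
As a slightly fuller sketch of the same pattern: the snippet below uses a plain Scala Seq as a stand-in for DataSet (both offer filter and map with the same shape), and the Event type and isOk predicate are made up for illustration.

```scala
// Sketch only: Seq stands in for Flink's DataSet, so this runs without a cluster.
object TwoFilterSplit {
  // Hypothetical element type and predicate, chosen for illustration.
  final case class Event(url: String, status: Int)

  def isOk(e: Event): Boolean = e.status < 400

  // Returns (elements passing the predicate, rejected elements mapped to another type).
  def split(xs: Seq[Event]): (Seq[Event], Seq[String]) = {
    val accepted: Seq[Event] = xs.filter(isOk)
    // Second pass over the same input with the negated predicate, followed by a map.
    val rejected: Seq[String] = xs.filter(e => !isOk(e)).map(e => s"${e.url} -> ${e.status}")
    (accepted, rejected)
  }

  def main(args: Array[String]): Unit = {
    val (ok, bad) = split(Seq(Event("/a", 200), Event("/b", 404), Event("/c", 500)))
    println(ok)
    println(bad)
  }
}
```

Note that the predicate is evaluated twice per element here, once in each filter.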

Does this help?

Best,
Gábor



2016-05-12 16:03 GMT+02:00 CPC :
> Hi folks,
>
> Is there any way in the DataSet API to split a DataSet[A] into a DataSet[A] and
> a DataSet[B]? The use case is a custom filter component that we want to
> implement. We want to direct input elements for which the predicate returns
> false to a second output. Actually, we also want to direct input elements
> that throw an exception to another output (a demultiplexer-like
> component).
>
> Thank you in advance...

Re: Dataset split/demultiplex

2016-05-12 Thread CPC

Hi Gabor,

Yes, functionally this helps. But in this case I am processing each element
twice and sending the whole data set to two different operators. What I am
trying to achieve is something like the DataStream split functionality, or a
little bit more.
In a filter-like scenario I want to do the pseudo operation below:

def function(iter: Iterator[URLOutputData],
             trueEvents: Collector[URLOutputData],
             falseEvents: Collector[URLOutputData],
             errEvents: Collector[URLOutputData]): Unit = {
  iter.foreach { i =>
    try {
      // route each element by the predicate result
      if (predicate(i)) trueEvents.collect(i)
      else falseEvents.collect(i)
    } catch {
      // elements for which the predicate throws go to the error output
      case _: Throwable => errEvents.collect(i)
    }
  }
}

Another case could be: suppose I have an input set of web events that come from
different web apps and I want to split the dataset based on application category.

Thanks,


On 12 May 2016 at 17:28, Gábor Gévay  wrote:

> Hello,
>
> You can split a DataSet into two DataSets with two filters:
>
> val xs: DataSet[A] = ...
> val split1: DataSet[A] = xs.filter(f1)
> val split2: DataSet[A] = xs.filter(f2)
>
> where f1 and f2 are true for those elements that should go into the
> first and second DataSets respectively. So far, the splits will just
> contain elements from the input DataSet, but you can of course apply
> some map after one of the filters.
>
> Does this help?
>
> Best,
> Gábor

Re: [PROPOSAL] Structure the Flink Open Source Development

2016-05-12 Thread Robert Metzger

tl;dr: +1

I also like the proposal a lot. Our community is growing at quite a fast
pace and we need to have some structure in place to keep track of
everything going on.

I'm happy to see that the proposal mentions cleaning up our JIRA. This is
something that has been annoying me for quite a while, but it's too big to
do alone. If maintainers could take care of their components, we would
already have covered a lot there.

One question regarding the "chair" or "lead" role for components: Is the
first name in the list of maintainers the lead?

I would actually suggest waiting until all proposed maintainers have agreed to
the proposal. It doesn't make sense to make somebody a maintainer of
something if they disagree or are not aware of it.




On Thu, May 12, 2016 at 2:13 PM, Maximilian Michels  wrote:

> +1 for the initiative. With a better process we will improve the
> quality of the Flink development and have more time to focus.
>
> Could we have another category "Infrastructure"? This would concern
> things like CI, nightly deployment of snapshots/documentation, ASF
> Infra communication. Robert and I could be the initial maintainers
> for that.

Re: Dataset split/demultiplex

2016-05-12 Thread Aljoscha Krettek

Hi,
I agree that this would be very nice. Unfortunately, Flink only allows
one output from an operation right now. Maybe we can extend this somehow
in the future.
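
Within the single-output restriction, one possible workaround is to evaluate the predicate once, attach a tag to each element, and then select each logical output with a cheap filter on the tag. The following is only a sketch of that idea under stated assumptions: a plain Scala Seq stands in for DataSet, and the names Tag, tag and select are hypothetical, not part of Flink.

```scala
import scala.util.{Failure, Success, Try}

object TaggedDemux {
  // Tags describing which logical output an element belongs to.
  sealed trait Tag
  case object Matched extends Tag
  case object Unmatched extends Tag
  case object Errored extends Tag

  // Evaluate the predicate exactly once per element and attach the resulting tag.
  // Try captures non-fatal exceptions thrown by the predicate.
  def tag[A](predicate: A => Boolean)(a: A): (Tag, A) =
    Try(predicate(a)) match {
      case Success(true)  => (Matched, a)
      case Success(false) => (Unmatched, a)
      case Failure(_)     => (Errored, a)
    }

  // Select one logical output; on a DataSet this would be a filter on the tag.
  def select[A](tagged: Seq[(Tag, A)], wanted: Tag): Seq[A] =
    tagged.collect { case (t, a) if t == wanted => a }

  def main(args: Array[String]): Unit = {
    // Hypothetical predicate: throws on empty strings, otherwise checks the scheme.
    val isHttps: String => Boolean = s => {
      require(s.nonEmpty, "empty URL")
      s.startsWith("https://")
    }
    val tagged = Seq("https://a", "http://b", "").map(s => tag(isHttps)(s))
    println(select(tagged, Matched))
    println(select(tagged, Unmatched))
    println(select(tagged, Errored))
  }
}
```

The tagged data still flows to every downstream filter, so this avoids re-evaluating a costly predicate but not the extra passes over the data.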

Cheers,
Aljoscha

On Thu, 12 May 2016 at 17:27 CPC  wrote:

> Hi Gabor,
>
> Yes, functionally this helps. But in this case I am processing each element
> twice and sending the whole data set to two different operators. What I am
> trying to achieve is something like the DataStream split functionality, or a
> little bit more.
> In a filter-like scenario I want to do the pseudo operation below:
>
> def function(iter: Iterator[URLOutputData],
>              trueEvents: Collector[URLOutputData],
>              falseEvents: Collector[URLOutputData],
>              errEvents: Collector[URLOutputData]): Unit = {
>   iter.foreach { i =>
>     try {
>       // route each element by the predicate result
>       if (predicate(i)) trueEvents.collect(i)
>       else falseEvents.collect(i)
>     } catch {
>       // elements for which the predicate throws go to the error output
>       case _: Throwable => errEvents.collect(i)
>     }
>   }
> }
>
> Another case could be: suppose I have an input set of web events that come from
> different web apps and I want to split the dataset based on application
> category.
>
> Thanks,

Re: [PROPOSAL] Structure the Flink Open Source Development

2016-05-12 Thread Stephan Ewen

All maintainer candidates are only proposals so far. No indication of lead
or anything so far.

Let's first see if we agree on the structure proposed here, and if we take
the components as suggested here or if we refine the list.
On 12.05.2016 17:45, "Robert Metzger" wrote:

> tl;dr: +1
>
> I also like the proposal a lot. Our community is growing at quite a fast
> pace and we need to have some structure in place to keep track of
> everything going on.
>
> I'm happy to see that the proposal mentions cleaning up our JIRA. This is
> something that has been annoying me for quite a while, but it's too big to
> do alone. If maintainers could take care of their components, we would
> already have covered a lot there.
>
> One question regarding the "chair" or "lead" role for components: Is the
> first name in the list of maintainers the lead?
>
> I would actually suggest waiting until all proposed maintainers have agreed to
> the proposal. It doesn't make sense to make somebody a maintainer of
> something if they disagree or are not aware of it.

[jira] [Created] (FLINK-3900) Set nullCheck=true as default in TableConfig

2016-05-12 Thread Flavio Pompermaier (JIRA)

Flavio Pompermaier created FLINK-3900:
-

 Summary: Set nullCheck=true as default in TableConfig
 Key: FLINK-3900
 URL: https://issues.apache.org/jira/browse/FLINK-3900
 Project: Flink
  Issue Type: Improvement
  Components: Table API
Affects Versions: 1.0.2
Reporter: Flavio Pompermaier
Priority: Minor


As discussed with Fabian, TableConfig should use nullCheck=true as default to 
allow for null values in the data




Re: [PROPOSAL] Structure the Flink Open Source Development

2016-05-12 Thread Aljoscha Krettek

+1

The ideas seem good and the proposed number of components seems reasonable.
With this, we should then also clean up the JIRA to make it actually usable.

On Thu, 12 May 2016 at 18:09 Stephan Ewen  wrote:

> All maintainer candidates are only proposals so far. No indication of lead
> or anything so far.
>
> Let's first see if we agree on the structure proposed here, and if we take
> the components as suggested here or if we refine the list.
> On 12.05.2016 17:45, "Robert Metzger" wrote:

[jira] [Created] (FLINK-3901) Create a RowCsvInputFormat to use as default CSV IF in Table API

2016-05-12 Thread Flavio Pompermaier (JIRA)

Flavio Pompermaier created FLINK-3901:
-

 Summary: Create a RowCsvInputFormat to use as default CSV IF in Table API
 Key: FLINK-3901
 URL: https://issues.apache.org/jira/browse/FLINK-3901
 Project: Flink
  Issue Type: Improvement
Affects Versions: 1.0.2
Reporter: Flavio Pompermaier
Priority: Minor


At the moment the Table API reads CSVs using the TupleCsvInputFormat, which is 
limited to 25 fields and limited in its null handling.
A new input format (IF) producing Row objects is needed to avoid those limitations.




[jira] [Created] (FLINK-3902) Discarded FileSystem checkpoints are lingering around

2016-05-12 Thread Ufuk Celebi (JIRA)

Ufuk Celebi created FLINK-3902:
--

 Summary: Discarded FileSystem checkpoints are lingering around
 Key: FLINK-3902
 URL: https://issues.apache.org/jira/browse/FLINK-3902
 Project: Flink
  Issue Type: Bug
  Components: Distributed Runtime
Affects Versions: 1.0.2
Reporter: Ufuk Celebi


A user reported that checkpoints with {{FSStateBackend}} are not properly 
cleaned up.

{code}
2016-05-10 12:21:06,559 INFO BlockStateChange: BLOCK* addToInvalidates: blk_1084791727_11053122 10.10.113.10:50010
2016-05-10 12:21:06,559 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.delete from 10.10.113.9:49233 Call#12337 Retry#0
org.apache.hadoop.fs.PathIsNotEmptyDirectoryException: `/flink/checkpoints_test/570d6e67d571c109daab468e5678402b/chk-62 is non empty': Directory is not empty
        at org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.delete(FSDirDeleteOp.java:85)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3712)
{code}

{code}
2016-05-10 12:20:22,636 [Checkpoint Timer] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering checkpoint 62 @ 1462875622636
2016-05-10 12:20:32,507 [flink-akka.actor.default-dispatcher-240088] INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Completed checkpoint 62 (in 9843 ms)
2016-05-10 12:20:52,637 [Checkpoint Timer] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering checkpoint 63 @ 1462875652637
2016-05-10 12:21:06,563 [flink-akka.actor.default-dispatcher-240028] INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Completed checkpoint 63 (in 13909 ms)
2016-05-10 12:21:22,636 [Checkpoint Timer] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering checkpoint 64 @ 1462875682636
{code}

Running the same program with the {{RocksDBBackend}} works as expected and 
clears the old checkpoints properly.




Re: [PROPOSAL] Structure the Flink Open Source Development

2016-05-12 Thread Nick Dimiduk

For what it's worth, this is very close to how HBase attempts to manage the
community load. We break out components (in Jira), with a list of named
component maintainers. Actually, having components alone has given us a big
bang for the buck because, when issues are properly labeled, it is really easy
for part-timers to channel their efforts with precision.

As a Flink user, I'm +1 for this proposal as well :)

On Thursday, May 12, 2016, Aljoscha Krettek  wrote:

> +1
>
> The ideas seem good and the proposed number of components seems reasonable.
> With this, we should then also clean up the JIRA to make it actually usable.
>
> On Thu, 12 May 2016 at 18:09 Stephan Ewen >
> wrote:
>
> > All maintainer candidates are only proposals so far. No indication of
> lead
> > or anything so far.
> >
> > Let's first see if we agree on the structure proposed here, and if we
> take
> > the components as suggested here or if we refine the list.
> > Am 12.05.2016 17:45 schrieb "Robert Metzger"  >:
> >
> > > tl;dr: +1
> > >
> > > I also like the proposal a lot. Our community is growing at a quite
> fast
> > > pace and we need to have some structure in place to still keep track of
> > > everything going on.
> > >
> > > I'm happy to see that the proposal mentions cleaning up our JIRA. This
> is
> > > something that has been annoying me for quite a while, but its too big
> to
> > > do it alone. If maintainers could take care of their components, we
> > should
> > > have covered already a lot there.
> > >
> > > One question regarding the "chair" or "lead" role for components: Is
> the
> > > first name in the list of maintainers the lead?
> > >
> > > I would actually suggest to wait until all proposed maintainers agreed
> to
> > > the proposal. It doesn't make sense to make somebody a maintainer of
> > > something if they disagree or are not aware of it.
> > >
> > >
> > >
> > >
> > > On Thu, May 12, 2016 at 2:13 PM, Maximilian Michels  >
> > > wrote:
> > >
> > > > +1 for the initiative. With a better process we will improve the
> > > > quality of the Flink development and give us more time to focus.
> > > >
> > > > Could we have another category "Infrastructure"? This would concern
> > > > things like CI, nightly deployment of snapshots/documentation, ASF
> > > > Infra communication. Robert and I could be the initial maintainers
> > > > for that.
> > > >
> > > > On Thu, May 12, 2016 at 1:52 PM, Stephan Ewen  >
> > wrote:
> > > > > Yes, Matthias, that was supposed to be you.
> > > > > Sorry from another guy who frequently has his name misspelled ;-)
> > > > >
> > > > > On Thu, May 12, 2016 at 1:27 PM, Matthias J. Sax  >
> > > > wrote:
> > > > >
> > > > >> +1 from my side.
> > > > >>
> > > > >> Happy to be the maintainer for Storm-Compatibility (at least I guess
> > > > >> it's me, even though the correct spelling would be with two 't' :P)
> > > > >>
> > > > >> -Matthias
> > > > >>
> > > > >> On 05/12/2016 12:56 PM, Till Rohrmann wrote:
> > > > >> > +1 for the proposal
> > > > >> > On May 12, 2016 12:13 PM, "Stephan Ewen"  >
> > wrote:
> > > > >> >
> > > > >> >> Yes, Gabor Gevay, that did refer to you!
> > > > >> >>
> > > > >> >> Sorry for the ambiguity...
> > > > >> >>
> > > > >> >> On Thu, May 12, 2016 at 10:46 AM, Márton Balassi <
> > > > >> balassi.mar...@gmail.com 
> > > > >> >>>
> > > > >> >> wrote:
> > > > >> >>
> > > > >> >>> +1 for the proposal
> > > > >> >>> @ggevay: I do think that it refers to you. :)
> > > > >> >>>
> > > > >> >>> On Thu, May 12, 2016 at 10:40 AM, Gábor Gévay <
> gga...@gmail.com 
> > >
> > > > >> wrote:
> > > > >> >>>
> > > > >>  Hello,
> > > > >> 
> > > > >>  There are at least three Gábors in the Flink community, :) so
> > > > >>  assuming that the Gábor in the list of maintainers of the DataSet API
> > > > >>  is referring to me, I'll be happy to do it. :)
> > > > >> 
> > > > >>  Best,
> > > > >>  Gábor G.
> > > > >> 
> > > > >> 
> > > > >> 
> > > > >>  2016-05-10 11:24 GMT+02:00 Stephan Ewen  >:
> > > > >> > Hi everyone!
> > > > >> >
> > > > >> > We propose to establish some lightweight structures in the Flink open
> > > > >> > source community and development process,
> > > > >> > to help us better handle the increased interest in Flink (mailing list and
> > > > >> > pull requests), while not overwhelming the
> > > > >> > committers, and giving users and contributors a good experience.
> > > > >> >
> > > > >> > This proposal is triggered by the observation that we are reaching the
> > > > >> > limits of where the current community can support
> > > > >> > users and guide new contributors. The below proposal is based on
> > > > >> > observations and ideas from Till, Robert, and me.
> > > > >> >
> > > > >> > 
> > > > >> > Goals
> > > > >> > 
> > > > >> >
> > > > >> > We try to achieve t

Re: Dataset split/demultiplex

2016-05-12 Thread CPC

Hi,

If it only requires implementing a custom operator (that is, no changes to
the network stack or other engine-level changes), I can try to implement it,
since I have been working on the optimizer and plan generation for a month.
We are also going to build our ETL framework on Flink, and this kind of
scenario is a good fit and a common requirement in ETL-like flows. If you can
point me to the parts of the project I should look at, I can give it a try.

Thanks
On May 12, 2016 6:54 PM, "Aljoscha Krettek"  wrote:

> Hi,
> I agree that this would be very nice. Unfortunately, Flink only allows
> one output from an operation right now. Maybe we can extend this somehow
> in the future.
>
> Cheers,
> Aljoscha
>
> On Thu, 12 May 2016 at 17:27 CPC  wrote:
>
> > Hi Gabor,
> >
> > Yes, functionally this helps. But in this case I am processing each element
> > twice and sending the whole data set to two different operators. What I am
> > trying to achieve is functionality like DataStream split, or a little bit
> > more. In a filter-like scenario I want to do the pseudo-operation below:
> >
> > def function(iter: Iterator[URLOutputData],
> >              trueEvents: Collector[URLOutputData],
> >              falseEvents: Collector[URLOutputData],
> >              errEvents: Collector[URLOutputData]): Unit = {
> >   iter.foreach { i =>
> >     try {
> >       if (predicate(i)) trueEvents.collect(i)
> >       else falseEvents.collect(i)
> >     } catch {
> >       // any failure while evaluating the predicate goes to the error output
> >       case _: Exception => errEvents.collect(i)
> >     }
> >   }
> > }
> >
> > Another case: suppose I have an input set of web events that come from
> > different web apps, and I want to split the dataset based on application
> > category.
> >
> > Thanks,
> >
> >
> > On 12 May 2016 at 17:28, Gábor Gévay  wrote:
> >
> > > Hello,
> > >
> > > You can split a DataSet into two DataSets with two filters:
> > >
> > > val xs: DataSet[A] = ...
> > > val split1: DataSet[A] = xs.filter(f1)
> > > val split2: DataSet[A] = xs.filter(f2)
> > >
> > > where f1 and f2 are true for those elements that should go into the
> > > first and second DataSets respectively. So far, the splits will just
> > > contain elements from the input DataSet, but you can of course apply
> > > some map after one of the filters.
> > >
> > > Does this help?
> > >
> > > Best,
> > > Gábor
> > >
> > >
> > >
> > > 2016-05-12 16:03 GMT+02:00 CPC :
> > > > Hi folks,
> > > >
> > > > Is there any way in the DataSet API to split a DataSet[A] into a
> > > > DataSet[A] and a DataSet[B]? The use case is a custom filter component
> > > > that we want to implement. We want to direct input elements for which
> > > > the predicate returns false to a separate output. We also want to direct
> > > > input elements that throw an exception to another output (a
> > > > demultiplexer-like component).
> > > >
> > > > Thank you in advance...
> > >
> >
>
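
A minimal sketch of the demultiplexing idea discussed in this thread, using only the Scala standard library. The predicate is evaluated once per element, each element is tagged with the outcome, and the three outputs are then selected by tag. The names Tagged, Matched, Unmatched, Errored and Demux are hypothetical and not part of Flink, and a plain Seq stands in for a DataSet. With the DataSet API the same shape would presumably be one map that tags followed by one filter per output; that avoids re-evaluating the predicate but is still not a true multi-output operator.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical tag type for illustration only; not a Flink class.
sealed trait Tagged[+A]
final case class Matched[A](value: A) extends Tagged[A]
final case class Unmatched[A](value: A) extends Tagged[A]
final case class Errored[A](value: A) extends Tagged[A]

object Demux {
  // Evaluate the predicate exactly once and record what happened.
  def tag[A](predicate: A => Boolean)(a: A): Tagged[A] =
    Try(predicate(a)) match {
      case Success(ok) => if (ok) Matched(a) else Unmatched(a)
      case Failure(_)  => Errored(a)
    }

  // Returns (true events, false events, error events), each in input order.
  def split[A](xs: Seq[A])(predicate: A => Boolean): (Seq[A], Seq[A], Seq[A]) = {
    val tagged = xs.map(x => tag(predicate)(x))
    (
      tagged.collect { case Matched(v) => v },
      tagged.collect { case Unmatched(v) => v },
      tagged.collect { case Errored(v) => v }
    )
  }
}
```

For example, Demux.split(Seq("1", "2", "x", "4"))(s => s.toInt % 2 == 0) sends "x" to the error output because toInt throws on it.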

[jira] [Created] (FLINK-3903) Homebrew Installation

2016-05-12 Thread Eron Wright (JIRA)

Eron Wright  created FLINK-3903:
---

 Summary: Homebrew Installation
 Key: FLINK-3903
 URL: https://issues.apache.org/jira/browse/FLINK-3903
 Project: Flink
  Issue Type: Task
  Components: Documentation, release
Reporter: Eron Wright 
Assignee: Ufuk Celebi
Priority: Minor


Recently I submitted a formula for apache-flink to the 
[homebrew|http://brew.sh/] project.   Homebrew simplifies installation on Mac:

{code}
$ brew install apache-flink
...
$ flink --version
Version: 1.0.2, Commit ID: d39af15
{code}

Updates to the formula are ad hoc at the moment.  I opened this issue to
make updating the Homebrew formula a formal part of the release process.
I drafted a procedure doc here:
[https://gist.github.com/EronWright/b62bd3b192a15be4c200a2542f7c9376]
 
Please also consider updating the website documentation to suggest homebrew as 
an alternate installation method for Mac users.






Re: [RESULT] [VOTE] Release Apache Flink 1.0.3 (RC3)

2016-05-12 Thread Wright, Eron

FYI the brew formula has been updated to 1.0.3.

$ brew info apache-flink
apache-flink: stable 1.0.3, HEAD
Scalable batch and stream data processing
https://flink.apache.org/
Not installed
From: 
https://github.com/Homebrew/homebrew-core/blob/master/Formula/apache-flink.rb


> On May 12, 2016, at 12:58 AM, Till Rohrmann  wrote:
> 
> Thanks Ufuk :-)
> 
> On Wed, May 11, 2016 at 5:16 PM, Stephan Ewen  wrote:
> 
>> Thanks for pushing this release Ufuk!
>> 
>> On Wed, May 11, 2016 at 5:12 PM, Fabian Hueske  wrote:
>> 
>>> Thanks Ufuk!
>>> 
>>> 2016-05-11 16:39 GMT+02:00 Ufuk Celebi :
>>> 
 This vote has passed with 3 binding +1 votes. Thanks to everyone who
 contributed and tested the release candidate.
 
 +1s:
 Gyula Fora (binding)
 Fabian Hueske (binding)
 Ufuk Celebi (binding)
 
 There are no 0s or -1s.
 
 I'll go ahead and finalize and package this release.
 
 On Mon, May 9, 2016 at 10:24 AM, Ufuk Celebi  wrote:
> Dear Flink community,
> 
> Please vote on releasing the following candidate as Apache Flink version
> 1.0.3.
> 
> The commit to be voted on:
> f3a6b5f1e8d85d10e1449e2f96291408b781
> 
> Branch:
> release-1.0.3-rc3 (see
> https://git1-us-west.apache.org/repos/asf/flink/?p=flink.git;a=shortlog;h=refs/heads/release-1.0.3-rc3
> )
> 
> The release artifacts to be voted on can be found at:
> http://home.apache.org/~uce/flink-1.0.3-rc3/
> 
> The release artifacts are signed with the key with fingerprint 9D403309:
> http://www.apache.org/dist/flink/KEYS
> 
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapacheflink-1096
> 
> -
> 
> The vote is open for the next 48 hours and passes if a majority of at
> least three +1 PMC votes are cast.
> 
> The vote ends on Wednesday May 11, 2016.
> 
> [ ] +1 Release this package as Apache Flink 1.0.3
> [ ] -1 Do not release this package because ...
> 
> ===
> 
> The following commits have been added since the 1.0.2 release (excluding
> docs):
> 
> * 4d3dcb1 - [FLINK-3860] [connector-wikiedits] Add retry loop to
> WikipediaEditsSourceTest (5 days ago) 
> * f1d34b1 - [FLINK-3790] [streaming] Use proper hadoop config in
> rolling sink (12 hours ago) 
> * 4a34f6f - [FLINK-3835] [optimizer] Add input id to JSON plan to
> resolve ambiguous input names. (2 days ago) 
> * d8feb15 - [hotfix] OptionSerializer.duplicate to respect stateful
> element serializer (3 days ago) 
> * 7062b0a - [FLINK-3803] [runtime] Pass CheckpointStatsTracker to
> ExecutionGraph (3 days ago) 
> * f80f6d6 - [FLINK-3678] [dist, docs] Make Flink logs directory
> configurable (4 days ago) 
> * 344a55e - [hotfix] [cep] Make cep window border treatment consistent
> (9 days ago) 
 
>>> 
>>

flink Kafka connector

2016-05-12 Thread Arun Balan

Hi, I am trying to use the flink-kafka-connector, and I notice that every time I
restart my application it re-reads the last message on the Kafka topic. So if
the latest offset on the topic is 10, then when the application is restarted,
it re-reads message 10. Why is this the behavior? I would assume that
the last message has already been read and its offset committed. I require that
messages already processed from the topic not be reprocessed. Any
insight would be helpful.

Thanks
Arun Balan
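
One possible reading of the behaviour described above, offered as an assumption and not confirmed anywhere in this thread: Kafka's convention is that a committed offset names the next record to read, so a consumer that commits the offset of the last processed record would re-read exactly one record after a restart. The sketch below models only that arithmetic. OffsetModel and its methods are hypothetical names, and no Kafka or Flink API is used.

```scala
// Hypothetical model of committed offsets; no Kafka client is involved.
object OffsetModel {
  // Offsets that would be read after a restart, given the committed value
  // and the highest offset currently present in the partition.
  def replayedAfterRestart(committed: Long, latest: Long): Seq[Long] =
    (committed to latest).toList

  // Convention A: commit the offset of the last processed record.
  def commitLastProcessed(lastProcessed: Long): Long = lastProcessed

  // Convention B: commit the offset of the next record to read.
  def commitNextToRead(lastProcessed: Long): Long = lastProcessed + 1
}
```

Under convention A with last processed offset 10 and latest offset 10, the model replays offset 10 once; under convention B it replays nothing. Whether this is what the connector actually does in the reported setup would need to be checked against its source.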

Re: Intellij code style

2016-05-12 Thread Chiwan Park

Please create a JIRA issue for this and send the PR with the JIRA issue number.

Regards,
Chiwan Park

> On May 12, 2016, at 7:15 PM, Flavio Pompermaier  wrote:
> 
> Do I also need to open a JIRA issue, or just the PR?
> 
> On Thu, May 12, 2016 at 12:03 PM, Stephan Ewen  wrote:
> 
>> Yes, please open a pull request for that.
>> 
>> On Thu, May 12, 2016 at 11:40 AM, Flavio Pompermaier >> 
>> wrote:
>> 
>>> If you're interested, I created an Eclipse version that should follow
>>> Flink's coding rules. Should I create a new JIRA for it?
>>> 
>>> On Thu, May 5, 2016 at 6:02 PM, Dawid Wysakowicz <
>>> wysakowicz.da...@gmail.com
 wrote:
>>> 
 I opened a JIRA issue, https://issues.apache.org/jira/browse/FLINK-3870, and
 created PRs for both flink and flink-web.
 
 https://github.com/apache/flink/pull/1963
 https://github.com/apache/flink-web/pull/20
 
 I would be thankful for a review.
 
 2016-05-04 11:00 GMT+02:00 Fabian Hueske :
 
> Yes, please open a JIRA. Thanks!
> 
> 2016-05-04 10:16 GMT+02:00 Dawid Wysakowicz <
>>> wysakowicz.da...@gmail.com
> :
> 
>> Sure, I will open a PR shortly. Shall I create a JIRA issue?
>> 
>> 2016-05-04 9:28 GMT+02:00 Fabian Hueske :
>> 
>>> +1 for adding a template to the tools folder and linking it from the coding
>>> guidelines!
>>> 
>>> 2016-05-04 6:08 GMT+02:00 Henry Saputra >> :
>>> 
 We could actually put this in the tools directory of the source
>>> and
>> repo
 and refer it from contribution guide.
 
 @Dawid want to try to send Pull request for it?
 
 On Thursday, April 28, 2016, Theodore Vasiloudis <
 theodoros.vasilou...@gmail.com> wrote:
 
> Do we plan to include something like this in the contribution guide as
> well?
> 
> On Thu, Apr 28, 2016 at 3:16 PM, Stefano Baghino <
> stefano.bagh...@radicalbit.io > wrote:
> 
>> Awesome Dawid! Thanks for taking the time to do this. :)
>> 
>> On Thu, Apr 28, 2016 at 1:45 PM, Dawid Wysakowicz <
>> wysakowicz.da...@gmail.com > wrote:
>> 
>>> Hi,
>>> 
>>> I tried to create a code style that would follow the Flink code style. It may
>>> not be "production" ready, but I think it can be a good start.
>>> Hope it will be useful for someone. Also, I will be glad for any comments
>>> on that.
>>> 
>>> 2016-04-10 13:59 GMT+02:00 Stephan Ewen <
>> se...@apache.org
> >:
>>> 
 I don't know how close Phoenix' code style is to Flink's de-facto code
 style.
 I would create one that reflects Flink's de-facto code style, so that the
 formatter does not change everything...
 
 On Sun, Apr 10, 2016 at 4:40 AM, Naveen Madhire <
> vmadh...@umail.iu.edu >
 wrote:
 
> Apache Phoenix has one code template which contributors use. Do you think
> one can use the same for Flink, maybe with some more modifications?
> 
> 
> 
> https://github.com/apache/phoenix/blob/master/dev/PhoenixCodeTemplate.xml
> 
> On Sat, Apr 9, 2016 at 11:00 AM, Stephan Ewen <
>> se...@apache.org
> >
>> wrote:
> 
>> Actually, it would be amazing to create a code style profile for download,
>> so that all contributors would use that.
>> 
>> Same thing actually for IntelliJ inspections: a set of inspections we want
>> to have active and where we strive for zero warnings.
>> 
>> On Sat, Apr 9, 2016 at 10:00 AM, Robert Metzger <
>> rmetz...@apache.org >
>> wrote:
>> 
>>> Hi Dawid,
>>> 
>>> we don't have an automated formatter for IntelliJ. However, you can use the
>>> "Checkstyle" plugin of IntelliJ to mark checkstyle violations in the
>>> IDE.
>>> 
>>> On Fri, Apr 8, 2016 at 12:30 PM, Dawid Wysakowicz
>> <
>>> wysakowicz.da...@gmail.com > wrote:
>>> 
 Hi all,
 
 I am currently working on some issues and have been wondering if you have
 settings for IntelliJ code style that would follow your coding guidelines
 av

Re: [PROPOSAL] Structure the Flink Open Source Development

2016-05-12 Thread Chiwan Park

Thanks for the great suggestion.

+1 for this proposal.

Regards,
Chiwan Park

> On May 13, 2016, at 1:44 AM, Nick Dimiduk  wrote:
> 
> For what it's worth, this is very close to how HBase attempts to manage the
> community load. We break out components (in Jira), with a list of named
> component maintainers. Actually, having components alone has given a Big
> Bang for the buck because when properly labeled, it makes it really easy
> for part-timers to channel their efforts with precision.
> 
> As a Flink user, I'm +1 for this proposal as well :)
> 
> On Thursday, May 12, 2016, Aljoscha Krettek  wrote:
> 
>> +1
>> 
>> The ideas seem good and the proposed number of components seems reasonable.
>> With this, we should also then clean up the JIRA to make it actually usable.
>> 
>> On Thu, 12 May 2016 at 18:09 Stephan Ewen >
>> wrote:
>> 
>>> All maintainer candidates are only proposals so far. No indication of lead
>>> or anything so far.
>>> 
>>> Let's first see if we agree on the structure proposed here, and if we take
>>> the components as suggested here or if we refine the list.
>>> On 12.05.2016 at 17:45, "Robert Metzger" wrote:
>>> 
 tl;dr: +1
 
 I also like the proposal a lot. Our community is growing at quite a fast
 pace and we need to have some structure in place to still keep track of
 everything going on.
 
 I'm happy to see that the proposal mentions cleaning up our JIRA. This is
 something that has been annoying me for quite a while, but it's too big to
 do alone. If maintainers could take care of their components, we would
 already have covered a lot there.
 
 One question regarding the "chair" or "lead" role for components: Is the
 first name in the list of maintainers the lead?
 
 I would actually suggest waiting until all proposed maintainers have agreed
 to the proposal. It doesn't make sense to make somebody a maintainer of
 something if they disagree or are not aware of it.
 
 
 
 
 On Thu, May 12, 2016 at 2:13 PM, Maximilian Michels > >
 wrote:
 
> +1 for the initiative. With a better process we will improve the
> quality of the Flink development and give us more time to focus.
> 
> Could we have another category "Infrastructure"? This would concern
> things like CI, nightly deployment of snapshots/documentation, ASF
> Infra communication. Robert and I could be the initial maintainers
> for that.
> 
> On Thu, May 12, 2016 at 1:52 PM, Stephan Ewen > >
>>> wrote:
>> Yes, Matthias, that was supposed to be you.
>> Sorry from another guy who frequently has his name misspelled ;-)
>> 
>> On Thu, May 12, 2016 at 1:27 PM, Matthias J. Sax > >
> wrote:
>> 
>>> +1 from my side.
>>> 
>>> Happy to be the maintainer for Storm-Compatibility (at least I guess
>>> it's me, even though the correct spelling would be with two 't' :P)
>>> 
>>> -Matthias
>>> 
>>> On 05/12/2016 12:56 PM, Till Rohrmann wrote:
 +1 for the proposal
 On May 12, 2016 12:13 PM, "Stephan Ewen" > >
>>> wrote:
 
> Yes, Gabor Gevay, that did refer to you!
> 
> Sorry for the ambiguity...
> 
> On Thu, May 12, 2016 at 10:46 AM, Márton Balassi <
>>> balassi.mar...@gmail.com 
>> 
> wrote:
> 
>> +1 for the proposal
>> @ggevay: I do think that it refers to you. :)
>> 
>> On Thu, May 12, 2016 at 10:40 AM, Gábor Gévay <
>> gga...@gmail.com 
 
>>> wrote:
>> 
>>> Hello,
>>> 
>>> There are at least three Gábors in the Flink community, :) so
>>> assuming that the Gábor in the list of maintainers of the DataSet API
>>> is referring to me, I'll be happy to do it. :)
>>> 
>>> Best,
>>> Gábor G.
>>> 
>>> 
>>> 
>>> 2016-05-10 11:24 GMT+02:00 Stephan Ewen > >:
 Hi everyone!
 
 We propose to establish some lightweight structures in the Flink open
 source community and development process,
 to help us better handle the increased interest in Flink (mailing list and
 pull requests), while not overwhelming the
 committers, and giving users and contributors a good experience.
 
 This proposal is triggered by the observation that we are reaching the
 limits of where the current community can support
 users and guide new contributors. The below proposal is based on
 observations and ideas from Till, Robert, and me.
 
 
 Goals
 
 
 We try to achieve the following
 
  - Pull requests get handled in a timel
