New blog post: Splittable DoFn

2017-08-16 Thread Eugene Kirpichov
Hi all, The blog post Powerful and modular IO connectors with Splittable DoFn in Apache Beam just went live - take a look! *One of the most important parts of the Apache Beam ecosystem is its quickly growing set of connectors that al

Re: Hello from a newbie to the data world living in the city by the bay!

2017-08-16 Thread María García Herrero
Welcome Gris, Umang, and Justin! On Wed, Aug 16, 2017 at 1:15 AM, Jean-Baptiste Onofré wrote: > Welcome ! > > Regards > JB > > On Aug 16, 2017, 08:54, at 08:54, "Ismaël Mejía" > wrote: > >Hello and welcome Griselda, Umang, Justin > > > >Apart of the links provided by Ahmet you might read Beam-r

Re: Policy for stale PRs

2017-08-16 Thread Ted Yu
bq. JIRAs should still stay open but should become unassigned The above would need admin privilege, right? Is there an automated way to do it? bq. Prevent contributors/committers from taking more than 'n' JIRAs at the same time It would be hard to determine the N above since the amount of coding /

Re: Policy for stale PRs

2017-08-16 Thread Ismaël Mejía
Thanks, Ahmet, for bringing up this subject. +1 to closing stale PRs automatically after a fixed period of inactivity. 90 days is OK, but maybe a shorter period is better. If we consider that being stale is just not having any activity, i.e., the author of the PR does not answer any message. The author

Re: Policy for stale PRs

2017-08-16 Thread Thomas Groh
JIRAs should only be closed if the issue that they track is no longer relevant (either via being fixed or being determined to not be a problem). If a JIRA isn't being meaningfully worked on, it should be unassigned (in all cases, not just if there's an associated pull request that has not been work

Re: Policy for stale PRs

2017-08-16 Thread Jean-Baptiste Onofré
IMHO the jira should stay open as it's different from the PR. Regards JB On Aug 16, 2017, 20:16, at 20:16, Ted Yu wrote: >What should be done to the JIRA associated with the PR? > Original message From: Ahmet Altay > Date: 8/16/17 12:05 PM (GMT-08:00) To: >dev@beam.apache.org S

Re: contrib package for beam?

2017-08-16 Thread Griselda Cuevas
I like the idea -- This seems like a thing I can help with to get familiar with the project. Who could help me make a list of available things? On 16 August 2017 at 12:02, Jesse Anderson wrote: > I've had this discussion before. I'd love to see one so that there's a > consistent home for things

Re: Policy for stale PRs

2017-08-16 Thread Lukasz Cwik
I think the JIRA should remain open and possibly become unassigned. On Wed, Aug 16, 2017 at 12:16 PM, Ted Yu wrote: > What should be done to the JIRA associated with the PR? > Original message From: Ahmet Altay > Date: 8/16/17 12:05 PM (GMT-08:00) To: > dev@beam.apache.org Su

Re: Policy for stale PRs

2017-08-16 Thread Ted Yu
What should be done to the JIRA associated with the PR? Original message From: Ahmet Altay Date: 8/16/17 12:05 PM (GMT-08:00) To: dev@beam.apache.org Subject: Re: Policy for stale PRs Sounds like we have consensus. Since this is a new policy, I would suggest picking the most

Re: Policy for stale PRs

2017-08-16 Thread Sourabh Bajaj
Some projects I have seen close stale PRs after 30 days, saying "Closing due to lack of activity, please feel free to re-open". On Wed, Aug 16, 2017 at 12:05 PM Ahmet Altay wrote: > Sounds like we have consensus. Since this is a new policy, I would suggest > picking the most flexible option for

Re: Policy for stale PRs

2017-08-16 Thread Ahmet Altay
Sounds like we have consensus. Since this is a new policy, I would suggest picking the most flexible option for now (90 days) and we can tighten it in the future. To answer Kenn's question, I do not know how other projects handle this. I did a basic search but could not find a good answer. What m

Re: Proposal: adding a built-in I/O source for VCF files

2017-08-16 Thread Chamikara Jayalath
Thanks for proposing this. I left some comments. My main concern is the possible complexity this might add to textio and the potential performance impact. So at this point I would prefer that this be implemented as a new filebasedsource instead of updating textio. I'm open to being convinced otherwise :). Th

Re: contrib package for beam?

2017-08-16 Thread Jesse Anderson
I've had this discussion before. I'd love to see one so that there's a consistent home for things that don't belong in the API. On Wed, Aug 16, 2017, 2:55 PM Pablo Estrada wrote: > Hi all, > What would be an appropriate medium for contributions such as utility > Pipelines or PTransforms? Perhaps

contrib package for beam?

2017-08-16 Thread Pablo Estrada
Hi all, What would be an appropriate medium for contributions such as utility Pipelines or PTransforms? Perhaps it's different for each kind of contribution (sources/sinks, PTransforms, or utility pipelines). The question comes from an active user on Stack Overflow[1], and it seems pertinent. What

Re: [VOTE] Release 2.1.0, release candidate #3

2017-08-16 Thread Robert Bradshaw
+1 binding (I've been on vacation as well.) On Wed, Aug 16, 2017 at 8:50 AM, Lukasz Cwik wrote: > Back from vacation. > > +1 binding > > BEAM-2671 has been marked for 2.2.0 release. > > > > On Wed, Aug 16, 2017 at 2:08 AM, Kobi Salant wrote: > >> Hi, >> >> Spark runner was tested with word coun

Re: Proposal: adding a built-in I/O source for VCF files

2017-08-16 Thread Eugene Kirpichov
+Chamikara Jayalath You may also find the recent discussion on WholeFileIO useful: https://lists.apache.org/thread.html/6ea193b7178f8ab44de5562bfdd94dc3fe740bc440e8a05e533e40cf@%3Cdev.beam.apache.org%3E https://github.com/apache/beam/pull/3543 (I think the bulk of the discussion happened there) https://git

Re: Proposal: adding a built-in I/O source for VCF files

2017-08-16 Thread Jean-Baptiste Onofré
I will, thanks! Regards JB On Aug 16, 2017, 18:53, at 18:53, Asha Rostamianfar wrote: >Hi everyone, > >I have a proposal to add a new built-in I/O source for VCF files: >https://docs.google.com/document/d/1jsdxOPALYYlhnww2NLURS8NKXaFyRSJrcGbEDpY9Lkw/edit > >I'm planning to take on the implement

Proposal: adding a built-in I/O source for VCF files

2017-08-16 Thread Asha Rostamianfar
Hi everyone, I have a proposal to add a new built-in I/O source for VCF files: https://docs.google.com/document/d/1jsdxOPALYYlhnww2NLURS8NKXaFyRSJrcGbEDpY9Lkw/edit I'm planning to take on the implementation work myself, but wanted to get preliminary feedback about the proposed design as it requir

Re: Proposal : An extension for sketch-based statistics

2017-08-16 Thread Arnaud Fournier
Thanks for bringing these subjects into the discussion, Ismaël. For the second point about the standard deviation, I just want to add that this could also be added to the distribution metric. Actually I think this makes much more sense than just adding a new transform for this (we can also do both). Indeed,

Re: [VOTE] Release 2.1.0, release candidate #3

2017-08-16 Thread Jean-Baptiste Onofré
Hi Thanks. I will send the result e-mail, promote the artifacts on Central and dist.apache.org. Then I will prepare the announcement (website and mailing lists). Regards JB On Aug 16, 2017, 17:20, at 17:20, Eugene Kirpichov wrote: >Thanks Luke! With your vote, we have 3 PMC affirmative votes

Re: ConcurrentModificationException while performing checkpoint for Kinesis stream

2017-08-16 Thread Lukasz Cwik
Moved to dev@beam.apache.org On Wed, Aug 16, 2017 at 9:22 AM, Pawel Bartoszek wrote: > When flink performs a checkpoint I get randomly > ConcurrentModificationException. > > From my investigation it looks like the method > > public boolean advance() throws IOException > >

Re: [VOTE] Release 2.1.0, release candidate #3

2017-08-16 Thread Eugene Kirpichov
Thanks Luke! With your vote, we have 3 PMC affirmative votes. JB, what are the next steps to finalize the release? On Wed, Aug 16, 2017 at 8:50 AM Lukasz Cwik wrote: > Back from vacation. > > +1 binding > > BEAM-2671 has been marked for 2.2.0 release. > > > > On Wed, Aug 16, 2017 at 2:08 AM, Kob

Re: [VOTE] Release 2.1.0, release candidate #3

2017-08-16 Thread Lukasz Cwik
Back from vacation. +1 binding BEAM-2671 has been marked for 2.2.0 release. On Wed, Aug 16, 2017 at 2:08 AM, Kobi Salant wrote: > Hi, > > Spark runner was tested with word count example and a more complex session > based application on a yarn cluster. > Both applications ran successfully, so w

Re: Policy for stale PRs

2017-08-16 Thread Aviem Zur
Makes sense to close after a long time of inactivity and no response, and as Kenn mentioned they can always re-open. On Wed, Aug 16, 2017 at 12:20 AM Jean-Baptiste Onofré wrote: > If we consider the author, it makes sense. > > Regards > JB > > On Aug 15, 2017, 01:29, at 01:29, Ted Yu wrote: > >

Re: [VOTE] Release 2.1.0, release candidate #3

2017-08-16 Thread Kobi Salant
Hi, The Spark runner was tested with the word count example and a more complex session-based application on a YARN cluster. Both applications ran successfully, so we can say that the Spark runner passed the necessary sanity tests. Still, there is an open ticket https://issues.apache.org/jira/browse/BEAM-2671 which

Re: Hello from a newbie to the data world living in the city by the bay!

2017-08-16 Thread Jean-Baptiste Onofré
Welcome! Regards JB On Aug 16, 2017, 08:54, at 08:54, "Ismaël Mejía" wrote: >Hello and welcome Griselda, Umang, Justin > >Apart from the links provided by Ahmet you might read Beam-related >material on the website (See Documentation > Programming Guide and >Documentation > Additional Resources am

Re: Hello from a newbie to the data world living in the city by the bay!

2017-08-16 Thread Ismaël Mejía
Hello and welcome Griselda, Umang, Justin. Apart from the links provided by Ahmet, you might read Beam-related material on the website (see Documentation > Programming Guide and Documentation > Additional Resources, among others). But probably as important as improving your Beam-related knowledge is t