Re: Beam spark 2.x runner status

2017-03-15 Thread Cody Innowhere
I'm personally in favor of maintaining one single branch, e.g., spark-runner, which supports both Spark 1.6 & 2.1. Since there's currently no DataFrame support in the Spark 1.x runner, there should be no conflicts if we put two versions of Spark into one runner. I'm also +1 for adding adapters in the

Re: [RESULT] [VOTE] Release 0.6.0, release candidate #2

2017-03-15 Thread Jean-Baptiste Onofré
By the way, this step is in the "Release Guide". But you are right, it means the release manager needs "permission" on Jira or has to ask someone to change the version state. Regards JB On 03/16/2017 02:42 AM, Ahmet Altay wrote: JB, 0.6.0 is flagged as released now, thank you for catching this. As a s

Re: Beam spark 2.x runner status

2017-03-15 Thread Jean-Baptiste Onofré
Hi guys, sorry, due to the time zone shift, I answer a bit late ;) I think we can have the same runner dealing with the two major Spark versions, introducing some adapters. For instance, in CarbonData, we created some adapters to work with Spark 1.5, Spark 1.6 and Spark 2.1. The dependencies co

Re: [RESULT] [VOTE] Release 0.6.0, release candidate #2

2017-03-15 Thread Jean-Baptiste Onofré
Thanks! Regards JB On 03/16/2017 02:42 AM, Ahmet Altay wrote: JB, 0.6.0 is flagged as released now, thank you for catching this. As a side note, I did not have enough permissions to do this and asked Davor to do it. I will add this to the release notes. Ahmet On Wed, Mar 15, 2017 at 7:16 AM, Jess

Re: Docker image dependencies

2017-03-15 Thread Stephen Sisk
thanks for the discussion! In general, I agree with the sentiments expressed here. I updated https://docs.google.com/document/d/153J9jPQhMCNi_eBzJfhAg-NprQ7vbf1jNVRgdqeEE8I/edit#heading=h.hlirex1vus1a to reflect this discussion. (The plan is still that I will put that on the website.) Apache Docke

Re: [RESULT] [VOTE] Release 0.6.0, release candidate #2

2017-03-15 Thread Ahmet Altay
JB, 0.6.0 is flagged as released now, thank you for catching this. As a side note, I did not have enough permissions to do this and asked Davor to do it. I will add this to the release notes. Ahmet On Wed, Mar 15, 2017 at 7:16 AM, Jesse Anderson wrote: > Excellent! > > On Wed, Mar 15, 2017, 6:13 AM

Re: Beam spark 2.x runner status

2017-03-15 Thread Amit Sela
I answered inline to Abbass' comment, but I think he hit on something: how about we have a branch with those adaptations? Same RDD implementation, but depending on the latest 2.x version with the minimal changes required. I'd be happy to do that, or guide anyone who wants to (I did most of it on my

Re: Beam spark 2.x runner status

2017-03-15 Thread amarouni
+1 for Spark runners based on different APIs (RDD/Dataset) and keeping the Spark versions as a deployment dependency. The RDD API is stable & mature enough, so it makes sense to have it on master; the Dataset API still has some work to do, and from our own experience it just reached a comparable RDD

Re: Call for help: let's add Splittable DoFn to Spark, Flink and Apex runners

2017-03-15 Thread Amit Sela
Great! So we'll use the hangout you added here, see you then. On Wed, Mar 15, 2017 at 7:22 PM Eugene Kirpichov wrote: > Amit - 8am is fine with me, let's do that. > > On Wed, Mar 15, 2017 at 6:00 AM Jean-Baptiste Onofré > wrote: > > > Hi, > > > > Anyway, I hope it will result in some notes on

Re: Beam spark 2.x runner status

2017-03-15 Thread Ismaël Mejía
> So you're suggesting we copy-paste the current runner and adapt whatever is > necessary so it runs with Spark 2? Yes > This also means any bug-fix / improvement would have to be maintained in > two runners, and I wouldn't wanna do that. No, this is the reason I first proposed to deprecate the

Re: Call for help: let's add Splittable DoFn to Spark, Flink and Apex runners

2017-03-15 Thread Eugene Kirpichov
Amit - 8am is fine with me, let's do that. On Wed, Mar 15, 2017 at 6:00 AM Jean-Baptiste Onofré wrote: > Hi, > > Anyway, I hope it will result in some notes on the mailing list as it > could be > helpful. > > I'm not against a video call to move forward, but, from a community > perspective,

Re: Beam spark 2.x runner status

2017-03-15 Thread Amit Sela
So you're suggesting we copy-paste the current runner and adapt whatever is necessary so it runs with Spark 2? This also means any bug-fix / improvement would have to be maintained in two runners, and I wouldn't wanna do that. I don't like to think in terms of Spark1/2 but in terms of RDD/Dataset

Re: Beam spark 2.x runner status

2017-03-15 Thread Ismaël Mejía
> However, I do feel that we should use the Dataset API, starting with batch > support first. WDYT? Well, this is exactly the current status quo, and it will take us some time to have something for Spark 2 as complete as what we have with the Spark 1 runner. The other proposal has two advantag

Re: Beam spark 2.x runner status

2017-03-15 Thread Amit Sela
I feel that as we're getting closer to supporting streaming with the Spark 1 runner, and with Structured Streaming advancing in Spark 2, we could start work on a Spark 2 runner in a separate branch. However, I do feel that we should use the Dataset API, starting with batch support first. WDYT? On Wed,

Re: Beam spark 2.x runner status

2017-03-15 Thread Ismaël Mejía
> So you propose to have the Spark 2 branch a clone of the current one with > adaptations around Context->Session, Accumulator->AccumulatorV2 etc. while > still using the RDD API? Yes, this is exactly what I have in mind. > I think that having another Spark runner is great if it has value, > othe

Re: Performance Testing Next Steps

2017-03-15 Thread Ismaël Mejía
Excellent proposal! Sorry to jump into this discussion so late; this was on my to-read list for almost two weeks, and I finally got the time to read the document. I have two minor comments: I have the impression that the strict separation of Providers (the data-processing systems) and Resources

Re: Beam spark 2.x runner status

2017-03-15 Thread Amit Sela
So you propose to have the Spark 2 branch a clone of the current one with adaptations around Context->Session, Accumulator->AccumulatorV2 etc. while still using the RDD API? I think that having another Spark runner is great if it has value, otherwise, let's just bump the version. My idea of havin
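A minimal sketch of the adapter idea discussed in this thread, assuming a hypothetical `SparkAdapter` interface with one implementation per major Spark version. The class names, the `adapter_for` helper and the returned strings are illustrative only; they are not Beam or Spark APIs. The point is that shared RDD-based translation code asks the adapter for the version-specific pieces (Context vs. Session, Accumulator vs. AccumulatorV2) instead of hard-coding them.

```python
from abc import ABC, abstractmethod


class SparkAdapter(ABC):
    """Hypothetical adapter hiding Spark 1.x vs 2.x API differences."""

    @abstractmethod
    def entry_point(self) -> str:
        """Name of the object the runner would use to submit jobs."""

    @abstractmethod
    def accumulator_class(self) -> str:
        """Name of the accumulator API the runner would use."""


class Spark1Adapter(SparkAdapter):
    # Spark 1.x style: context-based entry point, classic accumulators.
    def entry_point(self) -> str:
        return "JavaSparkContext"

    def accumulator_class(self) -> str:
        return "Accumulator"


class Spark2Adapter(SparkAdapter):
    # Spark 2.x style: session-based entry point, AccumulatorV2.
    def entry_point(self) -> str:
        return "SparkSession"

    def accumulator_class(self) -> str:
        return "AccumulatorV2"


def adapter_for(spark_version: str) -> SparkAdapter:
    """Pick an adapter from a version string such as "1.6.3" or "2.1.0"."""
    major = int(spark_version.split(".")[0])
    if major == 1:
        return Spark1Adapter()
    if major == 2:
        return Spark2Adapter()
    raise ValueError(f"unsupported Spark version: {spark_version}")
```

With this shape, a bug fix in the shared translation logic lands once; only the small adapter classes differ per Spark version, which is the concern about maintaining two runners raised above.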

Re: Beam spark 2.x runner status

2017-03-15 Thread Ismaël Mejía
BIG +1 JB. If we can just bump the version number with minor changes, staying as close as possible to the current implementation for Spark 1, we can go faster and offer in principle the exact same support but for version 2. I know that the advanced streaming stuff based on the DataSet API won't be

Re: [RESULT] [VOTE] Release 0.6.0, release candidate #2

2017-03-15 Thread Jesse Anderson
Excellent! On Wed, Mar 15, 2017, 6:13 AM Jean-Baptiste Onofré wrote: > Hi Ahmet, > > it seems Jira is not up to date: 0.6.0 version is not flagged as > "Released". > > Can you fix that please? > > Thanks! > Regards > JB > > On 03/15/2017 05:22 AM, Ahmet Altay wrote: > > I'm happy to announce t

Re: Docker image dependencies

2017-03-15 Thread Ismaël Mejía
Hi, Thanks for bringing this subject to the mailing list. +1 We definitely need a consensus on this, and I agree with your proposal and JB’s comments, modulo certain clarifications: I think we should go in this priority order if the version of the image we want is available: 1. Image provided by t

Re: [RESULT] [VOTE] Release 0.6.0, release candidate #2

2017-03-15 Thread Jean-Baptiste Onofré
Hi Ahmet, it seems Jira is not up to date: the 0.6.0 version is not flagged as "Released". Can you fix that please? Thanks! Regards JB On 03/15/2017 05:22 AM, Ahmet Altay wrote: I'm happy to announce that we have unanimously approved this release. There are 7 approving votes, 4 of which are bi

Re: Call for help: let's add Splittable DoFn to Spark, Flink and Apex runners

2017-03-15 Thread Jean-Baptiste Onofré
Hi, Anyway, I hope it will result in some notes on the mailing list as it could be helpful. I'm not against a video call to move forward, but, from a community perspective, we should always provide meeting minutes on the mailing list. Unfortunately, next Friday, I will still be in China, s

Re: Beam spark 2.x runner status

2017-03-15 Thread Jean-Baptiste Onofré
Hi Amit, What do you think of the following: - in the meantime, while you reintroduce the Spark 2 branch, what about "extending" the version in the current Spark runner? Still using RDD/DStream, I think we can support Spark 2.x even if we don't yet leverage the new provided features. Though

Re: Beam spark 2.x runner status

2017-03-15 Thread Amit Sela
Hi Cody, I will re-introduce this branch soon as part of the work on BEAM-913. For now, and from previous experience with the mentioned branch, the batch implementation should be straightforward. The only issue is with streaming support - in the current ru

Re: Call for help: let's add Splittable DoFn to Spark, Flink and Apex runners

2017-03-15 Thread Amit Sela
I have dinner at 9am... which doesn't sound like a real thing if you forget about timezones :) How about 8am? Or something later, like 12pm midday? Apex can take the 9am time slot ;-) On Wed, Mar 15, 2017 at 4:28 AM Eugene Kirpichov wrote: > Hi! Please feel free to join this call, but I think we

Re: Style: how much testing for transform builder classes?

2017-03-15 Thread Ismaël Mejía
+1 to Vikas's point: maybe the right place to enforce correct builds is in validate(), which would reduce the test boilerplate so that we only test validate(). But I wonder if this totally covers both cases (the buildsCorrectly and buildsCorrectlyInDifferentOrder ones). I answer Eugene’s qu
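A minimal sketch of the testing trade-off raised here, assuming a hypothetical immutable `ReadConfig` builder (not a Beam class). All consistency checks live in `validate()`, so most tests only need to exercise `validate()`; the order-independence check stands in for the buildsCorrectlyInDifferentOrder case, which `validate()` alone does not cover.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ReadConfig:
    """Hypothetical immutable transform configuration built fluently."""
    host: Optional[str] = None
    table: Optional[str] = None
    batch_size: int = 100

    def with_host(self, host: str) -> "ReadConfig":
        return replace(self, host=host)

    def with_table(self, table: str) -> "ReadConfig":
        return replace(self, table=table)

    def with_batch_size(self, size: int) -> "ReadConfig":
        return replace(self, batch_size=size)

    def validate(self) -> None:
        """Single place for all consistency checks on the configuration."""
        if self.host is None:
            raise ValueError("host is required")
        if self.table is None:
            raise ValueError("table is required")
        if self.batch_size <= 0:
            raise ValueError("batch_size must be positive")


def builds_same_in_any_order() -> bool:
    # Equivalent to a "builds correctly in different order" test:
    # frozen dataclasses compare field by field.
    a = ReadConfig().with_host("h").with_table("t")
    b = ReadConfig().with_table("t").with_host("h")
    return a == b


def validate_rejects_missing_table() -> bool:
    # A validate()-focused test: a missing required field must be rejected.
    try:
        ReadConfig().with_host("h").validate()
    except ValueError:
        return True
    return False
```

Because each `with_*` call returns a new value, the builder itself stays trivial and the interesting logic is concentrated in `validate()`, which is what makes testing only `validate()` plausible.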

Re: [RESULT] [VOTE] Release 0.6.0, release candidate #2

2017-03-15 Thread Ismaël Mejía
Thanks Ahmet for dealing with the release. I just tried pip install apache-beam and the wordcount example, and as you said, it feels awesome to see this working so easily now. Congrats to everyone working on the Python SDK! On Wed, Mar 15, 2017 at 8:17 AM, Ahmet Altay wrote: > This release

Jenkins build is still unstable: beam_Release_NightlySnapshot #357

2017-03-15 Thread Apache Jenkins Server
See

Re: [RESULT] [VOTE] Release 0.6.0, release candidate #2

2017-03-15 Thread Ahmet Altay
This release is now complete. Thanks to everyone who has helped make this release possible! Before sending a note to users@, I would like to make a pass over the website and simplify things now that we have an official Python release. I did the first 'pip install apache-beam' today and it felt am

Beam spark 2.x runner status

2017-03-15 Thread Cody Innowhere
Hi guys, Is there anybody who's currently working on a Spark 2.x runner? An old PR for a Spark 2.x runner was closed a few days ago, so I wonder what the status is now. Is there a roadmap for this? Thanks~