[RESULT][VOTE] Release 2.1.0, release candidate #3

2017-08-21 Thread Jean-Baptiste Onofré
Hi, This vote passed with only +1 votes. I'm promoting the artifacts to Central and updating Jira. As I'm on vacation, can a committer deal with the tag and website, or merge? Sorry for this very short e-mail. Thanks all for your votes. Regards, JB On Aug 18, 2017, at 18:43, "Jean-Baptiste Onofré

How to Retain File Name while using TextIO for pattern

2017-08-21 Thread Siddharth Mittal
Hi Team, We want to retain the file name while reading a zip file using the TextIO API. When we read a zip file using the TextIO API, we get a PCollection of all lines of all files, but the file name is not present. If we have a zip file which contains four files inside it, let's say file1.csv, file2.
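
The result the question asks for can be pictured outside Beam. The sketch below is not the TextIO API: it uses only Python's standard `zipfile` module to read every member of an archive and pair each line with the name of the member it came from. The helper name `lines_with_names` and the sample archive contents are hypothetical; in a pipeline, logic along these lines would presumably sit in a custom transform that emits (file name, line) pairs.

```python
import io
import zipfile


def lines_with_names(zip_source):
    """Yield (member_name, line) for each text line of each file in a zip archive.

    `zip_source` may be a path or a binary file-like object.
    Hypothetical helper for illustration; not part of any Beam SDK.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no lines
            with archive.open(info) as raw:
                # Decode the member as UTF-8 text (an assumption about the data).
                for line in io.TextIOWrapper(raw, encoding="utf-8"):
                    yield info.filename, line.rstrip("\r\n")


if __name__ == "__main__":
    # Build a small in-memory archive so the sketch runs without external files.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("file1.csv", "a,1\nb,2\n")
        archive.writestr("file2.csv", "c,3\n")
    buffer.seek(0)
    for name, line in lines_with_names(buffer):
        print(name, line)
```

Keeping the member name next to each line lets downstream steps group or filter by source file.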

How to read files in distributed way from a pcollection

2017-08-21 Thread Siddharth Mittal
Hi Team, I have a use case where I will get a PCollection of file names. The files are present on NFS and the file size may vary from a few KB to a few GB. We want to transform PCollection of File Names to PCollection of Please suggest how to handle this type of use case. Thanks & Regards Siddharth Mi
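
As a rough illustration of the per-element step this use case implies, the sketch below expands file names into lines with plain Python generators, reading each file lazily so that a multi-gigabyte file is never held in memory at once. The names `read_lines` and `expand_file_names` are hypothetical and this is not a Beam API; it assumes UTF-8 text files. Because each file name is handled independently, a function like `read_lines` is the kind of logic a runner could apply to different elements on different workers, although a single large file read this way is not split further.

```python
import os
import tempfile


def read_lines(file_name, encoding="utf-8"):
    """Lazily yield the lines of one file, one at a time."""
    with open(file_name, "r", encoding=encoding) as handle:
        for line in handle:
            yield line.rstrip("\r\n")


def expand_file_names(file_names):
    """Turn an iterable of file names into a single stream of lines."""
    for file_name in file_names:
        yield from read_lines(file_name)


if __name__ == "__main__":
    # Create two small temporary files so the sketch runs on its own.
    with tempfile.TemporaryDirectory() as directory:
        paths = []
        for index, text in enumerate(["x\ny\n", "z\n"]):
            path = os.path.join(directory, f"part{index}.txt")
            with open(path, "w", encoding="utf-8") as out:
                out.write(text)
            paths.append(path)
        print(list(expand_file_names(paths)))
```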

Re: Beam spark 2.x runner status

2017-08-21 Thread Holden Karau
I'd love to take a look at the PR when it comes in (<3 BEAM + SPARK :)). On Mon, Aug 21, 2017 at 11:33 AM, Jean-Baptiste Onofré wrote: > Hi > > I did a new runner supporting Spark 2.1.x. I changed code for that. > > I'm still on vacation this week. I will send an update when back. > > Regards >

Re: Beam spark 2.x runner status

2017-08-21 Thread Jean-Baptiste Onofré
Hi, I did a new runner supporting Spark 2.1.x. I changed code for that. I'm still on vacation this week. I will send an update when back. Regards, JB On Aug 21, 2017, at 09:01, Pei HE wrote: >Any updates for upgrading to Spark 2.x? > >I tried to replace the dependency and found a compile

Re: [Proposal] Progress Reporting in Fn API

2017-08-21 Thread Vikas RK
Hi, I have updated the proposal based on the comments received. The major change is that the SDK no longer reports cumulative backlog, but includes more details for each transform itself. This gives a Runner more information about each tra

Re: [DISCUSS] Capability Matrix revamp

2017-08-21 Thread Tyler Akidau
Is there any way we could add quantitative runner metrics to this as well? Like by having some benchmarks that process X amount of data, and then detailing in the matrix latency, throughput, and (where possible) cost, etc, numbers for each of the given runners? Semantic support is one thing, but th

Re: Beam spark 2.x runner status

2017-08-21 Thread Pei HE
Any updates for upgrading to Spark 2.x? I tried to replace the dependency and found a compile error from implementing a Scala trait: org.apache.beam.runners.spark.io.SourceRDD.SourcePartition is not abstract and does not override abstract method org$apache$spark$Partition$$super$equals(java.lang.O