Regarding ParquetIO on S3: I am investigating the issue. It seems that it has never worked on S3, which I did not expect. I am currently trying to understand why it behaves differently there than on other filesystems (HDFS, local). Any help is appreciated.
Regarding ParquetIO on HDFS: I was able to run it on my machine successfully. I also created a PR with an HDFS performance test for Parquet (and it is passing too): https://github.com/apache/beam/pull/5520.

Hope this will be helpful!

Best regards,
Łukasz

2018-05-31 0:41 GMT+02:00 Robert Bradshaw <[email protected]>:

> On Wed, May 30, 2018 at 12:59 PM Ahmet Altay <[email protected]> wrote:
>
>> Thank you JB.
>>
>> For clarification, are you referring to the following items:
>> - RabbitMqIO - https://github.com/apache/beam/pull/1729
>> - ParquetIO on HDFS/S3 - https://issues.apache.org/jira/browse/BEAM-4421
>>
>> If the above mapping is correct, could we separate addition of new
>> feature from addressing blocking issues? I would propose that we do not
>> block the release for the former one and fix the latter one before the
>> release.
>>
>> On Tue, May 29, 2018 at 10:26 PM, Jean-Baptiste Onofré <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> I would like to merge RabbitMqIO (we are doing the final touches) and we
>>> have an issue about ParquetIO on HDFS/S3 that I would like to
>>> investigate with the team.
>>>
>>
>> Do you know who is currently investigating the ParquetIO issue? Do you
>> need help with that?
>>
>
> Do we know if this is a regression, or has it never worked?
>
>
>>> I plan to start the release process asap, hopefully later today.
>>>
>>
> That would be great. A lot has happened since the last release [1] and
> we've had a pretty good cadence so far in 2018 so it'd be nice to get this
> out into the hands of our users. And thanks for volunteering to do the
> release!
>
> - Robert
>
>
> [1] https://github.com/apache/beam/compare/release-2.4.0...master
>
>>
>>> Regards
>>> JB
>>>
>>> On 29/05/2018 23:00, Ahmet Altay wrote:
>>> > Thank you JB for the update. Could we start the release process now? Is
>>> > there any way I could help with moving the release forward?
>>> >
>>> > On Fri, May 25, 2018 at 8:19 AM, Lukasz Cwik <[email protected]> wrote:
>>> >
>>> > Thanks for the update JB.
>>> >
>>> > Kenn, we have the post commit integration tests which run against
>>> > shaded artifacts like validates runner. We also have the nightly
>>> > snapshot and its verification run which validates the nightly
>>> > snapshot with DirectRunner / Dataflow / Apex / Spark / Flink for
>>> > WordCount and DirectRunner / Dataflow for the mobile gaming examples.
>>> >
>>> > I'm not sure about the IOs and whether the perfkit benchmark work
>>> > adequately covers them.
>>> >
>>> >
>>> > On Fri, May 25, 2018 at 1:28 AM Jean-Baptiste Onofré
>>> > <[email protected]> wrote:
>>> >
>>> > Hi Luke,
>>> >
>>> > I tested the following build:
>>> >
>>> > ./gradlew publishToMavenLocal -PisRelease --no-parallel
>>> >
>>> > The artifacts are present in my .m2/repository.
>>> >
>>> > For instance, I can see:
>>> >
>>> > .m2/repository/org/apache/beam/beam-sdks-java-core/2.5.0$ ls -l
>>> > total 16256
>>> > beam-sdks-java-core-2.5.0.jar
>>> > beam-sdks-java-core-2.5.0.jar.asc
>>> > beam-sdks-java-core-2.5.0-javadoc.jar
>>> > beam-sdks-java-core-2.5.0-javadoc.jar.asc
>>> > beam-sdks-java-core-2.5.0.pom
>>> > beam-sdks-java-core-2.5.0.pom.asc
>>> > beam-sdks-java-core-2.5.0-sources.jar
>>> > beam-sdks-java-core-2.5.0-sources.jar.asc
>>> > beam-sdks-java-core-2.5.0-tests.jar
>>> > beam-sdks-java-core-2.5.0-tests.jar.asc
>>> > beam-sdks-java-core-2.5.0-test-sources.jar
>>> > beam-sdks-java-core-2.5.0-test-sources.jar.asc
>>> >
>>> > 1. The signatures are OK:
>>> >
>>> > gpg --verify beam-sdks-java-core-2.5.0.jar.asc beam-sdks-java-core-2.5.0.jar
>>> > gpg: Signature made Thu 24 May 2018 16:55:11 CEST
>>> > gpg: using RSA key 1AA8CF92D409A73393D0B736BFF2EE42C8282E76
>>> > gpg: Good signature from "Jean-Baptiste Onofré <[email protected]>" [unknown]
>>> >
>>> > 2. The pom looks correct to me but it's not optimal because
>>> >
>>> > 2.1. There's no parent definition, so each pom duplicates the same
>>> > configurations (like scm, license, etc)
>>> > 2.2. There's no Maven plugin configuration, even if it's not used for
>>> > the build, other tools can parse and use plugin configuration (like the
>>> > source/target version, etc).
>>> >
>>> > So, even if it's not optimal, the pom looks overall good.
>>> >
>>> > I think it makes sense to move forward on the release as it is
>>> > right now.
>>> >
>>> > If there's no objection, I will start the release process during the
>>> > weekend.
>>> >
>>> > By the way, it would be good to verify that the Maven build is still
>>> > working. Ismaël and I fixed new issues on the Maven build.
>>> > At some point, after the 2.5.0 release, we have to state to remove the
>>> > Maven build (after a vote ;)).
>>> >
>>> > Thanks,
>>> > Regards
>>> > JB
>>> >
>>> >
>>> > On 25/05/2018 01:34, Lukasz Cwik wrote:
>>> > > The license inclusion issue that was brought up on the thread has been
>>> > > resolved https://issues.apache.org/jira/browse/BEAM-4393.
>>> > >
>>> > > JB, did you find any other release-related issues?
>>> > >
>>> > > On Fri, May 18, 2018 at 10:33 AM Lukasz Cwik <[email protected]> wrote:
>>> > >
>>> > > I believe JB is referring
>>> > > to https://issues.apache.org/jira/browse/BEAM-4060
>>> > >
>>> > > On Fri, May 18, 2018 at 10:16 AM Scott Wegner <[email protected]> wrote:
>>> > >
>>> > > J.B., can you give any context on what metadata is missing? Is
>>> > > there a JIRA?
>>> > >
>>> > > On Thu, May 17, 2018 at 9:30 PM Jean-Baptiste Onofré
>>> > > <[email protected]> wrote:
>>> > >
>>> > > Hi,
>>> > >
>>> > > The build was OK yesterday but the maven-metadata is still
>>> > > missing.
>>> > >
>>> > > That's the point to fix before being able to move forward
>>> > > on the release.
>>> > >
>>> > > I'm going to tackle this later today.
>>> > >
>>> > > Regards
>>> > > JB
>>> > >
>>> > > On 05/18/2018 02:41 AM, Ahmet Altay wrote:
>>> > > > Hi JB and all,
>>> > > >
>>> > > > I wanted to follow up on my previous email. The python streaming issue I
>>> > > > mentioned is resolved and removed from the blocker list. Blocker list is empty
>>> > > > now. You can go ahead with the release branch cut when you are ready.
>>> > > >
>>> > > > Thank you,
>>> > > > Ahmet
>>> > > >
>>> > > >
>>> > > > On Sun, May 13, 2018 at 8:43 AM, Jean-Baptiste Onofré
>>> > > > <[email protected]> wrote:
>>> > > >
>>> > > > Hi guys,
>>> > > >
>>> > > > just to let you know that the build fully passed on my box.
>>> > > >
>>> > > > I'm testing the artifacts right now.
>>> > > >
>>> > > > Regards
>>> > > > JB
>>> > > >
>>> > > > On 06/04/2018 10:48, Jean-Baptiste Onofré wrote:
>>> > > >
>>> > > > Hi guys,
>>> > > >
>>> > > > Apache Beam 2.4.0 was released on March 20th.
>>> > > >
>>> > > > According to our cycle of release (roughly 6 weeks), we should think
>>> > > > about 2.5.0.
>>> > > >
>>> > > > I volunteer to tackle this release.
>>> > > >
>>> > > > I'm proposing the following items:
>>> > > >
>>> > > > 1. We start the Jira triage now, up to Tuesday
>>> > > > 2. I would like to cut the release on Tuesday night (Europe time)
>>> > > > 2bis. I think it's wiser to still use Maven for this release. Do you
>>> > > > think we will be ready to try a release with Gradle?
>>> > > >
>>> > > > After this release, I would like a discussion about:
>>> > > > 1. Gradle release (if we release 2.5.0 with Maven)
>>> > > > 2. Isolate release cycle per Beam part. I think it would be interesting
>>> > > > to have different release cycle: SDKs, DSLs, Runners, IOs. That's another
>>> > > > discussion, I will start a thread about that.
>>> > > >
>>> > > > Thoughts?
>>> > > >
>>> > > > Regards
>>> > > > JB
>>> > > >
>>> > >
>>> > > --
>>> > > Jean-Baptiste Onofré
>>> > > [email protected]
>>> > > http://blog.nanthrax.net
>>> > > Talend - http://www.talend.com
