Re: joda-time dependency version

2019-03-21 Thread Kenneth Knowles
+dev@ I don't know of any special reason we are using an old version. Kenn On Thu, Mar 21, 2019, 09:38 Ismaël Mejía wrote: > Does anyone have any context on why we have such an old version of > Joda time (2.4 released in 2014!) and if there is any possible issue > upgrading it? If not maybe w

Re: [spark runner dataset POC] workCount works !

2019-03-21 Thread Kenneth Knowles
Nice milestone! On Thu, Mar 21, 2019 at 10:49 AM Pablo Estrada wrote: > This is pretty cool. Thanks for working on this and for sharing:) > Best > -P. > > On Thu, Mar 21, 2019, 8:18 AM Alexey Romanenko > wrote: > >> Good job! =) >> Congrats to all who were involved to move this forward! >> >> Bt

Re: [PROPOSAL] commit granularity in master

2019-03-22 Thread Kenneth Knowles
I think my opinion is unpopular but I don't think [BEAM-] is important to have in the *subject* line of a commit message, but putting it in the body of the commit message is fine and could even be better. Mostly I want to know the actual change and I would save all subject line characters for

Re: joda-time dependency version

2019-03-23 Thread Kenneth Knowles
<https://jira.apache.org/jira/browse/BEAM-6895> > (2.10.1). > > Kenn, I've found your issue > <https://jira.apache.org/jira/browse/BEAM-5827> for joda-time vendoring, > is it still relevant? This might cause a breaking change as it is part of > user facing API. >

Re: docs: java-dependencies

2019-03-24 Thread Kenneth Knowles
I had forgotten about that page. I think it is a good idea to include it in the release process. I would rephrase the page a little bit to make it clear that dependency conflicts are normal and expected in Java so this is an FYI page about the versions we test with. Users may have to pin to other v

[ANNOUNCE] New committer announcement: Mark Liu

2019-03-24 Thread Kenneth Knowles
Hi all, Please join me and the rest of the Beam PMC in welcoming a new committer: Mark Liu. Mark has been contributing to Beam since late 2016! He has proposed 100+ pull requests. Mark was instrumental in expanding test and infrastructure coverage, especially for Python. In consideration of Mark'

Re: New contributor

2019-03-26 Thread Kenneth Knowles
Welcome! Cool project. A lot of code, and thorough experiments. Kenn On Tue, Mar 26, 2019 at 9:15 AM Chamikara Jayalath wrote: > Welcome! > > On Tue, Mar 26, 2019 at 8:56 AM Ahmet Altay wrote: > >> Welcome Guobao! >> >> On Tue, Mar 26, 2019 at 7:13 AM Ismaël Mejía wrote: >> >>> Welcome Guobao

Re: Build blocking on

2019-03-26 Thread Kenneth Knowles
+1 to separating integration tests from "build". It should be able to succeed without internet access (if deps are cached). On Tue, Mar 26, 2019 at 3:18 PM Michael Luckey wrote: > Of course, we could implement something here. But I am worried about the > consequences. As gogradle writes into (us

Re: Build blocking on

2019-03-26 Thread Kenneth Knowles
Exactly. What you have said is what we should move towards IMO. On Tue, Mar 26, 2019 at 4:02 PM Michael Luckey wrote: > > > On Tue, Mar 26, 2019 at 11:40 PM Kenneth Knowles wrote: > >> +1 to separating integration tests from "build". It should be able to >> s

Re: New contributor

2019-03-27 Thread Kenneth Knowles
Welcome! On Wed, Mar 27, 2019 at 2:59 PM Mikhail Gryzykhin wrote: > Welcome Niklas. > > This is another location with useful resources for contributors: > https://cwiki.apache.org/confluence/display/BEAM/Developer+Guides (contributor > guide has link to this as well though) > > On Wed, Mar 27, 2

Re: Python SDK Arrow Integrations

2019-03-27 Thread Kenneth Knowles
Thinking about Arrow + Beam SQL + schemas: - Obviously many SQL operations could be usefully accelerated by arrow / columnar. Especially in the analytical realm this is the new normal. For ETL, perhaps less so. - Beam SQL planner (pipeline construction) is implemented in Java, and so the variou

Re: Python SDK Arrow Integrations

2019-03-28 Thread Kenneth Knowles
t it is that any change to Beam or Arrow could introduce something that doesn't translate well, so we just need to be cognizant of that. Kenn > > Brian > > [1] http://arrow.apache.org/blog/2018/12/05/gandiva-donation/ > > On Wed, Mar 27, 2019 at 9:19 PM Kenneth Knowles wrote:

Re: Increase Portable SDK Harness share of memory?

2019-03-29 Thread Kenneth Knowles
On Thu, Mar 28, 2019 at 9:30 AM Lukasz Cwik wrote: > The intention is that these kinds of hints such as CPU and/or memory > should be embedded in the environment specification that is associated with > the transforms that need resource hints. > > The environment spec is woefully ill prepared as i

Re: Quieten javadoc generation

2019-04-01 Thread Kenneth Knowles
Personally, I would like to suppress the warnings globally. I think requiring javadoc everywhere is already enough to remind someone to write something meaningful. And I think @param rarely adds anything beyond the function signature and @return rarely adds anything beyond the description. Kenn O

Re: Contibutor permissions for Beam Jira tickets

2019-04-01 Thread Kenneth Knowles
Welcome! On Mon, Apr 1, 2019 at 9:22 AM Ahmet Altay wrote: > Welcome to the project! > > On Mon, Apr 1, 2019 at 6:23 AM Ismaël Mejía wrote: > >> You have now the Contributor role, and I assigned the ticket you asked >> for. >> Enjoy! >> >> Ismaël >> >> On Mon, Apr 1, 2019 at 12:35 PM Madhusudha

Re: Removing :beam-website:testWebsite from gradle build target

2019-04-01 Thread Kenneth Knowles
+1 thanks for noticing and raising yet another source of non-hermeticity (plus the docker constraint) On Mon, Apr 1, 2019 at 9:09 AM Andrew Pilloud wrote: > +1 on this, particularly removing the dead link checker from default > tests. It is effectively testing that ~20 random websites are up. I

Re: Increase Portable SDK Harness share of memory?

2019-04-01 Thread Kenneth Knowles
nse to me. It looks like this migration path is already in place in `message Environment` in beam_runner_api.proto, with `message StandardEnvironments` enumerating some URNs and corresponding payload messages just below. So is the gap just getting the two portable runners to look at the new fields? K

Re: [PROPOSAL] Introduce beam-sdks-java gradle project

2019-04-01 Thread Kenneth Knowles
As to building an aggregated "Java" project, I think the blocker will be supporting conflicting deps. For IOs like ElasticSearch and runners like Flink the conflict is essential and deliberate, to support multiple versions of other services. And that is not even talking about transitive dep conflic

Re: kafka 0.9 support

2019-04-01 Thread Kenneth Knowles
This could be a backward-incompatible change, though that notion has many interpretations. What matters is user pain. Technically if we don't break the core SDK, users should be able to use Java SDK >=2.11.0 with KafkaIO 2.11.0 forever. How are multiple versions of Kafka supported? Are they all in

Re: [PROPOSAL] Introduce beam-sdks-java gradle project

2019-04-01 Thread Kenneth Knowles
On Mon, Apr 1, 2019 at 2:20 PM Lukasz Cwik wrote: > > > On Mon, Apr 1, 2019 at 2:00 PM Kenneth Knowles wrote: > >> >> As to building an aggregated "Java" project, I think the blocker will be >> supporting conflicting deps. For IOs like ElasticSearch a

Re: [PROPOSAL] Introduce beam-sdks-java gradle project

2019-04-01 Thread Kenneth Knowles
m/apache/beam/pull/8194 > > On Mon, Apr 1, 2019 at 11:56 PM Kenneth Knowles wrote: > >> >> >> On Mon, Apr 1, 2019 at 2:20 PM Lukasz Cwik wrote: >> >>> >>> >>> On Mon, Apr 1, 2019 at 2:00 PM Kenneth Knowles wrote: >>>

Re: [DISCUSS] Backwards compatibility of @Experimental features

2019-04-03 Thread Kenneth Knowles
I think option 2 with n=1 minor version seems OK. So users get the message for one release and it is gone the next. We should make sure the deprecation warning says "this is an experimental feature, so it will be removed after 1 minor version". And we need a process for doing it so it doesn't sit a

Re: [VOTE] Release 2.12.0, release candidate #1

2019-04-03 Thread Kenneth Knowles
I suggest keeping the bug open until the cherry-pick is complete. That makes tracking the burndown easier and is a more accurate treatment of Fix Version. And from the other direction, a good practice is to check not only the Jira burndown [1] but also search for pull requests targeting the release

Re: [DISCUSS] Backwards compatibility of @Experimental features

2019-04-03 Thread Kenneth Knowles
e experimental by that version, it's >>>> fine - we can always bump the tagged version. However this forces us to >>>> think about each one. >>>> >>>> Downside - it might add more toil to the existing release process. >>>> >>>> Re

Re: [DISCUSS] change the encoding scheme of Python StrUtf8Coder

2019-04-03 Thread Kenneth Knowles
Agree that a coder URN defines the encoding. I see that string UTF-8 was added to the proto enum, but it needs a written spec of the encoding. Ideally some test data that different languages can use to drive compliance testing. Kenn On Wed, Apr 3, 2019 at 6:21 PM Robert Burke wrote: > String UT

Re: [DISCUSS] change the encoding scheme of Python StrUtf8Coder

2019-04-04 Thread Kenneth Knowles
>> [3] >>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/coders/coders.py#L321 >>>> >>>> >>>> >>>>> >>>>> We should define the spec clearly and have cross-language tests. >

Re: [DISCUSS] change the encoding scheme of Python StrUtf8Coder

2019-04-04 Thread Kenneth Knowles
On Thu, Apr 4, 2019 at 1:48 PM Kenneth Knowles wrote: > I have to actually say that a collection of test cases is not a definition > of a format. It is one of the pieces, and the other one is a textual > description in a prominent, discoverable place. > A reference implementation ca

Re: [DISCUSS] change the encoding scheme of Python StrUtf8Coder

2019-04-04 Thread Kenneth Knowles
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/StringUtf8Coder.java#L50 >>>>> [3] >>>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/coders/coders.py#L321 >>>>> >>>>>

Re: Hazelcast Jet Runner - validation tests

2019-04-05 Thread Kenneth Knowles
Robert - that appears to be a test of state & timers, not triggers. Should work for testing that the watermark at least advances. We do already have similar java-based ValidatesRunner tests in ParDoTest. The results of triggering, while nondeterministic, should generally fall into a testable equiv

Re: [DISCUSS] change the encoding scheme of Python StrUtf8Coder

2019-04-05 Thread Kenneth Knowles
ription" should be part of the > documentation of the URNs on the Proto messages, since that's the common > place. I've added a short description for the varints for example, and we > already have lengthier format & protocol descriptions there for iterables > and simi

Re: Projects Can Apply Individually for Google Season of Docs

2019-04-05 Thread Kenneth Knowles
Yes, this is great. Thanks for noticing the call and pushing ahead on this, Aizhamal! I would also like to see the runner comparison revamp at https://issues.apache.org/jira/browse/BEAM-2888 which would help users really understand what they can and cannot do in plain terms. Kenn On Fri, Apr 5,

Re: [DISCUSS] change the encoding scheme of Python StrUtf8Coder

2019-04-05 Thread Kenneth Knowles
On Fri, Apr 5, 2019 at 2:24 PM Robert Bradshaw wrote: > On Fri, Apr 5, 2019 at 6:24 PM Kenneth Knowles wrote: > > > > Nested and unnested contexts are two different encodings. Can we just > give them different URNs? We can even just express the length-prefixed > UTF-8 as

Re: Changes in Beam Jenkins Agents

2019-04-05 Thread Kenneth Knowles
I was just looking to grab something, and everything is marked verified or not a blocker except "Beam PreCommit test with Cython for Py 3.6, Py 3.7" How does one invoke this? Kenn On Thu, Apr 4, 2019 at 10:04 AM Yifan Zou wrote: > Great! Thank you, Lukasz! > > On Thu, Apr 4, 2019 at 3:10 AM Łu

Re: [DISCUSS] change the encoding scheme of Python StrUtf8Coder

2019-04-08 Thread Kenneth Knowles
On Mon, Apr 8, 2019 at 1:57 AM Robert Bradshaw wrote: > On Sat, Apr 6, 2019 at 12:08 AM Kenneth Knowles wrote: > > > > > > > > On Fri, Apr 5, 2019 at 2:24 PM Robert Bradshaw > wrote: > >> > >> On Fri, Apr 5, 2019 at 6:24 PM Kenneth Knowles wrote

Re: [QUESTION] Should DoFns be able to get the watermark?

2019-04-09 Thread Kenneth Knowles
In state & timers and new DoFn, it was an explicit decision in the past not to allow direct observation of the watermark, but only to set a timer in event time. Is there a design doc I can read to catch up? Kenn On Tue, Apr 9, 2019 at 1:44 PM Lukasz Cwik wrote: > WatermarkReporterParam is about

Re: Updates on Beam Jenkins

2019-04-09 Thread Kenneth Knowles
Yes, thanks Yifan! This is critical infrastructure that was in real trouble without your work. Kenn On Tue, Apr 9, 2019 at 2:39 PM Pablo Estrada wrote: > Thanks for the updates Yifan. I am sure this process has been difficult, > and I appreciate the good communication, and that this didn't real

[ANNOUNCE] New committer announcement: Boyuan Zhang

2019-04-10 Thread Kenneth Knowles
Hi all, Please join me and the rest of the Beam PMC in welcoming a new committer: Boyuan Zhang. Boyuan has been contributing to Beam since early 2018. She has proposed 100+ pull requests across a wide range of topics: bug fixes, to integration tests, build improvements, metrics features, release

Re: [Forked] BEAM-4046 (was [PROPOSAL] Introduce beam-sdks-java gradle project)

2019-04-10 Thread Kenneth Knowles
thread, that's the goal of the mailing list ;) >>>>>> >>>>>> And yes, you got my idea about a "meta" module: easy way of building >>>>>> the >>>>>> "whole" Java SDK. >>>>>> >>>

Re: [DISCUSS] change the encoding scheme of Python StrUtf8Coder

2019-04-10 Thread Kenneth Knowles
> >> >> >> raw UTF-8 encoded bytes are encoded when used in an *unnested* > context > >> >> >> >> and the length-prefixed UTF-8 encoded bytes are used when the > coder is > >> >> >> >> used in a *nested* context. > >>
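
A rough Python sketch of the two encodings described in the quoted text above, for illustration only. It assumes the length prefix is a base-128 varint of the UTF-8 byte count; the function names (encode_varint, encode_unnested, encode_nested, decode_nested) are hypothetical and are not the Beam coder API.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as a base-128 varint (assumed prefix format)."""
    if n < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while True:
        low_bits = n & 0x7F
        n >>= 7
        if n:
            out.append(low_bits | 0x80)  # more bytes follow
        else:
            out.append(low_bits)         # last byte, high bit clear
            return bytes(out)


def encode_unnested(s: str) -> bytes:
    """Unnested context: just the raw UTF-8 bytes, no framing."""
    return s.encode("utf-8")


def encode_nested(s: str) -> bytes:
    """Nested context: varint byte length followed by the UTF-8 bytes."""
    payload = s.encode("utf-8")
    return encode_varint(len(payload)) + payload


def decode_nested(data: bytes, pos: int = 0):
    """Decode one length-prefixed string; return (string, next position)."""
    length = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        length |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    end = pos + length
    return data[pos:end].decode("utf-8"), end
```

The prefix is what lets several strings sit back to back in one byte stream, which is why the nested and unnested forms are not interchangeable.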

Re: [PROPOSAL] Custom JVM initialization for Beam workers

2019-04-10 Thread Kenneth Knowles
TL;DR I like the simple approach better than the ServiceLoader solution when a particular DoFn depends on the result. The ServiceLoader solution fits when it is somewhat independent of a particular DoFn (I'm not sure the use case(s)). On Wed, Apr 10, 2019 at 4:10 PM Brian Hulette wrote: > - Each

Re: [PROPOSAL] Custom JVM initialization for Beam workers

2019-04-10 Thread Kenneth Knowles
On Wed, Apr 10, 2019 at 8:18 PM Ahmet Altay wrote: > > > On Wed, Apr 10, 2019 at 7:59 PM Kenneth Knowles wrote: > >> TL;DR I like the simple approach better than the ServiceLoader solution >> when a particular DoFn depends on the result. The ServiceLoader solution >

Re: [DISCUSS] Side input consistency guarantees for triggers with multiple firings

2019-04-11 Thread Kenneth Knowles
Luke & I talked in person a bit. I want to give context for what is at stake here, in terms of side inputs in portability. A decent starting place is https://s.apache.org/beam-side-inputs-1-pager In that general design, the runner offers the SDK just one (or a few) materialization strategies, and

Re: [DISCUSS] Side input consistency guarantees for triggers with multiple firings

2019-04-11 Thread Kenneth Knowles
or me bringing up >> this topic, I didn't want to limit this discussion to how side inputs could >> work but in general what users want from their side inputs when dealing >> with multiple firings. >> >> On Thu, Apr 11, 2019 at 10:09 AM Kenneth Knowles wrote: >>

Re: [DISCUSS] Side input consistency guarantees for triggers with multiple firings

2019-04-12 Thread Kenneth Knowles
ther issue is when a single triggered PCollectionView is read by >> two different ParDos - each one might have a different view of the trigger. >> This is noticeable if the output of those two ParDos is then joined >> together. >> >> Reuven >> >> On Thu, Apr

Re: [EXT] Re: [DOC] Portable Spark Runner

2019-04-15 Thread Kenneth Knowles
Great. Thanks for sharing! On Mon, Apr 15, 2019 at 2:38 PM Lei Xu wrote: > This is super nice! Really look forward to use this. > > On Mon, Apr 15, 2019 at 2:34 PM Thomas Weise wrote: > >> Great to see the portable Spark runner taking shape. Thanks for the >> update! >> >> >> On Mon, Apr 15, 20

Re: [DISCUSS] Adding GroupByKeyAndSort

2019-04-16 Thread Kenneth Knowles
1. This is clearly useful, and extensively used. Agree with all that. I think it can work for batch and streaming equally well if sorting is required only per "pane", though I might be overlooking something. 2. A transform need not be primitive to be well-defined and executed in a special way by m

Re: [DISCUSS] Adding GroupByKeyAndSort

2019-04-16 Thread Kenneth Knowles
lues" use case would have a lot of data duplication so we might have some payload on the transform to configure that, or a couple of related transforms. Kenn > > Reuven > > On Tue, Apr 16, 2019 at 9:08 AM Kenneth Knowles wrote: > >> 1. This is clearly useful, and

Re: Python SDK timestamp precision

2019-04-16 Thread Kenneth Knowles
I am not so sure this is a good idea. Here are some systems and their precision: Arrow - microseconds BigQuery - microseconds New Java instant - nanoseconds Firestore - microseconds Protobuf - nanoseconds Dataflow backend - microseconds Postgresql - microseconds Pubsub publish time - nanoseconds M

Re: [VOTE] Release 2.12.0, release candidate #4

2019-04-16 Thread Kenneth Knowles
+1 Ran the verification scripts. Caveats: - I input a GCS bucket that did not exist, expecting it to be created, so the Dataflow tests failed. - I also skipped the Python tests that asked to write to GitHub. - You also have not built, staged, & signed the Python wheels. It is a bit hidden in

Re: What is preferred way to label Jira issues intended for new contributors?

2019-04-17 Thread Kenneth Knowles
The only reference I know of is https://s.apache.org/beam-starter-tasks which includes even more tags. What is the goal of reducing the list? And how would you maintain it? Kenn On Wed, Apr 17, 2019 at 2:42 PM Valentyn Tymofieiev wrote: > I am seeing at least 4 labels in JIRA that can be well a

Re: Python SDK timestamp precision

2019-04-17 Thread Kenneth Knowles
Michels >>> wrote: >>> >>>> Hi, >>>> >>>> Thanks for taking care of this issue in the Python SDK, Thomas! >>>> >>>> It would be nice to have a uniform precision for timestamps but, as >>>> Kenn >>>> point

Re: Python SDK timestamp precision

2019-04-17 Thread Kenneth Knowles
>> >>>> >>>> >>>> On Wed, Apr 17, 2019 at 5:43 AM Maximilian Michels >>>> wrote: >>>> >>>>> Hi, >>>>> >>>>> Thanks for taking care of this issue in the Python SDK, Thomas! >>>&

Re: [DISCUSS] Turn `WindowedValue` into `T` in the FnDataService and BeamFnDataClient interface definition

2019-04-19 Thread Kenneth Knowles
WindowedValue has always been an interface, not a concrete representation: https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/WindowedValue.java

Re: New IOIT Dashboards

2019-04-19 Thread Kenneth Knowles
Very cool! I assume times are all in seconds? On Fri, Apr 19, 2019 at 6:26 AM Łukasz Gajowy wrote: > Hi, > > just wanted to announce that we improved the way we collect metrics from > IOIT. Now we use Metrics API for this which allowed us to get more insights > and collect run/read/write time (a

Re: Hazelcast Jet Runner

2019-04-19 Thread Kenneth Knowles
The ValidatesRunner tests are the best source we have for knowing the capabilities of a runner. Are there instructions for running the tests? Assuming we can check it out, then just open a PR to the website with the current capabilities and caveats. Since it is a big deal and could use lots of eye

Re: Possible bug in accumulating triggers Python DirectRunner?

2019-04-19 Thread Kenneth Knowles
What is the behavior you are seeing? Kenn On Fri, Apr 19, 2019 at 3:14 PM Ahmet Altay wrote: > > > On Fri, Apr 19, 2019 at 1:58 PM Pablo Estrada wrote: > >> Hello all, >> I've been slowly learning a bit about life in streaming, with state, >> timers, triggers, etc. >> >> The other day, I tried

Re: Possible bug in accumulating triggers Python DirectRunner?

2019-04-19 Thread Kenneth Knowles
nn On Fri, Apr 19, 2019 at 3:45 PM Kenneth Knowles wrote: > What is the behavior you are seeing? > > Kenn > > On Fri, Apr 19, 2019 at 3:14 PM Ahmet Altay wrote: > >> >> >> On Fri, Apr 19, 2019 at 1:58 PM Pablo Estrada wrote: >> >>> Hel

Re: Possible bug in accumulating triggers Python DirectRunner?

2019-04-19 Thread Kenneth Knowles
/trigger_test.py", line 491, in >>> test_multiple_accumulating_firings >>> TriggerPipelineTest.all_records) >>> AssertionError: Lists differ: ['1', '2', '3', '4', '5', '1',... != ['1', >>> '2', 

Re: [DISCUSS] Turn `WindowedValue` into `T` in the FnDataService and BeamFnDataClient interface definition

2019-04-22 Thread Kenneth Knowles
ce interface, I hope to just use a `T`. > > Have you seen some problem if we change the interface parameter from > `WindowedValue` to T? > > Thanks, > Jincheng > > Kenneth Knowles 于2019年4月20日周六 上午2:38写道: > >> WindowedValue has always been an interface, not a concret

[ANNOUNCE] New committer announcement: Yifan Zou

2019-04-22 Thread Kenneth Knowles
Hi all, Please join me and the rest of the Beam PMC in welcoming a new committer: Yifan Zou. Yifan has been contributing to Beam since early 2018. He has proposed 70+ pull requests, adding dependency checking and improving test infrastructure. But something the numbers cannot show adequately is t

Re: Python SDK timestamp precision

2019-04-23 Thread Kenneth Knowles
On Tue, Apr 23, 2019 at 5:48 AM Robert Bradshaw wrote: > On Thu, Apr 18, 2019 at 12:23 AM Kenneth Knowles wrote: > > > > For Robert's benefit, I want to point out that my proposal is to support > femtosecond data, with femtosecond-scale windows, even if watermarks/event

Re: Python SDK timestamp precision

2019-04-23 Thread Kenneth Knowles
anos On Tue, Apr 23, 2019 at 7:20 AM Kenneth Knowles wrote: > On Tue, Apr 23, 2019 at 5:48 AM Robert Bradshaw > wrote: > >> On Thu, Apr 18, 2019 at 12:23 AM Kenneth Knowles wrote: >> > >> > For Robert's benefit, I want to point out that my proposal is to &

Re: [DISCUSS] Turn `WindowedValue` into `T` in the FnDataService and BeamFnDataClient interface definition

2019-04-23 Thread Kenneth Knowles
xample, > > Apache Flink also has a definition similar to WindowedValue. For > > example, Apache Flink Stream has StreamRecord. If we change > > `WindowedValue` to T, then other project's implementation > > does not need to wrap Window

Re: [VOTE] Release 2.12.0, release candidate #4

2019-04-23 Thread Kenneth Knowles
What can we do to make this part of day-to-day workflow instead of finding out during release validation? Was this just a failing test that was missed? Kenn On Tue, Apr 23, 2019 at 3:02 PM Andrew Pilloud wrote: > It looks like Java Nexmark tests are on the validation sheet but we've > missed it

Re: Hello from Hannah Jiang

2019-04-25 Thread Kenneth Knowles
Welcome! On Thu, Apr 25, 2019 at 12:38 PM Matthias Baetens wrote: > Welcome to the community! > > On Thu, Apr 25, 2019, 18:55 Griselda Cuevas wrote: > >> Welcome Hannah! - Very excited to see you in the Beam community :) >> >> On Tue, 23 Apr 2019 at 12:59, Hannah Jiang >> wrote: >> >>> Hi ever

Re: Add new JIRA component for Python IO?

2019-04-25 Thread Kenneth Knowles
Makes sense. I just added components for all the things I identified under sdks/python/apache_beam/io Kenn On Thu, Apr 25, 2019 at 12:43 PM Pablo Estrada wrote: > Hello all, > there are only two JIRA components for python: `sdk-py-core`, and > `sdk-py-harness`. Naturally, sdk-py-core is the co

Re: [Discuss] Publishing pre-release artifacts to repositories

2019-04-25 Thread Kenneth Knowles
Pip is also able to be pointed at any raw hosted directory for the install, right? So we could publish RCs or snapshots somewhere with more obvious caveats and not interfere with the pypi list of actual releases. Much like the Java snapshots are stored in a separate opt-in repository. Kenn On Thu

[PROPOSAL] Prepare for LTS bugfix release 2.7.1

2019-04-25 Thread Kenneth Knowles
Hi all, Since the release of 2.7.0 we have identified some serious bugs: - There are 8 (non-dupe) issues* tagged with Fix Version 2.7.1 - 2 are rated "Blocker" (aka P0) but I think the others may be underrated - If you know of a critical bug that is not on that list, please file an LTS backpor

Re: Removing Java Reference Runner code

2019-04-25 Thread Kenneth Knowles
Thanks for providing all this background on the PR. It is very easy to see where it came from. Definitely nice to have less code and fewer things that can break. Perhaps lazy consensus is enough. Kenn On Thu, Apr 25, 2019 at 4:01 PM Daniel Oliveira wrote: > Hey everyone, > > I made a preliminar

Re: [DISCUSS] Reconciling ValueState in Java and Python (was: [docs] Python State & Timers)

2019-04-25 Thread Kenneth Knowles
You could use a CombiningState with a CombineFn that returns the minimum for this case. But I've come to feel there is a mismatch. On the one hand, ParDo() is a way to drop to a lower level and write logic that does not fit a more general computational pattern, really taking fine control. On the o
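
To make the "CombiningState with a CombineFn that returns the minimum" idea concrete, here is a plain-Python sketch of the accumulator pattern. It deliberately does not import Beam: MinCombineFn and SketchCombiningState are hypothetical stand-ins that only mirror the usual create/add/merge/extract shape of a combine function.

```python
class MinCombineFn:
    """Hypothetical combine function whose output is the minimum input seen."""

    def create_accumulator(self):
        return None  # None means "no input yet"

    def add_input(self, accumulator, value):
        return value if accumulator is None else min(accumulator, value)

    def merge_accumulators(self, accumulators):
        present = [a for a in accumulators if a is not None]
        return min(present) if present else None

    def extract_output(self, accumulator):
        return accumulator


class SketchCombiningState:
    """Hypothetical in-memory stand-in for a combining state cell."""

    def __init__(self, combine_fn):
        self._fn = combine_fn
        self._acc = combine_fn.create_accumulator()

    def add(self, value):
        # Only the accumulator is kept, never the individual values.
        self._acc = self._fn.add_input(self._acc, value)

    def read(self):
        return self._fn.extract_output(self._acc)

    def clear(self):
        self._acc = self._fn.create_accumulator()
```

Compared with a read-modify-write on a single value cell, the state here only ever needs to hold one accumulator, and the merge step is what would let partial results be combined.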

Re: [PROPOSAL] Prepare for LTS bugfix release 2.7.1

2019-04-25 Thread Kenneth Knowles
on a minimum. And of course that consensus cannot force anyone to do the work, but is just a resolution of the community. Kenn On Thu, Apr 25, 2019 at 10:29 PM Jean-Baptiste Onofré wrote: > +1 it sounds good to me. > > Thanks ! > > Regards > JB > > On 26/04/2019 02:42, Kenn

Re: [BEAM-7164] Python precommit failing on Java PRs. dataflow:setupVirtualenv

2019-04-26 Thread Kenneth Knowles
Flakiness in Java got a lot better when we put the Maven cache outside the wiped build directory. I am not sure about Gradle now... It is obviously less hermetic, but these things should be immutable so a cache is acceptable. Is there a way to achieve this for Python? For Maven/Gradle a package be

Re: [DISCUSS] Reconciling ValueState in Java and Python (was: [docs] Python State & Timers)

2019-04-26 Thread Kenneth Knowles
hat if ValueState were just implemented as a wrapper of CombiningState >> with a LatestCombineFn and documented as such (and perhaps we encourage >> users to consider using a CombiningState explicitly if at all possible)? >> >> Brian >> >> >> >> O

Re: Removing Java Reference Runner code

2019-04-26 Thread Kenneth Knowles
> > > On Fri, Apr 26, 2019 at 4:41 AM Maximilian Michels wrote: > >> Thanks for following up with this. I have mixed feelings to see the >> portable Java DirectRunner go, but I'm in favor of this change because >> it removes a lot of code that we do not really make

Re: [PROPOSAL] Preparing for Beam 2.13.0 release

2019-04-26 Thread Kenneth Knowles
By the way, that link is referenced by https://beam.apache.org/community/policies/ Is there a better way to surface the calendar? Kenn On Fri, Apr 26, 2019 at 12:23 PM Anton Kedin wrote: > Following Ankur's link I see a "[+]GoogleCalendar" button in the bottom > right corner of the page. Click

Re: Hazelcast Jet Runner

2019-04-26 Thread Kenneth Knowles
your feedback! > > Best regards, > Jozsef > > On 2019/04/19 20:52:42, Kenneth Knowles wrote: > > The ValidatesRunner tests are the best source we have for knowing the > > capabilities of a runner. Are there instructions for running the tests? > > > > Assuming we can

Re: [DISCUSS] Reconciling ValueState in Java and Python (was: [docs] Python State & Timers)

2019-04-28 Thread Kenneth Knowles
adata and so having access to something which gives us fine > grain control ( as Kenneth mentioned) is useful. > > Cheers > > Reza > > On Sat, 27 Apr 2019 at 02:59, Kenneth Knowles wrote: > >> To be clear, the intent was always that ValueState would be not usable in >

Re: Request wiki access

2019-04-28 Thread Kenneth Knowles
Done. You should have edit access now. Let me know if there's any problem. Kenn On Sun, Apr 28, 2019 at 1:58 AM Alex Van Boxel wrote: > it seems the IntelliJ guide is a bit behind, I like to correct some small > changes. Can somebody give me access to the wiki. I've linked my JIRA > account wit

Re: [DISCUSS] Performance of Beam compare to "Bare Runner"

2019-04-29 Thread Kenneth Knowles
Specifically, a lot of shared code assumes that repeatedly setting a timer is nearly free / the same cost as determining whether or not to set the timer. ReduceFnRunner has been refactored in a way so it would be very easy to set the GC timer once per window that occurs in a bundle, but there's pro

Re: Pipeline options validation

2019-04-29 Thread Kenneth Knowles
Does it make use of the @Nullable annotation or just assume any object reference could be null? Now that we are on Java 8 can it use Optional as well? (pet issue of mine :-) On Mon, Apr 29, 2019 at 5:29 PM Lukasz Cwik wrote: > The original ask for having the ability to introspect whether a field

Re: [DISCUSS] Performance of Beam compare to "Bare Runner"

2019-04-30 Thread Kenneth Knowles
ing the >>>> portability framework could be a performance win (specifically, no >>>> cloning would happen between operators of fused stages, and the >>>> cloning between operators could be on the raw bytes[] (if needed at >>>> all, because we know they

Re: Scope of windows?

2019-04-30 Thread Kenneth Knowles
> > either keep the existing one or reset it to the default. A runner can > > mutate this to a continuation trigger under the hood, which should be > > strictly looser (triggers are a promise about the earliest possible > > firing, they don't force firings to happen). >

Re: [DISCUSS] Reconciling ValueState in Java and Python (was: [docs] Python State & Timers)

2019-04-30 Thread Kenneth Knowles
a Jira for OnTimer Context to have Key. > >> > The GC needs are mostly due to not having a Map State object in all > runners yet. > >> > >> Yeah. GC could probably be done with a max combine. The Key (which > >> should be in the API) could be an AnyCombine

Re: Structured streaming based spark runner.

2019-04-30 Thread Kenneth Knowles
Very cool. Took a look. On Tue, Apr 30, 2019 at 6:23 PM Ankur Goenka wrote: > Exciting! Thanks Etienne for sharing the design and progress. > > On Tue, Apr 30, 2019 at 10:11 AM Etienne Chauchot > wrote: > >> Hi guys, >> As part of the ongoing work on spark runner POC based on structured >> stre

Re: Scope of windows?

2019-04-30 Thread Kenneth Knowles
triggers are, by definition, not possible to specify elsewhere than sinks. OTOH today's triggers fundamentally belong to aggregation steps. > On Tue, Apr 30, 2019 at 6:24 PM Kenneth Knowles wrote: > > > To go in the direction of consistency amongst the core SDKs, we could > m

Re: [DISCUSS] Adding GroupByKeyAndSort

2019-04-30 Thread Kenneth Knowles
lob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/CombineTest.java#L655 (etc) Admittedly, it could use more precise and concise formalization on the one hand, and more conceptual description for users, independent of language. Kenn > Viliam > > On Tue, 16 Apr 2019 at

Re: [DISCUSS] Reconciling ValueState in Java and Python (was: [docs] Python State & Timers)

2019-05-01 Thread Kenneth Knowles
") BagState bufferState, >>>>> @StateId("count") BagState countState) { >>>>> >>>>> int count = Iterables.getFirst(countState.read(), 0); >>>>> count = count + 1; >>>>> countState.clear(); >>>>

Re: [DISCUSS] (Forked thread) Beam issue triage & assignees

2019-05-01 Thread Kenneth Knowles
e sure it is in the right component and priority makes sense, and maybe alert someone who might want to know about it. Kenn On Mon, Mar 4, 2019 at 9:23 AM Kenneth Knowles wrote: > This effort to improve our triage is still ongoing. To recall: > > Issues are no longer automatically

Re: [DISCUSS] (Forked thread) Beam issue triage & assignees

2019-05-01 Thread Kenneth Knowles
Needs Triage status > when they create it? > Thanks > -P. > > On Wed, May 1, 2019 at 11:12 AM Kenneth Knowles wrote: > >> An update here: we have the new workflow in place. I have transitioned >> untriaged issues to the "Needs Triage" status so they

Re: Congrats to Beam's first 6 Google Open Source Peer Bonus recipients!

2019-05-01 Thread Kenneth Knowles
Congrats! All well deserved! Kenn On Wed, May 1, 2019 at 8:09 PM Reza Rokni wrote: > Congratulations! > > On Thu, 2 May 2019 at 10:53, Connell O'Callaghan > wrote: > >> Well done - congratulations to you all!!! Rose thank you for sharing this >> news!!! >> >> On Wed, May 1, 2019 at 19:45 Rose

Re: [Discuss] Publishing pre-release artifacts to repositories

2019-05-02 Thread Kenneth Knowles
pes > >>> >> >> > >>> >> >> Ahmet > >>> >> >> > >>> >> >> [1] > https://lists.apache.org/thread.html/f1f342332c1e180f57d60285bebe614ffa77bb53c4f74c4cbc049096@%3Cdev.airflow.apache.org%3E > >>>

Re: [Discuss] Publishing pre-release artifacts to repositories

2019-05-02 Thread Kenneth Knowles
Ah, and here's one on general@incubator specifically about RCs: https://lists.apache.org/thread.html/c4afcf0807d71f844d912a7e5fe6b481f0779bdcf88ccf9abe50a160@%3Cgeneral.incubator.apache.org%3E Kenn On Thu, May 2, 2019 at 8:49 AM Kenneth Knowles wrote: > I'd suggest looking f

Re: [Discuss] Publishing pre-release artifacts to repositories

2019-05-02 Thread Kenneth Knowles
candidate or snapshot > These guidelines were pending assessment by legal & infra. I don't know if there has been an update. It has been a few months. Kenn On Thu, May 2, 2019 at 8:51 AM Kenneth Knowles wrote: > Ah, and here's one on general@incubator specifically about

Re: [Forked] BEAM-4046 (was [PROPOSAL] Introduce beam-sdks-java gradle project)

2019-05-02 Thread Kenneth Knowles
> >>> We may be saying the same thing but wanted to be clear that we only need >>> to override the default that publishing plugin uses to always be >>> "org.apache.beam" instead of defaulting to project.group >>> >>> On Wed, Apr 10,

Re: [DISCUSS][SQL] Providing support for DISTINCT aggregations

2019-05-02 Thread Kenneth Knowles
Meta: All of Beam SQL is still "experimental" isn't it? There's very little chance that the structure of Beam SQL pipelines will be stable enough for e.g. pipeline update. So that is not worth worrying about at this stage. And this doesn't seem to affect APIs / compile time compatibility. As to th

Re: [DISCUSS] Reconciling ValueState in Java and Python (was: [docs] Python State & Timers)

2019-05-02 Thread Kenneth Knowles
en as they seem very familiar with real-world use cases here. Kenn On Thu, May 2, 2019 at 2:53 AM Robert Bradshaw wrote: > On Wed, May 1, 2019 at 8:09 PM Kenneth Knowles wrote: > > > > On Wed, May 1, 2019 at 8:51 AM Reuven Lax wrote: > >> > >> ValueState i

Re: [DISCUSS][SQL] Providing support for DISTINCT aggregations

2019-05-03 Thread Kenneth Knowles
All good points. My version of the two shuffle approach does not work at all. On Fri, May 3, 2019 at 11:38 AM Brian Hulette wrote: > Rui's point about FLOAT/DOUBLE columns is interesting as well. We couldn't > support distinct aggregations on floating point columns with the > two-shuffle approac

Re: Better naming for runner specific options

2019-05-03 Thread Kenneth Knowles
Even though they are in classes named for specific runners, they are not namespaced. All PipelineOptions exist in a global namespace so they need to be careful to be very precise. It is a good point that even though there may be multiple uses for "machine type" they are probably not going to both h

[ANNOUNCE] New committer announcement: Udi Meiri

2019-05-03 Thread Kenneth Knowles
Hi all, Please join me and the rest of the Beam PMC in welcoming a new committer: Udi Meiri. Udi has been contributing to Beam since late 2017, starting with HDFS support in the Python SDK and continuing with a ton of Python work. I also will highlight his work on community-building infrastructur

Re: PardoLifeCycle: Teardown after failed call to setup

2019-05-06 Thread Kenneth Knowles
The specification of TearDown is that it is best effort, certainly. If your runner supports it, then the test is good to make sure there is not a regression. If your runner has partial support, that is within spec. But the idea of the spec is more that you might have such a failure that it is impos
