On Thu, Jan 26, 2017 at 6:58 PM, Kenneth Knowles
wrote:
> On Thu, Jan 26, 2017 at 4:15 PM, Robert Bradshaw <
> rober...@google.com.invalid> wrote:
>
>> On Thu, Jan 26, 2017 at 3:42 PM, Eugene Kirpichov
>> wrote:
>> >
>> > you can't wrap DoFn's, period
>>
>> As a simple example, given a DoFn<I, O> it's
Congrats! Well deserved, Stas
On Jan 27, 2017 7:26, "Frances Perry" wrote:
> Woohoo! Congrats ;-)
>
> On Thu, Jan 26, 2017 at 9:05 PM, Jean-Baptiste Onofré
> wrote:
>
> > Welcome aboard !
> >
> > Regards
> > JB
> >
> > On Jan 27, 2017, at 01:27, Davor Bonaci wrote:
> > >Please join
Hi Eugene
A simple way would be to create a BatchedDoFn in an extension.
WDYT ?
Regards
JB
On Jan 26, 2017, at 21:48, Eugene Kirpichov
wrote:
>I don't think we should make batching a core feature of the Beam
>programming model (by adding it to DoFn as this code snippet implies).
>I'm
Woohoo! Congrats ;-)
On Thu, Jan 26, 2017 at 9:05 PM, Jean-Baptiste Onofré
wrote:
> Welcome aboard !
>
> Regards
> JB
>
> > On Jan 27, 2017, at 01:27, Davor Bonaci wrote:
> >Please join me and the rest of Beam PMC in welcoming the following
> >contributors as our newest committers. They have significantly contributed
Welcome aboard !
Regards
JB
On Jan 27, 2017, at 01:27, Davor Bonaci wrote:
>Please join me and the rest of Beam PMC in welcoming the following
>contributors as our newest committers. They have significantly
>contributed
>to the project in different ways, and we look forward to many more
Congrats!
On Fri, Jan 27, 2017, 06:25 Thomas Weise wrote:
> Congrats!
>
>
> On Thu, Jan 26, 2017 at 7:49 PM, María García Herrero <
> mari...@google.com.invalid> wrote:
>
> > Congratulations and thank you for your contributions thus far!
> >
> > On Thu, Jan 26, 2017 at 6:00 PM, Robert Bradshaw <
Congrats!
On Thu, Jan 26, 2017 at 7:49 PM, María García Herrero <
mari...@google.com.invalid> wrote:
> Congratulations and thank you for your contributions thus far!
>
> On Thu, Jan 26, 2017 at 6:00 PM, Robert Bradshaw <
> rober...@google.com.invalid> wrote:
>
> > Welcome and congratulations!
>
Congratulations and thank you for your contributions thus far!
On Thu, Jan 26, 2017 at 6:00 PM, Robert Bradshaw <
rober...@google.com.invalid> wrote:
> Welcome and congratulations!
>
> On Thu, Jan 26, 2017 at 5:05 PM, Sourabh Bajaj
> wrote:
> > Congrats!!
> >
> > On Thu, Jan 26, 2017 at 5:02 PM
On Thu, Jan 26, 2017 at 4:15 PM, Robert Bradshaw <
rober...@google.com.invalid> wrote:
> On Thu, Jan 26, 2017 at 3:42 PM, Eugene Kirpichov
> wrote:
> >
> > you can't wrap DoFn's, period
>
> As a simple example, given a DoFn<I, O> it's perfectly natural to want
> to "wrap" this as a DoFn<KV<K, I>, KV<K, O>>. State, si
Welcome and congratulations!
On Thu, Jan 26, 2017 at 5:05 PM, Sourabh Bajaj
wrote:
> Congrats!!
>
> On Thu, Jan 26, 2017 at 5:02 PM Jason Kuster
> wrote:
>
>> Congrats all! Very exciting. :)
>>
>> On Thu, Jan 26, 2017 at 4:48 PM, Jesse Anderson
>> wrote:
>>
>> > Welcome!
>> >
>> > On Thu, Jan 2
On Thu, Jan 26, 2017 at 5:04 PM, Eugene Kirpichov
wrote:
> It would be nice to start with an inventory of the batching use cases we
> already have implemented manually, and see what kind of API would be
> sufficient to replace all of them.
Sounds like a good way to make forward progress.
> E.g.:
On Thu, Jan 26, 2017 at 4:20 PM, Ben Chambers
wrote:
> Here's an example API that would make this part of a DoFn. The idea here is
> that it would still be run as `ParDo.of(new MyBatchedDoFn())`, but the
> runner (and DoFnRunner) could see that it has asked for batches, so rather
> than calling a
Congrats!!
On Thu, Jan 26, 2017 at 5:02 PM Jason Kuster
wrote:
> Congrats all! Very exciting. :)
>
> On Thu, Jan 26, 2017 at 4:48 PM, Jesse Anderson
> wrote:
>
> > Welcome!
> >
> > On Thu, Jan 26, 2017, 7:27 PM Davor Bonaci wrote:
> >
> > > Please join me and the rest of Beam PMC in welcoming
It would be nice to start with an inventory of the batching use cases we
already have implemented manually, and see what kind of API would be
sufficient to replace all of them.
E.g.:
- both fixed and dynamic batch size as specified above are insufficient for
what PubsubIO and ElasticsearchIO do (t
Congrats all! Very exciting. :)
On Thu, Jan 26, 2017 at 4:48 PM, Jesse Anderson
wrote:
> Welcome!
>
> On Thu, Jan 26, 2017, 7:27 PM Davor Bonaci wrote:
>
> > Please join me and the rest of Beam PMC in welcoming the following
> > contributors as our newest committers. They have significantly
> > contributed to the project in different ways, and we look forward to many more
Welcome!
On Thu, Jan 26, 2017, 7:27 PM Davor Bonaci wrote:
> Please join me and the rest of Beam PMC in welcoming the following
> contributors as our newest committers. They have significantly contributed
> to the project in different ways, and we look forward to many more
> contributions in the
Please join me and the rest of Beam PMC in welcoming the following
contributors as our newest committers. They have significantly contributed
to the project in different ways, and we look forward to many more
contributions in the future.
* Stas Levin
Stas has contributed across the breadth of the
Here's an example API that would make this part of a DoFn. The idea here is
that it would still be run as `ParDo.of(new MyBatchedDoFn())`, but the
runner (and DoFnRunner) could see that it has asked for batches, so rather
than calling a `processElement` on every input `I`, it assembles a
`Collection`
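Stripped of Beam specifics, the batch-assembly behavior Ben describes can be sketched as a plain-Java buffer that collects elements and hands user code a whole batch at a time; the class and method names here are illustrative, not Beam API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Illustrative sketch (not Beam API): buffers incoming elements and invokes
 * the user's code once per assembled batch instead of once per element.
 */
public class BatchingRunner<I> {
  private final int batchSize;
  private final Consumer<List<I>> processBatch;
  private final List<I> buffer = new ArrayList<>();

  public BatchingRunner(int batchSize, Consumer<List<I>> processBatch) {
    this.batchSize = batchSize;
    this.processBatch = processBatch;
  }

  /** Called once per input element, as processElement would be. */
  public void processElement(I element) {
    buffer.add(element);
    if (buffer.size() >= batchSize) {
      flush();
    }
  }

  /** Must be called at bundle (and window) boundaries so no elements are lost. */
  public void finishBundle() {
    flush();
  }

  private void flush() {
    if (!buffer.isEmpty()) {
      processBatch.accept(new ArrayList<>(buffer));
      buffer.clear();
    }
  }
}
```

The finishBundle() flush is exactly the part that is easy to get wrong when batching is hand-rolled, which is much of the motivation in this thread for building it into the runner.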
On Thu, Jan 26, 2017 at 3:42 PM, Eugene Kirpichov
wrote:
> I agree that wrapping the DoFn is probably not the way to go, because the
> DoFn may be quite tricky due to all the reflective features: e.g. how do
> you automatically "batch" a DoFn that uses state and timers? What about a
> DoFn that us
The third option for batching:
- Functionality within the DoFn and DoFnRunner built as part of the SDK.
I haven't thought through Batching, but at least for the
IntraBundleParallelization use case this actually does make sense to expose
as a part of the model. Knowing that a DoFn supports paralle
On Thu, Jan 26, 2017 at 3:42 PM, Eugene Kirpichov <
kirpic...@google.com.invalid> wrote:
> The class for invoking DoFn's,
> DoFnInvokers, is absent from the SDK (and present in runners-core) for a
> good reason.
>
This would be true if it weren't for that pesky DoFnTester :-)
And even if we solv
I agree that wrapping the DoFn is probably not the way to go, because the
DoFn may be quite tricky due to all the reflective features: e.g. how do
you automatically "batch" a DoFn that uses state and timers? What about a
DoFn that uses a BoundedWindow parameter? What about a splittable DoFn?
What a
On Thu, Jan 26, 2017 at 3:31 PM, Ben Chambers
wrote:
> I think that wrapping the DoFn is tricky -- we backed out
> IntraBundleParallelization because it did that, and it has weird
> interactions with both the reflective DoFn and windowing. We could maybe
> make some kind of "DoFnDelegatingDoFn" th
I think that wrapping the DoFn is tricky -- we backed out
IntraBundleParallelization because it did that, and it has weird
interactions with both the reflective DoFn and windowing. We could maybe
make some kind of "DoFnDelegatingDoFn" that could act as a base class and
get some of that right, but..
On Thu, Jan 26, 2017 at 12:48 PM, Eugene Kirpichov
wrote:
> I don't think we should make batching a core feature of the Beam
> programming model (by adding it to DoFn as this code snippet implies). I'm
> reasonably sure there are less invasive ways of implementing it.
+1, either as a PTransform,
I don't think we should make batching a core feature of the Beam
programming model (by adding it to DoFn as this code snippet implies). I'm
reasonably sure there are less invasive ways of implementing it.
On Thu, Jan 26, 2017 at 12:22 PM Jean-Baptiste Onofré
wrote:
> Agree, I'm curious as well.
>
Agree, I'm curious as well.
I guess it would be something like:
.apply(ParDo.of(new DoFn<I, O>() {
  @Override
  public long batchSize() {
    return 1000;
  }

  @ProcessElement
  public void processElement(ProcessContext context) {
    ...
  }
}));
If batchSize (overridden by the user) returns a p
Thank you Kenn!
Shen
On Thu, Jan 26, 2017 at 1:33 PM, Kenneth Knowles
wrote:
> On Thu, Jan 26, 2017 at 9:48 AM, Thomas Groh
> wrote:
> >
> > The default watermark policy for a bounded source should be negative
> > infinity until all of the data is read, then positive infinity.
>
>
> > Just to elaborate - there isn't a way for a bounded source to communicate a watermark.
On Thu, Jan 26, 2017 at 9:48 AM, Thomas Groh
wrote:
>
> The default watermark policy for a bounded source should be negative
> infinity until all of the data is read, then positive infinity.
Just to elaborate - there isn't a way for a bounded source to communicate a
watermark. Runners each do th
Hi Thomas,
Thanks a lot for explaining.
Best,
Shen
On Thu, Jan 26, 2017 at 12:48 PM, Thomas Groh
wrote:
> The default timestamp should be BoundedWindow.TIMESTAMP_MIN_VALUE, which
> is
> equivalent to -2**63 microseconds. We also occasionally refer to this
> timestamp as "negative infinity".
>
The default timestamp should be BoundedWindow.TIMESTAMP_MIN_VALUE, which is
equivalent to -2**63 microseconds. We also occasionally refer to this
timestamp as "negative infinity".
The default watermark policy for a bounded source should be negative
infinity until all of the data is read, then positive infinity.
First off, let me say that a *correctly* batching DoFn is a lot of
value, especially because it's (too) easy to (often unknowingly)
implement it incorrectly.
My take is that a BatchingParDo should be a PTransform<PCollection<I>,
PCollection<O>> that takes a DoFn<Iterable<I>, ? extends
Iterable<O>> as a parameter, as well as some
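Robert's proposed shape — a transform parameterized by a batch-level function — can be mimicked outside Beam as: slice the input into batches, apply the function to each batch, and flatten the results. A minimal plain-Java sketch (illustrative names, not Beam API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class BatchingTransform {
  /**
   * Applies a batch-level function to fixed-size slices of the input,
   * flattening the per-batch results back into one output list.
   */
  static <I, O> List<O> apply(
      List<I> input, int batchSize,
      Function<List<I>, ? extends Iterable<O>> batchFn) {
    List<O> out = new ArrayList<>();
    for (int i = 0; i < input.size(); i += batchSize) {
      List<I> batch = input.subList(i, Math.min(i + batchSize, input.size()));
      for (O o : batchFn.apply(batch)) {
        out.add(o);
      }
    }
    return out;
  }
}
```

The appeal of this shape is that the user code only ever sees whole batches, so the transform (not the user) owns the tricky buffering and flushing semantics.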
I think relaxing the query to not be an exact match is reasonable. I'm
wondering if it should be substring or regex. Either one preserves the
existing behavior of, when passed a full step path, returning only the
metrics for that specific step, but it adds the ability to just know
approximately the
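To make the substring-vs-regex trade-off concrete, here is a plain-Java sketch; `StepMatcher` and its methods are hypothetical helpers, not existing Beam code:

```java
import java.util.regex.Pattern;

public class StepMatcher {
  /** Substring match: a full step path still matches itself exactly. */
  static boolean matchesSubstring(String stepPath, String query) {
    return stepPath.contains(query);
  }

  /**
   * Regex match: strictly more expressive, but queries must be escaped
   * if they contain regex metacharacters (e.g. the parentheses that
   * commonly appear in step names).
   */
  static boolean matchesRegex(String stepPath, String query) {
    return Pattern.compile(query).matcher(stepPath).find();
  }
}
```

Substring keeps existing exact full-path queries working unchanged; regex adds expressiveness but forces callers to escape metacharacters in step names like ParDo(MyFn).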
Hi Etienne,
Could you post some snippets of how your transform is to be used in a
pipeline? I think that would make it easier to discuss on this thread and
could save a lot of churn if the discussion ends up leading to a different
API.
On Thu, Jan 26, 2017 at 8:29 AM Etienne Chauchot
wrote:
> W
Wonderful !
Thanks Kenn !
Etienne
On 26/01/2017 at 15:34, Kenneth Knowles wrote:
Hi Etienne,
I was drafting a proposal about @OnWindowExpiration when this email
arrived. I thought I would try to quickly unblock you by responding with a
TL;DR: you can achieve your goals with state & timers
Fantastic !
Let me take a look on the Spark runner ;)
Thanks !
Regards
JB
On 01/26/2017 03:34 PM, Kenneth Knowles wrote:
Hi Etienne,
I was drafting a proposal about @OnWindowExpiration when this email
arrived. I thought I would try to quickly unblock you by responding with a
TL;DR: you can achieve your goals with state & timers
Hi Etienne,
I was drafting a proposal about @OnWindowExpiration when this email
arrived. I thought I would try to quickly unblock you by responding with a
TL;DR: you can achieve your goals with state & timers as they currently
exist. You'll set a timer for window.maxTimestamp().plus(allowedLateness).
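The timer arithmetic Kenn describes reduces to a few comparisons on event-time timestamps. A plain-Java sketch with windows and lateness represented as epoch millis (illustrative, not Beam API; whether expiry is a strict or non-strict comparison is an assumption here):

```java
public class WindowExpiration {
  /** For a window covering [start, end), maxTimestamp is the last instant inside it. */
  static long maxTimestamp(long windowEndMillis) {
    return windowEndMillis - 1;
  }

  /**
   * The event-time instant at which buffered state for the window can be
   * cleaned up: maxTimestamp plus the allowed lateness.
   */
  static long expirationTimer(long windowEndMillis, long allowedLatenessMillis) {
    return maxTimestamp(windowEndMillis) + allowedLatenessMillis;
  }

  /**
   * True once the watermark has passed the expiration timer, i.e. the point
   * at which an @OnWindowExpiration-style callback would fire and any
   * buffered elements must be flushed.
   */
  static boolean isExpired(long watermarkMillis, long windowEndMillis,
                           long allowedLatenessMillis) {
    return watermarkMillis > expirationTimer(windowEndMillis, allowedLatenessMillis);
  }
}
```

Setting the timer at maxTimestamp plus allowed lateness guarantees no element can still arrive for the window afterward, so flushing there is safe.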
Ben - yes, there is still some ambiguity regarding the querying of the
metrics results.
You've discussed in this thread the notion that metrics step names should
be converted to unique names when aggregating metrics, so that each step
will aggregate its own metrics, and not join with other steps b
Hi,
I have started to implement this ticket. For now it is implemented as a
PTransform that simply does ParDo.of(new DoFn) and all the processing
related to batching is done in the DoFn.
I'm starting to deal with windows and bundles (starting to take a look
at the State API to process trans-
+1 to what Dan said
On Wed, 25 Jan 2017 at 21:40 Kenneth Knowles wrote:
> +1
>
> On Jan 25, 2017 11:15, "Jean-Baptiste Onofré" wrote:
>
> > +1
> >
> > It sounds good to me.
> >
> > Thanks Dan !
> >
> > Regards
> > JB
> >
> > On Jan 25, 2017, at 19:39, Dan Halperin wrote:
> > >He
It makes sense.
Agreed.
Regards
JB
On 01/25/2017 08:34 PM, Kenneth Knowles wrote:
There's actually not a JIRA filed beyond BEAM-25 for what Eugene is
referring to. Context: Prior to windowing and streaming it was safe to
buffer elements in @ProcessElement and then actually perform output in
@FinishBundle.
On 25/01/2017 at 20:34, Kenneth Knowles wrote:
There's actually not a JIRA filed beyond BEAM-25 for what Eugene is
referring to. Context: Prior to windowing and streaming it was safe to
buffer elements in @ProcessElement and then actually perform output in
@FinishBundle. This pattern is only