That would be one way of doing it, yes...
On Thu, Apr 30, 2015 at 10:23 PM, Gyula Fóra wrote:
> Okay, so the commit would be something like:
>
> commitState(OperatorState state)
>
>
> On Thu, Apr 30, 2015 at 10:17 PM, Stephan Ewen wrote:
>
> > I think your assumption (and the current kafka sour
Okay, so the commit would be something like:
commitState(OperatorState state)
On Thu, Apr 30, 2015 at 10:17 PM, Stephan Ewen wrote:
> I think your assumption (and the current kafka source implementation) is
> that there is one state object that you update/mutate all the time.
>
> If you draw a
Okay, sounds very reasonable :)
On Thu, Apr 30, 2015 at 10:15 PM, Stephan Ewen wrote:
> For the variant with the "run()" method, this requires strong assumptions
> about the internal behavior of the source.
>
> Unless I am overlooking something, the source needs to guarantee this:
>
> - It need
I think your assumption (and the current kafka source implementation) is
that there is one state object that you update/mutate all the time.
If you draw a snapshot state object at the time of checkpoint, the source
can continue and that particular offset is remembered as the state of this
checkpoi
For the variant with the "run()" method, this requires strong assumptions
about the internal behavior of the source.
Unless I am overlooking something, the source needs to guarantee this:
- It needs to lock internally and perform the state update and record emit
call inside the locked scope
-
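The locking requirement described above can be sketched in plain Java. This is only an illustration of the pattern, not Flink's actual API; all names (SketchSource, processRecord, snapshot) are made up for the example. The point is that the state update and the record emit happen inside one locked scope, and the checkpoint takes the same lock, so it can never observe state and output out of sync.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the locking pattern: the source holds a lock while it updates
// its offset state and emits the record, so a checkpoint (which takes the
// same lock) always sees a consistent pair. Names are illustrative only.
public class SketchSource {
    private final Object checkpointLock = new Object();
    private long offset = 0;                        // mutable operator state
    private final List<String> output = new ArrayList<>();

    // Called by the source's run loop for each record.
    public void processRecord(String record) {
        synchronized (checkpointLock) {
            offset++;                               // state update ...
            output.add(record);                     // ... and emit, atomically
        }
    }

    // Called at checkpoint time; sees offset and output in sync.
    public long snapshot() {
        synchronized (checkpointLock) {
            return offset;
        }
    }

    public List<String> getOutput() {
        return output;
    }
}
```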
Regarding the commits (for instance kafka offset):
I don't exactly get how you mean to do this: if the source continues
processing after the checkpoint and before the commit, it will not know
what state has been committed exactly, so it would need to know the time of
checkpoint and store a local co
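One common way around the problem raised here is exactly what the message hints at: at snapshot time the source remembers the offset under the checkpoint id, and when the commit for that checkpoint arrives later it looks that offset up instead of using the (already advanced) live offset. A minimal sketch, with made-up names (PendingCommits, advance, commit), not the actual Kafka source code:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: remember the offset per checkpoint id at snapshot time, so the
// later commit call knows exactly what state was checkpointed, even though
// the source kept processing in the meantime. Names are illustrative only.
public class PendingCommits {
    private long liveOffset = 0;
    private final Map<Long, Long> pending = new HashMap<>();

    public void advance() {
        liveOffset++;                       // source keeps consuming records
    }

    // Called when checkpoint `id` is taken.
    public void snapshot(long id) {
        pending.put(id, liveOffset);
    }

    // Called when checkpoint `id` is acknowledged; returns the offset to
    // commit (e.g. to Kafka), or -1 if the checkpoint id is unknown.
    public long commit(long id) {
        Long o = pending.remove(id);
        return o == null ? -1 : o;
    }
}
```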
Hi,
The only thing we need is to guarantee that the source will not output any
records or update the state while we take the snapshot and send the
barrier. There are multiple ways of doing this I guess. We could simply
lock on these objects for instance or add the methods you wrote. If we
lock, we
Thanks for the comments!
Concerning acknowledging the checkpoint:
The sinks definitely need to acknowledge it.
If we asynchronously write the state of operator (and emit downstream
barriers before that is complete),
then I think that we also need those operators to acknowledge the
checkp
I agree with all suggestions, thanks for summing it up Stephan.
A few more points I have in mind at the moment:
- Regarding the acknowledgements, indeed we don’t need to make all operators
commit back, we just have to make sure that all sinks have acknowledged a
checkpoint to consider it comple
There is already support for inflating compressed files, and I introduced logic
to handle unsplittable formats.
Sent from my iPhone
> On 30.04.2015, at 19:39, Stephan Ewen wrote:
>
> I think that would be very worthwhile :-) Happy to hear that you want to
> contribute that!
>
> Decorating the i
I was looking into the handling of state in streaming operators, and it is
a bit hidden from the system
Right now, functions can (if they want) put some state into their context.
At runtime, state may occur or not. Before runtime, the system cannot tell
which operators are going to be stateful, an
Hi all!
I think we need to change the interface of the streaming source function.
The function currently has simply a run() method where it does its work,
until canceled.
With this, it is hard to write sources, where the state and the snapshot
barriers are exactly aligned.
When performing the ch
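One way to make the alignment possible is to let the runtime, not the function, drive the loop: instead of a single run() that owns the thread until canceled, the source emits one record per call, and snapshots happen between calls. The interface below is purely a hypothetical sketch of that idea, not the actual Flink proposal or API; SteppableSource, emitNext, and CountingSource are all invented names.

```java
// Hypothetical sketch: the runtime calls emitNext() repeatedly, so it can
// take a snapshot between calls and the barrier is exactly aligned with the
// state. Not the real Flink interface, just an illustration of the idea.
public interface SteppableSource<T> {

    interface Collector<T> {
        void collect(T record);
    }

    // Emit at most one record; return false when the source is exhausted.
    boolean emitNext(Collector<T> out) throws Exception;

    // Called between emitNext() calls, so state and barriers align.
    Object snapshotState();
}

// A toy implementation emitting the integers 0..limit-1.
class CountingSource implements SteppableSource<Integer> {
    private int next = 0;
    private final int limit;

    CountingSource(int limit) {
        this.limit = limit;
    }

    public boolean emitNext(Collector<Integer> out) {
        if (next >= limit) {
            return false;
        }
        out.collect(next++);
        return true;
    }

    public Object snapshotState() {
        return next;    // the state is simply the next value to emit
    }
}
```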
I think that would be very worthwhile :-) Happy to hear that you want to
contribute that!
Decorating the input stream sounds like a great approach and would also
work for other compression formats.
The other thing that needs to be taken into account is that GZIP files are
not splittable in the sa
Sounds good to me =)
- Henry
On Thu, Apr 30, 2015 at 2:29 AM, Maximilian Michels wrote:
> Hi everyone,
>
> As of today [0], the ASF officially offers Git-based repositories for the
> project websites. I filed a JIRA [1] to get us a Git repository for our
> website. I would assume that everyone li
Thank you for the update.
I think it's best if we wait for you to come back because you know all the
details of the migration ;)
On Thu, Apr 30, 2015 at 6:07 PM, Maximilian Michels wrote:
> I transferred the website files to the new Git repository. I changed the
> website to build to the "conten
I transferred the website files to the new Git repository. I changed the
website to build to the "content" instead of the "site" directory, renamed
the files, and adapted the "How to contribute" guide. I pushed everything
to the "asf-site" branch (has to be in that branch, master is not possible
at
Hi everyone,
I just recently came across a use-case where I needed to read gzip files and
handle byte order marks transparently. I know that gzip can be read with Hadoop
input formats but that did not work for me since I wanted to reuse my existing
custom Flink input formats.
It turned out tha
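The "decorate the input stream" approach mentioned in this thread can be sketched with plain JDK classes: wrap the raw file stream in a GZIPInputStream and then strip a UTF-8 byte order mark with a PushbackInputStream, so the downstream record reader never sees either. The class and method names here are made up for illustration and are not the actual Flink input format API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Sketch of stream decoration: decompress with GZIPInputStream, then skip a
// leading UTF-8 BOM (EF BB BF) if present. Names are illustrative only.
public class DecoratedStream {

    public static InputStream decorate(InputStream raw) throws IOException {
        InputStream in = new GZIPInputStream(raw);
        PushbackInputStream pb = new PushbackInputStream(in, 3);

        // Read up to 3 bytes (GZIPInputStream may return them in pieces).
        byte[] head = new byte[3];
        int n = 0;
        while (n < 3) {
            int r = pb.read(head, n, 3 - n);
            if (r == -1) {
                break;
            }
            n += r;
        }

        boolean isBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        // Not a BOM: push the bytes back so the reader sees them.
        if (!isBom && n > 0) {
            pb.unread(head, 0, n);
        }
        return pb;
    }
}
```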
Theodore Vasiloudis created FLINK-1965:
--
Summary: Implement the Orthant-wise Limited Memory QuasiNewton
optimization algorithm, a variant of L-BFGS that handles L1 regularization
Key: FLINK-1965
URL: https://
Robert Metzger created FLINK-1964:
-
Summary: Rework TwitterSource to use a Properties object instead
of a file path
Key: FLINK-1964
URL: https://issues.apache.org/jira/browse/FLINK-1964
Project: Flink
Fabian Hueske created FLINK-1963:
Summary: Improve distinct() transformation
Key: FLINK-1963
URL: https://issues.apache.org/jira/browse/FLINK-1963
Project: Flink
Issue Type: Improvement
So, would it be a reasonable solution to just modify the
WindowUtils.isParallelPolicy() method to return false in case of "eviction
instanceof CountEvictionPolicy && trigger instanceof TimeTriggerPolicy" ?
2015-04-30 12:21 GMT+02:00 Gyula Fóra :
> I'm referring to Peter's problem. If we just crea
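The proposed fix can be expressed as a small predicate. The real WindowUtils.isParallelPolicy() and the policy classes live in Flink's streaming windowing code; the stand-in types below are invented just to illustrate the extra check, with the existing logic elided.

```java
// Abstract sketch of the suggested fix: veto the parallel path for the
// count-eviction/time-trigger combination. The policy types here are
// stand-ins for the real Flink classes, which are not reproduced.
public class WindowPolicyCheck {

    interface EvictionPolicy {}
    interface TriggerPolicy {}
    static class CountEvictionPolicy implements EvictionPolicy {}
    static class TimeTriggerPolicy implements TriggerPolicy {}

    static boolean isParallelPolicy(TriggerPolicy trigger, EvictionPolicy eviction) {
        // Count-based eviction under a time trigger must stay serial:
        // each parallel discretizer would otherwise count only its own
        // subset of the records, breaking the window length.
        if (eviction instanceof CountEvictionPolicy
                && trigger instanceof TimeTriggerPolicy) {
            return false;
        }
        // ... the existing checks would follow here (elided).
        return true;
    }
}
```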
I'm referring to Peter's problem. If we just create more count discretizers
it doesn't really break the semantics given the network guarantees but it
is not very intuitive.
On Thursday, April 30, 2015, Aljoscha Krettek wrote:
> @Gyula: Are you referring to the pre-aggregator or the thing Peter
>
Great! :)
Regards.
Chiwan Park (Sent with iPhone)
> On Apr 30, 2015, at 6:52 PM, Fabian Hueske wrote:
>
> excellent! :-)
>
> 2015-04-30 11:47 GMT+02:00 Stephan Ewen :
>
>> git for the win!
>>
>> On Thu, Apr 30, 2015 at 11:39 AM, Robert Metzger
>> wrote:
>>
>>> Great, thank you for taking
Vasia Kalavri created FLINK-1962:
Summary: Add Gelly Scala API
Key: FLINK-1962
URL: https://issues.apache.org/jira/browse/FLINK-1962
Project: Flink
Issue Type: Task
Components: Gell
@Gyula: Are you referring to the pre-aggregator or the thing Peter mentioned?
On Thu, Apr 30, 2015 at 11:12 AM, Gyula Fóra wrote:
> The problem is in the WindowUtils.isParallel policy method. It makes count
> policies automatically parallel as well.
>
> On Thursday, April 30, 2015, Aljoscha Krett
excellent! :-)
2015-04-30 11:47 GMT+02:00 Stephan Ewen :
> git for the win!
>
> On Thu, Apr 30, 2015 at 11:39 AM, Robert Metzger
> wrote:
>
> > Great, thank you for taking care of this.
> >
> > On Thu, Apr 30, 2015 at 11:29 AM, Maximilian Michels
> > wrote:
> >
> > > Hi everyone,
> > >
> > > As
git for the win!
On Thu, Apr 30, 2015 at 11:39 AM, Robert Metzger
wrote:
> Great, thank you for taking care of this.
>
> On Thu, Apr 30, 2015 at 11:29 AM, Maximilian Michels
> wrote:
>
> > Hi everyone,
> >
> > As of today [0], the ASF officially offers Git-based repositories for the
> > project
Great, thank you for taking care of this.
On Thu, Apr 30, 2015 at 11:29 AM, Maximilian Michels wrote:
> Hi everyone,
>
> As of today [0], the ASF officially offers Git-based repositories for the
> project websites. I filed a JIRA [1] to get us a Git repository for our
> website. I would assume t
Hi everyone,
As of today [0], the ASF officially offers Git-based repositories for the
project websites. I filed a JIRA [1] to get us a Git repository for our
website. I would assume that everyone likes the idea of switching to Git.
If not, please raise your objections.
The new repository is alre
The problem is in the WindowUtils.isParallelPolicy() method. It makes count
policies automatically parallel as well.
On Thursday, April 30, 2015, Aljoscha Krettek wrote:
> Hi,
> no, I think the two are unrelated. But that's another problem we need
> to tackle then.
>
> Cheers,
> Aljoscha
>
> On T
There was an attempt to build such a queue during the Dopa project when
Flink was still Stratosphere.
It could be a good idea to collect the lessons learned from it, good and bad,
to start designing the new scheduler :)
On Thu, Apr 30, 2015 at 10:08 AM, Stephan Ewen wrote:
> Most component
@Fabian: Small misunderstanding :) I suggested to get rid of the multi-user
mode in a standalone setup because Flink's support is not sufficient.
Implementing proper support, as Stephan mentioned, is some work.
It's already done in YARN.
IMO we can save a lot of time, lines of code, and bra
Hi,
no, I think the two are unrelated. But that's another problem we need
to tackle then.
Cheers,
Aljoscha
On Thu, Apr 30, 2015 at 9:15 AM, Szabó Péter wrote:
> Hey,
>
> our intern, Pablo pointed out that there is some problem with mixed
> windowing policies. When you write
> ...
> .window(C
Most components are written multi-job aware.
The only thing that is not in there right now is scheduling policies for
fair resource sharing. This is important in shared clusters.
Since YARN implements all those things (various job queues with different
priorities/policies etc), I suggest to not t
Hey,
our intern, Pablo, pointed out that there is some problem with mixed
windowing policies. When you write
...
.window(Count ...)
.every(Time ...)
.mapWindow(...)
...
The result makes no sense, as the window is not of the specified length.
Maybe, there is some conflict between Time and