Github user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/2345
We can keep calculating the total time in the checkpoint coordinator. That
way, it includes message round trips.
The state handles (or whatever is the successor to them) should have the
s
Github user aljoscha commented on the issue:
https://github.com/apache/flink/pull/2345
@StephanEwen That would probably be good. On a side note, time is
calculated in the `CheckpointCoordinator` as ` - ` right now.
This is somewhat confusing because it does not give the actual
Github user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/2345
How about having two times: The "synchronous" component and the
"asynchronous" component of the time. Plus probably a total for those that
don't want to sum them up themselves.
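The two-component idea above can be sketched as a small value class. This is only an illustration: `CheckpointTimes` and its method names are hypothetical and not an existing Flink class. The total here covers just the two task-side components; an end-to-end time measured in the checkpoint coordinator would additionally include message round trips.

```java
import java.time.Duration;

// Hypothetical holder for per-checkpoint timing; not an existing Flink class.
final class CheckpointTimes {
    private final Duration synchronousPart;   // time the processing thread was blocked
    private final Duration asynchronousPart;  // time spent in the background

    CheckpointTimes(Duration synchronousPart, Duration asynchronousPart) {
        this.synchronousPart = synchronousPart;
        this.asynchronousPart = asynchronousPart;
    }

    Duration synchronousPart() {
        return synchronousPart;
    }

    Duration asynchronousPart() {
        return asynchronousPart;
    }

    // Convenience total for callers that do not want to sum the parts themselves.
    Duration total() {
        return synchronousPart.plus(asynchronousPart);
    }
}
```

Keeping the two parts separate lets a report show how long processing was actually blocked, independent of how long the background work took.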
---
If your proj
Github user uce commented on the issue:
https://github.com/apache/flink/pull/2345
+1 to merge this.
On a related note: we might want to add information to the checkpoint
statistics part of the web frontend saying whether the checkpoints were async
or sync. Users might get con
Github user wenlong88 commented on the issue:
https://github.com/apache/flink/pull/2345
@StephanEwen you are right. But in specific situations, we may need some
temporary compromises to make the system work well, and then remove the
compromises later, as soon as possible.
I
Github user gyfora commented on the issue:
https://github.com/apache/flink/pull/2345
I can see the merits of both checkpointing approaches but Stephan is right
in the sense that allowing semi-async snapshots with dynamic scaling would need
a completely new implementation and would pro
Github user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/2345
We are very much trying to avoid different db instances per operator. Each
db instance will add quite a big new memory footprint. For higher
max-parallelism, this will be a problem.
The
Github user wenlong88 commented on the issue:
https://github.com/apache/flink/pull/2345
Thanks, @aljoscha.
Maybe we can use a different db instance for each key group. This
approach only works well when the number of key groups is limited and
not too large.
Github user aljoscha commented on the issue:
https://github.com/apache/flink/pull/2345
@wenlong88 I was talking about this effort to enable key-group sharding in
Flink: https://issues.apache.org/jira/browse/FLINK-3755
With this it becomes necessary to checkpoint the keyed stat
Github user wenlong88 commented on the issue:
https://github.com/apache/flink/pull/2345
@aljoscha I am curious what the problem or incompatibility is between
semi-async snapshots and key groups. Can you explain some of the background?
Github user wenlong88 commented on the issue:
https://github.com/apache/flink/pull/2345
@StephanEwen
http://rocksdb.org/blog/2609/use-checkpoints-for-efficient-snapshots/
Since sst files are immutable once created, when doing a checkpoint,
rocksdb creates hard links for all live s
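The hard-link idea described in this comment can be illustrated with plain JDK calls. This is a minimal sketch under stated assumptions, not RocksDB's checkpoint implementation: the class `HardLinkSnapshot` and its method are hypothetical, it simply links every regular file in a directory, and it assumes both directories are on the same file system that supports hard links.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustration only: mimics a hard-link snapshot with the JDK file API.
// Hypothetical helper; the real RocksDB feature does more than this.
final class HardLinkSnapshot {

    // Links every regular file in dbDir into snapshotDir and returns the number of links made.
    static int snapshot(Path dbDir, Path snapshotDir) throws IOException {
        Files.createDirectories(snapshotDir);
        int linked = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dbDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    // A hard link adds a second name for the same data; no bytes are copied.
                    Files.createLink(snapshotDir.resolve(file.getFileName()), file);
                    linked++;
                }
            }
        }
        return linked;
    }
}
```

Because the files are immutable, the linked names keep pointing at valid data even after the original names are deleted, which is why the blocking part of such a snapshot can stay short.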
Github user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/2345
@wenlong88 Can you explain this a bit more? How exactly are you creating
the local checkpoint?
Github user wenlong88 commented on the issue:
https://github.com/apache/flink/pull/2345
@aljoscha we use the rocksdb checkpoint mechanism to do the semi-async
checkpoint, which uses hard links to make the checkpoint and costs very little IO and
time in the synchronous phase. This works well even w
Github user gyfora commented on the issue:
https://github.com/apache/flink/pull/2345
Hi,
Isn't this way of checkpointing much, much slower than the semi-async
version?
Github user gyfora commented on the issue:
https://github.com/apache/flink/pull/2345
But I wonder what would happen in a scenario with a lot of states:
Semi async: short local copy time at every snapshot + very fast restore
Fully async: no copy time + very slow restore (put
Github user gyfora commented on the issue:
https://github.com/apache/flink/pull/2345
Good thing about the way fully async checkpoints are restored though is
that it is very trivial to insert some state adaptor code :)
Github user aljoscha commented on the issue:
https://github.com/apache/flink/pull/2345
Jip, that's also good.
Github user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/2345
+1 for this
There seems to be an issue with the RocksDB backup engine, so we should
probably discourage that mode even in current releases.
I would also remove the `HDFSCopyFromL
Github user gyfora commented on the issue:
https://github.com/apache/flink/pull/2345
But you are right, it is probably more important to keep the latency down
for the running programs, and for that the fully async seems to be strictly
better
Github user gyfora commented on the issue:
https://github.com/apache/flink/pull/2345
Some of the benefits we lose on restore. Especially for very large states
this can be pretty serious.
Maybe this is required for the sharding to some extent but I don't see this
as completely
Github user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/2345
The "full async" takes more time but runs completely in the background, so
performs better in most cases than "semi async".
Github user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/2345
@gyfora You mean if the "full async" is slower than the "semi async"?
Github user aljoscha commented on the issue:
https://github.com/apache/flink/pull/2345
Technically, `HDFSCopyFromLocal` and `HDFSCopyToLocal` are now unused.
Should we remove them? They might be useful for some stuff in the future.