I would expect that to be possible as well, yes.
> On 21. Apr 2018, at 17:33, Vishal Santoshi wrote:
>
> >> After the savepoint state has been written, the sink might start new
> >> .in-progress files. These files are not part of the savepoint but renamed
> >> to .pending in close().
> >> On restore all pending files that are part of the savepoint are moved
> >> into final state (and possibly truncated).
>> After the savepoint state has been written, the sink might start new
>> .in-progress files. These files are not part of the savepoint but renamed
>> to .pending in close().
>> On restore all pending files that are part of the savepoint are moved
>> into final state (and possibly truncated). See
>> handlePen
Thank you Fabian,
What is more important (and I think you might have addressed it in your
post, so sorry for being a little obtuse) is that deleting them does not
violate "at-least-once" delivery. If that is definite, then we are fine
with it, though we will test it further.
Thanks and
Hi Vishal, hi Mu,
After the savepoint state has been written, the sink might start new
.in-progress files. These files are not part of the savepoint but renamed
to .pending in close().
On restore all pending files that are part of the savepoint are moved into
final state (and possibly truncated).
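To make the lifecycle concrete, here is a minimal sketch of a BucketingSink
job; the path, bucketer format, batch size and checkpoint interval are only
illustrative, not taken from this thread. The comments trace the state
transitions described above.

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.fs.StringWriter;
    import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
    import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;

    public class BucketingSinkSketch {
      public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Pending files only become final when a checkpoint (or savepoint)
        // completes, so checkpointing must be enabled.
        env.enableCheckpointing(60_000);

        DataStream<String> events = env.fromElements("a", "b", "c"); // stand-in source

        BucketingSink<String> sink = new BucketingSink<>("hdfs:///tmp/sessions");
        sink.setBucketer(new DateTimeBucketer<String>("yyyy-MM-dd"));
        sink.setWriter(new StringWriter<String>());
        sink.setBatchSize(128L * 1024 * 1024); // roll part files at ~128 MB

        // Lifecycle of a part file, per the explanation above:
        //   _part-0-19.in-progress -> currently being written (not in any checkpoint)
        //   _part-0-19.pending     -> closed, waiting for a checkpoint to complete
        //   part-0-19              -> final, after notifyCheckpointComplete or on
        //                             restore; a .valid-length file may accompany it
        //                             when the file system cannot truncate.
        events.addSink(sink);

        env.execute("bucketing-sink-sketch");
      }
    }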
Hi Aljoscha,
Thanks for confirming the fact that Flink doesn't clean up pending files.
Is it safe to clean (remove) all the pending files after cancel (with or
without savepointing) or failover?
If we do that, will we lose some data?
Thanks!
Best,
Mu
On Wed, Feb 21, 2018 at 5:27 AM, Vishal Santoshi
Sorry, but I just wanted to confirm: does the assertion of "at-least-once"
delivery hold true if there is a dangling pending file?
On Mon, Feb 19, 2018 at 2:21 PM, Vishal Santoshi
wrote:
> That is fine, as long as Flink assures at-least-once semantics?
>
> If the contents of a .pending file, through the turbu
That is fine, as long as Flink assures at-least-once semantics?
If the contents of a .pending file, through the turbulence (restarts etc.),
are assured to be in another file, then anything starting with an underscore
("_") will by default be ignored by Hadoop (Hive or MR etc.).
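For what it's worth, the underscore prefix on the non-final files is
configurable on the sink. Here is a small sketch of pinning it down
explicitly; the values mirror what the listings in this thread show, so
treat the exact defaults as an assumption:

    import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;

    class UnderscoreNaming {
      // Keep non-final files underscore-prefixed so Hadoop tools (Hive,
      // MapReduce, ...) ignore them; only the finalized file, e.g. part-0-19,
      // drops the underscore.
      static BucketingSink<String> configure(BucketingSink<String> sink) {
        sink.setInProgressPrefix("_");
        sink.setInProgressSuffix(".in-progress");
        sink.setPendingPrefix("_");
        sink.setPendingSuffix(".pending");
        return sink;
      }
    }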
On Mon, Feb 19, 2018 at 11:03
Hi,
Sorry for the confusion. The framework (Flink) currently does not do any
cleanup of pending files, yes.
Best,
Aljoscha
> On 19. Feb 2018, at 17:01, Vishal Santoshi wrote:
>
> >> You should only have these dangling pending files after a failure-recovery
> >> cycle, as you noticed. My suggestion would be to periodically clean up
> >> older pending files.
>> You should only have these dangling pending files after a
>> failure-recovery cycle, as you noticed. My suggestion would be to
>> periodically clean up older pending files.
A little confused. Is that something the framework should do, or something
we should do as part of a cleanup job?
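In case it helps, here is a rough sketch of what such an external cleanup
job could look like, using the plain Hadoop FileSystem API. The bucket path
and age threshold are made up, and whether a given .pending file is really
safe to delete still depends on the at-least-once question discussed in
this thread:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PendingFileCleanup {
      public static void main(String[] args) throws Exception {
        Path bucket = new Path("hdfs:///tmp/sessions/dt=2018-02-14"); // illustrative
        long maxAgeMs = 24L * 60 * 60 * 1000; // only touch files older than a day

        FileSystem fs = FileSystem.get(new Configuration());
        long now = System.currentTimeMillis();
        for (FileStatus status : fs.listStatus(bucket)) {
          String name = status.getPath().getName();
          boolean oldEnough = now - status.getModificationTime() > maxAgeMs;
          if (name.endsWith(".pending") && oldEnough) {
            // A dangling pending file left over from a failure/cancel; its data
            // should have been re-written elsewhere (at-least-once), so it is
            // only redundant -- but verify that before enabling the delete.
            fs.delete(status.getPath(), false);
          }
        }
        fs.close();
      }
    }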
On Mon, Feb 19, 2018 at 10:47
Hi,
The BucketingSink does not clean up pending files on purpose. In a distributed
setting, and especially with rescaling of Flink operators, it is sufficiently
hard to figure out which of the pending files you actually can delete and which
of them you have to leave because they will get moved
Hi Vishal,
the pending files should indeed get finalized eventually. This happens on
a checkpoint-complete notification. Thus, what you report does not seem right.
Maybe Aljoscha can shed a bit more light on the problem.
In order to further debug the problem, it would be really helpful to get
acce
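For context, the "checkpoint complete notification" is the CheckpointListener
callback. The class below is not the actual BucketingSink code, just a
bare-bones sketch of the hook where the pending-to-final rename conceptually
happens:

    import org.apache.flink.runtime.state.CheckpointListener;

    // A sink that implements CheckpointListener gets this callback once the
    // checkpoint with the given id has been confirmed, and that is the point
    // at which .pending files can safely be moved to their final names.
    class FinalizingSinkSketch implements CheckpointListener {
      @Override
      public void notifyCheckpointComplete(long checkpointId) throws Exception {
        // This is where BucketingSink renames the pending part files recorded
        // in the state of checkpoints up to checkpointId.
      }
    }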
Hi Vishal,
I have the same concern about savepointing in BucketingSink.
As for your question: I think that, before the pending files get cleared in
handleRestoredBucketState, they are finalized in notifyCheckpointComplete:
https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-fi
-rw-r--r-- 3 root hadoop 11 2018-02-14 18:48
/kraken/sessions_uuid_csid_v3/dt=2018-02-14/_part-0-18.valid-length
-rw-r--r-- 3 root hadoop 54053518 2018-02-14 19:15
/kraken/sessions_uuid_csid_v3/dt=2018-02-14/_part-0-19.pending
-rw-r--r-- 3 root hadoop 11 2018-02-14 21:17
/
Without --allowNonRestoredState, on a suspend/resume we do see the length
file along with the finalized file (finalized during resume):
-rw-r--r-- 3 root hadoop 10 2018-02-09 13:57
/vishal/sessionscid/dt=2018-02-09/_part-0-28.valid-length
That does make much more sense.
I guess we sh
This is 1.4, BTW. I am not sure that I am reading this correctly, but the
lifecycle of cancel/resume is 2 steps:
1. Cancel job with SP
closeCurrentPartFile
https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors
What should be the behavior of BucketingSink vis-à-vis state (pending,
in-progress and finalization) when we suspend and resume?
So I did this:
* I had a pipe writing to HDFS that I suspended and resumed using
--allowNonRestoredState, as in I had added a harmless MapOperator
(stateless).
* I see that