https://github.com/apache/flink/pull/11475
On Sat, Mar 21, 2020 at 10:37 AM Jacob Sevart wrote:
Thanks, will do.
I only want the time stamp to reset when the job comes up with no state.
Checkpoint recoveries should keep the same value.
Jacob
On Sat, Mar 21, 2020 at 10:16 AM Till Rohrmann wrote:
Hi Jacob,
if you could create a patch updating the union state metadata
documentation, that would be great. I can help with reviewing and merging
the patch.
If the value stays fixed over the lifetime of the job and you know it
before starting the job, then you could use the config mechanism.
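A minimal sketch of that config mechanism, assuming the submitting process captures the timestamp and passes it in as a job parameter; the parameter name `startupTime` and the job class are illustrative, not from Jacob's setup:

```java
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StartupTimeJob {
    public static void main(String[] args) throws Exception {
        // e.g. submitted with:  flink run ... --startupTime 1584813600000
        ParameterTool params = ParameterTool.fromArgs(args);

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Make the parameters visible to every operator through the ExecutionConfig.
        env.getConfig().setGlobalJobParameters(params);

        // ... build the pipeline; inside a RichFunction's open() an operator can read it back:
        //   ParameterTool p = (ParameterTool) getRuntimeContext()
        //           .getExecutionConfig().getGlobalJobParameters();
        //   long startupTime = p.getLong("startupTime");

        env.execute("startup-time-demo");
    }
}
```

Note that a value captured this way is fixed per submission; as discussed elsewhere in the thread, resubmitting the job (even from an existing checkpoint) would pick up a new value unless it is also persisted somewhere.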
Thanks, makes sense.
What about using the config mechanism? We're collecting and distributing
some environment variables at startup; would it also work to include a
timestamp with that?
Also, would you be interested in a patch to note the caveat about union
state metadata in the documentation?
Jacob
Did I understand you correctly that you use the union state to synchronize
the per-partition state across all operators in order to obtain a global
overview? If this is the case, then this will only work in case of a
failover. Only then are all operators restarted with the union of all
operators' state.
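For context, a minimal sketch of the union-state pattern under discussion, assuming each subtask keeps one startup timestamp as an epoch-millis Long (rather than java.time.Instant) and takes the minimum after a restore; the class and state names are illustrative:

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;

public class StartupTimeTracker<T> extends RichMapFunction<T, T> implements CheckpointedFunction {

    private transient ListState<Long> startupTimes; // union list state, one entry per subtask
    private long startupTime;

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        startupTimes = context.getOperatorStateStore()
                .getUnionListState(new ListStateDescriptor<>("startup-times", Long.class));

        if (context.isRestored()) {
            // Only on a restore does each subtask see the union of all subtasks' entries.
            long earliest = System.currentTimeMillis();
            for (Long t : startupTimes.get()) {
                earliest = Math.min(earliest, t);
            }
            startupTime = earliest;
        } else {
            // Fresh job with no prior state: start the clock now.
            startupTime = System.currentTimeMillis();
        }
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        startupTimes.clear();
        startupTimes.add(startupTime); // keep exactly one entry per subtask
    }

    @Override
    public T map(T value) {
        return value;
    }
}
```

This also shows Till's point: the entries written by the other subtasks only become visible inside initializeState() after a failover or a restart from a checkpoint/savepoint, not while the job is running.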
Thanks! That would do it. I've disabled the operator for now.
The purpose was to know the age of the job's state, so that we could
consider its output in terms of how much context it knows. Regular state
seemed insufficient because partitions might see their first traffic at
different times.
Hi Jacob,
I think you are running into some deficiencies of Flink's union state here.
The problem is that for every entry in your list state, Flink stores a
separate offset (a long value). The reason for this behaviour is that we
use the same state implementation for the union state as well as for
Oh, I should clarify that's 43MB per partition, so with 48 partitions it
explains my 2GB.
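Back-of-the-envelope check using the numbers in this thread: 5.3 million offsets × 8 bytes ≈ 42 MB per partition, and 48 partitions × ~43 MB ≈ 2 GB, which matches the observed _metadata size.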
On Fri, Mar 13, 2020 at 7:21 PM Jacob Sevart wrote:
Running *Checkpoints.loadCheckpointMetadata* under a debugger, I found
something:
*subtaskState.managedOperatorState[0].stateNameToPartitionOffsets("startup-times").offsets.value*
weighs 43MB (5.3 million longs).
"startup-times" is an operator state of mine (union list of
java.time.Instant). I see
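For anyone who wants to reproduce this inspection outside a debugger, a rough sketch; it assumes the Flink 1.10-era two-argument Checkpoints.loadCheckpointMetadata(DataInputStream, ClassLoader) (newer releases take an extra external-pointer argument and return a different type), so treat it as an illustration rather than the exact call:

```java
import java.io.DataInputStream;
import java.io.FileInputStream;

import org.apache.flink.runtime.checkpoint.Checkpoints;

public class MetadataInspector {
    public static void main(String[] args) throws Exception {
        // args[0] = path to a checkpoint/savepoint _metadata file
        try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
            Object metadata =
                    Checkpoints.loadCheckpointMetadata(in, MetadataInspector.class.getClassLoader());
            // Set a breakpoint here (or call the getters of the returned object) to walk
            // the operator states and their per-partition offsets, as described above.
            System.out.println(metadata);
        }
    }
}
```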
Hi Jacob,
As Gordon said, the metadata will contain the ByteStreamStateHandle; when
writing out the ByteStreamStateHandle, Flink will write out the handle name,
which is a path (as you saw). The ByteStreamStateHandle will be created when
the state size is smaller than `state.backend.fs.memory-threshold` (default
is 1024 bytes).
Thanks, I will monitor that thread.
I'm having a hard time following the serialization code, but if you know
anything about the layout, tell me if this makes sense. What I see in the
hex editor is, first, many HDFS paths. Then gigabytes of unreadable data.
Then finally another HDFS path at the end
Hi Jacob,
As I said previously, I am not 100% sure what could be causing this
behavior, but there is a related thread here:
https://lists.apache.org/thread.html/r3bfa2a3368a9c7850cba778e4decfe4f6dba9607f32addb69814f43d%40%3Cuser.flink.apache.org%3E
where you can re-post your problem and monitor for a response.
Kostas and Gordon,
Thanks for the suggestions! I'm on RocksDB. We don't have that setting
configured, so it should be at the default of 1024 bytes. This is the full
"state.*" section shown in the JobManager UI.
[image: Screen Shot 2020-03-04 at 9.56.20 AM.png]
Jacob
On Wed, Mar 4, 2020 at 2:45 AM Tzu-Li (Gordon) Tai wrote:
Hi Jacob,
Apart from what Klou already mentioned, one other possible reason:
If you are using the FsStateBackend, it is also possible that your state is
small enough to be stored inline within the metadata file.
That is governed by the "state.backend.fs.memory-threshold" config option.
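A small sketch of where that threshold is set, assuming a programmatic local setup purely for illustration; on a real cluster the same key normally lives in flink-conf.yaml, and the 1024-byte figure is the default mentioned elsewhere in the thread:

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MemoryThresholdDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // State chunks smaller than this many bytes are embedded directly in the
        // checkpoint's _metadata file instead of being written as separate files.
        conf.setString("state.backend.fs.memory-threshold", "1024");

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.createLocalEnvironment(1, conf);
        // ... define and execute the job with env ...
    }
}
```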
Hi Jacob,
Could you specify which StateBackend you are using?
The reason I am asking is that, from the documentation in [1]:
"Note that if you use the MemoryStateBackend, metadata and savepoint
state will be stored in the _metadata file. Since it is
self-contained, you may move the file and restore from any location."
Per the documentation:
"The meta data file of a Savepoint contains (primarily) pointers to all
files on stable storage that are part of the Savepoint, in form of absolute
paths."
I somehow have a _metadata file that's 1.9GB. Running *strings* on it, I
find 962 strings, most of which look like HDFS paths.