Hi Ken,
> I've got a Flink MiniCluster with 12 slots. Even with only 6 pipelined
> operators, each with a parallelism of 1, it still hangs while starting.
>
Could you double-check that the MiniCluster has 12 slots when each of your
operators has a parallelism of only 1?
I've looked into the code. Curr…
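(For reference, a minimal sketch of pinning the MiniCluster slot count in a
local environment; the numbers here are illustrative, not taken from Ken's
actual setup:)

import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.TaskManagerOptions;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

Configuration config = new Configuration();
// One local TM with 12 slots; 6 operators at parallelism 1 fit even if
// every operator were to end up in its own slot sharing group.
config.setInteger(TaskManagerOptions.NUM_TASK_SLOTS, 12);
StreamExecutionEnvironment env =
    StreamExecutionEnvironment.createLocalEnvironment(1, config);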
Hi, Ken
From the RM perspective, could you share the following logs:
- "Request slot with profile {} for job {} with allocation id {}.".
- "Requesting new slot [{}] and profile {} with allocation id {} from
resource manager."
This will help figure out how many slots your job actually requests.
A
Hi Till,
> On Sep 3, 2020, at 12:31 AM, Till Rohrmann wrote:
>
> Hi Ken,
>
> I believe that we don't have much, if any, explicit logging about the slot
> sharing group in the code. You can, however, learn about it indirectly by
> looking at the required number of AllocatedSlots in the SlotPool…
Heidy, which state backend are you using? With RocksDB, Flink has to do
ser/de on every access and update, but with the FsStateBackend your
sparse matrix will sit in memory and only has to be serialized during
checkpointing.
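(A minimal sketch of switching to the heap-based backend; the checkpoint
path is a placeholder:)

import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env =
    StreamExecutionEnvironment.getExecutionEnvironment();
// Live state stays as objects on the heap; serialization only happens
// when a checkpoint is written out.
env.setStateBackend(new FsStateBackend("s3://my-bucket/checkpoints"));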
David
On Wed, Sep 9, 2020 at 2:41 PM Heidi Hazem Mohamed wrote:
Good news!
Eliminating
bsEnv.setStateBackend(
    new RocksDBStateBackend(
        "s3://flink-jdbc-test_graph-minio_1/data/checkpoints:9000",
        true));
moving all configuration into FLINK_PROPERTIES and switching to http seemed
to do the trick!
Thanks for all the help!
On Wed, Sep 9, 2020 at 9…
Arti,
The problem with watermarks and the File source operator will be fixed in
1.11.2 [1]. This bug was introduced in 1.10.0 and isn't related to the new
WatermarkStrategy API.
[1] https://issues.apache.org/jira/browse/FLINK-19109
David
On Wed, Sep 9, 2020 at 2:52 PM Arti Pande wrote:
> Hi
Thanks, y'all.
Yangze,
> I've tried to use MinIO as state backend and everything seems works well
For clarity, I'm using the RocksDB state backend with MinIO as the state
storage.
> s3.endpoint: http://localhost:9000
Also, I'm doing everything with docker-compose, so localhost isn't going to
work in my case.
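(A sketch of what the flink-conf.yaml settings might look like from inside
docker-compose, assuming the MinIO container's service name is `minio`;
the name and credentials are placeholders:)

s3.endpoint: http://minio:9000
s3.access-key: <access-key>
s3.secret-key: <secret-key>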
Generally +1.
The one use case of union state I've seen in production (outside of
sources and sinks) is as a "poor man's" broadcast state. That was
obviously before the broadcast state feature was added, which is now a
few years ago, so I don't know if those pipelines still exist. FWIW, if
they do, the stat…
+1 to getting rid of non-keyed state as it is in general, and of union
state in particular. I had a hard time wrapping my head around the
semantics of non-keyed state when designing rescaling for unaligned
checkpoints.
The only plausible use cases are legacy sources and sinks. Both should
also be rewor…
Hi Rex,
you could also check the end-to-end tests that use MinIO in Flink's repo.
You definitely need to use an http endpoint.
The setup [1] also uses another way to specify s3.path.style.access
(with dashes). I think we needed it especially for Presto. It seems like
the settings differ a bit…
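(For reference, the two spellings I've seen in Flink setups; hedging here,
since the exact key depends on which filesystem plugin is on the classpath:)

s3.path.style.access: true    # flink-s3-fs-hadoop
s3.path-style-access: true    # flink-s3-fs-presto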
Hi Devs,
@Users: I'm cc'ing the user ML to see if there are any users that are
relying on this feature. Please comment here if that is the case.
I'd like to discuss the deprecation and eventual removal of UnionList
Operator State, aka Operator State with Union Redistribution. If you
don't kn…
Pankaj,
I just checked, and the latency metrics for SocketWindowWordCount show up
just fine for me with Flink 1.11.1. For me, the latency metrics existed
even before I provided any data on the socket for the job to process. This
makes sense, as the latency tracking markers will propagate through t…
Hi Xingbo and Till,
Thank you for your help!
On Wed, Sep 2, 2020 at 9:38 PM Xingbo Huang wrote:
> Hi Manas,
>
> As Till said, you need to check whether the execution environment used is
> LocalStreamEnvironment. You need to get the class object corresponding to
> the underlying Java object th…
Hi David,
Thanks for replying! Sorry, I forgot to mention I am using Flink Version:
1.11.1, Commit ID: 7eb514a.
Is it possible that the default SocketWindowWordCount job is too simple to
generate latency metrics? Or that the latency metrics disappear from the
output JSON when the data ingestion is…
Hi Aljoscha,
By "checkpoints do not work" what I mean is ever since Flink 1.9.2 till
1.11.1 when using File source the source operator (guessing split
enumerator or metadata reader) finishes immediately after starting (and
assigning the splits to split readers) hence when first checkpoint is
trigg
Hi Josson,
Thanks for getting back.
What are the JVM settings, and in particular the GC settings, that you are
using (G1GC?)?
It could also be that in 1.4 you were just slightly below the threshold of
GC issues, while in 1.8 something is using a bit more memory, causing the
GC issues to appea…
Hi Walther,
Many thanks for your answer. I declared the state type as below:
ValueStateDescriptor<SparseBinaryMatrix> descriptor =
    new ValueStateDescriptor<>(
        "Rating Matrix",
        TypeInformation.of(new TypeHint<SparseBinaryMatrix>() {}));
Is there a better way?
Regards,
Heidy
Hi Hazem,
I guess your performance is mostly driven by the serialization overhead
in this case. How do you declare your state type?
Flink comes with different serializers. Not all of them are extracted
automatically when using reflective extraction methods:
- Note that `Serializable` decla…
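(To make that concrete, a sketch of one way to avoid the generic Kryo
fallback, assuming the sparse matrix can be flattened into a primitive
long[] bitset; the state name and type here are illustrative:)

import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.PrimitiveArrayTypeInfo;

// long[] uses Flink's built-in primitive-array serializer, which is much
// cheaper per access than Kryo serializing an arbitrary custom class.
ValueStateDescriptor<long[]> descriptor =
    new ValueStateDescriptor<>(
        "Rating Matrix Bits",
        PrimitiveArrayTypeInfo.LONG_PRIMITIVE_ARRAY_TYPE_INFO);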
Dear all,
I am writing a Flink program (a recommender system) that needs a matrix
as state, namely the rating matrix. Since the matrix is very sparse, I
implemented a sparse binary matrix to save memory, storing only the ones
rather than the whole matrix; I use it as a data type and save it in a
ValueState…
Hey Juha!
I agree that we cannot reasonably expect the majority of users to
understand block sizes, arena sizes, etc. to get their application
running. So the default should be "inform when there is a problem and
suggest using more memory." Block/arena size tuning is for the absolute
experts,…
Pankaj,
The Flink web UI doesn't do any visualization of histogram metrics, so the
only way to access the latency metrics is either through the REST API or a
metrics reporter.
The REST endpoint you tried is the correct place to find these metrics in
all recent versions of Flink, but somewhere ba…
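(For reference, latency metrics are registered as job-scoped metrics, so
listing them should look roughly like the following; the job id is a
placeholder, and the exact metric names vary by version:)

curl "http://localhost:8081/jobs/<job-id>/metrics"
# latency entries show up with names starting with "latency.source_id...."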
Flink's built-in JDBC connector will read the data only once; JDBC does
not provide a means to continuously monitor a database table.
It depends on the size of your database: if your parameter table is small,
it might be sufficient to write a simple Flink connector that
periodically reads the table…
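(A minimal sketch of such a periodic-read source; the JDBC URL, query, and
interval are placeholders:)

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

public class PeriodicJdbcSource extends RichSourceFunction<String> {
    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        while (running) {
            try (Connection conn =
                     DriverManager.getConnection("jdbc:postgresql://db-host/db");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT param FROM parameters")) {
                while (rs.next()) {
                    // Emit under the checkpoint lock so output and
                    // checkpoint state stay consistent.
                    synchronized (ctx.getCheckpointLock()) {
                        ctx.collect(rs.getString("param"));
                    }
                }
            }
            Thread.sleep(60_000L); // re-read once a minute
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}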
Hey Yun,
About the docs: I saw this in the large state tuning page
(https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/large_state_tuning.html):
"An advanced option (expert mode) to reduce the number of MemTable flushes in
setups with many states, is to tune RocksDB’s ColumnFamily options (arena bl…
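(For what it's worth, that tuning is wired in through a
RocksDBOptionsFactory; a minimal sketch, with a made-up arena block size
that is not a recommendation:)

import java.util.Collection;
import org.apache.flink.contrib.streaming.state.RocksDBOptionsFactory;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

public class ArenaTuningOptionsFactory implements RocksDBOptionsFactory {
    @Override
    public DBOptions createDBOptions(
            DBOptions currentOptions, Collection<AutoCloseable> handlesToClose) {
        return currentOptions;
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(
            ColumnFamilyOptions currentOptions,
            Collection<AutoCloseable> handlesToClose) {
        // Smaller arena blocks mean less memory held per column family.
        return currentOptions.setArenaBlockSize(1 << 20); // 1 MiB, made up
    }
}

// Registered on the backend, e.g.:
// rocksDbBackend.setRocksDBOptions(new ArenaTuningOptionsFactory());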
Thanks for driving this; it is a great find and a nice proposal for a
solution.
I generally really like the idea of the block size sanity checker.
I would also suggest first going with logging a big fat WARNING rather
than crashing the job. Crashing the job like this would be an
unrecoverable fa…
Hi Timo,
Thank you for the suggestions.
I see now that neither the Process function nor the CEP approach will fit.
If I follow the third approach and stream the values from the database, is
it possible to stream the data continuously?
With the approach below, I see only a one-time load, not cont…
> The job sub directory will be cleaned up when the job
> finished/canceled/failed.
Since we can submit multiple jobs into a Flink session, what I mean is
that when a job reaches a terminal state, the sub-node (e.g.
/flink/application_/running_job_registry/4d255397c7aeb5327adb567238c983c1)
on th…