Hi,
does the python API expose some kind of mapGroupsWithState operator which
can be applied on a window to handle arbitrary state?
I want to perform time-series anomaly detection with a streaming
implementation of the matrix profile algorithm using
https://github.com/TDAmeritrade/stumpy T
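Purely as an illustration of what I'm after, here is a rough sketch, assuming a recent PyFlink version whose DataStream API exposes KeyedProcessFunction with keyed state; the class name and the anomaly scoring below are placeholders, not an actual matrix profile computation:
```
# Rough sketch only: assumes a PyFlink version whose DataStream API exposes
# KeyedProcessFunction with keyed state. The scoring logic is a placeholder.
from pyflink.common import Types
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.functions import KeyedProcessFunction, RuntimeContext
from pyflink.datastream.state import ListStateDescriptor


class AnomalyDetector(KeyedProcessFunction):

    def open(self, runtime_context: RuntimeContext):
        # Arbitrary per-key state, roughly what I'd use mapGroupsWithState for.
        self.window = runtime_context.get_list_state(
            ListStateDescriptor("window", Types.DOUBLE()))

    def process_element(self, value, ctx):
        key, reading = value
        buffered = list(self.window.get() or []) + [reading]
        buffered = buffered[-256:]  # keep a bounded window per key
        self.window.update(buffered)
        # Placeholder score; this is where stumpy would be applied to `buffered`.
        yield key, max(buffered) - min(buffered)


env = StreamExecutionEnvironment.get_execution_environment()
env.from_collection(
        [("sensor-1", 1.0), ("sensor-1", 9.5)],
        type_info=Types.TUPLE([Types.STRING(), Types.DOUBLE()])) \
    .key_by(lambda v: v[0]) \
    .process(AnomalyDetector(),
             output_type=Types.TUPLE([Types.STRING(), Types.DOUBLE()])) \
    .print()
env.execute("anomaly-detection-sketch")
```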
> The job sub directory will be cleaned up when the job
> finished/canceled/failed.
What does this mean?
Also, to clarify: I'm a very sloppy developer. My jobs crash ALL the time...
and yet, the jobs would ALWAYS resume from the last checkpoint.
The only cases where I expect Flink to clean u
AFAIK, the HA data, including ZooKeeper metadata and real data on DFS,
will only be cleaned up
when the Flink cluster reaches a terminated state.
So if you are using a session cluster, the root cluster node on Zk will be
cleaned up after you manually
stop the session cluster. The job sub directory
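For reference, the settings that determine where that root node and the per-job sub-directories live look roughly like this (the quorum address, paths, and cluster id below are just placeholders):
```
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-1:2181,zk-2:2181
high-availability.zookeeper.path.root: /flink
high-availability.cluster-id: /my-session-cluster
high-availability.storageDir: hdfs:///flink/ha/
```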
Hi, Rex,
I've tried to use MinIO as the state backend and everything seems to work well.
Just sharing my configuration:
```
s3.access-key:
s3.secret-key:
s3.endpoint: http://localhost:9000
s3.path.style.access: true
state.checkpoints.dir: s3://flink/checkpoints
```
I think the problem might be caused b
Hello!
I'm trying to test out MinIO as a state storage backend using docker-compose
on my local machine but keep running into errors that seem strange to me.
Any help would be much appreciated :)
The problem:
With the following environment:
environment:
  - |
    FLINK_PROPERTIES=
    jobmanager.rpc.address
Thanks a bunch!
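For anyone reading along, a sketch of how the jobmanager service could pass the MinIO settings through FLINK_PROPERTIES; the service names, image tag, and credentials below are placeholders, and ENABLE_BUILT_IN_PLUGINS is, as far as I know, how the official image loads the bundled S3 filesystem plugin:
```
  jobmanager:
    image: flink:1.11.2-scala_2.11
    command: jobmanager
    ports:
      - "8081:8081"
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager
        s3.access-key: minio
        s3.secret-key: minio123
        s3.endpoint: http://minio:9000
        s3.path.style.access: true
        state.checkpoints.dir: s3://flink/checkpoints
      - ENABLE_BUILT_IN_PLUGINS=flink-s3-fs-presto-1.11.2.jar
```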
>For example, the Flink Kafka source operator's parallel instances maintain
as operator state a mapping of partitions to offsets for the partitions
that it is assigned to.
This I think clarifies things. This is literally state for the operator to
do its job, not really row data. T
Hello,
How do I visualize (or extract) the results for Latency Tracking for a
Flink local cluster? I set "metrics.latency.interval 100" in the
conf/flink-conf.yaml file, and started the cluster and
SocketWindowWordCount job. However, I could not find the latency
distributions anywhere in the web U
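For reference, this is roughly the configuration I think would be needed if a metrics reporter is also wired up so the latency histograms end up somewhere visible; the Slf4j reporter below is just one option, and I have not verified this yet:
```
metrics.latency.interval: 100
metrics.latency.granularity: operator
metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
metrics.reporter.slf4j.interval: 60 SECONDS
```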
Hi Piotr,
2) SystemProcessingTimeService holds the HeapKeyedStateBackend, and the
HeapKeyedStateBackend has a lot of objects, which is filling the heap.
3) I am not using the Flink Kafka connector, but we are using the Apache Beam
Kafka connector. There is a change in the Apache Beam version, but the
kafk
Hi Juha
I planned to add some descriptions to the Flink documentation to give such hints;
however, it involves too many RocksDB-specific details, and in most cases
increasing the managed memory size to a proper value avoids the problem.
Since you have come across this and reported it on the user mailing list, and
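For reference, the knobs I mean are the managed memory settings, e.g. (the values below are only an example and depend on the actual workload):
```
# values below are only an example; tune them to the actual workload
taskmanager.memory.managed.fraction: 0.4
# or set an absolute size instead of a fraction:
# taskmanager.memory.managed.size: 2gb
# RocksDB uses managed memory by default:
state.backend.rocksdb.memory.managed: true
```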
I'm using the standalone script to start the cluster.
As far as I can tell, it's not easy to reproduce. We found that zookeeper lost
a node around the time this happened, but all of our other 75 Flink jobs which
use the same setup, version and zookeeper, didn't have any issues. They didn't
eve
Hi,
this is indeed the correct behaviour right now, which doesn't mean that
it's the behaviour that we would like to have.
The reason why we can't move the "pending" files to "final" is that we
don't have a point where we can do this in an idempotent and retryable
fashion. When we do regular
You are using the old connectors. The new connectors are available via
SQL DDL (and execute_sql() API) like documented here:
https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/connectors/filesystem.html
Maybe this will give you some performance boost, but certainly not
eno
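As an illustration only, registering such a table through execute_sql() could look roughly like this; the table name, schema, path, and format below are placeholders, not taken from your job:
```
# Sketch: new filesystem connector registered via DDL and execute_sql().
# Table name, schema, path and format are placeholders.
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import EnvironmentSettings, StreamTableEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
t_env = StreamTableEnvironment.create(
    env,
    environment_settings=EnvironmentSettings.new_instance()
        .in_streaming_mode().use_blink_planner().build())

t_env.execute_sql("""
    CREATE TABLE sink_table (
        id BIGINT,
        amount DOUBLE
    ) WITH (
        'connector' = 'filesystem',
        'path' = 'file:///tmp/output',
        'format' = 'csv'
    )
""")
```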
Hi Timo,
I actually tried many things; increasing the JVM heap size and Flink managed
memory didn't help me. Running the same query without the group by clause, like
this:
select
avg(transaction_amount) as avg_ta,
avg(salary+bonus) as avg_income,
avg(salary+bonus) - avg(transaction_amount) as
Hi Sunitha,
what you are describing is a typical streaming enrichment. We need to
enrich the stream with some data from a database. There are different
strategies to handle this:
1) You are querying the database for every record. This is usually not
what you want because it would slow down y
Hi Vijay,
one comment to add is that performance might suffer with multiple
map() calls. For safety reasons, records between chained operators are
serialized and deserialized so that they strictly cannot influence each
other. If all functions of a pipeline are guaranteed to not modify
incomi
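Side note: the switch to look at for that case is object reuse on the ExecutionConfig. A one-line sketch, shown here with the Python API; in Java the analogous call is enableObjectReuse():
```
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
# Skips the defensive serialize/deserialize step between chained operators;
# only safe when user functions never mutate or hold on to input records.
env.get_config().enable_object_reuse()
```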
Hi Violeta,
can you share your connector code with us? The plan looks quite
complicated given the relatively simple query. Maybe there is some
optimization potential. But before we dive deeper, I see a `Map(to:
Row)` which indicates that we might be working with a legacy sink connector.
Did you tr
Hey Yun,
Thanks for the detailed answer. It clarified how things work, especially the
role of the RocksDB arena and the arena block size.
I think there's no real-world case where it would make sense to start a
Flink job with RocksDB configured so that RocksDB flushes all the time, i.e.
whe
Thanks a lot for reporting this problem here Cristian!
I am not super familiar with the involved components, but the behavior you
are describing doesn't sound right to me.
Which entrypoint are you using? This is logged at the beginning, like this:
"2020-09-08 14:45:32,807 INFO
org.apache.flink.ru
> The only one where I could see that users want different behaviour is
> BATCH jobs on the DataStream API. I agree that processing-time does
> not make much sense in batch jobs. However, if users have written some
> business logic using processing-time timers their jobs will silently
> not work if we
I agree with almost all of your points!
The only one where I could see that users want different behaviour is BATCH
jobs on the DataStream API. I agree that processing-time does not make
much sense in batch jobs. However, if users have written some business
logic using processing-time timers thei
Hey Aljoscha
A couple of thoughts for the two remaining TODOs in the doc:
# Processing Time Support in BATCH/BOUNDED execution mode
I think there are two somewhat orthogonal problems around this topic:
1. Firing processing timers at the end of the job
2. Having processing timers in the B