It seems I set a wrong high-availability.storageDir:
s3://flink-test/recovery works, but s3:///flink-test/recovery does not;
one / has to be removed.
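For reference, the working form as a flink-conf.yaml entry (bucket and path are just the example values above):

# note: two slashes after "s3:", not three
high-availability.storageDir: s3://flink-test/recovery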
Joshua Fan wrote on Thu, Aug 5, 2021 at 10:43 AM:
> Hi Robert, Tobias
>
> I have tried many ways to build and validate the image.
>
> 1.put the s3 dependency to plu
Hello!
At Signifyd we use machine learning to protect our customers from credit
card fraud. Efficiently calculating feature values for our models based on
historical data is one of the primary challenges we face, and we’re meeting
it with Flink.
We need our system to be highly available and quick
Hi Robert, Tobias
I have tried many ways to build and validate the image.
1. Put the s3 dependency into the plugins subdirectory; the Dockerfile content is
below:
> FROM apache/flink:1.13.1-scala_2.11
> ADD ./flink-s3-fs-hadoop-1.13.1.jar
> /opt/flink/plugins/s3-hadoop/flink-s3-fs-hadoop-1.13.1.jar
> AD
Oh, and in batch jobs this is not guaranteed even if the whole DAG is a single
node. For example, for a sort operator the records will be stored in memory or
on disk, and only after all records have arrived will they be sorted and sent
downstream. So the state in your ASO is
still nee
Hi!
There is no such guarantee unless the whole DAG is a single node. Flink's
runtime runs the same node (task) in the same thread, while different nodes
(tasks) are executed in different threads, even on different machines.
Yuval Itzchakov wrote on Thu, Aug 5, 2021 at 2:10 AM:
> Hi,
>
> I have a question reg
Hello
I read in this article
https://flink.apache.org/news/2020/04/15/flink-serialization-tuning-vol-1.html
that it's possible to use SpecificRecordBase.class in the operators:
Avro Specific
Avro specific records will be automatically detected by checking that the given
type’s type hierarchy c
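For illustration, a small sketch of that detection (MyEvent stands in for a hypothetical Avro-generated class extending SpecificRecordBase):

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.TypeExtractor;

// MyEvent is assumed to be generated by the Avro compiler, so it extends SpecificRecordBase
TypeInformation<MyEvent> typeInfo = TypeExtractor.createTypeInfo(MyEvent.class);
// Flink detects the SpecificRecordBase hierarchy and returns an AvroTypeInfo,
// so operators carrying MyEvent use the Avro serializer rather than Kryo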
In my Flink streaming application I have a Kafka data source. I am using the
Kafka property auto.offset.reset=latest. I am wondering if I need to use
FlinkKafkaConsumer.setStartFromLatest(). Are they similar? Can I use either
of them? Following is the documentation from the Flink code.
/**
* Specifies th
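For comparison, a sketch of both approaches (broker, group id, and topic are placeholders). Note that auto.offset.reset only kicks in when the consumer group has no committed offsets, whereas setStartFromLatest() ignores committed offsets and always starts from the latest records when the job starts without restored state:

import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

Properties props = new Properties();
props.setProperty("bootstrap.servers", "broker:9092"); // placeholder
props.setProperty("group.id", "my-group");             // placeholder
// only used when the consumer group has no committed offsets
props.setProperty("auto.offset.reset", "latest");

FlinkKafkaConsumer<String> consumer =
        new FlinkKafkaConsumer<>("my-topic", new SimpleStringSchema(), props);
// always start from the latest records on a fresh start, regardless of committed offsets
consumer.setStartFromLatest();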
Hi Arvid,
I had 5.5.2 bundled into my application jar. I was just now able to get
https://github.com/apache/flink/pull/16693 working by ensuring that
kafka-clients==2.4.1 was used. Thanks!!
On Wed, Aug 4, 2021 at 1:04 PM Arvid Heise wrote:
> Hi Kevin,
>
> Which Kafka client version are you
Hi again,
After a bit of experimentation (and actually reading the bug report I linked),
I realized the issue was that the parallelism was higher than the number of
Kafka partitions => reducing the parallelism enabled the checkpoints to work as
expected.
=> since it seems unsupported, maybe K
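For anyone hitting the same thing, a rough sketch of the workaround (env, props, and the partition count are assumed from context):

// keep the Kafka source parallelism at or below the topic's partition count
int numPartitions = 4; // assumed partition count of the topic
env.addSource(new FlinkKafkaConsumer<>("my-topic", new SimpleStringSchema(), props))
   .setParallelism(numPartitions);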
Hey folks,
This is a crosspost of a stack overflow question
(https://stackoverflow.com/questions/68631624/flink-job-cant-use-savepoint-in-a-batch-job)
which didn’t get any replies yet, so please bear with me.
Let me start in a generic fashion to see if I somehow missed some concepts: I
have a st
Hi Nicolaus,
I double-checked our HDFS config again; it is set to 1 instead of 2.
I will try the solution you provided.
Thanks again.
Best regards
Rainie
On Wed, Aug 4, 2021 at 10:40 AM Rainie Li wrote:
> Thanks for the context Nicolaus.
> We are using S3 instead of HDFS.
>
> Best regards
> R
Hi,
I have a question that I want to clarify regarding the semantics of how events
from a source are processed downstream.
I have a custom source which offloads data from our data warehouse. In my
custom source, I have some state which keeps track of the latest timestamps
that were read. When unloading,
Thanks for the detailed explanation, Yun Tang, and for all the effort you have
clearly put into it. Based on what was described here I would also vote
for going forward with the upgrade.
It's a pity that this regression wasn't caught in the RocksDB community. I
would have two questions/ideas:
1. Can
Hi Yun,
Thank you for the elaborate explanation and even more so for the super hard
work that you're doing digging into RocksDB and chasing after hundreds of
commits in order to fix them so we can all benefit.
I can say for myself that optimizing towards memory is more important
ATM for us, and I'
Hi Bin,
Flink's Kafka source has been rewritten using the new Source API.
You can find it here:
https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/source/KafkaSource.java
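A minimal usage sketch based on that class (broker address, topic, and group id are placeholders):

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

KafkaSource<String> source = KafkaSource.<String>builder()
        .setBootstrapServers("broker:9092")   // placeholder
        .setTopics("input-topic")             // placeholder
        .setGroupId("my-group")               // placeholder
        .setStartingOffsets(OffsetsInitializer.earliest())
        .setValueOnlyDeserializer(new SimpleStringSchema())
        .build();

// the new Source API plugs into the DataStream API via fromSource()
DataStream<String> stream =
        env.fromSource(source, WatermarkStrategy.noWatermarks(), "Kafka Source");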
On Wed, Aug 4, 2021 at 8:51 PM Xinbin Huang wro
Hi team,
I'm trying to develop a custom source using the new Data Source API, but I am
having a hard time finding examples for it. Can you point me to some existing
sources implemented with the new Data Source API? It would be ideal if the
source is a pull-based unbounded source (e.g. Kafka).
Thanks!
Hi Yuval,
Upgrading the RocksDB version has been a long story since Flink 1.10.
When we first planned to introduce the write buffer manager to help control the
memory usage of RocksDB, we actually wanted to bump up to RocksDB-5.18 from the
then-current RocksDB-5.17. However, we found a performance regression in our micro b
Thanks for the context Nicolaus.
We are using S3 instead of HDFS.
Best regards
Rainie
On Wed, Aug 4, 2021 at 12:39 AM Nicolaus Weidner <
nicolaus.weid...@ververica.com> wrote:
> Hi Rainie,
>
> I found a similar issue on stackoverflow, though quite different
> stacktrace:
> https://stackoverflow.
I would strongly recommend not using the harness for testing user functions.
Instead I'd just create an ITCase like this:
@ClassRule
public static final MiniClusterWithClientResource MINI_CLUSTER =
new MiniClusterWithClientResource(
new MiniClusterResourceConfiguration.Bui
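(The snippet above is cut off in the archive; a fuller sketch of the same setup, with assumed resource numbers, could look like this:)

import org.apache.flink.runtime.testutils.MiniClusterResourceConfiguration;
import org.apache.flink.test.util.MiniClusterWithClientResource;
import org.junit.ClassRule;

// brings up a local mini cluster once per test class
@ClassRule
public static final MiniClusterWithClientResource MINI_CLUSTER =
        new MiniClusterWithClientResource(
                new MiniClusterResourceConfiguration.Builder()
                        .setNumberTaskManagers(1)        // assumed
                        .setNumberSlotsPerTaskManager(2) // assumed
                        .build());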
Hi Kevin,
Which Kafka client version are you using? (=What is effectively bundled
into your application jar?)
On Wed, Aug 4, 2021 at 5:56 PM Kevin Lam wrote:
> Thanks Matthias.
>
> I just tried this backport (https://github.com/apache/flink/pull/16693)
> and got the following error, with the re
Hi,
I am trying to use RichAsyncFunction with Flink's test harness. My code
looks like the following:
final MetadataEnrichment.AsyncFlowLookup fn = new
MetadataEnrichment.AsyncFlowLookup();
final AsyncWaitOperatorFactory> operator = new AsyncWaitOperatorFactory<>(fn, 2000,
1, AsyncDataStream.OutputMode.ORD
That's actually also what I'm seeing most of the time and what I'd expect to
improve with the newer RocksDB version.
Hence, I'd also favour the upgrade even if there is a slight catch with
respect to performance - we should, however, continue to investigate this
together with the RocksDB communi
Thanks Matthias.
I just tried this backport (https://github.com/apache/flink/pull/16693) and
got the following error, with the reproduction I described in
https://lists.apache.org/thread.html/r528102e08d19d3ae446e5df75710009128c736735c0aaf310f95abeb%40%3Cuser.flink.apache.org%3E
(ie. starting job
Hey Joshua,
Can you first validate that the Docker image you've built works by running
it locally on your machine?
I would recommend putting the s3 filesystem files into the plugins [1]
directory to avoid classloading issues.
Also, you don't need to build custom images if you want to use build-i
Robert
We are checking it using the metric
flink_taskmanager_job_task_operator_KafkaConsumer_assigned_partitions{jobname="SPECIFICJOBNAME"}.
This metric gives the number of partitions assigned to each task (Kafka
consumer operator).
Prasanna.
On Wed, Aug 4, 2021 at 8:59 PM Robert Metzger wrote:
>
Hi Prasanna,
How are you checking the assignment of Kafka partitions to the consumers?
The FlinkKafkaConsumer doesn't have a rebalance() method; this is a generic
concept of the DataStream API. Is it possible that you are
somehow partitioning your data in your Flink job, and this is causing the
d
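For clarity, rebalance() is applied to a DataStream rather than configured on the consumer itself; a minimal sketch (kafkaConsumer and env assumed from context):

// rebalance() redistributes records round-robin across all downstream subtasks
DataStream<String> input = env.addSource(kafkaConsumer);
DataStream<String> evenlyDistributed = input.rebalance();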
Hi Weston,
Oh indeed, you are right! I quickly tried restoring a 1.9 savepoint on a
1.11 runtime and it worked. So in principle this seems to be supported.
I'm including Timo into this thread, he has a lot of experience with the
serializers.
On Tue, Aug 3, 2021 at 6:59 PM Weston Woods wrote:
>
That's what I thought, Dian.
The problem is that setting the watermark strategy like that didn't work
either; the method on_event_time is never called.
I did some reading of
https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/datastream/event-time/generating_watermarks/#watermark
Robert
When we apply the rebalance method to the Kafka consumer, partitions of the
various topics are assigned evenly.
But my only concern is that the rebalance method might have a performance
impact.
Thanks,
Prasanna.
On Wed, Aug 4, 2021 at 5:55 PM Prasanna kumar
wrote:
> Robert,
>
> Flink ve
Hi All
I want to build a custom Flink image to run on k8s; below is my Dockerfile
content:
> FROM apache/flink:1.13.1-scala_2.11
> ADD ./flink-s3-fs-hadoop-1.13.1.jar /opt/flink/lib
> ADD ./flink-s3-fs-presto-1.13.1.jar /opt/flink/lib
>
I just put the s3 fs dependency into {flink home}/lib, and
For the details of what causes this regression, I would add @Yun Tang
to this discussion.
On Wed, Aug 4, 2021 at 2:36 PM Yuval Itzchakov wrote:
> We are heavy users of RocksDB and have had several issues with memory
> managed in Kubernetes, most of them actually went away when we upgraded
> fro
We are heavy users of RocksDB and have had several issues with managing memory
in Kubernetes; most of them actually went away when we upgraded
from Flink 1.9 to 1.13.
Do we know why there's such a huge performance regression? Can we improve
this somehow with some flag tweaking? It would be great if
I am hearing quite often from users who are struggling to manage memory
usage, and these are all users using RocksDB. While I don't know for
certain that RocksDB is the cause in every case, from my perspective,
getting the better memory stability of version 6.20 in place is critical.
Regards,
Davi
Robert,
Flink version 1.12.2.
Flink Kafka connector version 2.12.
The partitions are assigned equally if we are reading from a single topic.
Our use case is to read from multiple topics [topics via a regex pattern]; we
use 6 topics and 1 partition per topic for this job.
In this case, few of the k
Hi all!
*!!! If you are a big user of the Embedded RocksDB State Backend and have
performance sensitive workloads, please read this !!!*
I want to quickly raise some awareness for a RocksDB version upgrade we
plan to do, and some possible impact on application performance.
*We plan to upgrade R
@Yun Tang Does it make sense to add RocksDB to the "always parent-first
options" to avoid this kind of error when users package apps incorrectly?
My feeling is that these packaging errors occur very frequently.
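As a user-side workaround in the meantime, the RocksDB packages can be added to the parent-first patterns in flink-conf.yaml; a sketch (the pattern value is an assumption, not a documented default):

# load org.rocksdb classes parent-first so duplicates in the user jar are ignored
classloader.parent-first-patterns.additional: org.rocksdb.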
On Wed, Aug 4, 2021 at 10:41 AM Yun Tang wrote:
> Hi Sandeep,
>
> Did you include
Hi Robert,
Thanks for the feedback.
You are correct, the behavior is indeed different: when I make the source
bounded, the application eventually stops whereas without that setting it runs
forever.
In both cases neither checkpoints nor data is being written to the filesystem.
I re-ran the e
Hi Sandeep,
Did you include the RocksDB classes in the application jar package? You can
unpack your jar to check whether they exist.
If so, since the RocksDB classes are already included in the flink-dist package,
you don't need to include them in your jar package (maybe you explicitly add
Thanks Till.
We terminated one of the worker nodes.
We enabled HA by using Zookeeper.
Sure, we will try upgrading the job to a newer version.
Best regards
Rainie
On Tue, Aug 3, 2021 at 11:57 PM Till Rohrmann wrote:
> Hi Rainie,
>
> It looks to me as if Yarn is causing this problem. Which Yarn node are