Hi,
I was experimenting with DynamicKafkaSource with 2 clusters. My use case is of
a failover - when the active site fails, I want the Kafka Source to start
reading data from the standby site.
I observed that DynamicKafkaSource resets the offsets on Cluster-2 back to -3
though it was already at
are recommended to upgrade
to version 1.11.4 or 1.12.0, which fix this issue."
Cheers,
Jim
On Wed, Oct 30, 2024 at 1:26 AM Chirag Dewan via user
wrote:
Any view on this?
On Monday 28 October, 2024 at 04:16:17 pm IST, Chirag Dewan via user
wrote:
Hi,
There is a critical CVE
Any view on this?
On Monday 28 October, 2024 at 04:16:17 pm IST, Chirag Dewan via user
wrote:
Hi,
There is a critical CVE on Apache Avro - NVD - CVE-2024-47561
Is there a released Flink version which has upgraded Avro to 1.11.4 or 1.12?
If not, is it safe to upgrade just AVRO
Hi,
There is a critical CVE on Apache Avro - NVD - CVE-2024-47561
Is there a released Flink version which has upgraded Avro to 1.11.4 or 1.12?
If not, is it safe to upgrade just AVRO, keeping flink-avro on 1.16.3 (my
current Flink version).
Appreciate any inputs.
Thanks,Chirag
|
|
|
| | |
Hi,
I have a standalone Flink cluster which I have started using the jobmanager.sh
and taskmanager.sh scripts.
For security reasons, I wanted to disable the submit and cancel jar features
from the web ui but keep them enabled from the REST API so that my application
can submit jars.
But when I
th Redis I guess.
Best,Zakelly
On Tue, Jan 30, 2024 at 2:15 PM Chirag Dewan via user
wrote:
Hi,
I was looking at the FLIP-254: Redis Streams Connector and I was wondering if
Flink ever considered Redis as a state backend? And if yes, why was it
discarded compared to RocksDB?
If someone ca
Hi,
I was looking at the FLIP-254: Redis Streams Connector and I was wondering if
Flink ever considered Redis as a state backend? And if yes, why was it
discarded compared to RocksDB?
If someone can point me towards any deep dives on why RocksDB is a better fit
as a state backend, it would be h
of defensive programming for a public interface and
the decision here is to be more lenient when facing potentially erroneous user
input rather than blow up the whole application with a NullPointerException.
Best,Alexander Fedulov
On Thu, 26 Oct 2023 at 07:35, Chirag Dewan via user
wrote:
Hi
Hi Arjun,
Flink's FileSource doesnt move or delete the files as of now. It will keep the
files as is and remember the name of the file read in checkpointed state to
ensure it doesnt read the same file twice.
Flink's source API works in a way that single Enumerator operates on the
JobManager. Th
Hi,
I was looking at this check in DefaultFileFilter:
public boolean test(Path path) {
final String fileName = path.getName();
if (fileName == null || fileName.length() == 0) {
return true;
}Was wondering how can a file name be null?
And if null, shouldnt this be return false?
I
* Rotate the keytab time
to time* The keytab can be encrypted at rest but that's fully custom logic
outside of Flink
G
On Fri, Sep 15, 2023 at 7:05 AM Chirag Dewan via user
wrote:
Hi,
I am trying to implement a HDFS Source connector that can collect files from
Kerberos enabled HDFS. As pe
Hi,
I am trying to implement a HDFS Source connector that can collect files from
Kerberos enabled HDFS. As per the Kerberos support, I have provided my keytab
file to Job Managers and all the Task Managers.
Now, I understand that keytab file is a security concern and if left unsecured
can be use
he.org/flink/flink-docs-master/docs/deployment/security/security-delegation-token/
G
On Tue, Sep 5, 2023 at 1:31 PM Chirag Dewan via user
wrote:
Hi,
I am trying to use the FileSource to collect files from HDFS. The HDFS cluster
is secured and has Kerberos enabled.
My Flink cluster runs on Ku
Hi,
I am trying to use the FileSource to collect files from HDFS. The HDFS cluster
is secured and has Kerberos enabled.
My Flink cluster runs on Kubernetes (not using the Flink operator) with 2 Job
Managers in HA and 3 Task Managers. I wanted to understand the correct way to
configure the keytab
f97/flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/reader/StreamFormat.java#L57
Best,Ron
Chirag Dewan via user 于2023年8月17日周四 12:00写道:
Hi,I am trying to collect files from HDFS in my DataStream job. I need to
collect two types of files - CSV and Parquet.
I unders
Hi,I am trying to collect files from HDFS in my DataStream job. I need to
collect two types of files - CSV and Parquet.
I understand that Flink supports both formats, but in Streaming mode, Flink
doesnt support splitting these formats. Splitting is only supported in Table
API.
I wanted to under
Hi,
Can anyone share any experience on running Flink jobs across data centers?
I am trying to create a Multi site/Geo Replicated Kafka cluster. I want that my
Flink job to be closely colocated with my Kafka multi site cluster. If the
Flink job is bound to a single data center, I believe we will o
Hi,
We are tying to use Flink's File sink to distribute files to AWS S3 storage. We
are using Flink provided Hadoop s3a connector as plugin.
We have some observations that we needed to clarify:
1. When using file sink for local filesystem distribution, we can see that the
sink creates 3 se
`CsvBulkWriter` and
create `FileSink` by `FileSink.forBulkFormat`. You can see the detail in
`DataStreamCsvITCase.testCustomBulkWriter`
Best,Shammon
On Tue, Mar 7, 2023 at 7:41 PM Chirag Dewan via user
wrote:
Hi,
I am working on a Java DataStream application and need to implement a File sink
with
Hi,
I am working on a Java DataStream application and need to implement a File sink
with CSV format.
I see that I have two options here - Row and Bulk
(https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/datastream/filesystem/#format-types-1)
So for CSV file distribution wh
Hi,
Is it possible to use Avro 1.11 with Flink 1.14? I know that Avro version is
still at 1.10, but due to my job using Avro 1.11, I was planning to use it in
Flink as well.
Also, I know that Avro 1.10 had some performance issues with Flink 1.12
([FLINK-19440] Performance regression on 15.09.20
Hi,
I need to manage geo-redundancy in my Kafka cluster across zones. I am planning
to do this with Apache Mirror Maker to maintain an active-passive site.
I wanted to understand consumer and producer failover when the primary cluster
fails. Is there any way to detect and failover Flink's Kafka s
;t working (i.e., the overview over all jobs).
On 16/02/2022 06:15, Chirag Dewan wrote:
Ah, should have looked better. I think
https://issues.apache.org/jira/browse/FLINK-25732 causes this.
Are there any side effects of this? How can I avoid this problem so that it
doesn't affec
Ah, should have looked better. I think
https://issues.apache.org/jira/browse/FLINK-25732 causes this.
Are there any side effects of this? How can I avoid this problem so that it
doesn't affect my processing?
Thanks
On Wednesday, 16 February, 2022, 10:19:12 am IST, Chirag Dewan
Hi,
We are running a Flink cluster with 2 JMs in HA and 2 TMs on a standalone K8
cluster. After migrating to 1.14.3, we started to see some exceptions in the JM
logs:
2022-02-15 11:30:00,100 ERROR
org.apache.flink.runtime.rest.handler.job.JobIdsHandler [] POD_NAME:
eric-bss-em-sm-streamser
get the:
Caused by: org.apache.flink.util.SerializedThrowable:
java.lang.IllegalArgumentException: Key group 2 is not in
KeyGroupRange{startKeyGroup=64, endKeyGroup=95}.
I have checked that there's no concurrent access on the ValueState.
Any more leads?
Thanks,Chirag
On Monday, 7 June, 2021, 06:56:56 pm IST, Chirag
ed by: org.apache.flink.util.SerializedThrowable:
java.lang.IllegalArgumentException: Key group 2 is not in
KeyGroupRange{startKeyGroup=64, endKeyGroup=95}.
I have checked that there's no concurrent access on the ValueState.
Any more leads?
Thanks,Chirag
On Monday, 7 June, 2021, 06:56:56 pm IST, Ch
.
Does this lead to state corruption?
Thanks,Chirag
On Monday, 7 June, 2021, 08:54:39 am IST, Chirag Dewan
wrote:
Thanks for the reply Yun. I strangely don't see any nulls. And infact this
exception comes on the first few records and then job starts processing
normally.
Also, I don&
Thanks for the reply Yun. I strangely don't see any nulls. And infact this
exception comes on the first few records and then job starts processing
normally.
Also, I don't see any reason for Concurrent access to the state in my code.
Could more CPU cores than task slots to the Task Manager be th
Hi,
I am getting multiple exceptions while trying to use RocksDB as astate backend.
I have 2 Task Managers with 2 taskslots and 4 cores each.
Below is our setup:
Kafka(Topic with 2 partitions) ---> FlinkKafkaConsumer(2Parallelism) >
KeyedProcessFunction(4 Parallelism) > FlinkKafka
using the ProcessFunction on a keyed stream and
there you can use the TimerService. It is advised to use a KeyedProcessFunction
on a keyed stream, however for backwards compatibility the old behaviour has
been kept.
Hope that it clarifies the things a bit.
Best,
Dawid
On 17/03/2021 07:47, Chirag D
Hi,
Currently, both ProcessFunction and KeyedProcessFunction (and their CoProcess
counterparts) expose the Context and TimerService in the processElement()
method. However, if we use the TimerService in non keyed context, it gives a
runtime error.
I am a bit confused about these APIs. Is there
Hi,
I am intending to use the File source for a production use case. I have a few
use cases that are currently not supported like deleting a file once it's
processed.
So I was wondering if we can use this in production or write my own
implementation? Is there any recommendations around this?
Th
ee it.
Cheers,
Till
On Mon, Feb 15, 2021 at 9:38 AM Chirag Dewan wrote:
Hi,
We configured Job Manager HA with Kubernetes strategy and found that the Web UI
for all 3 Job Managers is accessible on their configured rpc addresses. There's
no information on the Web UI that suggests which Job Mana
Hi,
We configured Job Manager HA with Kubernetes strategy and found that the Web UI
for all 3 Job Managers is accessible on their configured rpc addresses. There's
no information on the Web UI that suggests which Job Manager is the leader or
task managers are registered to. However, from the log
Hi,
Can we have multiple replicas with ZK HA in K8 as well?In this case, how does
Task Managers and clients recover the Job Manager RPC address? Are they updated
in ZK?Also, since there are 3 replicas behind the same service endpoint and
only one of them is the leader, how should clients reach
Hi,
I am building an alerting system where based on some input events I need to
raise an alert from the user defined aggregate function.
My first approach was to use an asynchronous REST API to send alerts outside
the task slot. But this obviously involves IO from within the task and if I
under
Hi,
I am using Flink 1.7.2 with Kafka Connector 0.11 for Consuming records from
Kafka.
I observed that if the broker is down, Kafka Consumer does nothing but logs the
connection error and keeps on reconnecting to the broker. And infact the log
level seems to be DEBUG.
Is there any way to captu
Hi,
I was going through the Javadoc for CheckpointedFunction.java, it says that:
* // get the state data structure for the per-key state
* countPerKey = context.getKeyedStateStore().getReducingState(
* new ReducingStateDescriptor<>("perKeyCount", new
AddFunction<>()
ic is related to event-time alignment in sources, which has been
actively discussed in the community in the past and we might be able to solve
this in a similar way in the future.
Cheers,
Konstantin
On Fri, Feb 8, 2019 at 5:48 PM Chirag Dewan wrote:
Hi Vadim,
I would be interested in this too.
Pr
Hi Vadim,
I would be interested in this too.
Presently, I have to read my lookup source in the open method and keep it in a
cache. By doing that I cannot make use of the broadcast state until ofcourse
the first emit comes on the Broadcast stream.
The problem with waiting the event stream is the
Hi,
In the documentation, the JDBC sink is mentioned as a source on Table
API/stream.
Can I use the same sink with a Data stream as well?
My use case is to read the data from Kafka and send the data to Postgres.
I was also hoping to achieve Exactly-Once since these will mainly be Idempotent
writ
Hi,
Is there some sort of endorsed lib in Flink yet?
A brief about my use case :
I am using a 3PP in my job which uses SLF4J as logging facade but has included
a log4j1 binding in its source code. And I am trying to use log4j2 for my Flink
application.
I wired Flink to use log4j2 - added all depe
(aggregateFunction, windowFunction) and register metrics in the
windowFunction?
Best,
Dawid
On 19/12/2018 04:30, Chirag Dewan wrote:
Hi,
I am writing a Flink job for aggregating events in a window.
I am trying to use the AggregateFunction implementation for this.
Now, since
I have a similar issue. I raised a JIRA :
https://issues.apache.org/jira/browse/FLINK-11198
Thanks,
Chirag
On Wednesday, 19 December, 2018, 11:35:02 AM IST, Fabian Hueske
wrote:
Hi,
AFAIK it is not possible to collect metrics for an AggregateFunction.You can
open a feature request by cr
Hi,
I am writing a Flink job for aggregating events in a window.
I am trying to use the AggregateFunction implementation for this.
Now, since WindowedStream does not allow a RichAggregateFunction for
aggregation, I cant use the RuntimeContext to get the Metric group.
I dont even see any other w
ou can use
Thread.currentThread().getContextClassLoader(), which always should have the
user-code ClassLoader set.
Best,Aljoscha
On 4. Oct 2018, at 12:14, Chirag Dewan wrote:
Hi All,
Is there any other way to get hold of the FlinkUserClassLoaderother than the
RuntimeContext?
The problem is, Aggrega
Hi All,
Is there any other way to get hold of the FlinkUserClassLoaderother than the
RuntimeContext?
The problem is, AggregateFunction cant be a RichFunction. I understand that's
because of the state merging issue(from a thread here earlier). Now, I need
DynamicClassLoading in AggregateFunction.
de-to-broadcast-state-in-apache-flink
Am So., 30. Sep. 2018 um 10:48 Uhr schrieb Chirag Dewan
:
Thanks Lasse, that is rightly put. That's the only solution I can think of too.
Only thing which I can't get my head around is using the coMap and coFlatMap
functions with such a stream.
lookup is done first time is not simple but a simple solution
could be to implement a delay operation or keep the data in your process
function until data arrive from your database stream.
Med venlig hilsen / Best regardsLasse Nedergaard
Den 28. sep. 2018 kl. 06.28 skrev Chirag Dewan :
Hi,
I
Hi,
I saw Apache Flink User Mailing List archive. - static/dynamic lookups in flink
streaming being discussed, and then I saw this FLIP
https://cwiki.apache.org/confluence/display/FLINK/FLIP-17+Side+Inputs+for+DataStream+API.
I know we havent made much progress on this topic. I still wanted to
.html#resuming-from- savepoints[2]: https://ci.apache.org/
projects/flink/flink-docs- release-1.5/monitoring/rest_
api.html#cancel-job-with- savepoint[3]: https://ci.apache.org/
projects/flink/flink-docs- release-1.5/ops/upgrading.html
Thanks, vino.
2018-07-19 14:25 GMT+08:00 Chirag Dewan :
Hi,
Hi,
I am planning to use the Stop Service for stopping/resuming/pausing my Flink
Job. My intention is to stop sources before we take the savepoint i.e. stop
with savepoint.
I know that since Flink 1.4.2, Stop is not stable/not production ready.
With Flink 1.5 can it be used for stopping jobs?
R
Hi,
I am coming across a use case where I may have to run more than100 parallel
jobs(which may have different processing needs) on a Flink cluster.
My flink cluster, currently, has 1 Job Manager and 4/5 Task Managers depending
on the processing needed is running on a Kubernetes cluster with 3 wo
Hi,
flink:latest docker image doesn't seem to work. I am not able to access the
Flink Dashboard after deploying it on Kubernetes.
Anyone else facing the issue?
Thanks,
Chirag
Hi,
I am evaluating some File Systems as state backend. I can see that Flink
currently supports S3, MAPRFS and HDFS as file systems.
However, I was hoping I can use Gluster as my state backend, since its already
a part of existing eco system. Since I have stateful operators in my job and I
am e
Hi,
I am working on a use case where my Flink job needs to collect data from
thousands of sources.
As an example, I want to collect data from more than 2000 File Directories,
process(filter, transform) the data and distribute the processed data streams
to 200 different directories.
Are there an
Hi,
I am trying to use Gluster File System as my FileSystem backed by RocksDB as
state backend. I can see from FsCheckpointStateOutputStream that the
DEFAULT_WRITE_BUFFER_SIZE = 4096.
Is the buffer size configurable in any way? Any idea about the checkpointing
performance with default buffer siz
I think you are looking for jobmanager.web.tmpdir along with upload.dir
>From the documentation :
-
jobmanager.web.tmpdir: This configuration parameter allows defining the Flink
web directory to be used by the web interface. The web interface will copy its
static files into the dire
internal implementation details that allow the FS checkpoints to be slightly
more consise in the file format but we might „de-optimize“ this minor
difference for the sake of compatibility in the near future.
Am 26.04.2018 um 15:22 schrieb Chirag Dewan :
Wow never considered it that way.
Thanks a lot
,Stefan
Am 26.04.2018 um 13:16 schrieb Chirag Dewan :
Hi,
I am working on a use case where I need to store a large amount of data in
state. I am using RocksDB as my state backend. Now to ensure data replication,
I want to store the RocksDB files in some distributed file system.
>From
Hi,
I am working on a use case where I need to store a large amount of data in
state. I am using RocksDB as my state backend. Now to ensure data replication,
I want to store the RocksDB files in some distributed file system.
>From the documentation I can see that Flink recommends a list of FileSy
erval, which flushes any buffered records.
Is my understanding correct here? Or am I still missing something?
thanks,
Chirag
On Monday, 12 March, 2018, 12:59:51 PM IST, Chirag Dewan
wrote:
Hi,
I am trying to use Kafka Sink 0.11 with ATLEAST_ONCE semantic and experiencing
some data lo
Hi LiYue,
This should help : Apache Flink 1.5-SNAPSHOT Documentation: Windows
|
|
| |
Apache Flink 1.5-SNAPSHOT Documentation: Windows
|
|
|
So basically you need to register a processing time trigger at every 10 minutes
and on callback, you can FIRE the window result like this:
Hi,
I am trying to use Kafka Sink 0.11 with ATLEAST_ONCE semantic and experiencing
some data loss on Task Manager failure.
Its a simple job with parallelism=1 and a single Task Manager. After a few
checkpoints(kafka flush's) i kill one of my Task Manager running as a container
on Docker Swarm.
that you use in your JobManager. They
are described here:
https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html#zookeeper-based-ha-mode
On 14. Feb 2018, at 15:32, Chirag Dewan wrote:
Thanks Aljoscha.
I haven't checked that bit. Is there any configuration for TaskManage
r
as well? They need this in order to find the JobManager leader.
Best,Aljoscha
On 14. Feb 2018, at 06:12, Chirag Dewan wrote:
Hi,
I am trying to deploy a Flink cluster (1 JM, 2TM) on a Docker Swarm. For
JobManager HA, I have started a 3 node zookeeper service on the same swarm
network and confi
Hi,
I am trying to deploy a Flink cluster (1 JM, 2TM) on a Docker Swarm. For
JobManager HA, I have started a 3 node zookeeper service on the same swarm
network and configured Flink's zookeeper quorum with zookeeper service
instances.
JobManager gets started with the LeaderElectionService and ge
68 matches
Mail list logo