How to deploy Flink in a geo-distributed environment

2018-06-26 Thread Stephen
Hi, Can Flink be deployed in a geo-distributed environment instead of being in local clusters? As far as I know, raw data should be moved to local cloud environment or local clusters before Flink handle it. Consider this situation where data sources are on different areas which might be cross diffe

Re: How to deploy Flink in a geo-distributed environment

2018-06-29 Thread Stephen
common-patterns-of-multi-datacenter-architectures-with-apache-kafka > > > On Wed, Jun 27, 2018 at 3:08 AM Stephen wrote: > >> Hi, >> Can Flink be deployed in a geo-distributed environment instead of being >> in local clusters? >> As far as I know, raw dat

potential software engineering problems

2018-08-03 Thread Stephen
Hi, everyone. I'm a graduate student on software engineering. I focus on Flink a few months. I'm really curious about what software engineering problems (e.g., testing, debugging, etc.) hinder your progress during developing Flink applications. Any response/suggestions I will appreciate. Thank you

question about setting different time window for operators

2018-08-23 Thread Stephen
Hi, Is that possible to set different operators with different time windows in a pipeline? For example, for the wordcount example, could I set execution period of filter operator 2s but set filter 3s? Thank you.

How to debug checkpoints failing to complete

2020-03-23 Thread Stephen Connolly
We have a topology and the checkpoints fail to complete a *lot* of the time. Typically it is just one subtask that fails. We have a parallelism of 2 on this topology at present and the other subtask will complete in 3ms though the end to end duration on the rare times when the checkpointing compl

Upgrading Flink

2020-04-06 Thread Stephen Connolly
Quick questions on upgrading Flink. All our jobs are compiled against Flink 1.8.x We are planning to upgrade to 1.10.x 1. Is the recommended path to upgrade one minor at a time, i.e. 1.8.x -> 1.9.x and then 1.9.x -> 1.10.x as a second step or is the big jump supported, i.e. 1.8.x -> 1.10.x in on

State Processor API with Beam

2020-04-06 Thread Stephen Patel
I've got an apache beam pipeline running on flink (1.9.1). I've been attempting to read a RocksDB savepoint taken from this beam-on-flink pipeline, using the state processor api, however it seems to have some incompatibilities around namespaces. Beam for instance uses a String namespace, while th

Re: State Processor API with Beam

2020-04-07 Thread Stephen Patel
Thanks Seth, I'll look into rolling my own KeyedStateInputFormat. On Mon, Apr 6, 2020 at 2:50 PM Seth Wiesman wrote: > Hi Stephen, > > You will need to implement a custom operator and user the `transform` > method. It's not just that you need to specify the namespace ty

Streaming Job eventually begins failing during checkpointing

2020-04-15 Thread Stephen Patel
I've got a flink (1.8.0, emr-5.26) streaming job running on yarn. It's configured to use rocksdb, and checkpoint once a minute to hdfs. This job operates just fine for around 20 days, and then begins failing with this exception (it fails, restarts, and fails again, repeatedly): 2020-04-15 13:15:

Re: Streaming Job eventually begins failing during checkpointing

2020-04-16 Thread Stephen Patel
t it seems to be the same descriptor every time. Is that limit per operator? That is, can each operator host up to 32767 operator/broadcast states? I assume that's by name? On Wed, Apr 15, 2020 at 10:46 PM Yun Tang wrote: > Hi Stephen > > This is not related with RocksDB

Re: Streaming Job eventually begins failing during checkpointing

2020-04-16 Thread Stephen Patel
ly at least. On Thu, Apr 16, 2020 at 9:03 AM Stephen Patel wrote: > I can't say that I ever call that directly. The beam library that I'm > using does call it in a couple places: > https://github.com/apache/beam/blob/v2.14.0/runners/flink/src/main/java/org/apache/beam/runners/fl

Re: Streaming Job eventually begins failing during checkpointing

2020-04-16 Thread Stephen Patel
(maybe by unregistering Operator States after they aren't used any more in the RequiresStableInput code). It seems to me that this isn't a Flink issue, but rather a Beam issue. Thanks for pointing me in the right direction. On Thu, Apr 16, 2020 at 11:29 AM Yun Tang wrote: > Hi Stephen > &

Two questions about Async

2020-04-21 Thread Stephen Connolly
1. On Flink 1.10 when I look at the topology overview, the AsyncFunctions show non-zero values for Bytes Received; Records Received; Bytes Sent but Records Sent is always 0... yet the next step in the topology shows approx the same Bytes Received as the async sent (modulo minor delays) and a non-ze

Re: Running Apache Flink on the GraalVM as a Native Image

2020-06-27 Thread Stephen Connolly
On Thu 25 Jun 2020 at 12:48, ivo.kn...@t-online.de wrote: > Whats up guys, > > > > I'm trying to run an Apache Flink Application with the GraalVM Native > Image but I get the following error: (check attached file) > > > > I suppose this happens, because Flink uses a lot of low-level-code and is >

Re: Running Apache Flink on the GraalVM as a Native Image

2020-06-28 Thread Stephen Connolly
On Sun 28 Jun 2020 at 01:34, Stephen Connolly < stephen.alan.conno...@gmail.com> wrote: > > > On Thu 25 Jun 2020 at 12:48, ivo.kn...@t-online.de > wrote: > >> Whats up guys, >> >> >> >> I'm trying to run an Apache Flink Application with t

Re: Is outputting from components other than sinks or side outputs a no-no ?

2020-07-27 Thread Stephen Connolly
I am not 100% certain that David is talking about the same pattern of usage that you are Tom. David, the pattern Tom is talking about is something like this... try { do something with record } catch (SomeException e) { push record to DLQ } My concern is that if we have a different failure,

Logback on AWS EMR

2019-09-04 Thread Stephen Connolly
Has anyone configured AWS EMR’s flavour of Flink to use Logback (more specifically with the logstash encoder, which would require additional jars on the classpath) Or is there an alternative way people are using to send the logs to a service like Datadog Thanks in advance Stephen -- Sent from

Re: Logback on AWS EMR

2019-09-05 Thread Stephen Connolly
ace of logback). Then when you launch your flink jobs they will be clones with the correct files and happy as larry! Haven't figured out how to handle for ephemeral EMR clusters... but we aren't using them so :shrug: On Wed, 4 Sep 2019 at 22:17, Stephen Connolly < stephen.alan.co

Is there a lifecycle listener that gets notified when a topology starts/stops on a task manager

2019-09-23 Thread Stephen Connolly
We are using a 3rd party library that allocates some resources in one of our topologies. Is there a listener or something that gets notified when the topology starts / stops running in the Task Manager's JVM? The 3rd party library uses a singleton, so I need to initialize the singleton when the f

Re: Is there a lifecycle listener that gets notified when a topology starts/stops on a task manager

2019-09-23 Thread Stephen Connolly
Currently the best I can see is to make *everything* a Rich... and hook into the open and close methods... but feels very ugly. On Mon 23 Sep 2019 at 15:45, Stephen Connolly < stephen.alan.conno...@gmail.com> wrote: > We are using a 3rd party library that allocates some resources

Re: Is there a lifecycle listener that gets notified when a topology starts/stops on a task manager

2019-09-24 Thread Stephen Connolly
I have created https://issues.apache.org/jira/browse/FLINK-14184 as a proposal to improve Flink in this specific area. On Tue, 24 Sep 2019 at 03:23, Zhu Zhu wrote: > Hi Stephen, > > I think disposing static components in the closing stage of a task is > required. > This is be

POJO serialization vs immutability

2019-10-02 Thread Stephen Connolly
Thanks -Stephen

Best approach for recalculating statistics based on amended or deleted events?

2020-02-04 Thread Stephen Young
I am currently looking into how Flink can support a live data collection platform. We want to collect certain data in real-time. This data will be sent to Kafka and we want to use Flink to calculate statistics and derived events from it. An important thing we need to be able to handle is amendm

Re: Best approach for recalculating statistics based on amended or deleted events?

2020-02-04 Thread Stephen Young
se might also work with insert-only rows and a query > based on the flags in the data, correct? > > Regards, > Timo > > > On 04.02.20 16:14, Stephen Young wrote: > > I am currently looking into how Flink can support a live data collection > > platform. We want to

Re: Best approach for recalculating statistics based on amended or deleted events?

2020-02-06 Thread Stephen Young
Are you able to advise any further Timo? Thanks! On 2020/02/04 16:10:04, Stephen Young wrote: > Hi Timo, > > Thanks for replying to me so quickly! > > We could do it with insert-only rows. When you say flags in the data do you > mean a field with a name like 'retracts

Rescaling a running topology

2020-02-07 Thread Stephen Connolly
So I am looking at the Flink Management REST API... and, as I see it, there are two paths to rescale a running topology: 1. Stop the topology with a savepoint and then start it up with the new savepoint; or 2. Use the /jobs/:jobid/rescaling

Re: Rescaling a running topology

2020-02-07 Thread Stephen Connolly
java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:966) ... 20 more On Fri, 7 Feb 2020 at 11:40, Stephen Connolly < stephen.alan.conno...@gmail.com> wrote: > So I am looking at the Flink Management REST API... and, as I see it, > there are two paths to rescale a running topology: > > 1. Stop the topolog

Re: Rescaling a running topology

2020-02-07 Thread Stephen Connolly
And now the job is stuck in a suspended state and I seem to have no way to get it out of that state again! On Fri, 7 Feb 2020 at 11:50, Stephen Connolly < stephen.alan.conno...@gmail.com> wrote: > The plot thickens... I was able to rescale down... just not back up > again!!! &g

Re: Rescaling a running topology

2020-02-07 Thread Stephen Connolly
some time (I gave up waiting) On Fri, 7 Feb 2020 at 11:54, Stephen Connolly < stephen.alan.conno...@gmail.com> wrote: > And now the job is stuck in a suspended state and I seem to have no way to > get it out of that state again! > > On Fri, 7 Feb 2020 at 11:50, Stephen Connolly

Reduce one event under multiple keys

2019-02-08 Thread Stephen Connolly
Ok, I'll try and map my problem into something that should be familiar to most people. Consider collection of PCs, each of which has a unique ID, e.g. ca:fe:ba:be, de:ad:be:ef, etc. Each PC has a tree of local files. Some of the file paths are coincidentally the same names, but there is no file s

Re: Can an Aggregate the key from a WindowedStream.aggregate()

2019-02-10 Thread Stephen Connolly
On Sun, 10 Feb 2019 at 09:48, Chesnay Schepler wrote: > There are also versions of WindowedStream#aggregate that accept an > additional WindowFunction/ProcessWindowFunction, which do have access to > the key via apply()/process() respectively. These functions are called > post aggregation. > Coo

Re: Reduce one event under multiple keys

2019-02-10 Thread Stephen Connolly
uot;:"open","path":"/foo/bar/Admin guide.txt"} So there will be aggregates stored for ("ca:fe:ba:be","/"), ("ca:fe:ba:be","/foo"), ("ca:fe:ba:be","/foo/bar"), ("ca:fe:ba:be","/foo/bar/README.txt&quo

Is there a windowing strategy that allows a different offset per key?

2019-02-10 Thread Stephen Connolly
I would like to process a stream of data firom different customers, producing output say once every 15 minutes. The results will then be loaded into another system for stoage and querying. I have been using TumblingEventTimeWindows in my prototype, but I am concerned that all the windows will star

Re: Is there a windowing strategy that allows a different offset per key?

2019-02-10 Thread Stephen Connolly
;, or did you forget to call " + "'DataStream.assignTimestampsAndWatermarks(...)'?"); } } So I think I can just write my own where the offset is derived from hashing the element using my hash function. Good plan or bad plan? On Sun, 10 Feb 2019 at 19:55, Stephen Connolly < st

Re: Is there a windowing strategy that allows a different offset per key?

2019-02-11 Thread Stephen Connolly
On Mon, 11 Feb 2019 at 09:54, Fabian Hueske wrote: > Hi Stephen, > > First of all, yes, windows computing and emitting at the same time can > cause pressure on the downstream system. > > There are a few ways how you can achieve this: > * use a custom window assigner. A wi

Re: Reduce one event under multiple keys

2019-02-11 Thread Stephen Connolly
On Mon, 11 Feb 2019 at 09:42, Fabian Hueske wrote: > Hi Stephen, > > A window is created with the first record that is assigned to it. > If the windows are based on time and a key, than no window will be created > (and not space be occupied) if there is not a first record for a

Anyone tried to do blue-green topology deployments?

2019-02-11 Thread Stephen Connolly
I have my main application updating with a blue-green deployment strategy whereby a new version (always called green) starts receiving an initial fraction of the web traffic and then - based on the error rates - we progress the % of traffic until 100% of traffic is being handled by the green versio

Re: Anyone tried to do blue-green topology deployments?

2019-02-11 Thread Stephen Connolly
On Mon, 11 Feb 2019 at 13:26, Stephen Connolly < stephen.alan.conno...@gmail.com> wrote: > I have my main application updating with a blue-green deployment strategy > whereby a new version (always called green) starts receiving an initial > fraction of the web traffic and then

Re: Anyone tried to do blue-green topology deployments?

2019-02-11 Thread Stephen Connolly
would be adding such state to the filter On Mon 11 Feb 2019 at 13:33, Stephen Connolly < stephen.alan.conno...@gmail.com> wrote: > > > On Mon, 11 Feb 2019 at 13:26, Stephen Connolly < > stephen.alan.conno...@gmail.com> wrote: > >> I have my main application upda

Re: Anyone tried to do blue-green topology deployments?

2019-02-11 Thread Stephen Connolly
On Mon, 11 Feb 2019 at 14:10, Stephen Connolly < stephen.alan.conno...@gmail.com> wrote: > Another possibility would be injecting pseudo events into the source and > having a stateful filter. > > The event would be something like “key X is now owned by green”. > > I can d

Re: [ANNOUNCE] New Flink PMC member Thomas Weise

2019-02-12 Thread Stephen Connolly
Congratulations to Thomas. I see that this is not his first time in the PMC rodeo... also somebody needs to update LDAP as he's not on https://people.apache.org/phonebook.html?pmc=flink yet! -stephenc On Tue, 12 Feb 2019 at 09:59, Fabian Hueske wrote: > Hi everyone, > > On behalf of the Flink P

How to debug difference between Kinesis and Kafka

2019-02-19 Thread Stephen Connolly
Hi, I’m having a strange situation and I would like to know where I should start trying to debug. I have set up a configurable swap in source, with three implementations: 1. A mock implementation 2. A Kafka consumer implementation 3. A Kinesis consumer implementation >From injecting a log and no

Re: How to debug difference between Kinesis and Kafka

2019-02-19 Thread Stephen Connolly
:14, Stephen Connolly < stephen.alan.conno...@gmail.com> wrote: > Hi, I’m having a strange situation and I would like to know where I should > start trying to debug. > > I have set up a configurable swap in source, with three implementations: > > 1. A mock implementati

Re: How to debug difference between Kinesis and Kafka

2019-02-19 Thread Stephen Connolly
Hmmm my suspicions are now quite high. I created a file source that just replays the events straight then I get more results On Tue, 19 Feb 2019 at 11:50, Stephen Connolly < stephen.alan.conno...@gmail.com> wrote: > Hmmm after expanding the dataset such that there was additional

Re: EXT :Re: How to debug difference between Kinesis and Kafka

2019-02-19 Thread Stephen Connolly
urce > function isn’t getting data you have to watch out for this. > > > > *From:* Stephen Connolly [mailto:stephen.alan.conno...@gmail.com] > *Sent:* Tuesday, February 19, 2019 6:32 AM > *To:* user > *Subject:* EXT :Re: How to debug difference between Kinesis and Kafka >

Re: How to debug difference between Kinesis and Kafka

2019-02-21 Thread Stephen Connolly
t the best solution? On Thu, 21 Feb 2019 at 08:30, Dawid Wysakowicz wrote: > Hi Stephen, > > Watermark for a single operator is the minimum of Watermarks received from > all inputs, therefore if one of your shards/operators does not have > incoming data it will not produce Waterma

Can I make an Elasticsearch Sink effectively exactly once?

2019-02-21 Thread Stephen Connolly
>From how I understand it: https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/connectors/elasticsearch.html#elasticsearch-sinks-and-fault-tolerance the Flink Elasticsearch Sink guarantees at-least-once delivery of action > requests to Elasticsearch clusters. It does so by waiting for

Re: How to debug difference between Kinesis and Kafka

2019-02-21 Thread Stephen Connolly
ot;create event count for timebox" output or the "update event count for timebox from late events" output as long as it is always one and only one of those paths. > > > Best, > > Dawid > On 21/02/2019 14:18, Stephen Connolly wrote: > > Yes, it was the "wat

Re: Reduce one event under multiple keys

2019-02-21 Thread Stephen Connolly
Thanks! On Mon, 18 Feb 2019 at 12:36, Fabian Hueske wrote: > Hi Stephen, > > Sorry for the late response. > If you don't need to match open and close events, your approach of using a > flatMap to fan-out for the hierarchical folder structure and a window > operator (or

Re: How to debug difference between Kinesis and Kafka

2019-02-21 Thread Stephen Connolly
esults in case of reprocessing" I started to think that maybe the Watermarks are the Barrier but after your clarification I'm back to thinking they are separate similar mechanisms operating in the stream > Best, > > Dawid > > [1] > https://ci.apache.org/projects/flink/

Re: Why don't Tuple types implement Comparable?

2019-02-22 Thread Stephen Connolly
On Thu, 21 Feb 2019 at 18:29, Frank Grimes wrote: > Hi, > > I've recently started to evaluate Flink and have found it odd that its > Tuple types, while Serializable, don't implement java.lang.Comparable. > This means that I either need to provide an KeySelector for many > operations or subtype th

Re: Why don't Tuple types implement Comparable?

2019-02-22 Thread Stephen Connolly
On Fri, 22 Feb 2019 at 10:16, Stephen Connolly < stephen.alan.conno...@gmail.com> wrote: > On Thu, 21 Feb 2019 at 18:29, Frank Grimes > wrote: > >> Hi, >> >> I've recently started to evaluate Flink and have found it odd that its >> Tupl

Re: Why don't Tuple types implement Comparable?

2019-02-22 Thread Stephen Connolly
On Fri, 22 Feb 2019 at 10:38, Stephen Connolly < stephen.alan.conno...@gmail.com> wrote: > > > On Fri, 22 Feb 2019 at 10:16, Stephen Connolly < > stephen.alan.conno...@gmail.com> wrote: > >> On Thu, 21 Feb 2019 at 18:29, Frank Grimes >> wrote: >> >

Re: Checkpoints and catch-up burst (heavy back pressure)

2019-03-05 Thread Stephen Connolly
On Fri, 1 Mar 2019 at 13:05, LINZ, Arnaud wrote: > Hi, > > > > I think I should go into more details to explain my use case. > > I have one non parallel source (parallelism = 1) that list binary files in > a HDFS directory. DataSet emitted by the source is a data set of file > names, not file con

Re: Checkpoints and catch-up burst (heavy back pressure)

2019-03-05 Thread Stephen Connolly
On Tue, 5 Mar 2019 at 12:48, Stephen Connolly < stephen.alan.conno...@gmail.com> wrote: > > > On Fri, 1 Mar 2019 at 13:05, LINZ, Arnaud > wrote: > >> Hi, >> >> >> >> I think I should go into more details to explain my use case. >> >

REST API question GET /jars/:jarid/plan

2019-03-07 Thread Stephen Connolly
In the documentation for the /jars/:jarid/plan endpoint https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/rest_api.html#jars-jarid-plan It says: > Program arguments can be passed both via the JSON request (recommended) or query parameters. Has anyone got sample code that sends th

Re: DataStream EventTime last data cannot be output?

2019-03-07 Thread Stephen Connolly
I had this issue myself. Your timestamp assigner will only advance the window as it receives data, thus when you reach the end of the data there will be data which is newer than the last window. One solution is to have the source flag that there will be no more data. If you can do this then that

Re: REST API question GET /jars/:jarid/plan

2019-03-07 Thread Stephen Connolly
able TRACE logging, > try again and look for logging messages from > "org.apache.flink.runtime.rest.handler.router.RouterHandler" > > On 07.03.2019 11:25, Stephen Connolly wrote: > > In the documentation for the /jars/:jarid/plan endpoint > > > https://ci.apache

Re: REST API question GET /jars/:jarid/plan

2019-03-07 Thread Stephen Connolly
Yep that was it. I have created https://issues.apache.org/jira/browse/FLINK-11853 so that it is easier for others to work around if they have restrictions on the HTTP client library choice On Thu, 7 Mar 2019 at 11:47, Stephen Connolly < stephen.alan.conno...@gmail.com> wrote: > > >

Re: Does flink configuration support configed by environment variables?

2019-04-01 Thread Stephen Connolly
I don't think it does. I ended up writing a small CLI tool to enabling templating the file from environment variables. There are loads of such tools, but mine is https://github.com/stephenc/envsub I have the dockerfile like so: ARG FLINK_VERSION=1.7.2-alpine FROM flink:${FLINK_VERSION} ARG ENVSUB

How to handle JDBC connections in a topology

2019-07-24 Thread Stephen Connolly
7;m really looking for is some form of Node Transient State... are there any examples of this type of think. Flink 1.8.x Thanks, -Stephen

Re: How to handle JDBC connections in a topology

2019-07-24 Thread Stephen Connolly
Oh and I'd also need some way to clean up the per-node transient state if the topology stops running on a specific node. On Wed, 24 Jul 2019 at 08:18, Stephen Connolly < stephen.alan.conno...@gmail.com> wrote: > Hi, > > So we have a number of nodes in our topology that ne

Subscribe

2017-10-10 Thread Stephen Jiang

[Slack] Request to upload new invitation link

2023-06-28 Thread Stephen Chu
Hi there, I'd love to join the Flink Slack channel, but it seems the link is outdated: https://join.slack.com/t/apache-flink/shared_invite/zt-1thin01ch-tYuj6Zwu8qf0QsivHY0anw Would someone be able to update or send me a new invite link? Thanks, Stephen

Can an Aggregate the key from a WindowedStream.aggregate()

2019-02-08 Thread stephen . alan . connolly
If I write my aggregation logic as a WindowFunction then I get access to the key as the first parameter in WindowFunction.apply(...) however the Javadocs for calling WindowedStream.apply(WindowFunction) state: > Note that this function requires that all data in the windows is buffered > until t