Hi,
Watermarks are meta events that travel independently of data events.
1) If you assignTimestampsAndWatermarks before keyBy, all parallel
instances of trips have some data (this is my assumption), so watermarks
can be generated. Afterwards, even if some of the keyed partitions have
no data, Waterm
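To make the generation step concrete, here is a plain-Java sketch (no Flink dependency; the class name is invented for illustration) of the usual bounded-out-of-orderness rule: the watermark trails the highest timestamp seen so far by a fixed delay, so any instance that has actually seen data can emit one.

```java
// Plain-Java model of a bounded-out-of-orderness watermark generator:
// the watermark trails the highest timestamp seen so far by a fixed
// delay. Illustrative sketch only, not Flink's actual implementation.
class BoundedOutOfOrdernessModel {
    private final long maxOutOfOrderness;
    private long maxTimestampSeen = Long.MIN_VALUE;

    BoundedOutOfOrdernessModel(long maxOutOfOrderness) {
        this.maxOutOfOrderness = maxOutOfOrderness;
    }

    // Called for every element; records the highest timestamp observed.
    void onEvent(long eventTimestamp) {
        maxTimestampSeen = Math.max(maxTimestampSeen, eventTimestamp);
    }

    // Called periodically; emits the current watermark. An instance that
    // has seen no data stays at Long.MIN_VALUE.
    long currentWatermark() {
        return maxTimestampSeen == Long.MIN_VALUE
                ? Long.MIN_VALUE
                : maxTimestampSeen - maxOutOfOrderness;
    }
}
```

With this rule, out-of-order elements within the delay still fall before the emitted watermark, which is exactly why every parallel instance needs data to advance it.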
Hi,
We have a keyed pipeline with persisted state.
Is there a way to broadcast a command and collect all values persisted
in the state?
The end result could be, for example, sending a fetch command to all
operators and emitting the results to some sink.
Why do we need it? From time to time we
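One way to picture the idea, as a plain-Java simulation rather than real Flink code: every parallel instance holds keyed state, and a broadcast fetch command makes each instance dump its entries to a sink. In actual Flink this would likely map onto a KeyedBroadcastProcessFunction, whose broadcast side can iterate registered keyed state via Context#applyToKeyedState; all class and method names below are invented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simulation of "broadcast a command, collect all state": each parallel
// instance keeps keyed state, and a broadcast fetch makes every instance
// emit its entries to a shared sink. Illustrative names only.
class StateFetchSimulation {
    static class OperatorInstance {
        final Map<String, Long> state = new HashMap<>();

        // Normal data path: accumulate per key.
        void process(String key, long value) {
            state.merge(key, value, Long::sum);
        }

        // Reaction to the broadcast fetch command: emit all state entries.
        void onFetchCommand(List<String> sink) {
            state.forEach((k, v) -> sink.add(k + "=" + v));
        }
    }

    // "Broadcast" the command to every parallel instance.
    static List<String> broadcastFetch(List<OperatorInstance> instances) {
        List<String> sink = new ArrayList<>();
        for (OperatorInstance op : instances) op.onFetchCommand(sink);
        return sink;
    }
}
```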
Hi Felipe,
I am not sure the algorithm requires constructing a new extension of the
window operator. I think you could implement the CountMinSketch object
as an aggregator:
E.g.
1. AggregateState (ACC) should be the aggregating accumulator: the
count-min-sketch 2-D hash array (plus a few other needed
Any thoughts on this?
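A minimal self-contained sketch of that accumulator idea (plain Java, with deliberately simple illustrative hashing): add() folds one element into the 2-D counter array and merge() sums two sketches cell-wise, which matches the add/merge contract an AggregateFunction accumulator needs.

```java
// Self-contained count-min sketch whose add/merge mirror the contract of
// an AggregateFunction accumulator (ACC): add() folds one element in,
// merge() combines two partial sketches. The per-row hash mixing below is
// illustrative only, not production-quality.
class CountMinSketch {
    final int depth, width;
    final long[][] table; // the 2-D hash/counter array

    CountMinSketch(int depth, int width) {
        this.depth = depth;
        this.width = width;
        this.table = new long[depth][width];
    }

    private int bucket(int row, Object item) {
        int h = item.hashCode() * (31 * row + 17); // simple per-row mix
        return Math.floorMod(h, width);
    }

    void add(Object item) {
        for (int r = 0; r < depth; r++) table[r][bucket(r, item)]++;
    }

    // Point query: minimum over rows, so it can over- but never under-count.
    long estimate(Object item) {
        long min = Long.MAX_VALUE;
        for (int r = 0; r < depth; r++) min = Math.min(min, table[r][bucket(r, item)]);
        return min;
    }

    // Cell-wise sum: sketches with identical dimensions/hashes are mergeable,
    // which is what makes this usable as a mergeable window accumulator.
    CountMinSketch merge(CountMinSketch other) {
        CountMinSketch out = new CountMinSketch(depth, width);
        for (int r = 0; r < depth; r++)
            for (int c = 0; c < width; c++)
                out.table[r][c] = table[r][c] + other.table[r][c];
        return out;
    }
}
```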
On Sun, Apr 7, 2019, 6:56 PM Juan Rodríguez Hortalá <
juan.rodriguez.hort...@gmail.com> wrote:
> Hi,
>
> I have a very simple program using the local execution environment, that
> throws NPE and other exceptions related to concurrent access when launching
> a count for a Dat
AFAIK, there are no granular permissions like that built into Flink. Limiting
access to the REST API seems like a good place to start. The web UI uses the
API, but controlling it there means you’re locking down all means of access.
The designers of the API were disciplined about what HTTP verbs
Here you go https://issues.apache.org/jira/browse/FLINK-12333
Again thanks for the prompt response
On Wed, Apr 24, 2019 at 11:06 AM Till Rohrmann wrote:
> Good to hear. Could you create a documentation JIRA issue for this
> problem? Thanks a lot.
>
> Cheers,
> Till
>
> On Wed, Apr 24, 2019 at
If my understanding is correct, then why does `assignTimestampsAndWatermarks` before
`keyBy` work? The `timeWindowAll` stream's input streams are task 1 and task
2, with task 2 idling, no matter whether `assignTimestampsAndWatermarks` is
before or after `keyBy`, because whether task 2 receives eleme
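The behavior under discussion follows from the standard propagation rule: a multi-input task such as the one running timeWindowAll tracks a watermark per input channel and forwards the minimum. A plain-Java model of that rule (not Flink internals) shows how one idle channel pins the operator watermark at Long.MIN_VALUE:

```java
// Model of watermark propagation into a multi-input task: the operator's
// watermark is the minimum over its input channels, so one idle channel
// stuck at Long.MIN_VALUE holds the whole operator back. Illustrative
// sketch, not Flink internals.
class WatermarkMerger {
    private final long[] channelWatermarks;

    WatermarkMerger(int numChannels) {
        channelWatermarks = new long[numChannels];
        java.util.Arrays.fill(channelWatermarks, Long.MIN_VALUE);
    }

    // Watermarks are monotone per channel; keep the highest seen.
    void onWatermark(int channel, long watermark) {
        channelWatermarks[channel] = Math.max(channelWatermarks[channel], watermark);
    }

    // The operator's event-time clock: minimum across all channels.
    long operatorWatermark() {
        long min = Long.MAX_VALUE;
        for (long w : channelWatermarks) min = Math.min(min, w);
        return min;
    }
}
```

Assigning watermarks before keyBy means every source task (which has data) advances its channel, so the minimum can move forward even when some keyed partitions are empty.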
Ditto that, setting queryable-state.enable to true works.
Thanks everyone.
On Thu, Apr 25, 2019 at 6:28 AM Dawid Wysakowicz
wrote:
> Hi Vishal,
>
> As Guowei mentioned you have to enable the Queryable state. The default
> setting was changed in 1.8.0. There is an open JIRA[1] for changing the
> documentation accordingly.
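For reference, the setting mentioned above is a flink-conf.yaml entry (the queryable state proxy is off by default since 1.8):

```yaml
# flink-conf.yaml — enable the queryable state proxy
# (disabled by default since Flink 1.8)
queryable-state.enable: true
```

Depending on your distribution you may also need the queryable-state runtime classes on the classpath; check the documentation for your Flink version.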
I had come across flink-deployer actually, but somehow didn't want to
"learn" it... (versus just a bunch of lines in a script)
At some point, with more bandwidth, we should migrate to it and
standardize on flink-deployer (and later take this to mainstream Flink :P).
---
Oytun Tez
*M O T A W O R
Hello,
I want to run Flink on Apache Mesos with Marathon, and I have configured
ZooKeeper too. When I run "mesos-appmaster.sh", it shows me this error:
2019-04-25 13:53:18,160 INFO
org.apache.flink.mesos.runtime.clusterframework.MesosResourceManager -
Mesos resource manager started.
2019-04-25 13:5
Hi
We are looking at running Flink on Kubernetes in Job cluster mode. As part
of our plans we do not want to allow modifications to the job cluster once
a job is running. For this we are looking at a "read-only" Flink UI, that
does not allow users to cancel a job or submit a job.
My question is,
Thanks Dawid,
Hi Sergey,
I am working on updating the Flink interpreter of Zeppelin to support Flink
1.9 (supposed to be released this summer).
For the current Flink interpreter of Zeppelin 0.9, I haven't verified it
against Flink 1.8. Could you share the full interpreter log? And what is
the size
Hi Beckett,
Thanks for your feedback, see my comments inline.
>>> How do users specify the listener? *
What I propose is to register the JobListener in the ExecutionEnvironment. I
don't think we should make ClusterClient a public API.
>>> Where should the listener run? *
I don't think it is proper to
Great news! Thanks
On Thu, Apr 25, 2019, 2:59 AM Tzu-Li (Gordon) Tai
wrote:
> Hi Cliff,
>
> Thanks for bringing this up again.
>
> I think it would make sense to at least move this forward by only
> checking the schema for user keys in MapState, and allow value
> schema evolution.
>
Thanks for the proposal, Jeff. Adding a listener to allow users handle
events during the job lifecycle makes a lot of sense to me.
Here are my two cents.
* How do users specify the listener? *
It is not quite clear to me whether we consider ClusterClient a public
interface. From what I understa
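To make the discussion concrete, here is a hypothetical sketch of the registration style being proposed; the interface, method names, and signatures are invented for illustration and are not an actual Flink API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposal: users register a JobListener on the
// execution environment and get callbacks at job lifecycle points. All
// names and signatures here are invented for illustration.
interface JobListener {
    void onJobSubmitted(String jobId);
    void onJobFinished(String jobId);
}

class ListeningEnvironment {
    private final List<JobListener> listeners = new ArrayList<>();

    void registerJobListener(JobListener listener) {
        listeners.add(listener);
    }

    // Simulated execute(): notify listeners around the (stubbed) job run.
    void execute(String jobId) {
        listeners.forEach(l -> l.onJobSubmitted(jobId));
        // ... job runs ...
        listeners.forEach(l -> l.onJobFinished(jobId));
    }
}
```

The point of registering on the environment rather than on ClusterClient is that the environment is already the user-facing entry point, so ClusterClient can stay internal.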
The passed job arguments cannot be queried via the REST API. When
submitting jobs through the CLI, these parameters never arrive at the
cluster; in the case of REST API submission, they are immediately discarded
after the submission has finished.
On 25/04/2019 12:25, Dawid Wysakowicz wrote:
Hi Ste
Hey,
Sorry for such a delay, but I missed this message. Basically, you could
technically have a Kafka broker installed at version, say, 1.0.0 while
using FlinkKafkaConsumer08. This could create issues.
I'm not sure if you can automate the process of skipping corrupted
messages, as you
Hi,
Feel free to open a JIRA for this issue. By the way, have you
investigated the root cause of it hanging?
Best,
Dawid
On 25/04/2019 08:55, qi luo wrote:
> Hello,
>
> This issue occurred again and we dumped the TM thread. It indeed hung
> on socket read to download jar from Blob serve
Thanks for help Till,
I thought so, but I wanted to be sure.
Best Regards,
Dom.
Hi Sergey,
I am not very familiar with Zepellin. But I know Jeff (cc'ed) is working
on integrating Flink with some notebooks. He might be able to help you.
Best,
Dawid
On 25/04/2019 08:42, Smirnov Sergey Vladimirovich (39833) wrote:
>
> Hello,
>
>
>
> Trying to link Zeppelin 0.9 with Flink 1.
Hi Vishal,
As Guowei mentioned you have to enable the Queryable state. The default
setting was changed in 1.8.0. There is an open JIRA[1] for changing the
documentation accordingly.
Best,
Dawid
[1] https://issues.apache.org/jira/browse/FLINK-12274
On 25/04/2019 03:27, Guowei Ma wrote:
> You co
Hi Steve,
As far as I know, this information is not available in REST API, but it
would be good to double check with Chesnay(cc'ed). You can see the
complete list of available REST commands here[1].
Best,
Dawid
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.8/monitoring/rest_api.
Hi Avi,
Just as some additional explanation. UID of operator is the way we map
state to corresponding operator. This allows loading savepoint with
changed DAG as long as the UIDs stay the same. This as you said explain
why you got the exception when you changed uid of some of the operators.
Best,
Your proposal could probably also be implemented by using Flink's support
for allowed lateness when defining a window [1]. It has basically the same
idea that there might be some elements which violate the watermark
semantics and which need to be handled separately.
[1]
https://ci.apache.org/proje
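A simplified model of the allowed-lateness rule referenced above (plain Java, glossing over window-max-timestamp details): elements for a window keep triggering late firings until the watermark passes windowEnd + allowedLateness, after which they are dropped.

```java
// Simplified model of allowed lateness for event-time windows: after the
// watermark passes the window end, the window is retained for
// `allowedLateness` more event time; elements arriving in that grace
// period trigger late firings, and only afterwards are they dropped.
// Sketch of the semantics, not Flink code.
class LatenessModel {
    static boolean isDropped(long windowEnd, long currentWatermark, long allowedLateness) {
        // Dropped once the watermark has reached windowEnd + allowedLateness.
        return currentWatermark >= windowEnd + allowedLateness;
    }
}
```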
Hi,
Yes I think your explanation is correct. I can also recommend Seth's
webinar where he talks about debugging Watermarks[1]
Best,
Dawid
[1]
https://www.ververica.com/resources/webinar/webinar/debugging-flink-tutorial
On 22/04/2019 22:55, an0 wrote:
> Thanks, I feel I'm getting closer to the
There was someone working in IoT who asked me whether Flink supports per-key
watermarks as well.
I'm not sure if we can do the statistics using raw state manipulation. We
create a single state for every single key, and when receiving a key, we
extract the timestamp and see if we need to send som
Hi Monika,
I would start by identifying the operator that causes the backpressure.
You can find more information on how to monitor backpressure in the
docs[1]. You might also be interested in Seth's (cc'ed) webinar[2],
where he also talks about how to debug backpressure.
Best,
Dawid
[1]
https://ci.ap
Hi Soheil,
The equivalent of the DataStream API's SinkFunction in the DataSet API is
the mentioned OutputFormat. You can implement the OutputFormat yourself.
Best,
Dawid
On 21/04/2019 20:01, Soheil Pourbafrani wrote:
> Hi, Using the DataStream API I could create a Custom Sink
> like class RichMySqlSink extends RichS
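A minimal stand-in illustrating the OutputFormat lifecycle (open per subtask, writeRecord per element, close at the end); it collects records in memory instead of writing to MySQL, and it mirrors but does not implement Flink's actual interface. A real MySQL sink would open a JDBC connection in open() and batch inserts in writeRecord().

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for the OutputFormat lifecycle in the DataSet API:
// open(subtask, parallelism) -> writeRecord per element -> close.
// Collects records in memory; a real implementation would write to MySQL.
class InMemoryOutputFormat {
    final List<String> written = new ArrayList<>();
    boolean opened;

    // Called once per parallel subtask before any records arrive.
    void open(int taskNumber, int numTasks) {
        opened = true;
    }

    // Called for every record of the DataSet.
    void writeRecord(String record) {
        if (!opened) throw new IllegalStateException("open() was not called");
        written.add(record);
    }

    // Called when the subtask finishes; flush/close resources here.
    void close() {
        opened = false;
    }
}
```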
Hi,
Unfortunately those are just ignored. The timed out partial matches are
not emitted.
Best,
Dawid
On 20/04/2019 19:49, M Singh wrote:
> Dawid:
>
> So, what happens when there is a timeout - is there any value/field in
> the resulting data stream that indicates that this was a timeout ?
>
> T
Hi Mike,
I think the reason why there is no access to the TimerService in an async
function is that, as it is an async function, there are no guarantees
when and where (at which stage of the pipeline) the function will
actually be executed. This characteristic doesn't align with the
TimerService and timely cal
Hi Smirnov,
Actually, there is a way to tell Flink that data is already partitioned.
You can try the reinterpretAsKeyedStream[1] method. I must warn you,
though, that this is an experimental feature.
Best,
Dawid
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/stream/experimental.htm
Hi Averell,
I think your original solution is the right one, given your
requirements. I don't think it is overcomplicated.
As for the memory concerns, there is no built-in mechanism for
backpressure/alignment based on event time. The community did take that
into consideration when discussing the
Hi Steven, Oytun
You may find the tool we open-sourced last year useful. It supports deploying
and updating jobs with savepoints.
You can find it on Github: https://github.com/ing-bank/flink-deployer
There’s also a docker image available in Docker Hub.
Marc
On 24 Apr 2019, 17:29 +0200, Oytun T
Hello,
This issue occurred again and we dumped the TM thread. It indeed hung on a
socket read while downloading the jar from the Blob server:
"DataSource (at createInput(ExecutionEnvironment.java:548) (our.code))
(1999/2000)" #72 prio=5 os_prio=0 tid=0x7fb9a1521000 nid=0xa0994 runnable
[0x7fb97cfbf000