I need my job to aggregate every device's metrics into a daily report. But I did
not find a window that can cover exactly one day, or a way to align the
watermark with the beginning of each day. Should I implement a custom window, or
is there another way to achieve this?
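To make it concrete, here is roughly what I have in mind (sketch only;
DeviceMetric, getDeviceId, and DailyReportAggregator are placeholders for my
actual types):

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;

    DataStream<DeviceMetric> metrics = ...; // my source

    metrics
        .keyBy(m -> m.getDeviceId())
        // A one-day tumbling window; the second argument offsets the window
        // boundaries so that "one day" starts at local midnight (UTC+8 here).
        .window(TumblingEventTimeWindows.of(Time.days(1), Time.hours(-8)))
        .aggregate(new DailyReportAggregator());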
Hi everyone,
I have a question about Flink authentication: with Flink on YARN running on a
Hadoop cluster without authentication, how can it access HBase on a cluster
that uses Kerberos authentication?
Below is a description of our setup and the problem we found:
We have two Hadoop clusters, one using Kerberos authentication and one using
simple authentication. Flink 1.9.0 is deployed on the simple-authentication
cluster.
Recently we ran into a problem using Flink to read HBase on the
Kerberos-authenticated cluster. We set the following parameter in
flink-conf.yaml: security.ker
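For reference, the Kerberos-related options we are experimenting with in
flink-conf.yaml look roughly like this (keytab path and principal are
placeholders):

    security.kerberos.login.use-ticket-cache: false
    security.kerberos.login.keytab: /path/to/flink.keytab
    security.kerberos.login.principal: flink-user@EXAMPLE.COM
    security.kerberos.login.contexts: Client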
Hi Hanan,
Sometimes the behavior depends on your implementation.
Since it's not a built-in connector, it would be better to share your
customized source with the community, so that the community can better help
you figure out where the problem is.
WDYT?
Best,
Vino
Hanan Yehudai wrote on 2019
Hi, I am trying to run some performance tests on my Flink deployment.
I am implementing an extremely simplistic use case.
I built a ZMQ source.
The topology is ZMQ Source -> Mapper -> DiscardingSink (a sink that does
nothing).
Data is pushed via ZMQ at a very high rate.
When the incoming rate f
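For reference, the discarding sink is just a no-op (I believe Flink also ships
an equivalent org.apache.flink.streaming.api.functions.sink.DiscardingSink):

    import org.apache.flink.streaming.api.functions.sink.SinkFunction;

    // Swallows every record, so only the source + map cost is measured.
    public class DiscardingSink<T> implements SinkFunction<T> {
        @Override
        public void invoke(T value, Context context) {
            // intentionally empty
        }
    }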
-- Forwarded message -
From: Reo Lei
Date: Tue, Nov 26, 2019, 9:53 AM
Subject: Re: How to recover state from savepoint on embedded mode?
To: Yun Tang
Hi Yun,
Thanks for your reply. What I call embedded mode is the whole Flink
cluster and job, including the JobManager, TaskManager, and the
Hi Kostas/Congxian:
Thanks for your response.
Based on your feedback, I found that I had missed adding a uid to one of the
stateful operators, and correcting that resolved the issue. I still have
stateless operators for which no uid is specified in the application.
So, I thought that adding uid w
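For completeness, the pattern I applied looks roughly like this (sketch; the
operator classes and uid strings are placeholders):

    DataStream<Event> enriched = events
        .map(new EnrichmentMapper())
        .uid("enrichment-mapper");   // stable id so state is matched on restore

    enriched
        .keyBy(e -> e.getKey())
        .process(new StatefulProcessor())
        .uid("stateful-processor");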
Hi Kaihao,
Pinging @Aljoscha Krettek @Tzu-Li (Gordon) Tai
to give more professional suggestions.
What's more, we may need to give a statement about whether the State Processor
API can process the snapshots generated by jobs on older versions. WDYT?
Best,
Vino
Kaihao Zhao wrote on Mon, Nov 25, 2019 at 11:39 PM:
>
Hey folks, I'm trying to stream a large volume of data and write it as CSV
files to S3, and one of the restrictions is to try to keep the files below
100MB (compressed) and write one file per minute. I wanted to verify my
understanding of StreamingFileSink with you:
1. From the doc
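For context, my sink looks roughly like this (sketch; the bucket path is a
placeholder, and this is the row-format builder; for bulk/compressed formats my
understanding is that rolling happens on checkpoints instead):

    StreamingFileSink<String> sink = StreamingFileSink
        .forRowFormat(new Path("s3://my-bucket/output"),
                      new SimpleStringEncoder<String>("UTF-8"))
        .withRollingPolicy(
            DefaultRollingPolicy.create()
                .withMaxPartSize(100L * 1024 * 1024) // ~100 MB cap
                .withRolloverInterval(60_000L)       // roll every minute
                .build())
        .build();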
Hi Vijay,
IMO, the semantics of a source are not fixed. It can integrate with
third-party systems and consume events. However, it can also contain more
business logic to pre-process your data after consuming events.
Maybe it needs some customization. WDYT?
Best,
Vino
Vijay Bala
Hi Yang,
Session mode is working exactly as you described. No exceptions.
Thank you!
Piper
On Sun, Nov 24, 2019 at 11:24 PM Yang Wang wrote:
> Hi Piper,
>
> In session mode, Flink will always use the free slots in the existing
> TaskManagers first.
> When it cannot fulfill the slot reques
Hello,
We're seeing some strange behavior with flink's KafkaConnector010 (Kafka
0.10.1.1) arbitrarily skipping data.
*Context*
KafkaConnector010 is used as source, and StreamingFileSink/BulkPartWriter
(S3) as sink with no intermediate operators. Recently, we noticed that
millions of Kafka records
Hi,
I need to pre-process data (transform incoming data into a different format)
before it hits the source I have defined. How can I do that?
I tried to use a .map on the DataStream, but that is too late, as the data
has already hit the source I defined.
FlinkKinesisConsumer> kinesisConsumer =
getMonito
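Is something along these lines the intended approach, i.e. doing the
transformation inside the deserialization schema handed to the consumer?
(Sketch only; MyRecord and parseAndTransform are placeholders.)

    Properties consumerProps = new Properties();
    consumerProps.setProperty(ConsumerConfigConstants.AWS_REGION, "us-east-1");

    FlinkKinesisConsumer<MyRecord> consumer = new FlinkKinesisConsumer<>(
        "my-stream",
        new AbstractDeserializationSchema<MyRecord>() {
            @Override
            public MyRecord deserialize(byte[] bytes) {
                // transform before the record enters the pipeline
                return parseAndTransform(bytes);
            }
        },
        consumerProps);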
Thanks Caizhi, that was what I was afraid of. Thanks for the information on the
REST API 😊
It seems like the right solution would be to add it as a first-class feature
for Flink so I will add a feature request. I may end up using the REST API as a
workaround in the short-term - probably with a
I'm afraid I can't think of a solution. I don't see a way how this
operation can succeed or fail without anything being logged.
Is the cluster behaving normally afterwards? Could you check whether the
numRunningJobs ticks down properly after the job was canceled?
On 22/11/2019 13:27, Pavel P
What does embedded mode mean here? If you refer to SQL embedded mode, you
cannot resume from a savepoint now; if you refer to a local standalone cluster,
you could use `bin/flink run -s` to resume on a local cluster.
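For example (savepoint path, entry class, and jar are placeholders):

    bin/flink run -s /path/to/savepoint -c com.example.MyJob /path/to/my-job.jar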
Best
Yun Tang
From: Reo Lei
Date: Tuesday, November 26, 2019 at 12:37 AM
To: "u
Hi,
I have a job that needs to run in embedded mode, but it needs to initialize
some rule data from a database before it starts. So I used the State Processor
API to construct my state data and saved it to the local disk. When I wanted
to use this savepoint to recover my job, I found that resuming a job from a
savepoint need
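In case it helps, the bootstrap part of my code looks roughly like this
(sketch; Rule, RuleBootstrapFunction, the uid, and the paths are placeholders):

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    DataSet<Rule> rules = env.fromCollection(loadRulesFromDatabase());

    BootstrapTransformation<Rule> transformation = OperatorTransformation
        .bootstrapWith(rules)
        .keyBy(r -> r.getKey())
        .transform(new RuleBootstrapFunction()); // extends KeyedStateBootstrapFunction

    Savepoint
        .create(new MemoryStateBackend(), 128) // 128 = max parallelism
        .withOperator("rules-uid", transformation)
        .write("file:///tmp/rules-savepoint");

    env.execute("bootstrap state");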
Hi,
I would suggest the same thing as Vino did: it might be possible to use stdout
somehow, but it's a better idea to coordinate in some other way. Produce some
(side?) output with a control message from one job once it finishes, and let
that control the second job.
Piotrek
> On 25 Nov 2019, at
Hi,
We are running Flink 1.7, and recently, due to a Kafka cluster migration, we
need to find a way to modify the Kafka offsets in the FlinkKafkaConnector's
state. We found that Flink 1.9's State Processor API is exactly the tool we
need, and we are able to modify the operator state via the State Processor
API, but whe
Hi
The problem is that the specified uid did not exist in the new job.
1. As far as I know, the answer is yes. Some operators have their own
state (such as window state); could you please share the minimal code of
your job?
2. *Truly* stateless operators do not need to have a uid, but for the reas
Hi,
So you are trying to use the same window definition, but you want to aggregate
the data in two different ways:
1. keyBy(userId)
2. Global aggregation
Do you want to use exactly the same aggregation functions? If not, you can just
process the events twice:
DataStream<…> events = …;
DataS
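Presumably continuing along these lines (sketch; the window size and the
aggregate functions are placeholders):

    DataStream<Event> events = ...;

    // 1. per-user aggregation
    events.keyBy(e -> e.getUserId())
          .window(TumblingEventTimeWindows.of(Time.minutes(1)))
          .aggregate(new PerUserAggregate());

    // 2. global aggregation over the same window definition
    events.windowAll(TumblingEventTimeWindows.of(Time.minutes(1)))
          .aggregate(new GlobalAggregate());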
Hi,
I’m glad to hear that you are interested in Flink! :)
> In the picture, the keyBy, window, and apply operators share the same circle.
> Is it because these operators are chained together?
It's not so much about chaining as the chain of DataStream API invocations
`someStream.keyBy(…).window(…
As a side note, I am assuming that you are using the same Flink Job
before and after the savepoint and the same Flink version.
Am I correct?
Cheers,
Kostas
On Mon, Nov 25, 2019 at 2:40 PM Kostas Kloudas wrote:
>
> Hi Singh,
>
> This behaviour is strange.
> One thing I can recommend to see if the
Hi Singh,
This behaviour is strange.
One thing I can recommend to see if the two jobs are identical is to
launch also the second job without a savepoint,
just start from scratch, and simply look at the web interface to see
if everything is there.
Also could you please provide some code from your
Hi,
I’m not sure if there is some simple way of doing that (maybe some other
contributors will know more).
There are two potential ideas worth exploring:
- use periodically triggered savepoints for monitoring? If I remember
correctly, savepoints are never incremental
- use the savepoint input/out
Hi,
Good to hear that you were able to work around the problem.
I’m not sure what the exact reason is why mmapped partitions caused those
failures, but you are probably right that they caused some memory exhaustion.
This memory is probably not capped by anything, but I would expect the
kernel to
> If the call to mapResultToOutType(Result) finishes without an error, there
is no need to restart from the same row.
> The new scanner should start from the next row.
> Is that so, or am I missing something?
Yeah, you are right. I've filed the issue
https://issues.apache.org/jira/browse/FLINK-1494
Thanks Dian for your pointers. Mans
On Sunday, November 24, 2019, 08:57:53 PM EST, Dian Fu wrote:
Hi Mans,
Please see my reply inline below.
On Nov 25, 2019, at 5:42 AM, M Singh wrote:
Thanks Dian for your answers.
A few more questions:
1. If I do not assign uids to operators/sources and sink
Thanks Caizhi & Thomas for your responses.
I read the throttling example but want to see if that works with a distributed
broker like Kinesis, and how to feed throttling feedback back to the Kinesis
source so that it can vary the rate without interfering with watermarks, etc.
Thanks again
Mans
On
Hi,
I have some trouble with my HA K8s cluster.
Currently my Flink application is an infinite stream (with 12 parallelism).
After a few days I am losing my task managers, and they never reconnect to the
job manager.
Because of this, the application cannot be restored by the restart policy.
I did a few searches a
Currently, you still need a ZooKeeper service to enable HA on K8s, and the
configuration for this part is no different from YARN mode [1].
By the way, there are also other solutions to implement HA, such as etcd [2],
but they are still under discussion.
[1]
https://ci.apache.org/projects/flink/flink-docs-relea
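The relevant options look roughly like this (host names, storage path, and
cluster id are placeholders):

    high-availability: zookeeper
    high-availability.zookeeper.quorum: zk-1:2181,zk-2:2181,zk-3:2181
    high-availability.storageDir: hdfs:///flink/ha/
    high-availability.cluster-id: /my-flink-cluster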
Hi Flavio,
>> When the resultScanner dies because of a timeout (this happens a lot when
>> you have backpressure and the time between 2 consecutive reads exceeds the
>> scanner timeout), the code creates a new scanner and restarts from where it
>> was (startRow = currentRow).
>> So there should no
Related:
https://issues.apache.org/jira/browse/FLINK-13792
Regards,
Julian.
On Mon, 25 Nov 2019 15:25:14 +0530 Caizhi Weng
wrote
Hi,
As far as I know, Flink currently doesn't have a built-in throttling function.
You can write your own user-defined function to achieve
Hi Rahul,
I only found some resources on the Internet that you can consider. [1][2]
Best,
Vino
[1]: https://bahir.apache.org/docs/flink/current/flink-streaming-kudu/
[2]:
https://www.slideshare.net/0xnacho/apache-flink-kudu-a-connector-to-develop-kappa-architectures
Rahul Jain wrote on Mon, Nov 25, 2019 at 6:32 PM
Hi,
We are trying to use the Flink Kudu connector. Is there any documentation
available that we can read to understand how to use it ?
We found some sample code but that was not very helpful.
Thanks,
-rahul
Hi,
As far as I know, Flink currently doesn't have a built-in throttling
function. You can write your own user-defined function to achieve this.
Your function just passes through what it reads in, while limiting the speed
at which it emits records.
If you're not familiar with user-defined funct
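A minimal sketch of such a function (per-subtask limit; Guava's RateLimiter is
one option, a plain sleep would also work):

    import com.google.common.util.concurrent.RateLimiter;
    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;

    public class ThrottlingMap<T> extends RichMapFunction<T, T> {
        private final double permitsPerSecond;
        private transient RateLimiter limiter;

        public ThrottlingMap(double permitsPerSecond) {
            this.permitsPerSecond = permitsPerSecond;
        }

        @Override
        public void open(Configuration parameters) {
            limiter = RateLimiter.create(permitsPerSecond);
        }

        @Override
        public T map(T value) {
            limiter.acquire(); // block until a permit is available
            return value;      // pass the record through unchanged
        }
    }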
Thanks, I'll check it out.
On Mon, Nov 25, 2019 at 11:46 AM vino yang wrote:
> Hi Avi,
>
> The side output provides a superset of split's functionality. So anything
> can be implemented via split also can be
Hi Avi,
The side output provides a superset of split's functionality, so anything
that can be implemented via split can also be implemented via side output. [1]
Best,
Vino
[1]:
https://stackoverflow.com/questions/51440677/apache-flink-whats-the-difference-between-side-outputs-and-split-in-the-data
Av
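For the record, the side-output pattern looks roughly like this (sketch; Event
and the predicate are placeholders):

    final OutputTag<Event> sideTag = new OutputTag<Event>("pipeline-b") {};

    SingleOutputStreamOperator<Event> mainStream = events
        .process(new ProcessFunction<Event, Event>() {
            @Override
            public void processElement(Event e, Context ctx, Collector<Event> out) {
                if (belongsToPipelineA(e)) {
                    out.collect(e);         // pipeline A: main output
                } else {
                    ctx.output(sideTag, e); // pipeline B: side output
                }
            }
        });

    DataStream<Event> pipelineB = mainStream.getSideOutput(sideTag);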
Hi Kelly,
As far as I know, Flink currently does not have such metrics to monitor the
number of tasks in each state. See
https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html
for the complete metrics list. (It seems that `taskSlotsAvailable` in the
metrics list is the m
Maybe the problem is indeed this: the fact that the scan starts from the
last seen row. In this case maybe the first result should be skipped,
because it was already read.
On Mon, Nov 25, 2019 at 10:22 AM Flavio Pompermaier
wrote:
> What I can tell is how the HBase input format works..if you loo
Thank you for your quick reply, I appreciate it. But this is not
exactly a "side output" per se; it is simple splitting. IIUC, the side output
is more for splitting records by something that differentiates them
(lateness, value, etc.). I thought there was something more idiomatic, but if
this is it, then I
What I can tell is how the HBase input format works. If you look
at AbstractTableInputFormat [1], this is the nextRecord() function:

public T nextRecord(T reuse) throws IOException {
    if (resultScanner == null) {
        throw new IOException("No table result
Hi Avi,
As the doc of DataStream#split says, you can use the "side output" feature
to replace it. [1]
[1]:
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/side_output.html
Best,
Vino
Avi Levi wrote on Mon, Nov 25, 2019 at 4:12 PM:
> Hi,
> I want to split the output of one of the operators
Hi Komal,
> Thank you! That's exactly what's happening. Is there any way to force it
> to write to a specific .out of a TaskManager?
No. I am curious why the two jobs depend on stdout. Can we introduce
another coordinator other than stdout? IMO, this mechanism is not always
available.
Best,
Vino
Kom
Hi,
I want to split the output of one of the operators into two pipelines. Since
the *split* method is deprecated, what is the idiomatic way to do that
without duplicating the operator?