Re: [DISCUSS] FLIP-36 - Support Interactive Programming in Flink Table API

2020-05-19 Thread Xuannan Su
Hi folks,

As the feature freeze of Flink 1.11 has passed and the release branch is
cut, I'd like to revive this discussion thread of FLIP-36[1]. A quick
summary of FLIP-36:

The FLIP proposes to add support for interactive programming in the Flink Table
API. Specifically, it lets users cache intermediate results (tables) and
use them in later jobs.
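
To make the intended usage concrete, here is a minimal sketch of the proposed
API. The source table, query, and sink names are hypothetical; only the
cache() call itself comes from the FLIP:

{code:java}
import static org.apache.flink.table.api.Expressions.$;

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;

public class CacheSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().build());

        // Assumes a table "Orders" with columns `user` and `amount` is registered.
        Table totals = tEnv.from("Orders")
                .groupBy($("user"))
                .select($("user"), $("amount").sum().as("total"));

        // Proposed by FLIP-36: mark the intermediate result for caching.
        Table cached = totals.cache();

        // First job: executing it materializes the cached result as a side effect.
        cached.executeInsert("FirstSink");

        // Later job: the planner replaces the cached subtree with a source
        // reading the materialized result instead of recomputing it.
        cached.filter($("total").isGreater(100)).executeInsert("SecondSink");
    }
}
{code}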

Any comments or suggestions are much appreciated. I'd like to know what the
community thinks about the FLIP. In the meantime, I plan to revive the vote
thread if no comments are received within 2 days.

Best,
Xuannan

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink


On Thu, May 7, 2020 at 5:40 PM Xuannan Su  wrote:

> Hi,
>
> There is some feedback from @Timo and @Kurt in the voting thread for
> FLIP-36, and I want to share my thoughts here.
>
> 1. How would the FLIP-36 look like after FLIP-84?
> I don't think FLIP-84 will affect FLIP-36 from the public API perspective.
> Users can call .cache on a table object, and the cached table will be
> generated whenever the table job is triggered to execute, either by
> Table#executeInsert or StatementSet#execute. I think that FLIP-36 should be
> aware of the changes made by FLIP-84, but it shouldn't be a problem. At the
> end of the day, FLIP-36 only requires the ability to add a sink to a node,
> submit a table job with multiple sinks, and replace the cached table with a
> source.
>
> 2. How can we support cache in a multi-statement SQL file?
> The most intuitive way to support cache in a multi-statement SQL file is
> by using a view, where the view corresponds to a cached table.
>
> 3. Unifying the cached table and materialized views
> It is true that the cached table and the materialized view are similar in
> some ways. However, I think the materialized view is a more complex concept.
> First, a materialized view requires some kind of refresh mechanism to
> synchronize with the table. Second, the life cycle of a materialized view
> is longer: the materialized view should be accessible even after the
> application exits and should be accessible by another application, while
> the cached table is only accessible in the application where it is created.
> The cached table is introduced to avoid recomputation of an intermediate
> table to support interactive programming in the Flink Table API. I think
> the materialized view needs more discussion and certainly deserves a whole
> new FLIP.
>
> Please let me know your thoughts.
>
> Best,
> Xuannan
>
> On Wed, Apr 29, 2020 at 3:53 PM Xuannan Su  wrote:
>
>> Hi folks,
>>
>> FLIP-36 has been updated according to the discussion with Becket. In the
>> meantime, any comments are very welcome.
>>
>> If there are no further comments, I would like to start the voting
>> thread by tomorrow.
>>
>> Thanks,
>> Xuannan
>>
>>
>> On Sun, Apr 26, 2020 at 9:34 AM Xuannan Su  wrote:
>>
>>> Hi Becket,
>>>
>>> You are right. It makes sense to treat the retry of job 2 as an ordinary
>>> job, and the config does introduce some unnecessary confusion. Thank you
>>> for your comment. I will update the FLIP.
>>>
>>> Best,
>>> Xuannan
>>>
>>> On Sat, Apr 25, 2020 at 7:44 AM Becket Qin  wrote:
>>>
 Hi Xuannan,

 If a user submits Job 1 and it generates a cached intermediate result, and
 later on the user submits Job 2, which should ideally use the intermediate
 result, then if Job 2 fails due to the missing intermediate result, Job 2
 should be retried with its full DAG. After that, when Job 2 runs, it will
 also re-generate the cache. However, once Job 2 has fallen back to the
 original DAG, should it just be treated as an ordinary job that follows
 the recovery strategy? Having a separate configuration seems a little
 confusing. In other words, re-generating the cache is just a byproduct of
 running the full DAG of Job 2, not the main purpose. It is just like
 when Job 1 runs to generate the cache: it does not have a separate retry
 config to make sure the cache is generated. If it fails, it just fails
 like an ordinary job.

 What do you think?

 Thanks,

 Jiangjie (Becket) Qin

 On Fri, Apr 24, 2020 at 5:00 PM Xuannan Su 
 wrote:

 > Hi Becket,
 >
 > The intermediate result will indeed be automatically re-generated by
 > resubmitting the original DAG, and that job could fail as well. In
 that
 > case, we need to decide whether we should resubmit the original DAG to
 > re-generate the intermediate result or give up and throw an exception
 to
 > the user. The config indicates how many resubmissions should
 happen
 > before giving up.
 >
 > Thanks,
 > Xuannan
 >
 > On Fri, Apr 24, 2020 at 4:19 PM Becket Qin 
 wrote:
 >
 > > Hi Xuannan,
 > >
 > >  I am not entirely sure if I understand the cases you mentioned. The
 > users
 > > > can use the

[jira] [Created] (FLINK-17805) InputProcessorUtil doesn't handle indexes for multiple inputs operators correctly

2020-05-19 Thread Piotr Nowojski (Jira)
Piotr Nowojski created FLINK-17805:
--

 Summary: InputProcessorUtil doesn't handle indexes for multiple 
inputs operators correctly
 Key: FLINK-17805
 URL: https://issues.apache.org/jira/browse/FLINK-17805
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Network
Reporter: Piotr Nowojski
 Fix For: 1.11.0


This can cause an {{ArrayIndexOutOfBoundsException}} when the input gates passed to 
{{InputProcessorUtil#createCheckpointedInputGatePair}} have lower IDs in the 
second input compared to the input gates from the first one.

{noformat}
Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
at 
org.apache.flink.streaming.runtime.io.CheckpointBarrierUnaligner$ThreadSafeUnaligner.notifyBufferReceived(CheckpointBarrierUnaligner.java:328)
at 
org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.getNextBuffer(LocalInputChannel.java:218)
at 
org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.waitAndGetNextData(SingleInputGate.java:637)
at 
org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:615)
at 
org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.pollNext(SingleInputGate.java:603)
at 
org.apache.flink.runtime.taskmanager.InputGateWithMetrics.pollNext(InputGateWithMetrics.java:105)
at 
org.apache.flink.streaming.runtime.io.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:110)
at 
org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:136)
at 
org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor.processInput(StreamTwoInputProcessor.java:178)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:341)
at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxStep(MailboxProcessor.java:206)
at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:196)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:553)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:526)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:713)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:539)
at java.lang.Thread.run(Thread.java:748)
{noformat}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17806) CollectSinkFunctionTest.testCheckpointedFunction fails with expected:<50> but was:<0>

2020-05-19 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-17806:
--

 Summary: CollectSinkFunctionTest.testCheckpointedFunction fails 
with  expected:<50> but was:<0>
 Key: FLINK-17806
 URL: https://issues.apache.org/jira/browse/FLINK-17806
 Project: Flink
  Issue Type: Bug
  Components: API / DataStream
Affects Versions: 1.12.0
Reporter: Robert Metzger


CI 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=1793&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=4ed44b66-cdd6-5dcf-5f6a-88b07dda665d

{code}
2020-05-19T04:28:24.6400821Z [INFO] Running 
org.apache.flink.streaming.runtime.io.benchmark.StreamNetworkBroadcastThroughputBenchmarkTest
2020-05-19T04:28:25.7508457Z java.util.concurrent.TimeoutException
2020-05-19T04:28:25.7511469Zat 
java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
2020-05-19T04:28:25.7512331Zat 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
2020-05-19T04:28:25.7513238Zat 
org.apache.flink.streaming.api.operators.collect.CollectSinkFunctionTest.sendRequest(CollectSinkFunctionTest.java:391)
2020-05-19T04:28:25.7514280Zat 
org.apache.flink.streaming.api.operators.collect.CollectSinkFunctionTest.access$300(CollectSinkFunctionTest.java:60)
2020-05-19T04:28:25.7515499Zat 
org.apache.flink.streaming.api.operators.collect.CollectSinkFunctionTest$TestCollectRequestSender.sendRequest(CollectSinkFunctionTest.java:465)
2020-05-19T04:28:25.7516628Zat 
org.apache.flink.streaming.api.operators.collect.utils.TestCollectClient.run(TestCollectClient.java:77)
2020-05-19T04:28:25.7517231Z java.lang.InterruptedException
2020-05-19T04:28:25.7517907Zat 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
2020-05-19T04:28:25.7518816Zat 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
2020-05-19T04:28:25.7519745Zat 
org.apache.flink.streaming.api.operators.collect.CollectSinkFunction.invoke(CollectSinkFunction.java:246)
2020-05-19T04:28:25.7520773Zat 
org.apache.flink.streaming.api.operators.collect.CollectSinkFunctionTest$CheckpointedDataFeeder.run(CollectSinkFunctionTest.java:574)
2020-05-19T04:28:25.7600809Z [ERROR] Tests run: 4, Failures: 1, Errors: 0, 
Skipped: 0, Time elapsed: 53.526 s <<< FAILURE! - in 
org.apache.flink.streaming.api.operators.collect.CollectSinkFunctionTest
2020-05-19T04:28:25.7602305Z [ERROR] 
testCheckpointedFunction(org.apache.flink.streaming.api.operators.collect.CollectSinkFunctionTest)
  Time elapsed: 26.262 s  <<< FAILURE!
2020-05-19T04:28:25.7603126Z java.lang.AssertionError: expected:<50> but was:<0>
2020-05-19T04:28:25.7603600Zat org.junit.Assert.fail(Assert.java:88)
2020-05-19T04:28:25.7604100Zat 
org.junit.Assert.failNotEquals(Assert.java:834)
2020-05-19T04:28:25.7604810Zat 
org.junit.Assert.assertEquals(Assert.java:645)
2020-05-19T04:28:25.7605373Zat 
org.junit.Assert.assertEquals(Assert.java:631)
2020-05-19T04:28:25.7606127Zat 
org.apache.flink.streaming.api.operators.collect.CollectSinkFunctionTest.assertResultsEqual(CollectSinkFunctionTest.java:430)
2020-05-19T04:28:25.7607201Zat 
org.apache.flink.streaming.api.operators.collect.CollectSinkFunctionTest.assertResultsEqualAfterSort(CollectSinkFunctionTest.java:441)
2020-05-19T04:28:25.7608296Zat 
org.apache.flink.streaming.api.operators.collect.CollectSinkFunctionTest.testCheckpointedFunction(CollectSinkFunctionTest.java:321)
2020-05-19T04:28:25.7609371Zat 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2020-05-19T04:28:25.7610031Zat 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2020-05-19T04:28:25.7610819Zat 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2020-05-19T04:28:25.7611665Zat 
java.lang.reflect.Method.invoke(Method.java:498)
2020-05-19T04:28:25.7612348Zat 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
2020-05-19T04:28:25.7613094Zat 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
2020-05-19T04:28:25.7613785Zat 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
2020-05-19T04:28:25.7614493Zat 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
2020-05-19T04:28:25.7615370Zat 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
2020-05-19T04:28:25.7616067Zat 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
2020-05-19T04:28:25.7616711Zat 
org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
2020-05-19T04:28:25.7617574Zat 
org.junit.rules.RunRules.evaluate(RunRules.java:20)
2020-05-19T04:28:25.7618159Zat 

[jira] [Created] (FLINK-17807) Fix the broken link "/zh/ops/memory/mem_detail.html" in documentation

2020-05-19 Thread Jark Wu (Jira)
Jark Wu created FLINK-17807:
---

 Summary: Fix the broken link "/zh/ops/memory/mem_detail.html" in 
documentation
 Key: FLINK-17807
 URL: https://issues.apache.org/jira/browse/FLINK-17807
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Reporter: Jark Wu
Assignee: Xintong Song


We may need to update {{mem_setup_tm.zh.md}} and {{mem_trouble.zh.md}} to 
resolve the remaining broken link:

http://localhost:4000/zh/ops/memory/mem_detail.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17808) Rename checkpoint meta file to "_metadata" until it has completed writing

2020-05-19 Thread Yun Tang (Jira)
Yun Tang created FLINK-17808:


 Summary: Rename checkpoint meta file to "_metadata" until it has 
completed writing
 Key: FLINK-17808
 URL: https://issues.apache.org/jira/browse/FLINK-17808
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Checkpointing
Affects Versions: 1.10.0
Reporter: Yun Tang
 Fix For: 1.12.0


In practice, some developers or customers use a strategy of finding the most 
recent _metadata file as the checkpoint to recover from (e.g. as many proposals 
in FLINK-9043 suggest). However, the existence of a "_metadata" file does not 
mean the checkpoint has been completed, as the write that creates the 
"_metadata" file could be interrupted by a forced quit (e.g. yarn application 
-kill).

We could make the checkpoint meta stream write its data to a file named 
"_metadata.inprogress" and rename it to "_metadata" once writing has completed. 
By doing so, we ensure the "_metadata" file is never broken.
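
A minimal sketch of this write-then-rename protocol, using plain java.nio
instead of Flink's FileSystem abstraction (the directory layout is assumed):

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class MetadataWriter {
    public static void writeMetadata(Path checkpointDir, byte[] metadata) throws IOException {
        Path inProgress = checkpointDir.resolve("_metadata.inprogress");
        Path finished = checkpointDir.resolve("_metadata");

        // Write everything to the in-progress file first. A forced quit here
        // leaves no "_metadata" behind, so recovery tooling will not pick up
        // a half-written checkpoint.
        Files.write(inProgress, metadata,
                StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);

        // Rename only after writing has completed. On file systems that
        // support it, the move is atomic: "_metadata" is either absent or complete.
        Files.move(inProgress, finished, StandardCopyOption.ATOMIC_MOVE);
    }
}
{code}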



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17809) BashJavaUtil script logic does not work for paths with spaces

2020-05-19 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-17809:


 Summary: BashJavaUtil script logic does not work for paths with 
spaces
 Key: FLINK-17809
 URL: https://issues.apache.org/jira/browse/FLINK-17809
 Project: Flink
  Issue Type: Bug
  Components: Deployment / Scripts
Affects Versions: 1.10.0, 1.11.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.11.0, 1.10.2


Multiple paths aren't quoted (class path, conf_dir), resulting in errors if 
they contain spaces.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Need Help on Flink suitability to our usecase

2020-05-19 Thread Prasanna kumar
Hi,

I have the following use case to implement in my organization.

Say there is a huge relational database (1,000 tables for each of our 30k
customers) in our monolith setup.

We want to reduce the load on the DB and prevent the applications from
hitting it for the latest events, so an extract is taken from the redo logs
onto Kafka.

We need to set up a streaming platform based on the table updates that
happen (read from Kafka); from those updates we need to form events and send
them to consumers.

Each consumer may be interested in the same table but in different
updates/columns, according to their business needs, with delivery to their
endpoint/Kinesis/SQS/a Kafka topic.

So the case here is *1* table update : *m* events : *n* sinks.
The expected peak load is easily 100k to a million table updates per second
(all customers put together).
The latency expected by most customers is less than a second, mostly in the
100-500ms range.

Is this use case suited for Flink?

I went through the Flink book and documentation. These are the questions I
have:

1)  If we have a situation like this (*1* table update : *m* events : *n*
sinks), is it better to write our own microservice or to implement it with
Flink? (See the sketch after this list.)
  1 a)  How does checkpointing happen if we have *1* input : *n* output
situations?
  1 b)  There are no heavy transformations; at most we might check that the
required columns are present in the DB updates and decide whether to create
an event. So there is an alternative thought of writing the service in Node,
since the work is more I/O than processing.

2)  I see that we write a job, it is deployed, and Flink takes care of the
rest: parallelism, latency and throughput.
 But what I need is to write a generic framework so that we are able to
handle any table structure; we should not end up writing one job driver for
each case.
 There are at least 200 types of events in the existing monolith system
which might move to this new system once built.

3)  How do we maintain Flink cluster HA? From the book, I get that internal
task-level failures are handled gracefully in Flink. But what if the Flink
cluster itself goes down; how do we make sure it is HA?
 I had earlier worked with Spark and we had issues managing it. (That was
not much of a problem there, since the latency requirement was 15 minutes
and we could ramp another cluster up within that time.)
 These are absolute real-time cases and we cannot miss even one
message/event.
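
For reference, below is a minimal sketch of the *1* update : *m* events : *n*
sinks fan-out in Flink's DataStream API using side outputs. All topic names,
connection properties, and the column-based routing logic are hypothetical
placeholders, not a recommendation for the final design:

{code:java}
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class TableUpdateFanOut {
    // One side-output tag per downstream consumer (the "n sinks").
    static final OutputTag<String> CONSUMER_A = new OutputTag<String>("consumer-a") {};
    static final OutputTag<String> CONSUMER_B = new OutputTag<String>("consumer-b") {};

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(10_000); // checkpoints cover the whole DAG, source to sinks

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker:9092"); // placeholder
        props.setProperty("group.id", "cdc-fanout");           // placeholder

        DataStream<String> updates = env.addSource(
                new FlinkKafkaConsumer<>("redo-log-topic", new SimpleStringSchema(), props));

        // One table update can produce m events, each tagged for a consumer.
        SingleOutputStreamOperator<String> routed = updates.process(
                new ProcessFunction<String, String>() {
                    @Override
                    public void processElement(String update, Context ctx, Collector<String> out) {
                        // Placeholder routing: check which columns changed per consumer.
                        if (update.contains("colX")) {
                            ctx.output(CONSUMER_A, "eventA:" + update);
                        }
                        if (update.contains("colY")) {
                            ctx.output(CONSUMER_B, "eventB:" + update);
                        }
                    }
                });

        routed.getSideOutput(CONSUMER_A).addSink(
                new FlinkKafkaProducer<>("consumer-a-topic", new SimpleStringSchema(), props));
        routed.getSideOutput(CONSUMER_B).addSink(
                new FlinkKafkaProducer<>("consumer-b-topic", new SimpleStringSchema(), props));

        env.execute("table-update-fan-out");
    }
}
{code}

Regarding 1a: since all n sinks live in one job, a single checkpoint snapshots
the Kafka source offsets together with all sink state, so the whole 1 : n
pipeline recovers consistently from the same snapshot.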

There are also thoughts on whether to use Kafka Streams/Apache Storm for the
same. [They are being investigated by a different set of folks.]

Thanks,
Prasanna.


[jira] [Created] (FLINK-17810) Add document for K8s application mode

2020-05-19 Thread Yang Wang (Jira)
Yang Wang created FLINK-17810:
-

 Summary: Add document for K8s application mode
 Key: FLINK-17810
 URL: https://issues.apache.org/jira/browse/FLINK-17810
 Project: Flink
  Issue Type: Sub-task
Reporter: Yang Wang


Add documentation for how to start/stop a K8s application cluster.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Google Season of Docs

2020-05-19 Thread Hossam Elsafty
Dear aljoscha and sjwiesman,

I am Hossam Elsafty, a graduate of the Faculty of Engineering, Computer and 
Systems Department. As one of my academic courses, I have finished Technical 
Reports Writing.

Regarding my previous experience, I have done two internships as a Software 
Engineer, including one at Valeo Egypt and the other at the Saudi Arabia 
Ministry of National Guard. Also, I am certified by Udacity for finishing the 
Machine Learning Engineer Nanodegree program.

I have been working as a Big Data Engineer for 1 year at Ejada System in 
Saudi Arabia, where I have worked on many projects for the Saudi Ministry of 
Finance using the Cloudera Hadoop ecosystem.

I am sending this mail to ask if I can apply to your organization in Google 
Season of Docs.
I am looking forward to your reply.

Regards,
Hossam Fawzy Elsafty

Computer and System Department 2019
Faculty of Engineering ,Alexandria university



Discussion on Project Idea for Season of Docs 2020

2020-05-19 Thread Divya Sanghi
Hello Aljoscha , Sjwiesman

I am working on Big Data technologies and have hands-on experience on
Flink, Spark, Kafka.

These are some tasks done by me:
1. I did a POC where I created a Docker image of a Flink job and ran it on a
K8S cluster on my local machine.
Attaching my POC project: https://github.com/sanghisha145/flink_on_k8s

2. I am also working on writing a FlinkKafkaConsumer job with an autoscaling
feature to reduce consumer lag.

I really find the project "Restructure the Table API & SQL Documentation"
interesting and feel that I can contribute whole-heartedly to its
documentation.

Let me know where I can start.

PS: I have not written any public documentation but have written plenty for my
organization (many technical articles on Confluence pages).

Thanks
Divya


Re: [NOTICE] Azure Pipelines Status

2020-05-19 Thread Robert Metzger
Microsoft has not increased our capacity yet (even though it was promised
to me again yesterday).

I have now merged a hotfix disabling the e2e test execution on pull
requests to leave enough capacity for master.
Please run e2e tests using your private Azure accounts. Thanks for your
understanding!

Best,
Robert


On Thu, May 14, 2020 at 11:23 AM Robert Metzger  wrote:

> Roughly speaking, I see the following problematic areas (I initially
> tried running the E2E tests on those machines):
>
> a) e2e tests starting Docker images (including Kubernetes). Since the
> tests on the Ali infra are running in docker themselves, we need to adjust
> the test scripts (which is not trivial, because both containers need to be
> in the same network, and the volume mount paths are different)
>
> b) tests that modify the underlying file system: common_kubernetes.sh
> installs stuff in "/usr/local/bin/". (Now that I think about it, it's not a
> problem in the docker environment).
>
> c) Tests that don't clean up properly when failing. IIRC I saw leftover
> docker containers by test_streaming_kinesis.sh when I was trying to run the
> E2E tests on the Ali machines.
>
> And then there are pull requests that propose changes to the e2e scripts and
> mess something up :)
> We certainly need to isolate the e2e test execution somehow. Maybe we
> could launch VMs on the Ali machines for running the E2Es? (Using Vagrant)
>
> If Microsoft is not going to provide us with more test capacity, I will
> evaluate other options for the E2E tests.
>
>
> On Thu, May 14, 2020 at 10:36 AM Till Rohrmann 
> wrote:
>
>> Thanks for the update Robert.
>>
>> One idea to make the e2e also run on the Alibaba infrastructure would be
>> to
>> ensure that e2e tests clean up after they have run. Do we know which e2e
>> tests don't do this properly?
>>
>> Cheers,
>> Till
>>
>> On Thu, May 14, 2020 at 8:38 AM Robert Metzger 
>> wrote:
>>
>> > Hi all,
>> >
>> > tl;dr: I will have to cancel some E2E test executions of pull requests
>> > because we have reached the capacity limit of Flink's Azure Pipelines
>> > account.
>> >
>> > Long version: We have two types of agent pools in Azure Pipelines:
>> > Microsoft-hosted VMs and Alibaba-hosted Docker environment.
>> > In the Microsoft VMs, we are running the E2E tests, because we have an
>> > environment that will always be destroyed after each execution (and the
>> E2E
>> > tests often leave dangling docker containers, processes etc.; and they
>> > modify files in system directories)
>> > In the Alibaba-hosted Docker environment, we are compiling and testing
>> the
>> > regular Maven tests.
>> >
>> > We only have 10 Microsoft-hosted VMs available, and each E2E execution
>> > takes around 3.5 hours. That means we have a capacity of ~70 E2E test
>> > runs a day.
>> > On Tuesday, we had 110 builds; on Wednesday, 98 builds.
>> > Because of this, I will (manually) cancel some E2E test executions for
>> pull
>> > requests. If I see that a PR is explicitly changing something on E2E
>> tests,
>> > I will keep it. If I see that a PR is a docs change, has other test
>> > failures etc., I will cancel the E2E execution.
>> >
>> > If you want to verify that the E2E tests are passing for your own
>> changes,
>> > you can set up Azure Pipelines for your GitHub account, it's free and
>> works
>> > quite well. Here's a tutorial:
>> >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/Azure+Pipelines#AzurePipelines-Tutorial:SettingupAzurePipelinesforaforkoftheFlinkrepository
>> >
>> > What can we do to avoid this situation in the future?
>> > Sadly, Microsoft does not allow buying additional processing slots for
>> > open source projects [1]. However, I'm in touch with a product manager at
>> > Microsoft who promised me (yesterday) to increase the limit for us.
>> >
>> > In the Alibaba environment, we have 80 slots available, and usually no
>> > capacity constraints. This means we don't need to make compromises
>> there.
>> >
>> > Sorry for this inconvenience.
>> >
>> > Best,
>> > Robert
>> >
>> > PS: I'm considering keeping this thread as a permanent "status update"
>> > thread for Azure Pipelines
>> >
>> > [1]
>> >
>> >
>> https://developercommunity.visualstudio.com/content/problem/1028884/additionally-purchased-microsoft-hosted-build-agen.html
>> >
>>
>


[jira] [Created] (FLINK-17811) Update docker hub Flink page

2020-05-19 Thread Andrey Zagrebin (Jira)
Andrey Zagrebin created FLINK-17811:
---

 Summary: Update docker hub Flink page
 Key: FLINK-17811
 URL: https://issues.apache.org/jira/browse/FLINK-17811
 Project: Flink
  Issue Type: Task
  Components: Deployment / Docker
Reporter: Andrey Zagrebin


In FLINK-17161, we refactored the Flink docker images docs. We should also 
update and possibly link the related Flink docs about docker integration in 
[docker hub Flink image 
description|https://hub.docker.com/_/flink?tab=description].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17812) Bundled reporters in plugins/ directory

2020-05-19 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-17812:


 Summary: Bundled reporters in plugins/ directory
 Key: FLINK-17812
 URL: https://issues.apache.org/jira/browse/FLINK-17812
 Project: Flink
  Issue Type: Improvement
  Components: Build System, Runtime / Metrics
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.11.0


In FLINK-16963 we converted all metric reporters into plugins. Along with that 
we removed all relocations, as they are now unnecessary.

However, users so far enabled reporters by moving them to /lib, which now could 
result in dependency clashes.

To prevent this, and to ease the overall setup, we could bundle all 
reporters in the plugins/ directory of the distribution instead of opt/.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17813) Manually test unaligned checkpoints on a cluster

2020-05-19 Thread Piotr Nowojski (Jira)
Piotr Nowojski created FLINK-17813:
--

 Summary: Manually test unaligned checkpoints on a cluster
 Key: FLINK-17813
 URL: https://issues.apache.org/jira/browse/FLINK-17813
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Checkpointing, Runtime / Network
Reporter: Piotr Nowojski
Assignee: Roman Khachatryan
 Fix For: 1.11.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Google Season of Docs

2020-05-19 Thread Seth Wiesman
Hi Hossam,

Please apply! We look forward to reviewing your application. Let us know if
you have any questions.

Seth

On Tue, May 19, 2020 at 8:09 AM Hossam Elsafty  wrote:

> Dear aljoscha and sjwiesman,
>
> I am Hossam Elsafty, a graduate of the Faculty of Engineering, Computer
> and Systems Department. As one of my academic courses, I have finished
> Technical Reports Writing.
>
> Regarding my previous experience, I have done two internships as a
> Software Engineer, including one at Valeo Egypt and the other at the Saudi
> Arabia Ministry of National Guard. Also, I am certified by Udacity for
> finishing the Machine Learning Engineer Nanodegree program.
>
> I have been working as a Big Data Engineer for 1 year at Ejada System
> in Saudi Arabia, where I have worked on many projects for the Saudi
> Ministry of Finance using the Cloudera Hadoop ecosystem.
>
> I am sending this mail to ask if I can apply to your organization in
> Google Season of Docs.
> I am looking forward to your reply.
>
> Regards,
> Hossam Fawzy Elsafty
>
> Computer and System Department 2019
> Faculty of Engineering ,Alexandria university
>
>


Re: Discussion on Project Idea for Season of Docs 2020

2020-05-19 Thread Seth Wiesman
Hi Divya,

Thrilled to see your interest in the project. We are looking to work with
someone to expand Flink SQL's reach. What we proposed on the blog are our
initial ideas, but we are open to any suggestions or modifications to the
proposal if you have ideas about how it could be improved. I am happy to
answer any specific questions, but otherwise, the first step is to apply
via the GSoD website [1].

Seth

[1] https://developers.google.com/season-of-docs/docs/tech-writer-guide

On Tue, May 19, 2020 at 8:09 AM Divya Sanghi 
wrote:

> Hello Aljoscha , Sjwiesman
>
> I am working on Big Data technologies and have hands-on experience on
> Flink, Spark, Kafka.
>
> These are some tasks done by me:
> 1. I did a POC where I created a Docker image of a Flink job and ran it on a
> K8S cluster on my local machine.
> Attaching my POC project: https://github.com/sanghisha145/flink_on_k8s
>
> 2. I am also working on writing a FlinkKafkaConsumer job with an autoscaling
> feature to reduce consumer lag.
>
> I really find the project "Restructure the Table API & SQL Documentation"
> interesting and feel that I can contribute whole-heartedly to its
> documentation.
>
> Let me know where I can start.
>
> PS: I have not written any public documentation but have written plenty for
> my organization (many technical articles on Confluence pages).
>
> Thanks
> Divya
>


Re: Interested In Google Season of Docs!

2020-05-19 Thread Seth Wiesman
Hi Roopal,

*Restructure the Table API & SQL Documentation*

Right now the Table / SQL documentation is very technical and dry to read.
It is more like a technical specification than documentation that a new
user could read. This project is to take the existing content and
rewrite/reorganize it to make what the community already has more
approachable.

*Extend the Table API & SQL Documentation*

This is to add brand new content. This could be tutorials and walkthroughs
to help onboard new users or expanding an under-documented section such as
Hive interop.

These projects admittedly sound very similar, and I will make a note to
explain them better in future applications if the community applies next
year.

Please let me know if you have any more questions,

Seth

On Mon, May 18, 2020 at 2:08 AM Roopal Jain  wrote:

> Hi Marta,
>
> Thank you for responding to my email. I went through the link you attached.
> I have a few questions:
> 1. As per https://flink.apache.org/news/2020/05/04/season-of-docs.html,
> two ideas are mentioned: one is restructuring the Table API & SQL
> documentation and the other is extending it. I want to know how these two
> are different. Can one person do both?
> 2. You mentioned that FLIP-60 is very broad; should I start by picking one
> section from it, for example the Table API, and start contributing content
> from https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/
> in the order specified in FLIP-60? Can you please guide me more on how I
> should proceed (choosing a specific area)?
>
>
> Thanks & Regards,
> Roopal Jain
>
>
> On Sat, 16 May 2020 at 17:39, Marta Paes Moreira 
> wrote:
>
> > Hi, Roopal.
> >
> > Thanks for reaching out, we're glad to see that you're interested in
> > giving the Flink docs a try!
> >
> > To participate in Google Season of Docs (GSoD), you'll have to submit an
> > application once the application period opens (June 9th). Because FLIP-60
> > is very broad, the first step would be to think about what parts of it
> > you'd like to focus on — or, what you'd like to have as a project
> proposal
> > [1].
> >
> > You can find more information about the whole process of participating in
> > GSoD in [2].
> >
> > Let me know if I can help you with anything else or if you need support
> > with choosing your focus areas.
> >
> > Marta
> >
> > [1]
> >
> https://developers.google.com/season-of-docs/docs/tech-writer-application-hints#project_proposal
> > [2]
> >
> https://developers.google.com/season-of-docs/docs/tech-writer-application-hints
> >
> > On Sat, May 16, 2020 at 8:32 AM Roopal Jain  wrote:
> >
> >> Hello Flink Dev Community!
> >>
> >> I am interested in participating in Google Season of Docs for Apache
> >> Flink. I went through the FLIP-60 detailed proposal and thought this is
> >> something I could do well. I am currently working as a software engineer
> >> and have a B.E. in Computer Engineering from one of India's reputed
> >> engineering colleges. I have prior open-source experience, including
> >> mentoring for Google Summer of Code and Google Code-In.
> >> I have prior work experience with Apache Spark and a good grasp of SQL,
> >> Java, and Python.
> >> Please guide me on how to get started.
> >>
> >> Thanks & Regards,
> >> Roopal Jain
> >>
> >
>


[jira] [Created] (FLINK-17814) Translate native kubernetes document to Chinese

2020-05-19 Thread Yang Wang (Jira)
Yang Wang created FLINK-17814:
-

 Summary: Translate native kubernetes document to Chinese
 Key: FLINK-17814
 URL: https://issues.apache.org/jira/browse/FLINK-17814
 Project: Flink
  Issue Type: Task
Reporter: Yang Wang


[https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/native_kubernetes.html]

 

Translate the native kubernetes document to Chinese.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17815) Change KafkaConnector to give per-partition metric group to WatermarkGenerator

2020-05-19 Thread Aljoscha Krettek (Jira)
Aljoscha Krettek created FLINK-17815:


 Summary: Change KafkaConnector to give per-partition metric group 
to WatermarkGenerator
 Key: FLINK-17815
 URL: https://issues.apache.org/jira/browse/FLINK-17815
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / Kafka
Reporter: Aljoscha Krettek


We currently give a reference to the general consumer {{MetricGroup}}, which 
means that all {{WatermarkGenerators}} would write to the same metric group.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17816) Change Latency Marker to work with "scheduleAtFixedDelay" instead of "scheduleAtFixedRate"

2020-05-19 Thread Stephan Ewen (Jira)
Stephan Ewen created FLINK-17816:


 Summary: Change Latency Marker to work with "scheduleAtFixedDelay" 
instead of "scheduleAtFixedRate"
 Key: FLINK-17816
 URL: https://issues.apache.org/jira/browse/FLINK-17816
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Task
Reporter: Stephan Ewen


Latency markers and other periodic timers are scheduled with 
{{scheduleAtFixedRate}}. That means the callable is invoked every X time 
units; if it blocks (backpressure), it can be called again immediately.

I would suggest switching this to {{scheduleAtFixedDelay}} to avoid scheduling 
a lot of latency marker injections when there is no way to actually execute 
the injection call.
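
A minimal sketch of the difference, using a plain ScheduledExecutorService
rather than Flink's actual timer service (the 100ms period is arbitrary):

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class FixedRateVsFixedDelay {
    public static void main(String[] args) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

        Runnable inject = () -> {
            // Imagine this call blocks for a long time under backpressure.
        };

        // Fixed rate: if one run blocks longer than the period, the missed
        // executions pile up and fire back-to-back once the thread is free.
        timer.scheduleAtFixedRate(inject, 0, 100, TimeUnit.MILLISECONDS);

        // Fixed delay: the next run starts 100ms after the previous one
        // *finishes*, so a blocked run never causes a burst of catch-up calls.
        timer.scheduleAtFixedDelay(inject, 0, 100, TimeUnit.MILLISECONDS);
    }
}
{code}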



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17817) CollectResultFetcher fails with EOFException in AggregateReduceGroupingITCase

2020-05-19 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-17817:
--

 Summary: CollectResultFetcher fails with EOFException in 
AggregateReduceGroupingITCase
 Key: FLINK-17817
 URL: https://issues.apache.org/jira/browse/FLINK-17817
 Project: Flink
  Issue Type: Bug
  Components: API / DataStream
Affects Versions: 1.11.0
Reporter: Robert Metzger


CI: 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=1826&view=logs&j=e25d5e7e-2a9c-5589-4940-0b638d75a414&t=f83cd372-208c-5ec4-12a8-337462457129

{code}
2020-05-19T10:34:18.3224679Z [ERROR] 
testSingleAggOnTable_SortAgg(org.apache.flink.table.planner.runtime.batch.sql.agg.AggregateReduceGroupingITCase)
  Time elapsed: 7.537 s  <<< ERROR!
2020-05-19T10:34:18.3225273Z java.lang.RuntimeException: Failed to fetch next 
result
2020-05-19T10:34:18.3227634Zat 
org.apache.flink.streaming.api.operators.collect.CollectResultIterator.nextResultFromFetcher(CollectResultIterator.java:92)
2020-05-19T10:34:18.3228518Zat 
org.apache.flink.streaming.api.operators.collect.CollectResultIterator.hasNext(CollectResultIterator.java:63)
2020-05-19T10:34:18.3229170Zat 
org.apache.flink.shaded.guava18.com.google.common.collect.Iterators.addAll(Iterators.java:361)
2020-05-19T10:34:18.3229863Zat 
org.apache.flink.shaded.guava18.com.google.common.collect.Lists.newArrayList(Lists.java:160)
2020-05-19T10:34:18.3230586Zat 
org.apache.flink.table.planner.runtime.utils.BatchTestBase.executeQuery(BatchTestBase.scala:300)
2020-05-19T10:34:18.3231303Zat 
org.apache.flink.table.planner.runtime.utils.BatchTestBase.check(BatchTestBase.scala:141)
2020-05-19T10:34:18.3231996Zat 
org.apache.flink.table.planner.runtime.utils.BatchTestBase.checkResult(BatchTestBase.scala:107)
2020-05-19T10:34:18.3232847Zat 
org.apache.flink.table.planner.runtime.batch.sql.agg.AggregateReduceGroupingITCase.testSingleAggOnTable(AggregateReduceGroupingITCase.scala:176)
2020-05-19T10:34:18.3233694Zat 
org.apache.flink.table.planner.runtime.batch.sql.agg.AggregateReduceGroupingITCase.testSingleAggOnTable_SortAgg(AggregateReduceGroupingITCase.scala:122)
2020-05-19T10:34:18.3234461Zat 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2020-05-19T10:34:18.3234983Zat 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2020-05-19T10:34:18.3235632Zat 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2020-05-19T10:34:18.3236615Zat 
java.lang.reflect.Method.invoke(Method.java:498)
2020-05-19T10:34:18.3237256Zat 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
2020-05-19T10:34:18.3237965Zat 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
2020-05-19T10:34:18.3238750Zat 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
2020-05-19T10:34:18.3239314Zat 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
2020-05-19T10:34:18.3239838Zat 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
2020-05-19T10:34:18.3240362Zat 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
2020-05-19T10:34:18.3240803Zat 
org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
2020-05-19T10:34:18.3243624Zat 
org.junit.rules.RunRules.evaluate(RunRules.java:20)
2020-05-19T10:34:18.3244531Zat 
org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
2020-05-19T10:34:18.3245325Zat 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
2020-05-19T10:34:18.3246086Zat 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
2020-05-19T10:34:18.3246765Zat 
org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
2020-05-19T10:34:18.3247390Zat 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
2020-05-19T10:34:18.3248012Zat 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
2020-05-19T10:34:18.3248779Zat 
org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
2020-05-19T10:34:18.3249417Zat 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
2020-05-19T10:34:18.3250357Zat 
org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
2020-05-19T10:34:18.3251021Zat 
org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
2020-05-19T10:34:18.3251597Zat 
org.junit.rules.RunRules.evaluate(RunRules.java:20)
2020-05-19T10:34:18.3252141Zat 
org.junit.runners.ParentRunner.run(ParentRunner.java:363)
2020-05-19T10:34:18.3252798Zat 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
2020-05-19T10:34:18.3253527Zat 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
202

Re: [NOTICE] Azure Pipelines Status

2020-05-19 Thread Robert Metzger
Microsoft has now doubled our CI capacity (to 20 concurrent VMs for
executing e2e tests).
If the e2e test execution has normalized by tomorrow, I will revert the hotfix,
enabling e2e tests on PRs again.

Sorry for the back and forth.

On Tue, May 19, 2020 at 3:11 PM Robert Metzger  wrote:

> Microsoft has not increased our capacity yet (even though it was promised
> to me again yesterday).
>
> I have now merged a hotfix disabling the e2e test execution on pull
> requests to leave enough capacity for master.
> Please run e2e tests using your private Azure accounts. Thanks for your
> understanding!
>
> Best,
> Robert
>
>
> On Thu, May 14, 2020 at 11:23 AM Robert Metzger 
> wrote:
>
>> Roughly speaking, I see the following problematic areas (I initially
>> tried running the E2E tests on those machines):
>>
>> a) e2e tests starting Docker images (including Kubernetes). Since the
>> tests on the Ali infra are running in docker themselves, we need to adjust
>> the test scripts (which is not trivial, because both containers need to be
>> in the same network, and the volume mount paths are different)
>>
>> b) tests that modify the underlying file system: common_kubernetes.sh
>> installs stuff in "/usr/local/bin/". (Now that I think about it, it's not a
>> problem in the docker environment).
>>
>> c) Tests that don't clean up properly when failing. IIRC I saw leftover
>> docker containers by test_streaming_kinesis.sh when I was trying to run the
>> E2E tests on the Ali machines.
>>
>> And then there are pull requests that propose changes to the e2e scripts and
>> mess something up :)
>> We certainly need to isolate the e2e test execution somehow. Maybe we
>> could launch VMs on the Ali machines for running the E2Es? (Using Vagrant)
>>
>> If Microsoft is not going to provide us with more test capacity, I will
>> evaluate other options for the E2E tests.
>>
>>
>> On Thu, May 14, 2020 at 10:36 AM Till Rohrmann 
>> wrote:
>>
>>> Thanks for the update Robert.
>>>
>>> One idea to make the e2e also run on the Alibaba infrastructure would be
>>> to
>>> ensure that e2e tests clean up after they have run. Do we know which e2e
>>> tests don't do this properly?
>>>
>>> Cheers,
>>> Till
>>>
>>> On Thu, May 14, 2020 at 8:38 AM Robert Metzger 
>>> wrote:
>>>
>>> > Hi all,
>>> >
>>> > tl;dr: I will have to cancel some E2E test executions of pull requests
>>> > because we have reached the capacity limit of Flink's Azure Pipelines
>>> > account.
>>> >
>>> > Long version: We have two types of agent pools in Azure Pipelines:
>>> > Microsoft-hosted VMs and Alibaba-hosted Docker environment.
>>> > In the Microsoft VMs, we are running the E2E tests, because we have an
>>> > environment that will always be destroyed after each execution (and
>>> the E2E
>>> > tests often leave dangling docker containers, processes etc.; and they
>>> > modify files in system directories)
>>> > In the Alibaba-hosted Docker environment, we are compiling and testing
>>> the
>>> > regular Maven tests.
>>> >
>>> > We only have 10 Microsoft-hosted VMs available, and each E2E execution
>>> > takes around 3.5 hours. That means we have a capacity of ~70 E2E test
>>> > runs a day.
>>> > On Tuesday, we had 110 builds; on Wednesday, 98 builds.
>>> > Because of this, I will (manually) cancel some E2E test executions for
>>> pull
>>> > requests. If I see that a PR is explicitly changing something on E2E
>>> tests,
>>> > I will keep it. If I see that a PR is a docs change, has other test
>>> > failures etc., I will cancel the E2E execution.
>>> >
>>> > If you want to verify that the E2E tests are passing for your own
>>> changes,
>>> > you can set up Azure Pipelines for your GitHub account, it's free and
>>> works
>>> > quite well. Here's a tutorial:
>>> >
>>> >
>>> https://cwiki.apache.org/confluence/display/FLINK/Azure+Pipelines#AzurePipelines-Tutorial:SettingupAzurePipelinesforaforkoftheFlinkrepository
>>> >
>>> > What can we do to avoid this situation in the future?
>>> > Sadly, Microsoft does not allow buying additional processing slots for
>>> > open source projects [1]. However, I'm in touch with a product manager at
>>> > Microsoft who promised me (yesterday) to increase the limit for us.
>>> >
>>> > In the Alibaba environment, we have 80 slots available, and usually no
>>> > capacity constraints. This means we don't need to make compromises
>>> there.
>>> >
>>> > Sorry for this inconvenience.
>>> >
>>> > Best,
>>> > Robert
>>> >
>>> > PS: I'm considering keeping this thread as a permanent "status update"
>>> > thread for Azure Pipelines
>>> >
>>> > [1]
>>> >
>>> >
>>> https://developercommunity.visualstudio.com/content/problem/1028884/additionally-purchased-microsoft-hosted-build-agen.html
>>> >
>>>
>>


[jira] [Created] (FLINK-17818) CSV Reader with Pojo Type and no field names fails

2020-05-19 Thread Danish Amjad (Jira)
Danish Amjad created FLINK-17818:


 Summary: CSV Reader with Pojo Type and no field names fails
 Key: FLINK-17818
 URL: https://issues.apache.org/jira/browse/FLINK-17818
 Project: Flink
  Issue Type: Bug
  Components: API / DataSet
Affects Versions: 1.10.1
Reporter: Danish Amjad


When a file is read with a CsvReader, a POJO type is specified, and the field 
names are not specified, the output is obviously not correct. The API does not 
throw any error despite a null check inside the API, i.e.

{code}
Preconditions.checkNotNull(pojoFields, "POJO fields must be specified (not 
null) if output type is a POJO.");
{code}

The *root cause* of the problem is that the _fieldNames_ parameter is a varargs 
parameter, which is 'empty but not null' when not given any value. So 
the check passes without failing.
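
A minimal standalone sketch of the pitfall (names are hypothetical): an
omitted varargs argument arrives as an empty array, not null, so the null
check alone never fires:

{code:java}
public class VarargsPitfall {
    static void configure(String... fieldNames) {
        // Precondition modelled on the API: it only rejects null.
        if (fieldNames == null) {
            throw new NullPointerException("POJO fields must be specified.");
        }
        // A length check is also needed to catch the omitted-argument case.
        System.out.println("passed null check, length = " + fieldNames.length);
    }

    public static void main(String[] args) {
        configure(); // prints "passed null check, length = 0"
    }
}
{code}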



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17819) Yarn error unhelpful when forgetting HADOOP_CLASSPATH

2020-05-19 Thread Arvid Heise (Jira)
Arvid Heise created FLINK-17819:
---

 Summary: Yarn error unhelpful when forgetting HADOOP_CLASSPATH
 Key: FLINK-17819
 URL: https://issues.apache.org/jira/browse/FLINK-17819
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / YARN
Reporter: Arvid Heise


When running
{code:bash}
flink run -m yarn-cluster -yjm 1768 -ytm 50072 -ys 32 ...
{code}
without exporting HADOOP_CLASSPATH, we get the unhelpful message
{noformat}
Could not build the program from JAR file: JAR file does not exist: -yjm
{noformat}
I'd expect something like
{noformat}
yarn-cluster can only be used with exported HADOOP_CLASSPATH, see  for 
more information{noformat}
 

I suggest loading a stub for the YARN cluster deployment if the actual 
implementation fails to load, which prints this error when used.
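
A minimal sketch of the stub idea (all names are assumptions, including the
factory class looked up below): try to load the real YARN deployment classes
and fall back to a stub whose only job is to raise an actionable error:

{code:java}
public final class YarnTargetLoader {
    public static Runnable loadYarnDeployment() {
        try {
            // Present only when flink-yarn and the Hadoop classpath are available.
            Class<?> real = Class.forName("org.apache.flink.yarn.YarnClusterClientFactory");
            return () -> System.out.println("Loaded " + real.getName());
        } catch (ClassNotFoundException e) {
            // Stub: stands in for the real implementation and fails with a hint.
            return () -> {
                throw new IllegalStateException(
                        "The yarn-cluster target requires Hadoop on the classpath. "
                                + "Did you forget to `export HADOOP_CLASSPATH=$(hadoop classpath)`?");
            };
        }
    }
}
{code}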



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17820) Memory threshold is ignored for channel state

2020-05-19 Thread Roman Khachatryan (Jira)
Roman Khachatryan created FLINK-17820:
-

 Summary: Memory threshold is ignored for channel state
 Key: FLINK-17820
 URL: https://issues.apache.org/jira/browse/FLINK-17820
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Checkpointing, Runtime / Task
Affects Versions: 1.11.0
Reporter: Roman Khachatryan
Assignee: Roman Khachatryan
 Fix For: 1.11.0


The config parameter state.backend.fs.memory-threshold is ignored for channel 
state, causing each subtask to have a file per checkpoint regardless of the 
size of its channel state.

This also causes slow cleanup and delays the next checkpoint.

The problem is that {{ChannelStateCheckpointWriter.finishWriteAndResult}} calls 
flush(), which actually flushes the data to disk.

From the FSDataOutputStream.flush Javadoc:

A completed flush does not mean that the data is necessarily persistent. Data 
persistence can only be assumed after calls to close() or sync().

Possible solutions:

1. Do not flush in {{ChannelStateCheckpointWriter.finishWriteAndResult}} (which 
can lead to data loss in a wrapping stream).

2. Change the {{FsCheckpointStateOutputStream.flush}} behavior.

3. Wrap {{FsCheckpointStateOutputStream}} to prevent the flush.
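
A minimal sketch of solution 3 with plain java.io types (the real code would
wrap {{FsCheckpointStateOutputStream}}): flush() becomes a no-op, so small
channel state can stay below the memory threshold until close() is called:

{code:java}
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class FlushSuppressingStream extends FilterOutputStream {
    public FlushSuppressingStream(OutputStream delegate) {
        super(delegate);
    }

    @Override
    public void flush() {
        // Intentionally a no-op: do not force the wrapped checkpoint stream
        // to spill its in-memory buffer to a file.
    }

    @Override
    public void close() throws IOException {
        out.flush(); // flush exactly once, when the state handle is finalized
        out.close();
    }
}
{code}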



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17821) Kafka010TableITCase>KafkaTableTestBase.testKafkaSourceSink failed on AZP

2020-05-19 Thread Zhu Zhu (Jira)
Zhu Zhu created FLINK-17821:
---

 Summary: 
Kafka010TableITCase>KafkaTableTestBase.testKafkaSourceSink failed on AZP
 Key: FLINK-17821
 URL: https://issues.apache.org/jira/browse/FLINK-17821
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
Affects Versions: 1.12.0
Reporter: Zhu Zhu


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=1871&view=logs&j=d44f43ce-542c-597d-bf94-b0718c71e5e8&t=34f486e1-e1e4-5dd2-9c06-bfdd9b9c74a8&l=12032

2020-05-19T16:29:40.7239430Z Test testKafkaSourceSink[legacy = false, topicId = 
1](org.apache.flink.streaming.connectors.kafka.table.Kafka010TableITCase) 
failed with:
2020-05-19T16:29:40.7240291Z java.util.concurrent.ExecutionException: 
org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
2020-05-19T16:29:40.7241033Zat 
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
2020-05-19T16:29:40.7241542Zat 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
2020-05-19T16:29:40.7242127Zat 
org.apache.flink.table.planner.runtime.utils.TableEnvUtil$.execInsertSqlAndWaitResult(TableEnvUtil.scala:31)
2020-05-19T16:29:40.7242729Zat 
org.apache.flink.table.planner.runtime.utils.TableEnvUtil.execInsertSqlAndWaitResult(TableEnvUtil.scala)
2020-05-19T16:29:40.7243239Zat 
org.apache.flink.streaming.connectors.kafka.table.KafkaTableTestBase.testKafkaSourceSink(KafkaTableTestBase.java:145)
2020-05-19T16:29:40.7243691Zat 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2020-05-19T16:29:40.7244273Zat 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2020-05-19T16:29:40.7244729Zat 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2020-05-19T16:29:40.7245117Zat 
java.lang.reflect.Method.invoke(Method.java:498)
2020-05-19T16:29:40.7245515Zat 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
2020-05-19T16:29:40.7245956Zat 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
2020-05-19T16:29:40.7246419Zat 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
2020-05-19T16:29:40.7246870Zat 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
2020-05-19T16:29:40.7247287Zat 
org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
2020-05-19T16:29:40.7251320Zat 
org.junit.rules.RunRules.evaluate(RunRules.java:20)
2020-05-19T16:29:40.7251833Zat 
org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
2020-05-19T16:29:40.7252251Zat 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
2020-05-19T16:29:40.7252716Zat 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
2020-05-19T16:29:40.7253117Zat 
org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
2020-05-19T16:29:40.7253502Zat 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
2020-05-19T16:29:40.7254041Zat 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
2020-05-19T16:29:40.7254528Zat 
org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
2020-05-19T16:29:40.7255500Zat 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
2020-05-19T16:29:40.7256064Zat 
org.junit.runners.ParentRunner.run(ParentRunner.java:363)
2020-05-19T16:29:40.7256438Zat 
org.junit.runners.Suite.runChild(Suite.java:128)
2020-05-19T16:29:40.7256758Zat 
org.junit.runners.Suite.runChild(Suite.java:27)
2020-05-19T16:29:40.7257118Zat 
org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
2020-05-19T16:29:40.7257486Zat 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
2020-05-19T16:29:40.7257885Zat 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
2020-05-19T16:29:40.7258389Zat 
org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
2020-05-19T16:29:40.7258821Zat 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
2020-05-19T16:29:40.7259219Zat 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
2020-05-19T16:29:40.7259664Zat 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
2020-05-19T16:29:40.7260098Zat 
org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
2020-05-19T16:29:40.7260635Zat 
org.junit.rules.RunRules.evaluate(RunRules.java:20)
2020-05-19T16:29:40.7261065Zat 
org.junit.runners.ParentRunner.run(ParentRunner.java:363)
2020-05-19T16:29:40.7261467Zat 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
2020-05-19T16:29:40.7261952Zat 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:27

[jira] [Created] (FLINK-17822) Flink CLI end-to-end test failed with "JavaGcCleanerWrapper$PendingCleanersRunner cannot access class jdk.internal.misc.SharedSecrets" in JDK 11

2020-05-19 Thread Dian Fu (Jira)
Dian Fu created FLINK-17822:
---

 Summary: Flink CLI end-to-end test failed with 
"JavaGcCleanerWrapper$PendingCleanersRunner cannot access class 
jdk.internal.misc.SharedSecrets" in JDK 11 
 Key: FLINK-17822
 URL: https://issues.apache.org/jira/browse/FLINK-17822
 Project: Flink
  Issue Type: Bug
Reporter: Dian Fu


Instance: 
https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_apis/build/builds/1887/logs/600

{code}
2020-05-19T21:59:39.8829043Z 2020-05-19 21:59:25,193 ERROR 
org.apache.flink.util.JavaGcCleanerWrapper   [] - FATAL 
UNEXPECTED - Failed to invoke waitForReferenceProcessing
2020-05-19T21:59:39.8829849Z java.lang.IllegalAccessException: class 
org.apache.flink.util.JavaGcCleanerWrapper$PendingCleanersRunner cannot access 
class jdk.internal.misc.SharedSecrets (in module java.base) because module java.base does not export jdk.internal.misc to unnamed module @54e3658c
2020-05-19T21:59:39.8830707Z at jdk.internal.reflect.Reflection.newIllegalAccessException(Reflection.java:361) ~[?:?]
2020-05-19T21:59:39.8831166Z at java.lang.reflect.AccessibleObject.checkAccess(AccessibleObject.java:591) ~[?:?]
2020-05-19T21:59:39.8831744Z at java.lang.reflect.Method.invoke(Method.java:558) ~[?:?]
2020-05-19T21:59:39.8832596Z at org.apache.flink.util.JavaGcCleanerWrapper$PendingCleanersRunner.getJavaLangRefAccess(JavaGcCleanerWrapper.java:362) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-05-19T21:59:39.8833667Z at org.apache.flink.util.JavaGcCleanerWrapper$PendingCleanersRunner.tryRunPendingCleaners(JavaGcCleanerWrapper.java:351) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-05-19T21:59:39.8834712Z at org.apache.flink.util.JavaGcCleanerWrapper$CleanerManager.tryRunPendingCleaners(JavaGcCleanerWrapper.java:207) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-05-19T21:59:39.8835686Z at org.apache.flink.util.JavaGcCleanerWrapper.tryRunPendingCleaners(JavaGcCleanerWrapper.java:158) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-05-19T21:59:39.8836652Z at org.apache.flink.runtime.memory.UnsafeMemoryBudget.reserveMemory(UnsafeMemoryBudget.java:94) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-05-19T21:59:39.8838033Z at org.apache.flink.runtime.memory.UnsafeMemoryBudget.verifyEmpty(UnsafeMemoryBudget.java:64) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-05-19T21:59:39.8839259Z at org.apache.flink.runtime.memory.MemoryManager.verifyEmpty(MemoryManager.java:172) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-05-19T21:59:39.8840148Z at org.apache.flink.runtime.taskexecutor.slot.TaskSlot.verifyMemoryFreed(TaskSlot.java:311) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-05-19T21:59:39.8841035Z at org.apache.flink.runtime.taskexecutor.slot.TaskSlot.lambda$closeAsync$1(TaskSlot.java:301) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-05-19T21:59:39.8841603Z at java.util.concurrent.CompletableFuture.uniRunNow(CompletableFuture.java:815) ~[?:?]
2020-05-19T21:59:39.8842069Z at java.util.concurrent.CompletableFuture.uniRunStage(CompletableFuture.java:799) ~[?:?]
2020-05-19T21:59:39.8842844Z at java.util.concurrent.CompletableFuture.thenRun(CompletableFuture.java:2121) ~[?:?]
2020-05-19T21:59:39.8843828Z at org.apache.flink.runtime.taskexecutor.slot.TaskSlot.closeAsync(TaskSlot.java:300) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-05-19T21:59:39.8844790Z at org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl.freeSlotInternal(TaskSlotTableImpl.java:404) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-05-19T21:59:39.8845754Z at org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl.freeSlot(TaskSlotTableImpl.java:365) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-05-19T21:59:39.8846842Z at org.apache.flink.runtime.taskexecutor.TaskExecutor.freeSlotInternal(TaskExecutor.java:1589) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-05-19T21:59:39.8847711Z at org.apache.flink.runtime.taskexecutor.TaskExecutor.freeSlot(TaskExecutor.java:967) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-05-19T21:59:39.8848295Z at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
2020-05-19T21:59:39.8848732Z at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
2020-05-19T21:59:39.8849228Z at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
2020-05-19T21:59:39.8849669Z at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
2020-05-19T21:59:39.8850656Z at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:284) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-05-19T21:59:39.8851589Z at

[jira] [Created] (FLINK-17823) Resolve the race condition while releasing RemoteInputChannel

2020-05-19 Thread Zhijiang (Jira)
Zhijiang created FLINK-17823:


 Summary: Resolve the race condition while releasing 
RemoteInputChannel
 Key: FLINK-17823
 URL: https://issues.apache.org/jira/browse/FLINK-17823
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Network
Affects Versions: 1.11.0
Reporter: Zhijiang
Assignee: Zhijiang
 Fix For: 1.11.0


RemoteInputChannel#releaseAllResources might be called by the canceler thread. 
Meanwhile, the task thread can also call RemoteInputChannel#getNextBuffer. 
This can cause two potential problems:
 * The task thread might get a null buffer after the canceler thread has 
already released all the buffers, which leads to a misleading NPE in 
getNextBuffer.
 * The task thread and the canceler thread might pull the same buffer 
concurrently, which causes an unexpected exception when that buffer is 
recycled twice.

The solution is to properly synchronize access to the buffer queue in the 
release method, so that the same buffer cannot be pulled by both the canceler 
thread and the task thread. In addition, getNextBuffer gains explicit checks 
that avoid the misleading NPE and instead surface meaningful exceptions.
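
Below is a minimal, illustrative sketch of the intended synchronization. All 
class and method names here are simplified stand-ins, not the actual patch:

{code}
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the fix described above.
class ChannelBuffers {

    interface Buffer {
        void recycle();
    }

    private final ArrayDeque<Buffer> receivedBuffers = new ArrayDeque<>();
    private boolean isReleased;

    // Called by the task thread.
    Buffer getNextBuffer() throws IOException {
        synchronized (receivedBuffers) {
            if (isReleased) {
                // Explicit check instead of a misleading NPE.
                throw new IOException("Channel has already been released.");
            }
            // May still be null when no data is available, which is the
            // normal "no buffer yet" case.
            return receivedBuffers.poll();
        }
    }

    // Called by the canceler thread.
    void releaseAllResources() {
        final List<Buffer> toRecycle;
        synchronized (receivedBuffers) {
            if (isReleased) {
                return;
            }
            isReleased = true;
            toRecycle = new ArrayList<>(receivedBuffers);
            receivedBuffers.clear();
        }
        // Recycling happens outside the lock; since the queue was drained
        // atomically above, each buffer is recycled exactly once.
        for (Buffer buffer : toRecycle) {
            buffer.recycle();
        }
    }
}
{code}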



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17824) "Resuming Savepoint" e2e stalls indefinitely

2020-05-19 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-17824:
--

 Summary: "Resuming Savepoint" e2e stalls indefinitely 
 Key: FLINK-17824
 URL: https://issues.apache.org/jira/browse/FLINK-17824
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Checkpointing, Tests
Reporter: Robert Metzger


CI: 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=1887&view=logs&j=91bf6583-3fb2-592f-e4d4-d79d79c3230a&t=94459a52-42b6-5bfc-5d74-690b5d3c6de8

{code}
2020-05-19T21:05:52.9696236Z ==============================================================================
2020-05-19T21:05:52.9696860Z Running 'Resuming Savepoint (file, async, scale down) end-to-end test'
2020-05-19T21:05:52.9697243Z ==============================================================================
2020-05-19T21:05:52.9713094Z TEST_DATA_DIR: /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/temp-test-directory-52970362751
2020-05-19T21:05:53.1194478Z Flink dist directory: /home/vsts/work/1/s/flink-dist/target/flink-1.12-SNAPSHOT-bin/flink-1.12-SNAPSHOT
2020-05-19T21:05:53.2180375Z Starting cluster.
2020-05-19T21:05:53.9986167Z Starting standalonesession daemon on host fv-az558.
2020-05-19T21:05:55.5997224Z Starting taskexecutor daemon on host fv-az558.
2020-05-19T21:05:55.6223837Z Waiting for Dispatcher REST endpoint to come up...
2020-05-19T21:05:57.0552482Z Waiting for Dispatcher REST endpoint to come up...
2020-05-19T21:05:57.9446865Z Waiting for Dispatcher REST endpoint to come up...
2020-05-19T21:05:59.0098434Z Waiting for Dispatcher REST endpoint to come up...
2020-05-19T21:06:00.0569710Z Dispatcher REST endpoint is up.
2020-05-19T21:06:07.7099937Z Job (a92a74de8446a80403798bb4806b73f3) is running.
2020-05-19T21:06:07.7855906Z Waiting for job to process up to 200 records, current progress: 114 records ...
2020-05-19T21:06:55.5755111Z 
2020-05-19T21:06:55.5756550Z 
2020-05-19T21:06:55.5757225Z  The program finished with the following exception:
2020-05-19T21:06:55.5757566Z 
2020-05-19T21:06:55.5765453Z org.apache.flink.util.FlinkException: Could not stop with a savepoint job "a92a74de8446a80403798bb4806b73f3".
2020-05-19T21:06:55.5766873Z at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:485)
2020-05-19T21:06:55.5767980Z at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:854)
2020-05-19T21:06:55.5769014Z at org.apache.flink.client.cli.CliFrontend.stop(CliFrontend.java:477)
2020-05-19T21:06:55.5770052Z at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:921)
2020-05-19T21:06:55.5771107Z at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:982)
2020-05-19T21:06:55.5772223Z at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
2020-05-19T21:06:55.5773325Z at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:982)
2020-05-19T21:06:55.5774871Z Caused by: java.util.concurrent.ExecutionException: java.util.concurrent.CompletionException: java.util.concurrent.CompletionException: org.apache.flink.runtime.checkpoint.CheckpointException: Checkpoint Coordinator is suspending.
2020-05-19T21:06:55.5777183Z at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
2020-05-19T21:06:55.5778884Z at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
2020-05-19T21:06:55.5779920Z at org.apache.flink.client.cli.CliFrontend.lambda$stop$5(CliFrontend.java:483)
2020-05-19T21:06:55.5781175Z ... 6 more
2020-05-19T21:06:55.5782391Z Caused by: java.util.concurrent.CompletionException: java.util.concurrent.CompletionException: org.apache.flink.runtime.checkpoint.CheckpointException: Checkpoint Coordinator is suspending.
2020-05-19T21:06:55.5783885Z at org.apache.flink.runtime.scheduler.SchedulerBase.lambda$stopWithSavepoint$9(SchedulerBase.java:890)
2020-05-19T21:06:55.5784992Z at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836)
2020-05-19T21:06:55.5786492Z at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811)
2020-05-19T21:06:55.5787601Z at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
2020-05-19T21:06:55.5788682Z at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:402)
2020-05-19T21:06:55.5790308Z at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:195)
2020-05-19T21:06:55.5791664Z at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
2020-05-19T21:06:55.5792767Z at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
2020-05-19T21:06:55.5793756Z at akka.jap
{code}

Re: MODERATE for dev@flink.apache.org

2020-05-19 Thread Henry Saputra
Hi,

It looks like you are trying to send an email to the Apache Flink dev@
mailing list without having subscribed to it yet.

Please subscribe to the dev mailing list [1] so that you can see the replies
and follow up on your question.

Thanks,

Henry
On behalf of Apache Flink PMC

[1]
https://flink.apache.org/community.html#how-to-subscribe-to-a-mailing-list


>
>
>
>
> -- Forwarded message --
> From: vishalovercome 
> To: dev@flink.apache.org
> Cc:
> Bcc:
> Date: Tue, 19 May 2020 22:13:39 -0700 (MST)
> Subject: Issues using StreamingFileSink (Cannot locate configuration:
> tried hadoop-metrics2-s3a-file-system.properties,
> hadoop-metrics2.properties)
> I've written a Flink job to stream updates to S3 using StreamingFileSink. I
> used the Presto plugin for checkpointing and the Hadoop plugin for writing
> part files to S3. While going through my application logs, I found the
> messages below, which I couldn't understand, and I would like to know
> whether there is a step I might have missed.
> Even if these messages are harmless, I would like to understand what they
> mean.
> 2020-05-19 19:23:33,277 INFO
> org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.FluentPropertyBeanIntrospector
> - Error when creating PropertyDescriptor for public final void
> org.apache.flink.fs.shaded.hadoop3.org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)!
> Ignoring this property.
> *2020-05-19 19:23:33,310 WARN
> org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.metrics2.impl.MetricsConfig
> - Cannot locate configuration: tried
> hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties*
> 2020-05-19 19:23:33,397 INFO
> org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.metrics2.impl.MetricsSystemImpl
> - Scheduled Metric snapshot period at 10 second(s).
> 2020-05-19 19:23:33,397 INFO
> org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.metrics2.impl.MetricsSystemImpl
> - s3a-file-system metrics system started
> 2020-05-19 19:23:34,833 INFO
> org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.conf.Configuration.deprecation
> - fs.s3a.server-side-encryption-key is deprecated. Instead, use
> fs.s3a.server-side-encryption.key
>
>
>
> --
> Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
>
>
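
For reference, here is a minimal sketch of the kind of setup described in the
forwarded question. It assumes the flink-s3-fs-presto plugin (s3p://) is used
for checkpoints and the flink-s3-fs-hadoop plugin (s3a://) for the sink, both
on the plugins path; bucket names, paths, and the socket source are
placeholders, not taken from the original job:

{code}
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

// Illustrative job: checkpoints via the Presto S3 filesystem, part files
// via the Hadoop S3 filesystem.
public class S3SinkJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoints go through the Presto S3 filesystem, configured e.g.
        // via 'state.checkpoints.dir: s3p://my-bucket/checkpoints' in
        // flink-conf.yaml.
        env.enableCheckpointing(60_000);

        // Part files are written through the Hadoop S3 filesystem (s3a://),
        // which is what triggers the hadoop-metrics2 lookup in the logs.
        StreamingFileSink<String> sink = StreamingFileSink
                .forRowFormat(
                        new Path("s3a://my-bucket/output"),
                        new SimpleStringEncoder<String>("UTF-8"))
                .build();

        env.socketTextStream("localhost", 9999).addSink(sink);
        env.execute("S3 sink job");
    }
}
{code}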


[jira] [Created] (FLINK-17825) HA end-to-end gets killed due to timeout

2020-05-19 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-17825:
--

 Summary: HA end-to-end gets killed due to timeout
 Key: FLINK-17825
 URL: https://issues.apache.org/jira/browse/FLINK-17825
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination, Tests
Reporter: Robert Metzger
Assignee: Robert Metzger


CI: 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=1867&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=1e2bbe5b-4657-50be-1f07-d84bfce5b1f5
{code}
2020-05-19T20:46:50.9034002Z Killed TM @ 104061
2020-05-19T20:47:05.8510180Z Killed TM @ 107775
2020-05-19T20:47:55.1181475Z Killed TM @ 108337
2020-05-19T20:48:16.7907005Z Test (pid: 89099) did not finish after 540 seconds.
2020-05-19T20:48:16.790Z Printing Flink logs and killing it:

[...]

2020-05-19T20:48:19.1016912Z /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/test_ha_datastream.sh: line 125: 89099 Terminated  ( cmdpid=$BASHPID; ( sleep $TEST_TIMEOUT_SECONDS; echo "Test (pid: $cmdpid) did not finish after $TEST_TIMEOUT_SECONDS seconds."; echo "Printing Flink logs and killing it:"; cat ${FLINK_DIR}/log/*; kill "$cmdpid" ) & watchdog_pid=$!; echo $watchdog_pid > $TEST_DATA_DIR/job_watchdog.pid; run_ha_test 4 ${STATE_BACKEND_TYPE} ${STATE_BACKEND_FILE_ASYNC} ${STATE_BACKEND_ROCKS_INCREMENTAL} ${ZOOKEEPER_VERSION} )
2020-05-19T20:48:19.1017985Z Stopping job timeout watchdog (with pid=89100)
2020-05-19T20:48:19.1018621Z /home/vsts/work/1/s/flink-end-to-end-tests/test-scripts/test_ha_datastream.sh: line 112: kill: (89100) - No such process
2020-05-19T20:48:19.1019000Z Killing JM watchdog @ 91127
2020-05-19T20:48:19.1019199Z Killing TM watchdog @ 91883
2020-05-19T20:48:19.1019424Z [FAIL] Test script contains errors.
2020-05-19T20:48:19.1019639Z Checking of logs skipped.
2020-05-19T20:48:19.1019785Z 
2020-05-19T20:48:19.1020329Z [FAIL] 'Running HA (rocks, non-incremental) end-to-end test' failed after 9 minutes and 0 seconds! Test exited with exit code 1
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)