Queryable State

2018-09-04 Thread Darshan Singh
Hi All,

I was playing with queryable state. Since a queryable state stream cannot be
modified, how do I use the output of, say, my reduce function for further
processing?

Below is one example. I am sure I have done it wrong :). I am using the reduce
function twice; or do I need to use a rich reduce function and use the
queryable state there?

Thanks

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.common.state.ReducingStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import java.util.Arrays;
import java.util.List;

public class reducetest {

    public static void main(String[] args) throws Exception {

        // set up streaming execution environment
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        env.setParallelism(1);
        List<Integer> nums = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

        ReducingStateDescriptor<Tuple2<Integer, Integer>> descriptor =
                new ReducingStateDescriptor<>(
                        "sum",     // the state name
                        new rf(),  // the reduce function maintaining the state
                        TypeInformation.of(new TypeHint<Tuple2<Integer, Integer>>() {})); // type information
        descriptor.setQueryable("reduce");

        // key the records by even/odd (f0)
        DataStream<Tuple2<Integer, Integer>> ds1 =
                env.fromCollection(nums).map(new MapFunction<Integer, Tuple2<Integer, Integer>>() {
                    @Override
                    public Tuple2<Integer, Integer> map(Integer integer) throws Exception {
                        return Tuple2.of(integer % 2, integer);
                    }
                }).keyBy(0);

        // expose the reduced state for external queries (this acts as a sink)
        ((KeyedStream) ds1).asQueryableState("reduce", descriptor).getStateDescriptor();

        // reduce a second time for the downstream processing
        DataStream<Tuple2<Integer, Integer>> ds2 = ((KeyedStream) ds1).reduce(new rf());

        //ds1.print();
        ds2.print();

        System.out.println(env.getExecutionPlan());

        env.execute("re");
    }

    static class rf implements ReduceFunction<Tuple2<Integer, Integer>> {

        @Override
        public Tuple2<Integer, Integer> reduce(Tuple2<Integer, Integer> e,
                                               Tuple2<Integer, Integer> n) throws Exception {
            return Tuple2.of(e.f0, e.f1 + n.f1);
        }
    }
}
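
On the "rich reduce function" alternative mentioned above, here is a minimal sketch (illustrative only, not something posted in the thread): a single keyed operator that updates a queryable ReducingState and also emits the running result downstream, so the state stays queryable and no second reduce is needed. It assumes the same Tuple2<Integer, Integer> records keyed by f0 as the example.

// Hedged sketch. Additional imports needed:
// org.apache.flink.api.common.functions.RichFlatMapFunction,
// org.apache.flink.api.common.state.ReducingState,
// org.apache.flink.configuration.Configuration, org.apache.flink.util.Collector.
static class QueryableSum
        extends RichFlatMapFunction<Tuple2<Integer, Integer>, Tuple2<Integer, Integer>> {

    private transient ReducingState<Tuple2<Integer, Integer>> sum;

    @Override
    public void open(Configuration parameters) {
        ReducingStateDescriptor<Tuple2<Integer, Integer>> descriptor =
                new ReducingStateDescriptor<>(
                        "sum",
                        new rf(), // the same reduce function as above
                        TypeInformation.of(new TypeHint<Tuple2<Integer, Integer>>() {}));
        descriptor.setQueryable("reduce"); // external clients query it under this name
        sum = getRuntimeContext().getReducingState(descriptor);
    }

    @Override
    public void flatMap(Tuple2<Integer, Integer> value,
                        Collector<Tuple2<Integer, Integer>> out) throws Exception {
        sum.add(value);         // updates the queryable state
        out.collect(sum.get()); // emits the running sum for further processing
    }
}

Wired in as env.fromCollection(nums).map(...).keyBy(0).flatMap(new QueryableSum()), the resulting stream can be processed further while the state remains queryable.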


Failed to trigger savepoint

2018-09-04 Thread Paul Lam
Hi, 

I’m using Flink 1.5.3 and failed to trigger a savepoint for a Flink on YARN job. 
The stack trace shows that an exception occurred while triggering the 
checkpoint, but the job's normal checkpoints are running fine.

What could possibly be the problem? Thanks a lot!

The stack trace is as follows:

org.apache.flink.util.FlinkException: Triggering a savepoint for the job 
1ca7d429484c64eb64fa646672389a74 failed.
at 
org.apache.flink.client.cli.CliFrontend.triggerSavepoint(CliFrontend.java:695)
at 
org.apache.flink.client.cli.CliFrontend.lambda$savepoint$7(CliFrontend.java:673)
at 
org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:960)
at 
org.apache.flink.client.cli.CliFrontend.savepoint(CliFrontend.java:670)
at 
org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1040)
at 
org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1101)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
at 
org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1101)
Caused by: java.util.concurrent.CompletionException: 
java.util.concurrent.CompletionException: 
org.apache.flink.runtime.checkpoint.CheckpointTriggerException: Failed to 
trigger savepoint. Decline reason: An Exception occurred while triggering the 
checkpoint.
at 
org.apache.flink.runtime.jobmaster.JobMaster.lambda$triggerSavepoint$13(JobMaster.java:955)
at 
java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
at 
java.util.concurrent.CompletableFuture.uniExceptionallyStage(CompletableFuture.java:884)
at 
java.util.concurrent.CompletableFuture.exceptionally(CompletableFuture.java:2196)
at 
org.apache.flink.runtime.jobmaster.JobMaster.triggerSavepoint(JobMaster.java:951)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:247)
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:162)
at 
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:70)
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
at 
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
at 
akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.util.concurrent.CompletionException: 
org.apache.flink.runtime.checkpoint.CheckpointTriggerException: Failed to 
trigger savepoint. Decline reason: An Exception occurred while triggering the 
checkpoint.
at 
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
at 
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
at 
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
at 
java.util.concurrent.CompletableFuture.uniApplyStage(CompletableFuture.java:614)
at 
java.util.concurrent.CompletableFuture.thenApply(CompletableFuture.java:1983)
at 
org.apache.flink.runtime.jobmaster.JobMaster.triggerSavepoint(JobMaster.java:943)
... 21 more
Caused by: org.apache.flink.runtime.checkpoint.CheckpointTriggerException: 
Failed to trigger savepoint. Decline reason: An Exception occurred while 
triggering the checkpoint.
at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.triggerSavepoint(CheckpointCoordinator.java:377)
at 
org.apache.flink.runtime.jobmaster.JobMaster.t

Re: Failed to trigger savepoint

2018-09-04 Thread Chesnay Schepler

You will have to take a look at the JobManager/TaskManager logs.

On 04.09.2018 12:02, Paul Lam wrote:

Hi,

I’m using Flink 1.5.3 and failed to trigger a savepoint for a Flink on 
YARN job. The stack trace shows that an exception occurred while 
triggering the checkpoint, but the job's normal checkpoints are 
running fine.


What could possibly be the problem? Thanks a lot!



Best,
Paul Lam





Cancel flink job occur exception.

2018-09-04 Thread 段丁瑞
Hi all,
  I submit a Flink job in yarn-cluster mode and cancel it with the 
savepoint option immediately after the job status changes to deployed. Sometimes I 
get this error:

org.apache.flink.util.FlinkException: Could not cancel job .
at 
org.apache.flink.client.cli.CliFrontend.lambda$cancel$4(CliFrontend.java:585)
at 
org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:960)
at org.apache.flink.client.cli.CliFrontend.cancel(CliFrontend.java:577)
at 
org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1034)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.ExecutionException: 
org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not 
complete the operation. Number of retries has been exhausted.
at 
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
at 
org.apache.flink.client.program.rest.RestClusterClient.cancelWithSavepoint(RestClusterClient.java:398)
at 
org.apache.flink.client.cli.CliFrontend.lambda$cancel$4(CliFrontend.java:583)
... 6 more
Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: 
Could not complete the operation. Number of retries has been exhausted.
at 
org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:213)
at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at 
org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$1(RestClient.java:274)
at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:268)
at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:284)
at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at 
org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
... 1 more
Caused by: java.util.concurrent.CompletionException: java.net.ConnectException: 
Connect refuse: xxx/xxx.xxx.xxx.xxx:xxx
at 
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
at 
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
at 
java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)
at 
java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
... 16 more
Caused by: java.net.ConnectException: Connect refuse: xxx/xxx.xxx.xxx.xxx:xxx
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at 
org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:281)
... 7 more

I checked the JobManager log and found no error. The savepoint is correctly saved 
in HDFS. The YARN application status changed to FINISHED and the final status to 
KILLED.
I think this issue occurs because the RestClusterClient cannot find the JobManager 
address after the JobManager (AM) has shut down.
My Flink version is 1.5.3.
Could anyone help me resolve this issue? Thanks!

devin.


Cancel flink job occur exception

2018-09-04 Thread 李瑞亮
Hi all,
  I submit a Flink job in yarn-cluster mode and cancel it with the 
savepoint option immediately after the job status changes to deployed. Sometimes I 
get this error:

org.apache.flink.util.FlinkException: Could not cancel job .
at 
org.apache.flink.client.cli.CliFrontend.lambda$cancel$4(CliFrontend.java:585)
at 
org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:960)
at org.apache.flink.client.cli.CliFrontend.cancel(CliFrontend.java:577)
at 
org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1034)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.ExecutionException: 
org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not 
complete the operation. Number of retries has been exhausted.
at 
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
at 
org.apache.flink.client.program.rest.RestClusterClient.cancelWithSavepoint(RestClusterClient.java:398)
at 
org.apache.flink.client.cli.CliFrontend.lambda$cancel$4(CliFrontend.java:583)
... 6 more
Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: 
Could not complete the operation. Number of retries has been exhausted.
at 
org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:213)
at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
... 1 more
Caused by: java.util.concurrent.CompletionException: java.net.ConnectException: 
Connect refuse: xxx/xxx.xxx.xxx.xxx:xxx
at 
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
at 
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
at 
java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)
at 
java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
... 16 more
Caused by: java.net.ConnectException: Connect refuse: xxx/xxx.xxx.xxx.xxx:xxx
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at 
org.apache.flink.shaded.netty4.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
at 
org.apache.flink.shaded.netty4.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:281)
... 7 more

I checked the JobManager log and found no error. The savepoint is correctly saved 
in HDFS. The YARN application status changed to FINISHED and the final status to 
KILLED.
I think this issue occurs because the RestClusterClient cannot find the JobManager 
address after the JobManager (AM) has shut down.
My Flink version is 1.5.3.
Could anyone help me resolve this issue? Thanks!

Best Regards!


Re: Why FlinkKafkaConsumer doesn't subscribe to topics?

2018-09-04 Thread Julio Biason
Hi Renjie,

1. From what I could grasp from the Kafka docs, you can subscribe and still use
poll() to capture a specific offset. But I only read the beginning of it and
didn't go deep into it.

2. Currently, Flink 1.4.2, Kafka 0.10.1 and the FlinkKafkaConsumer010.

On Tue, Sep 4, 2018 at 12:47 AM, Renjie Liu  wrote:

> Hi, Julio:
> 1. Flink doesn't use subscribe because it needs to control partition
> assignment itself, which is important for implementing exactly once.
> 2. Can you share the versions you are using, including Kafka, the Kafka
> client, and Flink? We also use the Flink Kafka consumer and we can monitor
> it correctly.
>
> On Tue, Sep 4, 2018 at 3:09 AM Julio Biason 
> wrote:
>
>> Hey guys,
>>
>> We are trying to add external monitoring to our system, but we can only
>> get the lag in kafka topics while the Flink job is running -- if, for some
>> reason, the Flink job fails, we get no visibility on how big the lag is.
>>
>> (Besides that, the way Flink reports is not accurate and produces a lot
>> of -Inf, which I already discussed before.)
>>
>> While looking at the problem, we noticed that the FlinkKafkaConsumer
>> never uses `subscribe` to subscribe to the topics and that's why the values
>> are never stored back into Kafka, even when the driver itself does
>> `commitAsync`.
>>
>> Is there any reason for not subscribing to topics that I may have missed?
>>
>> --
>> *Julio Biason*, Software Engineer
>> *AZION*  |  Deliver. Accelerate. Protect.
>> Office: +55 51 3083 8101   |  Mobile: +55 51
>> *99907 0554*
>>
> --
> Liu, Renjie
> Software Engineer, MVAD
>



-- 
*Julio Biason*, Software Engineer
*AZION*  |  Deliver. Accelerate. Protect.
Office: +55 51 3083 8101   |  Mobile: +55 51
*99907 0554*
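
For the monitoring goal itself (lag visibility even when the job is down), one option is a small standalone watcher that compares each partition's end offset against the group's committed offset using the plain kafka-clients API. A hedged sketch against kafka-clients 0.10.1; the broker address, group.id, and topic are illustrative assumptions:

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import java.util.*;

public class LagWatcher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092"); // assumption: broker address
        props.put("group.id", "my-flink-group");      // assumption: the job's group.id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            List<TopicPartition> partitions = new ArrayList<>();
            consumer.partitionsFor("my-topic").forEach(p -> // assumption: topic name
                    partitions.add(new TopicPartition(p.topic(), p.partition())));

            Map<TopicPartition, Long> ends = consumer.endOffsets(partitions);
            for (TopicPartition tp : partitions) {
                OffsetAndMetadata committed = consumer.committed(tp);
                long lag = committed == null
                        ? -1 // nothing committed yet for this partition
                        : ends.get(tp) - committed.offset();
                System.out.printf("%s lag=%d%n", tp, lag);
            }
        }
    }
}

Note this only works to the extent that the job actually commits offsets back to Kafka, which is exactly the behavior under discussion in this thread.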


Ask about running multiple jars for different stream jobs

2018-09-04 Thread Rad Rad
Hi All, 

Kindly, could I ask about running multiple jars for different stream jobs in
a cluster? I have two jars, each of which joins different data streams. I
submitted the first one and it works fine; when I tried to submit the second
one, it failed. When I go to the running tab, i.e.
http://mycluster:8081/#/running-jobs, only the first one is running.

The same problem occurs when I try standalone mode.
My question is: how can I run two jars (streaming jobs) at the same time
on a Flink cluster?


Thanks. 
  



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/



org.apache.flink.util.FlinkException: Could not cancel job

2018-09-04 Thread Chang Liu
Dear All,

I had the following issue when trying to cancel a job from the CLI. I am
wondering whether this is the proper way of canceling a job, or whether there
is a more elegant way to do it, either in code or in the CLI? Many thanks!

BTW, I have a streaming job consuming from Kafka and producing to another
Kafka topic.

./bin/flink cancel b89f45024cf2e45914eaa920df95907f
Cancelling job b89f45024cf2e45914eaa920df95907f.


 The program finished with the following exception:

org.apache.flink.util.FlinkException: Could not cancel job
b89f45024cf2e45914eaa920df95907f.
at
org.apache.flink.client.cli.CliFrontend.lambda$cancel$5(CliFrontend.java:603)
at
org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:955)
at
org.apache.flink.client.cli.CliFrontend.cancel(CliFrontend.java:596)
at
org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1029)
at
org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1096)
at
org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
at
org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1096)
Caused by: java.util.concurrent.ExecutionException:
org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not
complete the operation. Exception is not retryable.
at
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at
org.apache.flink.client.program.rest.RestClusterClient.cancel(RestClusterClient.java:380)
at
org.apache.flink.client.cli.CliFrontend.lambda$cancel$5(CliFrontend.java:601)
... 6 more
Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException:
Could not complete the operation. Exception is not retryable.
at
org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:214)
at
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at
java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
at
java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:929)
at
java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.CompletionException:
org.apache.flink.runtime.rest.util.RestClientException: [Job could not be
found.]
at
java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
at
java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
at
java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
at
java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:953)
at
java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
... 4 more
Caused by: org.apache.flink.runtime.rest.util.RestClientException: [Job
could not be found.]
at
org.apache.flink.runtime.rest.RestClient.parseResponse(RestClient.java:225)
at
org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$3(RestClient.java:209)
at
java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:952)
... 5 more


Best regards/祝好,

Chang Liu 刘畅
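
On the "more elegant way" part, the CLI in this version also supports cancelling with a savepoint, so the job can later be resumed rather than just killed. A hedged illustration (the savepoint directory and resulting path are made up):

# cancel, writing a savepoint first
./bin/flink cancel -s hdfs:///flink/savepoints b89f45024cf2e45914eaa920df95907f

# later, resume the job from that savepoint (path is illustrative)
./bin/flink run -s hdfs:///flink/savepoints/savepoint-b89f45-0123456789ab my-job.jar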


Flink1.6.0 submit job and got "No content to map due to end-of-input" Error

2018-09-04 Thread 潘 功森
Hi all,
 I use the following way to submit a jar to Flink:

StreamExecutionEnvironment env = StreamExecutionEnvironment.createRemoteEnvironment(
        config.clusterIp, config.clusterPort, config.clusterFlinkJar);


I used Flink 1.3.2 before and it worked fine. But after I upgraded to 1.6.0, 
I got the error below:

2018-09-04 21:38:32.039 [ERROR] [flink-rest-client-netty-19-1] 
org.apache.flink.runtime.rest.RestClient - Unexpected plain-text response:

2018-09-04 21:38:32.137 [ERROR] [flink-rest-client-netty-18-1] 
org.apache.flink.runtime.rest.RestClient - Response was not valid JSON.

org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.JsonMappingException:
 No content to map due to end-of-input


Could you give me some advice on how to fix it?

yours,
Gongsen



Ask about running multiple jars for different stream jobs

2018-09-04 Thread Rad Rad
Hi All,

Kindly, could I ask about running multiple jars for different stream jobs in
a cluster? I have two jars:
1- jar 1 joins stream1 ⋈ stream2
2- jar 2 joins stream1 ⋈ stream3

I submitted the first one and it works fine; when I tried to submit the
second one, it failed. When I go to the Flink dashboard's running tab, i.e.
http://mycluster:8081/#/running-jobs, only the first one is running.

The same problem occurs when I try standalone mode.
My question is: how can I run two jars (streaming jobs) at the same time
on a Flink cluster?


Thanks.
  



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: Why FlinkKafkaConsumer doesn't subscribe to topics?

2018-09-04 Thread Tzu-Li (Gordon) Tai
Hi Julio,

As Renjie had already mentioned, to achieve exactly-once semantics with the 
Kafka consumer, Flink needs to have control over the Kafka partition to source 
subtask assignment.

To add a bit more detail here, this is due to the fact that each subtask writes 
to Flink managed state the current offsets of the partitions it is assigned, 
and that is coordinated with Flink’s checkpoints.
If it were to use Kafka’s automatic consumer group assignments (i.e. when using 
the subscribe API), the consumer would have no control over when exactly 
partition subscriptions are reassigned across subtasks.
If I understood correctly, what you were suggesting in your last reply was to 
simply use poll() to query the offset in the case that some partition was 
reassigned to another source subtask.
This is problematic because there are no consistency guarantees between the 
committed offsets in Kafka and Flink’s checkpoints.
Committing offsets is, and should only be used as, a means to expose consumer 
progress to the outside world beyond the Flink job.

Hope this provides a bit more insight.

Cheers,
Gordon
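
A heavily simplified sketch of the mechanism Gordon describes (illustrative only, not the actual FlinkKafkaConsumer code): each source subtask keeps the offsets of its deterministically assigned partitions in operator state, and that state is snapshotted atomically with the rest of the job on each checkpoint.

// Hedged sketch of offsets-in-Flink-state; partition assignment and polling elided.
// imports: org.apache.flink.api.common.state.{ListState, ListStateDescriptor},
// org.apache.flink.api.common.typeinfo.{TypeHint, TypeInformation},
// org.apache.flink.runtime.state.{FunctionInitializationContext, FunctionSnapshotContext},
// org.apache.flink.streaming.api.checkpoint.CheckpointedFunction,
// org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction, java.util.*
public class OffsetTrackingSource extends RichParallelSourceFunction<String>
        implements CheckpointedFunction {

    private final Map<Integer, Long> partitionOffsets = new HashMap<>(); // partition -> next offset
    private transient ListState<Map<Integer, Long>> checkpointedOffsets;
    private volatile boolean running = true;

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        // Runs on checkpoint barriers: the offsets become part of Flink's
        // checkpoint, consistent with all downstream operator state.
        checkpointedOffsets.clear();
        checkpointedOffsets.add(new HashMap<>(partitionOffsets));
    }

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        checkpointedOffsets = context.getOperatorStateStore().getListState(
                new ListStateDescriptor<>("offsets",
                        TypeInformation.of(new TypeHint<Map<Integer, Long>>() {})));
        for (Map<Integer, Long> restored : checkpointedOffsets.get()) {
            partitionOffsets.putAll(restored); // resume exactly where the checkpoint left off
        }
    }

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        while (running) {
            synchronized (ctx.getCheckpointLock()) {
                // poll records for the self-assigned partitions, emit them,
                // then advance partitionOffsets -- atomic w.r.t. snapshots
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}

Because emission and snapshotting are serialized on the checkpoint lock, the recorded offsets always match exactly the records already emitted into the checkpointed pipeline, which is the guarantee Kafka's committed offsets cannot provide.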

On 4 September 2018 at 2:25:38 PM, Julio Biason (julio.bia...@azion.com) wrote:

Hi Renjie,

1. From what I could grasp from the Kafka docs, you can subscribe and still use 
poll() to capture a specific offset. But I only read the beginning of it and 
didn't go deep into it.

2. Currently, Flink 1.4.2, Kafka 0.10.1 and the FlinkKafkaConsumer010.

On Tue, Sep 4, 2018 at 12:47 AM, Renjie Liu  wrote:
Hi, Julio:
1. Flink doesn't use subscribe because it needs to control partition assignment 
itself, which is important for implementing exactly once.
2. Can you share the versions you are using, including Kafka, the Kafka client, 
and Flink? We also use the Flink Kafka consumer and we can monitor it correctly.

On Tue, Sep 4, 2018 at 3:09 AM Julio Biason  wrote:
Hey guys,

We are trying to add external monitoring to our system, but we can only get the 
lag in kafka topics while the Flink job is running -- if, for some reason, the 
Flink job fails, we get no visibility on how big the lag is.

(Besides that, the way Flink reports is not accurate and produces a lot of 
-Inf, which I already discussed before.)

While looking at the problem, we noticed that the FlinkKafkaConsumer never uses 
`subscribe` to subscribe to the topics and that's why the values are never 
stored back into Kafka, even when the driver itself does `commitAsync`.

Is there any reason for not subscribing to topics that I may have missed?

--
Julio Biason, Software Engineer
AZION  |  Deliver. Accelerate. Protect.
Office: +55 51 3083 8101  |  Mobile: +55 51 99907 0554
--
Liu, Renjie
Software Engineer, MVAD



--
Julio Biason, Software Engineer
AZION  |  Deliver. Accelerate. Protect.
Office: +55 51 3083 8101  |  Mobile: +55 51 99907 0554

Re: org.apache.flink.util.FlinkException: Could not cancel job

2018-09-04 Thread Chesnay Schepler

Please check that the job ID is correct.

On 04.09.2018 15:48, Chang Liu wrote:

Dear All,

I had the following issue when trying to cancel a job from the CLI. I am 
wondering whether this is the proper way of canceling a job, or whether 
there is a more elegant way to do it, either in code or in the CLI? Many 
thanks!


BTW, I have a streaming job consuming from Kafka and producing to another 
Kafka topic.




Best regards/祝好,

Chang Liu 刘畅





Re: akka.ask.timeout setting not honored

2018-09-04 Thread Greg Finch
Hi Gary,

Turns out, the configuration warning you mentioned was the key.  The
akka.ask.timeout requires a duration unit, but the web.timeout setting is
looking for a long.  So the change I made earlier would not have applied
since it couldn't read `300s`.  Since making that change (`web.timeout:
30`), I have not been able to reproduce the error - everything starts
successfully every time.  I do have debug logging turned on for now.  If it
happens again in the next couple of days, I will send details with debug
logs.

Thanks again for your help!
Greg
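
For the archives, the distinction that bit here (my hedged reading of the 1.5 configuration parsing): akka.ask.timeout is parsed as a duration and requires a time unit, while web.timeout is parsed as a plain long in milliseconds, so a unit-suffixed value like 300s produces the "cannot evaluate value 300s as a long integer number" warning and is ignored. Illustrative flink-conf.yaml entries:

akka.ask.timeout: 300 s    # duration: a time unit is required
web.timeout: 300000        # long: milliseconds, no unit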

On Fri, Aug 31, 2018 at 3:21 PM Gary Yao  wrote:

> Hi Greg,
>
> Unfortunately the environment information [1] is not logged. Can you set
> the
> log level for all Flink packages to DEBUG?
>
> Do you install Flink yourself on EMR, or do you use the pre-installed one?
> Can you show us the command with which you start the cluster/submit the
> job?
>
> I do not know if it is related but I found these warnings in your second
> log file:
>
> 2018-08-31 19:14:32 WARN
> org.apache.flink.configuration.Configuration  - Configuration cannot
> evaluate value 300s as a long integer number
> 2018-08-31 19:14:32 WARN
> org.apache.flink.configuration.Configuration  - Configuration cannot
> evaluate value 300s as a long integer number
>
> Best,
> Gary
>
> [1]
> https://github.com/apache/flink/blob/9ae5009b6a82248bfae99dac088c1f6e285aa70f/flink-runtime/src/main/java/org/apache/flink/runtime/util/EnvironmentInformation.java#L281
>
> On Fri, Aug 31, 2018 at 9:18 PM, Greg Finch 
> wrote:
>
>> Well ... that didn't take long.  The next time I tried, I got the Akka
>> timeout again.  Attached are the logs from the last attempt.  They're very
>> similar to the other logs I sent.
>>
>> On Fri, Aug 31, 2018 at 2:04 PM Greg Finch 
>> wrote:
>>
>>> Thanks Gary.  Attached is the jobmanager log.  You are correct that this
>>> is running on YARN.  I changed web.timeout as you suggested - that seems to
>>> be working the few times I tested it.  This problem comes and goes though -
>>> sometimes it starts before it times out.  I'll keep the web.timeout setting
>>> and reply again if the problem comes up again.  Thanks again for your quick
>>> response!
>>>
>>> On Fri, Aug 31, 2018 at 1:38 PM Gary Yao  wrote:
>>>
 Hi Greg,

 Can you describe the steps to reproduce the problem, or can you attach
 the
 full jobmanager logs? Because JobExecutionResultHandler appears in your
 log, I
 assume that you are starting a job cluster on YARN. Without seeing the
 complete logs, I cannot be sure what exactly happens. For now, you can
 try
 setting the config option web.timeout to a higher value.

 Best,
 Gary

 On Fri, Aug 31, 2018 at 8:01 PM, Greg Finch 
 wrote:

> I'm having a problem with akka timeout when starting my cluster.  The
> error is "Ask timed out after 1 ms.".  I have changed the
> akka.ask.timeout config setting to be 30 ms, but it still times out 
> and
> fails after 10 seconds.  I confirmed that the config is properly set by
> both checking the Job Manager configuration tab (it shows 30 ms) as
> well logging the output of AkkaUtils.getTimeout(configuration) which also
> shows 30ms.  It seems something is not honoring that configuration
> value.
>
> I did find a different thread that discussed the fact that the
> LocalStreamEnvironment will not honor this setting, but that is not my
> case.  I am running on a cluster (AWS EMR) using the regular
> StreamExecutionEnvironment.  This is Flink 1.5.2.
>
> Any ideas?
>
> ~
>
> 2018-08-31 17:37:55 INFO  
> org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl  - Received new 
> token for : ip-10-213-139-66.ec2.internal:8041
> 2018-08-31 17:37:55 INFO  
> org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl  - Received new 
> token for : ip-10-213-136-25.ec2.internal:8041
> 2018-08-31 17:38:34 ERROR 
> o.a.flink.runtime.rest.handler.job.JobExecutionResultHandler  - 
> Implementation error: Unhandled exception.
> akka.pattern.AskTimeoutException: Ask timed out on 
> [Actor[akka://flink/user/dispatcher#-219618710]] after [1 ms]. 
> Sender[null] sent message of type 
> "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
>   at 
> akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
>   at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
>   at 
> scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
>   at 
> scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
>   at 
> scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
>   at 
> akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
>   at 
> akka.actor.LightArra

flink list and flink run commands timeout

2018-09-04 Thread Jason Kania
I have upgraded from Flink 1.4.0 to Flink 1.5.3 with a three-node cluster 
configured with HA. Now I am encountering an issue where the flink command-line 
operations time out. The exception generated is very poor, because it only 
indicates a timeout, not the reason or what it was trying to do:
> ./flink list -f
Waiting for response...

 The program finished with the following exception:

org.apache.flink.util.FlinkException: Failed to retrieve job list.
        at org.apache.flink.client.cli.CliFrontend.listJobs(CliFrontend.java:433)
        at org.apache.flink.client.cli.CliFrontend.lambda$list$0(CliFrontend.java:416)
        at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:960)
        at org.apache.flink.client.cli.CliFrontend.list(CliFrontend.java:413)
        at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1028)
        at org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1101)
        at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
        at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1101)
Caused by: org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could not complete the operation. Exception is not retryable.
        at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:213)
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
        at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:793)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
        at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
        at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
        at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
        ... 10 more
Caused by: java.util.concurrent.TimeoutException
The web interface shows the 2 job managers and 3 task managers that are talking 
with one another.
I have looked at the zookeeper data and it is all present.
I have tried running the command on multiple nodes and they all give the same 
error.
I looked for a verbose or debug option for the commands but found nothing.
Suggestions on this?
Thanks,
Jason

Re: Flink1.6.0 submit job and got "No content to map due to end-of-input" Error

2018-09-04 Thread vino yang
Hi Pangongsen,

Did you upgrade the Flink-related dependencies you use at the same time? In
other words, are the dependencies consistent with the Flink version?

Thanks, vino.

潘 功森 wrote on Tuesday, September 4, 2018 at 10:07 PM:

> Hi all,
>  I use the following way to submit a jar to Flink:
>
> StreamExecutionEnvironment env = StreamExecutionEnvironment.createRemoteEnvironment(
>         config.clusterIp, config.clusterPort, config.clusterFlinkJar);
>
>
> I used Flink 1.3.2 before and it worked fine. But after I upgraded to
> 1.6.0, I got the error below:
>
> 2018-09-04 21:38:32.039 [ERROR] [flink-rest-client-netty-19-1]
> org.apache.flink.runtime.rest.RestClient - Unexpected plain-text response:
>
> 2018-09-04 21:38:32.137 [ERROR] [flink-rest-client-netty-18-1]
> org.apache.flink.runtime.rest.RestClient - Response was not valid JSON.
>
> org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.JsonMappingException:
> No content to map due to end-of-input
>
>
> Could you give me some advice on how to fix it?
>
> yours,
> Gongsen
>
>


Re: Ask about running multiple jars for different stream jobs

2018-09-04 Thread vino yang
Hi Rad,

Which mode does your Flink cluster run in? If it is standalone or something like
a YARN session, then of course you can run multiple jobs in one cluster.
In addition, can you post the exception with which the second job failed?

Thanks, vino.

Rad Rad wrote on Tuesday, September 4, 2018 at 10:07 PM:

> Hi All,
>
> Kindly, could I ask about running multiple jars for different stream jobs
> in a cluster? I have two jars:
> 1- jar 1 joins stream1 ⋈ stream2
> 2- jar 2 joins stream1 ⋈ stream3
>
> I submitted the first one and it works fine; when I tried to submit the
> second one, it failed. When I go to the Flink dashboard's running tab, i.e.
> http://mycluster:8081/#/running-jobs, only the first one is running.
>
> The same problem occurs when I try standalone mode.
> My question is: how can I run two jars (streaming jobs) at the same time
> on a Flink cluster?
>
>
> Thanks.
>
>
>
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>


Re: Taskmanager process memory increasing always

2018-09-04 Thread Yan Zhou [FDS Science]
I have met a similar issue: YARN kills the TaskManagers as their memory usage 
grows to the limit. I think it might be RocksDB causing the problem. Is there 
any way to debug the memory usage of the RocksDB backend?


Best

Yan
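
As far as I know there is no direct metric for RocksDB's native memory in this version, but its main consumers (memtables and block cache) can at least be bounded by handing the backend a custom OptionsFactory. A hedged sketch; all sizes are illustrative, and the caps apply per state backend instance:

// imports: org.apache.flink.contrib.streaming.state.{OptionsFactory, RocksDBStateBackend},
// org.rocksdb.{BlockBasedTableConfig, ColumnFamilyOptions, DBOptions}
public class BoundedRocksDBOptions implements OptionsFactory {

    @Override
    public DBOptions createDBOptions(DBOptions currentOptions) {
        return currentOptions.setMaxBackgroundCompactions(2); // illustrative
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions) {
        return currentOptions
                .setWriteBufferSize(32 * 1024 * 1024) // memtable size (illustrative)
                .setMaxWriteBufferNumber(2)           // memtables kept per column family
                .setTableFormatConfig(
                        new BlockBasedTableConfig().setBlockCacheSize(64 * 1024 * 1024));
    }
}

// usage: rocksDBStateBackend.setOptions(new BoundedRocksDBOptions());

Since every keyed operator subtask gets its own RocksDB instance, total native memory is roughly these caps multiplied by the number of local state instances.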


From: YennieChen88 
Sent: Wednesday, August 29, 2018 6:14:11 AM
To: user@flink.apache.org
Subject: Taskmanager process memory increasing always

Hello,
My case is counting the number of successful and failed logins within 1
hour, 10 min, 5 min, 3 min, 1 min, 10 seconds and 1 second, keyed by login IP or
device id. Based on the previous counting results over the different time
dimensions, we predict the compliance of the next login.
After various attempts, I chose sliding windows for the counting, e.g. a 1
hour window size with a 1 min window step, a 10 min window size with a 10 second
window step, a 5 min window with a 5 second window step... Besides this, I used
RocksDB as the state backend, and enabled checkpointing.
But now I encounter some problems.
1. The RES memory of every TaskManager process is increasing all the time
and never stabilizes, until the process is killed without any OOM exception in
the log.

   After several tests, I found that the process memory increase is related
to the keys (IP or device id). If the key values stay within a certain range,
the process memory can be stable. But if the key values change randomly, the
memory keeps increasing. In fact, the login IPs and device ids are random. We
also found that logins decrease after midnight, and the memory can be stable for
a short while. But the memory increases during the day. I started a job 15 days
ago and its memory is still increasing. The keys change randomly and the memory
increases; is that normal?

2. RocksDB seems to take up a lot of memory.
   If I change from RocksDB to the filesystem state backend, the memory can
drop to around 30%. If there is no limit configured, will RocksDB's memory usage
increase all the time?

3. Some TaskManagers of the Flink cluster do not run any task (no slots
in use), but their memory also increases linearly after the job has run for
several days. What do they use the memory for? I have no idea.


Hope for your reply. Thank you.



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: Flink1.6.0 submit job and got "No content to map due to end-of-input" Error

2018-09-04 Thread 潘 功森
Hi Vino,

Below are the dependencies I used; please have a look.

I found it also includes flink-connector-kafka-0.10_2.11-1.6.0.jar and 
flink-connector-kafka-0.9_2.11-1.6.0.jar, and I don't know if that has any effect.

yours,
Gongsen
[inline image: screenshot of the project dependencies]

Sent from Mail for Windows 10


From: vino yang 
Sent: Wednesday, September 5, 2018 10:35:59 AM
To: d...@flink.apache.org
Cc: user
Subject: Re: Flink1.6.0 submit job and got "No content to map due to end-of-input" 
Error

Hi Pangongsen,

Do you upgrade the Flink-related dependencies you use at the same time? In 
other words, is the dependency consistent with the flink version?

Thanks, vino.

潘 功森 (pangong...@hotmail.com) wrote on Tuesday, September 4, 2018 at 10:07 PM:
Hi all,
 I use the following way to submit a jar to Flink:

StreamExecutionEnvironment env = StreamExecutionEnvironment.createRemoteEnvironment(
        config.clusterIp, config.clusterPort, config.clusterFlinkJar);


I used Flink 1.3.2 before and it worked fine. But after I upgraded to 1.6.0, 
I got the error below:

2018-09-04 21:38:32.039 [ERROR] [flink-rest-client-netty-19-1] 
org.apache.flink.runtime.rest.RestClient - Unexpected plain-text response:

2018-09-04 21:38:32.137 [ERROR] [flink-rest-client-netty-18-1] 
org.apache.flink.runtime.rest.RestClient - Response was not valid JSON.

org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.JsonMappingException:
 No content to map due to end-of-input


Could you give me some advice on how to fix it?

yours,
Gongsen