[ https://issues.apache.org/jira/browse/FLINK-22086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ying zhang updated FLINK-22086:
-------------------------------
    Summary: Does it support rate-limiting on reading Hive in Flink 1.12? Reading the Hive source leads to a high load average  (was: Does it support rate-limiting in Flink 1.12? Reading the Hive source leads to a high load average)

> Does it support rate-limiting on reading Hive in Flink 1.12? Reading the Hive source leads to a high load average
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-22086
>                 URL: https://issues.apache.org/jira/browse/FLINK-22086
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Hive
>    Affects Versions: 1.12.0
>            Reporter: ying zhang
>            Priority: Major
>        Attachments: image-2021-04-01-15-56-50-736.png, image-2021-04-01-15-58-18-265.png,
>                     image-2021-04-01-16-07-28-469.png, image-2021-04-01-16-08-28-442.png
>
> I read a Hive source with a Flink SQL batch job, but the job failed with an exception like this:
>
> org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
>     at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:118)
>     at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:80)
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:239)
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:230)
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:221)
>     at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:672)
>     at org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:90)
>     at org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:453)
>     at sun.reflect.GeneratedMethodAccessor312.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:306)
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:213)
>     at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77)
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:159)
>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
>     at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>     at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>     at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
>     at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
>     at akka.actor.ActorCell.invoke(ActorCell.scala:561)
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
>     at akka.dispatch.Mailbox.run(Mailbox.scala:225)
>     at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
>     at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>     at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>     at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>     at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: org.apache.flink.runtime.io.network.netty.exception.LocalTransportException: readAddress(..) failed: Connection reset by peer (connection to '11.69.21.53/11.69.21.53:19977')
>     at org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.exceptionCaught(CreditBasedPartitionRequestClientHandler.java:201)
>     at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
>     at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
>     at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:273)
>     at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.exceptionCaught(DefaultChannelPipeline.java:1377)
>     at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
>     at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
>     at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireExceptionCaught(DefaultChannelPipeline.java:907)
>     at org.apache.flink.shaded.netty4.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.handleReadException(AbstractEpollStreamChannel.java:728)
>     at org.apache.flink.shaded.netty4.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:818)
>     at org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:475)
>     at org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>     at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.shaded.netty4.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
>
> Then I checked the machine's monitoring:
> !image-2021-04-01-16-07-28-469.png!
>
> I also took a jstack dump:
> !image-2021-04-01-15-58-18-265.png!
>
> "Source Data Fetcher for Source: HiveSource-app.app_sdl_yinliu_search_query_log (164/250)#0" Id=815 cpuUsage=99.96% deltaTime=201ms time=43270ms RUNNABLE
>     at sun.nio.cs.UTF_8$Decoder.decodeLoop(UTF_8.java:412)
>     at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:579)
>     at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:802)
>     at org.apache.hadoop.io.Text.decode(Text.java:412)
>     at org.apache.hadoop.io.Text.decode(Text.java:389)
>     at org.apache.hadoop.io.Text.toString(Text.java:280)
>     at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveJavaObject(WritableStringObjectInspector.java:46)
>     at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveJavaObject(WritableStringObjectInspector.java:26)
>     at org.apache.flink.table.functions.hive.conversion.HiveInspectors.toFlinkObject(HiveInspectors.java:291)
>     at org.apache.flink.table.functions.hive.conversion.HiveInspectors.toFlinkObject(HiveInspectors.java:338)
>     at org.apache.flink.connectors.hive.read.HiveMapredSplitReader.nextRecord(HiveMapredSplitReader.java:180)
>     at org.apache.flink.connectors.hive.read.HiveBulkFormatAdapter$HiveReader.nextRecord(HiveBulkFormatAdapter.java:336)
>     at org.apache.flink.connectors.hive.read.HiveBulkFormatAdapter$HiveReader.readBatch(HiveBulkFormatAdapter.java:319)
>     at org.apache.flink.connector.file.src.impl.FileSourceSplitReader.fetch(FileSourceSplitReader.java:67)
>     at org.apache.flink.connector.base.source.reader.fetcher.FetchTask.run(FetchTask.java:56)
>     at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:138)
>     at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.run(SplitFetcher.java:101)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>
> "Source Data Fetcher for Source: HiveSource-app.app_sdl_yinliu_search_query_log (165/250)#0" Id=817 cpuUsage=99.96% deltaTime=201ms time=44024ms RUNNABLE
>     at org.apache.hadoop.io.Text.append(Text.java:236)
>     at org.apache.hadoop.hive.ql.io.orc.DynamicByteArray.setText(DynamicByteArray.java:212)
>     at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StringDictionaryTreeReader.next(TreeReaderFactory.java:1724)
>     at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StringTreeReader.next(TreeReaderFactory.java:1397)
>     at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$MapTreeReader.next(TreeReaderFactory.java:2274)
>     at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StructTreeReader.next(TreeReaderFactory.java:2004)
>     at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1046)
>     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:166)
>     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:140)
>     at org.apache.flink.connectors.hive.read.HiveMapredSplitReader.reachedEnd(HiveMapredSplitReader.java:160)
>     at org.apache.flink.connectors.hive.read.HiveBulkFormatAdapter$HiveReader.readBatch(HiveBulkFormatAdapter.java:318)
>     at org.apache.flink.connector.file.src.impl.FileSourceSplitReader.fetch(FileSourceSplitReader.java:67)
>     at org.apache.flink.connector.base.source.reader.fetcher.FetchTask.run(FetchTask.java:56)
>     at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:138)
>     at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.run(SplitFetcher.java:101)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>
> "Source Data Fetcher for Source: HiveSource-app.app_sdl_yinliu_search_query_log (166/250)#0" Id=816 cpuUsage=99.96% deltaTime=201ms time=43490ms RUNNABLE
>     at org.apache.flink.table.data.util.DataFormatConverters$MapConverter.toBinaryMap(DataFormatConverters.java:1279)
>     at org.apache.flink.table.data.util.DataFormatConverters$MapConverter.toInternalImpl(DataFormatConverters.java:1245)
>     at org.apache.flink.table.data.util.DataFormatConverters$MapConverter.toInternalImpl(DataFormatConverters.java:1196)
>     at org.apache.flink.table.data.util.DataFormatConverters$DataFormatConverter.toInternal(DataFormatConverters.java:406)
>     at org.apache.flink.connectors.hive.read.HiveMapredSplitReader.nextRecord(HiveMapredSplitReader.java:185)
>     at org.apache.flink.connectors.hive.read.HiveBulkFormatAdapter$HiveReader.nextRecord(HiveBulkFormatAdapter.java:336)
>     at org.apache.flink.connectors.hive.read.HiveBulkFormatAdapter$HiveReader.readBatch(HiveBulkFormatAdapter.java:319)
>     at org.apache.flink.connector.file.src.impl.FileSourceSplitReader.fetch(FileSourceSplitReader.java:67)
>     at org.apache.flink.connector.base.source.reader.fetcher.FetchTask.run(FetchTask.java:56)
>     at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:138)
>     at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.run(SplitFetcher.java:101)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>
> 1. Each TaskManager is configured with 10 cores and 12 GB of memory.
> 2. Only about 3 cores appear fully used, yet the load average climbs to 100, and adding cores does not help.
> 3. Running 'netstat -anp | wc -l' returns 18.
> 4. !image-2021-04-01-16-08-28-442.png!
>
> My Hive table's storage configuration:
> # Storage Information
> SerDe Library:  org.apache.hadoop.hive.ql.io.orc.OrcSerde
> InputFormat:    org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> OutputFormat:   org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
> Compressed:     No
> Num Buckets:    -1
> Bucket Columns: []
> Sort Columns:   []

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
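As far as I know, Flink 1.12 does not expose a record-level rate limit for the Hive source, so the usual mitigation for a batch read that overloads a machine is to cap the source parallelism instead. A hedged sketch of the relevant SQL client settings (option names taken from the Flink Hive connector configuration; please verify them against your Flink version):

```sql
-- Let Flink infer the Hive source parallelism from the number of splits,
-- but cap it so fewer fetcher threads run concurrently per TaskManager.
SET table.exec.hive.infer-source-parallelism=true;
SET table.exec.hive.infer-source-parallelism.max=50;

-- Alternatively, lower the default parallelism for the whole batch job.
SET table.exec.resource.default-parallelism=50;
```

With 250 source subtasks and 10-core TaskManagers (as in this report), many CPU-bound ORC/UTF-8 decode loops can land on the same host, which matches the observed load average far above the core count.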
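If a true records-per-second cap is required, one generic pattern is a token bucket wrapped around the record-fetch loop. This is a minimal, self-contained sketch in plain Java, not a Flink API; the class name and its placement in a reader loop are purely illustrative:

```java
/**
 * Minimal token-bucket rate limiter. A reader loop would call acquire()
 * before each nextRecord() to cap records read per second. Illustrative
 * sketch only; Flink 1.12 provides no such built-in for the Hive source.
 */
public class TokenBucket {
    private final double ratePerSec; // tokens (records) added per second
    private final double capacity;   // maximum burst size
    private double tokens;
    private long lastRefillNanos;

    public TokenBucket(double ratePerSec, double capacity) {
        this.ratePerSec = ratePerSec;
        this.capacity = capacity;
        this.tokens = capacity;
        this.lastRefillNanos = System.nanoTime();
    }

    /** Add tokens proportional to the elapsed time, up to capacity. */
    private void refill() {
        long now = System.nanoTime();
        tokens = Math.min(capacity, tokens + (now - lastRefillNanos) / 1e9 * ratePerSec);
        lastRefillNanos = now;
    }

    /** Take one token if available; never blocks. */
    public synchronized boolean tryAcquire() {
        refill();
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    /** Block until a token is available, then take it. */
    public void acquire() throws InterruptedException {
        while (!tryAcquire()) {
            Thread.sleep(1);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // At 10 tokens/sec with burst 2, the third immediate acquire fails.
        TokenBucket bucket = new TokenBucket(10.0, 2.0);
        System.out.println(bucket.tryAcquire()); // true  (burst)
        System.out.println(bucket.tryAcquire()); // true  (burst)
        System.out.println(bucket.tryAcquire()); // false (bucket empty)
        bucket.acquire(); // blocks roughly 100 ms until one token refills
        System.out.println("acquired after refill");
    }
}
```

Throttling the fetcher this way trades throughput for a lower load average; it does not address the netty "Connection reset by peer" failure itself, which here looks like a downstream symptom of the overloaded host.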