Hi,

Do you have an exception for the CoGroup failure?

Best, Fabian

2017-07-26 3:32 GMT+02:00 Charith Wickramarachchi <
charith.dhanus...@gmail.com>:

> I did some more digging. It seems the CoGroup operation failed in one of
> the workers. But I do not face this issue when running other tasks.
>
> Thanks,
> Charith
>
> On Tue, Jul 25, 2017 at 2:06 PM, Charith Wickramarachchi <
> charith.dhanus...@gmail.com> wrote:
>
>> Hi All,
>>
>> I'm getting an exception when running a Gelly task using the Pregel model.
>> It complains that the remote task manager might be lost, but the task
>> managers seem to be active based on the Flink dashboard. Also, other tasks
>> run fine without an issue.
>>
>>
>> Following is a summary of the exception trace. I have attached the full
>> trace as well. It would be great if you could provide any directions to
>> identify the issue.
>>
>>
>> Flink version: flink-1.1.3
>> Java: 1.7
>>
>> Caused by: java.io.IOException: Thread 'SortMerger Reading Thread' terminated due to an exception: Connecting the channel failed: Connecting to remote task manager + 'worker/127.0.1.1:44310' has failed. This might indicate that the remote task manager has been lost.
>> at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:800)
>> Caused by: java.io.IOException: Connecting the channel failed: Connecting to remote task manager + 'worker/127.0.1.1:44310' has failed. This might indicate that the remote task manager has been lost.
>> at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.waitForChannel(PartitionRequestClientFactory.java:196)
>> at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.access$000(PartitionRequestClientFactory.java:131)
>> at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory.createPartitionRequestClient(PartitionRequestClientFactory.java:83)
>> at org.apache.flink.runtime.io.network.netty.NettyConnectionManager.createPartitionRequestClient(NettyConnectionManager.java:60)
>> at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.requestSubpartition(RemoteInputChannel.java:118)
>> at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.requestPartitions(SingleInputGate.java:394)
>> at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:413)
>> at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:87)
>> at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:42)
>> at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:59)
>> at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ReadingThread.go(UnilateralSortMerger.java:973)
>> at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:796)
>> Caused by: org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connecting to remote task manager + 'worker/127.0.1.1:44310' has failed. This might indicate that the remote task manager has been lost.
>> at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.operationComplete(PartitionRequestClientFactory.java:219)
>> at org.apache.flink.runtime.io.network.netty.PartitionRequestClientFactory$ConnectingChannel.operationComplete(PartitionRequestClientFactory.java:131)
>>
>> Thanks,
>> Charith
>>
>>
>> --
>> Charith Dhanushka Wickramaarachchi
>>
>> Tel  +1 213 447 4253
>> Blog  http://charith.wickramaarachchi.org/
>> <http://charithwiki.blogspot.com/>
>> Twitter  @charithwiki <https://twitter.com/charithwiki>
>>
>> This communication may contain privileged or other confidential information
>> and is intended exclusively for the addressee/s. If you are not the
>> intended recipient/s, or believe that you may have
>> received this communication in error, please reply to the sender indicating
>> that fact and delete the copy you received and in addition, you should
>> not print, copy, retransmit, disseminate, or otherwise use the
>> information contained in this communication. Internet communications
>> cannot be guaranteed to be timely, secure, error or virus-free. The
>> sender does not accept liability for any errors or omissions
>>
>
