[jira] [Comment Edited] (HBASE-14915) Hanging test : org.apache.hadoop.hbase.mapreduce.TestImportExport

Heng Chen (JIRA) Wed, 09 Dec 2015 02:17:28 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-14915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048348#comment-15048348
 ]


Heng Chen edited comment on HBASE-14915 at 12/9/15 10:17 AM:
-------------------------------------------------------------

I found something interesting.
Each failed test in TestImportExport always has a lot of logs like below,  this 
happens when we do import hdfs files into HTable.
{code}
2015-12-05 22:09:23,634 INFO  
[asf907.gq1.ygridcore.net,60842,1449352499962_ChoreService_1] 
regionserver.HRegionServer$PeriodicMemstoreFlusher(1585): 
asf907.gq1.ygridcore.net,60842,1449352499962-MemstoreFlusherChore requesting 
flush of hbase:meta,,1.1588230740 because info has an old edit so flush to free 
WALs after random delay 5699ms
2015-12-05 22:09:24,729 INFO  
[asf907.gq1.ygridcore.net,60842,1449352499962_ChoreService_1] 
regionserver.HRegionServer$PeriodicMemstoreFlusher(1585): 
asf907.gq1.ygridcore.net,60842,1449352499962-MemstoreFlusherChore requesting 
flush of hbase:meta,,1.1588230740 because info has an old edit so flush to free 
WALs after random delay 81344ms
{code}

Relates code in HRegion#shouldFlush
{code:title=HRegion.java}
  boolean shouldFlush(final StringBuffer whyFlush) {
    ....
    for (Store s : getStores()) {
      if (s.timeOfOldestEdit() < now - modifiedFlushCheckInterval) {
        // we have an old enough edit in the memstore, flush
        whyFlush.append(s.toString() + " has an old edit so flush to free 
WALs");
        return true;
      }
    }
    return false;
  }
{code}
As log shows, it seems that PeriodicMemstoreFlusher  send a lot of same 
FlushRequest to MemStoreFlusher,  and the request will store into one queue in 
MemStoreFlusher.
So i guess memory exceeds the limit of container, so we can see log 
{code}
2015-12-05 22:08:56,440 WARN  [ContainersLauncher #5] 
nodemanager.DefaultContainerExecutor(207): Exit code from container 
container_1449352527830_0013_01_000002 is : 143
{code} 

Maybe we should avoid send same flush request or send it periodical.
I still want to figure out the deeper reason why 'there is an old edit in 
memstore',  it seems that our flush failed. 

I also notice some logs after import start. It seems something error with the 
last block in hdfs. But i not sure about it.
{code}
2015-12-02 21:22:15,974 INFO  [IPC Server handler 1 on 44218] 
blockmanagement.BlockManager(2383): BLOCK* addStoredBlock: blockMap updated: 
127.0.0.1:59938 is added to 
blk_1073742196_1372{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
replicas=[ReplicaUnderConstruction[[DISK]DS-083a02af-1593-4938-9651-f812ac2cb91a:NORMAL|RBW]]}
 size 0
2015-12-02 21:22:16,032 INFO  [IPC Server handler 7 on 44218] 
blockmanagement.BlockManager(2383): BLOCK* addStoredBlock: blockMap updated: 
127.0.0.1:59938 is added to 
blk_1073742197_1373{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, 
replicas=[ReplicaUnderConstruction[[DISK]DS-b35bb0ca-6546-4f47-a8cf-239960b3356a:NORMAL|RBW]]}
 size 0
{code}









was (Author: chenheng):
I found something interesting.
Each failed test in TestImportExport always has a lot of logs like below,  this 
happens when we do import hdfs files into HTable.
{code}
2015-12-05 22:09:23,634 INFO  
[asf907.gq1.ygridcore.net,60842,1449352499962_ChoreService_1] 
regionserver.HRegionServer$PeriodicMemstoreFlusher(1585): 
asf907.gq1.ygridcore.net,60842,1449352499962-MemstoreFlusherChore requesting 
flush of hbase:meta,,1.1588230740 because info has an old edit so flush to free 
WALs after random delay 5699ms
2015-12-05 22:09:24,729 INFO  
[asf907.gq1.ygridcore.net,60842,1449352499962_ChoreService_1] 
regionserver.HRegionServer$PeriodicMemstoreFlusher(1585): 
asf907.gq1.ygridcore.net,60842,1449352499962-MemstoreFlusherChore requesting 
flush of hbase:meta,,1.1588230740 because info has an old edit so flush to free 
WALs after random delay 81344ms
{code}

Relates code in HRegion#shouldFlush
{code:title=HRegion.java}
  boolean shouldFlush(final StringBuffer whyFlush) {
    ....
    for (Store s : getStores()) {
      if (s.timeOfOldestEdit() < now - modifiedFlushCheckInterval) {
        // we have an old enough edit in the memstore, flush
        whyFlush.append(s.toString() + " has an old edit so flush to free 
WALs");
        return true;
      }
    }
    return false;
  }
{code}
As log shows, it seems that PeriodicMemstoreFlusher  send a lot of same 
FlushRequest to MemStoreFlusher,  and the request will store into one queue in 
MemStoreFlusher.
So i guess memory exceeds the limit of container, so we can see log 
{code}
2015-12-05 22:08:56,440 WARN  [ContainersLauncher #5] 
nodemanager.DefaultContainerExecutor(207): Exit code from container 
container_1449352527830_0013_01_000002 is : 143
{code} 

Maybe we should avoid send same flush request or send it periodical,  thoughts? 
[~stack]


> Hanging test : org.apache.hadoop.hbase.mapreduce.TestImportExport
> -----------------------------------------------------------------
>
>                 Key: HBASE-14915
>                 URL: https://issues.apache.org/jira/browse/HBASE-14915
>             Project: HBase
>          Issue Type: Sub-task
>          Components: hangingTests
>            Reporter: stack
>         Attachments: HBASE-14915-branch-1.2.patch
>
>
> This test hangs a bunch:
> Here is latest:
> https://builds.apache.org/job/HBase-1.2/418/jdk=latest1.7,label=Hadoop/consoleText



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Comment Edited] (HBASE-14915) Hanging test : org.apache.hadoop.hbase.mapreduce.TestImportExport

Reply via email to