[ 
https://issues.apache.org/jira/browse/HBASE-18116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497509#comment-16497509
 ] 

Xu Cang commented on HBASE-18116:
---------------------------------

The TestGlobalThrottler test itself is buggy.

I can make it fail by changing the quota to 121:

*conf1.setInt(HConstants.REPLICATION_SOURCE_TOTAL_BUFFER_KEY, 121);*

and changing the check to 363 (3 times the quota, because there are 3 peers):

*if (size > 363) {*

Then the test fails. I added some debug logging, and the output below shows the size reaching 480, which exceeds the 363 limit I set:

2018-05-31 20:28:53,023 DEBUG [RpcServer.replication.FPBQ.Fifo.handler=1,queue=0,port=46855] regionserver.ReplicationSink(239): Started replicating mutations.
2018-05-31 20:28:53,027 DEBUG [RpcServer.replication.FPBQ.Fifo.handler=1,queue=0,port=46855] regionserver.ReplicationSink(243): Finished replicating mutations.
2018-05-31 20:28:53,038 INFO [RS_REFRESH_PEER-regionserver/xcang-wsl:0-0.replicationSource,peer1.replicationSource.wal-reader.xcang-wsl%2C39693%2C1527823677629,peer1] regionserver.ReplicationSourceWALReader(387): ~~~~~~~~~!!! acquireBufferQuota size is 120
2018-05-31 20:28:53,038 INFO [RS_REFRESH_PEER-regionserver/xcang-wsl:0-1.replicationSource,peer2.replicationSource.wal-reader.xcang-wsl%2C39693%2C1527823677629,peer2] regionserver.ReplicationSourceWALReader(387): ~~~~~~~~~!!! acquireBufferQuota size is 120
2018-05-31 20:28:53,038 INFO [RS_REFRESH_PEER-regionserver/xcang-wsl:0-0.replicationSource,peer3.replicationSource.wal-reader.xcang-wsl%2C39693%2C1527823677629,peer3] regionserver.ReplicationSourceWALReader(387): ~~~~~~~~~!!! acquireBufferQuota size is 120
2018-05-31 20:28:53,068 INFO [Thread-437] regionserver.TestGlobalThrottler(143): @@@@size :*480*
2018-05-31 20:28:53,068 INFO [Thread-437] regionserver.TestGlobalThrottler(148): @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@size :480
2018-05-31 20:28:53,118 INFO [Thread-437] regionserver.TestGlobalThrottler(143): @@@@size :*480*
2018-05-31 20:28:53,119 INFO [Thread-437] regionserver.TestGlobalThrottler(148): @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@size :480
2018-05-31 20:28:53,131 DEBUG [RpcServer.replication.FPBQ.Fifo.handler=1,queue=0,port=46855] regionserver.ReplicationSink(239): Started replicating mutations.
2018-05-31 20:28:53,134 DEBUG [RpcServer.replication.FPBQ.Fifo.handler=1,queue=0,port=46855] regionserver.ReplicationSink(243): Finished replicating mutations.
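
For illustration only, here is one way a shared quota can end up above "quota times number of peers". The sketch assumes the quota is enforced in an "add first, then check" style, so each reader has already counted an entry by the time it learns the quota is reached. The class and method names are hypothetical and this is not the HBase source; whether this is what happened in the run above is also an assumption. Readers run one after another and nothing is released, to keep the arithmetic easy to follow.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, NOT HBase code: a shared "add, then check" buffer quota.
class GlobalQuotaSketch {
  private final AtomicLong totalBufferUsed = new AtomicLong();
  private final long totalBufferQuota;

  GlobalQuotaSketch(long totalBufferQuota) {
    this.totalBufferQuota = totalBufferQuota;
  }

  // Adds the entry size first, then reports whether the quota is now reached.
  boolean acquire(long size) {
    return totalBufferUsed.addAndGet(size) >= totalBufferQuota;
  }

  long used() {
    return totalBufferUsed.get();
  }

  // Each reader keeps queuing entries until acquire() reports the quota is reached.
  // Nothing is released here (no shipper is modeled).
  static long simulate(long quota, int readers, long entrySize) {
    GlobalQuotaSketch q = new GlobalQuotaSketch(quota);
    for (int r = 0; r < readers; r++) {
      while (!q.acquire(entrySize)) {
        // quota not reached yet, so this reader queues another entry
      }
    }
    return q.used();
  }

  public static void main(String[] args) {
    long used = simulate(121, 3, 120);
    System.out.println("used=" + used + " limit=" + (3 * 121));
  }
}
```

With a quota of 121, three readers and 120-byte entries, the first reader counts two entries (120, then 240) before it sees the quota reached, and the other two count one entry each, so the total under these assumptions is 480, above the 363 bound.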

I will send a patch with the fix soon. I also have another fix in 
ReplicationSourceShipper.java that corrects the batchSize calculation after 
operations are done and deducts the correct size from totalUsedBuffer.
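
To make the accounting requirement concrete: whatever size is added to the shared counter when entries are queued has to be the same size that is deducted once the batch is shipped, otherwise the counter drifts. Below is a minimal sketch of that invariant. The names are hypothetical and this is not the patch itself.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, NOT the actual patch: release exactly what was acquired.
class BufferReleaseSketch {
  private final AtomicLong totalBufferUsed = new AtomicLong();

  // Called when an entry is queued in memory.
  void acquire(long entrySize) {
    totalBufferUsed.addAndGet(entrySize);
  }

  // Called after a batch has been shipped: recompute the batch size from the
  // same per-entry sizes that were acquired, then deduct it.
  void releaseBatch(List<Long> entrySizes) {
    long batchSize = 0;
    for (long size : entrySizes) {
      batchSize += size;
    }
    totalBufferUsed.addAndGet(-batchSize);
  }

  long used() {
    return totalBufferUsed.get();
  }

  // Queue every entry, ship the batch, and return what is still accounted for.
  static long roundTrip(List<Long> entrySizes) {
    BufferReleaseSketch buffer = new BufferReleaseSketch();
    for (long size : entrySizes) {
      buffer.acquire(size);
    }
    buffer.releaseBatch(entrySizes);
    return buffer.used();
  }

  public static void main(String[] args) {
    System.out.println("left over: " + roundTrip(Arrays.asList(120L, 120L, 120L)));
  }
}
```

If the deducted size were computed differently from the acquired size, roundTrip would return a non-zero value, which is the kind of drift the fix is meant to avoid.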

> Replication source in-memory accounting should not include bulk transfer 
> hfiles
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-18116
>                 URL: https://issues.apache.org/jira/browse/HBASE-18116
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>            Reporter: Andrew Purtell
>            Assignee: Xu Cang
>            Priority: Major
>             Fix For: 3.0.0, 2.1.0, 1.5.0
>
>         Attachments: HBASE-18116-branch-1.patch, 
> HBASE-18116.master.001.patch, HBASE-18116.master.002.patch, 
> HBASE-18116.master.003.patch
>
>
> In ReplicationSourceWALReaderThread we maintain a global quota on enqueued 
> replication work for preventing OOM by queuing up too many edits into queues 
> on heap. When calculating the size of a given replication queue entry, if it 
> has associated hfiles (is a bulk load to be replicated as a batch of hfiles), 
> we get the file sizes and include the sum. We then apply that result to the 
> quota. This isn't quite right. Those hfiles will be pulled by the sink as a 
> file copy, not pushed by the source. The cells in those files are not queued 
> in memory at the source and therefore shouldn't be counted against the quota.
> Related, the sum of the hfile sizes is also included when checking whether 
> queued work exceeds the configured replication queue capacity, which is 
> 64 MB by default. HFiles are commonly much larger than this. 
> So when we encounter a bulk load replication entry, typically both the quota 
> and capacity limits are exceeded, we break out of loops, and send right away. 
> What is transferred on the wire via HBase RPC, though, has only a partial 
> relationship to the calculation. 
> Depending on how you look at it, it makes sense to factor hfile file sizes 
> against replication queue capacity limits. The sink will be occupied 
> transferring those files at the HDFS level. Anyway, this is how we have been 
> doing it and it is too late to change now. I do not however think it is 
> correct to apply hfile file sizes against a quota for in memory state on the 
> source. The source doesn't queue or even transfer those bytes. 
> Something I noticed while working on HBASE-18027.
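
As a rough illustration of the distinction drawn in the description above, the sketch below computes two sizes for one queue entry: one for the in-memory quota (cells held on heap only) and one for the queue capacity check (cells plus referenced hfiles). The method names and example numbers are hypothetical and are not taken from the HBase code.

```java
// Hypothetical sketch, NOT HBase code: two different sizes for one queue entry.
class EntrySizeSketch {
  // Default replication queue capacity mentioned in the issue: 64 MB.
  static final long CAPACITY_BYTES = 64L * 1024 * 1024;

  // Counted against the in-memory quota: only bytes actually held on heap.
  static long sizeForQuota(long cellHeapBytes, long[] hfileBytes) {
    return cellHeapBytes;
  }

  // Counted against queue capacity: heap bytes plus the referenced hfile sizes.
  static long sizeForCapacity(long cellHeapBytes, long[] hfileBytes) {
    long total = cellHeapBytes;
    for (long bytes : hfileBytes) {
      total += bytes;
    }
    return total;
  }

  public static void main(String[] args) {
    // Made-up example: a 200-byte bulk load entry referencing one 128 MB hfile.
    long[] hfiles = { 128L * 1024 * 1024 };
    System.out.println("quota size: " + sizeForQuota(200, hfiles));
    System.out.println("over capacity: "
        + (sizeForCapacity(200, hfiles) > CAPACITY_BYTES));
  }
}
```

In this made-up example the entry still trips the capacity limit, so it is sent right away, but it only costs 200 bytes against the in-memory quota.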



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
