Is 1000 a good default?
On Wed, Feb 17, 2016 at 9:42 AM, Nick Dimiduk <ndimi...@gmail.com> wrote:

Thanks for the context, Arun.

For what it's worth, I greatly increased the batch size (from the default 1,000 to 500,000), which I believe reduced contention on the lock and allowed ingest to catch up.
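A minimal standalone-JDBC sketch of the batch-size change described above. The phoenix.mutate.batchSize property name, the JDBC URL, the table, and the row counts are illustrative assumptions taken from the standard Phoenix client tuning knobs; the MapReduce path (PhoenixRecordWriter) reads its batch size from the job configuration, so verify the exact key against your Phoenix version.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.util.Properties;

    public class LargerBatchIngest {
        public static void main(String[] args) throws Exception {
            long batchSize = 500_000L;

            Properties props = new Properties();
            // Assumed client property controlling how many mutations Phoenix
            // groups per server round trip on commit (default is 1,000).
            props.setProperty("phoenix.mutate.batchSize", Long.toString(batchSize));

            try (Connection conn = DriverManager.getConnection(
                    "jdbc:phoenix:zk-host:2181", props)) {
                conn.setAutoCommit(false);
                try (PreparedStatement ps = conn.prepareStatement(
                        "UPSERT INTO MY_SCHEMA.MY_TABLE (ID, VAL) VALUES (?, ?)")) {
                    for (int i = 0; i < 1_000_000; i++) {
                        ps.setInt(1, i);
                        ps.setString(2, "value-" + i);
                        ps.executeUpdate();
                        // Commit every batchSize rows, roughly what
                        // PhoenixRecordWriter does for a MapReduce load.
                        if ((i + 1) % batchSize == 0) {
                            conn.commit();
                        }
                    }
                }
                conn.commit(); // flush the final partial batch
            }
        }
    }

The same commit cadence is what triggers the per-batch metadata lookups discussed below, so fewer, larger batches means fewer trips through doGetTable().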
On Tue, Feb 16, 2016 at 9:14 PM, Thangamani, Arun <arun.thangam...@cdk.com> wrote:

Sorry, I had pressed Control + Enter a little earlier than I wanted to; corrections inline. Thanks.

From: "Thangamani, Arun" <arun.thangam...@cdk.com>
Reply-To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Date: Tuesday, February 16, 2016 at 8:38 PM
To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Cc: James Taylor <jamestay...@apache.org>, Lars Hofhansl <la...@apache.org>
Subject: Re: Write path blocked by MetaDataEndpoint acquiring region lock

Hey Nick,

It looks like you are failing to find your table in the metadata cache. When the table is not found in the metadata cache, we end up rebuilding the metadata from both the SYSTEM.CATALOG and SYSTEM.STATS tables; the rebuild is a scan on both tables.

So we end up going to ZooKeeper to find the region and then executing the scan on that region. It is an expensive operation, which explains the calls to one specific region server and to ZooKeeper.

A similar thing happens in PHOENIX-2607, but that case is specifically related to timestamps: if the client timestamp is less than or equal to the stats timestamp of the table, we end up repeating the rebuild above for every batch of inserts from PhoenixMapReduceUtil (the default batch size is 1,000 rows). I don't believe you have a timestamp issue, but it looks like you have the same repeated lookups and the same rebuild scans that happen in PHOENIX-2607.

James has suggested a workaround for the metadata cache refresh using the UPDATE_CACHE_FREQUENCY property when defining the table, and it will probably help (we are trying it out for the timestamp issue).

Hope this helps.

Thanks
Arun

Please look at the following lines in MetaDataEndpointImpl (doGetTable):

private PTable doGetTable(byte[] key, long clientTimeStamp, RowLock rowLock) throws IOException, SQLException {
    ImmutableBytesPtr cacheKey = new ImmutableBytesPtr(key);
    Cache<ImmutableBytesPtr, PMetaDataEntity> metaDataCache =
            GlobalCache.getInstance(this.env).getMetaDataCache();
    PTable table = (PTable) metaDataCache.getIfPresent(cacheKey);
    // We only cache the latest, so we'll end up building the table with every call if the
    // client connection has specified an SCN.
    // TODO: If we indicate to the client that we're returning an older version, but there's a
    // newer version available, the client can safely not call this, since we only allow
    // modifications to the latest.
    if (table != null && table.getTimeStamp() < clientTimeStamp) {
        // Table on client is up-to-date with table on server, so just return
        if (isTableDeleted(table)) {
            return null;
        }
        return table;
    }
    // Ask Lars about the expense of this call - if we don't take the lock, we still won't get
    // partial results
    // get the co-processor environment
    // TODO: check that key is within region.getStartKey() and region.getEndKey()
    // and return special code to force client to lookup region from meta.
    Region region = env.getRegion();
    /*
     * Lock directly on key, though it may be an index table. This will just prevent a table
     * from getting rebuilt too often.
     */
    final boolean wasLocked = (rowLock != null);
    if (!wasLocked) {
        rowLock = region.getRowLock(key, true);
        if (rowLock == null) {
            throw new IOException("Failed to acquire lock on " + Bytes.toStringBinary(key));
        }
    }
    try {
        // Try cache again in case we were waiting on a lock
        table = (PTable) metaDataCache.getIfPresent(cacheKey);
        // We only cache the latest, so we'll end up building the table with every call if the
        // client connection has specified an SCN.
        // TODO: If we indicate to the client that we're returning an older version, but there's
        // a newer version available, the client can safely not call this, since we only allow
        // modifications to the latest.
        if (table != null && table.getTimeStamp() < clientTimeStamp) {
            // Table on client is up-to-date with table on server, so just return
            if (isTableDeleted(table)) {
                return null;
            }
            return table;
        }
        // Query for the latest table first, since it's not cached
        table = buildTable(key, cacheKey, region, HConstants.LATEST_TIMESTAMP);
        if (table != null && table.getTimeStamp() < clientTimeStamp) {
            return table;
        }
        // Otherwise, query for an older version of the table - it won't be cached
        return buildTable(key, cacheKey, region, clientTimeStamp);
    } finally {
        if (!wasLocked) rowLock.release();
    }
}
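A minimal sketch of the UPDATE_CACHE_FREQUENCY workaround mentioned above, issued over JDBC. The table name, ZooKeeper quorum, and 15-minute interval are illustrative, and the property may require a Phoenix release newer than the 4.6.0 discussed in this thread, so treat it as a sketch rather than a drop-in fix.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class UpdateCacheFrequencyWorkaround {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
                 Statement stmt = conn.createStatement()) {
                // Allow clients to reuse cached table metadata for up to 15 minutes
                // (value is milliseconds) instead of resolving the table from
                // SYSTEM.CATALOG on every commit or query.
                stmt.execute("ALTER TABLE MY_SCHEMA.MY_TABLE SET UPDATE_CACHE_FREQUENCY = 900000");
            }
        }
    }

The same property can also be set in the CREATE TABLE statement; either way it trades metadata freshness for far fewer getTable() calls against the SYSTEM.CATALOG region.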
From: Nick Dimiduk <ndimi...@gmail.com>
Reply-To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Date: Tuesday, February 16, 2016 at 5:58 PM
To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Cc: James Taylor <jamestay...@apache.org>, Lars Hofhansl <la...@apache.org>
Subject: Write path blocked by MetaDataEndpoint acquiring region lock

Hello,

I have a high-throughput ingest pipeline that has seized up. My ingest application ultimately crashes with the following stack trace [0].

Independently, I noticed that the RPC call time on one of the machines was significantly higher than on the others (95th percentile at multiple seconds vs. tens of milliseconds). I grabbed the RS log and a couple of jstacks from the process. In the logs I see handler threads creating ZK connections excessively (~50 INFO lines per second). The jstacks show handler threads parked while taking region row locks, calling HRegion.getRowLockInternal() via MetaDataEndpointImpl.doGetTable() [1]. The one handler thread I see that is in the same MetaDataEndpointImpl area but not under the lock appears to be making an RPC to read the statistics table [2].

I believe these two occurrences are related.

My working theory is that the metaDataCache object is performing poorly for some reason. This results in excessive metadata lookups, some of which appear to require making an RPC call while holding the lock.

What can I do to relieve pressure on this row lock? Looking at the code around the lock, it looks like it's populating a connection-level cache of schema metadata. The host with the high RPC call time is hosting the SYSTEM.CATALOG table region. I see some configuration settings that may be related (e.g., phoenix.coprocessor.maxMetaDataCacheSize), but I see no way to get debug information about cache size or evictions from this Guava cache instance.

I'll be digging into this further, but I appreciate any pointers you may have.

BTW, this is Phoenix 4.6.0 + HBase 1.1.2.

Thanks a lot,
-n
[0]: client-side stack

Caused by: org.apache.phoenix.exception.PhoenixIOException: Interrupted calling coprocessor service org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService for row \x00<schema>\x00<user_table>
    at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:108)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1053)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1016)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.getTable(ConnectionQueryServicesImpl.java:1289)
    at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:446)
    at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:389)
    at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:385)
    at org.apache.phoenix.execute.MutationState.validate(MutationState.java:369)
    at org.apache.phoenix.execute.MutationState.commit(MutationState.java:417)
    at org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:482)
    at org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:479)
    at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
    at org.apache.phoenix.jdbc.PhoenixConnection.commit(PhoenixConnection.java:479)
    at org.apache.phoenix.mapreduce.PhoenixRecordWriter.write(PhoenixRecordWriter.java:84)

[1]: rs handlers blocked stack

"B.defaultRpcServer.handler=48,queue=3,port=16020" #91 daemon prio=5 os_prio=0 tid=0x00007ff4ce458000 nid=0xca0a waiting on condition [0x00007ff47a607000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x0000000708ef59c0> (a java.util.concurrent.CountDownLatch$Sync)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
    at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
    at org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5047)
    at org.apache.hadoop.hbase.regionserver.HRegion.getRowLock(HRegion.java:5013)
    at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:2397)
    at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:2365)
    at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:440)
    at org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:11609)
    at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7435)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1875)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1857)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32209)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
    at java.lang.Thread.run(Thread.java:745)

[2]: rs handler stats rpc stack

"B.defaultRpcServer.handler=19,queue=4,port=16020" #62 daemon prio=5 os_prio=0 tid=0x00007ff4ce420000 nid=0xc9ea in Object.wait() [0x00007ff47c323000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:460)
    at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
    at org.apache.hadoop.hbase.client.ResultBoundedCompletionService.poll(ResultBoundedCompletionService.java:155)
    - locked <0x00000007a3f5e030> (a [Lorg.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture;)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:168)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:59)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
    at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
    at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:403)
    at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:364)
    at org.apache.phoenix.schema.stats.StatisticsUtil.readStatistics(StatisticsUtil.java:99)
    at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:836)
    at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.buildTable(MetaDataEndpointImpl.java:472)
    at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:2418)
    at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:2365)
    at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:440)
    at org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:11609)
    at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7435)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1875)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1857)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32209)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
    at java.lang.Thread.run(Thread.java:745)
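On the point Nick raises above about having no visibility into the coprocessor's Guava cache: a generic sketch of what recordStats()/stats() expose on a Guava cache. This is not how Phoenix builds its server-side metaDataCache (surfacing these numbers there would need a code or logging change); it only illustrates the kind of debug information being asked for, and the keys and sizes below are made up.

    import com.google.common.cache.Cache;
    import com.google.common.cache.CacheBuilder;
    import com.google.common.cache.CacheStats;

    public class CacheVisibilitySketch {
        public static void main(String[] args) {
            // A standalone Guava cache built with stats recording enabled.
            Cache<String, String> cache = CacheBuilder.newBuilder()
                    .maximumSize(1_000)   // stand-in for the byte-based limit behind
                                          // phoenix.coprocessor.maxMetaDataCacheSize
                    .recordStats()        // must be enabled at build time to collect stats
                    .build();

            cache.put("SYSTEM.CATALOG", "ptable-bytes");
            cache.getIfPresent("SYSTEM.CATALOG");       // hit
            cache.getIfPresent("MY_SCHEMA.MY_TABLE");   // miss

            CacheStats stats = cache.stats();
            System.out.printf("size=%d hitRate=%.2f evictions=%d%n",
                    cache.size(), stats.hitRate(), stats.evictionCount());
        }
    }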
--
Best regards,
- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)