Hi, sorry for the late reply. That was just part of my hbase-site.xml. Below is the full content:

************************************hbase/conf/hbase.site.xml**************************
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://broker.xxx-xxx.local:9000/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>broker.xxx-xxx.local</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>zookeeper.znode.parent</name>
    <value>/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>broker.xxx-xxx.local</value>
  </property>
  <property>
    <name>hbase.regionserver.wal.codec</name>
    <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
  </property>
  <property>
    <name>phoenix.transactions.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>data.tx.snapshot.dir</name>
    <value>/tmp/tephra/snapshots</value>
  </property>
  <property>
    <name>data.tx.timeout</name>
    <value>120</value>
  </property>
  <property>
    <name>phoenix.query.timeoutMs</name>
    <value>2800000</value>
  </property>
  <property>
    <name>hbase.regionserver.lease.period</name>
    <value>2200000</value>
  </property>
  <property>
    <name>hbase.rpc.timeout</name>
    <value>2200000</value>
  </property>
  <property>
    <name>hbase.client.scanner.caching</name>
    <value>2000</value>
  </property>
  <property>
    <name>hbase.client.scanner.timeout.period</name>
    <value>2200000</value>
  </property>
</configuration>

*************************client: phoenix/bin/hbase-site.xml*************************
<configuration>
  <property>
    <name>hbase.regionserver.wal.codec</name>
    <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
  </property>
  <property>
    <name>phoenix.transactions.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>data.tx.snapshot.dir</name>
    <value>/tmp/tephra/snapshots</value>
  </property>
  <property>
    <name>data.tx.timeout</name>
    <value>120</value>
  </property>
  <property>
    <name>phoenix.query.timeoutMs</name>
    <value>2800000</value>
  </property>
  <property>
    <name>hbase.regionserver.lease.period</name>
    <value>2200000</value>
  </property>
  <property>
    <name>hbase.rpc.timeout</name>
    <value>2200000</value>
  </property>
  <property>
    <name>hbase.client.scanner.caching</name>
    <value>2000</value>
  </property>
  <property>
    <name>hbase.client.scanner.timeout.period</name>
    <value>2200000</value>
  </property>
</configuration>

On 2018/04/09 18:01:14, Josh Elser <els...@apache.org> wrote:
> The hbase-site.xml elements you shared earlier, were those your entire
> hbase-site contents or just part of it?
>
> Make sure you have the required properties set as described on
> https://phoenix.apache.org/secondary_indexing.html for your indexes. If
> you're still seeing problems, you may need to increase the number of
> handlers you configured HBase to use.
>
> While in the stuck state, you may benefit from getting a thread-dump or
> two from the client and your regionserver(s). This would help in
> figuring out exactly where things are stuck (like the DEBUG logs would do).
>
> On 4/9/18 1:30 PM, ivany...@gmail.com wrote:
> > Thanks for your suggestion. I found something interesting, though I'm not
> > sure whether it is a potential cause: the indexes created on my tables.
> > I created a lot of indexes. After I removed all of them, things seem to
> > have gotten better (no more hanging like that).
> > So I suspect there is some incompatibility or other issue in the way
> > I set up my cluster.
> >
> > Something special I used to create the table:
> > )c.DATA_BLOCK_ENCODING='FAST_DIFF', SALT_BUCKETS=3,
> > COMPRESSION='GZ',TRANSACTIONAL=true ;
> > and some indexes I created like this:
> > CREATE INDEX testing_IDX_2 ON xxx.xxx (field1, field2) INCLUDE (field3,
> > field4)
> >
> > On 2018/04/09 17:04:03, Josh Elser <els...@apache.org> wrote:
> >> Have you looked at DEBUG logging client and server(HBase) side?
> >>
> >> The "Call exception" log messages imply that the client is repeatedly
> >> trying to issue an RPC to a RegionServer and failing. This should be
> >> where you focus your attention. It may be something trivial to fix
> >> related to configuration/security setup.
> >>
> >> On 4/8/18 2:04 AM, ivany...@gmail.com wrote:
> >>> Hi, I ran into the tricky problem below.
> >>> Situation:
> >>> I successfully did an upsert into multiple tables with transactions
> >>> enabled (and there are many indexes created on these tables).
> >>> Problem:
> >>> After the first upsert completed successfully, I tried to run the same
> >>> upsert a 2nd, 3rd, ... time. Sometimes the 2nd works, then the 3rd
> >>> upsert gets a timeout exception. At that point the whole of Phoenix
> >>> seems to hang and keeps retrying. I tried stopping the whole HBase
> >>> cluster, including the Phoenix Query Server and Tephra, and restarting
> >>> it, but when I then try to connect with sqlline.py, it hangs again.
> >>>
> >>> hbase-site.xml setting:
> >>> <property>
> >>>   <name>hbase.regionserver.wal.codec</name>
> >>>   <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
> >>> </property>
> >>> <property>
> >>>   <name>phoenix.transactions.enabled</name>
> >>>   <value>true</value>
> >>> </property>
> >>> <property>
> >>>   <name>data.tx.snapshot.dir</name>
> >>>   <value>/tmp/tephra/snapshots</value>
> >>> </property>
> >>> <property>
> >>>   <name>data.tx.timeout</name>
> >>>   <value>120</value>
> >>> </property>
> >>> <property>
> >>>   <name>phoenix.query.timeoutMs</name>
> >>>   <value>1800000</value>
> >>> </property>
> >>> <property>
> >>>   <name>hbase.regionserver.lease.period</name>
> >>>   <value>1200000</value>
> >>> </property>
> >>> <property>
> >>>   <name>hbase.rpc.timeout</name>
> >>>   <value>1200000</value>
> >>> </property>
> >>> <property>
> >>>   <name>hbase.client.scanner.caching</name>
> >>>   <value>1000</value>
> >>> </property>
> >>> <property>
> >>>   <name>hbase.client.scanner.timeout.period</name>
> >>>   <value>1200000</value>
> >>> </property>
> >>>
> >>> Below is some queryserver log:
> >>> 18/04/08 05:47:12 INFO zookeeper.ZooKeeper: Initiating client connection,
> >>> connectString=xxxx.xxxx.local:2181 sessionTimeout=90000
> >>> watcher=org.apache.tephra.zookeeper.TephraZKClientService$5@6700104f
> >>> 18/04/08 05:47:12 INFO zookeeper.ClientCnxn: Opening socket connection to
> >>> server xxxx.xxxx.local/127.0.0.1:2181.
> >>> Will not attempt to authenticate
> >>> using SASL (unknown error)
> >>> 18/04/08 05:47:12 INFO zookeeper.ClientCnxn: Socket connection
> >>> established to xxxx.xxxx.local/127.0.0.1:2181, initiating session
> >>> 18/04/08 05:47:12 INFO zookeeper.ClientCnxn: Session establishment
> >>> complete on server xxxx.xxxx.local/127.0.0.1:2181, sessionid =
> >>> 0x162a3c72c9c0012, negotiated timeout = 90000
> >>> 18/04/08 05:57:39 INFO client.RpcRetryingCaller: Call exception,
> >>> tries=10, retries=35, started=38310 ms ago, cancelled=false, msg=row
> >>> 'SYSTEM.CATALOG,xxxLOAD_*N**_DIM,99999999999999' on table 'hbase:meta' at
> >>> region=hbase:meta,,1.1588230740,
> >>> hostname=xxxx.xxxx.local,16201,1523166165622, seqNum=0
> >>> 18/04/08 05:57:49 INFO client.RpcRetryingCaller: Call exception,
> >>> tries=11, retries=35, started=48335 ms ago, cancelled=false, msg=row
> >>> 'SYSTEM.CATALOG,xxxxxLOAD_*N**_DIM,99999999999999' on table 'hbase:meta'
> >>> at region=hbase:meta,,1.1588230740,
> >>> hostname=xxx.xxx.local,16201,1523166165622, seqNum=0
> >>>
> >>
>
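
Since this thread quotes both a server-side and a client-side hbase-site.xml, one quick way to compare them is a small script like the one below. It is only a sketch, not something posted in the thread: the helper names `load_properties` and `diff_properties` and the two sample XML strings are made up for illustration, and it only reports which property names are absent on the client or carry a different value. It does not know which properties Phoenix actually requires.

```python
import xml.etree.ElementTree as ET


def load_properties(xml_text):
    """Parse Hadoop-style <configuration> XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def diff_properties(server, client):
    """Return (names missing on the client, {name: (server, client)} mismatches)."""
    missing = sorted(set(server) - set(client))
    mismatched = {
        name: (server[name], client[name])
        for name in server
        if name in client and server[name] != client[name]
    }
    return missing, mismatched


if __name__ == "__main__":
    # Hypothetical, shortened samples; in practice read the two real files.
    server_xml = """<configuration>
      <property><name>hbase.cluster.distributed</name><value>true</value></property>
      <property><name>hbase.rpc.timeout</name><value>2200000</value></property>
    </configuration>"""
    client_xml = """<configuration>
      <property><name>hbase.rpc.timeout</name><value>2200000</value></property>
    </configuration>"""

    missing, mismatched = diff_properties(
        load_properties(server_xml), load_properties(client_xml)
    )
    print("only on server:", missing)
    print("different values:", mismatched)
```

A property that exists only on the server is not necessarily a problem, since some settings matter on one side only. The output is a starting point for checking the two files against the Phoenix documentation.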