If it's a Hector thing, you may have better luck on the Hector user group.

http://groups.google.com/group/hector-users

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 10/03/2012, at 8:33 AM, Daning Wang wrote:

> Thanks Maciej. We use the default value for retryDownedHostsDelayInSeconds. I 
> don't think the problem is how often it checks the downed host; I suspect the 
> HostRetryService itself is down. Below is the very first exception. What does 
> this message mean: "HConnectionManager returned a null client after aquisition 
> - are we shutting down?"
> 
> 
> 
> 2012-03-08 16:37:15,103 [pool-2-thread-34288]   Cassandra client acquisition interrupted
> java.lang.InterruptedException
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(Unknown Source)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
>         at java.util.concurrent.ArrayBlockingQueue.poll(Unknown Source)
>         at me.prettyprint.cassandra.connection.ConcurrentHClientPool.waitForConnection(ConcurrentHClientPool.java:117)
>         at me.prettyprint.cassandra.connection.ConcurrentHClientPool.borrowClient(ConcurrentHClientPool.java:77)
>         at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:226)
>         at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:97)
>         at me.prettyprint.cassandra.model.CqlQuery.execute(CqlQuery.java:93)
>         at com.netseer.cassandra.cache.dao.CacheReader.getRows(CacheReader.java:267)
>         at com.netseer.cassandra.cache.dao.CacheReader.getCache0(CacheReader.java:55)
>         at com.netseer.cassandra.cache.dao.CacheDao.getCaches(CacheDao.java:85)
>         at com.netseer.cassandra.cache.dao.CacheDao.getCache(CacheDao.java:71)
>         at com.netseer.cassandra.cache.dao.CacheDao.getCache(CacheDao.java:149)
>         at com.netseer.cassandra.cache.service.CacheServiceImpl.getCache(CacheServiceImpl.java:55)
>         at com.netseer.cassandra.cache.service.CacheServiceImpl.getCache(CacheServiceImpl.java:28)
>         at com.netseer.dsat.cache.CassandraDSATCacheImpl.get(CassandraDSATCacheImpl.java:62)
>         at com.netseer.dsat.cache.CassandraDSATCacheImpl.getTimedValue(CassandraDSATCacheImpl.java:144)
>         at com.netseer.dsat.serving.GenericCacheManager$4.call(GenericCacheManager.java:427)
>         at com.netseer.dsat.serving.GenericCacheManager$4.call(GenericCacheManager.java:1)
>         at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
>         at java.util.concurrent.FutureTask.run(Unknown Source)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>         at java.lang.Thread.run(Unknown Source)
> 2012-03-08 16:37:15,104 [pool-2-thread-34288]   Failed getting remote cache for key=Key String = 'http://www.my-banners.com', long key = 5630311119483252185, keyType = 'PATH'
> me.prettyprint.hector.api.exceptions.HectorException: HConnectionManager returned a null client after aquisition - are we shutting down?
>         at me.prettyprint.cassandra.connection.ConcurrentHClientPool.borrowClient(ConcurrentHClientPool.java:83)
>         at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:226)
>         at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:97)
>         at me.prettyprint.cassandra.model.CqlQuery.execute(CqlQuery.java:93)
> 
> 
> 
> On Mon, Mar 5, 2012 at 10:56 PM, Maciej Miklas <mac.mik...@googlemail.com> 
> wrote:
> Have you tried changing 
> me.prettyprint.cassandra.service.CassandraHostConfigurator#retryDownedHostsDelayInSeconds?
> 
> Hector pings downed hosts at that interval (in seconds) and recovers the 
> connection once a host responds again.
> 
> Regards,
> Maciej
> 
> 
> On Mon, Mar 5, 2012 at 8:13 PM, Daning Wang <dan...@netseer.com> wrote:
> I recently got this error in a few clients: "All host pools marked down. Retry 
> burden pushed out to client." The clients could not recover, and we had to 
> restart the client application. We are using Hector 0.8.0.3.
> 
> At that time we were running a compaction for a CF; it took several hours and 
> the server was busy. But I think the client should recover once the server 
> load goes down.
> 
> Has any bug been reported about this? I searched but could not find one.
> 
> Thanks,
> 
> Daning 
> 
> 
> 
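
On the question of what the message means: the two traces in the thread look like cause and effect. The first shows the calling thread being interrupted while it waits in ArrayBlockingQueue.poll inside waitForConnection; the second shows borrowClient failing a few lines later because no client came back. The sketch below reproduces that pattern with plain java.util.concurrent. The class InterruptedBorrowDemo, its methods, and its exception text are hypothetical stand-ins, not Hector's source, so treat it as an illustration of the mechanism rather than a diagnosis.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a client pool; this is NOT Hector's code.
final class InterruptedBorrowDemo {

    // An empty queue models a pool whose clients are all checked out.
    private final BlockingQueue<String> available = new ArrayBlockingQueue<>(1);

    // Waits for a free client. Returns null if the waiting thread is interrupted.
    String waitForConnection(long timeoutMillis) {
        try {
            return available.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            System.out.println("client acquisition interrupted");
            Thread.currentThread().interrupt(); // keep the interrupt status visible
            return null;
        }
    }

    // Fails loudly when the wait produced no client.
    String borrowClient(long timeoutMillis) {
        String client = waitForConnection(timeoutMillis);
        if (client == null) {
            throw new IllegalStateException("null client after acquisition");
        }
        return client;
    }

    // Starts a worker that borrows from the empty pool, interrupts it,
    // and returns the message of the exception the worker saw.
    static String runInterruptedBorrow() {
        InterruptedBorrowDemo pool = new InterruptedBorrowDemo();
        String[] failure = new String[1];
        Thread worker = new Thread(() -> {
            try {
                pool.borrowClient(60_000);
            } catch (IllegalStateException e) {
                failure[0] = e.getMessage();
            }
        });
        worker.start();
        worker.interrupt(); // what a cancelled task or shutdownNow() would do
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return failure[0];
    }

    public static void main(String[] args) {
        System.out.println(runInterruptedBorrow());
    }
}
```

If this is what happened, the interrupt came from outside the pool. The trace runs under FutureTask and ThreadPoolExecutor, and both Future.cancel(true) and ExecutorService.shutdownNow() interrupt the worker thread, so it may be worth checking whether the application cancels or times out these cache lookups before concluding that the HostRetryService has stopped.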
