Hey Mayuresh!
Can you try setting the Scan cache value to a good number (like a few
hundred or more)?
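
If it helps, a sketch of the client-side knob in hbase-site.xml
(assuming this release honors hbase.client.scanner.caching; per-scan,
Scan.setCaching(int) does the same thing):

  <property>
    <name>hbase.client.scanner.caching</name>
    <value>500</value>
  </property>

Fetching a few hundred rows per next() call instead of the default of
one can speed the server-side scan considerably.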

Cheers,
Himanshu


> On Mon, Sep 26, 2011 at 8:41 AM, Mayuresh <[email protected]> 
> wrote:
>>
>> Guys, any guidance? I'm not getting any clue as to how to solve this problem.
>>
>> On Mon, Sep 26, 2011 at 3:52 PM, Mayuresh <[email protected]> 
>> wrote:
>> > I increased the leases in hbase-site.xml to around 50 minutes:
>> >
>> >  <property>
>> >    <name>hbase.regionserver.lease.period</name>
>> >    <!--value>60000</value-->
>> >    <value>3000000</value>
>> >    <description>HRegion server lease period in milliseconds. Default is
>> >    60 seconds. Clients must report in within this period else they are
>> >    considered dead.</description>
>> >  </property>
>> >
>> > However I still fail with the same error:
>> >
>> > 2011-09-26 15:50:28,857 WARN org.apache.hadoop.ipc.HBaseServer:
>> > (responseTooSlow):
>> > {"processingtimems":118696,"call":"execCoprocessor([B@5b0e6f59,
>> > getAvg(org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter@10b06ac3,
>> > {\"timeRange\":[0,9223372036854775807],\"batch\":-1,\"startRow\":\"\",\"stopRow\":\"\",\"totalColumns\":0,\"cacheBlocks\":true,\"families\":{\"data\":[]},\"maxVersions\":1,\"caching\":-1}))","client":"137.72.240.180:54843","starttimems":1317032310155,"queuetimems":9,"class":"HRegionServer","responsesize":0,"method":"execCoprocessor"}
>> > 2011-09-26 15:50:28,872 WARN org.apache.hadoop.ipc.HBaseServer: IPC
>> > Server Responder, call execCoprocessor([B@5b0e6f59,
>> > getAvg(org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter@10b06ac3,
>> > {"timeRange":[0,9223372036854775807],"batch":-1,"startRow":"","stopRow":"","totalColumns":0,"cacheBlocks":true,"families":{"data":[]},"maxVersions":1,"caching":-1}))
>> > from 137.72.240.180:54843: output error
>> > 2011-09-26 15:50:28,873 WARN org.apache.hadoop.ipc.HBaseServer: IPC
>> > Server handler 4 on 60020 caught:
>> > java.nio.channels.ClosedChannelException
>> >        at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
>> >        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
>> >        at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1501)
>> >        at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:876)
>> >        at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:955)
>> >        at org.apache.hadoop.hbase.ipc.HBaseServer$Call.sendResponseIfReady(HBaseServer.java:390)
>> >        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1240)
>> >
>> > What do "responseTooSlow" and "output error" mean? Any clues to
>> > find what's causing this?
>> >
>> > Thanks.
>> >
>> >
>> > On Mon, Sep 26, 2011 at 2:26 PM, Mayuresh <[email protected]> 
>> > wrote:
>> >> Hi Ted,
>> >>
>> >> Yes I am aware that this isn't a good setup. I am working on
>> >> understanding the coprocessors before I can use a bigger setup.
>> >>
>> >> I am talking about the presentation at
>> >> https://hbase.s3.amazonaws.com/hbase/HBase-CP-HUG10.pdf by Andrew
>> >> Purtell, slide 5:
>> >>
>> >> <quote>
>> >> For long running jobs the client must periodically poll status to keep it
>> >> alive; jobs without interest will be cancelled
>> >> </quote>
>> >>
>> >>
>> >> On Mon, Sep 26, 2011 at 2:08 PM, Ted Yu <[email protected]> wrote:
>> >>> Looks like my response to user@ got bounced.
>> >>>
>> >>> Himanshu is continuing the work on HBASE-3607
>> >>>
>> >>>>> Currently all the regions are on a single region server.
>> >>> I don't think the above is a good setup.
>> >>>
>> >>>>> It was mentioned to poll the job status periodically to avoid timeouts.
>> >>> Can you tell us more about the presentation ? Normally you can include 
>> >>> the
>> >>> author in the cc list of this email chain.
>> >>>
>> >>> Cheers
>> >>>
>> >>> On Mon, Sep 26, 2011 at 12:44 AM, Mayuresh 
>> >>> <[email protected]>
>> >>> wrote:
>> >>>>
>> >>>> Yes the computation takes longer than 60 secs. Will setting lease
>> >>>> timeout in hbase-site.xml help?
>> >>>>
>> >>>> In one of the coprocessor presentations, it was mentioned to poll the
>> >>>> job status periodically to avoid timeouts. Any idea how to do that?
>> >>>>
>> >>>> On Sep 23, 2011 9:59 PM, "Stack" <[email protected]> wrote:
>> >>>> > Is it doing a long GC? Is the aggregation taking longer than the
>> >>>> > client 60 second timeout (You are setting lease below but I don't
>> >>>> > think that will have an effect since it's server-side)?
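>> >>>> >
>> >>>> > (A sketch of raising that client-side timeout in the client's
>> >>>> > hbase-site.xml, assuming this release reads hbase.rpc.timeout;
>> >>>> > verify the property name against your version:)
>> >>>> >
>> >>>> >  <property>
>> >>>> >    <name>hbase.rpc.timeout</name>
>> >>>> >    <value>3000000</value>
>> >>>> >  </property>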
>> >>>> >
>> >>>> > St.Ack
>> >>>> >
>> >>>> >
>> >>>> > On Fri, Sep 23, 2011 at 2:52 AM, Mayuresh
>> >>>> > <[email protected]> wrote:
>> >>>> >> Hi,
>> >>>> >>
>> >>>> >> I am using the AggregationClient to perform aggregate calculation on 
>> >>>> >> a
>> >>>> >> 10 M row hbase table which is spread across around 27 regions.
>> >>>> >> Currently all the regions are on a single region server.
>> >>>> >>
>> >>>> >> Here is briefly what the code is doing:
>> >>>> >>         Leases leases = new Leases(40 * 60 * 1000, 5 * 1000);
>> >>>> >>         leases.start();
>> >>>> >>         long startTime = System.currentTimeMillis();
>> >>>> >>         System.out.println("Start time: " + startTime);
>> >>>> >>         Configuration conf = new Configuration(true);
>> >>>> >>         AggregationClient aClient = new AggregationClient(conf);
>> >>>> >>
>> >>>> >>         Scan scan = new Scan();
>> >>>> >>         scan.addColumn(TEST_FAMILY, TEST_QUALIFIER);
>> >>>> >>         final ColumnInterpreter<Long, Long> ci = new LongColumnInterpreter();
>> >>>> >>
>> >>>> >>         double avg = aClient.avg(TEST_TABLE, ci, scan);
>> >>>> >>         leases.close();
>> >>>> >>         long endTime = System.currentTimeMillis();
>> >>>> >>         System.out.println("End time: " + endTime);
>> >>>> >>         System.out.println("Avg is: " + avg);
>> >>>> >>         System.out.println("Time taken is: " + (endTime - startTime));
>> >>>> >>
>> >>>> >> However it is failing after some time with the following log messages
>> >>>> >> in the region server log:
>> >>>> >> 2011-09-23 14:21:25,649 DEBUG
>> >>>> >> org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for
>> >>>> >> region
>> >>>> >> Test,serv3-1315575650141,1316682342728.b3fe66994b0f2fb0643c88f6897474a1.
>> >>>> >> 2011-09-23 14:21:25,650 DEBUG
>> >>>> >> org.apache.hadoop.hbase.io.hfile.HFileReaderV2: On close of file
>> >>>> >> 15761056997240226 evicted 344 block(s)
>> >>>> >> 2011-09-23 14:21:25,650 DEBUG
>> >>>> >> org.apache.hadoop.hbase.regionserver.Store: closed data
>> >>>> >> 2011-09-23 14:21:25,650 INFO
>> >>>> >> org.apache.hadoop.hbase.regionserver.HRegion: Closed
>> >>>> >>
>> >>>> >> Test,serv3-1315575650141,1316682342728.b3fe66994b0f2fb0643c88f6897474a1.
>> >>>> >> 2011-09-23 14:21:25,650 DEBUG
>> >>>> >> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler:
>> >>>> >> Closed region
>> >>>> >> Test,serv3-1315575650141,1316682342728.b3fe66994b0f2fb0643c88f6897474a1.
>> >>>> >> 2011-09-23 14:21:25,651 WARN org.apache.hadoop.ipc.HBaseServer: IPC
>> >>>> >> Server Responder, call execCoprocessor([B@6b6ab732,
>> >>>> >>
>> >>>> >> getAvg(org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter@2b216ab6,
>> >>>> >>
>> >>>> >> {"timeRange":[0,9223372036854775807],"batch":-1,"startRow":"serv1-1315571539741","stopRow":"serv5-1315575834916","totalColumns":0,"cacheBlocks":true,"families":{"data":[]},"maxVersions":1,"caching":-1}))
>> >>>> >> from 137.72.240.177:47563: output error
>> >>>> >> 2011-09-23 14:21:25,651 WARN org.apache.hadoop.ipc.HBaseServer: IPC
>> >>>> >> Server handler 9 on 60020 caught:
>> >>>> >> java.nio.channels.ClosedChannelException
>> >>>> >>         at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
>> >>>> >>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
>> >>>> >>         at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1501)
>> >>>> >>         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:876)
>> >>>> >>         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:955)
>> >>>> >>         at org.apache.hadoop.hbase.ipc.HBaseServer$Call.sendResponseIfReady(HBaseServer.java:390)
>> >>>> >>         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1240)
>> >>>> >>
>> >>>> >> And the following in the zookeeper log:
>> >>>> >> 2011-09-23 14:19:50,978 INFO
>> >>>> >> org.apache.zookeeper.server.NIOServerCnxn: Established session
>> >>>> >> 0x1329577cad70004 with negotiated timeout 180000 for client
>> >>>> >> /137.72.240.177:45431
>> >>>> >> 2011-09-23 14:21:11,028 WARN
>> >>>> >> org.apache.zookeeper.server.NIOServerCnxn: EndOfStreamException:
>> >>>> >> Unable to read additional data from client sessionid
>> >>>> >> 0x1329577cad70004, likely client has closed socket
>> >>>> >> 2011-09-23 14:21:11,029 INFO
>> >>>> >> org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection
>> >>>> >> for client /137.72.240.177:45431 which had sessionid 
>> >>>> >> 0x1329577cad70004
>> >>>> >> 2011-09-23 14:24:42,000 INFO
>> >>>> >> org.apache.zookeeper.server.ZooKeeperServer: Expiring session
>> >>>> >> 0x1329577cad70004, timeout of 180000ms exceeded
>> >>>> >> 2011-09-23 14:24:42,001 INFO
>> >>>> >> org.apache.zookeeper.server.PrepRequestProcessor: Processed session
>> >>>> >> termination for sessionid: 0x1329577cad70004
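>> >>>> >>
>> >>>> >> (The 180000 ms figure above is the negotiated ZooKeeper session
>> >>>> >> timeout. If the client really must block longer than that, a
>> >>>> >> sketch of raising it in hbase-site.xml, assuming
>> >>>> >> zookeeper.session.timeout applies in this release, and noting
>> >>>> >> that the ZooKeeper server caps what it will actually grant:)
>> >>>> >>
>> >>>> >>  <property>
>> >>>> >>    <name>zookeeper.session.timeout</name>
>> >>>> >>    <value>1200000</value>
>> >>>> >>  </property>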
>> >>>> >>
>> >>>> >> Any ideas on how to resolve this?
>> >>>> >>
>> >>>> >> Thanks in advance.
>> >>>> >>
>> >>>> >> Regards,
>> >>>> >> Mayuresh
>> >>>> >>
>> >>>
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> -Mayuresh
>> >>
>> >
>> >
>> >
>> > --
>> > -Mayuresh
>> >
>>
>>
>>
>> --
>> -Mayuresh
>
