Hey Mayuresh! Can you try setting the Scan cache value to a good number (a few hundred or more)?
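[The scan cache suggestion above matters because the scanner caching of that era defaulted to one row per next() round trip (the "caching":-1 in the logged scan JSON means "use the default"). A rough, self-contained sketch of the round-trip arithmetic; the 10 M row count comes from later in the thread, and treating each cached batch as exactly one round trip is a simplification:]

```java
public class ScanRpcEstimate {
    // Ceiling division: number of next() round trips needed to pull
    // `rows` rows when the scanner returns `caching` rows per trip.
    static long roundTrips(long rows, long caching) {
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 10_000_000L;  // table size reported in the thread
        System.out.println("caching=1:   " + roundTrips(rows, 1) + " round trips");
        System.out.println("caching=500: " + roundTrips(rows, 500) + " round trips");
    }
}
```

[In client code the per-scan knob is Scan.setCaching(int); hbase.client.scanner.caching sets the default cluster-wide. Higher values trade client memory for fewer round trips, which also means fewer chances for a slow trip to hit a timeout.]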
Cheers,
Himanshu

On Mon, Sep 26, 2011 at 8:41 AM, Mayuresh <[email protected]> wrote:
> Guys, any guidance? I am not getting any clue as to how to solve this
> problem.
>
> On Mon, Sep 26, 2011 at 3:52 PM, Mayuresh <[email protected]> wrote:
>> I increased the lease in hbase-site.xml to around 50 minutes:
>>
>> <property>
>>   <name>hbase.regionserver.lease.period</name>
>>   <!--value>60000</value-->
>>   <value>3000000</value>
>>   <description>HRegion server lease period in milliseconds. Default is
>>   60 seconds. Clients must report in within this period else they are
>>   considered dead.</description>
>> </property>
>>
>> However, I still fail with the same error:
>>
>> 2011-09-26 15:50:28,857 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":118696,"call":"execCoprocessor([B@5b0e6f59, getAvg(org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter@10b06ac3, {\"timeRange\":[0,9223372036854775807],\"batch\":-1,\"startRow\":\"\",\"stopRow\":\"\",\"totalColumns\":0,\"cacheBlocks\":true,\"families\":{\"data\":[]},\"maxVersions\":1,\"caching\":-1}))","client":"137.72.240.180:54843","starttimems":1317032310155,"queuetimems":9,"class":"HRegionServer","responsesize":0,"method":"execCoprocessor"}
>> 2011-09-26 15:50:28,872 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call execCoprocessor([B@5b0e6f59, getAvg(org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter@10b06ac3, {"timeRange":[0,9223372036854775807],"batch":-1,"startRow":"","stopRow":"","totalColumns":0,"cacheBlocks":true,"families":{"data":[]},"maxVersions":1,"caching":-1})) from 137.72.240.180:54843: output error
>> 2011-09-26 15:50:28,873 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 60020 caught: java.nio.channels.ClosedChannelException
>>         at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
>>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
>>         at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1501)
>>         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:876)
>>         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:955)
>>         at org.apache.hadoop.hbase.ipc.HBaseServer$Call.sendResponseIfReady(HBaseServer.java:390)
>>         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1240)
>>
>> What do the responseTooSlow and "output error" messages mean? Any clues
>> to find what's causing this?
>>
>> Thanks.
>>
>> On Mon, Sep 26, 2011 at 2:26 PM, Mayuresh <[email protected]> wrote:
>>> Hi Ted,
>>>
>>> Yes, I am aware that this isn't a good setup. I am working on
>>> understanding the coprocessors before I can use a bigger setup.
>>>
>>> I am talking about the presentation at
>>> https://hbase.s3.amazonaws.com/hbase/HBase-CP-HUG10.pdf by Andrew
>>> Purtell, slide 5:
>>>
>>> <quote>
>>> For long running jobs the client must periodically poll status to keep
>>> it alive; jobs without interest will be cancelled
>>> </quote>
>>>
>>> On Mon, Sep 26, 2011 at 2:08 PM, Ted Yu <[email protected]> wrote:
>>>> Looks like my response to user@ got bounced.
>>>>
>>>> Himanshu is continuing the work on HBASE-3607.
>>>>
>>>>> Currently all the regions are on a single region server.
>>>>
>>>> I don't think the above is a good setup.
>>>>
>>>>> It was mentioned to poll the job status periodically to avoid timeouts.
>>>>
>>>> Can you tell us more about the presentation? Normally you can include
>>>> the author in the cc list of this email chain.
>>>>
>>>> Cheers
>>>>
>>>> On Mon, Sep 26, 2011 at 12:44 AM, Mayuresh <[email protected]> wrote:
>>>>> Yes, the computation takes longer than 60 secs. Will setting the lease
>>>>> timeout in hbase-site.xml help?
>>>>>
>>>>> In one of the coprocessor presentations it was mentioned to poll the
>>>>> job status periodically to avoid timeouts. Any idea how to do that?
>>>>>
>>>>> On Sep 23, 2011 9:59 PM, "Stack" <[email protected]> wrote:
>>>>>> Is it doing a long GC? Is the aggregation taking longer than the
>>>>>> client 60 second timeout (you are setting the lease below, but I
>>>>>> don't think that will have an effect since it's server-side)?
>>>>>>
>>>>>> St.Ack
>>>>>>
>>>>>> On Fri, Sep 23, 2011 at 2:52 AM, Mayuresh <[email protected]> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am using the AggregationClient to perform an aggregate calculation
>>>>>>> on a 10 M row HBase table which is spread across around 27 regions.
>>>>>>> Currently all the regions are on a single region server.
>>>>>>>
>>>>>>> Here is briefly what the code is doing:
>>>>>>>
>>>>>>> Leases leases = new Leases(40 * 60 * 1000, 5 * 1000);
>>>>>>> leases.start();
>>>>>>> long startTime = System.currentTimeMillis();
>>>>>>> System.out.println("Start time: " + startTime);
>>>>>>> Configuration conf = new Configuration(true);
>>>>>>> AggregationClient aClient = new AggregationClient(conf);
>>>>>>>
>>>>>>> Scan scan = new Scan();
>>>>>>> scan.addColumn(TEST_FAMILY, TEST_QUALIFIER);
>>>>>>> final ColumnInterpreter<Long, Long> ci = new LongColumnInterpreter();
>>>>>>>
>>>>>>> double avg = aClient.avg(TEST_TABLE, ci, scan);
>>>>>>> leases.close();
>>>>>>> long endTime = System.currentTimeMillis();
>>>>>>> System.out.println("End time: " + endTime);
>>>>>>> System.out.println("Avg is: " + avg);
>>>>>>> System.out.println("Time taken is: " + (endTime - startTime));
>>>>>>>
>>>>>>> However, it is failing after some time with the following log
>>>>>>> messages in the region server log:
>>>>>>>
>>>>>>> 2011-09-23 14:21:25,649 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for region Test,serv3-1315575650141,1316682342728.b3fe66994b0f2fb0643c88f6897474a1.
>>>>>>> 2011-09-23 14:21:25,650 DEBUG org.apache.hadoop.hbase.io.hfile.HFileReaderV2: On close of file 15761056997240226 evicted 344 block(s)
>>>>>>> 2011-09-23 14:21:25,650 DEBUG org.apache.hadoop.hbase.regionserver.Store: closed data
>>>>>>> 2011-09-23 14:21:25,650 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed Test,serv3-1315575650141,1316682342728.b3fe66994b0f2fb0643c88f6897474a1.
>>>>>>> 2011-09-23 14:21:25,650 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closed region Test,serv3-1315575650141,1316682342728.b3fe66994b0f2fb0643c88f6897474a1.
>>>>>>> 2011-09-23 14:21:25,651 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call execCoprocessor([B@6b6ab732, getAvg(org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter@2b216ab6, {"timeRange":[0,9223372036854775807],"batch":-1,"startRow":"serv1-1315571539741","stopRow":"serv5-1315575834916","totalColumns":0,"cacheBlocks":true,"families":{"data":[]},"maxVersions":1,"caching":-1})) from 137.72.240.177:47563: output error
>>>>>>> 2011-09-23 14:21:25,651 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60020 caught: java.nio.channels.ClosedChannelException
>>>>>>>         at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
>>>>>>>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
>>>>>>>         at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1501)
>>>>>>>         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:876)
>>>>>>>         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:955)
>>>>>>>         at org.apache.hadoop.hbase.ipc.HBaseServer$Call.sendResponseIfReady(HBaseServer.java:390)
>>>>>>>         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1240)
>>>>>>>
>>>>>>> And the following in the zookeeper log:
>>>>>>>
>>>>>>> 2011-09-23 14:19:50,978 INFO org.apache.zookeeper.server.NIOServerCnxn: Established session 0x1329577cad70004 with negotiated timeout 180000 for client /137.72.240.177:45431
>>>>>>> 2011-09-23 14:21:11,028 WARN org.apache.zookeeper.server.NIOServerCnxn: EndOfStreamException: Unable to read additional data from client sessionid 0x1329577cad70004, likely client has closed socket
>>>>>>> 2011-09-23 14:21:11,029 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /137.72.240.177:45431 which had sessionid 0x1329577cad70004
>>>>>>> 2011-09-23 14:24:42,000 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x1329577cad70004, timeout of 180000ms exceeded
>>>>>>> 2011-09-23 14:24:42,001 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x1329577cad70004
>>>>>>>
>>>>>>> Any ideas on how to resolve this?
>>>>>>>
>>>>>>> Thanks in advance.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Mayuresh
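
[The hbase.regionserver.lease.period description quoted earlier in the thread ("clients must report in within this period else they are considered dead") is the same contract behind the polling advice in Andrew Purtell's slides. A toy, deterministic sketch of that contract; the class and method names are illustrative, not HBase's actual Leases API:]

```java
// Toy model of a server-side lease: it expires unless the holder
// renews ("reports in") within the lease period. Timestamps are
// passed explicitly so the behavior is deterministic.
public class LeaseSketch {
    private final long leasePeriodMs;
    private long lastRenewedMs;

    LeaseSketch(long leasePeriodMs, long nowMs) {
        this.leasePeriodMs = leasePeriodMs;
        this.lastRenewedMs = nowMs;
    }

    void renew(long nowMs) { lastRenewedMs = nowMs; }

    boolean expired(long nowMs) {
        return nowMs - lastRenewedMs > leasePeriodMs;
    }

    public static void main(String[] args) {
        LeaseSketch lease = new LeaseSketch(60_000, 0);  // the 60 s default
        System.out.println("at 59s, no renewal yet:  " + lease.expired(59_000));
        lease.renew(59_000);                             // client polls in time
        System.out.println("at 118s, after renewal:  " + lease.expired(118_000));
        System.out.println("at 120s, idle since 59s: " + lease.expired(120_000));
    }
}
```

[The sketch shows why raising the lease period alone did not help here: a client that never reports in during a two-minute server-side computation still looks dead once any applicable timeout elapses, so periodic renewal, not just a bigger constant, is the intended fix.]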
