OK, I understand you now, but I think the lines are different. Can you paste the full content of the method (instead of a patch) into the email? I will compile and check.
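In the meantime, here is my reading of the two places your patch touches, applied to the version I pasted earlier (quoted below). This is just my reconstruction from your diff, so please correct me if the surrounding lines in your tree differ:

    // Top of the try block: the lease is no longer pulled out up front, so the
    // 'Leases.Lease lease = null;' declaration and the
    // 'lease = this.leases.removeLease(scannerName);' call are gone.
    try {
      List<Result> results = new ArrayList<Result>(nbRows);
      long currentScanResultSize = 0;
      List<KeyValue> values = new ArrayList<KeyValue>();
      // ... scan loop and return unchanged ...
    } catch (Throwable t) {
      // ... unchanged ...
    } finally {
      // We're done. On way out reset expiration time on lease.
      if (this.scanners.containsKey(scannerName)) {
        this.leases.renewLease(scannerName);
      }
    }

If that matches what you have, I will build with it and rerun the failing pig job.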
Mikael.S

On Thu, Feb 16, 2012 at 7:49 PM, Andrew Purtell <apurt...@apache.org> wrote:

> I'm wondering if the removal and re-add of the lease is racy. We used to
> just refresh the lease.
>
> In the patch provided I don't remove the lease and add it back, instead
> just refresh it on the way out. If you apply the patch and the
> LeaseExceptions go away, then we will know this works for you. I've applied
> this patch to our internal build as part of tracking down what might be
> spurious LeaseExceptions. I've been blaming the clients but maybe that is
> wrong.
>
> Best regards,
>
> - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
>
> ----- Original Message -----
> > From: Mikael Sitruk <mikael.sit...@gmail.com>
> > To: user@hbase.apache.org; Andrew Purtell <apurt...@apache.org>
> > Cc:
> > Sent: Wednesday, February 15, 2012 11:32 PM
> > Subject: Re: LeaseException while extracting data via pig/hbase integration
> >
> > Andy hi
> >
> > Not sure what you mean by "Does something like the below help?" The current
> > code running is pasted below; line numbers are slightly different than yours.
> > It seems very close to the first file (revision "a") in your extract.
> >
> > Mikael.S
> >
> >   public Result[] next(final long scannerId, int nbRows) throws IOException {
> >     String scannerName = String.valueOf(scannerId);
> >     InternalScanner s = this.scanners.get(scannerName);
> >     if (s == null) throw new UnknownScannerException("Name: " + scannerName);
> >     try {
> >       checkOpen();
> >     } catch (IOException e) {
> >       // If checkOpen failed, server not running or filesystem gone,
> >       // cancel this lease; filesystem is gone or we're closing or something.
> >       try {
> >         this.leases.cancelLease(scannerName);
> >       } catch (LeaseException le) {
> >         LOG.info("Server shutting down and client tried to access missing scanner " +
> >           scannerName);
> >       }
> >       throw e;
> >     }
> >     Leases.Lease lease = null;
> >     try {
> >       // Remove lease while its being processed in server; protects against case
> >       // where processing of request takes > lease expiration time.
> >       lease = this.leases.removeLease(scannerName);
> >       List<Result> results = new ArrayList<Result>(nbRows);
> >       long currentScanResultSize = 0;
> >       List<KeyValue> values = new ArrayList<KeyValue>();
> >       for (int i = 0; i < nbRows
> >           && currentScanResultSize < maxScannerResultSize; i++) {
> >         requestCount.incrementAndGet();
> >         // Collect values to be returned here
> >         boolean moreRows = s.next(values);
> >         if (!values.isEmpty()) {
> >           for (KeyValue kv : values) {
> >             currentScanResultSize += kv.heapSize();
> >           }
> >           results.add(new Result(values));
> >         }
> >         if (!moreRows) {
> >           break;
> >         }
> >         values.clear();
> >       }
> >       // Below is an ugly hack where we cast the InternalScanner to be a
> >       // HRegion.RegionScanner. The alternative is to change InternalScanner
> >       // interface but its used everywhere whereas we just need a bit of info
> >       // from HRegion.RegionScanner, IF its filter if any is done with the scan
> >       // and wants to tell the client to stop the scan. This is done by passing
> >       // a null result.
> >       return ((HRegion.RegionScanner) s).isFilterDone() && results.isEmpty() ? null
> >           : results.toArray(new Result[0]);
> >     } catch (Throwable t) {
> >       if (t instanceof NotServingRegionException) {
> >         this.scanners.remove(scannerName);
> >       }
> >       throw convertThrowableToIOE(cleanup(t));
> >     } finally {
> >       // We're done. On way out readd the above removed lease. Adding resets
> >       // expiration time on lease.
> >       if (this.scanners.containsKey(scannerName)) {
> >         if (lease != null) this.leases.addLease(lease);
> >       }
> >     }
> >   }
> >
> > On Thu, Feb 16, 2012 at 3:10 AM, Andrew Purtell <apurt...@apache.org> wrote:
> >
> > > Hmm...
> > >
> > > Does something like the below help?
> > >
> > >
> > > diff --git a/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java b/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
> > > index f9627ed..0cee8e3 100644
> > > --- a/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
> > > +++ b/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
> > > @@ -2137,11 +2137,7 @@ public class HRegionServer implements HRegionInterface, HBaseRPCErrorHandler,
> > >        }
> > >        throw e;
> > >      }
> > > -    Leases.Lease lease = null;
> > >      try {
> > > -      // Remove lease while its being processed in server; protects against case
> > > -      // where processing of request takes > lease expiration time.
> > > -      lease = this.leases.removeLease(scannerName);
> > >        List<Result> results = new ArrayList<Result>(nbRows);
> > >        long currentScanResultSize = 0;
> > >        List<KeyValue> values = new ArrayList<KeyValue>();
> > > @@ -2197,10 +2193,9 @@ public class HRegionServer implements HRegionInterface, HBaseRPCErrorHandler,
> > >        }
> > >        throw convertThrowableToIOE(cleanup(t));
> > >      } finally {
> > > -      // We're done. On way out readd the above removed lease. Adding resets
> > > -      // expiration time on lease.
> > > +      // We're done. On way out reset expiration time on lease.
> > >        if (this.scanners.containsKey(scannerName)) {
> > > -        if (lease != null) this.leases.addLease(lease);
> > > +        this.leases.renewLease(scannerName);
> > >        }
> > >      }
> > >    }
> > >
> > >
> > > Best regards,
> > >
> > > - Andy
> > >
> > > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > > (via Tom White)
> > >
> > >
> > > ----- Original Message -----
> > > > From: Jean-Daniel Cryans <jdcry...@apache.org>
> > > > To: user@hbase.apache.org
> > > > Cc:
> > > > Sent: Wednesday, February 15, 2012 10:17 AM
> > > > Subject: Re: LeaseException while extracting data via pig/hbase integration
> > > >
> > > > You would have to grep the lease's id, in your first email it was
> > > > "-7220618182832784549".
> > > >
> > > > About the time it takes to process each row, I meant client (pig) side
> > > > not in the RS.
> > > >
> > > > J-D
> > > >
> > > > On Tue, Feb 14, 2012 at 1:33 PM, Mikael Sitruk <mikael.sit...@gmail.com> wrote:
> > > > > Please see answer inline
> > > > > Thanks
> > > > > Mikael.S
> > > > >
> > > > > On Tue, Feb 14, 2012 at 8:30 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> > > > >
> > > > > > On Tue, Feb 14, 2012 at 2:01 AM, Mikael Sitruk <mikael.sit...@gmail.com> wrote:
> > > > > > > hi,
> > > > > > > Well no, I can't figure out what the problem is, but I saw that someone
> > > > > > > else had the same problem (see the email "LeaseException despite high
> > > > > > > hbase.regionserver.lease.period").
> > > > > > > What I can tell is the following:
> > > > > > > Last week the problem was consistent.
> > > > > > > 1. I updated hbase.regionserver.lease.period=300000 (5 mins), restarted the
> > > > > > > cluster and still got the problem; the map got this exception even before
> > > > > > > the 5 mins (some after 1 min and 20 sec).
> > > > > >
> > > > > > That's extremely suspicious. Are you sure the setting is getting picked up?
> > > > > > :) I hope so :-)
> > > > > >
> > > > > > You should be able to tell when the lease really expires by simply
> > > > > > grepping for the number in the region server log, it should give you a
> > > > > > good idea of what your lease period is.
> > > > > > Grepping on which value? The lease configured here: 300000? It does not
> > > > > > return anything; also tried in a current execution where some maps were ok
> > > > > > and some were not.
> > > > > >
> > > > > > > 2. The problem occurs only on jobs that extract a large number of
> > > > > > > columns (>150 cols per row)
> > > > > >
> > > > > > What's your scanner caching set to? Are you spending a lot of time
> > > > > > processing each row?
> > > > > > From the job configuration generated by pig I can see caching set to 1;
> > > > > > regarding the processing time of each row I have no clue how much time it
> > > > > > spent. The data for each row is 150 columns of 2k each. This is approx 5
> > > > > > blocks to bring.
> > > > > >
> > > > > > > 3. The problem never occurred when only 1 map per server is running (I have
> > > > > > > 8 CPUs with hyper-threading enabled = 16, so using only 1 map per machine is
> > > > > > > just a waste), (at this stage I was thinking perhaps there is a
> > > > > > > multi-threaded problem)
> > > > > >
> > > > > > More mappers would pull more data from the region servers so more
> > > > > > concurrency from the disks, using more mappers might just slow you
> > > > > > down enough that you hit the issue.
> > > > > >
> > > > > Today I ran with 8 mappers and some failed and some didn't (2 of 4); they
> > > > > got the lease exception after 5 mins. I will try to check the
> > > > > logs/sar/metric files for additional info.
> > > > >
> > > > > > > This week I got a slightly different behavior, after having restarted the
> > > > > > > servers. The extracts were able to run ok in most of the runs even with 4
> > > > > > > maps running (per server); I got the exception only once, and the job was
> > > > > > > not killed as in the other runs last week.
> > > > > >
> > > > > > If the client got an UnknownScannerException before the timeout
> > > > > > expires (the client also keeps track of it, although it may have a
> > > > > > different configuration), it will recreate the scanner.
> > > > > >
> > > > > No, this is not the case.
> > > > >
> > > > > > Which reminds me, are your regions moving around? If so, and your
> > > > > > clients don't know about the high timeout, then they might let the
> > > > > > exception pass on to your own code.
> > > > > >
> > > > > Regions are pre-split ahead of time; I do not have any region split during
> > > > > the run. Region size is set to 8GB, the store file is around 3.5G.
> > > > >
> > > > > The test was run after major compaction, so the number of store files is 1
> > > > > per RS/family.
> > > > >
> > > > > > J-D
> > > > > >
> > > > >
> > > > > --
> > > > > Mikael.S
> > > >
> >
> > --
> > Mikael.S
>


--
Mikael.S