OK, I understand you now, but I think the lines are different, so can you
paste the method (full content instead of a patch) into the email? I will
compile and check.

Mikael.S

On Thu, Feb 16, 2012 at 7:49 PM, Andrew Purtell <apurt...@apache.org> wrote:

> I'm wondering if the removal and re-add of the lease is racy. We used to
> just refresh the lease.
>
> In the patch provided I don't remove the lease and add it back, instead
> just refresh it on the way out. If you apply the patch and the
> LeaseExceptions go away, then we will know this works for you. I've applied
> this patch to our internal build as part of tracking down what might be
> spurious LeaseExceptions. I've been blaming the clients but maybe that is
> wrong.
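
For context, a self-contained toy sketch of the two patterns being discussed.
This is not the HBase Leases class; the map, the sleep, and the method names
are invented purely to illustrate the window between removing a lease and
re-adding it, during which a concurrent caller on the same scanner finds no
lease, versus renewing it in place:

import java.util.concurrent.ConcurrentHashMap;

// Toy model only -- NOT the HBase Leases class.
public class LeaseRaceSketch {

  static final ConcurrentHashMap<String, Long> leases = new ConcurrentHashMap<>();

  // Remove-then-readd: between remove() and put() the lease is absent,
  // so a concurrent caller for the same scanner cannot find it.
  static void nextWithRemoveReadd(String scannerName) throws Exception {
    Long lease = leases.remove(scannerName);
    if (lease == null) {
      throw new Exception("lease '" + scannerName + "' does not exist"); // ~ LeaseException
    }
    try {
      Thread.sleep(10); // simulate scanning work
    } finally {
      leases.put(scannerName, System.currentTimeMillis()); // re-add resets expiration
    }
  }

  // Renew-in-place: the lease never leaves the map; only its timestamp changes.
  static void nextWithRenew(String scannerName) throws Exception {
    if (!leases.containsKey(scannerName)) {
      throw new Exception("lease '" + scannerName + "' does not exist");
    }
    try {
      Thread.sleep(10); // simulate scanning work
    } finally {
      leases.computeIfPresent(scannerName, (k, v) -> System.currentTimeMillis());
    }
  }

  public static void main(String[] args) throws Exception {
    leases.put("scanner-1", System.currentTimeMillis());
    Runnable caller = () -> {
      try {
        nextWithRemoveReadd("scanner-1"); // swap in nextWithRenew to compare
      } catch (Exception e) {
        System.out.println(Thread.currentThread().getName() + ": " + e.getMessage());
      }
    };
    Thread t1 = new Thread(caller, "caller-1");
    Thread t2 = new Thread(caller, "caller-2");
    t1.start();
    t2.start();
    t1.join();
    t2.join();
  }
}

Run as-is, one of the two callers will usually report a missing lease; with
nextWithRenew both calls succeed.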
>
> Best regards,
>
>     - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
>
> ----- Original Message -----
> > From: Mikael Sitruk <mikael.sit...@gmail.com>
> > To: user@hbase.apache.org; Andrew Purtell <apurt...@apache.org>
> > Cc:
> > Sent: Wednesday, February 15, 2012 11:32 PM
> > Subject: Re: LeaseException while extracting data via pig/hbase integration
> >
> > Andy hi
> >
> > Not sure what you mean by "Does something like the below help?" The
> > current code running is pasted below; the line numbers are slightly
> > different from yours. It seems very close to the first file (revision
> > "a") in your extract.
> >
> > Mikael.S
> >
> >   public Result[] next(final long scannerId, int nbRows) throws IOException {
> >     String scannerName = String.valueOf(scannerId);
> >     InternalScanner s = this.scanners.get(scannerName);
> >     if (s == null) throw new UnknownScannerException("Name: " + scannerName);
> >     try {
> >       checkOpen();
> >     } catch (IOException e) {
> >       // If checkOpen failed, server not running or filesystem gone,
> >       // cancel this lease; filesystem is gone or we're closing or something.
> >       try {
> >         this.leases.cancelLease(scannerName);
> >       } catch (LeaseException le) {
> >         LOG.info("Server shutting down and client tried to access missing scanner " +
> >           scannerName);
> >       }
> >       throw e;
> >     }
> >     Leases.Lease lease = null;
> >     try {
> >       // Remove lease while its being processed in server; protects against case
> >       // where processing of request takes > lease expiration time.
> >       lease = this.leases.removeLease(scannerName);
> >       List<Result> results = new ArrayList<Result>(nbRows);
> >       long currentScanResultSize = 0;
> >       List<KeyValue> values = new ArrayList<KeyValue>();
> >       for (int i = 0; i < nbRows
> >           && currentScanResultSize < maxScannerResultSize; i++) {
> >         requestCount.incrementAndGet();
> >         // Collect values to be returned here
> >         boolean moreRows = s.next(values);
> >         if (!values.isEmpty()) {
> >           for (KeyValue kv : values) {
> >             currentScanResultSize += kv.heapSize();
> >           }
> >           results.add(new Result(values));
> >         }
> >         if (!moreRows) {
> >           break;
> >         }
> >         values.clear();
> >       }
> >       // Below is an ugly hack where we cast the InternalScanner to be a
> >       // HRegion.RegionScanner. The alternative is to change InternalScanner
> >       // interface but its used everywhere whereas we just need a bit of info
> >       // from HRegion.RegionScanner, IF its filter if any is done with the scan
> >       // and wants to tell the client to stop the scan. This is done by passing
> >       // a null result.
> >       return ((HRegion.RegionScanner) s).isFilterDone() && results.isEmpty() ? null
> >           : results.toArray(new Result[0]);
> >     } catch (Throwable t) {
> >       if (t instanceof NotServingRegionException) {
> >         this.scanners.remove(scannerName);
> >       }
> >       throw convertThrowableToIOE(cleanup(t));
> >     } finally {
> >       // We're done. On way out readd the above removed lease.  Adding resets
> >       // expiration time on lease.
> >       if (this.scanners.containsKey(scannerName)) {
> >         if (lease != null) this.leases.addLease(lease);
> >       }
> >     }
> >   }
> >
> > On Thu, Feb 16, 2012 at 3:10 AM, Andrew Purtell <apurt...@apache.org>
> > wrote:
> >
> >>  Hmm...
> >>
> >>  Does something like the below help?
> >>
> >>
> >>  diff --git a/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java b/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
> >>  index f9627ed..0cee8e3 100644
> >>  --- a/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
> >>  +++ b/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
> >>  @@ -2137,11 +2137,7 @@ public class HRegionServer implements HRegionInterface, HBaseRPCErrorHandler,
> >>         }
> >>         throw e;
> >>       }
> >>  -    Leases.Lease lease = null;
> >>       try {
> >>  -      // Remove lease while its being processed in server; protects against case
> >>  -      // where processing of request takes > lease expiration time.
> >>  -      lease = this.leases.removeLease(scannerName);
> >>         List<Result> results = new ArrayList<Result>(nbRows);
> >>         long currentScanResultSize = 0;
> >>         List<KeyValue> values = new ArrayList<KeyValue>();
> >>  @@ -2197,10 +2193,9 @@ public class HRegionServer implements HRegionInterface, HBaseRPCErrorHandler,
> >>         }
> >>         throw convertThrowableToIOE(cleanup(t));
> >>       } finally {
> >>  -      // We're done. On way out readd the above removed lease.  Adding resets
> >>  -      // expiration time on lease.
> >>  +      // We're done. On way out reset expiration time on lease.
> >>         if (this.scanners.containsKey(scannerName)) {
> >>  -        if (lease != null) this.leases.addLease(lease);
> >>  +        this.leases.renewLease(scannerName);
> >>         }
> >>       }
> >>     }
> >>
> >>
> >>
> >>  Best regards,
> >>
> >>      - Andy
> >>
> >>  Problems worthy of attack prove their worth by hitting back. - Piet Hein
> >>  (via Tom White)
> >>
> >>
> >>
> >>  ----- Original Message -----
> >>  > From: Jean-Daniel Cryans <jdcry...@apache.org>
> >>  > To: user@hbase.apache.org
> >>  > Cc:
> >>  > Sent: Wednesday, February 15, 2012 10:17 AM
> >>  > Subject: Re: LeaseException while extracting data via pig/hbase integration
> >>  >
> >>  > You would have to grep the lease's id, in your first email it was
> >>  > "-7220618182832784549".
> >>  >
> >>  > About the time it takes to process each row, I meant client (pig) side,
> >>  > not in the RS.
> >>  >
> >>  > J-D
> >>  >
> >>  > On Tue, Feb 14, 2012 at 1:33 PM, Mikael Sitruk <mikael.sit...@gmail.com> wrote:
> >>  >>  Please see answer inline
> >>  >>  Thanks
> >>  >>  Mikael.S
> >>  >>
> >>  >>  On Tue, Feb 14, 2012 at 8:30 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> >>  >>
> >>  >>>  On Tue, Feb 14, 2012 at 2:01 AM, Mikael Sitruk <mikael.sit...@gmail.com> wrote:
> >>  >>>  > Hi,
> >>  >>>  > Well no, I can't figure out what the problem is, but I saw that someone
> >>  >>>  > else had the same problem (see email: "LeaseException despite high
> >>  >>>  > hbase.regionserver.lease.period").
> >>  >>>  > What I can tell is the following:
> >>  >>>  > Last week the problem was consistent.
> >>  >>>  > 1. I updated hbase.regionserver.lease.period=300000 (5 mins) and restarted
> >>  >>>  > the cluster, and still got the problem; the map got this exception even
> >>  >>>  > before the 5 mins (some after 1 min and 20 sec).
> >>  >>>
> >>  >>>  That's extremely suspicious. Are you sure the setting is getting picked
> >>  >>>  up? :) I hope so :-)
> >>  >>>
> >>  >>>  You should be able to tell when the lease really expires by simply
> >>  >>>  grepping for the number in the region server log, it should give you a
> >>  >>>  good idea of what your lease period is.
> >>  >>>  Grepping for which value? The lease configured here, 300000? It does not
> >>  >>>  return anything; I also tried in the current execution where some were OK
> >>  >>>  and some were not.
> >>  >>>
> >>  >>>  > 2. The problem occurs only on jobs that extract a large number of
> >>  >>>  > columns (>150 cols per row)
> >>  >>>
> >>  >>>  What's your scanner caching set to? Are you spending a lot of time
> >>  >>>  processing each row? From the job configuration generated by Pig I can see
> >>  >>>  caching set to 1. Regarding the processing time of each row, I have no clue
> >>  >>>  how much time it takes; the data for each row is 150 columns of 2k each.
> >>  >>>  This is approximately 5 blocks to bring.
> >>  >>>
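
As a side note on the caching question just above: with the plain HBase Java
client, scanner caching can be raised either on the Scan object or through the
client configuration. A minimal sketch, assuming a hypothetical table "mytable"
with a family "cf" (whether and how Pig's HBaseStorage exposes this knob
depends on the Pig version, so treat that part as an assumption):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class CachingSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Default caching for scanners created from this configuration.
    conf.setInt("hbase.client.scanner.caching", 100);

    HTable table = new HTable(conf, "mytable"); // hypothetical table name
    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("cf"));        // hypothetical column family
    scan.setCaching(100);                       // rows fetched per next() RPC instead of 1

    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        // process one row
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}

The trade-off: higher caching means fewer next() RPCs, but also longer gaps
between them, so if per-row processing on the client side is slow, a large
caching value makes it easier to run past the lease period.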
> >>  >>>  > 3. The problem never occurred when only 1 map per server is running (I
> >>  >>>  > have 8 CPUs with hyper-threading enabled = 16, so using only 1 map per
> >>  >>>  > machine is just a waste). At this stage I was thinking perhaps there is
> >>  >>>  > a multi-threading problem.
> >>  >>>
> >>  >>>  More mappers would pull more data from the region servers, so more
> >>  >>>  concurrency from the disks; using more mappers might just slow you
> >>  >>>  down enough that you hit the issue.
> >>  >>>
> >>  >>  Today I ran with 8 mappers and some failed and some didn't (2 of 4); they
> >>  >>  got the lease exception after 5 mins. I will try to check the
> >>  >>  logs/sar/metric files for additional info.
> >>  >>
> >>  >>>
> >>  >>>  >
> >>  >>>  > This week i got a sightly different behavior, after
> > having
> >>  > restarted the
> >>  >>>  > servers. The extract were able to ran ok in most of the
> > runs even
> >>  > with 4
> >>  >>>  > maps running (per servers), i got only once the
> > exception but the
> >>  > job was
> >>  >>>  > not killed as other runs last week
> >>  >>>
> >>  >>>  If the client got an UnknownScannerException before the
> > timeout
> >>  >>>  expires (the client also keeps track of it, although it may
> > have a
> >>  >>>  different configuration), it will recreate the scanner.
> >>  >>>
> >>  >>  No this is not the case.
> >>  >>
> >>  >>>
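
On the point just above about the client keeping its own copy of the timeout:
a minimal sketch of checking and aligning the client-side value with the
region servers, using the property name already discussed in this thread (the
60000 ms fallback and whether the Pig job actually picks this configuration up
are assumptions here):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class LeasePeriodCheck {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();

    // What the client side currently sees for the scanner lease period.
    // If this differs from the region servers' hbase-site.xml, client and
    // server will time the scanner out on different schedules.
    long leasePeriod = conf.getLong("hbase.regionserver.lease.period", 60000L);
    System.out.println("client hbase.regionserver.lease.period = " + leasePeriod + " ms");

    // Align the client/job configuration with the 300000 ms used on the servers.
    conf.setLong("hbase.regionserver.lease.period", 300000L);
  }
}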
> >>  >>>  Which reminds me, are your regions moving around? If so, and your
> >>  >>>  clients don't know about the high timeout, then they might let the
> >>  >>>  exception pass on to your own code.
> >>  >>>
> >>  >>  Regions are presplit ahead of time; I do not have any region splits during
> >>  >>  the run. The region size is set to 8GB, and the store file is around 3.5G.
> >>  >>
> >>  >>  The test was run after a major compaction, so the number of store files is
> >>  >>  1 per RS/family.
> >>  >>
> >>  >>
> >>  >>>
> >>  >>>  J-D
> >>  >>>
> >>  >>
> >>  >>
> >>  >>
> >>  >>  --
> >>  >>  Mikael.S
> >>  >
> >>
> >
> >
> >
> > --
> > Mikael.S
> >
>



-- 
Mikael.S
