If you have a single row that approaches, then exceeds, the size of a
region, eventually that row will end up as a single region, with the
region containing only that one row.

The reason HBase and Bigtable exist is the overhead that HDFS
carries: every file in HDFS consumes an amount of namenode RAM that is
independent of the file's size. So the more small files you have, the
more RAM you use, and you run out of namenode scalability. HBase
exists to store smaller values, at the cost of some overhead of its
own. Thus once you start putting in larger values, you might as well
avoid that overhead and go straight to/from HDFS.
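Ryan's namenode point can be sketched with a back-of-the-envelope
calculation. The 150-bytes-per-object figure below is a commonly cited
rule of thumb for namenode heap usage, not an exact number:

```python
# Rough sketch of why many small files strain the HDFS namenode.
# Assumption: each file and each block costs roughly 150 bytes of
# namenode heap (an approximate rule of thumb, not an exact figure).

BYTES_PER_OBJECT = 150  # assumed per-object namenode heap cost

def namenode_heap_gb(num_files, blocks_per_file=1):
    """Estimate namenode heap used by file + block metadata, in GB."""
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT / 1024 ** 3

# 100 million small (single-block) files cost ~28 GB of namenode heap,
# regardless of how tiny each individual file is.
print(round(namenode_heap_gb(100_000_000), 1))  # → 27.9
```

The key point: the heap cost depends only on the *number* of files and
blocks, never on their sizes, which is why many tiny files are worse
than a few large ones.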

-ryan


On Thu, Oct 14, 2010 at 5:23 PM, Sean Bigdatafun
<[email protected]> wrote:
> Let me ask this question from another angle:
>
> The first question is ---
> if I have millions of columns in a column family in the same row, such that
> the sum of the key-value pairs exceeds 256MB, what will happen?
>
> example:
> I have a column with a key of 256 bytes and a value of 2K, then let's assume
> (256 + timestamp size + 2056) ~= 2.5k,
> then I understand I can at most store 256 * 1024 / 2.5 ~= 104,857 columns in
> this column family at this row.
>
> Does anyone have comments on the math I gave above?
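A quick check of the arithmetic above. This is a sketch only: it
assumes an 8-byte timestamp and uses the rounded ~2.5 KB per-cell
figure from the question; a real HBase KeyValue also carries row key,
family, and qualifier length fields, so treat it as an
order-of-magnitude estimate:

```python
# Assumptions: 8-byte timestamp; rounded ~2.5 KB per cell as above.
KEY = 256                       # key size in bytes, from the example
TIMESTAMP = 8                   # assumed timestamp size in bytes
VALUE = 2048                    # 2 KB value

cell = KEY + TIMESTAMP + VALUE  # 2312 bytes, rounded up to ~2.5 KB
region = 256 * 1024 * 1024      # 256 MB default region size

columns = int(region / (2.5 * 1024))
print(columns)                  # → 104857, i.e. roughly 100k columns
```

So with ~2.5 KB cells, a 256 MB region fits on the order of 100k
columns before the region stops splitting (since a single row is never
split across regions).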
>
>
> The second question is --
> By the way, if I do not turn on LZO, is my data still compressed (by the
> system)? -- if so, the above number will increase a few times, but there
> still exists a limit on how many columns I can put in a row.
>
> The third question is --
> If I do turn on LZO, does that mean the value gets compressed first, and then
> the HBase mechanism further compresses the key-value pair?
>
> Thanks,
> Sean
>
>
> On Tue, Sep 7, 2010 at 8:30 PM, Jonathan Gray <[email protected]> wrote:
>
>> You can go way beyond the max region split / split size.  HBase will never
>> split the region once it is a single row, even if beyond the split size.
>>
>> Also, if you're using large values, you should have region sizes much
>> larger than the default.  It's common to run with 1-2GB regions in many
>> cases.
>>
>> What you may have seen are recommendations that if your cell values are
>> approaching the default block size on HDFS (64MB), you should consider
>> putting the data directly into HDFS rather than HBase.
>>
>> JG
>>
>> > -----Original Message-----
>> > From: William Kang [mailto:[email protected]]
>>  > Sent: Tuesday, September 07, 2010 7:36 PM
>> > To: [email protected]; [email protected]
>> > Subject: Re: Limits on HBase
>> >
>> > Hi,
>> > Thanks for your reply. How about the row size? I read that a row should
>> > not be larger than the HDFS file on the region server, which is 256M by
>> > default. Is that right? Many thanks.
>> >
>> >
>> > William
>> >
>> > On Tue, Sep 7, 2010 at 2:22 PM, Andrew Purtell <[email protected]>
>> > wrote:
>> >
>> > > In addition to what Jon said please be aware that if compression is
>> > > specified in the table schema, it happens at the store file level --
>> > > compression happens after write I/O, before read I/O, so if you
>> > transmit a
>> > > 100MB object that compresses to 30MB, the performance impact is that
>> > of
>> > > 100MB, not 30MB.
>> > >
>> > > I also try not to go above 50MB as largest cell size, for the same
>> > reason.
>> > > I have tried storing objects larger than 100MB but this can cause out
>> > of
>> > > memory issues on busy regionservers no matter the size of the heap.
>> > When/if
>> > > HBase RPC can send large objects in smaller chunks, this will be less
>> > of an
>> > > issue.
>> > >
>> > > Best regards,
>> > >
>> > >    - Andy
>> > >
>> > > Why is this email five sentences or less?
>> > > http://five.sentenc.es/
>> > >
>> > >
>> > > --- On Mon, 9/6/10, Jonathan Gray <[email protected]> wrote:
>> > >
>> > > > From: Jonathan Gray <[email protected]>
>> > > > Subject: RE: Limits on HBase
>> > > > To: "[email protected]" <[email protected]>
>> > > > Date: Monday, September 6, 2010, 4:10 PM
>> > > > I'm not sure what you mean by
>> > > > "optimized cell size" or whether you're just asking about
>> > > > practical limits?
>> > > >
>> > > > HBase is generally used with cells in the range of tens of
>> > > > bytes to hundreds of kilobytes.  However, I have used
>> > > > it with cells that are several megabytes, up to about
>> > > > 50MB.  Up at that level, I have seen some weird
>> > > > performance issues.
>> > > >
>> > > > The most important thing is to be sure to tweak all of your
>> > > > settings.  If you have 20MB cells, you need to be sure
>> > > > to increase the flush size beyond 64MB and the split size
>> > > > beyond 256MB.  You also need enough memory to support
>> > > > all this large object allocation.
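(JG's advice about raising the flush and split thresholds can be
expressed in hbase-site.xml. The property names below are the standard
HBase configuration keys; the specific values are illustrative for a
~20MB-cell workload, not general recommendations:)

```xml
<!-- Illustrative only: raise flush/split thresholds for large cells. -->
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <!-- default 64MB; raised to 256MB here -->
  <value>268435456</value>
</property>
<property>
  <name>hbase.hregion.max.filesize</name>
  <!-- default 256MB split size; raised to 1GB here -->
  <value>1073741824</value>
</property>
```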
>> > > >
>> > > > And of course, test test test.  That's the easiest way
>> > > > to see if what you want to do will work :)
>> > > >
>> > > > When you run into problems, e-mail the list.
>> > > >
>> > > > As far as row size is concerned, the only issue is that a row
>> > > > can never span multiple regions, so a given row lives in
>> > > > exactly one region and is thus hosted on one server at a
>> > > > time.
>> > > >
>> > > > JG
>> > > >
>> > > > > -----Original Message-----
>> > > > > From: William Kang [mailto:[email protected]]
>> > > > > Sent: Monday, September 06, 2010 1:57 PM
>> > > > > To: hbase-user
>> > > > > Subject: Limits on HBase
>> > > > >
>> > > > > Hi folks,
>> > > > > I know this question may have been asked many times,
>> > > > but I am wondering
>> > > > > if
>> > > > > there is any update on the optimized cell size (in
>> > > > megabytes) and row
>> > > > > size
>> > > > > (in megabytes)? Many thanks.
>> > > > >
>> > > > >
>> > > > > William
>> > > >
