Let me ask this question from another angle.

The first question: if I have millions of columns in a column family in the same row, such that the total size of the key-value pairs exceeds 256MB, what will happen?

For example, suppose I have a column with a key of 256 bytes and a value of 2K (2048 bytes). Let's assume (256 + timestamp size + 2048) ~= 2.5K. Then, as I understand it, I can store at most 256 * 1024 / 2.5 ~= 104,857 columns in this column family in this row. Does anyone have comments on the math above?

The second question: if I do not turn on LZO, is my data still compressed by the system? If so, the number above would be several times larger, but there would still be a limit on how many columns I can put in a row.

The third question: if I do turn on LZO, does that mean the value gets compressed first, and then the HBase mechanism further compresses the key-value pair?

Thanks,
Sean

On Tue, Sep 7, 2010 at 8:30 PM, Jonathan Gray <[email protected]> wrote:
> You can go way beyond the max region split / split size. HBase will never
> split the region once it is a single row, even if beyond the split size.
>
> Also, if you're using large values, you should have region sizes much
> larger than the default. It's common to run with 1-2GB regions in many
> cases.
>
> What you may have seen are recommendations that if your cell values are
> approaching the default block size on HDFS (64MB), you should consider
> putting the data directly into HDFS rather than HBase.
>
> JG
>
> > -----Original Message-----
> > From: William Kang [mailto:[email protected]]
> > Sent: Tuesday, September 07, 2010 7:36 PM
> > To: [email protected]; [email protected]
> > Subject: Re: Limits on HBase
> >
> > Hi,
> > Thanks for your reply. How about the row size? I read that a row should
> > not be larger than the hdfs file on region server, which is 256M by default.
> > Is that right? Many thanks.
> >
> >
> > William
> >
> > On Tue, Sep 7, 2010 at 2:22 PM, Andrew Purtell <[email protected]>
> > wrote:
> >
> > > In addition to what Jon said please be aware that if compression is
> > > specified in the table schema, it happens at the store file level --
> > > compression happens after write I/O, before read I/O, so if you transmit a
> > > 100MB object that compresses to 30MB, the performance impact is that of
> > > 100MB, not 30MB.
> > >
> > > I also try not to go above 50MB as largest cell size, for the same reason.
> > > I have tried storing objects larger than 100MB but this can cause out of
> > > memory issues on busy regionservers no matter the size of the heap. When/if
> > > HBase RPC can send large objects in smaller chunks, this will be less of an
> > > issue.
> > >
> > > Best regards,
> > >
> > >    - Andy
> > >
> > > Why is this email five sentences or less?
> > > http://five.sentenc.es/
> > >
> > >
> > > --- On Mon, 9/6/10, Jonathan Gray <[email protected]> wrote:
> > >
> > > > From: Jonathan Gray <[email protected]>
> > > > Subject: RE: Limits on HBase
> > > > To: "[email protected]" <[email protected]>
> > > > Date: Monday, September 6, 2010, 4:10 PM
> > > > I'm not sure what you mean by "optimized cell size" or whether you're
> > > > just asking about practical limits?
> > > >
> > > > HBase is generally used with cells in the range of tens of bytes to
> > > > hundreds of kilobytes. However, I have used it with cells that are
> > > > several megabytes, up to about 50MB. Up at that level, I have seen
> > > > some weird performance issues.
> > > >
> > > > The most important thing is to be sure to tweak all of your settings.
> > > > If you have 20MB cells, you need to be sure to increase the flush size
> > > > beyond 64MB and the split size beyond 256MB. You also need enough
> > > > memory to support all this large object allocation.
> > > >
> > > > And of course, test test test. That's the easiest way to see if what
> > > > you want to do will work :)
> > > >
> > > > When you run into problems, e-mail the list.
> > > >
> > > > As far as row size is concerned, the only issue is that a row can
> > > > never span multiple regions so a given row can only be in one region
> > > > and thus be hosted on one server at a time.
> > > >
> > > > JG
> > > >
> > > > > -----Original Message-----
> > > > > From: William Kang [mailto:[email protected]]
> > > > > Sent: Monday, September 06, 2010 1:57 PM
> > > > > To: hbase-user
> > > > > Subject: Limits on HBase
> > > > >
> > > > > Hi folks,
> > > > > I know this question may have been asked many times, but I am
> > > > > wondering if there is any update on the optimized cell size (in
> > > > > megabytes) and row size (in megabytes)? Many thanks.
> > > > >
> > > > >
> > > > > William
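Sean's per-row arithmetic can be checked with a short calculation. This is a rough sketch, assuming the classic HBase KeyValue layout (4-byte key length, 4-byte value length, 2-byte row length, row, 1-byte family length, family, qualifier, 8-byte timestamp, 1-byte key type, value) and ignoring compression, HFile block and index overhead, and MemStore overhead. The 16-byte row key and 1-byte family name are hypothetical choices; only the 256-byte qualifier and 2K value come from the example. Per Jonathan's reply, HBase will not split a single row, so 256MB here is just a data budget for the estimate, not a hard row limit.

```python
# Rough size estimate for one HBase KeyValue (cell). Assumes the classic
# KeyValue layout; ignores compression, HFile block/index overhead and
# MemStore overhead, so real numbers will differ somewhat.
KEY_LENGTH_FIELD = 4     # int: length of the key portion
VALUE_LENGTH_FIELD = 4   # int: length of the value
ROW_LENGTH_FIELD = 2     # short: length of the row key
FAMILY_LENGTH_FIELD = 1  # byte: length of the column family name
TIMESTAMP = 8            # long
KEY_TYPE = 1             # byte: Put, Delete, ...

FIXED_OVERHEAD = (KEY_LENGTH_FIELD + VALUE_LENGTH_FIELD + ROW_LENGTH_FIELD
                  + FAMILY_LENGTH_FIELD + TIMESTAMP + KEY_TYPE)

MB = 1024 * 1024


def keyvalue_size(row: int, family: int, qualifier: int, value: int) -> int:
    """Approximate bytes for one cell, given the byte lengths of its parts."""
    return FIXED_OVERHEAD + row + family + qualifier + value


def cells_that_fit(budget_bytes: int, cell_bytes: int) -> int:
    """Number of whole cells of size cell_bytes that fit in budget_bytes."""
    return budget_bytes // cell_bytes


# 256-byte qualifier and 2K value from the example; the 16-byte row key and
# 1-byte family name are hypothetical.
cell = keyvalue_size(row=16, family=1, qualifier=256, value=2048)
print(cell)                             # → 2341
print(cells_that_fit(256 * MB, cell))   # → 114667
print(cells_that_fit(2048 * MB, cell))  # → 917336 (a 2GB region)
```

Under these assumptions each cell is 2,341 bytes, so roughly 114,667 cells fit in 256MB and roughly 917,336 in a 2GB region. Rounding each cell up to 2.5K, as in the first message, gives the more conservative figure of about 104,857.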
