Re: FilterList not working as expected

2011-02-18 Thread Bill Graham
Just to follow up, this appears to be a bug. I've created a JIRA. https://issues.apache.org/jira/browse/HBASE-3550 On Fri, Feb 18, 2011 at 10:57 AM, Bill Graham wrote: > Hi, > > I'm unable to get ColumnPrefixFilter working when I use it in a > FilterList and I'm wondering if this is a bug or a m

Re: Passing config information to a Coprocessor

2011-02-18 Thread Andrew Purtell
Completely up to the designer. Could be via Configuration (hbase-site.xml). Could be an API added via Endpoint / dynamic RPC. Could be table or column descriptor attributes ({HTD,HCD}.{get,set}Value()). Could be via some embedded library. I would suggest static configuration via table and/or co
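Andrew's first option, static configuration via hbase-site.xml, might look like this; the property name below is purely illustrative:

```xml
<!-- hbase-site.xml: a custom property for the coprocessor to read
     back from its environment's Configuration at load time.
     The key name below is hypothetical. -->
<property>
  <name>com.example.mycoprocessor.threshold</name>
  <value>100</value>
</property>
```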

Passing config information to a Coprocessor

2011-02-18 Thread Jason Rutherglen
How does one pass configuration parameters to a Coprocessor?

Re: Cluster Size/Node Density

2011-02-18 Thread Todd Lipcon
On Fri, Feb 18, 2011 at 12:10 PM, Jean-Daniel Cryans wrote: > The bigger the heap the longer the stop-the-world GC pause when fragmentation requires it, 8GB is "safer". > On my boxes, a stop-the-world on an 8G heap is already around 80 seconds... pretty catastrophic. Of course we've bumped the ZK tim
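Todd's ~80 seconds on an 8GB heap gives a crude rule of thumb, assuming (a big assumption) that pause time scales roughly linearly with heap size:

```java
public class GcPauseEstimate {
    // Back-of-envelope only: assumes stop-the-world pause time scales
    // linearly with heap size, anchored to ~80 s observed on an 8 GB heap.
    static long estimatePauseSeconds(long heapGb) {
        final long secondsPerGb = 80 / 8; // ~10 s per GB
        return heapGb * secondsPerGb;
    }

    public static void main(String[] args) {
        // Under this model a 12 GB heap pauses ~120 s, far beyond a
        // typical ZooKeeper session timeout.
        System.out.println(estimatePauseSeconds(12)); // prints 120
    }
}
```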

Re: Cluster Size/Node Density

2011-02-18 Thread Jean-Daniel Cryans
The bigger the heap the longer the stop-the-world GC pause when fragmentation requires it; 8GB is "safer". In 0.90.1 you can try enabling the new memstore allocator, which seems to do a really good job; check out the JIRA first: https://issues.apache.org/jira/browse/HBASE-3455 J-D On Fri, Feb 18, 201
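If I read HBASE-3455 correctly, the allocator J-D mentions (the MemStore-Local Allocation Buffer) is switched on in 0.90.1 with a single property; verify the exact name against the JIRA for your version:

```xml
<!-- hbase-site.xml: enable the MemStore-Local Allocation Buffer
     (MSLAB, HBASE-3455) to cut down old-generation fragmentation -->
<property>
  <name>hbase.hregion.memstore.mslab.enabled</name>
  <value>true</value>
</property>
```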

Re: Cluster Size/Node Density

2011-02-18 Thread Ted Dunning
Actually, having a smaller heap will decrease the risk of a catastrophic GC. It will probably also increase the likelihood of a full GC. Having a larger heap will let you go longer without a full GC, but with a very large heap a full GC may take your region server off-line long enough to be consider

Re: Cluster Size/Node Density

2011-02-18 Thread Chris Tarnas
Thank you, and that brings me to my next question... What is the current recommendation on the max heap size for HBase if RAM on the server is not an issue? Right now I am at 8GB and have no issues; can I safely do 12GB? The servers have plenty of RAM (48GB) so that should not be an issue - I j

Re: Cluster Size/Node Density

2011-02-18 Thread Jean-Daniel Cryans
That's what I usually recommend, the bigger the flushed files the better. On the other hand, you only have so much memory to dedicate to the MemStore... J-D On Fri, Feb 18, 2011 at 11:50 AM, Chris Tarnas wrote: > Would it be a good idea to raise the hbase.hregion.memstore.flush.size if you > ha
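Bumping the flush size is a one-property change; 128MB below is just an example over the 64MB default, and the total memory dedicated to the MemStore bounds how far this can go:

```xml
<!-- hbase-site.xml: flush memstores at 128 MB instead of the 64 MB
     default, producing fewer, larger flushed files. Example value. -->
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>134217728</value>
</property>
```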

Re: Not running balancer because processing dead regionserver(s)

2011-02-18 Thread Jean-Daniel Cryans
The master should finish processing those dead servers at some point, but it seems that's not happening. Unfortunately, without the log nobody can tell why. If you can post the complete log in pastebin or put it on a web server then we could take a look. J-D On Fri, Feb 18, 2011 at 12:39 AM, Yi Liang

Re: HBase 0.90.0 region servers dying

2011-02-18 Thread Jean-Daniel Cryans
Just to make sure, you did check in the .out file after a failure, right? J-D On Thu, Feb 17, 2011 at 10:14 PM, Enis Soztutar wrote: > Hi, > > Thanks everyone for the answers. > I had already increased the file descriptors to 32768. The region servers > and the zookeeper processes are dying, but

Re: Cluster Size/Node Density

2011-02-18 Thread Chris Tarnas
Would it be a good idea to raise the hbase.hregion.memstore.flush.size if you have really large regions? -chris On Feb 18, 2011, at 11:43 AM, Jean-Daniel Cryans wrote: > Fewer regions, but it's often a good thing if you have a lot of data :) > > It's probably a good thing to bump the HDFS block

Re: Tall versus wide tables in Hbase

2011-02-18 Thread Jean-Daniel Cryans
This has been discussed recently on the mailing list, see those two threads for example: http://search-hadoop.com/m/amq9c1OaV9z1/wide+tall+hbase+table&subj=Insert+into+tall+table+50+faster+than+wide+table and http://search-hadoop.com/m/zbKmE14o0Js/wide+tall+hbase+table&subj=Re+Parent+child+relat

Re: Cluster Size/Node Density

2011-02-18 Thread Jean-Daniel Cryans
Fewer regions, but it's often a good thing if you have a lot of data :) It's probably a good thing to bump the HDFS block size to 128 or 256MB since you know you're going to have huge-ish files. But anyway regarding penalties, I can't think of one that clearly comes out (unless you use a very smal
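The block-size bump J-D suggests is an HDFS-side setting; shown here for 128MB, using the Hadoop 0.20-era property name:

```xml
<!-- hdfs-site.xml: 128 MB blocks (134217728 bytes) for the huge-ish
     HFiles that come with large regions; 268435456 would give 256 MB -->
<property>
  <name>dfs.block.size</name>
  <value>134217728</value>
</property>
```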

Re: Cluster Size/Node Density

2011-02-18 Thread Jason Rutherglen
> We are also using a 5Gb region size to keep our region > counts in the 100-200 range/node per Jonathan Grey's recommendation. So there isn't a penalty incurred from increasing the max region size from 256MB to 5GB? On Fri, Feb 18, 2011 at 10:12 AM, Wayne wrote: > We have managed to get a litt

Re: Caching HBase connection

2011-02-18 Thread Jean-Daniel Cryans
The connection is kept open for the lifetime of the JVM. It's also good to keep HTables open, one per thread per table, as the real connections are done in a utility class inside HBaseConnectionManager (which you don't have to worry about). J-D On Fri, Feb 18, 2011 at 11:06 AM, Nanheng Wu wro
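HTable instances are not thread-safe, which is why J-D recommends one per thread per table. A generic sketch of that caching pattern, with a plain Object standing in for HTable so it runs without a cluster:

```java
import java.util.HashMap;
import java.util.Map;

public class PerThreadTableCache {
    // One cached instance per thread per table name. In real code the
    // value would be an HTable built from a shared Configuration.
    static final ThreadLocal<Map<String, Object>> CACHE =
            ThreadLocal.withInitial(HashMap::new);

    static Object getTable(String name) {
        // computeIfAbsent stands in for `new HTable(conf, name)`
        return CACHE.get().computeIfAbsent(name, n -> new Object());
    }

    public static void main(String[] args) {
        // Same thread, same table name -> same cached instance.
        System.out.println(getTable("users") == getTable("users"));
    }
}
```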

Caching HBase connection

2011-02-18 Thread Nanheng Wu
I am using HBase as the backend for a service. I want to somehow cache the connection to HBase so each request doesn't need to pay the cost of making the connection. I am already caching the HTable object; is that enough, or is there a better way? And how long can the connection be held onto? Thanks!

FilterList not working as expected

2011-02-18 Thread Bill Graham
Hi, I'm unable to get ColumnPrefixFilter working when I use it in a FilterList and I'm wondering if this is a bug or misuse on my part. If I set ColumnPrefixFilter directly on the Scan object all works fine. The following code shows an example of scanning a table with a column descriptor 'inf
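The two variants Bill compares can be sketched like this against the 0.90-era client API; once HBASE-3550 is resolved, both should return the same rows:

```java
// Direct filter on the Scan -- this path works for Bill.
Scan direct = new Scan();
direct.setFilter(new ColumnPrefixFilter(Bytes.toBytes("foo")));

// Same filter wrapped in a FilterList -- this is the path that
// misbehaves per HBASE-3550.
FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ALL);
list.addFilter(new ColumnPrefixFilter(Bytes.toBytes("foo")));
Scan wrapped = new Scan();
wrapped.setFilter(list);
```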

Re: Unit test (junit) very slow

2011-02-18 Thread Jean-Daniel Cryans
There's probably (and I'm 99% sure) a DNS timeout happening when resolving your machine's hostname. Review your DNS settings. J-D On Fri, Feb 18, 2011 at 10:53 AM, Fabiano D. Beppler wrote: > Hi, > > I am running a very simple JUnit test with HBase and the test takes a lot of > time to run when

Unit test (junit) very slow

2011-02-18 Thread Fabiano D. Beppler
Hi, I am running a very simple JUnit test with HBase and the test takes a lot of time to run when the computer is online (i.e., connected to a wifi network). When the computer is offline it runs a lot faster. Online it takes more than 169 seconds to run; offline it takes "only" 19 seconds to run. W

Re: Cluster Size/Node Density

2011-02-18 Thread Wayne
We have managed to get a little more than 1k QPS to date with 10 nodes. Honestly we are not quite convinced that disk i/o seeks are our biggest bottleneck. Of course they should be...but waiting for RPC connections, network latency, thrift etc. all play into the time to get reads. The std dev. of r

Re: Scanning over key values > timestamp?

2011-02-18 Thread Jason Rutherglen
Ryan, thanks, I think a full scan will be fine as it's a one-time event on startup/recovery, and I am curious either way. On Fri, Feb 18, 2011 at 10:08 AM, Ryan Rawson wrote: > There is minimal/no underlying efficiency. It's basically a full > table/region scan with a filter to discard the unintere

Re: Scanning over key values > timestamp?

2011-02-18 Thread Ryan Rawson
There is minimal/no underlying efficiency. It's basically a full table/region scan with a filter to discard the uninteresting values. We have various timestamp filtering techniques to avoid reading from files, eg: if you specify a time range [100,200) and a hfile only contains [0,50) we'll not incl
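The file-skipping check Ryan describes reduces to an interval-overlap test between the scan's half-open time range and an hfile's min/max timestamps; a standalone sketch (not HBase's actual code):

```java
public class TimeRangePrune {
    // An hfile can be skipped when its [fileMin, fileMax] timestamp
    // span cannot intersect the scan's half-open [start, end) range.
    static boolean fileMayContain(long scanStart, long scanEndExcl,
                                  long fileMin, long fileMax) {
        return fileMin < scanEndExcl && fileMax >= scanStart;
    }

    public static void main(String[] args) {
        // Ryan's example: a [100,200) scan skips a file covering [0,50].
        System.out.println(fileMayContain(100, 200, 0, 50)); // prints false
    }
}
```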

Re: Scanning over key values > timestamp?

2011-02-18 Thread Jason Rutherglen
Thanks Ted! Is there some underlying efficiency to this, or will it be scanning all of the rows underneath? On Fri, Feb 18, 2011 at 7:47 AM, Ted Yu wrote: > From Scan.java: >  * To only retrieve columns within a specific range of version timestamps, >  * execute {@link #setTimeRange(long, long)

Re: HBase setup problem

2011-02-18 Thread Ted Dunning
You might do well to build a hosts file so that you can make the host names stable over time. On Fri, Feb 18, 2011 at 2:29 AM, kushum sharma wrote: > Hi, > I've deployed hbase on 5 nodes cluster of amazon ec2 successfully and was > working fine. > The next day when I logon I changed the configura
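Ted's suggestion boils down to pinning the names on every node; all addresses and hostnames below are placeholders, and on EC2 the entries (or elastic IPs) have to be kept in sync when instances restart:

```
# /etc/hosts on every node -- example entries only; use the cluster's
# real private IPs and the names referenced in the Hadoop/HBase conf
10.0.0.11  hbase-master
10.0.0.12  hbase-rs1
10.0.0.13  hbase-rs2
```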

Re: Multiple versions scanning

2011-02-18 Thread Joseph Boyd
scan.setMaxVersions(Integer.MAX_VALUE); // or some other integer On Thu, Feb 17, 2011 at 9:28 PM, Subhash Bhushan wrote: > Hi, > > I am using a prefix mechanism for storing row keys and I have a separate > table to store my secondary index. > Attached is a snapshot of the schema. > > Whe

Re: Scanning over key values > timestamp?

2011-02-18 Thread Ted Yu
From Scan.java: * To only retrieve columns within a specific range of version timestamps, * execute {@link #setTimeRange(long, long) setTimeRange}. On Fri, Feb 18, 2011 at 6:48 AM, Jason Rutherglen < jason.rutherg...@gmail.com> wrote: > For search integration we need to, on server reboot scan
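Applied to Jason's case, a sketch (the variable name is hypothetical; the end bound of setTimeRange is exclusive):

```java
// Scan only KeyValues written at or after the last Lucene commit.
Scan scan = new Scan();
scan.setTimeRange(lastCommitTimestamp, Long.MAX_VALUE);
scan.setMaxVersions(); // consider every version inside the range
```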

Scanning over key values > timestamp?

2011-02-18 Thread Jason Rutherglen
For search integration we need to, on server reboot, scan over the key values written since the last Lucene commit and add them to the index. Is there an efficient way to do this?

HBase setup problem

2011-02-18 Thread kushum sharma
Hi, I've deployed HBase on a 5-node Amazon EC2 cluster successfully and it was working fine. The next day when I logged on, I changed the configuration files (regionservers list, slaves, masters, dfs rootdir, etc.) in the hadoop and hbase /conf directories for the new IP addresses, which are dynamic on EC2, and then the hba

Not running balancer because processing dead regionserver(s)

2011-02-18 Thread Yi Liang
Hi all, We have an HBase cluster with 10 region servers running HBase 0.90.0 + CDH3. We're now importing a large amount of data into HBase. During the process, 2 servers crashed, but after restarting them, they're no longer assigned any regions, while regions on other servers keep splitting when more data in

Tall versus wide tables in Hbase

2011-02-18 Thread Usman Waheed
Hi, I would like to set up an HBase table that would give users the ability to perform selects only (gets and scans). We don't have a need for users to perform inserts or updates at the moment. But yes, I will have to load/insert the data into the tables before users can perform selects.