Re: Suspected memory leak

2011-12-01 Thread bijieshan
Thank you all. I think it's the same problem as the one in the link Stack provided: the heap size has stabilized but the non-heap size keeps growing, so I don't think it is the CMS GC bug. We have also examined the content of the problematic memory section; all the records contain the info li

Re: Problem in configuring pseudo distributed mode

2011-12-01 Thread Mohammad Tariq
Hello Christopher, I don't have 127.0.1.1 in my hosts file. Regards,     Mohammad Tariq On Thu, Dec 1, 2011 at 6:42 PM, Christopher Dorner wrote: > Your hosts file (/etc/hosts) should contain only sth like > 127.0.0.1 localhost > Or > 127.0.0.1 > > It should not contain sth like > 127.0.

OLAP-ish incremental BI capabilities for hbase, looking for collaborators.

2011-12-01 Thread Dmitriy Lyubimov
Hello, We (Inadco) are looking for users and developers to engage in our open project, code-named, for lack of a better name, "hbase-lattice", in order to mutually benefit and eventually develop a mature HBase-based real-time OLAP-ish BI solution. The basic premise is to use Cuboid Lattice -like

Re: Atomicity questions

2011-12-01 Thread lars hofhansl
ZK is mostly for orchestrating between the master and regionservers. - Original Message - From: Mohit Anchlia To: user@hbase.apache.org; lars hofhansl Cc: Sent: Thursday, December 1, 2011 3:57 PM Subject: Re: Atomicity questions Thanks that makes it more clear. I also looked at mvcc

Re: Atomicity questions

2011-12-01 Thread Mohit Anchlia
Thanks that makes it more clear. I also looked at mvcc code as you pointed out. So I am wondering where ZK is used specifically. On Thu, Dec 1, 2011 at 3:37 PM, lars hofhansl wrote: > Nope, not using ZK, that would not scale down to the cell level. > You'll probably have to stare at the code in

Re: Atomicity questions

2011-12-01 Thread lars hofhansl
Nope, not using ZK, that would not scale down to the cell level. You'll probably have to stare at the code in MultiVersionConsistencyControl for a while (I know I had to). The basic flow of a write operation is this: 1. lock the row 2. persist change to the write ahead log 3. get a "writenumber"
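The write flow Lars describes can be sketched as a toy model. This is a simplified illustration of the MVCC idea, not HBase's actual implementation; the names `SimpleMVCC`, `put_row`, and `get_cell` are invented for the example, and row locking and WAL persistence are elided as comments.

```python
import threading

class SimpleMVCC:
    """Toy model of HBase's MultiVersionConsistencyControl.

    Writers get a monotonically increasing write number; the shared
    read point only advances past a write once that write completes,
    so readers never observe a half-finished row mutation.
    """

    def __init__(self):
        self.lock = threading.Lock()
        self.next_write_number = 1
        self.read_point = 0          # readers see writes <= read_point
        self.pending = set()         # write numbers not yet completed

    def begin_write(self):
        with self.lock:
            wn = self.next_write_number
            self.next_write_number += 1
            self.pending.add(wn)
            return wn

    def complete_write(self, wn):
        with self.lock:
            self.pending.remove(wn)
            # roll the read point forward, but never past an
            # older write that is still in flight
            while self.read_point + 1 < self.next_write_number and \
                  (self.read_point + 1) not in self.pending:
                self.read_point += 1

store = {}  # cell name -> list of (write_number, value)

def put_row(mvcc, updates):
    # 1. lock the row (elided)  2. append to the WAL (elided)
    # 3. get a write number
    wn = mvcc.begin_write()
    # 4. apply every cell of the mutation tagged with that number
    for cell, value in updates.items():
        store.setdefault(cell, []).append((wn, value))
    # 5. mark the write complete so the read point can advance
    mvcc.complete_write(wn)

def get_cell(mvcc, cell):
    # readers only see versions at or below the current read point
    versions = [v for wn, v in store.get(cell, []) if wn <= mvcc.read_point]
    return versions[-1] if versions else None
```

With this model, a reader asking for a, b, c while a multi-cell put is in flight still sees the previous complete versions, which is the behavior the atomicity page describes ("a=1,b=1,c=1" or "a=2,b=2,c=2", never a mix).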

Re: regions and tables

2011-12-01 Thread Jean-Daniel Cryans
Excellent question. I would say that if you are planning to have thousands of tables with the same schema then instead you should use one table with prefixed rows. The 20 regions / region server is a general guideline that works best in the single tenant case, meaning that you have only 1 table a

Re: Atomicity questions

2011-12-01 Thread Stack
On Thu, Dec 1, 2011 at 3:03 PM, Mohit Anchlia wrote: > Thanks. I'll try and take a look, but I haven't worked with zookeeper > before. Does it use zookeeper for any of ACID functionality? > No. St.Ack

Re: hbase sandbox at ImageShack.

2011-12-01 Thread Stack
On Thu, Dec 1, 2011 at 2:34 PM, Jack Levin wrote: > Hello All.   I've setup an hbase (0.90.4) sandbox running on servers > where we have some excess capacity.  Feel free to play with it, e.g. > create tables, run load tests, benchmarks, essentially do whatever you > want, just don't put your produ

Re: Atomicity questions

2011-12-01 Thread Mohit Anchlia
Thanks. I'll try and take a look, but I haven't worked with zookeeper before. Does it use zookeeper for any of ACID functionality? On Thu, Dec 1, 2011 at 2:55 PM, lars hofhansl wrote: > Hi Mohit, > > the best way to study this is to look at MultiVersionConsistencyControl.java > (since you are as

Re: Atomicity questions

2011-12-01 Thread lars hofhansl
Hi Mohit, the best way to study this is to look at MultiVersionConsistencyControl.java (since you are asking how this is handled internally). In a nutshell, this ensures that read operations don't see writes that are not completed, by (1) defining a thread read point that is rolled forward only af

hbase sandbox at ImageShack.

2011-12-01 Thread Jack Levin
Hello All. I've set up an HBase (0.90.4) sandbox running on servers where we have some excess capacity. Feel free to play with it, e.g. create tables, run load tests, benchmarks, essentially do whatever you want; just don't put your production services there, because while we do have it up due to

Re: Suspected memory leak

2011-12-01 Thread Kihwal Lee
Adding to the excellent write-up by Jonathan: Since a finalizer is involved, it takes two GC cycles to collect these objects. Due to a bug (or bugs) in the CMS GC, collection may not happen and the heap can grow really big. See http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7112034 for details. Koji trie

Atomicity questions

2011-12-01 Thread Mohit Anchlia
I have some questions about ACID after reading this page, http://hbase.apache.org/acid-semantics.html - Atomicity point 5 : row must either be "a=1,b=1,c=1" or "a=2,b=2,c=2" and must not be something like "a=1,b=2,c=1". How is this internally handled in hbase such that above is possible?

Re: regions and tables

2011-12-01 Thread Sam Seigal
So is it fair to say that the number of tables one can create is also bounded by the number of regions that the cluster can support? For example, given 5 region servers and keeping 20 regions per region server, with 5 tables I am restricted to only being able to scale a single table to 20 region

Re: Scan Metrics in Ganglia

2011-12-01 Thread Doug Meil
This can be a bit tricky because of the scan caching, for example... http://hbase.apache.org/book.html#rs_metrics 12.4.2.14. hbase.regionserver.requests Total number of read and write requests. Requests correspond to RegionServer RPC calls, thus a single Get will result in 1 request, but a Sc
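Doug's point about scan caching can be made concrete with a rough request-count model. This is an illustrative sketch, not an HBase API: the helper name `scan_rpc_count` is invented, and region boundaries plus the final empty next() call are ignored for simplicity.

```python
import math

def scan_rpc_count(rows_returned, caching):
    """Approximate number of RegionServer RPCs a scan generates.

    A Get is a single RPC and so counts as 1 request, but each scanner
    next() RPC ships up to `caching` rows, so a scan over M rows shows
    up as roughly ceil(M / caching) requests in the regionserver
    request metric.
    """
    return math.ceil(rows_returned / caching)
```

So a scan over 10,000 rows with caching set to 1,000 registers as about 10 requests, while the same scan with the default caching of 1 would register as about 10,000, which is why the raw request metric can be misleading for scan-heavy workloads.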

Re: Suspected memory leak

2011-12-01 Thread Stack
Make sure it's not the issue that Jonathan Payne identified a while back: https://groups.google.com/group/asynchbase/browse_thread/thread/c45bc7ba788b2357# St.Ack

Scan Metrics in Ganglia

2011-12-01 Thread sagar naik
Hi, I can see metrics for get calls (number of gets, average time per get). However, I could not do so for scan calls. Please let me know how I can measure them. Thanks -Sagar

Re: Constant error when putting large data into HBase

2011-12-01 Thread Jean-Daniel Cryans
Here's my take on the issue. > I monitored the > process and when any node fails, it has not used all the heaps yet. > So it is not a heap space problem. I disagree. Unless you load a region server heap with more data than there's heap available (loading batches of humongous rows for example), it

Re: Strategies for aggregating data in a HBase table

2011-12-01 Thread Jean-Daniel Cryans
Or you could just prefix the row keys. Not sure if this is needed natively, or as a tool on top of HBase. Hive for example could do exactly that for you when Hive partitions are implemented for HBase. J-D On Wed, Nov 30, 2011 at 1:34 PM, Sam Seigal wrote: > What about "partitioning" at a table l

Re: hbase-regionserver1: bash: {HBASE_HOME}/bin/hbase-daemon.sh: No such file or directory

2011-12-01 Thread Jean-Daniel Cryans
So since I don't see the rest of the log I'll have to assume that the region server was never able to connect to the master. Connection refused could be a firewall, start the master and then try to telnet from the other machines to master:6. J-D On Thu, Dec 1, 2011 at 6:45 AM, Vamshi Krishna

RE: Suspected memory leak

2011-12-01 Thread Vladimir Rodionov
You can create several heap dumps of the JVM process in question and compare heap allocations. To create a heap dump: jmap -dump:format=b,file=heap.hprof <pid> To analyze it: 1. jhat 2. visualvm 3. any commercial profiler One note: -Xmn12G ??? How long are your minor GC pauses? Best regards, Vladimir Rodionov Principal Platf

Re: Performance characteristics of scans using timestamp as the filter

2011-12-01 Thread Doug Meil
Scans work on startRow/stopRow... http://hbase.apache.org/book.html#scan ... you can also select by timestamp *within the startRow/stopRow selection*, but this isn't intended to quickly select rows by timestamp irrespective of their keys. On 12/1/11 9:03 AM, "Srikanth P. Shreenivas" wrote:
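Doug's caveat is that a time range narrows what a scan *returns*, not what it *reads*: every row between startRow and stopRow is still examined. A toy model makes the distinction visible; the function `timerange_scan` and its dict-based table are invented for illustration and are not the HBase API.

```python
def timerange_scan(table, start_row, stop_row, min_ts, max_ts):
    """Toy model of an HBase scan with a time range applied.

    `table` maps row_key -> (timestamp, value). The time range filters
    which rows are returned, but every row in [start_row, stop_row) is
    still examined, so a time range alone does not turn a full table
    scan into an indexed lookup.
    """
    examined = 0
    results = {}
    for key in sorted(table):           # rows are stored sorted by key
        if key < start_row or key >= stop_row:
            continue
        examined += 1                   # read regardless of timestamp
        ts, value = table[key]
        if min_ts <= ts < max_ts:       # HBase TimeRange is [min, max)
            results[key] = value
    return results, examined
```

In the sketch, a scan over three rows with a time range matching only one still examines all three, which is why selecting rows purely by timestamp over a wide key range performs like a full scan.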

Re: hbase-regionserver1: bash: {HBASE_HOME}/bin/hbase-daemon.sh: No such file or directory

2011-12-01 Thread Vamshi Krishna
In the logs of the region server machines I found this error (on both regionserver machines): 2011-11-30 14:43:42,447 INFO org.apache.hadoop.ipc.HbaseRPC: Server at hbase-master/10.0.1.54:60020 could not be reached after 1 tries, giving up. *2011-11-30 14:44:37,762* WARN org.apache.hadoop.hbas

RE: Performance characteristics of scans using timestamp as the filter

2011-12-01 Thread Srikanth P. Shreenivas
So, will it be safe to assume that scan queries with a TimeRange will perform well and read only the necessary portions of the table instead of doing a full table scan? I have run into a situation wherein I would like to find out all rows that got created/updated during a time range. I was hop

Re: regions and tables

2011-12-01 Thread Doug Meil
To expand on what Lars said, there is an example of how this is laid out on disk... http://hbase.apache.org/book.html#trouble.namenode.disk ... regions distribute the table, so two different tables will be distributed by separate sets of regions.

Re: Unable to create version file

2011-12-01 Thread Lars George
Could you please pastebin your Hadoop, HBase and ZooKeeper config files? Lars On Dec 1, 2011, at 11:23 AM, Mohammad Tariq wrote: > Today when I issued bin/start-hbase.sh I ran into the following error - > > Thu Dec 1 15:47:30 IST 2011 Starting master on ubuntu > ulimit -n 1024 > 2011-12-01 15

Re: Problem in configuring pseudo distributed mode

2011-12-01 Thread Christopher Dorner
Your hosts file (/etc/hosts) should contain only something like 127.0.0.1 localhost Or 127.0.0.1 It should not contain something like 127.0.1.1 localhost. And I think you need to reboot after changing it. Hope that helps. Regards, Christopher On 01.12.2011 13:24, "Mohammad Tariq" wrote: > Hello list, > >
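For a pseudo-distributed setup the hosts file Christopher describes would look something like the fragment below. The hostname `ubuntu` is an assumption taken from the startup logs elsewhere in this thread; substitute your machine's actual hostname.

```
127.0.0.1   localhost
127.0.0.1   ubuntu
# no 127.0.1.1 entry -- Ubuntu adds one by default, and HBase can end
# up binding to it instead of an address other machines can reach
```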

Problem in configuring pseudo distributed mode

2011-12-01 Thread Mohammad Tariq
Hello list, Even after following the directions provided by you guys, the HBase book, and several other blogs and posts, I am not able to run HBase in pseudo-distributed mode. I think there is some problem with the hosts file. I would highly appreciate it if someone who has done it properly could

Unable to create version file

2011-12-01 Thread Mohammad Tariq
Today when I issued bin/start-hbase.sh I ran into the following error - Thu Dec 1 15:47:30 IST 2011 Starting master on ubuntu ulimit -n 1024 2011-12-01 15:47:31,158 INFO org.apache.zookeeper.server.ZooKeeperServer: Server environment:zookeeper.version=3.3.2-1031432, built on 11/05/2010 05:32 GMT

Re: Constant error when putting large data into HBase

2011-12-01 Thread Lars George
Hi Ed, You need to be more precise I am afraid. First of all what does "some node always dies" mean? Is the process gone? Which process is gone? And the "error" you pasted is a WARN level log that *might* indicate some trouble, but is *not* the reason the "node has died". Please elaborate. Also

Constant error when putting large data into HBase

2011-12-01 Thread edward choi
Hi, I've had a problem that has been killing me for some days now. I am using the CDH3 update 2 version of Hadoop and HBase. When I do a large amount of bulk loading into HBase, some node always dies. It's not just one particular node, but one of many nodes eventually fails to serve. I set 4 gigs of heap sp

Re: regions and tables

2011-12-01 Thread Lars George
Hi Sam, You need to handle them all separately. The note - I assume - was solely explaining the fact that the "load" of a region server is defined by the number of regions it hosts, not the number of tables. Precreating the regions for one table or for several tables is the same amount of work: c