Re: Compaction problem

2013-03-25 Thread ramkrishna vasudevan
What is the rate at which you are flushing? Frequent flushes will cause more files, and compaction may happen frequently but take less time. If the flush size is increased to a bigger value, then you will end up spending more time in compaction because the entire file has to be read and rewritten. After
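As a hedged illustration of the tradeoff being described: the flush size can be raised per table through the 0.94 admin API (the table name and sizes below are made up; the cluster-wide default lives in hbase-site.xml as hbase.hregion.memstore.flush.size).

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class FlushSizeTuning {
        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);
            HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("my_table"));
            // Default is 128 MB; doubling it halves flush frequency but roughly
            // doubles the data each flush (and later each compaction) must rewrite.
            desc.setMemStoreFlushSize(256L * 1024 * 1024);
            admin.disableTable("my_table");
            admin.modifyTable(Bytes.toBytes("my_table"), desc);
            admin.enableTable("my_table");
            admin.close();
        }
    }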

Re: Compaction problem

2013-03-25 Thread tarang dawer
Hi, I also tried the following parameters: export HBASE_REGIONSERVER_OPTS="-Xmx2g -Xms2g -Xmn256m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:$HBASE_HOME/logs/gc-$(hostname)-hbase.log" hbase.regionserv

RE: Getting less write throughput due to more number of columns

2013-03-25 Thread Anoop Sam John
When the number of columns (qualifiers) is higher, yes, it can impact performance. In HBase, storage is everywhere in terms of KVs. The key will be something like rowkey+cfname+columnname+TS... So when you have 26 cells in a put, there will be repetition of many bytes in the key.(O
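To make the repetition concrete, here is an illustrative sketch against the 0.94 client API (row, family, and qualifier names are made up): each cell of a Put becomes its own KeyValue, whose key repeats the rowkey, family, qualifier, and timestamp.

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WideRowOverhead {
        public static Put buildWideRow() {
            Put put = new Put(Bytes.toBytes("row-0001"));
            byte[] family = Bytes.toBytes("cf");
            for (int i = 0; i < 26; i++) {
                // One KeyValue per qualifier: 26 copies of "row-0001" + "cf"
                // plus a qualifier and an 8-byte timestamp in the keys alone.
                put.add(family, Bytes.toBytes("q" + i), Bytes.toBytes("v" + i));
            }
            return put;
        }
    }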

Re: HBase Writes With Large Number of Columns

2013-03-25 Thread ramkrishna vasudevan
Hi Pankaj, Is it possible for you to profile the RS when this happens? Either the Thrift layer adds an overhead, or the code is spending more time somewhere. As you said, there may be a slight decrease in the performance of the put because now more values have to go in, but should not

Crash when run two jobs at the same time with same Hbase table

2013-03-25 Thread GuoWei
Dear all, when I run two MR jobs at the same time that read the same HBase table and write to another, shared HBase table, one job finishes successfully and the other crashes. The following shows the error log. Please help me find out why. <2013-03-25 15:50:34,026> - map

RE: Does HBase RegionServer benefit from OS Page Cache

2013-03-25 Thread 谢良
Maybe we should adopt some ideas from the RDBMS world? In the MySQL area, the InnoDB storage engine has a buffer pool (just like the current block cache) that caches both compressed and uncompressed pages in the latest InnoDB version, and it brings an adaptive LRU algorithm; see http://dev.mysql.com/doc/innodb/1.1/en/innodb-co

Re: ‘split’ start/stop key range of large table regions for more map tasks

2013-03-25 Thread Ted Yu
Looks like this is what you were looking for: HBASE-4063 Improve TableInputFormat to allow application to configure the number of mappers Cheers On Mon, Mar 25, 2013 at 7:33 PM, Lu, Wei wrote: > Hi, Michael, > > Yes, I read some stuff in blogs and I did pre-split + large max region > file size
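For the flavor of the approach (this is a hedged, illustrative sketch, not the actual HBASE-4063 patch): each region's split can be subdivided by subclassing TableInputFormat; the class name and sub-split count are made up, and ranges Bytes.split cannot divide (e.g. the empty boundary rows of the first/last region) are left as-is.

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
    import org.apache.hadoop.hbase.mapreduce.TableSplit;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.JobContext;

    public class SubdividingTableInputFormat extends TableInputFormat {
        private static final int SUB_SPLITS = 4; // illustrative: map tasks per region

        @Override
        public List<InputSplit> getSplits(JobContext context) throws IOException {
            List<InputSplit> result = new ArrayList<InputSplit>();
            for (InputSplit split : super.getSplits(context)) {
                TableSplit ts = (TableSplit) split;
                // Bytes.split returns the two endpoints plus intermediate keys;
                // it returns null for ranges it cannot split (empty boundary rows).
                byte[][] keys = Bytes.split(ts.getStartRow(), ts.getEndRow(), SUB_SPLITS - 1);
                if (keys == null) {
                    result.add(ts); // keep the original split untouched
                    continue;
                }
                for (int i = 0; i < keys.length - 1; i++) {
                    result.add(new TableSplit(ts.getTableName(), keys[i], keys[i + 1],
                        ts.getRegionLocation()));
                }
            }
            return result;
        }
    }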

RE: ‘split’ start/stop key range of large table regions for more map tasks

2013-03-25 Thread Lu, Wei
Hi Michael, Yes, I read some stuff in blogs and I did pre-split + a large max region file size to avoid online splits. I also set the region size large to reduce region server heap size, so I don't want to split manually. Let me make it clear: the problem I faced is to spawn more than one map task fo

Re: hbase/.archive doesn't exist

2013-03-25 Thread Jean-Marc Spaggiari
HBASE-8195 has been opened. But I don't think this is related to the issue Jian is facing ;) 2013/3/25 Ted Yu : > I agree. > > Log a JIRA ? > > On Mon, Mar 25, 2013 at 6:25 PM, Jean-Marc Spaggiari < > jean-m...@spaggiari.org> wrote: > >> I think this should be removed from there since it has been

Re: hbase/.archive doesn't exist

2013-03-25 Thread Ted Yu
I agree. Log a JIRA ? On Mon, Mar 25, 2013 at 6:25 PM, Jean-Marc Spaggiari < jean-m...@spaggiari.org> wrote: > I think this should be removed from there since it has been removed > from the code. > > 2013/3/25 Ted Yu : > > I found the entry in src/main/resources/hbase-default.xml of 0.94 branch.

Re: hbase/.archive doesn't exist

2013-03-25 Thread Jean-Marc Spaggiari
I think this should be removed from there since it has been removed from the code. 2013/3/25 Ted Yu : > I found the entry in src/main/resources/hbase-default.xml of 0.94 branch. > > On Mon, Mar 25, 2013 at 6:06 PM, Jean-Marc Spaggiari < > jean-m...@spaggiari.org> wrote: > >> It's not even in 0.95

Re: hbase/.archive doesn't exist

2013-03-25 Thread Ted Yu
I found the entry in src/main/resources/hbase-default.xml of 0.94 branch. On Mon, Mar 25, 2013 at 6:06 PM, Jean-Marc Spaggiari < jean-m...@spaggiari.org> wrote: > It's not even in 0.95 neither in trunk... > > It was supposed to be added by HBASE-5547... > > But now, it's hardcoded. > > In HConsta

Re: hbase/.archive doesn't exist

2013-03-25 Thread Jean-Marc Spaggiari
It's not in 0.95 nor in trunk... It was supposed to be added by HBASE-5547... But now it's hardcoded. In HConstants, you have public static final String HFILE_ARCHIVE_DIRECTORY = ".archive"; and it's used in HFileArchiveUtil. 5547 was supposed to make that configurable by reading this p

Re: HBase Writes With Large Number of Columns

2013-03-25 Thread Jean-Marc Spaggiari
For a total of 1.5kb with 4 columns = 384 bytes/column bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 4:384:100 -num_keys 100 13/03/25 14:54:45 INFO util.MultiThreadedAction: [W:100] Keys=991664, cols=3,8m, time=00:03:55 Overall: [keys/s= 4218, latency=23 ms] Current: [keys/s=4097,

Re: java.lang.OutOfMemoryError: Direct buffer memory

2013-03-25 Thread Enis Söztutar
Hi, From the logs, it seems you are running into the same problem I reported last week: https://issues.apache.org/jira/browse/HBASE-8143 There are some mitigation strategies outlined in that JIRA. It would be good if you could confirm: - How many regions in the region server - How many open
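One commonly suggested first step (an assumption on my part, not a mitigation quoted from HBASE-8143; it mirrors the hbase-env.sh export style shown elsewhere in this digest, and the 1g cap is purely illustrative) is to bound direct memory explicitly so the JVM fails fast at a known ceiling:

    export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:MaxDirectMemorySize=1g"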

Re: Incorrect Root region server

2013-03-25 Thread Ted Yu
What version of HBase are you using? Did table truncation report any problem? Are the truncated tables usable? Cheers On Mon, Mar 25, 2013 at 3:56 PM, Mohit Anchlia wrote: > I am seeing a weird issue where zk is pointing to "primarymaster" (hostname) > as the ROOT region. This host doesn't exist.

Re: hbase/.archive doesn't exist

2013-03-25 Thread Ted Yu
Do you use table archiving / snapshotting? If not, the following DEBUG output should not be a concern. On a related note, I was searching for where the following config is used in the 0.94 source code but was unable to find any: hbase.table.archive.directory .archive Per-table directo

Incorrect Root region server

2013-03-25 Thread Mohit Anchlia
I am seeing a weird issue where zk is pointing to "primarymaster" (hostname) as the ROOT region. This host doesn't exist. Everything was working OK until I ran truncate on a few tables. Does anyone know what might be the issue?

RE: HBase Writes With Large Number of Columns

2013-03-25 Thread Pankaj Misra
Yes Ted, we have been observing the Thrift API clearly outperform the native Java HBase API at higher loads, due to its binary communication protocol. Tariq, the specs of the machine on which we are performing these tests are given below. Processor: i3770K, 8 logical cores (4 physical, with 2 logic

Re: Does HBase RegionServer benefit from OS Page Cache

2013-03-25 Thread Enis Söztutar
> With very large heaps and a GC that can handle them (perhaps the G1 GC), another option which might be worth experimenting with is a value (KV) cache independent of the block cache which could be enabled on a per-table basis Thanks Andy for bringing this up. We've had some discussions some time a

hbase/.archive doesn't exist

2013-03-25 Thread Jian Fang
Hi, I am running HBase 0.94.6 with Hadoop 2.0.2-alpha. The log keeps printing the following message: 2013-03-25 19:32:15,469 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 1 catalog row(s) and gc'd 0 unreferenced parent region(s) 2013-03-25 19:33:05,683 DEBUG org.apache.hadoop.hbase

RE: Does HBase RegionServer benefit from OS Page Cache

2013-03-25 Thread Liyin Tang
Hi Enis, Good ideas! The HBase community is driving these 2 items: 1) [HBASE-7404]: L1/L2 block cache 2) [HBASE-5263]: Preserving cached data on compactions through cache-on-write Thanks a lot, Liyin From: Enis Söztutar [enis@gmail.com] Sent: Monday,

Re: HBase Writes With Large Number of Columns

2013-03-25 Thread Ted Yu
bq. These records are being written using batch mutation with thrift API This is important information, I think. Batch mutation through the Java API would incur lower overhead. On Mon, Mar 25, 2013 at 11:40 AM, Pankaj Misra wrote: > Firstly, Thanks a lot Jean and Ted for your extended help, very
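For comparison with the Thrift path under discussion, here is a hedged sketch of batch mutation through the 0.94 Java client (table, family, and value names are made up):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BatchPutExample {
        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "test_table");
            table.setAutoFlush(false); // buffer puts client-side
            List<Put> batch = new ArrayList<Put>();
            for (int i = 0; i < 1000; i++) {
                Put p = new Put(Bytes.toBytes("row-" + i));
                p.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("val-" + i));
                batch.add(p);
            }
            table.put(batch);      // one batched submission instead of 1000 RPCs
            table.flushCommits();  // drain the client-side write buffer
            table.close();
        }
    }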

Re: HBase Writes With Large Number of Columns

2013-03-25 Thread Mohammad Tariq
Hello Pankaj, What is the configuration you are using? Also, the H/W specs? Maybe tuning some of these would make things faster. Although the amount of data being inserted is small, the amount of metadata being generated will be higher. Now, you have to generate the key+qualifier+TS triplet

Re: Does HBase RegionServer benefit from OS Page Cache

2013-03-25 Thread Andrew Purtell
> With very large heaps, maybe keeping around the compressed blocks in a secondary cache makes sense? That's an interesting idea. > A compaction will trash the cache. But maybe we can track which keyvalues (inside cached blocks) are cached for the files in the compaction, and mark the blocks of the res

RE: HBase Writes With Large Number of Columns

2013-03-25 Thread Pankaj Misra
Firstly, thanks a lot Jean and Ted for your extended help; very much appreciated. Yes Ted, I am writing to all the 40 columns, and 1.5 KB of record data is distributed across these columns. Jean, some columns are storing as little as a single byte value, while a few of the columns are storing as

Re: HBase Writes With Large Number of Columns

2013-03-25 Thread Jean-Marc Spaggiari
I just ran some LoadTest to see if I can reproduce that. bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 4:512:100 -num_keys 100 13/03/25 14:18:25 INFO util.MultiThreadedAction: [W:100] Keys=997172, cols=3,8m, time=00:03:55 Overall: [keys/s= 4242, latency=23 ms] Current: [keys/s=441

Re: HBase Writes With Large Number of Columns

2013-03-25 Thread Ted Yu
Final clarification: bq. I am writing 1.5 KB of data per row across 40 columns. So your schema is not sparse - you were writing to (all) 40 columns in the second case. Thanks On Mon, Mar 25, 2013 at 11:03 AM, Pankaj Misra wrote: > Yes Ted, you are right, we are having table regions pre-split, and w

Re: Does HBase RegionServer benefit from OS Page Cache

2013-03-25 Thread Enis Söztutar
Thanks Liyin for sharing your use cases. Related to those, I was thinking of two improvements: - AFAIK, MySQL keeps the compressed and uncompressed versions of the blocks in its block cache, falling back to the compressed one if the decompressed one gets evicted. With very large heaps, maybe keeping arou

RE: HBase Writes With Large Number of Columns

2013-03-25 Thread Pankaj Misra
Yes Ted, you are right, we have the table regions pre-split, and we see that both regions are almost evenly filled in both tests. This does not seem to be a regression though, since we were getting good write rates when we had a smaller number of columns. Thanks and Regards Pankaj Misra __

Re: HBase Writes With Large Number of Columns

2013-03-25 Thread Ted Yu
Copying Ankit who raised the same question soon after Pankaj's initial question. On one hand I wonder if this was a regression in 0.94.5 (though unlikely). Did the region servers receive (relatively) same write load for the second test case ? I assume you have pre-split your tables in both cases.

RE: HBase Writes With Large Number of Columns

2013-03-25 Thread Pankaj Misra
Hi Ted, Sorry for missing that detail, we are using HBase version 0.94.5 Regards Pankaj Misra From: Ted Yu [yuzhih...@gmail.com] Sent: Monday, March 25, 2013 10:29 PM To: user@hbase.apache.org Subject: Re: HBase Writes With Large Number of Columns If yo

Getting less write throughput due to more number of columns

2013-03-25 Thread Ankit Jain
Hi All, I am writing records into HBase. I ran a performance test on the following two cases: Set1: input record contains 26 columns and record size is 2 KB. Set2: input record contains 1 column and record size is 2 KB. In the second case I am getting 8 MBps more throughput than in the first. Are the large n

Re: HBase Writes With Large Number of Columns

2013-03-25 Thread Ted Yu
If you give us the version of HBase you're using, that would give us more information to help you. Cheers On Mon, Mar 25, 2013 at 9:55 AM, Pankaj Misra wrote: > Hi, > > The issue that I am facing is around the performance drop of HBase, when I > was having 20 columns in a column family Vs n

HBase Writes With Large Number of Columns

2013-03-25 Thread Pankaj Misra
Hi, The issue that I am facing is a performance drop in HBase when going from 20 columns in a column family to 40 columns in a column family. The number of columns has doubled, and the ingestion/write speed has dropped by half. I am writing 1.5 KB of data p

Re: HBase M/R with M/R and HBase not on same cluster

2013-03-25 Thread Michael Segel
Just out of curiosity... Why do you want to run the job on Cluster A that reads from Cluster B but writes to Cluster A? Wouldn't it be easier to run the job on Cluster B and inside the Mapper.setup() you create your own configuration for your second cluster for output? On Mar 24, 2013, at 7
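A hedged sketch of the pattern Michael describes (run the job against the read-side cluster and open a second, explicitly configured connection for the writes in Mapper.setup()); the quorum address, table names, and column names below are made up:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.NullWritable;

    public class CrossClusterMapper extends TableMapper<NullWritable, NullWritable> {
        private HTable outputTable;

        @Override
        protected void setup(Context context) throws IOException {
            // Fresh configuration pointing at the *other* cluster's ZooKeeper.
            Configuration clusterA = HBaseConfiguration.create();
            clusterA.set("hbase.zookeeper.quorum", "zk1.clusterA.example.com");
            outputTable = new HTable(clusterA, "output_table");
        }

        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context context)
                throws IOException, InterruptedException {
            Put put = new Put(row.get());
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("copied"), value.value());
            outputTable.put(put); // goes to cluster A, not the job's own cluster
        }

        @Override
        protected void cleanup(Context context) throws IOException {
            outputTable.close();
        }
    }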

Re: ‘split’ start/stop key range of large table regions for more map tasks

2013-03-25 Thread Michael Segel
I think the problem is that Wei has been reading some stuff in blogs, and that's why he has such a large region size to start with. So if he manually splits the regions, drops the region size to something more appropriate... Or if he unloads the table, drops the table, recreates the table with a

Re: java.lang.OutOfMemoryError: Direct buffer memory

2013-03-25 Thread Ted Yu
What version of HBase are you using? Did you enable short circuit read in hadoop? Thanks On Mon, Mar 25, 2013 at 4:52 AM, Dhanasekaran Anbalagan wrote: > Hi Guys, > > I have a problem with my HBase server; it says java.lang.OutOfMemoryError: > Direct buffer memory. I'm new to HBase; how do I solve th

RE: Does HBase RegionServer benefit from OS Page Cache

2013-03-25 Thread Liyin Tang
The block cache holds uncompressed data, while the OS page cache contains the compressed data. Unless the request pattern is a full-table sequential scan, the block cache is still quite useful. I think the size of the block cache should be the amount of hot data we want to retain within a compaction cycle, which

java.lang.OutOfMemoryError: Direct buffer memory

2013-03-25 Thread Dhanasekaran Anbalagan
Hi Guys, I have a problem with my HBase server; it says java.lang.OutOfMemoryError: Direct buffer memory. I'm new to HBase; how do I solve this issue? This is my stack trace: http://paste.ubuntu.com/5646088/ -Dhanasekaran. Did I learn something today? If not, I wasted it.

Re: hbase increments and hadoop attempts

2013-03-25 Thread Bryan Beaudreault
Increments are not idempotent, so yes, you will double-increment the set of increments that succeeded in the first attempt(s). If you care about that, you're better off not using the Increment interface and instead having 2 jobs: one that does a Get of the current value and adds the offset, then pass
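A hedged sketch of the idempotent alternative Bryan outlines, collapsed into one helper for brevity (his design splits the Get and the Put across two jobs; table, family, and qualifier names are made up): writing the absolute result means a replayed attempt rewrites the same final value instead of double-incrementing.

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class IdempotentCounter {
        static void applyOffset(HTable table, byte[] row, long offset) throws IOException {
            byte[] cf = Bytes.toBytes("cf");
            byte[] qual = Bytes.toBytes("counter");
            Result r = table.get(new Get(row).addColumn(cf, qual));
            long current = r.isEmpty() ? 0L : Bytes.toLong(r.getValue(cf, qual));
            // Writing the absolute value is idempotent: replaying this Put is harmless.
            Put p = new Put(row);
            p.add(cf, qual, Bytes.toBytes(current + offset));
            table.put(p);
        }
    }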

hbase increments and hadoop attempts

2013-03-25 Thread prakash kadel
Hi everyone, when I launch my mapreduce jobs to increment counters in HBase, I sometimes have maps with multiple attempts, like: attempt_201303251722_0161_m_74_0 attempt_201303251722_0161_m_74_1 If there are multiple attempts running and the first one completes successfully,

Re: Compaction timing and recovery from failure

2013-03-25 Thread ramkrishna vasudevan
My question is this: if a compaction fails due to a regionserver loss mid-compaction, does the regionserver that picks up the region continue where the first left off, or does it have to start from scratch? -> The answer is that it starts from the beginning again. Regards Ram On Mon, Mar 25,

Compaction timing and recovery from failure

2013-03-25 Thread Brennon Church
Everyone, I recently had a couple of compactions, minors that were promoted to majors, take 8 and 10 minutes each. I eventually killed the regionserver underneath them, as I'd never seen compactions last that long before. In looking through the logs from the regionserver that was killed and wat

Re: ‘split’ start/stop key range of large table regions for more map tasks

2013-03-25 Thread Jean-Marc Spaggiari
Hi Wei, Have you looked at MAX_FILESIZE? If your table is 1 TB in size, and you have 10 RS and want 12 regions per server, you can set this to 1TB/(10x12) and you will get at least all those regions (and even a bit more). JM 2013/3/25 Lu, Wei: > We are facing big region size but small region num
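A hedged sketch of JM's arithmetic and of one way MAX_FILESIZE could be applied via the 0.94 admin API (the table name is made up and the numbers mirror his example):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RegionSizing {
        public static void main(String[] args) throws IOException {
            long tableBytes = 1024L * 1024 * 1024 * 1024; // ~1 TB of table data
            int regionServers = 10;
            int regionsPerServer = 12;
            // 1 TB / (10 x 12) ~= 8.5 GB per region
            long maxFileSize = tableBytes / (regionServers * regionsPerServer);

            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);
            HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("my_table"));
            desc.setMaxFileSize(maxFileSize); // regions split once a store file exceeds this
            admin.disableTable("my_table");
            admin.modifyTable(Bytes.toBytes("my_table"), desc);
            admin.enableTable("my_table");
            admin.close();
        }
    }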

Re: Evenly splitting the table

2013-03-25 Thread Michael Segel
@Aaron, You said you're using a salt, which would imply that your number is random and not derived from the base key (where the base key is the key prior to being hashed). Is that the case, or do you mean that Kiji is taking the first two bytes of the hash and prepending them to the key? On M

‘split’ start/stop key range of large table regions for more map tasks

2013-03-25 Thread Lu, Wei
We are facing a big region size but a small number of regions for a table: 10 region servers, each with only one region over 10G in size, while the map slot count of each task tracker is 12. We are planning to ‘split’ the start/stop key ranges of the large table regions for more map tasks, so that we can better make use of m

Region servers become dead after some time; below is the zookeeper log

2013-03-25 Thread gaurhari dass
client /107.108.188.11:38371 2013-03-25 12:09:55,213 INFO org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection from /107.108.188.11:38373 2013-03-25 12:09:55,214 INFO org.apache.zookeeper.server.ZooKeeperServer: Client attempting to establish new session at /107.108.188.11:3