What is the rate at which you are flushing? Frequent flushes will create more
files, and compactions may happen more frequently, but each one will take less time.
If the flush size is increased to a bigger value, then you will end up spending more
time in each compaction because the entire file has to be read and rewritten.
After
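For illustration, a minimal sketch of the knob in question, hbase.hregion.memstore.flush.size; the 256 MB value is only an example, and in practice this is set in hbase-site.xml on the region servers rather than in client code:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class FlushSizeExample {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // A bigger memstore flush size means fewer, larger flush files per region
    // (fewer compactions), but each compaction reads and rewrites more data.
    conf.setLong("hbase.hregion.memstore.flush.size", 256L * 1024 * 1024); // illustrative 256 MB
    System.out.println(conf.getLong("hbase.hregion.memstore.flush.size", -1));
  }
}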
Hi
I also tried the following parameters:
export HBASE_REGIONSERVER_OPTS="-Xmx2g -Xms2g -Xmn256m -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-Xloggc:$HBASE_HOME/logs/gc-$(hostname)-hbase.log"
hbase.regionserv
When the number of columns (qualifiers) is higher, yes, it can impact the
performance. In HBase, everything is stored in terms of KVs (KeyValues). The key
will be something like rowkey+cfname+columnname+TS...
So when you have 26 cells in a put, there will be repetition of many bytes in
the key. (O
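As a rough illustration (not from the thread), a put with 26 qualifiers in one family; each cell below becomes its own KeyValue whose key repeats the row key, family name and timestamp:
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WideRowPut {
  public static void main(String[] args) {
    byte[] row = Bytes.toBytes("row-0001");   // hypothetical row key
    byte[] cf = Bytes.toBytes("cf");          // hypothetical column family
    Put put = new Put(row);
    // 26 cells: every one carries rowkey + cf + qualifier + timestamp in its key,
    // so those key bytes are repeated 26 times for this single logical row.
    for (char q = 'a'; q <= 'z'; q++) {
      put.add(cf, Bytes.toBytes(String.valueOf(q)), Bytes.toBytes("v"));
    }
    System.out.println("cells in put: " + put.size());
  }
}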
Hi Pankaj
Is it possible for you to profile the RS when this happens? Either the Thrift
layer adds an overhead, or there is somewhere in the code where more time is
being spent.
As you said, there may be a slight decrease in the performance of the put,
because now more values have to go in, but it should not
Dear,
When I run two MR jobs at the same time, both reading from the same HBase table
and writing to the same destination HBase table, one job finishes successfully
and the other job crashes. The following shows the error log.
Please help me find out why.
<2013-03-25 15:50:34,026> - map
Maybe we should adopt some ideas from the RDBMS world?
In the MySQL area:
The InnoDB storage engine has a buffer pool (just like the current block cache) that
caches both compressed and uncompressed pages in the latest InnoDB version, and it
brings an adaptive LRU algorithm; see
http://dev.mysql.com/doc/innodb/1.1/en/innodb-co
Looks like this is what you were looking for:
HBASE-4063 Improve TableInputFormat to allow application to configure the
number of mappers
Cheers
On Mon, Mar 25, 2013 at 7:33 PM, Lu, Wei wrote:
> Hi, Michael,
>
> Yes, I read some stuff in blogs and I did pre-split + large max region
> file size
Hi, Michael,
Yes, I read some stuff in blogs, and I did pre-split + a large max region file
size to avoid online splits. I also set the region size large to reduce the region
server heap size, so I don't want to manually split.
Let me make it clear. The problem I faced is to spawn more than one map task
fo
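For what it's worth, here is a rough, untested sketch against the 0.94-era mapreduce API of the HBASE-4063 idea: subclass TableInputFormat and cut each region's split into several sub-splits. The class name and the sub-split count below are made up:
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableSplit;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;

public class MultiSplitTableInputFormat extends TableInputFormat {
  private static final int SUB_SPLITS_PER_REGION = 4;  // illustrative value

  @Override
  public List<InputSplit> getSplits(JobContext context) throws IOException {
    List<InputSplit> regionSplits = super.getSplits(context);  // one split per region
    List<InputSplit> result = new ArrayList<InputSplit>();
    for (InputSplit split : regionSplits) {
      TableSplit ts = (TableSplit) split;
      byte[] start = ts.getStartRow();
      byte[] end = ts.getEndRow();
      // The first/last regions have empty start/end keys; keep those splits as-is.
      if (start.length == 0 || end.length == 0) {
        result.add(ts);
        continue;
      }
      byte[][] keys = Bytes.split(start, end, SUB_SPLITS_PER_REGION - 1);
      if (keys == null || keys.length < 2) {
        result.add(ts);
        continue;
      }
      // Turn each consecutive pair of boundary keys into a sub-split that keeps
      // the original region location for locality.
      for (int i = 0; i < keys.length - 1; i++) {
        result.add(new TableSplit(ts.getTableName(), keys[i], keys[i + 1],
            ts.getRegionLocation()));
      }
    }
    return result;
  }
}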
HBASE-8195 has been opened.
But I don't think this is related to the issue Jian is facing ;)
2013/3/25 Ted Yu :
> I agree.
>
> Log a JIRA ?
>
> On Mon, Mar 25, 2013 at 6:25 PM, Jean-Marc Spaggiari <
> jean-m...@spaggiari.org> wrote:
>
>> I think this should be removed from there since it has been
I agree.
Log a JIRA ?
On Mon, Mar 25, 2013 at 6:25 PM, Jean-Marc Spaggiari <
jean-m...@spaggiari.org> wrote:
> I think this should be removed from there since it has been removed
> from the code.
>
> 2013/3/25 Ted Yu :
> > I found the entry in src/main/resources/hbase-default.xml of 0.94 branch.
I think this should be removed from there since it has been removed
from the code.
2013/3/25 Ted Yu :
> I found the entry in src/main/resources/hbase-default.xml of 0.94 branch.
>
> On Mon, Mar 25, 2013 at 6:06 PM, Jean-Marc Spaggiari <
> jean-m...@spaggiari.org> wrote:
>
>> It's not even in 0.95
I found the entry in src/main/resources/hbase-default.xml of 0.94 branch.
On Mon, Mar 25, 2013 at 6:06 PM, Jean-Marc Spaggiari <
jean-m...@spaggiari.org> wrote:
> It's not even in 0.95, nor in trunk...
>
> It was supposed to be added by HBASE-5547...
>
> But now, it's hardcoded.
>
> In HConsta
It's not even in 0.95, nor in trunk...
It was supposed to be added by HBASE-5547...
But now, it's hardcoded.
In HConstants, you have public static final String
HFILE_ARCHIVE_DIRECTORY = ".archive"; and it's used in
HFileArchiveUtil. 5547 was supposed to make that configurable by
reading this p
For a total of 1.5 KB with 4 columns = 384 bytes/column
bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 4:384:100
-num_keys 100
13/03/25 14:54:45 INFO util.MultiThreadedAction: [W:100] Keys=991664,
cols=3,8m, time=00:03:55 Overall: [keys/s= 4218, latency=23 ms]
Current: [keys/s=4097,
Hi,
From the logs, it seems you are running into the same problem I have
reported last week: https://issues.apache.org/jira/browse/HBASE-8143
There are some mitigation strategies outlined in that jira. It would be
good if you can confirm:
- How many regions in the region server
- How many open
What version of HBase are you using ?
Did table truncation report any problem ?
Are the truncated tables usable ?
Cheers
On Mon, Mar 25, 2013 at 3:56 PM, Mohit Anchlia wrote:
> I am seeing a weird issue where ZK points to "primarymaster" (hostname)
> as the ROOT region. This host doesn't exist.
Do you use table archiving / snapshotting ?
If not, the following DEBUG output should not be a concern.
On a related note, I was searching for where the following config is used
in 0.94 source code but was unable to find any:
hbase.table.archive.directory
.archive
Per-table directo
I am seeing a weird issue where ZK points to "primarymaster" (hostname)
as the ROOT region. This host doesn't exist. Everything was working OK until
I ran truncate on a few tables. Does anyone know what might be the issue?
Yes Ted, we have been observing the Thrift API to clearly outperform the native
Java HBase API at higher loads, due to its binary communication protocol.
Tariq, the specs of the machine on which we are performing these tests are as
given below.
Processor : i3770K, 8 logical cores (4 physical, with 2 logic
> With very large heaps and a GC that can handle them (perhaps the G1 GC),
another option which might be worth experimenting with is a value (KV)
cache independent of the block cache which could be enabled on a per-table
basis
Thanks Andy for bringing this up. We've had some discussions some time a
Hi,
I am running HBase 0.94.6 with Hadoop 2.0.2-alpha. The log keeps printing
the following message:
2013-03-25 19:32:15,469 DEBUG
org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 1 catalog row(s) and
gc'd 0 unreferenced parent region(s)
2013-03-25 19:33:05,683 DEBUG org.apache.hadoop.hbase
Hi Enis,
Good ideas! And the HBase community is driving these two items:
1) [HBASE-7404]: L1/L2 block cache
2) [HBASE-5263] Preserving cached data on compactions through cache-on-write
Thanks a lot
Liyin
From: Enis Söztutar [enis@gmail.com]
Sent: Monday,
bq. These records are being written using batch mutation with thrift API
This is important information, I think.
Batch mutation through the Java API would incur lower overhead.
On Mon, Mar 25, 2013 at 11:40 AM, Pankaj Misra
wrote:
> Firstly, Thanks a lot Jean and Ted for your extended help, very
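To make that concrete, a minimal illustrative sketch of batch mutation through the 0.94-era Java client; the table name, family and row keys below are hypothetical:
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchPutExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "test_table");   // hypothetical table
    table.setAutoFlush(false);                       // buffer writes client-side
    List<Put> batch = new ArrayList<Put>();
    for (int i = 0; i < 1000; i++) {
      Put put = new Put(Bytes.toBytes(String.format("row-%06d", i)));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("c1"), Bytes.toBytes("value-" + i));
      batch.add(put);
    }
    table.put(batch);        // puts are grouped per region server, not sent one by one
    table.flushCommits();
    table.close();
  }
}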
Hello Pankaj,
What is the configuration which you are using? Also, the H/W specs?
Maybe tuning some of these would make things faster. Although the amount
of data being inserted is small, the amount of metadata being generated
would be higher. Now, you have to generate the key+qualifier+TS triplet
> With very large heaps, maybe keeping around the compressed blocks in a
secondary cache makes sense?
That's an interesting idea.
> A compaction will trash the cache. But maybe we can track which keyvalues (inside
cached blocks) are cached for the files in the compaction, and mark the
blocks of the res
Firstly, thanks a lot Jean and Ted for your extended help; I very much appreciate
it.
Yes Ted, I am writing to all the 40 columns, and the 1.5 KB of record data is
distributed across these columns.
Jean, some columns are storing as little as a single-byte value, while a few of
the columns are storing as
I just ran some LoadTest to see if I can reproduce that.
bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 4:512:100
-num_keys 100
13/03/25 14:18:25 INFO util.MultiThreadedAction: [W:100] Keys=997172,
cols=3,8m, time=00:03:55 Overall: [keys/s= 4242, latency=23 ms]
Current: [keys/s=441
Final clarification:
bq. I am writing 1.5 KB of data per row across 40 columns.
So your schema is not sparse - you were writing to (all) 40 columns in the
second case.
Thanks
On Mon, Mar 25, 2013 at 11:03 AM, Pankaj Misra
wrote:
> Yes Ted, you are right, we are having table regions pre-split, and w
Thanks Liyin for sharing your use cases.
Related to those, I was thinking of two improvements:
- AFAIK, MySQL keeps the compressed and uncompressed versions of the blocks
in its block cache, failing over to the compressed one if the decompressed one
gets evicted. With very large heaps, maybe keeping arou
Yes Ted, you are right, we have the table regions pre-split, and we see that
both regions are almost evenly filled in both tests.
This does not seem to be a regression though, since we were getting good write
rates when we had a smaller number of columns.
Thanks and Regards
Pankaj Misra
__
Copying Ankit who raised the same question soon after Pankaj's initial
question.
On one hand I wonder if this was a regression in 0.94.5 (though unlikely).
Did the region servers receive (relatively) the same write load in the second
test case? I assume you have pre-split your tables in both cases.
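For reference, a minimal illustrative sketch of creating a pre-split table through the 0.94-era Java admin API; the table name, family and split keys below are hypothetical:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = new HTableDescriptor("test_table");  // hypothetical table
    desc.addFamily(new HColumnDescriptor("cf"));
    // Explicit split points so both test runs start from the same region layout
    // and receive a comparable write distribution.
    byte[][] splitKeys = new byte[][] {
        Bytes.toBytes("4"), Bytes.toBytes("8"), Bytes.toBytes("c")
    };
    admin.createTable(desc, splitKeys);
    admin.close();
  }
}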
Hi Ted,
Sorry for missing that detail, we are using HBase version 0.94.5
Regards
Pankaj Misra
From: Ted Yu [yuzhih...@gmail.com]
Sent: Monday, March 25, 2013 10:29 PM
To: user@hbase.apache.org
Subject: Re: HBase Writes With Large Number of Columns
If yo
Hi All,
I am writing records into HBase. I ran a performance test on the following
two cases:
Set 1: The input record contains 26 columns and the record size is 2 KB.
Set 2: The input record contains 1 column and the record size is 2 KB.
In the second case I am getting 8 MBps more throughput than in Set 1.
Are the large n
If you give us the version of HBase you're using, that would give us some
more information to help you.
Cheers
On Mon, Mar 25, 2013 at 9:55 AM, Pankaj Misra wrote:
> Hi,
>
> The issue that I am facing is around the performance drop of Hbase, when I
> was having 20 columns in a column family Vs n
Hi,
The issue that I am facing is a performance drop in HBase between when I was
using 20 columns in a column family vs. now, when I am using 40 columns in a
column family. The number of columns has doubled and the ingestion/write speed
has also dropped by half. I am writing 1.5 KB of data p
Just out of curiosity...
Why do you want to run the job on Cluster A that reads from Cluster B but
writes to Cluster A?
Wouldn't it be easier to run the job on Cluster B and, inside Mapper.setup(),
create your own configuration for your second cluster for output?
On Mar 24, 2013, at 7
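A rough, untested sketch of that Mapper.setup() approach against the 0.94-era client API; the ZooKeeper quorum and the output table name below are placeholders:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;

public class CrossClusterMapper extends TableMapper<NullWritable, NullWritable> {
  private HTable outputTable;

  @Override
  protected void setup(Context context) throws IOException {
    // Build a separate configuration that points at the *other* cluster's quorum,
    // so the job reads from the local cluster but writes remotely.
    Configuration outConf = HBaseConfiguration.create();
    outConf.set("hbase.zookeeper.quorum", "zk1.cluster-a,zk2.cluster-a");  // placeholder hosts
    outputTable = new HTable(outConf, "output_table");                     // placeholder table
  }

  @Override
  protected void map(ImmutableBytesWritable key, Result value, Context context)
      throws IOException {
    // Copy the row as-is to the remote cluster; nothing is emitted to the job output.
    Put put = new Put(key.get());
    for (KeyValue kv : value.raw()) {
      put.add(kv);
    }
    outputTable.put(put);
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    outputTable.close();
  }
}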
I think the problem is that Wei has been reading some stuff in blogs and that's
why he has such a large region size to start with.
So if he manually splits the regions, drops the region size to something more
appropriate...
Or if he unloads the table, drops the table, recreates the table with a
What version of HBase are you using ?
Did you enable short circuit read in hadoop ?
Thanks
On Mon, Mar 25, 2013 at 4:52 AM, Dhanasekaran Anbalagan
wrote:
> Hi Guys,
>
> I have a problem with the HBase server; it says java.lang.OutOfMemoryError:
> Direct buffer memory
> I am new to HBase. How to solve th
The block cache is for uncompressed data, while the OS page cache contains the
compressed data. Unless the request pattern is a full-table sequential scan, the
block cache is still quite useful. I think the size of the block cache should be the
amount of hot data we want to retain within a compaction cycle, which
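For completeness, the knob that sizes the block cache on the region servers is hfile.block.cache.size (a fraction of the heap, normally set in hbase-site.xml); a minimal sketch with a purely illustrative value:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class BlockCacheSizeExample {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Fraction of the region server heap given to the block cache; 0.4 is only an
    // example -- size it to the hot (uncompressed) data you want to keep resident
    // between compactions.
    conf.setFloat("hfile.block.cache.size", 0.4f);
    System.out.println(conf.getFloat("hfile.block.cache.size", -1f));
  }
}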
Hi Guys,
I have a problem with the HBase server; it says java.lang.OutOfMemoryError:
Direct buffer memory
I am new to HBase. How do I solve this issue?
This is my stack trace:
http://paste.ubuntu.com/5646088/
-Dhanasekaran.
Did I learn something today? If not, I wasted it.
Increments are not idempotent, so yes, you will double-increment the set of
increments that succeeded in the first attempt(s). If you care about that,
you're better off not using the Increment interface and instead having 2
jobs: one that does a Get of the current value and adds the offset, then
pass
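To illustrate the read-then-conditional-write idea (the mail above describes a two-job setup; the single-client sketch below, with hypothetical table, row and column names, only shows how a retried attempt avoids applying the offset twice):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadAddConditionalWrite {
  public static void main(String[] args) throws Exception {
    byte[] row = Bytes.toBytes("counter-row");   // hypothetical row
    byte[] cf = Bytes.toBytes("cf");
    byte[] qual = Bytes.toBytes("count");
    long offset = 5L;

    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "counters");  // hypothetical table

    // Read the current value, add the offset, then write back only if the cell
    // still holds the value we read: a retried attempt that runs after the
    // update fails the check instead of double-applying the offset.
    Result r = table.get(new Get(row));
    byte[] current = r.getValue(cf, qual);
    long base = (current == null) ? 0L : Bytes.toLong(current);
    Put put = new Put(row);
    put.add(cf, qual, Bytes.toBytes(base + offset));
    boolean applied = table.checkAndPut(row, cf, qual, current, put);
    System.out.println("applied=" + applied);
    table.close();
  }
}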
Hi everyone,
When I launch my MapReduce jobs to increment counters in HBase, I sometimes
have maps with multiple attempts like:
attempt_201303251722_0161_m_74_0
attempt_201303251722_0161_m_74_1
If there are multiple attempts running and the first one completes
successfully,
My question is this. If a compaction fails due to a regionserver loss
mid-compaction, does the regionserver that picks up the region continue
where the first left off? Or does it have to start from scratch?
-> The answer to this is that it starts from the beginning again.
Regards
Ram
On Mon, Mar 25,
Everyone,
I recently had a couple compactions, minors that were promoted to
majors, take 8 and 10 minutes each. I eventually killed the
regionserver underneath them as I'd never seen compactions last that
long before. In looking through the logs from the regionserver that was
killed and wat
Hi Wei,
Have you looked at MAX_FILESIZE? If your table is 1 TB in size, and you
have 10 RS and want 12 regions per server, you can set this to
1TB/(10x12), roughly 8.5 GB, and you will get at least that many regions (and even a
bit more).
JM
2013/3/25 Lu, Wei :
> We are facing big region size but small region num
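A small illustrative sketch of that arithmetic, setting MAX_FILESIZE at table creation time; the table and family names are hypothetical:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class MaxFileSizeExample {
  public static void main(String[] args) throws Exception {
    // 1 TB spread over 10 region servers x 12 regions each is roughly 8.5 GB per region.
    long maxFileSize = (1024L * 1024 * 1024 * 1024) / (10 * 12);

    HTableDescriptor desc = new HTableDescriptor("big_table");  // hypothetical table
    desc.addFamily(new HColumnDescriptor("cf"));
    desc.setMaxFileSize(maxFileSize);   // regions split once they grow past this size

    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.createTable(desc);
    admin.close();
  }
}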
@Aaron,
You said you're using a salt, which would imply that your number is random and
not derived from the base key (where the base key is the key prior to being
hashed).
Is that the case, or do you mean that Kiji is taking the first two bytes of the
hash and prepending it to the key?
On M
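For illustration only, a minimal sketch of a hash-derived prefix (as opposed to a random salt), which is roughly the scheme the question is asking about; the class and key names are made up:
import java.security.MessageDigest;
import org.apache.hadoop.hbase.util.Bytes;

public class HashPrefixKey {
  // The first two bytes of the MD5 of the base key are prepended, so the prefix
  // can always be recomputed from the base key itself (unlike a random salt).
  public static byte[] prefixedKey(byte[] baseKey) throws Exception {
    byte[] md5 = MessageDigest.getInstance("MD5").digest(baseKey);
    return Bytes.add(new byte[] { md5[0], md5[1] }, baseKey);
  }

  public static void main(String[] args) throws Exception {
    System.out.println(Bytes.toStringBinary(prefixedKey(Bytes.toBytes("user-12345"))));
  }
}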
We are facing a big region size but a small region count for a table: 10 region
servers, each with only one region of size over 10 GB, while the map slot count of
each task tracker is 12. We are planning to ‘split’ the start/stop key ranges of the
large table regions into more map tasks, so that we can make better use of m
client /107.108.188.11:38371
2013-03-25 12:09:55,213 INFO
org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket
connection from /107.108.188.11:38373
2013-03-25 12:09:55,214 INFO org.apache.zookeeper.server.ZooKeeperServer:
Client attempting to establish new session at /107.108.188.11:3