I tried flushing the table, not a specific region.
-eran
OK, I tried it, truncated the table and ran inserts for about a day. Now I
tried flushing the table but I get a "Region is not online" error, although
all the servers are up, no regions are in transition and as far as I can
tell all the regions seem up. I can even read rows which are supposedly in
J-D,
I'll try what you suggest, but it is worth pointing out that my data set has
over 300M rows; however, in my read test I am random-reading out of a subset
that contains only 0.5M rows (5000 rows in each of the 100 key ranges in the
table).
-eran
On Tue, May 3, 2011 at 23:29, Jean-Daniel Cryans wrote:
On Tue, May 3, 2011 at 6:20 AM, Eran Kutner wrote:
> Flushing, at least when I try it now, long after I stopped writing, doesn't
> seem to have any effect.
Bummer.
Flushing, at least when I try it now, long after I stopped writing, doesn't
seem to have any effect.
In my log I see this:
2011-05-03 08:57:55,384 DEBUG
org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=3.39 GB,
free=897.87 MB, max=4.27 GB, blocks=54637, accesses=89411811, hits=7576
That is only 7,576 cache hits out of roughly 89.4 million accesses, a hit
ratio below 0.01%.
It might be the slow memstore issue... after inserting your dataset
issue a flush on your table in the shell, wait a few seconds, then
start reading. Someone else on the mailing list recently saw this type
of issue.
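A minimal sketch of issuing that flush with the 0.90-era Java client
("mytable" is a placeholder for the real table name; the shell equivalent
is simply flush 'mytable'):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HBaseAdmin;

  public class FlushTable {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
      HBaseAdmin admin = new HBaseAdmin(conf);
      // Asks the region servers to write the table's memstores out to HFiles;
      // equivalent to "flush 'mytable'" in the HBase shell.
      admin.flush("mytable");
    }
  }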
Regarding the block cache logging, here's what I see in my logs:
2011-05-02 10:
Since the attachment didn't make it, here it is again:
http://shortText.com/jp73moaesx
-eran
Hi Josh,
The connection pooling code is attached AS IS (with all the usual legal
disclaimers); note that you will have to modify it a bit to get it to
compile because it depends on some internal libraries we use. In particular,
DynamicAppSettings and Log are two internal classes that do what their
names suggest.
I must say, the more I play with it the more baffled I am by the
results. I ran the read test again today after not touching the
cluster for a couple of days and now I'm getting the same high read
numbers (10-11K reads/sec per server with some server reaching even
15K r/s) if I read 1, 10, 100 or
Hey! I'm using C#->Hbase and high on my list
On Thu, Apr 21, 2011 at 5:13 AM, Eran Kutner wrote:
> I tested again on a clean table using 100 insert threads each, using a
> separate keyspace within the test table. Every row had just one column
> with 128 bytes of data.
>
> With one server and one region I got about 2300 inserts per second.
Hi J-D,
I don't think it's a Thrift issue. First, I use the TBufferedTransport
transport, second, I implemented my own connection pool so the same
connections are reused over and over again, so there is no overhead
for opening and closing connections (I've verified that using
Wireshark), third, if
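Eran's pool is in C#, but the pattern described here — open each Thrift
socket once up front and hand the same connected clients out repeatedly —
can be sketched in Java roughly like this (class name, host, and pool size
are illustrative, not his actual code):

  import java.util.concurrent.LinkedBlockingQueue;
  import org.apache.hadoop.hbase.thrift.generated.Hbase;
  import org.apache.thrift.protocol.TBinaryProtocol;
  import org.apache.thrift.transport.TSocket;

  public class ThriftClientPool {
    private final LinkedBlockingQueue<Hbase.Client> pool =
        new LinkedBlockingQueue<Hbase.Client>();

    public ThriftClientPool(String host, int port, int size) throws Exception {
      for (int i = 0; i < size; i++) {
        TSocket socket = new TSocket(host, port);
        socket.open(); // connect once; no per-request open/close overhead
        pool.add(new Hbase.Client(new TBinaryProtocol(socket)));
      }
    }

    public Hbase.Client take() throws InterruptedException { return pool.take(); }
    public void release(Hbase.Client client) { pool.add(client); }
  }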
Hey Eran,
Glad you could go back to debugging performance :)
The scalability issues you are seeing are unknown to me, it sounds
like the client isn't pushing it enough. It reminded me of when we
switched to using the native Thrift PHP extension instead of the
"normal" one and we saw huge speedups
Hi J-D,
After stabilizing the configuration, with your great help, I was able
to go back to the load tests. I tried using IRC, as you suggested,
to continue this discussion but because of the time difference (I'm
GMT+3) it is quite difficult to find a time when people are present
and I am available.
I assume the block cache tuning key you talk about is
"hfile.block.cache.size", right? If it is only 20% by default then
what is the rest of the heap used for? Since there are no fancy
operations like joins and since I'm not using memory tables the only
thing I can think of is the memstore, right?
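For context: besides the block cache (hfile.block.cache.size, 0.2 of the
heap by default in this era), the other big heap consumer is indeed the
memstores, capped by hbase.regionserver.global.memstore.upperLimit at 0.4
of the heap by default. The cache fraction is set in hbase-site.xml on the
region servers; the 0.4 below is only an illustrative value, not a
recommendation:

  <property>
    <name>hfile.block.cache.size</name>
    <!-- fraction of the region server heap used for the block cache;
         the default is 0.2 -->
    <value>0.4</value>
  </property>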
Inline.
J-D
> Hi J-D,
> I can't paste the entire file because it's 126K. Trying to attach it
> now as zip, let's see if that has more luck.
In the jstack you posted, all the Gets were hitting HDFS which is
probably why it's slow. Until you can get something like HDFS-347 in
your Hadoop you'll hav
Watch out when pre-splitting. Your key distribution may not be as uniform
as you might think. This particularly happens when keys are represented in
some printable form. Base64, for instance, only populates a small fraction
of the base-256 key space.
On Tue, Mar 29, 2011 at 10:54 AM, Jean-Daniel Cryans wrote:
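To make that caveat concrete, a hypothetical sketch: if the row keys are
Base64 text, split points should be drawn from the 64-character alphabet
actually in use rather than the full 0-255 byte range, or most regions will
never receive a row:

  public class Base64Splits {
    // Returns numRegions - 1 one-byte split keys spaced evenly over the
    // Base64 alphabet (listed here in ascending byte order).
    public static byte[][] splits(int numRegions) {
      String alphabet =
          "+/0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
      byte[][] splits = new byte[numRegions - 1][];
      for (int i = 1; i < numRegions; i++) {
        splits[i - 1] =
            new byte[] { (byte) alphabet.charAt(i * alphabet.length() / numRegions) };
      }
      return splits;
    }
  }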
Hey Eran,
Usually this mailing list doesn't accept attachments (or it works for
voodoo reasons) so you'd be better off pastebin'ing them.
Some thoughts:
- Inserting into a new table without pre-splitting it is bound to be a
red herring of bad performance. Please pre-split it with methods such
as the one sketched below.
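A minimal sketch of such a pre-split table creation, against the 0.90-era
Java API (table name, family name, and split values are placeholders; the
split keys would come from something like the alphabet-aware sketch earlier
in the thread):

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;
  import org.apache.hadoop.hbase.util.Bytes;

  public class PreSplit {
    public static void main(String[] args) throws Exception {
      HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
      HTableDescriptor desc = new HTableDescriptor("mytable");
      desc.addFamily(new HColumnDescriptor("f1"));
      // Create the table with explicit split keys so writes spread across
      // region servers from the start instead of hammering a single region.
      byte[][] splits = { Bytes.toBytes("10"), Bytes.toBytes("20"), Bytes.toBytes("30") };
      admin.createTable(desc, splits);
    }
  }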
Running the client on more than one server doesn't change the overall
results; the total number of requests just gets distributed across the
two clients.
I tried two things, inserting rows with one column each and inserting
rows with 100 columns each, in both cases the data was 1K per column,
so it
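For concreteness, the single-column variant of that workload looks roughly
like this through the native Java client (names are placeholders; the
actual tests in this thread go through Thrift from C#):

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class InsertTest {
    public static void main(String[] args) throws Exception {
      HTable table = new HTable(HBaseConfiguration.create(), "mytable");
      byte[] value = new byte[1024]; // 1KB of data per column
      for (int i = 0; i < 100000; i++) {
        Put put = new Put(Bytes.toBytes("row-" + i));
        put.add(Bytes.toBytes("f1"), Bytes.toBytes("c1"), value); // one column per row
        table.put(put);
      }
      table.close();
    }
  }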
This does sound pretty slow.
Using YCSB, I have seen insert rates of about 10,000 x 1kB records per
second with two
datanodes and one namenode using Hbase over HDFS. That isn't using thrift,
though.
On Mon, Mar 28, 2011 at 3:16 AM, Eran Kutner wrote:
> I started with a basic insert operation. Inserting rows with one
> column with 1KB of data each.
> Initially, when the table was empty I was getting around 300 inserts
> per second with 50 writing threads. Then, when the region split and a
> se
Hi,
I'm running some performance tests on a cluster with 5 member servers
(not counting the masters of all kinds), each node running a data
node, a region server and a thrift server.
Each server has 2 quad core CPUs and 16GB of RAM. The data set I'm
using is built of 50 sets of consecutive keys wit