Re: Performance test results

2011-05-09 Thread Eran Kutner
I tried flushing the table, not a specific region. -eran On Mon, May 9, 2011 at 20:03, Stack wrote: > On Mon, May 9, 2011 at 9:31 AM, Eran Kutner wrote: > > OK, I tried it, truncated the table and ran inserts for about a day. Now I tried flushing the table but I get a "Region is not on
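For reference, a flush can be triggered from the Java client as well as from the shell. A minimal sketch against the 0.90-era API follows; the table name "test_table" is a placeholder, and flush() accepts either a table name or a single region name, which is the distinction being made above.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class FlushTable {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);
            // flush() takes either a table name or an individual region name;
            // passing the table name asks every region of that table to flush its memstores.
            admin.flush("test_table");
        }
    }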

Re: Performance test results

2011-05-09 Thread Stack
On Mon, May 9, 2011 at 9:31 AM, Eran Kutner wrote: > OK, I tried it, truncated the table and ran inserts for about a day. Now I tried flushing the table but I get a "Region is not online" error, although all the servers are up, no regions are in transition and as far as I can tell all the re

Re: Performance test results

2011-05-09 Thread Eran Kutner
OK, I tried it, truncated the table and ran inserts for about a day. Now I tried flushing the table but I get a "Region is not online" error, although all the servers are up, no regions are in transition and as far as I can tell all the regions seem up. I can even read rows which are supposedly in

Re: Performance test results

2011-05-04 Thread Eran Kutner
J-D, I'll try what you suggest, but it is worth pointing out that my data set has over 300M rows; however, in my read test I am randomly reading out of a subset that contains only 0.5M rows (5,000 rows in each of the 100 key ranges in the table). -eran On Tue, May 3, 2011 at 23:29, Jean-Daniel Cryan
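The read pattern described here, random gets confined to a small hot subset of a much larger table, can be sketched roughly as follows. The table name, column family, and zero-padded key format are placeholders, not the ones used in the actual test.

    import java.util.Random;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RandomReadSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "test_table");   // placeholder table name
            Random rnd = new Random();
            for (int i = 0; i < 100000; i++) {
                // Pick one of the 100 key ranges, then one of the 5,000 "hot" rows inside it.
                int range = rnd.nextInt(100);
                int offset = rnd.nextInt(5000);
                Get get = new Get(Bytes.toBytes(String.format("%03d-%08d", range, offset)));
                get.addFamily(Bytes.toBytes("cf"));          // placeholder column family
                Result r = table.get(get);
            }
            table.close();
        }
    }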

Re: Performance test results

2011-05-03 Thread Jean-Daniel Cryans
On Tue, May 3, 2011 at 6:20 AM, Eran Kutner wrote: > Flushing, at least when I try it now, long after I stopped writing, doesn't seem to have any effect. Bummer. > In my log I see this: 2011-05-03 08:57:55,384 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=3.39 GB

Re: Performance test results

2011-05-03 Thread Eran Kutner
Flushing, at least when I try it now, long after I stopped writing, doesn't seem to have any effect. In my log I see this: 2011-05-03 08:57:55,384 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=3.39 GB, free=897.87 MB, max=4.27 GB, blocks=54637, accesses=89411811, hits=7576
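The LRU Stats counters are cumulative, so the block cache hit ratio is simply hits divided by accesses. A trivial illustration; the hits value below is made up, since the real number is cut off in the excerpt above.

    public class LruHitRatio {
        public static void main(String[] args) {
            // Hypothetical counters read off an "LRU Stats" log line.
            long accesses = 89411811L;
            long hits = 75000000L;   // placeholder; the real value is truncated above
            System.out.printf("block cache hit ratio = %.2f%%%n", 100.0 * hits / accesses);
        }
    }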

Re: Performance test results

2011-05-02 Thread Jean-Daniel Cryans
It might be the slow memstore issue... after inserting your dataset, issue a flush on your table in the shell, wait a few seconds, then start reading. Someone else on the mailing list recently saw this type of issue. Regarding the block caching logging, here's what I see in my logs: 2011-05-02 10:

Re: Performance test results

2011-04-27 Thread Eran Kutner
Since the attachment didn't make it, here it is again: http://shortText.com/jp73moaesx -eran On Wed, Apr 27, 2011 at 16:51, Eran Kutner wrote: > Hi Josh, The connection pooling code is attached AS IS (with all the usual legal disclaimers), note that you will have to modify it a bit to ge

Re: Performance test results

2011-04-27 Thread Eran Kutner
Hi Josh, The connection pooling code is attached AS IS (with all the usual legal disclaimers); note that you will have to modify it a bit to get it to compile because it depends on some internal libraries we use. In particular, DynamicAppSettings and Log are two internal classes that do what their
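The pooling idea itself is easy to sketch. Below is a minimal Java analogue of a fixed-size pool of Thrift Hbase.Client connections, not the C# code attached in the thread; host, port, and pool size are whatever the caller supplies.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;
    import org.apache.hadoop.hbase.thrift.generated.Hbase;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    // Minimal fixed-size pool of Thrift connections to an HBase thrift server.
    public class ThriftClientPool {

        // One pooled connection: the generated client plus the transport that backs it.
        public static final class PooledClient {
            final TTransport transport;
            final Hbase.Client client;
            PooledClient(TTransport transport, Hbase.Client client) {
                this.transport = transport;
                this.client = client;
            }
        }

        private final BlockingQueue<PooledClient> pool;

        public ThriftClientPool(String host, int port, int size) throws Exception {
            pool = new LinkedBlockingQueue<PooledClient>(size);
            for (int i = 0; i < size; i++) {
                TTransport transport = new TSocket(host, port);
                transport.open();
                pool.put(new PooledClient(transport, new Hbase.Client(new TBinaryProtocol(transport))));
            }
        }

        // Blocks until a connection is free; callers must hand it back with release().
        public PooledClient borrow() throws InterruptedException {
            return pool.take();
        }

        public void release(PooledClient c) throws InterruptedException {
            pool.put(c);
        }

        public void shutdown() {
            PooledClient c;
            while ((c = pool.poll()) != null) {
                c.transport.close();
            }
        }
    }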

Re: Performance test results

2011-04-27 Thread Eran Kutner
I must say, the more I play with it the more baffled I am by the results. I ran the read test again today after not touching the cluster for a couple of days and now I'm getting the same high read numbers (10-11K reads/sec per server, with some servers even reaching 15K r/s) if I read 1, 10, 100 or

Re: Performance test results

2011-04-26 Thread Josh
On Tue, Apr 26, 2011 at 3:34 AM, Eran Kutner wrote: > Hi J-D, I don't think it's a Thrift issue. First, I use the TBufferedTransport transport; second, I implemented my own connection pool so the same connections are reused over and over again. Hey! I'm using C#->Hbase and high on my list

Re: Performance test results

2011-04-26 Thread Stack
On Thu, Apr 21, 2011 at 5:13 AM, Eran Kutner wrote: > I tested again on a clean table using 100 insert threads each, using a separate keyspace within the test table. Every row had just one column with 128 bytes of data. With one server and one region I got about 2300 inserts per second.

Re: Performance test results

2011-04-26 Thread Eran Kutner
Hi J-D, I don't think it's a Thrift issue. First, I use the TBufferedTransport transport; second, I implemented my own connection pool so the same connections are reused over and over again, so there is no overhead for opening and closing connections (I've verified that using Wireshark); third, if

Re: Performance test results

2011-04-21 Thread Jean-Daniel Cryans
Hey Eran, Glad you could go back to debugging performance :) The scalability issues you are seeing are unknown to me; it sounds like the client isn't pushing it hard enough. It reminded me of when we switched to using the native Thrift PHP extension instead of the "normal" one and we saw huge speedups

Re: Performance test results

2011-04-21 Thread Eran Kutner
Hi J-D, After stabilizing the configuration, with your great help, I was able to go back to the load tests. I tried using IRC, as you suggested, to continue this discussion, but because of the time difference (I'm GMT+3) it is quite difficult to find a time when people are present and I am avail

Re: Performance test results

2011-03-31 Thread Jean-Daniel Cryans
Inline. J-D > I assume the block cache tuning key you talk about is "hfile.block.cache.size", right? If it is only 20% by default, then what is the rest of the heap used for? Since there are no fancy operations like joins, and since I'm not using memory tables, the only thing I can think of

Re: Performance test results

2011-03-31 Thread Eran Kutner
I assume the block cache tuning key you talk about is "hfile.block.cache.size", right? If it is only 20% by default, then what is the rest of the heap used for? Since there are no fancy operations like joins, and since I'm not using memory tables, the only thing I can think of is the memstore, right?
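Roughly, the region server heap is divided between the block cache (hfile.block.cache.size, 20% by default in this era) and the memstores (hbase.regionserver.global.memstore.upperLimit, 40% by default), with the remainder left for block indexes, in-flight requests, and general JVM overhead. A small sketch that just reads those two knobs from the configuration; the values printed are whatever the site files specify, falling back to the stock defaults shown.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class HeapKnobs {
        public static void main(String[] args) {
            Configuration conf = HBaseConfiguration.create();
            // Fraction of the region server heap given to the HFile block cache (default 0.2).
            float blockCache = conf.getFloat("hfile.block.cache.size", 0.2f);
            // Upper bound on the fraction of the heap all memstores together may use (default 0.4).
            float memstoreUpper = conf.getFloat("hbase.regionserver.global.memstore.upperLimit", 0.4f);
            System.out.println("block cache fraction: " + blockCache);
            System.out.println("global memstore upper limit: " + memstoreUpper);
        }
    }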

Re: Performance test results

2011-03-29 Thread Jean-Daniel Cryans
Inline. J-D > Hi J-D, I can't paste the entire file because it's 126K. Trying to attach it now as a zip, let's see if that has more luck. In the jstack you posted, all the Gets were hitting HDFS, which is probably why it's slow. Until you can get something like HDFS-347 in your Hadoop you'll hav

Re: Performance test results

2011-03-29 Thread Ted Dunning
Watch out when pre-splitting. Your key distribution may not be as uniform as you might think. This particularly happens when keys are represented in some printable form. Base 64, for instance, only populates a small fraction of the base-256 key space. On Tue, Mar 29, 2011 at 10:54 AM, Jean-Danie

Re: Performance test results

2011-03-29 Thread Jean-Daniel Cryans
Hey Eran, Usually this mailing list doesn't accept attachments (or it works for voodoo reasons) so you'd be better off pastebin'ing them. Some thoughts: - Inserting into a new table without pre-splitting it is bound to be a red herring of bad performance. Please pre-split it with methods such a
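One way to pre-split at table-creation time with the 0.90-era admin API is sketched below. The table and family names are placeholders, and the split points assume zero-padded numeric keys like the ones in this test; a Base64 or hashed key scheme needs split points drawn from its own alphabet, per Ted Dunning's caveat above.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PreSplitTable {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);

            HTableDescriptor desc = new HTableDescriptor("test_table");  // placeholder name
            desc.addFamily(new HColumnDescriptor("cf"));                 // placeholder family

            // One split point per key-range boundary, so each of the 100 ranges
            // starts out in its own region. The "%03d-" prefix mirrors zero-padded
            // numeric keys; other key encodings need different split points.
            byte[][] splits = new byte[99][];
            for (int i = 1; i < 100; i++) {
                splits[i - 1] = Bytes.toBytes(String.format("%03d-", i));
            }
            admin.createTable(desc, splits);
        }
    }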

Re: Performance test results

2011-03-29 Thread Eran Kutner
Running the client on more than one server doesn't change the overall results; the total number of requests just gets distributed across the two clients. I tried two things: inserting rows with one column each and inserting rows with 100 columns each. In both cases the data was 1K per column, so it

Re: Performance test results

2011-03-28 Thread Ted Dunning
This does sound pretty slow. Using YCSB, I have seen insert rates of about 10,000 x 1kB records per second with two datanodes and one namenode using Hbase over HDFS. That isn't using thrift, though. On Mon, Mar 28, 2011 at 3:16 AM, Eran Kutner wrote: > I started with a basic insert operation.

Re: Performance test results

2011-03-28 Thread Stack
On Mon, Mar 28, 2011 at 3:16 AM, Eran Kutner wrote: > I started with a basic insert operation, inserting rows with one column with 1KB of data each. Initially, when the table was empty, I was getting around 300 inserts per second with 50 writing threads. Then, when the region split and a se

Performance test results

2011-03-28 Thread Eran Kutner
Hi, I'm running some performance tests on a cluster with 5 member servers (not counting the masters of all kinds), each node running a data node, a region server, and a Thrift server. Each server has two quad-core CPUs and 16GB of RAM. The data set I'm using is built from 50 sets of consecutive keys wit
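For comparison with the Thrift path used in these tests, here is a minimal sketch of an insert loop against the native Java client, with the client-side write buffer enabled so puts are batched rather than sent one RPC at a time. The table name, column family, key format, buffer size, and row count are placeholders; the 1KB value and 50 writer threads mirror the test described above.

    import java.util.Random;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class InsertSketch implements Runnable {
        private final Configuration conf = HBaseConfiguration.create();
        private final int threadId;

        InsertSketch(int threadId) { this.threadId = threadId; }

        public void run() {
            try {
                HTable table = new HTable(conf, "test_table");   // placeholder table name
                table.setAutoFlush(false);                       // buffer puts client-side
                table.setWriteBufferSize(2 * 1024 * 1024);       // 2 MB buffer, arbitrary
                byte[] value = new byte[1024];                   // 1 KB per column
                new Random().nextBytes(value);
                for (int i = 0; i < 100000; i++) {
                    Put put = new Put(Bytes.toBytes(String.format("%02d-%08d", threadId, i)));
                    put.add(Bytes.toBytes("cf"), Bytes.toBytes("c1"), value);
                    table.put(put);
                }
                table.flushCommits();                            // push whatever is still buffered
                table.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        public static void main(String[] args) throws Exception {
            for (int t = 0; t < 50; t++) {                       // 50 writer threads
                new Thread(new InsertSketch(t)).start();
            }
        }
    }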