RE: more regionservers does not improve performance

2012-10-11 Thread Pankaj Misra
OK, looks like I missed reading that part of your original mail. Did you try some of the compaction tweaks and configurations explained at the following link for your data? http://hbase.apache.org/book/regions.arch.html#compaction Also, how much data are you putting into the regions, an

RE: more regionservers does not improve performance

2012-10-11 Thread Jonathan Bishop
Pankaj, Thanks for the reply. Actually, I am using MD5 hashing to spread the keys evenly among the splits, so I don't believe there is any hotspot. In fact, when I monitor the HBase web UI I see a very even load on all the regionservers. Jon
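A minimal sketch of the MD5 approach Jon describes. It assumes the row key is the hex MD5 digest of a natural key and that the table is pre-split on evenly spaced two-hex-digit prefixes. The split points, helper names and "userN" keys are hypothetical, not Jon's actual code. A region holds keys from its start key (inclusive) up to the next region's start key.

```python
import bisect
import hashlib


def hashed_key(natural_key: str) -> str:
    """Row key = hex MD5 digest of the natural key (hypothetical key scheme)."""
    return hashlib.md5(natural_key.encode("utf-8")).hexdigest()


def split_points(num_regions: int) -> list:
    """Evenly spaced two-hex-digit start keys, e.g. ['40', '80', 'c0'] for 4 regions."""
    return [format(i * 256 // num_regions, "02x") for i in range(1, num_regions)]


def region_index(row_key: str, splits: list) -> int:
    """Region whose [startKey, endKey) range holds row_key (start keys are inclusive)."""
    return bisect.bisect_right(splits, row_key)


splits = split_points(4)
counts = [0] * (len(splits) + 1)
for i in range(10_000):
    counts[region_index(hashed_key(f"user{i}"), splits)] += 1
print(splits, counts)  # counts come out roughly equal, about 2500 per region
```

Because MD5 output is close to uniform, the per-region counts stay close to each other. This matches the even load Jon sees in the web UI.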

ANN: HBase 0.94.2 is available for download

2012-10-11 Thread lars hofhansl
The HBase Team is pleased to announce the release of HBase 0.94.2. Download it from your favorite Apache mirror [1]. HBase 0.94.2 is a bug fix release and has 117 issues resolved against it, including some performance improvements. 0.94.2 is the current stable release of HBase. All previous 0.92

Re: hmaster down cause by zookeeper?

2012-10-11 Thread 谢良
Hi Xiang, it's not the root cause. If you skim through the sendBuffer implementation in NIOServerCnxn.java, you'll find a catch statement at the end that logs every exception without rethrowing it. IMHO, the HBase master log file is the right place to dig into :)

Re: [Stand alone - distributed mode] HBase master isn't initializing completely

2012-10-11 Thread 谢良
Are there any WARNINGs/ERRORs in the HDFS log file? Please first make sure ZK and HDFS are healthy. Could you also tell us your HBase version? That would be great :) From: techbuddy [techbuddy...@gmail.com] Sent: 2012-10-12 5:11 To: user@hbase.apache.org Subject: [Stand alone

RE: more regionservers does not improve performance

2012-10-11 Thread Pankaj Misra
Hi Jonathan, It seems to me that, while the work is split across all 40 mappers, the keys are not randomized enough to take advantage of multiple regions and the pre-split strategy. This may be happening because all 40 mappers may be writing to a single region for some time, making it a
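Pankaj's point can be illustrated with a small simulation. It assumes a table pre-split on single hex-digit prefixes and a mapper that emits consecutive, zero-padded ids; both are hypothetical, not Jonathan's real schema. A batch of consecutive raw keys falls into one region, while the same batch hashed with MD5 touches every region.

```python
import bisect
import hashlib

# Hypothetical pre-split: 4 regions with start keys "", "4", "8", "c".
SPLITS = ["4", "8", "c"]


def region_of(row_key: str) -> int:
    """Index of the region whose key range contains row_key."""
    return bisect.bisect_right(SPLITS, row_key)


def regions_touched(keys) -> set:
    """Distinct regions that receive writes from one batch of keys."""
    return {region_of(k) for k in keys}


# One mapper emitting 1000 consecutive, zero-padded ids (hypothetical key scheme).
batch = [f"{n:010d}" for n in range(5000, 6000)]
raw = regions_touched(batch)
hashed = regions_touched(hashlib.md5(k.encode()).hexdigest() for k in batch)
print(len(raw), len(hashed))  # 1 4
```

The raw keys all start with "0", which sorts before "4", so every write lands in the first region. That is the kind of temporary hotspot Pankaj suspects.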

Re: Force the number of map tasks in MR?

2012-10-11 Thread Jean-Marc Spaggiari
Hi Bryan, J-D replied in another thread. The issue was a misconfiguration on the mapred side: I was only hitting the local job tracker, which is why only 2 tasks were running at a time. I reconfigured the cluster and it's now working very well. The next step is to build my own MapReduce for

Re: Force the number of map tasks in MR?

2012-10-11 Thread Bryan Beaudreault
JM, Are you trying to use TableInputFormat to scan HBase from MapReduce? If so, there is one map task per region, so with 25 regions you should have 25 map tasks. If only 2 are running at once, that's a problem with your Hadoop setup. Is your job running in a pool with only 2 slots available? If not TableInputFor

Re: NoSuchColumnFamilyException with rowcounter

2012-10-11 Thread Jean-Marc Spaggiari
I saw your message a bit too late ;) I connected to all the nodes one by one to restart the daemon. Now I can see 6 nodes in the Hadoop Map/Reduce Administration page! In the past I had only one, the master, and I thought that was normal. After restarting I had 10 nodes. So I have also restar

Re: NoSuchColumnFamilyException with rowcounter

2012-10-11 Thread Jean-Daniel Cryans
On Thu, Oct 11, 2012 at 2:46 PM, Jean-Marc Spaggiari wrote: > And that on my master: > hadoop@node3:~$ /usr/local/jdk1.7.0_05/bin/jps > 2219 NameNode > 2630 Jps > 30362 JobTracker > 2652 DataNode > 30273 TaskTracker > 2392 SecondaryNameNode Ah it's like I thought. > > I will update all the mapre

[Stand alone - distributed mode] HBase master isn't initializing completely

2012-10-11 Thread techbuddy
Hi, I have a standalone HBase cluster configured in distributed mode (i.e. ZK, the Master and the RegionServer all run in separate JVMs on the same host). The HBase master doesn't seem to be initializing successfully. This started happening after I encountered the stop-hbase script going int

Re: NoSuchColumnFamilyException with rowcounter

2012-10-11 Thread Jean-Marc Spaggiari
OK, I see. I have this on all the nodes: mapred.job.tracker = localhost:9001, dfs.datanode.max.xcievers = 4096. And this on my master: hadoop@node3:~$ /usr/local/jdk1.7.0_05/bin/jps 2219 NameNode 2630 Jps 30362 JobTracker 2652 DataNode 3027

Re: NoSuchColumnFamilyException with rowcounter

2012-10-11 Thread Jean-Daniel Cryans
Ok so you actually have 12M rows. One thing that surprised me in your config is: > mapred.job.tracker > localhost:9001 Is it the same config on every node? If so, and your master node also counts as a slave node (region server, datanode, tasktracker), then you probably only real
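J-D's question can be turned into a quick per-node check. Below is a minimal sketch that assumes you can read each node's mapred-site.xml as text; the helper names are hypothetical. In Hadoop 1 the default value of mapred.job.tracker is "local", which runs jobs in the LocalJobRunner. A value of localhost:9001 copied to every node makes each node's TaskTracker look for a JobTracker on its own host. Presumably it should name the master, which runs the JobTracker in JM's jps output (node3).

```python
import xml.etree.ElementTree as ET


def read_hadoop_conf(xml_text: str) -> dict:
    """Parse a Hadoop *-site.xml (<configuration><property>...) into {name: value}."""
    conf = {}
    for prop in ET.fromstring(xml_text).findall("property"):
        name = prop.findtext("name")
        if name:
            conf[name.strip()] = (prop.findtext("value") or "").strip()
    return conf


def job_tracker_warning(conf: dict):
    """Return a warning string if mapred.job.tracker cannot reach a shared JobTracker."""
    address = conf.get("mapred.job.tracker", "local")  # Hadoop 1 default is "local"
    if address == "local":
        return "mapred.job.tracker is 'local': jobs run in the LocalJobRunner"
    host = address.split(":")[0]
    if host in ("localhost", "127.0.0.1"):
        return f"mapred.job.tracker={address}: every node looks for a JobTracker on itself"
    return None


# The values JM posted from his mapred-site.xml.
sample = """<configuration>
  <property><name>mapred.job.tracker</name><value>localhost:9001</value></property>
  <property><name>dfs.datanode.max.xcievers</name><value>4096</value></property>
</configuration>"""
print(job_tracker_warning(read_hadoop_conf(sample)))
```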

Re: NoSuchColumnFamilyException with rowcounter

2012-10-11 Thread Jean-Marc Spaggiari
Thanks for your support. Here is the pastebin: http://pastebin.com/VM41hK9X And here is the config file: hadoop@node3:~/hadoop-1.0.3$ cat conf/mapred-site.xml mapred.job.tracker localhost:9001 dfs.datanode.max.xcievers 4096 mapre

Re: NoSuchColumnFamilyException with rowcounter

2012-10-11 Thread Jean-Daniel Cryans
On Thu, Oct 11, 2012 at 1:53 PM, Jean-Marc Spaggiari wrote: > 2 tasks at the same time, for a total of 25 tasks at the end. This really sounds like the local job runner. > > Maybe as you are saying, I'm not facing the good jobtracker? I'm > running the command line on the master server. What I

Re: NoSuchColumnFamilyException with rowcounter

2012-10-11 Thread Jean-Marc Spaggiari
2 tasks at the same time, for a total of 25 tasks in the end. Maybe, as you say, I'm not hitting the right jobtracker? I'm running the command line on the master server. If I look at the map tasks, I can see this: Input Split Locations /default-rack/node1 with different values depending on

Re: Force the number of map tasks in MR?

2012-10-11 Thread Jonathan Bishop
JM, The number of map tasks will be limited by the number of input splits available, assuming you are reading files, that is. Also, you need to restart your cluster for those settings to take effect. Hope this helps, Jon Bishop On Thu, Oct 11, 2012 at 1:44 PM, Jean-Marc Spaggiari < jean-m...@spa

Re: Force the number of map tasks in MR?

2012-10-11 Thread Kevin O'dell
Let's combine this with J-D's request and work off that thread. Can we follow up there on the LocalJobRunner question? On Thu, Oct 11, 2012 at 4:44 PM, Jean-Marc Spaggiari < jean-m...@spaggiari.org> wrote: > But this is the limit per tasktracker, right? > > And I have 6 node

Re: Force the number of map tasks in MR?

2012-10-11 Thread Jean-Marc Spaggiari
But this is the limit per tasktracker, right? And I have 6 nodes, so 6 tasktrackers, which means it should go up to 12 tasks? Take a look at 2.7 here: http://wiki.apache.org/hadoop/FAQ I just tried the setting below (changing 2 to 6) but I'm getting the same result. JM 2012/10/11 Kevin O'd
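JM's arithmetic can be checked with a quick sketch. It assumes every map slot is free and every TaskTracker has registered with the JobTracker. Concurrent map slots = TaskTrackers × mapred.tasktracker.map.tasks.maximum, and a job needs ceil(tasks / slots) waves. The 25 map tasks follow from the one-map-task-per-region behaviour mentioned elsewhere in this thread.

```python
import math


def map_slots(tasktrackers: int, max_maps_per_tracker: int) -> int:
    """Cluster-wide number of map tasks that can run at the same time."""
    return tasktrackers * max_maps_per_tracker


def map_waves(map_tasks: int, slots: int) -> int:
    """How many rounds of scheduling the job needs if every slot is free."""
    return math.ceil(map_tasks / slots)


# 25 regions -> 25 map tasks.
print(map_slots(6, 2), map_waves(25, map_slots(6, 2)))  # 12 3
print(map_slots(6, 6), map_waves(25, map_slots(6, 6)))  # 36 1
print(map_waves(25, 2))                                 # 13 (only 2 at a time)
```

Seeing 2 tasks at a time regardless of the per-TaskTracker setting suggests the job never reached the 6 TaskTrackers at all, which is where the thread ended up.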

Re: NoSuchColumnFamilyException with rowcounter

2012-10-11 Thread Jean-Daniel Cryans
2 tasks total, or 2 running at the same time? If the latter, it just means that you are using the local job tracker instead of your job tracker because HBase couldn't find your MR config. J-D On Thu, Oct 11, 2012 at 1:36 PM, Jean-Marc Spaggiari wrote: > Hi J-D, > > I have about 20M rows over 2

Re: Force the number of map tasks in MR?

2012-10-11 Thread Kevin O'dell
J-M, It should be in mapred-site.xml; the properties are mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum. This is the default in CDH4: mapreduce.tasktracker.map.tasks.maximum = 2, described as: The maximum number of map tasks that will be run simultaneously by a task tra

Re: NoSuchColumnFamilyException with rowcounter

2012-10-11 Thread Jean-Marc Spaggiari
Hi J-D, I have about 20M rows over 25 regions on 6 nodes. So that means I should see something like 6 tasks or even 25, right? And not just 2? Keys are 128 bytes long. Values are 1 byte. I also tried updating mapreduce.tasktracker.map.tasks.maximum but this is "the number of map tasks that should be

Re: NoSuchColumnFamilyException with rowcounter

2012-10-11 Thread Jean-Daniel Cryans
On Thu, Oct 11, 2012 at 1:20 PM, Jean-Marc Spaggiari wrote: > I'm now using this command line and it's working fine (except for the > number of tasks). > HADOOP_CLASSPATH=`/home/hbase/hbase-0.94.0/bin/hbase > classpath`:`/home/hadoop/hadoop-1.0.3/bin/hadoop classpath` > /home/hadoop/hadoop-1.0.3/b

Re: Force the number of map tasks in MR?

2012-10-11 Thread Jean-Marc Spaggiari
I don't know. I didn't touch that. Where can I find this information? 2012/10/11 Kevin O'dell : > What are your max tasks set to? > > On Thu, Oct 11, 2012 at 3:59 PM, Jean-Marc Spaggiari < > jean-m...@spaggiari.org> wrote: > >> Hi, >> >> Is there a way to force the number of map tasks in a MR?

Re: Force the number of map tasks in MR?

2012-10-11 Thread Kevin O'dell
What are your max tasks set to? On Thu, Oct 11, 2012 at 3:59 PM, Jean-Marc Spaggiari < jean-m...@spaggiari.org> wrote: > Hi, > > Is there a way to force the number of map tasks in a MR? > > I have a 25-region table split over 6 nodes. But the MR is running > the tasks only 2 by 2. > > Is there

Re: NoSuchColumnFamilyException with rowcounter

2012-10-11 Thread Jean-Marc Spaggiari
No, the line in the book is correct. I was trying to use another command line (see the first post). I think it was not working because of some deprecated _HOME variables. I'm now using this command line and it's working fine (except for the number of tasks). HADOOP_CLASSPATH=`/home/h

Re: NoSuchColumnFamilyException with rowcounter

2012-10-11 Thread Jean-Daniel Cryans
On Thu, Oct 11, 2012 at 1:09 PM, Stack wrote: > It doesn't work before the table name? Let us know J-M so we can > update "14.1.12. RowCounter" in the book. FWIW I did a test locally and it worked. That's also how I expect GenericOptionsParser to behave. J-D
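The argument-order behaviour J-D describes can be shown with a simplified model. This is not Hadoop's GenericOptionsParser code, only an illustration with a hypothetical helper name. Generic -D options are consumed only while they come before the first positional argument. Anything after the table name is passed through to RowCounter, which reads extra arguments as column names. That is how the "-D" column name in this thread appeared.

```python
def split_generic_options(args):
    """Simplified model of generic-option parsing: consume leading -D options and
    stop at the first non-option argument; everything after it is left untouched."""
    conf, i = {}, 0
    while i < len(args) and args[i].startswith("-D"):
        if args[i] == "-D":  # "-D key=value" form
            if i + 1 >= len(args):
                raise ValueError("-D needs a key=value argument")
            pair, i = args[i + 1], i + 2
        else:  # "-Dkey=value" form
            pair, i = args[i][2:], i + 1
        key, _, value = pair.partition("=")
        conf[key] = value
    return conf, args[i:]


# Arguments that follow the "rowcounter" program name in the thread's command line.
good = ["-Dhbase.client.scanner.caching=10", "work_proposed"]
bad = ["work_proposed", "-Dhbase.client.scanner.caching=10"]
print(split_generic_options(good))  # ({'hbase.client.scanner.caching': '10'}, ['work_proposed'])
print(split_generic_options(bad))   # ({}, ['work_proposed', '-Dhbase.client.scanner.caching=10'])
```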

Re: NoSuchColumnFamilyException with rowcounter

2012-10-11 Thread Stack
On Thu, Oct 11, 2012 at 10:43 AM, Jean-Marc Spaggiari wrote: > :( > > That's where the "-D" column name is coming :( > > I tried to move it to few places before and it was not working. That's > the only place where it's not crashing right from the launch. > > If you place it after the "rowcoun

Force the number of map tasks in MR?

2012-10-11 Thread Jean-Marc Spaggiari
Hi, Is there a way to force the number of map tasks in a MR? I have a 25-region table split over 6 nodes, but the MR is running the tasks only 2 at a time. Is there a way to force it to run one task on each regionserver serving at least one region? Why is the MR waiting for 2 tasks to complete b

RE: connect to the region from coprocessor

2012-10-11 Thread Wei Tan
Thank you Anoop and that is a very helpful suggestion. Best Regards, Wei From: Anoop Sam John To: "user@hbase.apache.org" , "hbase-u...@hadoop.apache.org" , Date: 10/10/2012 05:22 AM Subject:RE: connect to the region from coprocessor Hi To your prePut() method you a

Re: NoSuchColumnFamilyException with rowcounter

2012-10-11 Thread Jean-Marc Spaggiari
:( That's where the "-D" column name is coming from :( I tried moving it to a few places before and it didn't work. That's the only place where it doesn't crash right at launch. If you place it after the "rowcounter", it takes it as the table name. If you place it before, it takes it as

Re: Temporal in Hbase?

2012-10-11 Thread Shumin Wu
Anoop and Ramkrishna, Your answers combined solved my problem! I tried the approach this morning. Without writing my own custom filter, only 20+ lines of code did the job! Thanks for your help! Anoop: "FYI, a FilterList can contain another FilterList. So if you have a query like col1=? AND ( col2
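Anoop's tip about nesting can be modelled in a few lines. This is a toy Python model, not the HBase client API. In HBase the equivalent would be a FilterList with Operator.MUST_PASS_ALL that contains a second FilterList with Operator.MUST_PASS_ONE, typically holding SingleColumnValueFilter instances. The query below, col1 = a AND (col2 = b OR col3 = c), is a hypothetical example, not the one from the thread.

```python
from typing import Callable, Dict, List, Union

Row = Dict[str, str]
Predicate = Callable[[Row], bool]


class FilterListModel:
    """Toy model of a nested filter list: MUST_PASS_ALL (AND) / MUST_PASS_ONE (OR)."""

    def __init__(self, operator: str, filters: List[Union["FilterListModel", Predicate]]):
        if operator not in ("MUST_PASS_ALL", "MUST_PASS_ONE"):
            raise ValueError(operator)
        self.operator = operator
        self.filters = filters

    def matches(self, row: Row) -> bool:
        # Nested lists are evaluated recursively; plain predicates are called directly.
        results = (f.matches(row) if isinstance(f, FilterListModel) else f(row)
                   for f in self.filters)
        return all(results) if self.operator == "MUST_PASS_ALL" else any(results)


def column_equals(column: str, value: str) -> Predicate:
    return lambda row: row.get(column) == value


# Hypothetical query: col1 = 'a' AND (col2 = 'b' OR col3 = 'c')
query = FilterListModel("MUST_PASS_ALL", [
    column_equals("col1", "a"),
    FilterListModel("MUST_PASS_ONE", [column_equals("col2", "b"), column_equals("col3", "c")]),
])
print(query.matches({"col1": "a", "col3": "c"}))  # True
print(query.matches({"col1": "a", "col2": "x"}))  # False
```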

Re: NoSuchColumnFamilyException with rowcounter

2012-10-11 Thread Kevin O'dell
Jean-Marc, If you remove your -D flag, does your command run successfully? I always forget where this goes as well, but it should be one of these two: ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-0.94.1.jar -Dhbase.client.scanner.caching=10 rowcounter work_proposed ${HADOOP_HOME}/bin/hadoop

Re: MapReduce vs hosts (Cannot resolve the host name)

2012-10-11 Thread Jean-Marc Spaggiari
Hi St.Ack, Thanks for your reply. In the end, it seems this is not blocking the application. It's logged as "ERROR"s but it keeps running after that, and after some time I get some results. I don't have any DNS on my cluster. All servers have the same hosts file and static IPs. Is it mandator

NoSuchColumnFamilyException with rowcounter

2012-10-11 Thread Jean-Marc Spaggiari
Hi, When I try to run RowCounter, I get the error below: 12/10/11 13:09:58 INFO mapred.JobClient: Task Id : attempt_201209151131_0022_m_13_0, Status : FAILED org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyEx

Re: MapReduce vs hosts (Cannot resolve the host name)

2012-10-11 Thread Stack
On Thu, Oct 11, 2012 at 6:17 AM, Jean-Marc Spaggiari wrote: > Any idea where I can start to look at? > Sounds like DNS -- forward and reverse lookups -- work on the machine you are launching your job from but not out on your cluster. Check DNS on the cluster members? St.Ack
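St.Ack's suggestion can be run on each cluster member with a small sketch that uses only Python's standard socket module. It resolves a host name forward, then resolves the resulting IP back. With a shared hosts file and no DNS, as in JM's setup, both lookups should be answered from that file. The idea is to run it on every node for every host name in the file. A None in either position points at the missing lookup.

```python
import socket


def check_dns(hostname: str):
    """Forward-resolve hostname, then reverse-resolve the IP; return (ip, reverse_name).

    Either element is None when the corresponding lookup fails."""
    try:
        ip = socket.gethostbyname(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None, None
    try:
        reverse_name, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:  # socket.herror / socket.gaierror
        return ip, None
    return ip, reverse_name


print(check_dns("localhost"))
```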

Re: HBase 0.92.1 questions...

2012-10-11 Thread Jean-Daniel Cryans
Inline. J-D On Tue, Oct 9, 2012 at 11:51 AM, Kevin Lyda wrote: > In reading the docs I learned that hbck in 0.92.2 has some additional > -fix* options, and -fixAssignments and -fixMeta seem like they might > fix this. I also got the impression that one could run the 0.92.2 > version of hbck on a

string versus binary row ids - Co Processor aggregation performance?

2012-10-11 Thread hbase user
We are in the midst of a complete overhaul from MySQL to HBase. From what I can read, it really does not matter whether you use string increments or binary row ids. I have been reading a lot about hotspots in the cluster, but I was hoping someone could shed some light on the do's and d

Re: HBase Key Design : Doubt

2012-10-11 Thread Jean-Marc Spaggiari
No, you're right. But if you just want to keep "500" as the value, you just have to set the number of versions to 1 for your table. If you just want to keep 100, then you can insert with a reverse timestamp, so the last cell inserted will be hidden by the previous one. JM 2012/10/11, Narayanan
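A toy model of JM's two suggestions. It assumes a column family configured with a single version (VERSIONS => 1) and client-supplied timestamps. It is not HBase code and models only the ordering rule: the version with the highest timestamp is the one returned. Writing with Long.MAX_VALUE - ts gives earlier writes the higher timestamp, so the first value (100) stays visible. The millisecond timestamps 1000 and 2000 are hypothetical.

```python
LONG_MAX = 2**63 - 1  # Java Long.MAX_VALUE


def reverse_ts(ts_millis: int) -> int:
    """Reverse timestamp: later wall-clock times map to smaller values."""
    return LONG_MAX - ts_millis


class SingleVersionCell:
    """Toy model of one cell with VERSIONS => 1: only the highest timestamp is kept."""

    def __init__(self):
        self.version = None  # (timestamp, value)

    def put(self, value, timestamp: int):
        if self.version is None or timestamp > self.version[0]:
            self.version = (timestamp, value)

    def get(self):
        return None if self.version is None else self.version[1]


normal = SingleVersionCell()
normal.put(100, 1000)
normal.put(500, 2000)
print(normal.get())  # 500: the latest write wins

reverse = SingleVersionCell()
reverse.put(100, reverse_ts(1000))
reverse.put(500, reverse_ts(2000))
print(reverse.get())  # 100: the first write carries the higher timestamp
```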

Re: HBase Key Design : Doubt

2012-10-11 Thread Narayanan K
Hi, I have 2 column families A and B in table T1. put 'T1', 'R1', 'A:qualf1', 100 put 'T1', 'R1', 'B:qualf2', 200 As per my understanding, the above is one row with a single version for each of the 2 column families. If I do a put 'T1', 'R1', 'A:qualf1', 500, then there is another version for the ro

MapReduce vs hosts (Cannot resolve the host name)

2012-10-11 Thread Jean-Marc Spaggiari
Hi, I'm facing a small issue, most probably configuration related, that I'm not able to solve. I'm trying to run the rowcounter. Here is the command line: export HADOOP_HOME=/home/hadoop/hadoop-1.0.3/; export HBASE_HOME=/home/hbase/hbase-0.94.0/; HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpa

RE: HBase table - distinct values

2012-10-11 Thread Nitin Pawar
You could try defining a Hive table with the HBase storage handler and then querying it, though response time will be slow depending on how much data you have. On Oct 11, 2012 4:19 PM, wrote: > Hi Anoop, > Thanks a lot for your reply, > Actually our requirement is just to count the distinct deptno from emp >

RE: HBase table - distinct values

2012-10-11 Thread raviprasad . t
Hi Anoop, Thanks a lot for your reply. Actually, our requirement is just to count the distinct deptno values in emp (an HBase table). We are running various Pentaho jobs and we need to test the validity of the results; for that we need the query below. We need a query to select distinct deptno fro

Re: HBase table - distinct values

2012-10-11 Thread yutoo yanio
You can create a table keyed by deptno (or by any other value whose distinct values you need). A scan of this table shows the distinct values. On Thu, Oct 11, 2012 at 8:22 AM, Ramkrishna.S.Vasudevan < ramkrishna.vasude...@huawei.com> wrote: > Are you planning to use region splits? Can the rowkey have th
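A sketch of yutoo's suggestion. The extra table is modelled as a dictionary keyed by row key, and the emp rows and five-digit padding are hypothetical. A put to an existing row key lands on the same row, so the index table ends up with one row per distinct deptno. Counting its rows answers the "count distinct deptno" question. Zero-padding matters because HBase sorts row keys as bytes, so "9" would otherwise sort after "10".

```python
def build_distinct_index(emp_rows):
    """Model of a second table whose row key is deptno: writing the same key twice
    overwrites one row, so the table ends up with one row per distinct deptno."""
    index_table = {}
    for _empno, deptno in emp_rows:
        index_table[f"{deptno:05d}"] = b""  # the row key carries the information
    return sorted(index_table)  # a full scan returns row keys in sorted order


# Hypothetical (empno, deptno) rows from the emp table.
emp = [(7369, 20), (7499, 30), (7521, 30), (7782, 10), (7839, 10), (7900, 30)]
distinct = build_distinct_index(emp)
print(distinct, len(distinct))  # ['00010', '00020', '00030'] 3
```

In a real cluster the index table would be maintained as emp rows are written, so the distinct count becomes a scan over a small table instead of a scan over all of emp.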

RE: key design

2012-10-11 Thread Anoop Sam John
>we just search a user-id over a range of "time stamp" In that case you can go with your 1st approach IMO: "1.key=userid-timestamp and column:=content" >we have 200,000,000 user-id and i think user-id is good for lead position of >the key. is it ok? Yes it is... -Anoop- _
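A minimal sketch of Anoop's recommended key ("key=userid-timestamp"). It assumes numeric user ids and millisecond timestamps packed as fixed-width 8-byte big-endian values; the encoding is an assumption, since the thread doesn't specify one. Fixed-width big-endian encoding keeps byte order equal to numeric order. One user's time range is therefore a single contiguous [startRow, stopRow) scan.

```python
import struct


def row_key(user_id: int, ts_millis: int) -> bytes:
    """userid-timestamp row key: two fixed-width, big-endian unsigned 64-bit fields."""
    return struct.pack(">QQ", user_id, ts_millis)


def scan_bounds(user_id: int, start_ts: int, stop_ts: int):
    """(startRow, stopRow) for one user's rows with start_ts <= ts < stop_ts."""
    return row_key(user_id, start_ts), row_key(user_id, stop_ts)


# Hypothetical data: 3 users, 5 events each; HBase keeps rows sorted by key bytes.
table = sorted(row_key(u, t) for u in (7, 42, 300) for t in (100, 200, 300, 400, 500))
start, stop = scan_bounds(42, 200, 400)
hits = [struct.unpack(">QQ", k) for k in table if start <= k < stop]
print(hits)  # [(42, 200), (42, 300)]
```

Putting the user id first keeps each user's rows together. With 200,000,000 user ids, writes should still spread reasonably across regions as long as ids are not assigned in one hot, increasing sequence.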

Re: key design

2012-10-11 Thread yutoo yanio
We have 200,000,000 user-ids and I think user-id is good for the lead position of the key. Is it OK? What about search performance? Which approach gives better results? On Wed, Oct 10, 2012 at 11:21 PM, Shumin Wu wrote: > The Definitive Guide has a good discussion in Chapter 9 Tall-Narrow vs. > Flat-W

RE: How well does HBase run on low/medium memory/cpu clusters?

2012-10-11 Thread Anoop Sam John
>But perhaps I don't know enough. Is HBase typically CPU bound? Memory bound? Disk bound? I would say HBase (RegionServers) are more memory bound. -Anoop- From: David Parks [davidpark...@yahoo.com] Sent: Thursday, October 11, 2012 12:34 PM To: user@hbase.

RE: How well does HBase run on low/medium memory/cpu clusters?

2012-10-11 Thread David Parks
Ah, my question isn't about schema design. What exists as multiple tables in MySQL would probably become one table in HBase. My comment about "joining" a 7M and a 15M row table in MySQL is because of our daily "scan" to update that range of 7M rows. In MySQL, that's a CSV import followed by