Use a single cluster or two clusters for log analysis and HBase?

2011-11-28 Thread jingguo yao
I want to set up Hadoop clusters. There are two workloads. One is log analysis, which uses MapReduce to process big log files in HDFS. The other is HBase, which is used to serve random table queries. I have two choices for setting up my Hadoop clusters. One is to use one Hadoop cluster. Log analysis

RE: hbase0.90.2 with Hadoop 0.20.x

2011-11-28 Thread Jinyan Xu
I first installed hadoop-0.20.2 and compiled Hadoop-0.20-append, then did the replacement following http://www.michael-noll.com/blog/2011/04/14/building-an-hadoop-0-20-x-version-for-hbase-0-90-2/#building-hadoop-0-20-append-from-branch-0-20-append Next I compiled hbase 0.90.2 according to this page http://shank

Re: need feedback on PerformanceEvaluation with presplit option test code

2011-11-28 Thread Stack
On Mon, Nov 28, 2011 at 12:35 PM, Sujee Maniyam wrote: > I see the TestTable is created with splits.  But when I run 'randomWrite' > test (in MR mode)  majority of the 'requests' are going to only one region > server. One regionserver or one region only? Is your PE2 running as a mapreduce job?

Re: hbase-regionserver1: bash: {HBASE_HOME}/bin/hbase-daemon.sh: No such file or directory

2011-11-28 Thread Vamshi Krishna
Hi Lars, I am not using Cygwin; I am using 3 ubuntu-10.04 machines. Finally, the problem I mentioned got resolved, i.e. now I can see the following after I run bin/start-hbase.sh on my master machine, hbase-master: starting zookeeper, logging to /home/hduser/Documents/HBASE_SOFTWARE/hbase-0.90.4/bin

Re: HRegionserver daemon is not running on region server node

2011-11-28 Thread Vamshi Krishna
Thank you Suraj; because of discussing that issue with you, I came to know many other things that I need to take care of during HBase setup. Finally, the problem I mentioned got resolved, i.e. now I can see the following after I run bin/start-hbase.sh on my master machine, hbase-master: sta

Re: hbase0.90.2

2011-11-28 Thread Stack
On Mon, Nov 28, 2011 at 5:04 PM, Ted Yu wrote: > You maybe seeing this problem: > https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92-security/22/console > Thanks for fixing Ted. St.Ack

Re: hbase0.90.2 with Hadoop 0.20.x

2011-11-28 Thread Stack
On Mon, Nov 28, 2011 at 6:30 PM, Jinyan Xu wrote: > Hi all, > > When I start hbase, message print, but under hadoop rootdir there are no   > hadoop-mapred*.jar:, hadoop-common*.jar, hadoop-hdfs*.jar. > How did you install hbase and what version are you looking at and with what version of hadoop a

Re: getting row info without data

2011-11-28 Thread Mikael Sitruk
I'll try that, thanks Mikael.S On Tue, Nov 29, 2011 at 1:45 AM, lars hofhansl wrote: > Seems like KeyOnlyFilter is what is needed here. > > It'll filter the value, but leave the entire key (rowKey, CF, column, TS, > type) in place. > > > Note that scanning with KeyOnlyFilter is not necessarily f

Re: Hbase and Eclipse on ubuntu

2011-11-28 Thread Rohit Kelkar
If you mean you want to execute the program that you have written in eclipse by connecting to a hbase cluster then the following simple lines of code should help you. Configuration hconf = HBaseConfiguration.create(); hconf.addResource("resources/config.xml"); hconf.set("hbase.zookeeper.quorum", "

Re: delete operation with timestamp

2011-11-28 Thread Shrijeet Paliwal
Hi Lars, >>You could look at the code :) Did exactly that. Just wanted to be sure that I am not missing any insight. >>Typically you won't add many columns with different time stamps as part of the same put... You are right, though, it is not strictly needed. Understood now. Thanks for bearing wi

Re: delete operation with timestamp

2011-11-28 Thread lars hofhansl
You could look at the code :) The time stamps that count are the ones on the KeyValues maintained in the put's familyMap (the set of KVs mapped to CFs). In fact the put's TS is just a convenience used as default TS for the added KVs, it is not used at the server. Typically you won't add many c
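
The semantics Lars describes can be sketched as a toy Python model (illustrative only, not HBase code; the class and names here are invented for the sketch): each KeyValue carries its own timestamp, and the Put's timestamp is just a client-side default for cells added without one.

```python
# Toy model (NOT HBase code) of how timestamps attach to KeyValues in a Put.
LATEST_TIMESTAMP = 2**63 - 1  # sentinel meaning "let the server assign"

class ToyPut:
    def __init__(self, row, ts=LATEST_TIMESTAMP):
        self.row = row
        self.default_ts = ts      # convenience default, not used at the server
        self.key_values = []      # (cf, qualifier, ts, value)

    def add(self, cf, qualifier, value, ts=None):
        # An explicit ts overrides the Put's default for this cell only.
        cell_ts = ts if ts is not None else self.default_ts
        self.key_values.append((cf, qualifier, cell_ts, value))

p = ToyPut(b"row1", ts=100)
p.add(b"cf", b"a", b"v1")          # inherits the Put's default TS (100)
p.add(b"cf", b"b", b"v2", ts=42)   # carries its own TS (42)
```

The point of the sketch: the server only ever sees per-cell timestamps; the Put-level value is a client convenience.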

hbase0.90.2 with Hadoop 0.20.x

2011-11-28 Thread Jinyan Xu
Hi all, When I start hbase, a message prints, but under the hadoop rootdir there are no hadoop-mapred*.jar, hadoop-common*.jar, or hadoop-hdfs*.jar files. hadoop@hadoop-virtual-machine:/usr/local/hbase$ bin/start-hbase.sh cat: /usr/local/hbase/bin/../target/cached_classpath.txt: No such file or directory ls

Re: delete operation with timestamp

2011-11-28 Thread Shrijeet Paliwal
Lars, Thank you for writing. It does make sense. >>So if you trigger a Put operation from the client and you change (say) 3 columns, the server will insert 3 KeyValues into the Memstore, all of which carry >>the TS of the Put. What if I construct the Put object by making three calls to 'add' with

Re: delete operation with timestamp

2011-11-28 Thread lars hofhansl
Hi Shrijeet, you have to distinguish between the storage format and the client side objects. KeyValue is an outlier (of sorts) as it is used on both server and client). Timestamps are per cell (KeyValue). A Put object is something you create on the client to describe a put operation to be perf

Re: hbase0.90.2

2011-11-28 Thread Ted Yu
You may be seeing this problem: https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92-security/22/console On Mon, Nov 28, 2011 at 4:55 PM, Jinyan Xu wrote: > Hi all, > > I compile hbase0.90.2 failed on ubuntu 11.10 , why ? > > This is the procedure: > $git clone https://github.com/apache/h

hbase0.90.2

2011-11-28 Thread Jinyan Xu
Hi all, Compiling hbase 0.90.2 fails on ubuntu 11.10. Why? This is the procedure: $git clone https://github.com/apache/hbase.git $cd hbase $mvn compile -Dsnappy Thanks!

Re: delete operation with timestamp

2011-11-28 Thread Shrijeet Paliwal
Slightly off-topic, sorry. While we have attention on timestamps, may I ask why HBase maintains a timestamp at the row level (initialized with LATEST_TIMESTAMP)? In other words, a timestamp has meaning in the context of a cell and HBase keeps it at that level, so why keep one TS at the row level? Going further,

Re: thrift and hbase

2011-11-28 Thread Ted Yu
With HBASE-1744, support for thrift is better. But that is in TRUNK only. On Mon, Nov 28, 2011 at 3:41 PM, Rita wrote: > Hello, > > I am planning to use thrift with python and curious what are its > limitations against the defacto Java API? Is it possible to do everything > with it or what are

Re: delete operation with timestamp

2011-11-28 Thread lars hofhansl
Hi Yi, the reason is that nothing is ever changed in-place in HBase, only new files are created (with the exception of the WAL, which is appended to, and some special scenario like atomic increment and atomic appends, where older version of the cells are removed from the memstore). That caters v
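
Lars's point that nothing is changed in place can be sketched with a toy Python model (illustrative only, not HBase code; the function names are invented): a delete writes a tombstone marker that masks older cells at read time, and a major compaction later rewrites the data so both the masked cells and the markers disappear.

```python
# Toy model (NOT HBase code) of delete-marker semantics.
def read(cells, tombstones):
    # A cell is visible unless a tombstone covers it
    # (same coordinates, tombstone timestamp >= cell timestamp).
    visible = []
    for (row, col, ts, val) in cells:
        masked = any(trow == row and tcol == col and tts >= ts
                     for (trow, tcol, tts) in tombstones)
        if not masked:
            visible.append((row, col, ts, val))
    return visible

def major_compact(cells, tombstones):
    # Major compaction rewrites the data: masked cells and the
    # markers themselves are dropped.
    return read(cells, tombstones), []

cells = [(b"r1", b"c", 5, b"old"), (b"r1", b"c", 20, b"new")]
tombstones = [(b"r1", b"c", 10)]   # delete at ts=10 masks only the ts=5 cell
```

Until the compaction runs, the old cell still exists on disk; it is merely invisible to reads.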

Re: getting row info without data

2011-11-28 Thread lars hofhansl
Seems like KeyOnlyFilter is what is needed here. It'll filter the value, but leave the entire key (rowKey, CF, column, TS, type) in place. Note that scanning with KeyOnlyFilter is not necessarily faster, the only part saved is shipping the value to the client. -- Lars - Original Message
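
What KeyOnlyFilter does can be sketched as a toy Python function (illustrative only, not the actual HBase filter implementation): the full key coordinates of each cell survive, and only the value is dropped, which is why the saving is in shipping bytes to the client rather than in server-side scan work.

```python
# Toy sketch (NOT the real HBase KeyOnlyFilter) of key-only scanning:
# keep (row, cf, qualifier, ts, type), drop the value.
def key_only(cells):
    return [(row, cf, qual, ts, typ, b"")
            for (row, cf, qual, ts, typ, _value) in cells]

# A 2K value per cell, as in the question upthread.
cells = [(b"r1", b"cf", b"c1", 10, "Put", b"x" * 2048)]
stripped = key_only(cells)
```

With 900 columns of ~2K each per row, stripping values cuts the payload per row from ~1.8 MB to just the keys.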

thrift and hbase

2011-11-28 Thread Rita
Hello, I am planning to use thrift with python and am curious about its limitations compared to the de facto Java API. Is it possible to do everything with it, or what are its limitations? -- --- Get your facts first, then you can distort them as you please.--

Re: getting row info without data

2011-11-28 Thread Mikael Sitruk
Yes, I need the column names. Writing to 2 tables would add too much payload. I have very strict latency/throughput requirements; an additional round trip just to get the metadata of a row is too much. Similarly to http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.h

Re: getting row info without data

2011-11-28 Thread Michel Segel
Doesn't sound like it... He mentions column names... sounds like he would be better off writing to two tables. One that stores only the column name and one that stores the data in each column. Sent from a remote device. Please excuse any typos... Mike Segel On Nov 28, 2011, at 11:54 AM, Stack

Re: Region Server crash

2011-11-28 Thread Jahangir Mohammed
version? https://issues.apache.org/jira/browse/HBASE-4222 Is this helpful? Thanks, Jahangir. On Mon, Nov 28, 2011 at 2:56 PM, arun sirimalla wrote: > Hi, > > I have three region servers running on datanodes, one of the region server > crashes when try to insert with below error and the other

Region Server crash

2011-11-28 Thread arun sirimalla
Hi, I have three region servers running on datanodes. One of the region servers crashes, with the error below, when I try to insert, while the other two region servers run without any errors. WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-2411272549088965456_2503 bad datanode[0]

need feedback on PerformanceEvaluation with presplit option test code

2011-11-28 Thread Sujee Maniyam
Hi All I have added a presplit option to PerformanceEvaluation class. I see the TestTable is created with splits. But when I run 'randomWrite' test (in MR mode) majority of the 'requests' are going to only one region server. Other region servers are busy as well, but catering to small number o
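
The skew being described can be illustrated with a toy Python model of region routing (illustrative only; the split points and keys are invented): a row lands in the region whose key range contains it, so if the split points do not match the distribution of the keys actually written, most writes funnel into a single region no matter how many regions exist.

```python
import bisect

# Toy sketch of region routing: region i holds keys in
# [split[i-1], split[i]); bisect finds the containing region.
def region_for(key, split_points):
    return bisect.bisect_right(split_points, key)

splits = ["b", "c", "d"]   # 4 regions: (-inf,b), [b,c), [c,d), [d,+inf)

# Keys that all sort after the last split point -> one hot region.
skewed = [f"zz{i:04d}" for i in range(1000)]
skewed_regions = {region_for(k, splits) for k in skewed}

# Keys spread across the split ranges -> all regions get traffic.
uniform = [p + str(i) for p in "abcd" for i in range(10)]
uniform_regions = {region_for(k, splits) for k in uniform}
```

So when a presplit test still hammers one regionserver, the first thing to check is whether the generated row keys actually fall across the chosen split points.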

Re: [austin-cug] Organization Meeting - Austin ACM Special Interest Group on Knowledge Discovery and Data Mining

2011-11-28 Thread Doug Meil
Hi there- I'm happy for your new group, but can you guys take the hbase user dist-list off this conversation, please? On 11/28/11 2:27 PM, "Craig Dupree" wrote: >David, > >Please slow down, and let the rest of us have a chance to catch up >with you. You've gone from a simple idea - a g

Re: [austin-cug] Organization Meeting - Austin ACM Special Interest Group on Knowledge Discovery and Data Mining

2011-11-28 Thread Craig Dupree
David, Please slow down, and let the rest of us have a chance to catch up with you. You've gone from a simple idea - a group of us getting together to work on Big Data programming projects - to something that will require bylaws, and probably a trip or two to a lawyer. Or maybe lawyers if one

Re: How HBase implements delete operations

2011-11-28 Thread lars hofhansl
Cool! Maybe we can relate that to the client API as well... On the client this is controlled using the Delete object. o creating a Delete object for a row without specifying anything else will place a family delete marker for each CF. o columns for specific CFs can be deleted by using deleteFam
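
The client-side behavior Lars lists can be sketched as a toy Python model (illustrative only, not the real HBase Delete API; the class and method names are invented, and real HBase also distinguishes version-level markers): an "empty" Delete for a row expands into one family delete marker per column family.

```python
# Toy model (NOT HBase code) of how a Delete maps to delete markers.
class ToyDelete:
    def __init__(self, row):
        self.row = row
        self.markers = []   # (scope, cf, qualifier)

    def delete_family(self, cf):
        self.markers.append(("family", cf, None))        # whole CF for this row

    def delete_columns(self, cf, qualifier):
        self.markers.append(("columns", cf, qualifier))  # all versions of a column

    def finalize(self, all_families):
        # A Delete with nothing specified (a row delete) expands to
        # one family marker per column family of the table.
        if not self.markers:
            for cf in all_families:
                self.delete_family(cf)
        return self.markers

d = ToyDelete(b"row1")
markers = d.finalize([b"cf1", b"cf2"])
```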

Organization Meeting - Austin ACM Special Interest Group on Knowledge Discovery and Data Mining

2011-11-28 Thread David Boney
An organization meeting for forming an Austin chapter of the ACM Special Interest Group on Knowledge Discovery and Data Mining (ACM SIGKDD) will be held Tuesday, November 29, 2011 at 7:00 pm at CoSpace. This was formerly advertised as Austin Hackers Dojo - Big Data Machine Learning. The meeting

Re: Partial Key Scans and Thrift

2011-11-28 Thread Stack
On Sun, Nov 27, 2011 at 9:47 PM, Greg Pelly wrote: > Hi, > > I have a PHP client accessing HBase through thrift. I posted this on > Thrift's user list and they told me to post it here. I'm a Java developer > by the way, I am doing the server side work, just letting you know so you > don't feel lik

Re: getting row info without data

2011-11-28 Thread Stack
On Mon, Nov 28, 2011 at 8:54 AM, Mikael Sitruk wrote: > Hi > > I would like to know if it is possible to retrieve the column names and not > the whole content of rows. > The reason for such a request is that the columns store a high volume of data > (2K) each (and I store 900 columns per key). > Retri

Zookeeper Connection Issues in Pseudo Distributed Mode

2011-11-28 Thread Sid Kumar
I installed Hbase to run in pseudo distributed mode and was able to start the shell, but when I try to create a table I get this error saying - ERROR: org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to connect to ZooKeeper but the connection closes immediately. This could be a

getting row info without data

2011-11-28 Thread Mikael Sitruk
Hi, I would like to know if it is possible to retrieve the column names and not the whole content of rows. The reason for this request is that the columns store a high volume of data (2K each), and I store 900 columns per key. Retrieving the whole row and not the "Description/Metadata" of the row is

Re: How HBase implements delete operations

2011-11-28 Thread Doug Meil
Thanks Lars, I'll update the docs with this. On 11/27/11 6:31 PM, "lars hofhansl" wrote: >That is correct. > > > From: yonghu >To: user@hbase.apache.org; lars hofhansl >Sent: Sunday, November 27, 2011 12:34 PM >Subject: Re: How HBase implements delete opera

Hbase and Eclipse on ubuntu

2011-11-28 Thread silvia90
Hi, i'm trying to connect eclipse with hbase on ubuntu, but i can't find any guide for do it. Some one can explain me how i can do it or link me a tutorial? Thanks. Silvia -- View this message in context: http://old.nabble.com/Hbase-and-Eclipse-on-ubuntu-tp32878253p32878253.html Sent from the H

Re: HRegionserver daemon is not running on region server node

2011-11-28 Thread Suraj Varma
Ok. Can you run dos2unix against both your HBASE_HOME/bin and HBASE_HOME/conf directory? After this, restart your cluster and see if you are getting the same issue. --Suraj On Sun, Nov 27, 2011 at 10:58 PM, Vamshi Krishna wrote: > Hi, > 1)No, hbase is running as same user i.e hduser, in all ma

Re: Problem in host resolving

2011-11-28 Thread Dejan Menges
It can be, but in this case, you need to troubleshoot first why Zookeeper is not running, as it's acting as an interface between Hadoop and HBase. On Mon, Nov 28, 2011 at 3:13 PM, Mohammad Tariq wrote: > Is there any possibility that this is happening because of improper > forward and reverse DN

Re: Problem in host resolving

2011-11-28 Thread Mohammad Tariq
Is there any possibility that this is happening because of improper forward and reverse DNS resolving? Regards,     Mohammad Tariq On Mon, Nov 28, 2011 at 7:39 PM, Mohammad Tariq wrote: > Hello :) > >   I am not starting ZooKeeper manually and yes, I am using bin/start-hbase.sh > > Regards,

Re: Problem in host resolving

2011-11-28 Thread Mohammad Tariq
Hello :) I am not starting ZooKeeper manually and yes, I am using bin/start-hbase.sh Regards,     Mohammad Tariq On Mon, Nov 28, 2011 at 7:36 PM, Dejan Menges wrote: > Hi again :) > > Looks to me like ZooKeeper is not started? > > Are you starting and managing it manually or trough HBase?

Re: Problem in host resolving

2011-11-28 Thread Dejan Menges
Hi again :) Looks to me like ZooKeeper is not started? Are you starting and managing it manually or through HBase? How are you starting HBase, using the $HBASE_HOME/bin/start-hbase.sh script or manually? Tnx, Dejan On Mon, Nov 28, 2011 at 3:00 PM, Mohammad Tariq wrote: > These are the contents of

Re: Problem in host resolving

2011-11-28 Thread Mohammad Tariq
These are the contents of datanode log file - 2011-11-28 19:27:50,669 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG: / STARTUP_MSG: Starting DataNode STARTUP_MSG: host = ubuntu/127.0.1.1 STARTUP_MSG: args = [] STAR

Re: hbase-regionserver1: bash: {HBASE_HOME}/bin/hbase-daemon.sh: No such file or directory

2011-11-28 Thread Lars George
Hi, Did you add the list of servers to the regionservers file in the $HBASE_HOME/conf/ dir? Are you using Cygwin? Or what else is your environment? Lars On Nov 26, 2011, at 7:37 AM, Vamshi Krishna wrote: > Hi i am running hbase on 3 machines, on one node master and regionserver, > on other two

Re: Problem in host resolving

2011-11-28 Thread Mohammad Tariq
Hi Dejan, Here is the o/p of jps - solr@ubuntu:~$ jps 14792 NameNode 17899 HMaster 15014 DataNode 18001 Jps 15251 SecondaryNameNode Regards,     Mohammad Tariq On Mon, Nov 28, 2011 at 7:11 PM, Dejan Menges wrote: > Hi Mohammad, > > Looks to me like your hosts file is OK, but HDFS/Namenode is

Re: Problem in host resolving

2011-11-28 Thread Dejan Menges
Hi Mohammad, Looks to me like your hosts file is OK, but HDFS/Namenode is not running but it's trying to connect to Namenode on port 9000? Can you list your local java processes with 'jps' here and check your Namenode/Datanode logs? Tnx, Dejan On Mon, Nov 28, 2011 at 2:36 PM, Mohammad Tariq wr

Problem in host resolving

2011-11-28 Thread Mohammad Tariq
Could anyone who has used HBase in pseudo-distributed mode share his/her hosts file? I am getting the following error - Mon Nov 28 19:03:20 IST 2011 Starting master on ubuntu ulimit -n 32768 2011-11-28 19:03:21,038 INFO org.apache.zookeeper.server.ZooKeeperServer: Server environment:zookeeper.version=

Re: major compaction

2011-11-28 Thread Lars George
Your best bet - short of tailing the logs - seems to be using the compactionQueue metric, which is available through Ganglia and JMX. It should go back to zero when all compactions are done. Lars On Nov 27, 2011, at 1:41 PM, Rita wrote: > Hello, > > When I do a major compaction of a table (1 billio

Error while running Hbase in pseudo-distributed mode

2011-11-28 Thread Mohammad Tariq
Hello, I am trying to learn HBase. In the process I tried HBase in standalone mode, and it was a success. But when I tried pseudo-distributed mode I ran into a few problems. Here is the content of the master log file. Could anyone tell me how to solve this issue? Mon Nov 28 15:38:45 IST 201

Strategies for aggregating data in a HBase table

2011-11-28 Thread Steinmaurer Thomas
Hello, this has already been discussed a bit in the past, but I'm trying to refresh this thread as it is an important design issue in our HBase evaluation. Basically, the result of our evaluation was that we are going to be happy with what Hadoop/HBase offers for managing our measurement/sensor