How many data versions should I keep in HBase?

2012-04-09 Thread Davey Yan
Hi, In my business case, it is unnecessary to keep more than one version of the data. The application code will never try to get/scan older versions. Should I set MAX_VERSIONS => 1 for every table, instead of the default 3? The HBase book online said: Compression will boost performance by reduc

Re: Add client complexity or use a coprocessor?

2012-04-09 Thread Jacques
What about maintaining a bloom filter in addition to an increment to minimize double counting? You couldn't do it atomically without some custom work, but it would get you mostly there. If you wanted to be fancy, you could actually maintain the bloom filter as a bunch of separate columns to avoid update contention.
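The idea above, a bloom filter consulted before an increment, can be sketched in a few lines. This is a minimal in-memory illustration, not HBase code: the `BloomFilter` and `DedupCounter` classes, the filter size, and the message-ID format are all assumptions made for the example. As the post notes, the check-then-increment step here is not atomic.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter over byte strings (sizes are illustrative)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)


class DedupCounter:
    """Counter that skips messages the Bloom filter has probably seen."""

    def __init__(self):
        self.seen = BloomFilter()
        self.count = 0

    def record(self, message_id):
        # Not atomic: two concurrent writers could both pass this check.
        if message_id in self.seen:
            return False  # probably a duplicate (false positives are possible)
        self.seen.add(message_id)
        self.count += 1
        return True
```

A false positive makes the counter undercount slightly, which fits a use case where the statistics do not have to be exact.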

Re: Add client complexity or use a coprocessor?

2012-04-09 Thread Tom Brown
Andy, I am a big fan of the Increment class. Unfortunately, I'm not doing simple increments for the viewer count. I will be receiving duplicate messages from a particular client for a specific cube cell, and don't want them to be counted twice (my stats don't have to be 100% accurate, but the expe

Re: Schema Updates: what do you do today?

2012-04-09 Thread Alex Baranau
I think I saw one effort to create a nice tool for doing that a long time ago... Aha, here it is: https://github.com/larsgeorge/hbase-schema-manager. Might be outdated. Lars? As for us, we make changes really rarely (we usually have one table with one column family in it), so one-off shell scripts work

Re: hbase map/reduce questions

2012-04-09 Thread arnaud but
Thank you very much, I will take a look at these links, but I think I understand now. In fact, I did not know the role getLocations plays in the distribution of the map tasks. On 09/04/2012 19:45, Suraj Varma wrote: Take a look at InputSplit: http://grepcode.com/file/repository.cloudera.com/content/r

Re: HMaster shutdown when a DNS address cannot be solved

2012-04-09 Thread Mikael Sitruk
Last minute update, the patch solved the problem. I'm able to see my table and the cluster is up now. Thanks Mikael.S On Mon, Apr 9, 2012 at 10:54 PM, Mikael Sitruk wrote: > Sorry for the late response, the issue pointed out by Suraj seems similar, > I'll try the patch and let you know. > > Amandeep sorry

Re: HMaster shutdown when a DNS address cannot be solved

2012-04-09 Thread Mikael Sitruk
Sorry for the late response, the issue pointed out by Suraj seems similar, I'll try the patch and let you know. Amandeep, sorry for the dev (I still posted this issue to dev because of Stack), I'll pay attention to that next time. Regarding restoring the DNS, it seems to me a wrong solution, I used a FQDNS

Re: Schema Updates: what do you do today?

2012-04-09 Thread Ian Varley
Thanks, Andy. Yeah, a tool that compares a schema definition with a running cluster, and gives you a way to apply changes (without offlining, where possible), would be pretty sweet. Anybody else think so? Or, do you have tools you've already written for this? Seems like a common need (we also n

Re: Add client complexity or use a coprocessor?

2012-04-09 Thread Andrew Purtell
If it helps, yes this is possible: > Can I observe updates to a > particular table and replace the provided data with my own? (The > client calls "put" with the actual user ID, my co-processor replaces > it with a computed value, so the actual user ID never gets stored in > HBase). Since your opt

Blog post: HBaseWD: Avoid RegionServer Hotspotting Despite Sequential Keys

2012-04-09 Thread Alex Baranau
Hello, Just wanted to share a blog post about avoiding the common RegionServer hotspotting problem when writing records with sequential keys, which was discussed several times on this ML. http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential
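One common way to keep sequential keys from landing on a single RegionServer is to prefix each key with a small bucket byte so that consecutive keys scatter across regions. The sketch below illustrates that general idea only; it is not the HBaseWD API, and the function names, the bucket count, and the use of MD5 to pick the bucket are assumptions made for the example.

```python
import hashlib

NUM_BUCKETS = 8  # assumed bucket count, chosen only for illustration


def salted_key(original_key: bytes, num_buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix the key with one bucket byte derived from the key itself."""
    # Hashing the key makes the bucket reproducible, so a point read can
    # recompute the stored key without scanning every bucket.
    bucket = hashlib.md5(original_key).digest()[0] % num_buckets
    return bytes([bucket]) + original_key


def unsalted_key(stored_key: bytes) -> bytes:
    """Strip the one-byte bucket prefix to recover the original key."""
    return stored_key[1:]


def scan_prefixes(num_buckets: int = NUM_BUCKETS) -> list:
    """Prefixes a range scan must cover: one sub-scan per bucket."""
    return [bytes([b]) for b in range(num_buckets)]
```

The trade-off is on the read side: a range scan over the original key order turns into one scan per bucket, and the results have to be merged by the client.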

Re: hbase map/reduce questions

2012-04-09 Thread Suraj Varma
Take a look at InputSplit: http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/com.cloudera.hadoop/hadoop-core/0.20.2-737/org/apache/hadoop/mapreduce/InputSplit.java#InputSplit.getLocations%28%29 Then take a look at how TableSplit is implemented (getLocations method in p

Re: Sync latency

2012-04-09 Thread Todd Lipcon
Hi Placido, Check dmesg for scsi controller issues on all the nodes? Sometimes dead/dying disks, or bad firmware can cause 30+ second pauses -Todd On Mon, Apr 9, 2012 at 1:47 AM, Placido Revilla wrote: > Sorry, that's not the problem. In my logs block reporting never takes more > than 50 ms to

Re: Schema Updates: what do you do today?

2012-04-09 Thread Andrew Purtell
Manual schema changes via one-off shell scripts. What I would like to do is write code that gets the HTD, checks if all of the schema structure and features are as they should be, and, if not, makes the necessary modifications without taking the table offline. (I typically write code like that
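The "check the schema and fix what differs" step described above boils down to a diff between a desired definition and what the cluster reports. A minimal sketch of that comparison follows, with both schemas modelled as plain dictionaries; the `schema_changes` function, the dictionary layout, and the attribute names are hypothetical and do not come from the HBase client API.

```python
def schema_changes(desired, actual):
    """Return (family, attribute, current, wanted) tuples for each difference.

    Both arguments map column family name -> {attribute: value}.
    A family missing from `actual` is reported with attribute None.
    """
    changes = []
    for family, wanted_attrs in desired.items():
        current_attrs = actual.get(family)
        if current_attrs is None:
            # The whole column family has to be created.
            changes.append((family, None, None, wanted_attrs))
            continue
        for attr, wanted in wanted_attrs.items():
            current = current_attrs.get(attr)
            if current != wanted:
                changes.append((family, attr, current, wanted))
    return changes


# Hypothetical example: one family whose version count drifted.
desired = {"d": {"VERSIONS": 1, "COMPRESSION": "GZ"}}
actual = {"d": {"VERSIONS": 3, "COMPRESSION": "GZ"}}
print(schema_changes(desired, actual))  # [('d', 'VERSIONS', 3, 1)]
```

Keeping the diff separate from the code that applies it makes it possible to print the planned changes first and decide which ones can be made without taking the table offline.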

Schema Updates: what do you do today?

2012-04-09 Thread Ian Varley
All: I'm doing a little research into various ways to apply schema modifications to an HBase cluster. Anybody care to share with the list what you currently do? E.g. - Connect via the HBase shell and manually issue commands ("create", "disable", "alter", etc.) - Write one-off scripts that do

Re: Speeding up HBase read response

2012-04-09 Thread Jack Levin
Yes, from %util you can see that your disks are working at pretty much 100%. Which means you can't push them to go any faster. So the solution is to add more disks, add faster disks, add nodes and disks. This type of overload should not be related to HBase, but rather to your hardware setup. -Jac

Re: Sync latency

2012-04-09 Thread Placido Revilla
Sorry, that's not the problem. In my logs block reporting never takes more than 50 ms to process, even when I'm experiencing sync pauses of 30 seconds. The dataset is currently small (1.2 TB), as the cluster has been running live for a couple of months only and I have only slightly over 11K blocks

Re: hbase table size

2012-04-09 Thread Ioan Eugen Stan
2012/4/7 mete : > Hello folks, > > I am trying to import a CSV file that is around 10 GB into HBase. After the > import, I check the size of the folder with the hadoop fs -du command, and > it is a little above 100 gigabytes in size. > I did not configure any compression or anything. I have both tr

Re: HBase connection refused

2012-04-09 Thread Cosmin Lehene
It looks like HBase can't connect to the Hadoop NameNode. Check that the NameNode is running http://localhost:50070/dfshealth.jsp and see the port that it's running on 192.168.15.20:54310 (The hdfs-site.xml configuration file has the "fs.default.name" property that specifies the interface and port
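A quick way to confirm whether anything is listening on the NameNode address is to try a plain TCP connection to it. The helper below is a small sketch using only the Python standard library; the function name `port_is_open` and the two-second timeout are assumptions for the example, and a successful connection shows only that the port accepts connections, not that the NameNode is healthy.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts, and timeouts.
        return False
```

For the setup in this thread the call would be `port_is_open("192.168.15.20", 54310)`. A False result from the HBase host points to the NameNode not running, listening on a different interface, or being blocked by a firewall.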

Re: Speeding up HBase read response

2012-04-09 Thread ijanitran
Hi, results of iostat are very similar on all nodes:
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
xvdap1 0.00 0.00 294.00 0.00 9.27 0.00 64.54 21.97 75.44 3.40 100.10
Device: rrq

HBase connection refused

2012-04-09 Thread shaharyar khan
Hi, I am using Hadoop and HBase. When I tried to start Hadoop, it started fine, but when I tried to start HBase it showed exceptions in the log files. In the log file, Hadoop is refusing the connection on port 54310 of localhost. Logs are given below: Mon Apr 9 12:28:15 PKT 2012 Starting master on hbase ulimit

Re: hbase map/reduce questions

2012-04-09 Thread arnaud but
OK, thanks. > Yes - if you do a custom split, and have sufficient map slots in your > cluster If I understand correctly, even if the rows are stored on only two nodes of my cluster, I can distribute the "map tasks" to the other nodes? E.g. I have 10 nodes in the cluster and I did a custom split that split

Re: Store List of data items in Hbase.

2012-04-09 Thread arnaud but
On 09/04/2012 08:46, Ram wrote: I'm trying to store a list (collection) of data objects in HBase. For example, a User table where the userId is the row key, and a column family Contacts with a column Contacts:EmailIds, where EmailIds is a list of emails such as {a...@example.com,bpqrs-Re5JQEeQqe/9co4lrsz..

Re: HMaster shutdown when a DNS address cannot be solved

2012-04-09 Thread shashwat shriparv
Since your domain has changed, try once to ssh to the new short name you have added. It will ask whether to add it to the known hosts; just answer yes to that question, and check whether you can ping the name you have now. On Mon, Apr 9, 2012 at 5:19 AM, Amandeep Khurana wrote: > +user > (bcc: dev) > > Mikael, > > Such