RE: Scan vs Put vs Get

2012-06-27 Thread Anoop Sam John
Hi, how many Gets do you batch together in one call? Is this equal to the Scan#setCaching() value you are using? If both are the same, you can be sure that the number of network calls is almost the same. Also, you are giving random keys in the Gets, while the scan will always be sequential. Seems in your ge
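The arithmetic behind that point can be sketched as a toy model (plain Java, not the HBase client; the batch size of 100 is an assumption for illustration): batched Gets and a Scan with the same caching value need roughly the same number of round trips.

```java
// Toy model: round trips needed to fetch N rows when the server returns
// `perCall` rows per network call (batched Gets or Scan#setCaching).
public class RpcCountModel {
    static long roundTrips(long rows, long perCall) {
        return (rows + perCall - 1) / perCall; // ceiling division
    }

    public static void main(String[] args) {
        long rows = 3000;
        // 3000 Gets batched 100 per call, or a Scan with setCaching(100):
        System.out.println(roundTrips(rows, 100)); // 30 round trips either way
    }
}
```

The real difference, as the reply notes, is access pattern: random Gets touch scattered blocks, while a sequential Scan reads contiguously.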

Re: Coprocessors on specific servers

2012-06-27 Thread Lars George
Yes exactly, this plus what Mohammad says: use the internal scanner to get just the data from the region once you are in the coprocessor code. There is an example of that in the book as well, here in the online repo: https://github.com/larsgeorge/hbase-book/blob/master/ch04/src/main/java/coprocesso

Re: Coprocessors on specific servers

2012-06-27 Thread fding hbase
The HTable API documentation shows: <T extends CoprocessorProtocol, R> Map<byte[], R> coprocessorExec(Class<T> protocol, byte[] startKey, byte[] endKey, Batch.Call<T, R> callable) throws IOException, Throwable; startKey and endKey bound the regions of your interest. On Thu, Jun 28, 2012 at 5:43 AM, Mohammad Tariq wrote: > Aman, Lars, > >

What is the real semantic of the parameter "hbase.client.retries.number"

2012-06-27 Thread shixing
I have seen several uses of the parameter "hbase.client.retries.number". And I find some cases used like this: for (int i = 0; i < hbase.client.retries.number; i++) { doSomething(); } And in doSomething() there are also hbase.client.retries.number retries, so
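The multiplication being asked about can be sketched in plain Java (hypothetical names, not the actual HBase client code): if an outer loop retries R times and each doSomething() internally retries R times again, the worst case is R * R attempts, not R.

```java
// Toy model of nested retry loops: counts worst-case low-level attempts.
public class NestedRetries {
    private static int attempts;

    static void doSomething(int retries) {
        for (int i = 0; i < retries; i++) {
            attempts++; // each inner iteration is one low-level attempt
        }
    }

    static int worstCase(int retries) {
        attempts = 0;
        for (int i = 0; i < retries; i++) {
            doSomething(retries); // the outer retry wraps the inner retries
        }
        return attempts;
    }

    public static void main(String[] args) {
        System.out.println(worstCase(10)); // 100 attempts, not 10
    }
}
```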

Can I use non kerberos HDFS for AccessControl HBase base on kerberos?

2012-06-27 Thread shixing
As far as I know, if I configure Kerberos for HBase, HDFS should also use Kerberos. Now I just want to use AccessControl based on Kerberos; that is, only the "security" module uses Kerberos to authorize HBase client users, and not all of HBase uses Kerberos. -- Best wishes! My Friend

RE: direct Hfile Read and Writes

2012-06-27 Thread Anoop Sam John
When there is a need to bulk load a huge amount of data into HBase at one time, it is better to go with the direct HFile write. Here, first, HFiles are written directly (into HDFS) using the MR framework. For this HBase provides the utility classes and the ImportTsv tool itself. Then using the In
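The two-step flow described above might look like this (table name, column mapping, and paths are hypothetical): ImportTsv writes HFiles straight into HDFS, then the incremental-load tool moves them into the running table.

```
# step 1: MR job writes HFiles into HDFS instead of issuing Puts
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:col \
  -Dimporttsv.bulk.output=/tmp/bulk-out \
  mytable /user/me/input.tsv
# step 2: hand the finished HFiles over to the region servers
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/bulk-out mytable
```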

Re: Slow row deletion performance in comparison to insertion

2012-06-27 Thread Ted Yu
I created HBASE-6287 for porting HBASE-5941 to trunk. Jeff: What version of HBase are you using ? Since HBASE-5941 is an improvement, a vote may be raised for porting it to other branches. On Wed, Jun 27, 2012 at 4:15 PM, Jeff Whiting wrote: >

Scan vs Put vs Get

2012-06-27 Thread Jean-Marc Spaggiari
Hi, I have a small piece of code, for testing, which is putting 1B lines into an existing table, getting 3000 lines and scanning 1. The table has one family, one column. Everything is done randomly: Put with a random key (24 bytes), fixed family and fixed column names, with random content (24 byte

Re: Slow row deletion performance in comparison to insertion

2012-06-27 Thread Jeff Whiting
Looking at HBASE-6284, it seems that deletes are not batched at the region server level, so that is the reason for the performance degradation. Additionally, HBASE-5941 with the locks is also contributing to the performance degradation. So until those changes get into an HBase release I just have

Re: Slow row deletion performance in comparison to insertion

2012-06-27 Thread Ted Yu
The JIRA was HBASE-5941 On Wed, Jun 27, 2012 at 3:05 PM, Amitanand Aiyer wrote: > There was some difference in the way locks are taken for batched deletes > and puts. This was fixed for 89. > > I wonder if the same could be the issue here. > > Sent from my iPhone > > On Jun 27, 2012, at 2:04 PM

Re: Slow row deletion performance in comparison to insertion

2012-06-27 Thread Ted Yu
Amit: Can you point us to the JIRA or changelist in 0.89-fb ? Thanks On Wed, Jun 27, 2012 at 3:05 PM, Amitanand Aiyer wrote: > There was some difference in the way locks are taken for batched deletes > and puts. This was fixed for 89. > > I wonder if the same could be the issue here. > > Sent

Re: Slow row deletion performance in comparison to insertion

2012-06-27 Thread Amitanand Aiyer
There was some difference in the way locks are taken for batched deletes and puts. This was fixed for 89. I wonder if the same could be the issue here. Sent from my iPhone On Jun 27, 2012, at 2:04 PM, "Jeff Whiting" wrote: > I'm struggling to understand why my deletes are taking longer than

Re: Coprocessors on specific servers

2012-06-27 Thread Mohammad Tariq
Would it be useful to use InternalScanner in such a scenario? Regards, Mohammad Tariq On Thu, Jun 28, 2012 at 3:13 AM, Mohammad Tariq wrote: > Aman, Lars, > >              If I already know in advance that a particular region > holds the data of my interest, then how can I use Coprocessor

Re: Coprocessors on specific servers

2012-06-27 Thread Mohammad Tariq
Aman, Lars, If I already know in advance that a particular region holds the data of my interest, then how can I use a Coprocessor to operate on that region only and not on all the regions of a particular table? Thank you. Regards, Mohammad Tariq On Wed, Jun 27, 2012 at 11:57 PM,

Re: How to free space

2012-06-27 Thread Cyril Scetbon
It seems that reducing the number of versions kept per column family enables freeing the space. Regards Cyril SCETBON On Jun 27, 2012, at 8:03 PM, Amandeep Khurana wrote: > Cyril, > > Did you notice the space on the hbase directory in HDFS change at all? It > takes time to complete the major

Re: Slow row deletion performance in comparison to insertion

2012-06-27 Thread Ted Yu
bq. if I batch the deletes into one big one at the end (rather than while I'm scanning) That's what you should do. See also HBASE-6284 where an optimization, HRegion#doMiniBatchDelete(), is under development. On Wed, Jun 27, 2012 at 2:03 PM, Jeff Whiting wrote: > I'm struggling to understand wh
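The advice above can be illustrated with a toy model (plain Java, not the HBase client): if each flush stands in for one RPC, deleting row by row costs one round trip per row, while buffering the Deletes and sending them in one big call at the end costs one.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of batched vs. row-by-row deletes; each flush counts as one RPC.
public class DeleteBatching {
    private final List<String> buffer = new ArrayList<>();
    private int rpcCount = 0;

    // delete one row immediately: one round trip per row
    void delete(String row) { rpcCount++; }

    // collect rows locally, then send them all in a single call
    void bufferDelete(String row) { buffer.add(row); }
    void flush() {
        if (!buffer.isEmpty()) {
            rpcCount++; // one round trip for the whole batch
            buffer.clear();
        }
    }

    int rpcCount() { return rpcCount; }

    public static void main(String[] args) {
        DeleteBatching rowByRow = new DeleteBatching();
        for (int i = 0; i < 1000; i++) rowByRow.delete("row-" + i);
        System.out.println(rowByRow.rpcCount()); // 1000

        DeleteBatching batched = new DeleteBatching();
        for (int i = 0; i < 1000; i++) batched.bufferDelete("row-" + i);
        batched.flush();
        System.out.println(batched.rpcCount()); // 1
    }
}
```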

Slow row deletion performance in comparison to insertion

2012-06-27 Thread Jeff Whiting
I'm struggling to understand why my deletes are taking longer than my inserts. My understanding is that a delete is just an insertion of a tombstone, and I'm deleting the entire row. I do a simple loop (pseudo code) and insert the 100-byte rows: for (int i=0; i < 5; i++) { puts.append

Re: Best practices for custom filter class distribution?

2012-06-27 Thread Evan Pollan
Thanks Amandeep -- I hadn't seen the FilterList. That should be able to get me most of the way there by simply "indexing" and chaining together DependentColumnFilters. I think I'm going to try to avoid reconfiguring the cluster and restarting the region servers at all costs. On Wed, Jun 27, 20

Re: Best practices for custom filter class distribution?

2012-06-27 Thread Scott Cinnamond
Agree with above comment on FilterList. You can create an "expression tree" of seemingly any depth by nesting FilterList and HBase seems to navigate and process this very nicely for both row and column filters. On Wed, Jun 27, 2012 at 2:33 PM, Michael Segel wrote: > One way.., > > Create an NFS m
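The "expression tree" idea can be mimicked in plain Java (java.util.function stand-ins, not the real Filter classes) to show how nesting one list inside another combines AND/OR semantics the way FilterList's MUST_PASS_ALL and MUST_PASS_ONE operators do.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Toy filter expression tree: AND/OR nodes over simple row predicates.
public class FilterTree {
    // MUST_PASS_ALL: every child filter must accept the row (AND)
    static <T> Predicate<T> mustPassAll(List<Predicate<T>> filters) {
        return t -> filters.stream().allMatch(f -> f.test(t));
    }

    // MUST_PASS_ONE: at least one child filter must accept the row (OR)
    static <T> Predicate<T> mustPassOne(List<Predicate<T>> filters) {
        return t -> filters.stream().anyMatch(f -> f.test(t));
    }

    public static void main(String[] args) {
        Predicate<String> startsWithA = s -> s.startsWith("a");
        Predicate<String> endsWithZ = s -> s.endsWith("z");
        Predicate<String> shortRow = s -> s.length() < 4;

        // (startsWithA AND endsWithZ) OR shortRow: one list nested in another
        Predicate<String> tree = mustPassOne(Arrays.asList(
                mustPassAll(Arrays.asList(startsWithA, endsWithZ)),
                shortRow));

        System.out.println(tree.test("abz"));   // true: passes the AND branch
        System.out.println(tree.test("qq"));    // true: passes shortRow
        System.out.println(tree.test("qqqqq")); // false: passes neither
    }
}
```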

Re: datanode timeout

2012-06-27 Thread shashwat shriparv
Hey, increase the open-files limit setting... Regards Shashwat Shriparv On Mon, Jun 25, 2012 at 10:21 PM, Stack wrote: > On Mon, Jun 25, 2012 at 9:00 AM, Frédéric Fondement > wrote: > > 2012-06-25 10:25:30,646 ERROR > > org.apache.hadoop.hdfs.server.datanode.DataNode: > > DatanodeRegistratio

Re: Best practices for custom filter class distribution?

2012-06-27 Thread Michael Segel
One way.., Create an NFS mountable directory for your cluster and mount on all of the DNs. You can either place a symbolic link in /usr/lib/hadoop/lib or add the jar to the classpath in /etc/hadoop/conf/hadoop-env.sh (Assuming Cloudera) On Jun 27, 2012, at 12:47 PM, Evan Pollan wrote: > What'

Re: Coprocessors on specific servers

2012-06-27 Thread Lars George
Hi Mohammad, Not sure I follow. :( A Coprocessor is not MapReduce. MapReduce already takes care to run your code local to the data. Coprocessors can be seen as lightweight, map-only MapReduce jobs. You need to share a few more details for us to be able to help. Thanks, Lars On Jun 26, 2012, a

Re: Internalscanner problem in HBase

2012-06-27 Thread Jean-Daniel Cryans
As you may have concluded from reading the code (since you found InternalScanner), the classes that implement the interface are scoped on the region to which they belong and not the table. Only the client has a full view of a table, and it's what you should be using. Hope this helps, J-D On Tue,

Re: HBase Schema Design for clickstream data

2012-06-27 Thread Amandeep Khurana
That's not a whole lot of information to give you recommendations about the schema. However, at a high level, you should think about structuring your row keys such that you minimize the requirement for scans and can get the required data based on the row keys. So, putting the user in the row k

Re: Howto CopyTable from 0.90 to 0.92 ?

2012-06-27 Thread Eran Kutner
Thanks J-D. It does help, I've been pulling my hair out trying to figure out what am I doing wrong. -eran On Wed, Jun 27, 2012 at 9:07 PM, Jean-Daniel Cryans wrote: > It's not a matter of changing rs.class and rs.impl, it's actually the > same here. The difference is that the RPC protocol chan

Re: HBase Schema Design for clickstream data

2012-06-27 Thread Mohit Anchlia
The analysis includes: visitor level; session level (visitors could have multiple sessions); page hits and conversions (popular pages, sequence of pages hit in one session); orders purchased (mostly determined by URL and query parameters). How should I go about designing the schema? Thanks Sent from my iPad On

Re: Howto CopyTable from 0.90 to 0.92 ?

2012-06-27 Thread Jean-Daniel Cryans
It's not a matter of changing rs.class and rs.impl; it's actually the same here. The difference is that the RPC protocol changed, so it's not possible to copy between those versions. CopyTable just uses TableOutputFormat, which is an HBase client. You need to do an Export to dump the data on HDFS
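The Export/Import route described might look like this (cluster addresses, table name, and paths are made up; distcp over hftp is one common way to copy between HDFS versions):

```
# on a client of the 0.90 cluster: dump the table to sequence files
hbase org.apache.hadoop.hbase.mapreduce.Export mytable /export/mytable
# copy the dump across clusters
hadoop distcp hftp://old-nn:50070/export/mytable hdfs://new-nn/export/mytable
# on a client of the 0.92 cluster: load the dump into a pre-created table
hbase org.apache.hadoop.hbase.mapreduce.Import mytable /export/mytable
```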

Re: Coprocessors on specific servers

2012-06-27 Thread Amandeep Khurana
Mohammad, Can you describe what you are trying to do a little more? Is this a endpoint coprocessor you are trying to build? What is the functionality it'll provide? -Amandeep On Tue, Jun 26, 2012 at 12:44 PM, Mohammad Tariq wrote: > Hello Lars, > >Thank you so much for the quick respon

Re: How to free space

2012-06-27 Thread Amandeep Khurana
Cyril, Did you notice the space on the hbase directory in HDFS change at all? It takes time to complete the major compactions (and it depends on the size of the tables). Deleting column families will just delete those HFiles. That should definitely free up space. -Amandeep On Wed, Jun 27, 2012 a

Re: HBase Schema Design for clickstream data

2012-06-27 Thread Amandeep Khurana
Mohit, What would be your read patterns later on? Are you going to read per session, or for a time period, or for a set of users, or process through the entire dataset every time? That would play an important role in defining your keys and columns. -Amandeep On Tue, Jun 26, 2012 at 1:34 PM, Mohi

Re: Best practices for custom filter class distribution?

2012-06-27 Thread Amandeep Khurana
Currently, you have to compile a jar, put it on all servers and restart the RS process. I don't believe there is an easier way to do it as of right now. And I agree, it's not entirely desirable to have to restart the cluster to install a custom filter. You can combine multiple filters into a

Best practices for custom filter class distribution?

2012-06-27 Thread Evan Pollan
What're the current best practices for making custom Filter implementation classes available to the region servers? My cluster is running 0.90.4 from the CDH3U3 distribution, FWIW. I searched around and didn't find anything other than "add your filter to the region server's classpath." I'm hopin

Re: direct Hfile Read and Writes

2012-06-27 Thread Jerry Lam
Hi Samar: I have used IncrementalLoadHFile successfully in the past. Basically, once you have written the HFile yourself, you can use IncrementalLoadHFile to merge it with the HFiles currently managed by HBase. Once it is loaded into HBase, the records in the incremental HFile are accessible to clients.

Re: In memory table after using 'alter'

2012-06-27 Thread Minh Duc Nguyen
Sever, the IN_MEMORY option doesn't change when table content is transferred into RAM. Whether set to true or false, the blocks of data are only loaded into memory after a normal retrieval operation. When IN_MEMORY is set to true, HBase just tries to keep data in memory more aggressively than it n

Re: [ hbase ] performance of Get from MR Job

2012-06-27 Thread Michael Segel
I'm not sure as to what you are attempting to do with your data. There are a couple of things to look at. Looking at the issue, you have (K,V) pair. That's Key, Value. But the value isn't necessarily a single element. It could be a set of elements. You have to consider that rather than store

Re: direct Hfile Read and Writes

2012-06-27 Thread shixing
1. Since the data we might need would be distributed across regions, how would direct reading of HFiles be helpful? You can read HFilePrettyPrinter; it shows how to create an HFile.Reader and use it to read the HFile. Or you can use ./hbase org.apache.hadoop.hbase.io.hfile.HFile -p -f hdf

Howto CopyTable from 0.90 to 0.92 ?

2012-06-27 Thread Eran Kutner
Hi, I can't figure out what to put in the rs.class and rs.impl in order to get CopyTable to copy from a 0.90 cluster to 0.92. Also, how should I reference the other cluster JAR? should I add it to the classpath? Thanks. -eran

In memory table after using 'alter'

2012-06-27 Thread Sever Fundatureanu
Hello, I initially created a table without the IN_MEMORY option enabled and loaded some data into it. Then I disabled it, modified the IN_MEMORY option using the hbase shell 'alter' command, re-enabled it and finally ran a major compaction. I do notice now the memory usage of the region servers ha
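The shell sequence described above might look like this (table and family names are placeholders):

```
hbase> disable 't1'
hbase> alter 't1', {NAME => 'cf', IN_MEMORY => 'true'}
hbase> enable 't1'
hbase> major_compact 't1'
```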

direct Hfile Read and Writes

2012-06-27 Thread samar kumar
Hi HBase users, I have seen APIs supporting direct HFile reads and writes. I do understand it would create HFiles in the location specified, and it should be much faster since we would skip all the lookups to ZK, the catalog table, and the RS, but can anyone point me to a particular case when we would like

Internalscanner problem in HBase

2012-06-27 Thread Liu, Keyan (NSN - CN/Beijing)
Hi all, I am using HBase to store large-scale data and run range queries. I have one question about the number of rows scanned using InternalScanner. If I use an InternalScanner to scan regions, only 15 million rows are scanned. If I use a ResultScanner to scan regions, I can scan 43 million rows to get more