What is slow? Have you read http://hbase.apache.org/book/performance.html?
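For reference, here is the write pattern I'd suggest for the mapper below: disable autoflush so Puts are buffered client-side, and flush once in close(). This is a minimal sketch against the 0.90/CDH3 client API; the table name, buffer size, and class names are illustrative, not from your job.

```java
import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Illustrative mapper skeleton: buffer Puts client-side, flush once per task.
public class ImportMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, LongWritable, Text> {

    private HTable hTable;

    @Override
    public void configure(JobConf job) {
        try {
            hTable = new HTable(HBaseConfiguration.create(), "my_table"); // table name is an assumption
            hTable.setAutoFlush(false);                  // buffer Puts instead of one RPC per Put
            hTable.setWriteBufferSize(12 * 1024 * 1024); // e.g. a 12MB write buffer
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public void map(LongWritable key, Text value,
            OutputCollector<LongWritable, Text> output, Reporter reporter)
            throws IOException {
        // ... build the Put exactly as in your map() below, then:
        // hTable.put(put);  // buffered; no setAutoFlush(true), no flushCommits() here
    }

    @Override
    public void close() throws IOException {
        hTable.flushCommits(); // send whatever is still buffered
        hTable.close();
    }
}
```

With autoflush on (the default), every hTable.put() is a round trip to a regionserver; buffered, the client ships batches when the write buffer fills.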
The default splitter makes as many tasks as there are regions in your table. If you want more, then split your table.

Why do you have a max filesize of 1G? What brought that on?

This line makes it so you flush to hbase on each Put:

> hTable.flushCommits();

Try disabling it and let hbase manage flushing (add a flushCommits to the map close() method).

You should consider the bulk loader: http://hbase.apache.org/bulk-loads.html

St.Ack

On Wed, Jun 1, 2011 at 9:24 AM, byambajargal <[email protected]> wrote:
> Hello everybody
>
> I run a cluster with 11 nodes of HBase CDH3u0 and I have 3 zookeeper
> servers in my cluster. It seems very slow when I run the job that imports a
> text file into an hbase table. My question is: what is the recommended
> configuration in hbase to write big data (around 17GB) into an HBase table?
> When I run the job it launches only 20 map tasks; it could be 100 or more.
> I have attached the hbase-site.xml file; if someone knows, please help me.
>
> Here is the map function of my job:
>
>     public void map(LongWritable key, Text value,
>             OutputCollector<TextPair, Text> output, Reporter reporter)
>             throws IOException {
>
>         String line = value.toString();
>
>         if (line != null && !line.isEmpty()) {
>             String[] items = line.split("\\,");
>
>             String concept_id = items[1];
>             String element_id = items[0];
>             Put put = new Put(Bytes.toBytes(concept_id));
>
>             // keys of ELEMENT_* column families are element ids
>             put.add(Constant.COLUMN_ELEMENT_ID,
>                     Bytes.toBytes(element_id), Bytes.toBytes(items[0]));
>             put.add(Constant.COLUMN_ELEMENT_CONTEXT_ID,
>                     Bytes.toBytes(element_id), Bytes.toBytes(items[2]));
>             put.add(Constant.COLUMN_ELEMENT_POSITION_FORM,
>                     Bytes.toBytes(element_id), Bytes.toBytes(items[3]));
>             put.add(Constant.COLUMN_ELEMENT_POSITION_TO,
>                     Bytes.toBytes(element_id), Bytes.toBytes(items[4]));
>             put.add(Constant.COLUMN_ELEMENT_TERM_ID,
>                     Bytes.toBytes(element_id), Bytes.toBytes(items[5]));
>             put.add(Constant.COLUMN_ELEMENT_DICTIONARY_ID,
>                     Bytes.toBytes(element_id), Bytes.toBytes(items[6]));
>             put.add(Constant.COLUMN_ELEMENT_WORKFLOW_STATUS,
>                     Bytes.toBytes(element_id), Bytes.toBytes(items[7]));
>
>             hTable.put(put);
>             hTable.setAutoFlush(true);
>             hTable.flushCommits();
>             //output.collect(new TextPair(items[1], "1"),
>             //        new Text(items[0] + items[1]));
>         }
>     }
>
> Here is the configuration file of hbase:
>
>     <property>
>         <name>hbase.zookeeper.property.maxClientCnxns</name>
>         <value>1000</value>
>     </property>
>     <property>
>         <name>hbase.hregion.max.filesize</name>
>         <value>1073741824</value>
>     </property>
>     <property>
>         <name>hbase.regionserver.handler.count</name>
>         <value>200</value>
>     </property>
>     <property>
>         <name>dfs.datanode.max.xcievers</name>
>         <value>4096</value>
>     </property>
>     <property>
>         <name>hfile.block.cache.size</name>
>         <value>0.4</value>
>     </property>
>     <property>
>         <name>hbase.client.scanner.caching</name>
>         <value>100000</value>
>     </property>
>     <property>
>         <name>hbase.zookeeper.quorum</name>
>         <value>server1,serve3,server5</value>
>     </property>
>
> cheers
>
> Byambajargal
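Following up on the bulk loader pointer above: for a one-shot 17GB import, writing HFiles directly and then handing them to the cluster usually beats online Puts entirely. A minimal driver sketch, assuming a hypothetical new-API mapper (`PutMapper`) that emits (ImmutableBytesWritable, Put) pairs; the table name and paths are illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "hfile-import");
        job.setJarByClass(BulkLoadDriver.class);
        job.setMapperClass(PutMapper.class); // hypothetical mapper emitting (rowkey, Put)
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(Put.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input text files
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HFile output dir
        // Wires in the partitioner, reducer, and output format so the job
        // produces one set of HFiles per existing region of the table.
        HFileOutputFormat.configureIncrementalLoad(job, new HTable(conf, "my_table"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

After the job finishes, the generated HFiles are moved into the table with the completebulkload tool described at the bulk-loads link above. Note this path depends on the table being pre-split, since configureIncrementalLoad makes one reducer per region.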
