Hi John,

What's the fix (committed on Jan 11) you mentioned? Is it in OpenTSDB or HBase? Do you have a JIRA # for it?
Thanks

On Wed, Feb 25, 2015 at 5:49 AM, brady2 [via Apache HBase] <[email protected]> wrote:

> Hi Sathya and Nick,
>
> Here are the stack traces of the region server dumps when the huge .tmp
> files are created:
>
> https://drive.google.com/open?id=0B1tQg4D17jKQNDdFZkFQTlg4ZjQ&authuser=0
>
> As background, we are not using compression. Compaction occurs every
> hour. Everything else is default.
>
> OpenTSDB v2.0 is running on top of Cloudera 5.3.1 in AWS. We have a
> 7-node Cloudera cluster (each node with 32 GB RAM and 3 TB disk space),
> with 5 OpenTSDB instances dedicated to writing and 2 to reading. We use
> AWS ELBs in front of OpenTSDB to balance the reads/writes.
>
> We are load testing OpenTSDB over sockets, but running into several
> issues. Let me first explain how we do this load testing:
>
> 1. From another AWS system, we have written a testing framework to
> generate load.
>
> 2. The framework takes several parameters: we can specify the number of
> threads, the loop size (i.e. the number of sockets each thread will
> open), and the batch size (i.e. the number of puts, or inserts, that
> each socket connection will handle). (A sketch of such a generator
> follows the quoted message below.)
>
> 3. To simplify troubleshooting, we removed variables from the tests: we
> have just 1 OpenTSDB instance behind the AWS ELB, so the load is sent
> to 1 instance only.
>
> 4. We initially create the OpenTSDB tables without any pre-splitting of
> regions.
>
> 5. We load with 1 metric only, for ease of querying in the UI.
>
> 6. We send under 5,000 inserts per second.
>
> 7. At the top of the hour, row compaction kicks in and the region server
> is too busy, so we lose data. It recovers the first time, but by the
> 2nd hour there is presumably so much data that it does not recover. To
> fix it, we have to restart Cloudera, reboot the nodes, drop the tsdb
> tables, and re-create them. Otherwise the .tmp file keeps growing until
> it fills the 3 TB disks and the system is unresponsive.
>
> 8. We see problems with region splits happening under heavy load. We
> noted a code fix committed on Jan 11 for this, but I presume that is
> not in RC2.1.
>
> Thanks
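For anyone trying to reproduce this setup, below is a minimal sketch of a socket-based load generator along the lines brady2 describes, using OpenTSDB's telnet-style put command ("put <metric> <epoch-seconds> <value> <tag=value>"). The endpoint, thread/loop/batch values, and metric/tag names are illustrative assumptions, not details from the original framework:

    # Hedged sketch of a load generator like the one described above (Python 3).
    # HOST, THREADS, LOOPS, BATCH, and the metric/tag names are assumed
    # placeholders, not values from the original post.
    import socket
    import threading
    import time

    HOST, PORT = "tsdb-elb.example.com", 4242  # assumed ELB endpoint; 4242 is OpenTSDB's default port
    THREADS = 10   # number of worker threads
    LOOPS = 100    # sockets opened per thread
    BATCH = 50     # put commands written per socket connection

    def worker(thread_id):
        for _ in range(LOOPS):
            with socket.create_connection((HOST, PORT)) as sock:
                base_ts = int(time.time())
                # Telnet-style put: metric, epoch seconds, value, at least one tag.
                # Timestamps are offset per line to avoid writing duplicate
                # data points to the same series.
                lines = ["put load.test.metric %d %.1f host=gen%02d\n"
                         % (base_ts + i, i * 0.5, thread_id)
                         for i in range(BATCH)]
                sock.sendall("".join(lines).encode("ascii"))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

Note that with a single metric and an unsplit tsdb table, every row key shares the same metric UID prefix and timestamps only increase, so all writes land in one (or the last) region; that is consistent with the hot-spotting during row compaction and the region-split trouble described in items 7 and 8.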
