Re: hbase table as a queue.

2011-07-19 Thread Daniel Einspanjer
Cool. filed a task for us to work on that. https://bugzilla.mozilla.org/show_bug.cgi?id=672527 On 7/19/11 12:05 PM, Stack wrote: Set region size very large (In trunk you can actually disable splitting). St.Ack On Tue, Jul 19, 2011 at 8:26 AM, Daniel Einspanjer wrote: We use a queue table

Re: hbase table as a queue.

2011-07-19 Thread Daniel Einspanjer
We use a queue table like this too and ran into the same problem. How did you configure it such that it never splits? -Daniel On 7/16/11 4:24 PM, Stack wrote: I learned Friday that our fellas on the frontend are using an hbase table to do simple queuing. They insert stuff to be processed by

Re: Current status of Hbasene - is the Lily project relevant?

2010-12-27 Thread Daniel Einspanjer
Mozilla is using it for a test project that is building a data warehouse on top of our bugzilla installation. While it is still a bit young, it is usable and very exciting, not only for the searching capabilities, but also for the application-friendly extensions to HBase such as linked fields,

Re: Where do you get your hardware?

2010-11-07 Thread Daniel Einspanjer
I set up a Google Docs spreadsheet a while back and shared it on this list that specifically broke down the costs associated with doing SuperMicro 2-node 2U or 4-node 2U servers. The problem with the 2.5 inch drives is that you can't get a large drive that is enterprise class (important for vib

Paid OSS task for performing manual major compactions

2010-10-05 Thread Daniel Einspanjer
o pin on your chest if you are working toward getting HBase commit access. (Did I mention getting paid for it?) Thanks for your time, Daniel Einspanjer Metrics Architect Mozilla Corporation

Re: How do people handle the OS disk partition on Hadoop or HBase nodes?

2010-09-30 Thread Daniel Einspanjer
o non-HDFS data, you can literally wedge it in like 8gb. The biggest things that are not HDFS data are logs, and those can go into the HDFS partition, they tend to be low volume but can add up over time since the default is not to reap them. On Thu, Sep 30, 2010 at 4:17 PM, Daniel Einspanjer

Re: How do people handle the OS disk partition on Hadoop or HBase nodes?

2010-09-30 Thread Daniel Einspanjer
ge it in like 8gb. The biggest things that are not HDFS data are logs, and those can go into the HDFS partition, they tend to be low volume but can add up over time since the default is not to reap them. On Thu, Sep 30, 2010 at 4:17 PM, Daniel Einspanjer wrote: Right now, most of our boxes h

How do people handle the OS disk partition on Hadoop or HBase nodes?

2010-09-30 Thread Daniel Einspanjer
Right now, most of our boxes have 3 disks in them. We take a small partition on each of those and RAID-stripe them together to use as the OS partition, then allocate the rest of the disks as JBOD for HDFS storage. We are building out a new cluster and I'm wondering if there are any better ideas

Re: Upgrading 0.20.6 -> 0.89

2010-09-29 Thread Daniel Einspanjer
Question regarding configuration and tuning... Our current configuration/schema has fairly low hlog rollover sizes to keep the possibility of data loss to a minimum. When we upgrade to .89 with append support, I imagine we'll be able to safely set this to a much larger size. Are there any r

Cluster hardware scenarios

2010-09-25 Thread Daniel Einspanjer
I've been trying to figure out what specs our next HBase cluster should have. That mostly involves considering the balance between # nodes, disks, memory, and CPU. I put together this rough Google Docs spreadsheet with inaccurate but somewhat relative prices for some SuperMicro enclosures th

Re: HBase Thrift health checker

2010-09-10 Thread Daniel Einspanjer
Really good point about the firewall loophole, thanks for bringing it up. The code that I wrote is very much the bridge daemon you suggested. So I guess it just needs to remain living separately from Thrift. -Daniel On 9/10/10 12:05 PM, Time Less wrote: If it were to be included in HBase in

Re: HBase Thrift health checker

2010-09-09 Thread Daniel Einspanjer
able to offer an ASF grant we could include this in hbase. -ryan On Thu, Sep 9, 2010 at 5:36 PM, Daniel Einspanjer wrote: Cross posting my recent blog entry... As documented in THRIFT-601, sending random data to Thrift can cause it to leak memory. At Mozilla, we use a web load balanc

HBase Thrift health checker

2010-09-09 Thread Daniel Einspanjer
Cross posting my recent blog entry... As documented in THRIFT-601, sending random data to Thrift can cause it to leak memory. At Mozilla, we use a web load balancer to distribute traffic to our Thrift machines, and the default liveness check it uses is a simple TCP connect. We also had Nag

Re: HBase Query

2010-08-27 Thread Daniel Einspanjer
Xavier recently mentioned some code we use at Mozilla that should help here. It is a unioning scanner that would let you define a list of scan ranges to run for the job. You'd set a prefix for each Friday in the selected months and the range of 143000 through 143060. Then you'd apply a filter on t
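To make the "list of scan ranges" idea concrete, here is a rough sketch of how the ranges for such a job could be generated. It does not reproduce the Mozilla unioning scanner itself, which this excerpt does not show. The rowkey layout (a `YYYYMMDD` date prefix followed by an `HHMMSS` time) and the names `fridays` and `scan_ranges` are assumptions made purely for illustration.

```python
from datetime import date, timedelta


def fridays(year, month):
    """Yield every Friday in the given month."""
    day = date(year, month, 1)
    # Monday is 0 and Friday is 4, so step forward to the first Friday.
    day += timedelta(days=(4 - day.weekday()) % 7)
    while day.month == month:
        yield day
        day += timedelta(days=7)


def scan_ranges(year, months, start_suffix="143000", stop_suffix="143060"):
    """Build one (start_row, stop_row) pair per Friday.

    Assumes a hypothetical rowkey of the form YYYYMMDD + HHMMSS.
    HBase scans treat the stop row as exclusive by default.
    """
    ranges = []
    for month in months:
        for day in fridays(year, month):
            prefix = day.strftime("%Y%m%d")
            ranges.append((prefix + start_suffix, prefix + stop_suffix))
    return ranges


if __name__ == "__main__":
    for start_row, stop_row in scan_ranges(2010, [7, 8]):
        print(start_row, stop_row)
```

Each pair would then be handed to a separate scan (or to a scanner that unions them), with any further filtering applied to the rows that come back.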

Re: Region servers up and running, but Master reports 0

2010-08-23 Thread Daniel Einspanjer
Matthew, Maybe instead of changing the replication factor, you could spin up new nodes with a different datacenter/rack configuration which would cause hadoop to ensure the replicas are not solely on those temp nodes? Matthew LeMieux wrote: J-D, Thank you for the very fast response.

Re: Need help trying to balance HBase RegionServer load

2010-06-17 Thread Daniel Einspanjer
org.apache.hadoop.hbase.master.RegionServerOperation: Updated row crash_reports,21006172b7ec9f5-dcad-4c98-9dc5-969532100617,1276788891647 in region .META.,,1 with startcode=1276778868841, server=10.2.72.74:60020 On 6/17/10 11:42 AM, Daniel Einspanjer wrote: Currently, in our production cluster, almost all of the

Need help trying to balance HBase RegionServer load

2010-06-17 Thread Daniel Einspanjer
Currently, in our production cluster, almost all of the traffic for a day ends up assigned to a single RS and that causes the load on that machine to be too high. With our last release, we salted our rowkeys so that rather than starting with the date (100617), they now start with the firs
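For readers unfamiliar with rowkey salting, a minimal sketch of the general technique follows. The message above is cut off before it says what the real prefix is, so the salt here (one hex character derived from an MD5 hash of the record id) is an assumption, and `salted_rowkey` and `day_prefixes` are hypothetical helper names, not the code used in production.

```python
import hashlib


def salted_rowkey(date_yymmdd, record_id, buckets=16):
    """Prefix a date-leading rowkey with a one-character salt.

    Assumption: the salt is the first MD5 byte of the record id, reduced
    to `buckets` values and written as a single hex digit.
    """
    if not 1 <= buckets <= 16:
        raise ValueError("buckets must fit in one hex digit (1..16)")
    digest = hashlib.md5(record_id.encode("utf-8")).digest()
    salt = format(digest[0] % buckets, "x")
    return salt + date_yymmdd + record_id


def day_prefixes(date_yymmdd, buckets=16):
    """List every salted prefix that must be scanned to read one day."""
    return [format(b, "x") + date_yymmdd for b in range(buckets)]


if __name__ == "__main__":
    print(salted_rowkey("100617", "example-record-id"))
    print(day_prefixes("100617", buckets=4))
```

The trade-off is that writes for a single day spread across up to `buckets` key ranges instead of one, while reading a whole day now takes one scan per prefix.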

Re: HBASE-2001 and ElasticSearch

2010-06-08 Thread Daniel Einspanjer
I didn't realize that Lily was that far along, I thought you were still in R&D for a few more months. This sounds very promising and we'll take a look at what you have available. -Daniel On 6/8/10 3:38 AM, Steven Noels wrote: On Tue, Jun 8, 2010 at 2:55 AM, Daniel Einspanjer

Re: HBASE-2001 and ElasticSearch

2010-06-07 Thread Daniel Einspanjer
At the moment, we want to do nothing more than execute a callback function *after* a get/incr/delete has successfully completed. If the callback fails to execute for some reason, we'd want to log an error, but wouldn't want it to have any impact on the HBase side of things. This is an extreme

Re: HBASE-2001 and ElasticSearch

2010-06-07 Thread Daniel Einspanjer
We are specifically looking for the ability to create callbacks on put, increment, and delete for specific tables so we can implement the indexing solution. This is actually advance preparation for Socorro 2.0 which won't be released until August or maybe September, so we have some dev time.

Suggested config changes to be made

2010-06-07 Thread Daniel Einspanjer
rue'}, {NAME => 'processed_data', VERSIONS => '3', COMPRESSION => 'LZO', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'raw_data', COMPRESSION => 'LZO', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]} Is there any other information I should provide that could lead to other important config changes we should make on this upgrade? Daniel Einspanjer Mozilla Corporation

Re: elastic search or other Lucene for HBase?

2010-06-04 Thread Daniel Einspanjer
Mozilla is taking a hard look at using Elastic Search as an indexing/searching mechanism for Socorro 2.0. We're evaluating the possibility of using the HBASE-2001 patch as a mechanism to be able to hook in NRT indexing of the documents. -Daniel On 6/3/10 5:36 PM, Steven Noels wrote: On Thu, Ju

Re: HBase Design Considerations

2010-05-27 Thread Daniel Einspanjer
rther about use cases and implementation plans to see where we might be able to effectively collaborate. Daniel Einspanjer Metrics Architect Mozilla Corporation

Request for comments - new project using HBase

2010-05-25 Thread Daniel Einspanjer
Just wanted to see if anyone knew of any potential problems with my desired HBase schema for this project. Please feel free to comment here or on the discussion page of the wiki. https://wiki.mozilla.org/BouncerRealTimeMetricsProject Daniel Einspanjer Metrics Architect Mozilla Corporation