Re: copy job for mapreduce failing due to large rows

2012-01-09 Thread T Vinod Gupta
Hi, thanks for your response. Copying table A to table A was my plan but that's not what I am doing. I am copying table A to table B. Also, I am wondering - if I were able to create such large rows from my Java client in the first place, then how come MapReduce is erroring out? It doesn't make sens

RE: copy job for mapreduce failing due to large rows

2012-01-09 Thread Michael Segel
Uhmm... You're copying data from Table A back to Table A? Ok... you really want to disable your caching altogether and make sure each row as you write it is committed to the table. Try that... it will hurt your performance, but it may keep you afloat. HTH -Mike You've got a scanner and yo

RE: Question about HBase for OLTP

2012-01-09 Thread Michael Segel
Yeah, look, here's the hard part. I don't want to be a debbie downer, or someone who constantly says... it's not a good idea. I really want to encourage people to think about what they are doing, and conceptually how HBase is going to handle that process. The more you think, the better your d

Re: Question about HBase for OLTP

2012-01-09 Thread Doug Meil
I think that Amandeep pretty much nailed the intent of the original question with his response "Delete and Updates in HBase are like new writes.." since I think one of the central questions was about over-write behavior (also covered in DataModel section), and the subsequent delete isn't required

RE: Question about HBase for OLTP

2012-01-09 Thread Michael Segel
Ok.. Look, here's the thing... HBase has no transactional support. OLTP systems like PoS systems, Hotel Reservation Systems, Trading systems... among others really need this. Again, I can't stress this point enough... DO NOT THINK ABOUT USING HBASE AS AN OLTP SYSTEM UNLESS YOU HAVE ALREADY GON

Re: Capturing RegionServerMetrics during inserts

2012-01-09 Thread Christian Schäfer
I just wonder why it isn't able to print the server load info every second to the console with the following code. Instead it just prints at irregular intervals, which is very disadvantageous because I want to make a simple requests/second diagram where there has to be a value each second. (Think u

copy job for mapreduce failing due to large rows

2012-01-09 Thread T Vinod Gupta
Hi, I wrote a MapReduce job to copy rows from my table to the same table since I want to change my row key schema. But the job is failing consistently at the same point due to the presence of large rows. I don't know how to unblock myself. Here is the error stack I see. attempt_201112151554_0028_m_00

Re: Question about HBase for OLTP

2012-01-09 Thread Nicolas Spiegelberg
1) Eventual Consistency isn't a problem here. HBase is a strict consistency system. Maybe you have us confused with other Dynamo-based Open Source projects? 2) MySQL and other traditional RDBMS systems are definitely a lot more solid, well-tested, and subtly tuned than HBase. The vast majorit

Re: information, whether a GET Request inside Map-Task is data local or not

2012-01-09 Thread Jean-Daniel Cryans
It would definitely be interesting, please do report back. Thx, J-D On Mon, Jan 9, 2012 at 2:33 PM, Christopher Dorner wrote: > Thank you for the reply. > Though that sounds a bit like some dirty hacking, it seems to be doable. I > think I will give it a try. > I can report back when I get some

HConnectionManager.deleteConnection(..., boolean stopProxy)

2012-01-09 Thread Garrett Wu
What does the stopProxy flag do in HConnectionManager.deleteConnection(Configuration conf, boolean stopProxy)? Assuming an HConnection was made with a unique Configuration instance, and I want to completely clean up after it, should I be using stopProxy=true? When would I want to use stopProxy=fals

RE: Question about HBase for OLTP

2012-01-09 Thread Michael Segel
Uhmmm. Well... It depends on your data and what you want to do... Can you fit all of the data into a single row? Does it make sense to use a sequence file for the raw data and then use HBase to maintain indexes? Just some food for thought. > From: t...@cloudera.com > Date: Mon, 9 Jan 2012

Re: information, whether a GET Request inside Map-Task is data local or not

2012-01-09 Thread Christopher Dorner
Thank you for the reply. Though that sounds a bit like some dirty hacking, it seems to be doable. I think I will give it a try. I can report back when I get some usable results. Maybe some more people are interested in that. Christopher On 09.01.2012 23:15, Jean-Daniel Cryans wrote: Short

RE: Question about HBase for OLTP

2012-01-09 Thread Michael Segel
All, Just my $0.02 worth of 'expertise'... 1) Just because you can do something doesn't mean you should. 2) One should always try to use the right tool for the job regardless of your 'fashion sense'. 3) Just because someone says "Facebook or Yahoo! does X", doesn't mean it's a good idea, or

Re: information, whether a GET Request inside Map-Task is data local or not

2012-01-09 Thread Jean-Daniel Cryans
Short answer: no. Painful way to get around the problem: You *could* by looking up the machines hostname when the job starts and then from the HConnection that HTables can give you through getConnection() do getRegionLocation for the row you are going to Get and then get the hostname by getServer
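J-D's workaround boils down to comparing the task's own hostname against the hostname of the region server that owns the row. The comparison step can be sketched in plain Java; the "host:port" address format and the helper name `isLocal` are assumptions of mine, not HBase API:

```java
public class LocalityCheck {
    // Compare the task's local hostname with a region server address of the
    // assumed form "host:port". Returns true when the Get for that row would
    // be served by the region server running on this machine.
    static boolean isLocal(String regionServerAddress, String localHostname) {
        int colon = regionServerAddress.indexOf(':');
        String host = colon >= 0 ? regionServerAddress.substring(0, colon)
                                 : regionServerAddress;
        return host.equalsIgnoreCase(localHostname);
    }

    public static void main(String[] args) {
        // In a real mapper, localHostname would come from
        // InetAddress.getLocalHost().getHostName() at job start, and the
        // server address from the region location lookup J-D describes.
        System.out.println(isLocal("node3.example.com:60020", "node3.example.com")); // true
        System.out.println(isLocal("node4.example.com:60020", "node3.example.com")); // false
    }
}
```

Counting how often `isLocal` returns true per mapper (e.g. via a Hadoop counter) would give the data-local vs. remote breakdown Christopher is after.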

Re: Missing region data.

2012-01-09 Thread James Estes
Should we file a ticket for this issue? FWIW we got this fixed (not sure if we actually lost any data though). We had to bounce the region server (non-gracefully). The region server seemed to have some stale file handles into hdfs...open inputstreams to files that were long deleted in hdfs. Any c

Re: How does HBase treat end keys?

2012-01-09 Thread lars hofhansl
If you needed to make it inclusive you can add a trailing 0 byte to the byte[] passed to setStopRow. -- Lars From: Lewis John Mcgibbney To: user@hbase.apache.org Sent: Monday, January 9, 2012 12:46 PM Subject: Re: How does HBase treat end keys? Thank you J
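Lars's trailing-zero trick is plain byte-array manipulation, so it can be shown without a cluster. The helper name `inclusiveStopRow` is mine, not an HBase API; the idea is that the smallest row key sorting strictly after `key` is `key` followed by a 0x00 byte:

```java
import java.util.Arrays;

public class StopRows {
    // Append a 0x00 byte so that passing the result to Scan.setStopRow
    // effectively makes the original stop row inclusive.
    static byte[] inclusiveStopRow(byte[] stopRow) {
        // Arrays.copyOf zero-pads the extra slot, giving the trailing 0x00.
        return Arrays.copyOf(stopRow, stopRow.length + 1);
    }

    public static void main(String[] args) {
        byte[] stop = "row-100".getBytes();
        byte[] incl = inclusiveStopRow(stop);
        System.out.println(incl.length);           // original length + 1
        System.out.println(incl[incl.length - 1]); // 0
    }
}
```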

information, whether a GET Request inside Map-Task is data local or not

2012-01-09 Thread Christopher Dorner
Hi, I am using the input of a mapper as a rowkey to make a GET request to a table. Is it somehow possible to retrieve information about how much data had to be transferred over the network, or how many of the requests were data local (datanodes are also regionservers) or where the request was not

Re: How does HBase treat end keys?

2012-01-09 Thread Lewis John Mcgibbney
Thank you Jean-Daniel, great help. Regards Lewis On Mon, Jan 9, 2012 at 8:19 PM, Jean-Daniel Cryans wrote: > From Scan's javadoc: > > http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setStopRow(byte[]) > > stopRow - row to end at (exclusive) > > Hope this helps, > > J-D

Re: schema optimisation - go for multiple tables, rows or column families?

2012-01-09 Thread Tom
Hi Jon, Kisalay and Rohit, thank you for your feedback! I almost always need to access my metadata and (the most recent subset of) the measurement data together. To do this access (scan/put) fast, it seems a valid goal to have my data distributed as little as possible among the cluster (ideal

Re: How does HBase treat end keys?

2012-01-09 Thread Jean-Daniel Cryans
From Scan's javadoc: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setStopRow(byte[]) stopRow - row to end at (exclusive) Hope this helps, J-D On Mon, Jan 9, 2012 at 12:14 PM, Lewis John Mcgibbney wrote: > Hi, > > Whilst working on some tests for Apache Gora, we've

How does HBase treat end keys?

2012-01-09 Thread Lewis John Mcgibbney
Hi, Whilst working on some tests for Apache Gora, we've discovered a problem with one of them. The following test [1], which I have also pasted below (I've made the area of code we are concerned with *bold* to try and point it out clearly), expects the last key in a range that was deleted to be pr

Re: Question about HBase for OLTP

2012-01-09 Thread Dhruba Borthakur
> I know HBase is designed for OLAP, query intensive type of applications. That is not entirely true. HBase is a pure transaction system and does OLTP workloads for us. We probably do more than 2 million ops/sec for one of our applications, details here: https://www.facebook.com/note.php?note_id=4549

Re: snappy error during completebulkload

2012-01-09 Thread Todd Lipcon
On Mon, Jan 9, 2012 at 2:42 AM, Oliver Meyn (GBIF) wrote: > It seems really weird that compression (native compression even moreso) > should be required by a command that is in theory moving files from one place > on a remote filesystem to another.  Any light shed would be appreciated. The issu

Re: Question about HBase for OLTP

2012-01-09 Thread Todd Lipcon
On Mon, Jan 9, 2012 at 9:25 AM, fullysane wrote: > > Hi > > I know HBase is designed for OLAP, query intensive type of applications. I would disagree. HBase isn't designed for OLAP at all - It's a way better fit for the kind of applications you're referring to with mostly single-row accesses. -T

Re: Question about HBase for OLTP

2012-01-09 Thread Doug Meil
And this... http://hbase.apache.org/book.html#datamodel On 1/9/12 12:36 PM, "Amandeep Khurana" wrote: >Delete and Updates in HBase are like new writes.. The way to update a cell >is to actually do a Put. And when you delete, it internally flags the cell >to be deleted and removes the data fr

Re: Question about HBase for OLTP

2012-01-09 Thread Amandeep Khurana
Delete and Updates in HBase are like new writes.. The way to update a cell is to actually do a Put. And when you delete, it internally flags the cell to be deleted and removes the data from the underlying file on the next compaction. If you want to learn the technical details further, you could loo
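Amandeep's point ("deletes and updates are like new writes") can be illustrated with a toy in-memory model. This is purely illustrative, not HBase code: a delete appends a tombstone marker instead of removing data, reads skip tombstoned cells, and a "compaction" is what physically drops the dead versions:

```java
import java.util.*;

public class CellStoreModel {
    // Toy model of HBase delete semantics. Each key maps to a stack of
    // versions, newest on top; TOMBSTONE marks a deleted cell.
    static final Object TOMBSTONE = new Object();
    private final Map<String, Deque<Object>> cells = new HashMap<>();

    void put(String key, Object value) {
        cells.computeIfAbsent(key, k -> new ArrayDeque<>()).push(value);
    }

    void delete(String key) {
        put(key, TOMBSTONE); // a delete is just another write
    }

    Object get(String key) {
        Deque<Object> versions = cells.get(key);
        if (versions == null || versions.peek() == TOMBSTONE) return null;
        return versions.peek(); // newest version wins
    }

    void compact() {
        // Physically drop deleted cells, and keep only the live newest
        // version for the rest -- the analogue of a major compaction.
        cells.entrySet().removeIf(e -> e.getValue().peek() == TOMBSTONE);
        cells.values().forEach(v -> {
            Object top = v.peek();
            v.clear();
            v.push(top);
        });
    }

    public static void main(String[] args) {
        CellStoreModel store = new CellStoreModel();
        store.put("row1", "v1");
        store.put("row1", "v2");     // "update" = newer Put
        System.out.println(store.get("row1")); // v2
        store.delete("row1");
        System.out.println(store.get("row1")); // null, but data still on "disk"
        store.compact();                       // now the data is really gone
    }
}
```

The real mechanics (HFiles, memstore, delete markers per family/column/version) are richer, but the write-path shape is the same: nothing is updated or removed in place.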

Re: Question about HBase for OLTP

2012-01-09 Thread Doug Meil
For starters, see the two video presentations on this page... http://hbase.apache.org/book.html#other.info On 1/9/12 12:25 PM, "fullysane" wrote: > >Hi > >I know HBase is designed for OLAP, query intensive type of applications. >But >I like the flexibility feature of its column-base archi

Question about HBase for OLTP

2012-01-09 Thread fullysane
Hi I know HBase is designed for OLAP, query intensive type of applications. But I like the flexibility of its column-based architecture, which means I have no need to predefine every column of a table and I can dynamically add new columns with values in my OLTP application code and captur

Re: snappy error during completebulkload

2012-01-09 Thread Jeff Whiting
Sounds like the snappy library isn't installed on the machine or that java can't find the native library. I think you need the hadoop-0.20-native installed (via apt or yum). ~Jeff On 1/9/2012 3:42 AM, Oliver Meyn (GBIF) wrote: Hi all, I'm trying to do bulk loading into a table with snappy co

Re: schema optimisation - go for multiple tables, rows or column families?

2012-01-09 Thread Rohit Kelkar
Tom, think of it this way (guys correct me if I am wrong) Each column family translates to 1 file on hdfs. You have 3 cases - case 1: Multiple tables - single key - single column family N tables and each table has 1 column family. This translates to N files on hdfs case 2: Single table - single k
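Rohit's counting argument can be made concrete. This uses his simplification of one HDFS file per column family per table (in reality each family is a store per region, possibly with several HFiles), so the numbers below are the mail's model, not actual file counts:

```java
public class SchemaFileCount {
    // One file per column family per table, per the simplification above.
    static int hdfsFiles(int tables, int familiesPerTable) {
        return tables * familiesPerTable;
    }

    public static void main(String[] args) {
        int n = 3; // hypothetical number of datasets sharing one key
        System.out.println(hdfsFiles(n, 1)); // case 1: N tables, 1 family each -> N files
        System.out.println(hdfsFiles(1, 1)); // case 2: 1 table, 1 family      -> 1 file
        System.out.println(hdfsFiles(1, n)); // case 3: 1 table, N families    -> N files
    }
}
```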

Re: schema optimisation - go for multiple tables, rows or column families?

2012-01-09 Thread kisalay
Tom, I would want to add to what Jonathan suggested. The approach (1) of having multiple tables has these problems: a> As Jonathan suggested, regions are created on a per table basis, so data from different tables will fall in different regions. There is no guarantee on what servers these regions are allocated. b>

Re: schema optimisation - go for multiple tables, rows or column families?

2012-01-09 Thread Jonathan Hsieh
Hi Tom, In the case you describe -- two HTables -- there is no guarantee that they will end up going to the same region server. If you have multiple tables, these are different regions which can (and most likely will) be distributed to different regionserver machines. The fact that both tabl

snappy error during completebulkload

2012-01-09 Thread Oliver Meyn (GBIF)
Hi all, I'm trying to do bulk loading into a table with snappy compression enabled and I'm getting an exception complaining about missing native snappy library, namely: 12/01/09 11:16:53 WARN snappy.LoadSnappy: Snappy native library not loaded Exception in thread "main" java.io.IOException: jav

schema optimisation - go for multiple tables, rows or column families?

2012-01-09 Thread Tom
Hello, I got most, but not all, answers about schemas from the HBase Book and the "Definitive Guide". Let's say there is a single row key and I use this key to add to two tables, one row each (case (1)). Could someone please confirm that even though the tables are different, based on the key, th