Re: [VOTE] The 1st HBase 0.98.2 release candidate (RC0) is available

2014-05-11 Thread Todd Lipcon

Re: CCSHB : PASS!

2012-07-18 Thread Todd Lipcon
Cristofer wrote: it is not possible to retain all concepts and details without practice, and working in a new project where we are using HBase, since project inception, helped a lot! There are some really evolving questions, where you need to think about consequences over the system as a whole, and questions that go deeper into its architecture. It's a well-elaborated exam indeed. Thanks again! -- Cristofer

Re: 0.92 and Read/writes not scaling

2012-04-14 Thread Todd Lipcon
On Fri, Apr 13, 2012, Todd Lipcon wrote: If you want to patch on the HBase side, you can edit HLog.java to remove the checks for the "sync" method, and have it only call "hflush". It'

Re: 0.92 and Read/writes not scaling

2012-04-13 Thread Todd Lipcon
...take, it would generally be better to build more, smaller servers, which will thus not limit themselves by trying to put a lot of data per server through a single WAL file. Of course there may be another solution to this that I'm not aware of? If so I'd love to hear it.

Re: Sync latency

2012-04-09 Thread Todd Lipcon
...tle time. On Thu, Apr 5, 2012 at 8:16 PM, Todd Lipcon wrote: Hi Placido, Sounds like it might be related to HDFS-2379. Try updating to Hadoop 1.0.1 or CDH3u3 and you'll get a fix for that. You can verify by grepping for

Re: Sync latency

2012-04-05 Thread Todd Lipcon
...than 20 bytes). The thrift servers are doing about 250 multi operations per second in each RS, sustained (on peaks it is just packing more requests into each operation). So, the cluster is performing well on average, but these pauses are killing us (they last up to 30 seconds, in bursts of up to 3 minutes). Does anyone have a clue as to what could be happening? Thanks.

Re: 0.92 and Read/writes not scaling

2012-03-29 Thread Todd Lipcon
On Wed, Mar 28, 2012 at 5:41 AM, Buckley,Ron wrote: For us, setting these two got rid of all of the 20 and 40 ms response times and dropped the average response time we measured from HBase by more than half. Plus, we can push HBase a lot harder.
Stack: That had an effect on random read workload only, Ron? Thanks, St.Ack

Re: 0.92 and Read/writes not scaling

2012-03-26 Thread Todd Lipcon
...significant. I've also been watching the rpcs and past events from the html control panel, which don't seem to be indicative of a problem. However I will try changing the flushes and using bigger batches; it might turn up something interesting, thanks.

Re: Improving HBase read performance (based on YCSB)

2012-02-14 Thread Todd Lipcon
...HDFS's maximum receivers per datanode is 4096. The benchmark dataset is large enough not to fit in memory. Update/Insert/Write throughput goes up to 8000 ops/sec easily. However, I see read latencies on the order of seconds, and read throughputs of only a few hundred ops per second. "Top" tells me that the CPUs on regionservers spend 70-80% of their time waiting for IO, while disk and network have plenty of unused bandwidth. How could I diagnose where the read bottleneck is? Any help would be greatly appreciated :) Thanks in advance! -- Bharath Ravi

Re: Improving HBase read performance (based on YCSB)

2012-02-13 Thread Todd Lipcon
...regionservers spend 70-80% of their time waiting for IO, while disk and network have plenty of unused bandwidth. How could I diagnose where the read bottleneck is? Any help would be greatly appreciated :) Thanks in advance! -- Bharath Ravi

Re: On 'routs' and traackr

2012-02-08 Thread Todd Lipcon
...reminders for us therein: e.g. some more attention to ease of operation. This is not news I know -- and our letting go of the (unsatisfactory) built-in secondary indexing contrib left them high and dry (we could have done better messaging around these contribs it seems) -- but nonetheless a timely reminder from the lads over at Traackr (sorry to see you go, George and crew). St.Ack

Re: counters and scanners inconsistency

2012-01-16 Thread Todd Lipcon
Thanks! p.s. I ran a smaller experiment using the hbase shell, and found the counters to be consistent even for the high-rate counters. I am wondering if there is a buffering issue with the htable scanner object: if it is unable to obtain a lock on the row, will it default to the data on disk?

Re: snappy error during completebulkload

2012-01-09 Thread Todd Lipcon

Re: Question about HBase for OLTP

2012-01-09 Thread Todd Lipcon
...mostly single-row accesses. -Todd

Re: hdfs-1623 Was: ceph and hbase.

2011-12-26 Thread Todd Lipcon
Andrey: A quick search led me to https://github.com/toddlipcon where you would be able to find https://github.com/toddlipcon/hadoop-common. Cheers.
On Sat, Dec 24, 2011 at 12:39 PM, Andrey Stepachev wrote:

Re: ceph and hbase.

2011-12-23 Thread Todd Lipcon
...beta releases early next year. -Todd
On 20 December 2011 at 11:46, Todd Lipcon wrote: Hi Andrey, I don't think anyone has ever run Ceph in a production use case, much less run it in a production use case with HBase.

Re: ceph and hbase.

2011-12-19 Thread Todd Lipcon
...the files. Does HBase use modification times for some critical tasks? -- Andrey.

Re: HBase and Consistency in CAP

2011-12-03 Thread Todd Lipcon
On Fri, Dec 2, 2011 at 11:57 AM, Mohit Anchlia wrote: Why is HBase considered high in consistency, and that it gives up partition tolerance? My understanding is that failure of one data node still doesn't impact clients, as they would re-adjust the list of available data nodes.
Jean-Daniel Cryans: No, data is only served by one region server (even if it resides on multiple data nodes). If it dies, clients need to wait for the log replay and region reassignment. J-D
Mohit: Where can I read more on this specific subject? Based on your answer I have more questions, but I want to read more specific information about how it works and why it's designed that way.

Re: Snappy for 0.90.4

2011-10-17 Thread Todd Lipcon
...cloudera/patches/ directory. -Todd

Re: Hive+HBase performance is much poorer than Hive+HDFS

2011-10-13 Thread Todd Lipcon
...table (stored as a sequence file), this statement only needs 43 seconds to finish. So Hive+HBase is 40X slower than Hive+HDFS. Though Hive+HBase has fewer map tasks (32 vs 223), since there are only 44 map slots available, I don't think that is the main cause. I studied the source code of the HBase scan implementation. To me it seems, in my case, the scan performs HFile reads in a quite similar way to sequence file reads (sequential reading of each key/value pair). So, in theory, the performance should be quite similar. Can anyone explain the 40X slowdown? Thanks, Weihua

Re: HBase Meetup during Hadoop World NYC '11

2011-09-12 Thread Todd Lipcon
Crux: Reporting for HBase <https://github.com/sonalgoyal/crux> -- Nube Technologies <http://www.nubetech.co>

Re: HBase Meetup during Hadoop World NYC '11

2011-08-30 Thread Todd Lipcon
I haven't gotten many responses so far. If there doesn't seem to be much interest, I may not spend the time to organize. If you're feeling too busy to answer the full survey, feel free to just reply with a "+1" so I know there's some interest! -Todd On Fri, Aug 26

HBase Meetup during Hadoop World NYC '11

2011-08-26 Thread Todd Lipcon
...if you have any questions. Thanks! Looking forward to seeing everyone, and meeting new folks there. -Todd

Re: Quick Question about Bulk loading of HFiles & Timestamps

2011-08-05 Thread Todd Lipcon
...A. Please confirm that at the intersection of outputs A & B will be the values from B. Thanks, Jacques

Re: Apparent data loss on 90.4 rc2 after partial zookeeper network partition (on MapR)

2011-08-05 Thread Todd Lipcon
Permissions are only checked when a file is opened, not on every write. That's true of HDFS as well as Linux's behavior with all local filesystems I'm aware of. As far as I know it's the case with most NFS filers as well. Hence the existence of explicit fencing commands accessible via remote administrative interfaces on NetApps, for example. -Todd
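A minimal Java sketch of that open-time check against the stock HDFS client API (the path and permission value are made up for illustration, and this does not model the MapR/NFS setup discussed in the thread): a stream opened before the permission change keeps writing, because the check happens only at open time.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class PermissionCheckAtOpen {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/tmp/perm-demo");      // hypothetical path

    FSDataOutputStream out = fs.create(p);    // permission check happens here, at open
    out.writeBytes("first write\n");

    // Revoke all permissions while the stream is still open.
    fs.setPermission(p, new FsPermission((short) 0000));

    out.writeBytes("still succeeds\n");       // no re-check on subsequent writes
    out.close();
  }
}

So flipping permissions is not a fence against a writer that already holds an open stream, which is why the thread points at explicit fencing commands instead.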

Re: Apparent data loss on 90.4 rc2 after partial zookeeper network partition (on MapR)

2011-08-04 Thread Todd Lipcon
J-D wrote: ...be created (the region server could have rolled a log while the master was splitting them) and the master will restart if that situation happens, until it's able to own all files and split them.
Reply: JD, I didn't think the master explicitly dealt with writer leases. Does HBase rely on single-writer semantics on the log file? That is, if the master and a RS both decide to mucky-muck with a log file, you expect the FS to lock out one of the writers? Have you heard if anyone else has been having problems with the second 90.4 rc?
J-D: Nope, we run it here on our dev cluster and didn't encounter any issue (with the code or node failure).
Reply: Thanks again for your help. I'm following up with the MapR guys as well.
J-D: Good idea!

Re: Monitoring

2011-07-25 Thread Todd Lipcon
...than the explained FOSS option is totally acceptable. Best regards, - Andy

Re: Monitoring

2011-07-25 Thread Todd Lipcon
...with HBase we could recommend that, too. I wouldn't have a problem with any of the above. -Todd

Re: Monitoring

2011-07-25 Thread Todd Lipcon
...expect to be held to the same standards as a MapR employee touting MapR -- i.e. I wouldn't bring it up on the public mailing list.
> That seems to be important information for not just the original poster but others who may have the same problem. What is the consensus on that?

Re: Monitoring

2011-07-25 Thread Todd Lipcon
Stack wrote: ...es up on our list might take it as license to dump their commercial messages, tech related or not, into hbase mailing lists. A list riddled with commercial messages would likely sour many who are subscribed here. Thanks boss, St.Ack

Re: Problem with importtsv on trsnferring data from HDFS to hbase table:

2011-06-14 Thread Todd Lipcon
...run(MapTask.java:369)
at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:253)
attempt_201106141233_0042_m_00_0: Bad line at offset: 0: No delimiter
attempt_201106141233_0042_m_00_0: Bad line at offset: 34: No delimiter
attempt_201106141233_0042_m_00_0: Bad line at offset: 51: ... x33123 lines
-- Prashant Sharma

Re: Best way to Import data from Cassandra to HBase

2011-06-14 Thread Todd Lipcon
...command line: ./hadoop jar hbase-0.90.0.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,f1:b,f1:c -Dimporttsv.bulk.output=output t1 input. In that, 't1' and 'f1' are the table and family in HBase. No data is written to the 'output' folder.

Re: Question from HBase book: "HBase currently does not do well with anything about two or three column families"

2011-06-13 Thread Todd Lipcon
> Row locking is rarely a good idea, it doesn't scale and they currently aren't persisted anywhere except the RS memory (so if it dies...). Using a single family might be better for you.
Leif: Thanks for the pointer.

Re: Best way to Import data from Cassandra to HBase

2011-06-13 Thread Todd Lipcon
> ...to HBase data files, and use the completebulkload tool
Sounds about right. I don't know what the .dbf format is, but if you can make an InputFormat that supports them, you can write a mapper to translate from those records into HBase Puts, and then use HFileOutputFormat and bulk loads just like Im
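A sketch of the mapper-to-Puts step described above, assuming TSV-style input for illustration (a real .dbf InputFormat would replace the split() line); the "f1" family and field layout are made up:

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class RecordToPutMapper
    extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Assume "rowkey<TAB>payload" lines; a real record reader would plug in here.
    String[] fields = value.toString().split("\t", 2);
    byte[] row = Bytes.toBytes(fields[0]);

    Put put = new Put(row);
    put.add(Bytes.toBytes("f1"), Bytes.toBytes("data"), Bytes.toBytes(fields[1]));

    context.write(new ImmutableBytesWritable(row), put);
  }
}

The driver would then wire this mapper up with HFileOutputFormat (e.g. via HFileOutputFormat.configureIncrementalLoad) and run completebulkload on the output directory, along the lines ImportTsv does.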

Re: what is the recommended configuration in hbase to write big data

2011-06-01 Thread Todd Lipcon
} //== here is the configuration file of hbase:
hbase.zookeeper.property.maxClientCnxns = 1000
hbase.hregion.max.filesize = 1073741824
hbase.regionserver.han

Re: data loss after killing RS

2011-05-28 Thread Todd Lipcon
> Is WAL the only way to guarantee data safety in hbase? We want a high insert rate though. Is there a middle ground? e.g. a sync operation to flush the RS to HDFS would be perfect!

Re: mslab enabled jvm crash

2011-05-25 Thread Todd Lipcon
...http://pastebin.com/9KfRZFBW
On Wed, May 25, 2011 at 1:42 PM, Todd Lipcon wrote: Hi Wayne, Looks like your RAM might be oversubscribed. Could you paste your hbase-site.xml and hbase-env.sh files? Also looks like you have some

Re: mslab enabled jvm crash

2011-05-25 Thread Todd Lipcon
> CMS's you should be able to configure around. u21 was supposed to make improvements to put off frag but apparently made it worse. Try u25, the latest. Also google for others' experience with JVMs up on CentOS 5.5. St.Ack

Re: GC and High CPU

2011-05-17 Thread Todd Lipcon
"...when applications that need the low pause times provided by the concurrent collector are run on machines with small numbers of processors (e.g., 1 or 2)." [See http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#icms] Don't you have > 2 processors per machine? St.Ack

Re: [0.90.1-cdh3u0] Setting Filter Lists to a client Scan

2011-05-17 Thread Todd Lipcon
...[:1.6.0_24]
at javax.management.NotificationBroadcasterSupport.sendNotification(NotificationBroadcasterSupport.java:229) [:1.6.0_24]
at javax.management.timer.Timer.sendNotification(Timer.java:1237) [:1.6.0_24]
at javax.management.timer.Timer.notifyAlarmClock(Timer.java:1206) [:1.6.0_24]
at javax.management.timer.TimerAlarmClock.run(Timer.java:1289) [:1.6.0_24]
at java.util.TimerThread.mainLoop(Timer.java:512) [:1.6.0_24]
at java.util.TimerThread.run(Timer.java:462) [:1.6.0_24]
Thanks for any suggestions.

Re: any performance results of transferring tera bytes from db to hbase?

2011-05-11 Thread Todd Lipcon

Re: put to WAL and scan/get operation concurrency

2011-05-06 Thread Todd Lipcon
> ...better read throughput}
Reads are scaled by splitting regions and distributing them around multiple servers. If you have one super-hot row, it should fit in cache and give you some >20k reads/second. If you need more reads/sec on a single row than that, you'll need to add your own caching

Re: one of our datanodes stops working after few hours

2011-05-02 Thread Todd Lipcon
On Mon, May 2, 2011 at 1:56 PM, Jack Levin wrote: my yourkit version expired :)... but here is the jstack when it happens: http://pastebin.com/5v6mHg3t

Re: one of our datanodes stops working after few hours

2011-05-02 Thread Todd Lipcon
On Mon, May 2, 2011 at 7:53 AM, Todd Lipcon wrote: Hi Jack, Does this happen even if you aren't running Yourkit on the DN? Can you try using a Sun JDK instead of OpenJDK? -Todd

Re: one of our datanodes stops working after few hours

2011-05-02 Thread Todd Lipcon
...pastebin.com/5v6mHg3t). After a few hours, it literally staggers to a halt and gets very very slow... Any ideas what it's blocking on? (The main issue is that fs reads for the RS get really slow when that happens.) -Jack

Re: Help with NPE during bulk load (completebulkload)

2011-04-30 Thread Todd Lipcon

Re: LoadIncrementalHFiles now deleting the hfiles?

2011-04-30 Thread Todd Lipcon
...before making a remote copy of them using distcp. However we just upgraded hbase (we're using cloudera's package, so we went from CDH3B4 to CDH3U0, both of which are versions of 0.90.1), and discovered that the HFiles now get deleted by the load operation. Is this a recent change? Is there a configuration variable to revert this behavior? We can work around it by doing the copy before the load, but that is less than optimal in our scenario, as we'd prefer to have quicker access to the data in HBase. - Adam

Re: importtsv

2011-04-27 Thread Todd Lipcon
The importtsv program doesn't appear to be picking up mapred-site.xml, then. Are you sure it's valid XML? You can try "xmllint" to verify. Perhaps attach it here? -Todd
> Do you have suggestions for any other features/options that I should check?

Re: importtsv

2011-04-25 Thread Todd Lipcon
...35:40 AM org.apache.hadoop.filecache.TrackerDistributedCacheManager deleteLocalPath INFO: Deleted path /tmp/hadoop-hadoop/mapred/local/archive/953502662101888516_-198765657_2115049918/file/home/hadoop/test/java/partitions_1303490140287
Apr 22, 2011 9:35:41 AM org.apache.hadoop.mapred.JobClient

Re: apache hbase 0.90.2 vs CDH3 hbase0.90.1+15.18

2011-04-19 Thread Todd Lipcon
...patchlevel for CDH3u0 (aka "GA" or "stable"). I agree with Stack's assessment above. We'll continue to cherry-pick bug fixes back into the CDH3 version of HBase for our quarterly "update" releases (CDH3u1, CDH3u2, etc). -Todd

Re: ImportTsv usage

2011-04-08 Thread Todd Lipcon
...mode, using hadoop jar to run. But I find the time cost to run the job is not twice as much as the original one, it's nearly ten times as much as the one that only emits one key/value pair. I checked my code, and I did not find any problem. If the map() function emits either of the two key/value pairs I wrote, the time cost becomes normal. What's the cause? Do I miss any tips in bulk load? -- Best wishes, Gan Xiyun

Re: file is already being created by NN_Recovery

2011-04-08 Thread Todd Lipcon
...logs was not concerning since all were test clusters. Now we got this on the production cluster, and strangely, this cluster was just installed: there is no table and no data, no activity there. So what logs is the master trying to create? We are running the latest CDH3B4 from Cloudera. Thanks for any hints, Daniel (Daniel Iancu, Java Developer, Web Components Romania, 1&1 Internet Development srl.)

Re: Yet another Hadoop 0.20-append query ... sorry

2011-04-05 Thread Todd Lipcon

Re: LZO Compression changes in 0.90 ?

2011-03-29 Thread Todd Lipcon
at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:94)
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)
Just want to make sure I'm not missing something before I start re-compiling those libs. -GS

Re: zookeeper-3.3.2 has default maxClientCnxns set to 10?

2011-03-26 Thread Todd Lipcon
On Fri, Mar 25, 2011 at 12:36 PM, Alex Baranau wrote: As far as I know HBase is configured to initiate up to 30 connections by default, and maxClientCnxns for Zookeeper was meant to be 30 as well.
Stack: Yes. I'm not sure how it'd go from 30 to 10 (is 10 the default connections for zk?). Is it possible it's not reading hbase-site.xml when the task runs? St.Ack

Re: zookeeper-3.3.2 has default maxClientCnxns set to 10?

2011-03-25 Thread Todd Lipcon
On Fri, Mar 25, 2011 at 12:36 PM, Alex Baranau wrote: As far as I know HBase is configured to initiate up to 30 connections by default, and maxClientCnxns for Zookeeper was meant to be 30 as well.
Stack: Yes. I'm not sure how it'd go from 30 to 10 (is 10 the default connections for zk?). Is it possible it's not reading hbase-site.xml when the task runs? St.Ack

Re: zk connection leak with TableInput/OutputFormat (CDH3b4, 0.90.1)

2011-03-23 Thread Todd Lipcon
...configuration) passing in the configuration that I used to create the MR job. Doesn't seem to work. So... is there a way to run an MR job with TableInputFormat without leaking a connection? I am pretty sure I am not creating any HTables on the client side. Or is it a bug? I have spent several days now investigating this issue but I am still not able to come up with a workaround against zookeeper connection leaks in HBase MR jobs. Thank you very much. -Dmitriy

Re: CDH3B4 "fixes" permissions

2011-03-17 Thread Todd Lipcon
Re: CDH3B4 "fixes" permissions -- J-D: Good question, you might want to ask that to Cloudera.
Geoff Hendrey wrote: Hi - I just upgraded to CDH3B4. I noticed when I ran 'hadoop dfsadmin -upgrade' that the logs on the datanodes showed that hadoop was "fixing" the permissions of my dfs storage disks to "rwx--". I am just wondering why it does this? I had been using a subdirectory of one of the disks for hadoop.tmp.dir, and of course now the hadoop.tmp.dir cannot be written. -geoff

Re: major hdfs issues

2011-03-13 Thread Todd Lipcon
> ...to iptables, what is the best way to disable?
Generally I haven't tuned tcp cookies. For iptables, it's usually enabled by some script in /etc/init.d - you can use chkconfig or update-rc.d to disable it, depending on OS. -Todd

Re: major hdfs issues

2011-03-12 Thread Todd Lipcon
...stack issue? So, does the datanode need a higher heap than 1GB? Or possibly we ran out of RAM for other reasons? -Jack
On Thu, Mar 10, 2011 at 1:29 PM, Ryan Rawson wrote: Looks like a datanode went down. InterruptedException is how java interrupts IO in threads; it's similar to the EINTR errno. That means the actual source of the abort is higher up... So back to how InterruptedException works: at some point a thread in the JVM decides that the VM should abort. So it calls thread.interrupt() on all the threads it knows/cares about to interrupt their IO. That is what you are seeing in the logs. The root cause lies above, I think. Look for the first "Exception" string or any FATAL or ERROR strings in the datanode logfiles. -ryan
On Thu, Mar 10, 2011 at 1:03 PM, Jack Levin wrote: http://pastebin.com/ZmsyvcVc Here is the regionserver log; they all have similar stuff.
On Thu, Mar 10, 2011 at 11:34 AM, Stack wrote: What's in the regionserver logs? Please put up regionserver and datanode excerpts. Thanks Jack, St.Ack
On Thu, Mar 10, 2011 at 10:31 AM, Jack Levin wrote: All was well, until this happened: http://pastebin.com/iM1niwrS and all regionservers went down. Is this the xciever issue? dfs.datanode.max.xcievers = 12047 is what I have; should I set it higher? -Jack

Re: putting a border around 0.92 release

2011-02-28 Thread Todd Lipcon
.../IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+HBASE+AND+fixVersion+%3D+%220.92.0%22+AND+resolution+%3D+Unresolved+AND+priority+%3D+Critical -- Best regards, - Andy

Re: Cluster Size/Node Density

2011-02-18 Thread Todd Lipcon
> ...comes out (unless you use a very small heap). The IO usage patterns will change, but unless you flush very small files all the time and need to recompact them into much bigger ones, then it shouldn't really

[ANN] HBase 0.90.1 available for download

2011-02-17 Thread Todd Lipcon
The Apache HBase team is happy to announce the general availability of HBase 0.90.1, available from your Apache mirror of choice: http://www.apache.org/dyn/closer.cgi/hbase/ [at the time of this writing, not all mirrors have updated yet -- please pick a different mirror if your first choice does n

Re: LZO Compression

2011-02-11 Thread Todd Lipcon
...hadoop.compression.lzo.GPLNativeCodeLoader: Could not load native gpl library; java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path. My questions are: 1. Are the two items I was given compatible? 2. If so, what could I be doing wrong? Thanks, -Pete

Re: Region Servers Crashing during Random Reads

2011-02-04 Thread Todd Lipcon
On Thu, Feb 3, 2011 at 10:26 PM, charan kumar wrote: Here you go... the HBase Performance Tuning page http://wiki.apache.org/hadoop/Hbase/FAQ#A7 refers to the following hadoop URL.

Re: Region Servers Crashing during Random Reads

2011-02-03 Thread Todd Lipcon
On Thu, Feb 3, 2011 at 2:06 PM, Charan K wrote: Thanks Todd.. I will try it out.
On Feb 3, 2011, at 1:43 PM, Todd Lipcon wrote: Hi Charan, Your GC settings are way off - a 6m newsize will promote way too much

Re: Region Servers Crashing during Random Reads

2011-02-03 Thread Todd Lipcon
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:96)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:80)
at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:729)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:586)
at java.lang.Thread.run(Thread.java:619)
2011-02-03T09:53:35.165-0800: 71320.785: [GC 71320.785: [ParNew (promotion failed): 5568K->5568K(5568K), 0.4384530 secs]71321.224: [CMS2011-02-03T09:53:45.111-0800: 71330.731: [CMS-concurrent-mark: 17.511/51.564 secs] [Times: user=38.72 sys=5.67, real=51.60 secs]
Thanks, Charan

Re: Exception reading from Hbase table with LZO compression

2011-02-03 Thread Todd Lipcon
...r(HRegion.java:1119)
at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:)
at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1095)
at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1765)
... 5 more

Re: Unresponsive master in Hbase 0.90.0

2011-01-28 Thread Todd Lipcon
...region in hand? Can you tar up one of these regionserver dirs and put it somewhere I can pull? I'll try it over here. St.Ack

Re: Building LZO jar

2011-01-27 Thread Todd Lipcon
...HBase doesn't start up because of the missing jar. I am trying to build the LZO jar but Ivy, for some reason, can't connect to http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.0.0-rc2/ivy-2.0.0-rc2.jar to download the jar. I have tried running the build in offline mode but then Ivy gets stuck at:
ivy-resolve-common:
[ivy:resolve] :: resolving dependencies :: com.hadoop.gplcompression#Hadoop-GPL-Compression;working@caiss01a
[ivy:resolve] confs: [common]
Would anyone have a pre-built 32bit LZO jar available? Thanks, -Pete

Re: Cluster Wide Pauses

2011-01-26 Thread Todd Lipcon
Stack wrote: I suppose reads could be blocked for the same reason if all are trying to read from the offlined region. What version of hbase are you using? Splits should be faster in 0.90 now that the split daughters come up on the same region. Sorry I don't have a better answer for you. Need to dig in. File a JIRA. If you want to help out some, stick some data up in it. Some suggestions would be to enable logging of when we look up region locations in the client and then note when requests go to zero. Can you figure what region the clients are waiting on (if they are waiting on any)? If you can pull out a particular one, try and elicit its history at the time of blockage. Is it being moved or mid-split? I suppose it makes sense that bigger regions would make the situation 'worse'. I can take a look at it too. St.Ack
Original poster: We are constantly loading data to this cluster of 10 nodes. These pauses can happen as frequently as every minute but sometimes are not seen for 15+ minutes. Basically watching the Region server list with request counts is the only evidence of what is going on. All reads and writes totally stop, and if there is ever any activity it is on the node hosting the .META. table with a request count of region count + 1. This problem seems to be worse with a larger region size. We tried a 1GB region size and saw this more than we saw actual activity (and stopped using the larger region size because of it). We went back to the default region size and it was better, but we had too many regions, so now we are up to 512M for a region size and we are seeing it more again. Does anyone know what this is? We have dug into all of the logs to find some sort of pause but are not able to find anything. Is this a WAL hlog roll? Is this a region split or compaction? Of course our biggest fear is a GC pause on the master but we do not have java logging turned on with the master to tell. What could possibly stop the entire cluster from working for seconds at a time, very frequently? Thanks in advance for any ideas of what could be causing this.

Re: Bulk upload an LZO compression

2011-01-26 Thread Todd Lipcon
," > > delimited single configuration item. I figure that Configuration > > should have a method to return all key values where key's match a > > pattern. Maybe there are better ways to do this. Will get this into > > the ticket. > > > > Thanks and regards, >

Re: Bulk upload an LZO compression

2011-01-24 Thread Todd Lipcon
> Is there a way to compress the table as the bulk uploader creates the HFile? This is important for us because we don't want to have a burst increase in our disk usage. Thanks and regards, - Ashish

Re: Ruby Bulk Load tool in 0.90

2011-01-13 Thread Todd Lipcon
...the existing HTable create API that takes a list of boundaries. Then you don't need to deal with ZK or META manually in any way, and if any of that stuff changes you'll be using a supported public API. -Todd
On 1/13/11 10:30 AM, "Todd Lipcon" wrote: Hey Vidhya,
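A minimal sketch of that public create API, with a made-up table name, family, and split points; passing explicit boundaries pre-creates the regions and lets the master handle assignment, with no direct ZK or META manipulation:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class CreatePresplitTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    HTableDescriptor desc = new HTableDescriptor("bulk_target");  // hypothetical table
    desc.addFamily(new HColumnDescriptor("f1"));

    // Region boundaries derived from the data you plan to bulk load.
    byte[][] splitKeys = new byte[][] {
        Bytes.toBytes("row-25000"),
        Bytes.toBytes("row-50000"),
        Bytes.toBytes("row-75000"),
    };

    admin.createTable(desc, splitKeys);  // regions are created and assigned by the master
  }
}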

Re: Ruby Bulk Load tool in 0.90

2011-01-13 Thread Todd Lipcon
...ed in 0.90? I was trying to use load_table.rb to create a new table and bulk load files into it. It worked partly, in the sense that the META table got populated and the files were moved to the appropriate location, but the server assignment did not happen until I restarted HBase. Is this a consequence of the master rewrite? V

Re: Java Commited Virtual Memory significally larged then Heap Memory

2011-01-13 Thread Todd Lipcon
...me, so it ought to work on any Sun or OpenJDK as far as I know.
> (Probably the wrong list for this, but: is there an officially supported JVM version for CDH3?)
We recommend the Sun 1.6 >=u16 but not u18. -Todd

Re: Java Commited Virtual Memory significally larged then Heap Memory

2011-01-12 Thread Todd Lipcon
...can actually free() a direct buffer - check out the patch referenced earlier in this thread. Of course it probably doesn't work on other JVMs... oh well. -Todd
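For illustration, the usual Sun-JVM-specific way to release a direct buffer's native memory explicitly (a sketch of the general trick only, not the actual hadoop-lzo patch; it relies on sun.* classes and will not work on other JVMs):

import java.nio.ByteBuffer;

public class FreeDirectBuffer {
  public static void main(String[] args) {
    ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);

    if (buf instanceof sun.nio.ch.DirectBuffer) {
      sun.misc.Cleaner cleaner = ((sun.nio.ch.DirectBuffer) buf).cleaner();
      if (cleaner != null) {
        cleaner.clean();  // frees the off-heap allocation now, instead of waiting for GC
      }
    }
  }
}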

Re: Java Commited Virtual Memory significally larged then Heap Memory

2011-01-12 Thread Todd Lipcon
...(like all the rigged tests I've attempted to do) these get cleaned up nicely by the JVM. It seems only in pretty large heaps in real workloads does the leak actually end up running away. -Todd

Re: Java Commited Virtual Memory significally larged then Heap Memory

2011-01-12 Thread Todd Lipcon
Can someone who is having this issue try checking out the following git branch and rebuilding LZO? https://github.com/toddlipcon/hadoop-lzo/tree/realloc This definitely stems one leak of a 64KB direct buffer on every reinit. -Todd

Re: Java Commited Virtual Memory significally larged then Heap Memory

2011-01-12 Thread Todd Lipcon
...But the Hadoop (mapreduce) LZO compression is not the problem. Compressing the map output using LZO works just fine. The problem is HBase LZO compression. The region server process is the one with the memory leak...
(Sorry for dumb question Friso

Re: Java Commited Virtual Memory significally larged then Heap Memory

2011-01-12 Thread Todd Lipcon
> The problem is HBase LZO compression. The region server process is the one with the memory leak...
(Sorry for the dumb question, Friso.) But HBase is leaking because we make use of the Compression API in a manner that produces leaks? Thanks, St.Ack

Re: HBase fails to start - DataXceiver Version Mismatch

2011-01-10 Thread Todd Lipcon
...ideas where to look? I'm kinda at a loss here, since I'm fairly certain the versions are all in sync. I'm able to browse HDFS in the UI and copyToLocal the /hbase-app/hbase/hbase.version file without problems. Thanks, Bill

Re: log reply failures, how to resolve

2011-01-08 Thread Todd Lipcon
...and. Now, is it possible that the master would do that? The master should be able to detect invalid files in the .log/ dir and get rid of them without operator interaction; or is there some sort of design element that I am simply missing? Thanks. -Jack

Re: problem with LZO compressor on write only loads

2011-01-08 Thread Todd Lipcon
at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:931)
at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:732)
at

Re: question about merge-join (or AND operator betwween colums)

2011-01-08 Thread Todd Lipcon
On Sat, Jan 8, 2011 at 11:43 AM, Stack wrote: Strike that. This is a Scan, so can't do blooms + filter. Sorry. Sounds like a coprocessor then. You'd have your query 'lean' on the column that you know has the lesser items and then, per item, you'd do a get inside the coprocessor against the column of many entries. The get would go via blooms. St.Ack
On Sat, Jan 8, 2011 at 11:35 AM, Jack Levin wrote: Yes, we thought about using filters. The issue is, if one family column has 1ml values, and the second family column has 10 values at the bottom, we would end up scanning and filtering 0 records and throwing them away, which seems inefficient.
Stack: Blooms+filters? St.Ack

Re: Bulk load using HFileOutputFormat.RecordWriter

2011-01-05 Thread Todd Lipcon
...writer in my mapper and 0 reducers, but the output directory has only a _temporary directory with my outputs in each subdirectory. That doesn't seem to be what the loadtable script expects (a column family directory with HFiles). Can someone tell me if what I am doing make

Re: ImportTsv tool for v0.20.6?

2011-01-03 Thread Todd Lipcon
...LoadIncremetalHBase.java source files in my version; is there any reason why it wouldn't work for any case? Is there any architectural change that is necessary for these to work well that is not present in 0.20.6? Thanks, Hari

Re: problem with LZO compressor on write only loads

2011-01-03 Thread Todd Lipcon
...r the job started. Any ideas? Friso
On 3 jan 2011, at 19:18, Todd Lipcon wrote: Hi Friso, Which OS are you running? Particularly, which version of glibc? Can you try running with the environment variable

Re: CMF & NodeIsDeadException

2011-01-03 Thread Todd Lipcon
> ...that seems too small).
+1. I'd recommend at least 64m new size... if reasonably frequent 200-300ms pauses are acceptable, go to 128m or larger. You can also tune SurvivorRatio down and use a larger new size for some workloads, but it's a little messy to figure this out. -Todd

Re: problem with LZO compressor on write only loads

2011-01-03 Thread Todd Lipcon
...maybe just one fixed-size buffer (they all appear near 64M anyway), or possibly reuse an existing buffer also when it is not the exact required size but just large enough to make do? Having short-lived direct byte buffers is apparently a discouraged practice. If anyone can prov

Re: Java Commited Virtual Memory significally larged then Heap Memory

2010-12-30 Thread Todd Lipcon
Operating System: Linux 2.6.34.7-0.5-xen; Architecture: amd64; Number of processors: 8
Committed virtual memory: 4 403 512 kbytes; Total physical memory: 6 815 744 kbytes; Free physical memory: 82 720 kbytes
Total swap space: 8 393 924 kbytes; Free swap space: 8 050 880 kbytes

Re: HBase Bulk Load script

2010-12-29 Thread Todd Lipcon
...source at hbase-0.89.20100924+28/src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java to call FileOutputFormat.setCompressOutput/setOutputCompressorClass() on the Job, in order to turn on compression. Does that sound right? Ma

PoweredBy

2010-12-27 Thread Todd Lipcon
...way to get publicity and even help drive candidates if you're recruiting for teams that use HBase! The page is here: http://wiki.apache.org/hadoop/Hbase/PoweredBy Thanks, and happy holidays! -Todd

Re: Even After removing the hdfs blocks from Data Nodes, I could still able to query the table.

2010-12-24 Thread Todd Lipcon
When you restarted HBase, it probably flushed the memstore to the storefile.
> I was able to see the same data, say by doing ./hadoop fs -cat /hbase/temp/123456/colfam/4588349323987497
This is the flushed file from above. -Todd

Re: meta region cpu contention

2010-12-24 Thread Todd Lipcon
...-5 000 S 2.0 0.0 423:52.39 [kswapd0]
That server is dual-core, and great for natural region serving, but meta interaction implies a lot of fast transactions, which would be better served on an 8-core box. -Jack

Re: HBase Bulk Load script

2010-12-23 Thread Todd Lipcon
...to the local cluster first. -Todd

Re: Unresponsive yet functioning region server during bulk loads.

2010-12-23 Thread Todd Lipcon
.../Ingestor/workspace/ingest_output/metadata/2776824162022751481 for inclusion in store metadata region vidhyash_test,r:cn#com#sina#news!/c/2006-07-12/192710404866.shtml!http,1292936195947.b3b27d1cc94a6378ab4da90acad4efbf.
2010-12-22 21:35:57,547 INFO org.apache.hadoop.hbase.regionserver.Store: Validating hfile at /user/vidhyash/wcc1/debug/Ingestor/workspace/ingest_output/metadata/1753280038544504583 for inclusion in store metadata region vidhyash_test,r:com#yoka#space!/blog/34726!http,1292936189782.714fc4e266abca11f578fd90a3561337.

Re: provide a 0.20-append tarball?

2010-12-22 Thread Todd Lipcon
Andy wrote: I've looked over the CDH3B3 installation documentation but have not installed it nor do I presently use it. If we draw a line, then as an ASF community we should have a fallback option somewhere in ASF-land for the user to try. Vanilla Hadoop is not sufficient for HBase. Therefore, I propose we make a Hadoop 0.20-append tarball available. Best regards, - Andy

Re: Simple OOM crash?

2010-12-19 Thread Todd Lipcon
On Fri, Dec 17, 2010 at 7:37 PM, Sandy Pratt wrote: -XX:MaxDirectMemorySize=100m
Yep, I always leave that at the default, whatever that might be. -Todd

Re: substantial performance degradation when using WAL

2010-12-18 Thread Todd Lipcon
...point I haven't seen direct evidence of GC activity. J-D
On Fri, Dec 17, 2010 at 1:27 AM, Friso van Vollenhoven wrote: Hi J-D, Thanks for your comments and clarification. I guess GC does blow (especially when writing things like databases and filesystems). Right now I will dive into GC tuning once more and probably lower the number of reducers on the insert jobs. Thanks, Friso
