> >> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> >> (via Tom White)
> >>
>
>
>
> --
> Best regards,
>
>- Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
--
Todd Lipcon
Software Engineer, Cloudera
it is not possible to retain all
>> concepts and details without practice, and working in a new project where
>> we are using HBase, since project inception, helped a lot!
>>
>> There are some really involved questions, where you need to think about
>> consequences for the system as a whole, and questions that go deeper into
>> its architecture. It's a well-crafted exam indeed.
>>
>> Thanks again!
>>
>> Cristofer
>>
--
Todd Lipcon
Software Engineer, Cloudera
On Fri, Apr 13, 2012 at 9:06 PM, Stack wrote:
> On Fri, Apr 13, 2012 at 8:02 PM, Todd Lipcon wrote:
>> If you want to patch on the HBase side, you can edit HLog.java to
>> remove the checks for the "sync" method, and have it only call
>> "hflush". It'
take,
>>> it would generally be better to build more, smaller servers, which will
>>> thus
>>> not limit themselves by trying to put a lot of data per server through a
>>> single WAL file.
>>>
>>> Of course there may be another solution to this that I'm not aware of? If
>>> so I'd love to hear it.
>>>
>
--
Todd Lipcon
Software Engineer, Cloudera
tle time.
>
> On Thu, Apr 5, 2012 at 8:16 PM, Todd Lipcon wrote:
>
>> Hi Placido,
>>
>> Sounds like it might be related to HDFS-2379. Try updating to Hadoop
>> 1.0.1 or CDH3u3 and you'll get a fix for that.
>>
>> You can verify by grepping for &qu
; than 20 bytes).
>
> The thrift servers are doing about 250 multi operations per second in each
> RS, sustained (on peaks it is just packing more requests in each operation).
>
> So, the cluster is performing well on average, but these pauses are killing
> us (they last up to 30 seconds on bursts of up to 3 minutes). Does anyone
> have a clue as to what could be happening?
>
> Thanks.
--
Todd Lipcon
Software Engineer, Cloudera
h 28, 2012 1:12 PM
>>> To: user@hbase.apache.org
>>> Subject: Re: 0.92 and Read/writes not scaling
>>>
>>> On Wed, Mar 28, 2012 at 5:41 AM, Buckley,Ron wrote:
>>>
>>>> For us, setting these two got rid of all of the 20 and 40 ms response
>>>> times and dropped the average response time we measured from HBase by
>>>> more than half. Plus, we can push HBase a lot harder.
>>>>
>>> That had an effect on random read workload only, Ron?
>>> Thanks,
>>> St.Ack
>>>
>>>
>>>
>>>
>>
--
Todd Lipcon
Software Engineer, Cloudera
nificant. I've also been watching the rpcs and past events
> from the html control panel which don't seem to be indicative of a
> problem. However I will try changing the flushes and using bigger
> batches, it might turn up something interesting, thanks.
--
Todd Lipcon
Software Engineer, Cloudera
HDFS's
>> maximum receivers per datanode is 4096.
>>
>> The benchmark dataset is large enough not to fit in memory.
>> Update/Insert/Write throughput goes up to 8000 ops/sec easily.
>> However, I see read latencies in the order of seconds, and read
>> throughputs of only a few 100 ops per second.
>>
>> "Top" tells me that the CPU's on regionservers spend 70-80% of their time
>> waiting for IO, while disk and network
>> have plenty of unused bandwidth. How could I diagnose where the read
>> bottleneck is?
>>
>> Any help would be greatly appreciated :)
>>
>> Thanks in advance!
>> --
>> Bharath Ravi
>>
>>
>
>
> --
> Bharath Ravi
--
Todd Lipcon
Software Engineer, Cloudera
egionservers spend 70-80% of their time
> waiting for IO, while disk and network
> have plenty of unused bandwidth. How could I diagnose where the read
> bottleneck is?
>
> Any help would be greatly appreciated :)
>
> Thanks in advance!
> --
> Bharath Ravi
--
Todd Lipcon
Software Engineer, Cloudera
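A generic way to confirm the iowait observation on a regionserver (standard Linux tools, not taken from the original reply):

    # Per-device utilization and await, sampled every 5 seconds.
    iostat -x 5
    # Cross-check overall CPU, iowait, and swap activity.
    vmstat 5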
minders for us therein: e.g. some more attention to ease
>> of operation. This is not news I know -- and our letting go of the
>> (unsatisfactory) built-in secondary indexing contrib left them high
>> and dry (we could have done better messaging around these contribs it
>> seems) -- but nonetheless a timely reminder from the lads over at
>> Traackr (sorry to see you go George and crew).
>>
>> St.Ack
>>
--
Todd Lipcon
Software Engineer, Cloudera
; Thanks!
> p.s. I ran a smaller experiment using hbase shell, and found the counters to
> be consistent even for the high rate counters. I am wondering if there is a
> buffering issue with the HTable scanner object: if it is unable to obtain a
> lock on the row, will it default to the data on disk?
>
--
Todd Lipcon
Software Engineer, Cloudera
m.
-Todd
--
Todd Lipcon
Software Engineer, Cloudera
mostly single-row accesses.
-Todd
--
Todd Lipcon
Software Engineer, Cloudera
>> Andrey:
>> A quick search led me to https://github.com/toddlipcon where you would be
>> able to find:
>> https://github.com/toddlipcon/hadoop-common
>>
>> Cheers
>>
>> On Sat, Dec 24, 2011 at 12:39 PM, Andrey Stepachev
>> wrote:
>>
>>
beta releases
early next year.
-Todd
>
> 20 декабря 2011 г. 11:46 пользователь Todd Lipcon написал:
>
>> Hi Andrey,
>>
>> I don't think anyone has ever run Ceph in a production use case, much
>> less run it in a production use case with HBase.
>>
>>
he files.
> Does hbase use modification times for some critical tasks?
>
> --
> Andrey.
--
Todd Lipcon
Software Engineer, Cloudera
c 2, 2011 at 12:01 PM, Mohit Anchlia
>>> mailto:mohitanch...@gmail.com>> wrote:
>>> Where can I read more on this specific subject?
>>>
>>> Based on your answer I have more questions, but I want to read more
>>> specific information about how it works and why it's designed that
>>> way.
>>>
>>> On Fri, Dec 2, 2011 at 11:59 AM, Jean-Daniel Cryans
>>> mailto:jdcry...@apache.org>> wrote:
>>> No, data is only served by one region server (even if it resides on
>>> multiple data nodes). If it dies, clients need to wait for the log
>>> replay and region reassignment.
>>>
>>> J-D
>>>
>>> On Fri, Dec 2, 2011 at 11:57 AM, Mohit Anchlia
>>> mailto:mohitanch...@gmail.com>> wrote:
>>> Why is HBase considered high in consistency, and why does it give up
>>> partition tolerance? My understanding is that failure of one data node
>>> still doesn't impact clients, as they would re-adjust the list of
>>> available data nodes.
>>>
>>>
>>
>>
>>
--
Todd Lipcon
Software Engineer, Cloudera
cloudera/patches/
directory.
-Todd
--
Todd Lipcon
Software Engineer, Cloudera
e (stored as
>> >> >> sequence file), this statement only needs 43 seconds to finish.
>> >> >>
>> >> >> So Hive+HBase is 40X slower than Hive+HDFS.
>> >> >>
>> >> >> Hive+HBase has fewer map tasks (32 vs 223), but since there are
>> >> >> only 44 map slots available, I don't think it is the main cause.
>> >> >>
>> >> >> I studied the source code of HBase scan implementation. To me, it
>> >> >> seems, in my case, the scan performs HFile reads in quite a similar way
>> >> >> as sequence file reads (sequential reading of each key/value pair). So,
>> >> >> in theory, the performance should be quite similar.
>> >> >>
>> >> >> Can anyone explain the 40X slowdown?
>> >> >>
>> >> >> Thanks
>> >> >> Weihua
>> >> >>
>> >> >
>> >>
>> >
>>
>
--
Todd Lipcon
Software Engineer, Cloudera
l
> Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
> Nube Technologies <http://www.nubetech.co>
>
> <http://in.linkedin.com/in/sonalgoyal>
>
--
Todd Lipcon
Software Engineer, Cloudera
I haven't gotten many responses so far. If there doesn't seem to be
much interest, I may not spend the time to organize.
If you're feeling too busy to answer the full survey, feel free to
just reply with a "+1" so I know there's some interest!
-Todd
On Fri, Aug 26
f you have
any questions.
Thanks! Looking forward to seeing everyone, and meeting new folks there.
-Todd
--
Todd Lipcon
Software Engineer, Cloudera
A.
>
> Please confirm that at the intersection of outputs A & B, the values will
> come from B.
>
> Thanks,
> Jacques
>
--
Todd Lipcon
Software Engineer, Cloudera
missions are only checked when a file is
opened, not on every write. That's true of HDFS as well as Linux's
behavior with all local filesystems I'm aware of. As far as I know
it's the case with most NFS filers as well. Hence the existence of
explicit fencing commands accessible via remote administrative
interfaces on NetApps, for example.
-Todd
--
Todd Lipcon
Software Engineer, Cloudera
e created (the region server could
>> >> have rolled a log while the master was splitting them) and will
>> >> restart if that situation happens until it's able to own all files and
>> >> split them.
>> >>
>> >
>> > JD, I didn't think the master explicitly dealt with writer leases.
>> >
>> > Does HBase rely on single-writer semantics on the log file? That is, if
>> the
>> > master and a RS both decide to mucky-muck with a log file, you expect the
>> FS
>> > to lock out one of the writers?
>> >
>> >
>> >
>> >
>> >>
>> >> >
>> >> > Have you heard if anyone else has been having problems with the second
>> >> 90.4
>> >> > rc?
>> >>
>> >> Nope, we run it here on our dev cluster and didn't encounter any issue
>> >> (with the code or node failure).
>> >>
>> >> >
>> >> > Thanks again for your help. I'm following up with the MapR guys as
>> well.
>> >>
>> >> Good idea!
>> >>
>> >> J-D
>> >>
>> >
>>
>
>
>
> --
> Have a Nice Day!
> Lohit
>
--
Todd Lipcon
Software Engineer, Cloudera
an the
> >explained FOSS option is totally acceptable.
> >
> >Best regards,
> >
> >
> > - Andy
> >
> >Problems worthy of attack prove their worth by hitting back. - Piet Hein
> >(via Tom White)
> >
> >
> >- Original Message -
> >
ith HBase we could recommend that, too. I
wouldn't have a problem with any of the above.
-Todd
>
>
>
> On Mon, Jul 25, 2011 at 1:00 PM, Todd Lipcon wrote:
> > On Mon, Jul 25, 2011 at 11:55 AM, Ted Dunning
> wrote:
> >
> >> Todd,
> >>
>
ect to be
held to the same standards as a MapR employee touting MapR -- i.e. I
wouldn't bring it up on the public mailing list.
> That seems to be important information for not just the original poster but
> others who may have the same problem.
>
> What is the consensus on that?
&g
es up on our list might take it as
> > license to dump their commercial messages, tech related or not,
> > into hbase mailing lists. A list riddled with commercial messages
> > would likely sour many who are subscribed here.
> >
> > Thanks boss,
> > St.Ack
> >
>
--
Todd Lipcon
Software Engineer, Cloudera
un(MapTask.java:369)
> >at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
> >at java.security.AccessController.doPrivileged(Native Method)
> >at javax.security.auth.Subject.doAs(Subject.java:396)
> >at
> >
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> >at org.apache.hadoop.mapred.Child.main(Child.java:253)
> >
> > attempt_201106141233_0042_m_00_0: Bad line at offset: 0:
> > attempt_201106141233_0042_m_00_0: No delimiter
> > attempt_201106141233_0042_m_00_0: Bad line at offset: 34:
> > attempt_201106141233_0042_m_00_0: No delimiter
> > attempt_201106141233_0042_m_00_0: Bad line at offset: 51:
> > .. x33123 lines
> >
> >
> >
> > This message was sent using IMP, the Internet Messaging Program.
> >
> >
> >
>
>
> --
> Prashant Sharma
> Development Engineer
> Pramati Technologies
> Begumpet
>
> "Hare Krishna"
>
--
Todd Lipcon
Software Engineer, Cloudera
mand line:
> ./hadoop jar hbase-0.90.0.jar importtsv
> -Dimporttsv.columns=HBASE_ROW_KEY,f1:b,f1:c -Dimporttsv.bulk.output=output
> t1 input
>
> In that, 't1' and 'f1' are the table and column family in HBase.
>
> No data is written to the 'output' folder.
>
>
> >> Row locking is rarely a good idea; it doesn't scale, and locks currently
> >> aren't persisted anywhere except the RS memory (so if it dies...).
> >> Using a single family might be better for you.
> >
> >
> > Thanks for the pointer.
> >
> > Leif
> >
>
--
Todd Lipcon
Software Engineer, Cloudera
to HBase data files, and use completebulkload tool
>
>
Sounds about right. I don't know what the .dbf format is, but if you can
make an InputFormat that supports them, you can write a mapper to translate
from those records into HBase Puts, and then use HFileOutputFormat and bulk
loads just like Im
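As a rough illustration of that pipeline, a hedged sketch of such a mapper follows; the class name RecordToPutMapper, the "rowkey,value" line format, and the column names are assumptions for the example, not details from the thread. The job driver would then typically call HFileOutputFormat.configureIncrementalLoad() before running completebulkload.

    // Hypothetical example: turn one parsed input record into an HBase Put for
    // HFileOutputFormat. A real .dbf InputFormat would supply richer records.
    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class RecordToPutMapper
        extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

      @Override
      protected void map(LongWritable offset, Text line, Context context)
          throws IOException, InterruptedException {
        // Assume each record arrives as "rowkey,value".
        String[] parts = line.toString().split(",", 2);
        if (parts.length < 2) {
          return;                              // skip malformed records
        }
        byte[] row = Bytes.toBytes(parts[0]);
        Put put = new Put(row);
        put.add(Bytes.toBytes("f1"), Bytes.toBytes("v"), Bytes.toBytes(parts[1]));
        context.write(new ImmutableBytesWritable(row), put);
      }
    }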
}
>
> //==
> here is the configuration file of hbase:
>
> <property>
>   <name>hbase.zookeeper.property.maxClientCnxns</name>
>   <value>1000</value>
> </property>
> <property>
>   <name>hbase.hregion.max.filesize</name>
>   <value>1073741824</value>
> </property>
> <property>
>   <name>hbase.regionserver.han
>>> Is WAL the only way to guarantee data safety in hbase? We want high
> insert
>>> rate though.
>>> Is there a middle ground? e.g. a sync operation to flush RS to HDFS will
> be
>>> perfect!
>>>
>>>
>>>>
>>>
>
--
Todd Lipcon
Software Engineer, Cloudera
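The reply is cut off in this archive. For context, the middle ground usually discussed for this trade-off at the time was deferred log flush; below is a hedged sketch of the relevant client and table settings (HBase 0.90-era API names assumed, not taken from the missing reply):

    // Hedged sketch of the WAL trade-offs, assuming 0.90-era APIs.
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WalTradeoffSketch {
      public static void main(String[] args) {
        // Fastest but unsafe: skip the WAL entirely for a given Put.
        Put p = new Put(Bytes.toBytes("row1"));
        p.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("value"));
        p.setWriteToWAL(false);

        // Middle ground: keep the WAL but sync it to HDFS periodically
        // rather than on every edit.
        HTableDescriptor desc = new HTableDescriptor("mytable");
        desc.addFamily(new HColumnDescriptor("f"));
        desc.setDeferredLogFlush(true);
        // desc would then be passed to HBaseAdmin.createTable(desc).
      }
    }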
u
>
> http://pastebin.com/9KfRZFBW
>
>
> On Wed, May 25, 2011 at 1:42 PM, Todd Lipcon wrote:
>
>> Hi Wayne,
>>
>> Looks like your RAM might be oversubscribed. Could you paste your
>> hbase-site.xml and hbase-env.sh files? Also looks like you have some
>
t; >
>>
>> CMS's you should be able to configure around. u21 was supposed to
>> make improvements to put off frag but apparently made it worse. Try
>> u25, the latest. Also google for other's experience with JVMs up on
>> CentOS 5.5.
>>
>> St.Ack
>>
>
--
Todd Lipcon
Software Engineer, Cloudera
hen applications that need the low pause times provided by the
> > > concurrent collector are run on machines with small numbers of
> > > processors (e.g., 1 or 2)." [See
> > >
> >
> http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#icms
> > ]
> > > Don't you have > 2 processors per machine?
> > >
> > > St.Ack
> >
>
--
Todd Lipcon
Software Engineer, Cloudera
_24]
>at
>
> javax.management.NotificationBroadcasterSupport.sendNotification(NotificationBroadcasterSupport.java:229)
> [:1.6.0_24]
>at javax.management.timer.Timer.sendNotification(Timer.java:1237)
> [:1.6.0_24]
> at javax.management.timer.Timer.notifyAlarmClock(Timer.java:1206)
> [:1.6.0_24]
>at javax.management.timer.TimerAlarmClock.run(Timer.java:1289)
> [:1.6.0_24]
>at java.util.TimerThread.mainLoop(Timer.java:512) [:1.6.0_24]
>at java.util.TimerThread.run(Timer.java:462) [:1.6.0_24]
> Thanks for any suggestions.
>
--
Todd Lipcon
Software Engineer, Cloudera
resentative of
> the
> > intended recipient, you are hereby notified that any dissemination of
> this
> > communication is strictly prohibited. If you have received this
> communication in
> > error, please notify us immediately by e-mail and delete the message and
> any
> > attachments from your system.
> >
> >
>
--
Todd Lipcon
Software Engineer, Cloudera
better read throughput}
>
Reads are scaled by splitting regions and distributing them around multiple
servers. If you have one super-hot row, it should fit in cache and give you
some >20k reads/second. If you need more reads/sec on a single row than
that, you'll need to add your own caching
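A minimal sketch of what "add your own caching" could look like on the client side; the class, the one-second TTL, and the string row keys are illustrative assumptions, not part of the original advice:

    // Hypothetical application-side cache in front of HBase for a hot row.
    import java.io.IOException;
    import java.util.concurrent.ConcurrentHashMap;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HotRowCache {
      private static final long TTL_MS = 1000;    // serve the hot row from memory for 1s

      private static final class Entry {
        final Result result;
        final long fetchedAt;
        Entry(Result result, long fetchedAt) {
          this.result = result;
          this.fetchedAt = fetchedAt;
        }
      }

      private final HTable table;
      private final ConcurrentHashMap<String, Entry> cache =
          new ConcurrentHashMap<String, Entry>();

      public HotRowCache(HTable table) {
        this.table = table;
      }

      public Result get(String row) throws IOException {
        long now = System.currentTimeMillis();
        Entry e = cache.get(row);
        if (e != null && now - e.fetchedAt < TTL_MS) {
          return e.result;                       // served from application memory
        }
        Result fresh = table.get(new Get(Bytes.toBytes(row)));
        cache.put(row, new Entry(fresh, now));
        return fresh;
      }
    }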
Mon, May 2, 2011 at 1:56 PM, Jack Levin wrote:
> >> my yourkit version expired :)... but here is the jstack when it
> >> happens: http://pastebin.com/5v6mHg3t
> >>
> >> On Mon, May 2, 2011 at 1:00 PM, Todd Lipcon wrote:
> >>> On Mon, May 2,
n Mon, May 2, 2011 at 7:53 AM, Todd Lipcon wrote:
> > Hi Jack,
> >
> > Does this happen even if you aren't running Yourkit on the DN?
> >
> > Can you try using a Sun JDK instead of OpenJDK?
> >
> > -Todd
> >
> > On Sun, May 1, 2011 at 7:34
astebin.com/5v6mHg3t). After few hours, its
> >> literally staggers to a halt and gets very very slow... Any ideas
> >> what it's blocking on?
> >> (main issue is that fsreads for RS get really slow when that happens).
> >>
> >> -Jack
> >
>
--
Todd Lipcon
Software Engineer, Cloudera
Sent: Sunday, April 24, 2011 5:39 PM
> > > To: user@hbase.apache.org
> > > Subject: Re: Help with NPE during bulk load (completebulkload)
> > >
> > > On Sun, Apr 24, 2011 at 4:29 PM, Andy Sautins <
> andy.saut...@returnpath.net> wrote:
> > >&g
; before
>>>>> making a remote copy of them using distcp.
>>>>>
>>>>> However we just upgraded hbase (we're using cloudera's package, so we
>>>>> went
>>>>> from CDH3B4 to CDH3U0, both of which are versions of 0.90.1), and
>>>>> discovered
>>>>> that the HFiles now get deleted by the load operation. Is this a
>>>>> recent
>>>>> change? Is there a configuration variable to revert this behavior?
>>>>>
>>>>> We can work around it by doing the copy before the load, but that is
>>>>> less
>>>>> than optimal in our scenario as we'd prefer to have quicker access to
>>>>> the
>>>>> data in HBase.
>>>>>
>>>>> - Adam
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
--
Todd Lipcon
Software Engineer, Cloudera
ence.
>
The importtsv program doesn't appear to be picking up mapred-site.xml, then.
Are you sure it's valid XML? You can try "xmllint" to verify. Perhaps attach
it here?
-Todd
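For example (the path is only an illustration):

    # Prints nothing if the file is well-formed XML; reports the parse error otherwise.
    xmllint --noout /etc/hadoop/conf/mapred-site.xml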
>
> Do you have suggestions for any other features/options that I should check?
>
>
&
35:40 AM
> org.apache.hadoop.filecache.TrackerDistributedCacheManager deleteLocalPath
> INFO: Deleted path
> /tmp/hadoop-hadoop/mapred/local/archive/953502662101888516_-198765657_2115049918/file/home/hadoop/test/java/partitions_1303490140287
> Apr 22, 2011 9:35:41 AM org.apache.hadoop.mapred.JobClient
>
tchlevel for CDH3u0 (aka "GA" or "stable")
I agree with Stack's assessment above. We'll continue to cherry-pick bug
fixes back into the CDH3 version of HBase for our quarterly "update"
releases (CDH3u1, CDH3u2, etc).
-Todd
--
Todd Lipcon
Software Engineer, Cloudera
ode, using hadoop jar to run. But I find the time
> cost
> >> > to
> >> > run the job is not twice as much as the original one, it's nearly ten
> >> > times
> >> > as much as the one that only emits one key/value pair. I checked my code,
> and
> >> > I
> >> > did not find any problem. If the map() function emits either of the
> two
> >> > key/value pairs I wrote, the time cost becomes normal.
> >> > What's the cause? Do I miss any tips in bulk load?
> >> >
> >> > --
> >> > Best wishes
> >> > Gan, Xiyun
> >> >
> >
> >
> >
> > --
> > Best wishes
> > Gan, Xiyun
> >
>
--
Todd Lipcon
Software Engineer, Cloudera
t; logs was not concerning since all were test clusters. Now we got this on
>>> the
>>> production cluster, and strangely, this cluster was just installed; there
>>> is
>>> no table and no data, no activity there. So what logs is the master trying to
>>> create?
>>>
>>> We are running the latest CDH3B4 from Cloudera.
>>>
>>> Thanks for any hints,
>>> Daniel
>>>
>>>
> --
> Daniel Iancu
> Java Developer,Web Components Romania
> 1&1 Internet Development srl.
> 18 Mircea Eliade St
> Sect 1, Bucharest
> RO Bucharest, 012015
> www.1and1.ro
> Phone:+40-031-223-9081
> Email:daniel.ia...@1and1.ro
> IM:diancu@united.domain
>
>
>
--
Todd Lipcon
Software Engineer, Cloudera
-
> View this message in context:
> http://old.nabble.com/Yet-another-Hadoop-0.20-append-query-...-sorry-tp31329123p31329123.html
> Sent from the HBase User mailing list archive at Nabble.com.
>
>
--
Todd Lipcon
Software Engineer, Cloudera
;> at
> >>
> >>
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:94)
> >> at
> >> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
> >> at
> >>
> >>
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >> at
> >>
> >>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >> at java.lang.Thread.run(Thread.java:680)
> >>
> >> Just want to make sure I'm not missing something before I start
> >> re-compiling
> >> those libs.
> >>
> >> -GS
> >>
> >
> >
>
--
Todd Lipcon
Software Engineer, Cloudera
; >>
> >>> On Fri, Mar 25, 2011 at 12:36 PM, Alex Baranau <
> alex.barano...@gmail.com>
> >>> wrote:
> >>> > As far as I know HBase is configured to initiate up to 30 connections by
> >>> > default, and maxClientCnxns for Zookeeper was meant to be 30 as well.
> >>>
> >>> Yes
> >>>
> >>> I'm not sure how it'd go from 30 to 10 (Is 10 the default connections
> >>> for zk?). Is it possible it's not reading hbase-site.xml when the task
> >>> runs?
> >>>
> >>> St.Ack
> >>>
> >>
> >
>
--
Todd Lipcon
Software Engineer, Cloudera
wrote:
> > >
> > >> On Fri, Mar 25, 2011 at 12:36 PM, Alex Baranau <
> > alex.barano...@gmail.com>
> > >> wrote:
> > >> > As far as I know HBase is configured to initiate up to 30 connections
> by
> > >> > default, and maxClientCnxns for Zookeeper was meant to be 30 as
> well.
> > >>
> > >> Yes
> > >>
> > >> I'm not sure how it'd go from 30 to 10 (Is 10 the default connections
> > >> for zk?). Is it possible it's not reading hbase-site.xml when the task
> > >> runs?
> > >>
> > >> St.Ack
> > >>
> > >
> >
>
--
Todd Lipcon
Software Engineer, Cloudera
nfiguration) passing in the
>>> configuration that I used to create the MR job. Doesn't seem to work.
>>>
>>> So... Is there a way to run an MR job with TableInputFormat without
>>> leaking a connection? I am pretty sure I am not creating any HTables
>>> in the client side. Or is it a bug? I have spent several days now
>>> investigating this issue but I am still not able to come up with a
>>> workaround against zookeeper connection leaks in HBase MR jobs.
>>>
>>> thank you very much.
>>> -Dmitriy
>>>
>>
>
--
Todd Lipcon
Software Engineer, Cloudera
s" permissions
>>>
>>> Good question, you might want to ask that to cloudera.
>>>
>>> J-D
>>>
>>> On Thu, Mar 17, 2011 at 4:00 PM, Geoff Hendrey
>>> wrote:
>>>> Hi -
>>>>
>>>>
>>>>
>>>> I just upgraded to CDH3B4. I noticed when I ran 'hadoop dfsadmin
>>>> -upgrade' that the logs on the datanodes showed that hadoop was
>>> "fixing"
>>>> the permissions of my dfs storage disks to "rwx--". I am just
>>>> wondering why it does this? I had been using a subdirectory of one of
>>>> the disks for hadoop.tmp.dir, and of course now the hadoop.tmp.dir
>>>> cannot be written.
>>>>
>>>>
>>>>
>>>> -geoff
>>>>
>>>>
>>>
>>
>
--
Todd Lipcon
Software Engineer, Cloudera
to iptables, what is the
> best way to disable?
Generally haven't tuned tcp cookies. For iptables, it's usually
enabled by some script in /etc/init.d - you can use chkconfig or
update-rc.d to disable it, depending on the OS.
-Todd
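For example (both variants are illustrative; the exact commands depend on the distribution):

    # RHEL/CentOS-style init:
    chkconfig iptables off        # don't start at boot
    service iptables stop         # stop it now

    # Debian/Ubuntu-style init:
    update-rc.d -f iptables remove
    /etc/init.d/iptables stop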
>
> -Jack
>
> On Sat, Mar 12, 2011 at 8:24 PM, T
stack issue?
>> >>>> >
>> >>>> > So, does the datanode need a higher heap than 1GB? Or possibly we ran out
>> of
>> >>>> RAM
>> >>>> > for other reasons?
>> >>>> >
>> >>>> > -Jack
>> >>>> >
>> >>>> > On Thu, Mar 10, 2011 at 1:29 PM, Ryan Rawson
>> wrote:
>> >>>> >
>> >>>> >> Looks like a datanode went down. InterruptedException is how Java
>> >>>> >> interrupts IO in threads; it's similar to the EINTR errno.
>> That
>> >>>> >> means the actual source of the abort is higher up...
>> >>>> >>
>> >>>> >> So back to how InterruptedException works... at some point a thread
>> in
>> >>>> >> the JVM decides that the VM should abort. So it calls
>> >>>> >> thread.interrupt() on all the threads it knows/cares about to
>> >>>> >> interrupt their IO. That is what you are seeing in the logs. The
>> root
>> >>>> >> cause lies above I think.
>> >>>> >>
>> >>>> >> Look for the first "Exception" string or any FATAL or ERROR strings
>> in
>> >>>> >> the datanode logfiles.
>> >>>> >>
>> >>>> >> -ryan
>> >>>> >>
>> >>>> >> On Thu, Mar 10, 2011 at 1:03 PM, Jack Levin
>> wrote:
>> >>>> >> > http://pastebin.com/ZmsyvcVc Here is the regionserver log, they
>> all
>> >>>> >> have
>> >>>> >> > similar stuff,
>> >>>> >> >
>> >>>> >> > On Thu, Mar 10, 2011 at 11:34 AM, Stack
>> wrote:
>> >>>> >> >
>> >>>> >> >> Whats in the regionserver logs? Please put up regionserver and
>> >>>> >> >> datanode excerpts.
>> >>>> >> >> Thanks Jack,
>> >>>> >> >> St.Ack
>> >>>> >> >>
>> >>>> >> >> On Thu, Mar 10, 2011 at 10:31 AM, Jack Levin > >
>> >>>> wrote:
>> >>>> >> >> > All was well, until this happened:
>> >>>> >> >> >
>> >>>> >> >> > http://pastebin.com/iM1niwrS
>> >>>> >> >> >
>> >>>> >> >> > and all regionservers went down; is this the xcievers issue?
>> >>>> >> >> >
>> >>>> >> >> >
>> >>>> >> >> > dfs.datanode.max.xcievers
>> >>>> >> >> > 12047
>> >>>> >> >> >
>> >>>> >> >> >
>> >>>> >> >> > this is what I have, should I set it higher?
>> >>>> >> >> >
>> >>>> >> >> > -Jack
>> >>>> >> >> >
>> >>>> >> >>
>> >>>> >> >
>> >>>> >>
>> >>>> >
>> >>>>
>> >>>
>> >>
>> >
>>
>
--
Todd Lipcon
Software Engineer, Cloudera
/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+HBASE+AND+fixVersion+%3D+%220.92.0%22+AND+resolution+%3D+Unresolved+AND+priority+%3D+Critical
>>
>> Best regards,
>>
>> - Andy
>>
>> Problems worthy of attack prove their worth by hitting back.
>> - Piet Hein (via Tom White)
>>
>>
>>
>>
>
--
Todd Lipcon
Software Engineer, Cloudera
>>>> comes out (unless you use a very small heap). The IO usage patterns
> >>>> will change, but unless you flush very small files all the time and
> >>>> need to recompact them into much bigger ones, then it shouldn't really
> >>>>
The Apache HBase team is happy to announce the general availability of HBase
0.90.1, available from your Apache mirror of choice:
http://www.apache.org/dyn/closer.cgi/hbase/
[at the time of this writing, not all mirrors have updated yet -- please
pick a different mirror if your first choice does n
adoop.compression.lzo.GPLNativeCodeLoader: Could not load native gpl
> library
> > java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
> >
> >
> > My questions are
> >
> > 1. Are the two items I was given compatible?
> > 2. If so, what could I be doing wrong?
> >
> > Thanks
> >
> > -Pete
> >
> >
>
--
Todd Lipcon
Software Engineer, Cloudera
> >
> >
> > On Thu, Feb 3, 2011 at 10:26 PM, charan kumar
> wrote:
> >> Here you go..
> >>
> >> HBase Performance tuning page
> >> http://wiki.apache.org/hadoop/Hbase/FAQ#A7 refers to the following
> >> hadoop URL.
> >>
> >&
Thu, Feb 3, 2011 at 2:06 PM, Charan K wrote:
>
> > Thanks Todd.. I will try it out ..
> >
> >
> > On Feb 3, 2011, at 1:43 PM, Todd Lipcon wrote:
> >
> > > Hi Charan,
> > >
> > > Your GC settings are way off - 6m newsize will promote way too much
orAccessorImpl.newInstance(NativeConstructor
> > > AccessorImpl.java:39)
> > > at
> > > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingCon
> > > structorAccessorImpl.java:27)
> > > at
> > java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> > > at
> > > org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteExce
> > > ption.java:96)
> > > at
> > > org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(Remote
> > > Exception.java:80)
> > > at
> > > org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerRep
> > > ort(HRegionServer.java:729)
> > > at
> > > org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.j
> > > ava:586)
> > > at java.lang.Thread.run(Thread.java:619)
> > >
> > >
> > > 2011-02-03T09:53:35.165-0800: 71320.785: [GC 71320.785: [ParNew
> > > (promotion
> > > failed): 5568K->5568K(5568K), 0.4384530 secs]71321.224:
> > > [CMS2011-02-03T09:53:45.111-0800: 71330.731: [CMS-concurrent-mark:
> > > 17.511/51.564 secs] [Times: user=38.72 sys=5.67, real=51.60 secs]
> > >
> > >
> > >
> > > Thanks,
> > > Charan
> >
>
--
Todd Lipcon
Software Engineer, Cloudera
r(HRegion.java:1119)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1095)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1765)
> ...
> 5 more
>
>
--
Todd Lipcon
Software Engineer, Cloudera
region in hand?
> >
>
> Can you tar up one of these regionserver dirs and put it somewhere I
> can pull? I'll try it over here.
> St.Ack
>
--
Todd Lipcon
Software Engineer, Cloudera
ase doesn't startup
> because of the missing jar. I am trying to build the LZO jar but Ivy, for
> some reason, can't connect to
> >>>>>
> >>>>>
> http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.0.0-rc2/ivy-2.0.0-rc2.jar
> >>>>>
> >>>>> to download the jar. I have tried running the build in offline mode
> but the Ivy gets stuck at..
> >>>>>
> >>>>> ivy-resolve-common:
> >>>>> [ivy:resolve] :: resolving dependencies ::
> com.hadoop.gplcompression#Hadoop-GPL-Compression;working@caiss01a
> >>>>> [ivy:resolve] confs: [common]
> >>>>>
> >>>>> Would anyone have a pre-build 32bit LZO jar available?
> >>>>>
> >>>>> Thanks
> >>>>>
> >>>>> -Pete
> >>>>
> >>>>
> >>
> >>
>
>
--
Todd Lipcon
Software Engineer, Cloudera
> > >> >>>>> I suppose reads could be blocked for the same reason if all
> are
> > > > >> >>>>> trying
> > > > >> to
> > > > >> >>>>> read from the offlined region.
> > > > >> >>>>>
> > > > >> >>>>> What version of hbase are you using? Splits should be
> faster
> > > > >> >>>>> in
> > > > >> 0.90
> > > > >> >>>>> now that the split daughters come up on the same region server.
> > > > >> >>>>>
> > > > >> >>>>> Sorry I don't have a better answer for you. Need to dig
> in.
> > > > >> >>>>>
> > > > >> >>>>> File a JIRA. If you want to help out some, stick some
> data
> > > > >> >>>>> up in
> > > > >> it.
> > > > >> >>>>> Some suggestions would be to enable logging of when we
> > > lookup
> > > > region
> > > > >> >>>>> locations in client and then note when requests go to
> zero.
> > > > >> >>>>> Can
> > > > you
> > > > >> >>>>> figure what region the clients are waiting on (if they
> are
> > > > >> >>>>> waiting
> > > > >> on
> > > > >> >>>>> any). If you can pull out a particular one, try and
> elicit
> > > > >> >>>>> its history at time of blockage. Is it being moved or
> > > > >> >>>>> mid-split? I suppose it makes sense that bigger regions
> > > > >> >>>>> would make the
> > > > situation
> > > > >> >>>>> 'worse'. I can take a look at it too.
> > > > >> >>>>>
> > > > >> >>>>> St.Ack
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>> We are constantly loading data to this cluster of 10
> nodes.
> > > > >> >>>>>> These pauses can happen as frequently as every minute
> but
> > > > sometimes
> > > > >> >> are
> > > > >> >>>>> not
> > > > >> >>>>>> seen for 15+ minutes. Basically watching the Region
> server
> > > > >> >>>>>> list
> > > > >> with
> > > > >> >>>>> request
> > > > >> >>>>>> counts is the only evidence of what is going on. All
> reads
> > > > >> >>>>>> and
> > > > >> writes
> > > > >> >>>>>> totally stop and if there is ever any activity it is on
> the
> > > > >> >>>>>> node
> > > > >> >> hosting
> > > > >> >>>>> the
> > > > >> >>>>>> .META. table with a request count of region count + 1.
> This
> > > > problem
> > > > >> >>>>> seems to
> > > > >> >>>>>> be worse with a larger region size. We tried a 1GB
> region
> > > > >> >>>>>> size
> > > > and
> > > > >> >> saw
> > > > >> >>>>> this
> > > > >> >>>>>> more than we saw actual activity (and stopped using a
> larger
> > > > region
> > > > >> >> size
> > > > >> >>>>>> because of it). We went back to the default region size
> and
> > > > >> >>>>>> it
> > > > was
> > > > >> >>>>> better,
> > > > >> >>>>>> but we had too many regions so now we are up to 512M for
> a
> > > > >> >>>>>> region
> > > > >> >> size
> > > > >> >>>>> and
> > > > >> >>>>>> we are seeing it more again.
> > > > >> >>>>>>
> > > > >> >>>>>> Does anyone know what this is? We have dug into all of
> the
> > > > >> >>>>>> logs
> > > > to
> > > > >> >> find
> > > > >> >>>>> some
> > > > >> >>>>>> sort of pause but are not able to find anything. Is this
> an
> > > > >> >>>>>> wal
> > > > >> hlog
> > > > >> >>>>> roll?
> > > > >> >>>>>> Is this a region split or compaction? Of course our
> biggest
> > > > >> >>>>>> fear
> > > > is
> > > > >> a
> > > > >> >> GC
> > > > >> >>>>>> pause on the master but we do not have java logging
> turned
> > > > >> >>>>>> on
> > > > with
> > > > >> >> the
> > > > >> >>>>>> master to tell. What could possibly stop the entire
> cluster
> > > > >> >>>>>> from
> > > > >> >> working
> > > > >> >>>>> for
> > > > >> >>>>>> seconds at a time very frequently?
> > > > >> >>>>>>
> > > > >> >>>>>> Thanks in advance for any ideas of what could be causing
> > this.
> > > > >> >>>>>>
> > > > >> >>>>>
> > > > >> >>>>
> > > > >> >>>>
> > > > >> >>>
> > > > >> >>
> > > > >>
> > > > >>
> > > > >
> > > >
> >
>
--
Todd Lipcon
Software Engineer, Cloudera
,"
> > delimited single configuration item. I figure that Configuration
> > should have a method to return all key values where key's match a
> > pattern. Maybe there are better ways to do this. Will get this into
> > the ticket.
> >
> > Thanks and regards,
>
> > Is there a way to compress the table as the bulk uploader creates the
> > HFile? This is important for us because we don't want to have a burst
> > increase in our disk usage.
> >
> > Thanks and regards,
> > - Ashish
> >
>
--
Todd Lipcon
Software Engineer, Cloudera
e existing HTable create API that takes a list
of boundaries. Then you don't need to deal with ZK or META manually in any
way, and if any of that stuff changes you'll be using a supported public
API.
-Todd
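A hedged sketch of that approach (0.90-era class names assumed; the table name, family, and split points are placeholders):

    // Pre-split a table at creation time through the public admin API.
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PreSplitTable {
      public static void main(String[] args) throws Exception {
        HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());

        HTableDescriptor desc = new HTableDescriptor("mytable");
        desc.addFamily(new HColumnDescriptor("f"));

        // Region boundaries: [start, "b"), ["b", "m"), ["m", end).
        byte[][] splits = new byte[][] {
            Bytes.toBytes("b"),
            Bytes.toBytes("m"),
        };
        admin.createTable(desc, splits);
      }
    }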
> On 1/13/11 10:30 AM, "Todd Lipcon" wrote:
>
> Hey Vidhya,
&g
ed in 0.90?
> >
> > I was trying to use load_table.rb to create a new table and bulk load
> files into it. It worked partly in the sense that the META table got
> populated, the files were moved to the appropriate location, but the server
> assignment did not happen until I restarted HBase. Is this a consequence of
> the master rewrite?
> >
> > V
> >
> >
>
>
--
Todd Lipcon
Software Engineer, Cloudera
me, so it ought to work on any Sun or
OpenJDK as far as I know.
>
> (Probably the wrong list for this, but: is there a officially supported JVM
> version for CDH3?)
>
>
We recommend the Sun 1.6 >=u16 but not u18
-Todd
>
>
> On 13 jan 2011, at 07:42, Todd Lipcon wrote:
&
can actually free() a
direct buffer - check out the patch referenced earlier in this thread.
Of course it probably doesn't work on other JVMs... oh well.
-Todd
>
>
> On Jan 13, 2011, at 8:30 AM, Todd Lipcon wrote:
>
> > On Wed, Jan 12, 2011 at 3:25 PM, Tatsuya Kawano >wr
(like all the rigged tests I've attempted to do) these
get cleaned up nicely by the JVM. It seems only in pretty large heaps in
real workloads does the leak actually end up running away.
-Todd
>
> On Jan 13, 2011, at 7:50 AM, Todd Lipcon wrote:
>
> > Can someone who is havin
Can someone who is having this issue try checking out the following git
branch and rebuilding LZO?
https://github.com/toddlipcon/hadoop-lzo/tree/realloc
This definitely stems one leak of a 64KB directbuffer on every reinit.
-Todd
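Typical steps to try that branch would look roughly like this (the ant targets are an assumption and may differ):

    git clone https://github.com/toddlipcon/hadoop-lzo.git
    cd hadoop-lzo
    git checkout realloc
    # Requires the LZO development headers and JAVA_HOME to be set.
    ant clean compile-native tar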
On Wed, Jan 12, 2011 at 2:12 PM, Todd Lipcon wrote:
>
t. But the Hadoop (mapreduce) LZO compression is not the
> problem. Compressing the map output using LZO works just fine. The problem
> is HBase LZO compression. The region server process is the one with the
> memory leak...
> >>
> >
> > (Sorry for dumb question Friso
The problem
> is HBase LZO compression. The region server process is the one with the
> memory leak...
> >
>
> (Sorry for dumb question Friso) But HBase is leaking because we make
> use of the Compression API in a manner that produces leaks?
> Thanks,
> St.Ack
>
--
Todd Lipcon
Software Engineer, Cloudera
ideas where to look? I'm kinda at a loss here, since I'm fairly
> certain the versions are all in sync. I'm able to browse HDFS in the
> UI and copyToLocal the /hbase-app/hbase/hbase.version file without
> problems.
>
> thanks,
> Bill
>
--
Todd Lipcon
Software Engineer, Cloudera
and. Now, is it possible that master would do that?
> Master should be able to detect invalid files in the .log/ dir and get
> rid of them without operators interaction, is there is some sort of
> design element that I am simply missing?
>
> Thanks.
>
> -Jack
>
--
Todd Lipcon
Software Engineer, Cloudera
> >>>>> at
> >>>>> org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:931)
> >>>>> at
> >>>>> org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:732)
> >>>>> at
> >
; >
> > > > >
> > > > >
> > > > > On Sat, Jan 8, 2011 at 11:43 AM, Stack wrote:
> > > > > > Strike that. This is a Scan, so can't do blooms + filter.
> Sorry.
> > > > > > Sounds like a coprocessor then. You'd have your query 'lean' on
> > the
> > > > > > column that you know has the lesser items and then per item,
> you'd
> > do
> > > > > > a get inside the coprocessor against the column of many entries.
> > The
> > > > > > get would go via blooms.
> > > > > >
> > > > > > St.Ack
> > > > > >
> > > > > >
> > > > > > On Sat, Jan 8, 2011 at 11:39 AM, Stack wrote:
> > > > > >> On Sat, Jan 8, 2011 at 11:35 AM, Jack Levin
> > > > wrote:
> > > > > >>> Yes, we thought about using filters, the issue is, if one
> family
> > > > > >>> column has 1ml values, and second family column has 10 values
> at
> > > the
> > > > > >>> bottom, we would end up scanning and filtering 0 records
> and
> > > > > >>> throwing them away, which seems inefficient.
> > > > > >>
> > > > > >> Blooms+filters?
> > > > > >> St.Ack
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>
--
Todd Lipcon
Software Engineer, Cloudera
riter in my mapper and 0
> reducers but the output directory has only a _temporary directory
> with my outputs in each subdirectory. That doesn't seem to be what the
> loadtable script expects (a column family directory with HFiles). Can
> someone tell me if what I am doing make
LoadIncremetalHBase.java source files
> in
> > my version, is there any reason why it wouldn't work for any case? Is
> there
> > any architectural change that is necessary for these to work well that is
> > not present in 0.20.6?
> >
> > Thanks,
> > Hari
> >
>
--
Todd Lipcon
Software Engineer, Cloudera
r the job started.
>
> Any ideas?
>
>
> Friso
>
>
>
> On 3 jan 2011, at 19:18, Todd Lipcon wrote:
>
> > Hi Friso,
> >
> > Which OS are you running? Particularly, which version of glibc?
> >
> > Can you try running with the environment variable
, that seems too
> small).
>
+1. I'd recommend at least 64m new size.. if reasonably frequent 200-300ms
pauses are acceptable, go to 128m or larger. You can also tune SurvivorRatio
down and use a larger new size for some workloads, but it's a little messy
to figure this out.
-Todd
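An illustrative hbase-env.sh snippet along those lines (the exact values depend on the workload):

    export HBASE_OPTS="-XX:+UseConcMarkSweepGC \
      -XX:NewSize=64m -XX:MaxNewSize=64m \
      -XX:SurvivorRatio=4"    # a lower ratio gives larger survivor spaces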
--
Todd Lipcon
Software Engineer, Cloudera
e maybe just one fixed size buffer (they
> all appear near 64M anyway) or possibly reuse an existing buffer also when
> it is not the exact required size but just large enough to make do? Having
> short lived direct byte buffers is apparently a discouraged practice. If
> anyone can prov
> Operating System: Linux 2.6.34.7-0.5-xen Architecture: amd64
> Number of processors:
> > 8 Committed virtual memory: 4 403 512 kbytes Total physical
> > memory: 6 815 744 kbytes Free physical memory: 82 720 kbytes
> Total swap space:
> > 8 393 924 kbytes Free swap space: 8 050 880 kbytes
> >
> >
> >
> >
>
--
Todd Lipcon
Software Engineer, Cloudera
rce at
> >
> hbase-0.89.20100924+28/src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java
> > to call FileOutputFormat.setCompressOutput/setOutputCompressorClass() on
> the
> > Job; in order to turn on compression.
> >
> > Does that sound right?
> >
> > Ma
way to get publicity and even
help drive candidates if you're recruiting for teams that use HBase!
The page is here:
http://wiki.apache.org/hadoop/Hbase/PoweredBy
Thanks. and happy holidays!
-Todd
--
Todd Lipcon
Software Engineer, Cloudera
>
When you restarted HBase, it probably flushed the memstore to the storefile.
>
>
>
> I was able to see the same data, say by doing
>
> ./hadoop fs -cat /hbase/temp/123456/colfam/4588349323987497
>
>
This is the flushed file from above.
-Todd
--
Todd Lipcon
Software Engineer, Cloudera
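For reference, a memstore flush can also be triggered explicitly from the HBase shell rather than by restarting (the table name here is only an example):

    hbase(main):001:0> flush 'temp'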
-5 000 S 2.0 0.0 423:52.39 [kswapd0]
>
> That server is dual-core, and great for natural region serving, but
> meta interaction implies a lot of fast transactions, which would be
> better served by an 8-core box.
>
> -Jack
>
--
Todd Lipcon
Software Engineer, Cloudera
o the local cluster first.
-Todd
--
Todd Lipcon
Software Engineer, Cloudera
Ingestor/workspace/ingest_output/metadata/2776824162022751481
> for inclusion in store metadata region
> vidhyash_test,r:cn#com#sina#news!/c/2006-07-12/192710404866.shtml!http,1292936195947.b3b27d1cc94a6378ab4da90acad4efbf.
> 2010-12-22 21:35:57,547 INFO org.apache.hadoop.hbase.regionserver.Store:
> Validating hfile at
> /user/vidhyash/wcc1/debug/Ingestor/workspace/ingest_output/metadata/1753280038544504583
> for inclusion in store metadata region
> vidhyash_test,r:com#yoka#space!/blog/34726!http,1292936189782.714fc4e266abca11f578fd90a3561337.
>
>
--
Todd Lipcon
Software Engineer, Cloudera
>
> >>> I've looked over the CDH3B3 installation documentation
> >> but have not installed it nor do presently use it.
> >>>
> >>> If we draw a line, then as an ASF community we should
> >> have a fallback option somewhere in ASF-land for the user to
> >> try. Vanilla Hadoop is not sufficient for HBase. Therefore,
> >> I propose we make a Hadoop 0.20-append tarball available.
> >>>
> >>> Best regards,
> >>>
> >>>- Andy
> >>>
> >>> Problems worthy of attack prove their worth by hitting
> >> back.
> >>> - Piet Hein (via Tom White)
> >>>
> >>>
> >>>
> >>>
> >>
> >
> >
> >
>
>
--
Todd Lipcon
Software Engineer, Cloudera
On Fri, Dec 17, 2010 at 7:37 PM, Sandy Pratt wrote:
> -XX:MaxDirectMemorySize=100m
Yep, I always leave that at the default, whatever that might be.
-Todd
--
Todd Lipcon
Software Engineer, Cloudera
oint I haven't seen direct evidence of GC
>> activity.
>>
>> J-D
>>
>> On Fri, Dec 17, 2010 at 1:27 AM, Friso van Vollenhoven
>> mailto:fvanvollenho...@xebia.com>> wrote:
>> Hi J-D,
>>
>> Thanks for your comments and clarification. I guess GC does blow (especially
>> when writing things like databases and filesystems).
>>
>> Right now I will dive into GC tuning once more and probably lower the number
>> of reducers on the insert jobs.
>>
>>
>> Thanks,
>> Friso
>>
>>
>
--
Todd Lipcon
Software Engineer, Cloudera