<after_school_special_voice>I never learned to 
read!</after_school_special_voice>

For performance metrics, you could see if there are any dtrace points for 
InnoDB. I wouldn't be able to tell you how to write them, but you'd want to 
look for things like page splits, writes per second, and file growth 
operations. That'll tell you what kind of storage operations are causing your 
performance slowdown. There's an Oracle blog that looks like it contains some 
basics on writing this sort of thing: 
http://blogs.oracle.com/realneel/entry/inniostat_innodb_io_statistics
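
Even without InnoDB-specific probes, the generic io provider will show you the 
size and mix of physical reads and writes the node is doing. A rough sketch, 
assuming the Riak node shows up as beam.smp (adjust the execname if yours 
doesn't):

    dtrace -n 'io:::start /execname == "beam.smp"/
      { @[args[0]->b_flags & B_READ ? "read" : "write"] = quantize(args[0]->b_bcount); }'

That won't show page splits directly, but a pile of small scattered writes 
during the slow stretches is a pretty good hint.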

The other obvious options to switch up are adding noatime to skip the file 
access time writes and switching Innostore to use dynamic format tables to 
avoid the large-value write issues. I think InnoDB uses 16K pages by default, 
so 8K objects are going to leave a lot of empty space in the data file, leading 
to increased I/O. The better write performance on previously seen objects is 
probably because InnoDB doesn't have to grow the data files and isn't 
performing page splits.
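
noatime is just a mount option on the data volume (add it to the fstab entry or 
remount with -o remount,noatime). For the row format, if I'm remembering the 
Innostore config right, it's a single setting in the innostore section of 
app.config - double-check the option name against the docs:

    {format, "dynamic"},   %% row format better suited to values larger than ~8K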

With very sparse page population (large values), you're going to have to update 
a lot of data pages when you insert a single row, since every leaf-level page 
contains pointers to the next and previous page. Worst case, writing a single 
record could lead to four page updates - the data page, an intermediate page, 
and the next and previous leaf pages. With 8k pages that means 8k of data could 
result in 32k of writes; with a 16k data page, a 9k write becomes a 64k write 
because of the updates to those other pages. 

Also, if you can put the log files on separate disks from your data files (at 
least for this load process) then you'll get much better throughput (less I/O 
contention, etc blah blah blah DBA blah blah).
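
If Innostore exposes the knobs I think it does, that's just pointing the data 
and log directories at different mounts in the innostore section of app.config 
(option names from memory, so verify against the Innostore docs):

    {data_home_dir,      "/data/innodb"},   %% data files on one set of spindles
    {log_group_home_dir, "/logs/innodb"},   %% transaction logs on another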
---
Jeremiah Peschka - Founder, Brent Ozar PLF, LLC
Microsoft SQL Server MVP

On Aug 30, 2011, at 9:55 AM, Jonathan Langevin wrote:

> He's already running Raid 10
> 
> 
> Jonathan Langevin
> Systems Administrator
> Loom Inc.
> Wilmington, NC: (910) 241-0433 - jlange...@loomlearning.com - 
> www.loomlearning.com - Skype: intel352
> 
> 
> 
> On Tue, Aug 30, 2011 at 12:51 PM, Jeremiah Peschka 
> <jeremiah.pesc...@gmail.com> wrote:
> InnoStore is going to insert the data in key order. When you attempt to 
> insert a record that would fit in between two keys (inserting "apple" between 
> "aardvark" and "Byzantium") you're probably going to get a page split, just 
> like in an RDBMS. The data needs to be re-shuffled in order to write it in 
> key order. Despite using MVCC, InnoDB is an index ordered table. Data is 
> written in key order. LevelDB shouldn't have the ordered write issues that 
> InnoStore has.
> 
> The procedure to bulk load your data is going to be the same as bulk loading 
> data for any large Data Warehouse - ordered inserts will help you.
> 
> If there's a way to load your data in smaller ordered chunks, that's going to 
> help you out too.
> 
> Some other options would be to increase the buffer_pool_size for InnoStore. 
> InnoDB will use RAM as a disk cache to avoid disk hits. This would be a "Good 
> Thing". Since you're writing in 8k chunks, roughly, you could also increase 
> the page size. This would be something to experiment with, but increasing 
> database page size could improve write performance. Odds are if your average 
> record size is 8kb, then you're writing multiple pages per record (key + 
> value), which is causing even more I/O (two or more page reads per object). 
> Some filesystem tuning could be in order. SQL Server, for instance, performs 
> sequential reads in 64k chunks. The best practice for disk performance there 
> is to format NTFS to use 64k blocks. The throughput between the defaults and 
> 64k reads is amazing.
> 
> Also, you may want to look at your storage configuration in the back end. If 
> you have that much data, are you using a SAN or DAS? Both of these can help 
> get additional write performance, especially if you adjust the storage device 
> to cache writes in a battery backed cache. What kind of drive configuration 
> do you have? You can get tremendous write performance improvements by moving 
> to RAID10.
> 
> ---
> Jeremiah Peschka - Founder, Brent Ozar PLF, LLC
> Microsoft SQL Server MVP
> 
> On Aug 30, 2011, at 9:34 AM, David Koblas wrote:
> 
> > Yes - but the thought of sorting 800M records which are all about 8k in 
> > size is a little daunting...  Something like a 6TB sort...  Plus it doesn't 
> > answer the ongoing insert problem, which is 20 keys/sec isn't functional.
> >
> > --david
> >
> > On 8/30/11 9:27 AM, Kresten Krab Thorup wrote:
> >> If you can insert the objects in ascending key order, then innostore will 
> >> be much faster than a random insert.
> >>
> >> Mobile: + 45 2343 4626 | Skype: krestenkrabthorup | Twitter: @drkrab
> >> Trifork A/S   |  Margrethepladsen 4  | DK- 8000 Aarhus C |  Phone : +45 
> >> 8732 8787  |  www.trifork.com<http://www.trifork.com/>
> >>
> >> Trifork organizes the world class conference on software development: GOTO 
> >> Aarhus<http://www.gotocon.com/>  - check it out!
> >>
> >> On Aug 30, 2011, at 6:14 PM, David Koblas wrote:
> >>
> >> I'm currently working on importing a very large dataset (800M) into Riak 
> >> and running into some serious performance problems.  Hopefully this is 
> >> just configuration issues and nothing deeper...
> >>
> >> Hardware -
> >>   * 8 proc box
> >>   * 32 Gb ram
> >>   * 5TB disk - RAID10
> >>
> >> Have a cluster of 4 for these boxes all running riak - riak configuration 
> >> options that are different from stock:
> >>
> >>   * Listening on all IP address "0.0.0.0"
> >>   * {storage_backend, riak_kv_innostore_backend},
> >>   * innostore section - {buffer_pool_size, 17179869184}, %% 16GB
> >>   * innostore section - {flush_method, "O_DIRECT"}
> >>
> >> What I see is that the performance of my import script runs at about 
> >> 200...300 keys per/second for keys that it's seen recently (e.g. re-runs) 
> >> then drops to 20ish keys per/sec for new keys.
> >> STATS: 1000 keys handled in 3 seconds 250.75 keys/sec
> >> STATS: 1000 keys handled in 3 seconds 258.20 keys/sec
> >> STATS: 1000 keys handled in 4 seconds 240.11 keys/sec
> >> STATS: 1000 keys handled in 5 seconds 177.63 keys/sec
> >> STATS: 1000 keys handled in 4 seconds 246.26 keys/sec
> >> STATS: 1000 keys handled in 5 seconds 184.79 keys/sec
> >> STATS: 1000 keys handled in 5 seconds 195.95 keys/sec
> >> STATS: 1000 keys handled in 47 seconds 21.02 keys/sec
> >> STATS: 1000 keys handled in 44 seconds 22.63 keys/sec
> >> STATS: 1000 keys handled in 42 seconds 23.64 keys/sec
> >> STATS: 1000 keys handled in 43 seconds 22.88 keys/sec
> >> STATS: 1000 keys handled in 45 seconds 22.12 keys/sec
> >> STATS: 1000 keys handled in 43 seconds 22.83 keys/sec
> >> STATS: 1000 keys handled in 43 seconds 23.11 keys/sec
> >> Of course with 800M records to import a performance of 20 keys/sec is not 
> >> useful, plus as time goes on having an insert rate at that level is going 
> >> to be problematic.
> >>
> >> Questions -
> >>   Is there additional things to change for imports and datasets on this 
> >> scale?
> >>   Is there a way to get additional debugging to see where the performance 
> >> issues are?
> >>
> >> Thanks,
> >
> 
> 
> 


_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
