He's already running RAID 10.

Jonathan Langevin
Systems Administrator
Loom Inc.
Wilmington, NC: (910) 241-0433 - jlange...@loomlearning.com - www.loomlearning.com - Skype: intel352
On Tue, Aug 30, 2011 at 12:51 PM, Jeremiah Peschka <jeremiah.pesc...@gmail.com> wrote:

> InnoStore is going to insert the data in key order. When you attempt to
> insert a record that would fit in between two keys (inserting "apple"
> between "aardvark" and "Byzantium") you're probably going to get a page
> split, just like in an RDBMS. The data needs to be re-shuffled in order
> to write it in key order. Despite using MVCC, InnoDB is an index-ordered
> table: data is written in key order. LevelDB shouldn't have the ordered
> write issues that InnoStore has.
>
> The procedure to bulk load your data is going to be the same as bulk
> loading data for any large data warehouse - ordered inserts will help you.
>
> If there's a way to load your data in smaller ordered chunks, that's
> going to help you out too.
>
> Some other options would be to increase the buffer_pool_size for
> InnoStore. InnoDB will use RAM as a disk cache to avoid disk hits. This
> would be a "Good Thing". Since you're writing in roughly 8k chunks, you
> could also increase the page size. This would be something to experiment
> with, but increasing the database page size could improve write
> performance. Odds are, if your average record size is 8KB, then you're
> writing multiple pages per record (key + value), which causes even more
> I/O (two or more page reads per object). Some filesystem tuning could be
> in order. SQL Server, for instance, performs sequential reads in 64k
> chunks. The best practice for disk performance there is to format NTFS
> with 64k blocks. The throughput difference between the defaults and 64k
> reads is amazing.
>
> Also, you may want to look at your storage configuration on the back
> end. If you have that much data, are you using a SAN or DAS? Both of
> these can help you get additional write performance, especially if you
> adjust the storage device to cache writes in a battery-backed cache.
> What kind of drive configuration do you have? You can get tremendous
> write performance improvements by moving to RAID10.
>
> ---
> Jeremiah Peschka - Founder, Brent Ozar PLF, LLC
> Microsoft SQL Server MVP
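To make "ordered inserts" concrete, a minimal loader sketch using the riak-erlang-client (riakc) follows; the module name, host, port, bucket, and the assumption that key/value pairs arrive as a list of binaries are all illustrative, not anything from the thread:

    -module(bulk_load).
    -export([load/2]).

    %% Write {Key, Value} pairs in ascending key order so that
    %% innostore appends to its index instead of page-splitting.
    %% Assumes riakc (the Riak Erlang client) is on the code path;
    %% host, port, and bucket are illustrative.
    load(Bucket, Pairs) when is_binary(Bucket) ->
        {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
        Sorted = lists:keysort(1, Pairs),  %% sort by key before writing
        lists:foreach(
          fun({Key, Value}) ->
                  Obj = riakc_obj:new(Bucket, Key, Value),
                  ok = riakc_pb_socket:put(Pid, Obj)
          end,
          Sorted),
        riakc_pb_socket:stop(Pid).

A full sort of 800M 8KB records is the 6TB sort mentioned below; sorting and loading smaller chunks (for example, one key range at a time) would capture most of the benefit without a global sort.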
>
> On Aug 30, 2011, at 9:34 AM, David Koblas wrote:
>
> > Yes - but the thought of sorting 800M records, which are all about 8k
> > in size, is a little daunting... something like a 6TB sort... Plus it
> > doesn't answer the ongoing insert problem, which is that 20 keys/sec
> > isn't functional.
> >
> > --david
> >
> > On 8/30/11 9:27 AM, Kresten Krab Thorup wrote:
> >> If you can insert the objects in ascending key order, then innostore
> >> will be much faster than a random insert.
> >>
> >> Mobile: +45 2343 4626 | Skype: krestenkrabthorup | Twitter: @drkrab
> >> Trifork A/S | Margrethepladsen 4 | DK-8000 Aarhus C | Phone: +45 8732 8787 | www.trifork.com
> >>
> >> Trifork organizes the world-class conference on software development:
> >> GOTO Aarhus <http://www.gotocon.com/> - check it out!
> >>
> >> On Aug 30, 2011, at 6:14 PM, David Koblas wrote:
> >>
> >> I'm currently working on importing a very large dataset (800M records)
> >> into Riak and running into some serious performance problems.
> >> Hopefully these are just configuration issues and nothing deeper...
> >>
> >> Hardware -
> >> * 8-processor box
> >> * 32 GB RAM
> >> * 5 TB disk - RAID10
> >>
> >> I have a cluster of 4 of these boxes, all running Riak. Configuration
> >> options that differ from stock (see the app.config sketch after this
> >> message):
> >>
> >> * Listening on all IP addresses ("0.0.0.0")
> >> * {storage_backend, riak_kv_innostore_backend},
> >> * innostore section - {buffer_pool_size, 17179869184}, %% 16GB
> >> * innostore section - {flush_method, "O_DIRECT"}
> >>
> >> What I see is that my import script runs at about 200-300 keys/sec
> >> for keys it has seen recently (e.g. re-runs), then drops to around
> >> 20 keys/sec for new keys.
> >>
> >> STATS: 1000 keys handled in 3 seconds 250.75 keys/sec
> >> STATS: 1000 keys handled in 3 seconds 258.20 keys/sec
> >> STATS: 1000 keys handled in 4 seconds 240.11 keys/sec
> >> STATS: 1000 keys handled in 5 seconds 177.63 keys/sec
> >> STATS: 1000 keys handled in 4 seconds 246.26 keys/sec
> >> STATS: 1000 keys handled in 5 seconds 184.79 keys/sec
> >> STATS: 1000 keys handled in 5 seconds 195.95 keys/sec
> >> STATS: 1000 keys handled in 47 seconds 21.02 keys/sec
> >> STATS: 1000 keys handled in 44 seconds 22.63 keys/sec
> >> STATS: 1000 keys handled in 42 seconds 23.64 keys/sec
> >> STATS: 1000 keys handled in 43 seconds 22.88 keys/sec
> >> STATS: 1000 keys handled in 45 seconds 22.12 keys/sec
> >> STATS: 1000 keys handled in 43 seconds 22.83 keys/sec
> >> STATS: 1000 keys handled in 43 seconds 23.11 keys/sec
> >>
> >> Of course, with 800M records to import, 20 keys/sec is not useful;
> >> and as time goes on, an insert rate at that level is going to be
> >> problematic.
> >>
> >> Questions -
> >> Are there additional things to change for imports and datasets on
> >> this scale?
> >> Is there a way to get additional debugging to see where the
> >> performance issues are?
> >>
> >> Thanks,
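For reference, the non-stock options listed in David's message would sit in Riak's app.config roughly like this. This is a sketch only, based on the app.config layout of that era; the HTTP and protocol-buffers listener entries and their port numbers are Riak defaults shown purely as illustration:

    %% app.config (excerpt) - sections as described in the message above;
    %% port numbers are Riak defaults and only illustrative.
    [
     {riak_core, [
         {http, [{"0.0.0.0", 8098}]}       %% listen on all interfaces
     ]},
     {riak_kv, [
         {storage_backend, riak_kv_innostore_backend},
         {pb_ip, "0.0.0.0"}                %% protocol buffers listener
     ]},
     {innostore, [
         {buffer_pool_size, 17179869184},  %% 16 GB RAM cache for InnoDB
         {flush_method, "O_DIRECT"}        %% bypass the OS page cache
     ]}
    ].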
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com