OpenLDAP MDB backend for riak
Hi, I'm interested in developing a backend using OpenLDAP's memory-mapped database library (MDB). Unfortunately Erlang is not a language I've used before. Any tips on where to get started?

You can find information on MDB here: http://highlandsun.com/hyc/mdb/

Its read performance is several times faster than anything else out there, and it scales perfectly linearly since readers require no locks. Its write performance is also quite fast...

--
Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
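[Editor's note: for readers unfamiliar with the C API under discussion, here is a minimal sketch of the lockless read path using the standard MDB/LMDB API. The database path and key are illustrative assumptions, not from the thread.]

    #include <stdio.h>
    #include <string.h>
    #include "lmdb.h"   /* older source trees name this header mdb.h */

    int main(void) {
        MDB_env *env;
        MDB_txn *txn;
        MDB_dbi dbi;
        MDB_val key, data;

        mdb_env_create(&env);
        mdb_env_open(env, "./testdb", 0, 0664);   /* "./testdb" is a placeholder path */

        /* Read-only transaction: readers take no locks and never block. */
        mdb_txn_begin(env, NULL, MDB_RDONLY, &txn);
        mdb_dbi_open(txn, NULL, 0, &dbi);

        key.mv_size = strlen("somekey");
        key.mv_data = "somekey";
        if (mdb_get(txn, dbi, &key, &data) == 0)
            printf("found %zu bytes\n", data.mv_size);

        mdb_txn_abort(txn);   /* read-only txns are simply aborted */
        mdb_env_close(env);
        return 0;
    }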
Re: OpenLDAP MDB backend for riak
yaoxinming wrote:
> MDB looks very good. You can use a NIF; maybe you can look at the eleveldb project, that's a good example.

Thanks for the tip, I'll look into eleveldb.

--
Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
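[Editor's note: as a hedged illustration of the NIF approach suggested above, this is a minimal skeleton of an Erlang NIF wrapping an MDB lookup. The module name `emdb` and the global environment are assumptions for brevity; a real binding would carry the MDB_env in a NIF resource, as eleveldb does for its handles.]

    #include <string.h>
    #include <erl_nif.h>
    #include "lmdb.h"

    /* Hypothetical global env; a real NIF would use enif_alloc_resource. */
    static MDB_env *env;

    /* get(<<Key/binary>>) -> <<Value/binary>> | not_found */
    static ERL_NIF_TERM get_nif(ErlNifEnv *nenv, int argc, const ERL_NIF_TERM argv[])
    {
        ErlNifBinary kbin;
        ERL_NIF_TERM out;
        MDB_txn *txn;
        MDB_dbi dbi;
        MDB_val key, data;

        if (argc != 1 || !enif_inspect_binary(nenv, argv[0], &kbin))
            return enif_make_badarg(nenv);

        mdb_txn_begin(env, NULL, MDB_RDONLY, &txn);
        mdb_dbi_open(txn, NULL, 0, &dbi);
        key.mv_size = kbin.size;
        key.mv_data = kbin.data;

        if (mdb_get(txn, dbi, &key, &data) == 0) {
            /* Copy out of the map into an Erlang binary before the txn ends. */
            unsigned char *buf = enif_make_new_binary(nenv, data.mv_size, &out);
            memcpy(buf, data.mv_data, data.mv_size);
        } else {
            out = enif_make_atom(nenv, "not_found");
        }
        mdb_txn_abort(txn);
        return out;
    }

    static ErlNifFunc nif_funcs[] = {
        {"get", 1, get_nif}
    };

    ERL_NIF_INIT(emdb, nif_funcs, NULL, NULL, NULL, NULL)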
Re: OpenLDAP MDB backend for riak
Kresten Krab Thorup wrote:
> BETS may be a good template for MDB NIFs, since the MDB API looks like BDB's. It doesn't implement the Riak backend API, but an ets subset. https://github.com/krestenkrab/bets
>
> Kresten
> Trifork

Many thanks for all the helpful posts. BETS looks like a good starting point for a generic MDB driver in Erlang.

--
Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Re: OpenLDAP MDB backend for riak
Kresten Krab Thorup wrote:
> Howard, I took a look at MDB, and it does look quite promising! Perhaps you can enlighten us a bit on these aspects of MDB:
>
> - How do you back up an MDB instance? Can you do that while it is actively updating, or do you need to stop it? I'm asking because the operational ease of the log-structured stores is one of the features that many Riak'ers like quite a lot.

Since this was originally developed for OpenLDAP slapd, we currently use slapcat to back up an MDB instance. That won't be useful in general though, so we will be providing mdb_dump/mdb_load utilities shortly.

There is no need to stop the database while performing a backup, but... MDB uses MVCC, so once a read transaction starts, it is guaranteed a self-consistent view of the database until the transaction ends. Keeping a read txn open for a very long time will prevent the dirty page reclaimer from re-using the pages that are referenced by that read txn. As such, any ongoing write activity will be forced to use new pages. The DB size can grow very rapidly in these situations, until the read txn ends.

> - What is the typical response-time distribution for inserts? I've tried to work with BDB some time back, and one of the issues with that is that every once in a while it slows down quite significantly, as B-tree rebalancing makes some requests unreasonably slow.

MDB is also a B+tree; at a high level it will have similar characteristics to BDB. It's always possible for an insert that results in a page split to have a cascade effect, causing page splits all the way back up to the root of the tree. But in general MDB is still more efficient than BDB, so the actual variance will be much smaller.

Also note that if you're bulk-loading records in sorted order, using the MDB_APPEND option basically degenerates into sequential write operations - when one page fills, instead of splitting it in half as usual, we just allocate a new sibling page and continue filling it. Page "splits" of this sort can still ripple upward, but they're so cheap as to be unmeasurable.
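[Editor's note: a minimal sketch of the bulk-load path just described, using the MDB_APPEND flag. The key format and record count are illustrative assumptions.]

    #include <stdio.h>
    #include "lmdb.h"

    /* Bulk-load keys in sorted order with MDB_APPEND: a full leaf page gets a
     * new sibling page instead of being split, so loading stays sequential. */
    int bulk_load(MDB_env *env)
    {
        MDB_txn *txn;
        MDB_dbi dbi;
        MDB_val key, data;
        char kbuf[16];
        int i, rc;

        rc = mdb_txn_begin(env, NULL, 0, &txn);   /* one write txn for the batch */
        if (rc) return rc;
        mdb_dbi_open(txn, NULL, 0, &dbi);

        for (i = 0; i < 100000; i++) {
            /* %08d keeps keys in sorted order, as MDB_APPEND requires */
            key.mv_size = sprintf(kbuf, "%08d", i);
            key.mv_data = kbuf;
            data.mv_size = sizeof(i);
            data.mv_data = &i;
            rc = mdb_put(txn, dbi, &key, &data, MDB_APPEND);
            if (rc) { mdb_txn_abort(txn); return rc; }
        }
        return mdb_txn_commit(txn);
    }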
> - Does an MDB instance exploit multiple cores? If so, what is the structure of this usage? In Riak, we have the benefit that a typical Riak node runs multiple independent databases (one for each data partition/vnode), and so at least that can provide some concurrency to better leverage I/O and CPU concurrency.

Within an MDB environment, multiple readers can run concurrently with a single writer. Readers are never blocked by anything. (Readers don't block writers, but as mentioned above, they can prevent old pages from being reused.)

--
Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Re: OpenLDAP MDB backend for riak
Kresten Krab Thorup wrote:
> - How do you back up an MDB instance? Can you do that while it is actively updating, or do you need to stop it? ...

The current mdb.master branch in git now has an mdb_env_copy() function for making a backup of an MDB instance (live or not), and an mdb_copy program for invoking it.

> - What is the typical response-time distribution for inserts? ...

This post on MemcacheDB with MDB may be illuminating: https://groups.google.com/d/msg/memcached/dxU8iO27ce4/c7YEBegnAlMJ

You can see the response-time distribution for BerkeleyDB 4.7, MDB, and the original Memcached 1.2.0 code on which MemcacheDB was based. An interesting data point is the 4-threaded case, in which MDB average reads are faster even than the pure-memory memcached, and the run overall has a much faster maximum duration and much narrower variance.

--
Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
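[Editor's note: a minimal sketch of the live-backup call mentioned above, assuming an already-opened environment; the backup path is a placeholder.]

    #include "lmdb.h"

    /* Copy a live MDB environment to a backup directory. The copy is taken
     * under a read transaction, so it is a self-consistent snapshot and
     * writers continue unimpeded while it runs. */
    int backup(MDB_env *env)
    {
        return mdb_env_copy(env, "/backup/testdb");
    }

The standalone equivalent is the mdb_copy tool, invoked as `mdb_copy /path/to/db /backup/db`.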
Re: LMDB Backend for Riak
Date: Fri, 3 May 2013 12:05:14 -0600
From: Jon Meredith
To: Brian Hong

> Hi Brian,
>
> Experimental backends for Riak are always exciting. I haven't played with it personally, and Basho has no current plans to support it as a leveldb alternative. It's worth adding two notes of caution.
>
> First, stores that use mmap for persistence can suffer from problems around dirty pages. If you have a very low update volume and a nice hot set, so that the operating system can keep the pages in memory, they work nicely.

LMDB doesn't accumulate dirty pages. In its default mode all writes are fully synchronous and thus fsync'd immediately.

> On some operating systems (specifically Linux), if you have a high update load, and consequently a large volume of dirty pages (more than dirty_ratio), I believe all OS-level threads for the process are suspended until the condition is resolved by writing out the pages when the process is scheduled. This is bad for latencies in endurance tests. For something like LDAP this tradeoff is probably a good one; for Riak it concerns me, and other platforms may be better behaved to make the option interesting, and the linux kernel may have begun [...]
>
> Second, I worry about crash resilience - can the internal memory structures tolerate a kernel panic where the dirty pages are not written, or potentially worse, torn with a partial write?

LMDB uses copy-on-write. It is impossible to corrupt the DB structure in a crash. This has already been proven in heavy-duty testing at a couple of telcos over the course of several months. Judging from the traffic on this list, LevelDB is nowhere near carrier-grade reliability:

http://lists.basho.com/pipermail/riak-users_lists.basho.com/2013-March/011320.html
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2013-February/011246.html
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2013-January/010692.html
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2012-November/010171.html
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2012-November/010287.html
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2012-September/009351.html

> Good luck with your experiments,
> Jon
>
> On Wed, May 1, 2013 at 11:05 AM, Brian Hong wrote:
>> OpenLDAP Lightning Memory-Mapped Database seems to be getting traction for its high performance and its query (iteration) functionality similar to leveldb's: http://symas.com/mdb/
>>
>> There seems to be an experimental backend for Riak: https://github.com/alepharchives/emdb
>>
>> Does anybody know of its usefulness? Are there any benchmarks on it? I've heard that leveldb suffers write performance issues: https://news.ycombinator.com/item?id=5621884
>>
>> Any chance of the Basho guys supporting lmdb as a leveldb alternative for Riak? It would be awesome! :D
>>
>> --
>> Brian(Sung-jin) Hong

--
Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
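[Editor's note: to make the durability trade-off above concrete, a hedged sketch of the relevant environment flags. The default (no flags) is the fully synchronous mode Howard describes; MDB_NOSYNC relaxes it.]

    #include "lmdb.h"

    /* Default, fully synchronous mode: every commit is flushed to disk
     * before mdb_txn_commit() returns. */
    int open_sync(MDB_env **env, const char *path)
    {
        int rc = mdb_env_create(env);
        if (rc) return rc;
        return mdb_env_open(*env, path, 0, 0664);
    }

    /* Relaxed alternative: MDB_NOSYNC skips the flush on commit, trading
     * durability of the most recent transactions for throughput. Because of
     * copy-on-write, the on-disk structure still cannot be corrupted by a
     * crash; only recent commits may be lost. */
    int open_nosync(MDB_env **env, const char *path)
    {
        int rc = mdb_env_create(env);
        if (rc) return rc;
        return mdb_env_open(*env, path, MDB_NOSYNC, 0664);
    }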
Re: LMDB Backend for Riak
Date: Fri, 3 May 2013 14:43:33 EDT
From: Kresten Krab Thorup krab at trifork.com
To: Brian Hong

> I tried lmdb in an iOS project recently, and while it is indeed fast, I also ran into several issues. I needed something seriously faster than SQLite, and after some experimentation ended up embedding leveldb instead.
>
> 1. When storing objects larger than the page size (default is 4k), those pages are not taken from the free list (since they need to be consecutive) but always add to the file size. This means you can easily run out of space.

That restriction was removed a few months ago (December 2012).

> 2. The fact that it can only have one write transaction open means that you have to be very careful not to cause serious lock congestion. So in my experience it only makes sense to use in a read-heavy application with mostly small values.

Yes, LMDB is designed for read-heavy applications.

--
Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
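[Editor's note: a small sketch of the single-writer behavior in point 2. Write transactions are serialized on an internal mutex, so a second writer simply blocks until the first commits or aborts; the two function names are illustrative.]

    #include "lmdb.h"

    /* Thread A: holds the environment's single write transaction. */
    void writer_a(MDB_env *env)
    {
        MDB_txn *txn;
        mdb_txn_begin(env, NULL, 0, &txn);   /* acquires the writer lock */
        /* ... puts/deletes ... */
        mdb_txn_commit(txn);                 /* releases the writer lock */
    }

    /* Thread B: this mdb_txn_begin() BLOCKS until writer_a's transaction
     * finishes. Long write transactions therefore translate directly into
     * write latency for everyone else - readers, however, are unaffected. */
    void writer_b(MDB_env *env)
    {
        MDB_txn *txn;
        mdb_txn_begin(env, NULL, 0, &txn);
        /* ... */
        mdb_txn_commit(txn);
    }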
Re: Migration from memcachedb to riak
On 10 July 2013 10:49, Edgar Veiga wrote:
> Hello all!
>
> I have a couple of questions that I would like to put to all of you guys, in order to start this migration as well as possible.
>
> Context:
> - I'm responsible for the migration of a pure key/value store that for now is being stored on memcacheDB.
> - We're serializing php objects and storing them.
> - The total size occupied is ~2TB.
> - The idea is to migrate this data to a riak cluster with the elevelDB backend (starting with 6 nodes, 256 partitions. This thing is scaling very fast).
> - We only need to access the information by key. *We won't need map/reduce, search, or secondary indexes*. It's a pure key/value store!
>
> My questions are:
> - Do you have any riak fine-tuning tips for this use case (given that we will only use the key/value capabilities of riak)?

If you only need a pure key/value store, you should consider memcacheDB using LMDB as its backing store. It's far faster than memcacheDB using BerkeleyDB: http://symas.com/mdb/memcache/

I doubt LevelDB accessed through any interpreted language will be anywhere near its performance either, though I haven't tested. (Is there a LevelDB backend for modular memcache yet?)

Also, if you're serializing language objects, you should consider using LMDB as an embedded data store. With the FIXEDMAP option you can copy objects to the store and then execute the objects directly from the store on future retrievals, no deserialization required.

> - It's expected that those 2TB will be reduced due to the levelDB compression. Do you think we should also compress our objects on the client?

--
Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
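[Editor's note: a hedged sketch of the MDB_FIXEDMAP idea above. The environment is mapped at a fixed address, so raw pointers inside stored structures remain valid across process runs. The struct, function, and path are illustrative assumptions, and MDB_FIXEDMAP is documented as experimental.]

    #include <string.h>
    #include "lmdb.h"

    /* Illustrative object containing an internal pointer. Such pointers stay
     * valid across runs only because MDB_FIXEDMAP maps the database file at
     * the same virtual address every time. */
    struct node {
        struct node *next;   /* raw pointer into the map */
        int value;
    };

    long sum_list(MDB_env *env, MDB_dbi dbi, const char *id)
    {
        MDB_txn *txn;
        MDB_val key, data;
        long total = 0;

        key.mv_size = strlen(id);
        key.mv_data = (void *)id;

        mdb_txn_begin(env, NULL, MDB_RDONLY, &txn);
        if (mdb_get(txn, dbi, &key, &data) == 0) {
            /* Use the stored object in place: no copy, no deserialization. */
            struct node *n = (struct node *)data.mv_data;
            for (; n; n = n->next)
                total += n->value;
        }
        mdb_txn_abort(txn);   /* pointers into the map are valid only inside the txn */
        return total;
    }

    /* The environment would be opened with the flag, e.g.:
     *   mdb_env_open(env, "/var/data/objstore", MDB_FIXEDMAP, 0664);
     */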