OpenLDAP MDB backend for riak

2012-09-27 Thread Howard Chu
Hi, I'm interested in developing a backend using OpenLDAP's memory-mapped 
database library (MDB). Unfortunately Erlang is not a language I've used 
before. Any tips on where to get started?


You can find information on MDB here: http://highlandsun.com/hyc/mdb/

Its read performance is several times faster than anything else out there, and 
scales perfectly linearly since readers require no locks. Its write 
performance is also quite fast...
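
To give a feel for the C API a driver would be wrapping, here is a rough,
untested sketch of basic usage (the lmdb.h header name follows the current
source tree, and ./testdb is only an example path):

#include <stdio.h>
#include <lmdb.h>

#define CHECK(rc) do { if ((rc) != MDB_SUCCESS) { \
        fprintf(stderr, "mdb: %s\n", mdb_strerror(rc)); return 1; } } while (0)

int main(void)
{
    MDB_env *env;
    MDB_dbi dbi;
    MDB_txn *txn;
    MDB_val key, data;

    CHECK(mdb_env_create(&env));
    CHECK(mdb_env_open(env, "./testdb", 0, 0664));  /* directory must already exist */

    /* One write transaction: store a single key/value pair. */
    CHECK(mdb_txn_begin(env, NULL, 0, &txn));
    CHECK(mdb_dbi_open(txn, NULL, 0, &dbi));
    key.mv_size  = 5; key.mv_data  = (void *)"hello";
    data.mv_size = 5; data.mv_data = (void *)"world";
    CHECK(mdb_put(txn, dbi, &key, &data, 0));
    CHECK(mdb_txn_commit(txn));

    /* One read transaction: the returned mv_data points straight into the
     * memory map, no copy is made. */
    CHECK(mdb_txn_begin(env, NULL, MDB_RDONLY, &txn));
    CHECK(mdb_get(txn, dbi, &key, &data));
    printf("%.*s\n", (int)data.mv_size, (char *)data.mv_data);
    mdb_txn_abort(txn);

    mdb_env_close(env);
    return 0;
}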

--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: OpenLDAP MDB backend for riak

2012-09-28 Thread Howard Chu

yaoxinming wrote:

MDB looks very good. You can use a NIF; maybe you can look at the eleveldb
project, that's a good example.


Thanks for the tip, I'll look into eleveldb.



--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: OpenLDAP MDB backend for riak

2012-09-30 Thread Howard Chu

Kresten Krab Thorup wrote:

BETS may be a good template for MDB NIFs, since the MDB API looks like BDB. It 
doesn't implement the Riak backend API, but an ets subset.

https://github.com/krestenkrab/bets


Many thanks for all the helpful posts. BETS looks like a good starting point 
for a generic MDB driver in Erlang.
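
For anyone curious what the glue looks like, here is a bare-bones, untested
sketch of the C side of such a NIF. The module name mdb_nif and the open/1
function are placeholders of mine, not the BETS or emdb API; a real driver
would keep the MDB_env in a NIF resource and add put/get/fold on top:

#include "erl_nif.h"
#include <lmdb.h>

static ERL_NIF_TERM open_nif(ErlNifEnv *env, int argc, const ERL_NIF_TERM argv[])
{
    char path[1024];
    MDB_env *menv;
    int rc;

    if (argc != 1 || enif_get_string(env, argv[0], path, sizeof(path),
                                     ERL_NIF_LATIN1) <= 0)
        return enif_make_badarg(env);

    rc = mdb_env_create(&menv);
    if (rc == MDB_SUCCESS) {
        rc = mdb_env_open(menv, path, 0, 0664);
        if (rc != MDB_SUCCESS)
            mdb_env_close(menv);      /* don't leak the env on open failure */
    }
    if (rc != MDB_SUCCESS)
        return enif_make_tuple2(env, enif_make_atom(env, "error"),
                                enif_make_int(env, rc));

    /* A real driver would wrap menv in enif_alloc_resource() and return that
     * resource term to Erlang; it is closed here only to keep the sketch short. */
    mdb_env_close(menv);
    return enif_make_atom(env, "ok");
}

static ErlNifFunc nif_funcs[] = {
    {"open", 1, open_nif}
};

ERL_NIF_INIT(mdb_nif, nif_funcs, NULL, NULL, NULL, NULL)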



--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: OpenLDAP MDB backend for riak

2012-09-30 Thread Howard Chu

Kresten Krab Thorup wrote:

Howard,

I took a look at MDB, and it does look quite promising!

Perhaps you can enlighten us a bit on these aspects of MDB:

- How do you back up an MDB instance? Can you do that while it is actively
updating, or do you need to stop it? I'm asking because the operational ease
of the log-structured stores is one of the features that many Riak'ers like
quite a lot.

Since this was originally developed for OpenLDAP slapd, we currently use 
slapcat to back up an MDB instance. That won't be useful in general though, so 
we will be providing an mdb_dump/mdb_load utility shortly. There is no need to 
stop the database while performing a backup, but note two things:

MDB uses MVCC, so once a read transaction starts, it is guaranteed a 
self-consistent view of the database until the transaction ends.

Keeping a read txn open for a very long time will prevent the dirty page 
reclaimer from re-using the pages that are referenced by that read txn. As 
such, any ongoing write activity will be forced to use new pages. The DB size 
can grow very rapidly in these situations, until the read txn ends.
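
In practice that just means ending read transactions promptly. A sketch of
the pattern (untested; the lookup_copy helper name is mine, error handling
trimmed):

#include <errno.h>
#include <string.h>
#include <lmdb.h>

/* Copy a value out under a short-lived read txn, then end the txn right away
 * so the pages it pinned can be reclaimed by later writes. */
int lookup_copy(MDB_env *env, MDB_dbi dbi, MDB_val *key,
                void *buf, size_t buflen, size_t *outlen)
{
    MDB_txn *txn;
    MDB_val data;
    int rc = mdb_txn_begin(env, NULL, MDB_RDONLY, &txn);
    if (rc != MDB_SUCCESS)
        return rc;

    rc = mdb_get(txn, dbi, key, &data);
    if (rc == MDB_SUCCESS) {
        *outlen = data.mv_size;
        if (data.mv_size <= buflen)
            memcpy(buf, data.mv_data, data.mv_size);  /* copy before the txn ends */
        else
            rc = EINVAL;                              /* caller's buffer too small */
    }

    /* mdb_txn_reset()/mdb_txn_renew() could be used instead to keep the
     * reader slot around without holding the snapshot open. */
    mdb_txn_abort(txn);
    return rc;
}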



- What is the typical response-time distribution for inserts? I've tried to
work with BDB some time back, and one of the issues with that is that every
once in a while it slows down quite significantly as B-tree rebalancing makes
some requests unreasonably slow.

MDB is also a B+tree; at a high level it will have similar characteristics to 
BDB. It's always possible for an insert that results in a page split to have a 
cascade effect, causing page splits all the way back up to the root of the 
tree. But in general MDB is still more efficient than BDB so the actual 
variance will be much smaller.


Also note that if you're bulk loading records in sorted order, using the 
MDB_APPEND option basically degenerates into sequential write operations - 
when one page fills, instead of splitting it in half as usual, we just 
allocate a new sibling page and continue filling it. Page "splits" of this 
sort can still ripple upward, but they're so cheap as to be unmeasurable.
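
A rough sketch of such a bulk load (the integer key encoding is just an
example; any keys that already arrive in sorted order work):

#include <arpa/inet.h>   /* htonl() */
#include <lmdb.h>

/* Load nrecs records whose keys arrive in ascending order.  With MDB_APPEND,
 * full pages are not split; a new sibling page is simply started. */
int bulk_load(MDB_env *env, MDB_dbi dbi, unsigned int nrecs)
{
    MDB_txn *txn;
    MDB_val key, data;
    unsigned int i, be;
    int rc;

    if ((rc = mdb_txn_begin(env, NULL, 0, &txn)) != MDB_SUCCESS)
        return rc;

    for (i = 0; i < nrecs; i++) {
        be = htonl(i);                     /* big-endian so byte order == key order */
        key.mv_size  = sizeof(be); key.mv_data  = &be;
        data.mv_size = sizeof(i);  data.mv_data = &i;
        if ((rc = mdb_put(txn, dbi, &key, &data, MDB_APPEND)) != MDB_SUCCESS) {
            mdb_txn_abort(txn);            /* MDB_KEYEXIST here means an out-of-order key */
            return rc;
        }
    }
    return mdb_txn_commit(txn);
}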



- Does an MDB instance exploit multiple cores? If so, what is the structure
of this usage? In Riak, we have the benefit that a typical Riak node runs
multiple independent databases (one for each data partition/vnode), and so at
least that can provide some concurrency to better leverage I/O and CPU
concurrency.

Within an MDB environment, multiple readers can run concurrently with a single 
writer. Readers are never blocked by anything. (Readers don't block writers, 
but as mentioned above, can prevent old pages from being reused.)
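
A small, untested sketch of that behavior. It assumes env and dbi are already
open and that key "k" currently holds a 3-byte value "old"; error checks are
omitted for brevity:

#include <stdio.h>
#include <lmdb.h>

void snapshot_demo(MDB_env *env, MDB_dbi dbi)
{
    MDB_txn *rtxn, *wtxn;
    MDB_val key = { 1, (void *)"k" }, data;

    mdb_txn_begin(env, NULL, MDB_RDONLY, &rtxn);   /* snapshot taken here */

    mdb_txn_begin(env, NULL, 0, &wtxn);            /* writer is not blocked by the reader */
    data.mv_size = 3; data.mv_data = (void *)"new";
    mdb_put(wtxn, dbi, &key, &data, 0);
    mdb_txn_commit(wtxn);

    mdb_get(rtxn, dbi, &key, &data);               /* still sees the old value */
    printf("old reader sees: %.*s\n", (int)data.mv_size, (char *)data.mv_data);
    mdb_txn_abort(rtxn);

    mdb_txn_begin(env, NULL, MDB_RDONLY, &rtxn);   /* fresh snapshot */
    mdb_get(rtxn, dbi, &key, &data);               /* now sees "new" */
    printf("new reader sees: %.*s\n", (int)data.mv_size, (char *)data.mv_data);
    mdb_txn_abort(rtxn);
}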



--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: OpenLDAP MDB backend for riak

2012-10-15 Thread Howard Chu

Kresten Krab Thorup wrote:

Howard,

I took a look at MDB, and it does look quite promising!

Perhaps you can enlighten us a bit on these aspects of MDB:

- How do you back up an MDB instance? Can you do that while it is actively
updating, or do you need to stop it? I'm asking because the operational ease
of the log-structured stores is one of the features that many Riak'ers like
quite a lot.

The current mdb.master branch in git now has an mdb_env_copy() function for 
making a backup of an MDB instance (live or not). Also an mdb_copy program for 
invoking it.
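
Calling it from code is essentially a one-liner; a sketch (backup_dir must be
an existing directory):

#include <stdio.h>
#include <lmdb.h>

/* Take a consistent copy of a live environment; readers and the writer keep
 * running while the copy is made. */
int backup(MDB_env *env, const char *backup_dir)
{
    int rc = mdb_env_copy(env, backup_dir);
    if (rc != MDB_SUCCESS)
        fprintf(stderr, "backup failed: %s\n", mdb_strerror(rc));
    return rc;
}

From the shell, the mdb_copy tool does the same thing, roughly:
mdb_copy /path/to/db /path/to/backup.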



- What is the typical response-time distribution for inserts? I've tried to
work with BDB some time back, and one of the issues with that is that every
once in a while it slows down quite significantly as B-tree rebalancing makes
some requests unreasonably slow.

This post on memcacheDB with MDB may be illuminating:
https://groups.google.com/d/msg/memcached/dxU8iO27ce4/c7YEBegnAlMJ

You can see the response-time distribution for BerkeleyDB 4.7, MDB, and the 
original Memcached 1.2.0 code on which MemcacheDB was based. An interesting 
data point is the 4-threaded case, in which MDB average reads are faster even 
than the pure-memory memcached, and the overall run finishes with a much lower 
maximum duration and much narrower variance.


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: LMDB Backend for Riak

2013-05-03 Thread Howard Chu

Date: Fri, 3 May 2013 12:05:14 -0600
From: Jon Meredith 
To: Brian Hong 



Hi Brian,

Experimental backends for Riak are always exciting.  I haven't played with
it personally, and Basho has no current plans to support it as a leveldb
alternative.

It's worth adding two notes of caution. First, stores that use mmap for
persistence can suffer from problems around dirty pages. If you have a
very low update volume and a nice hot set whose pages the operating system
can keep in memory, they work nicely.


LMDB doesn't accumulate dirty pages. In its default mode all writes are fully 
synchronous and thus fsync'd immediately.



On some operating systems (specifically Linux), if you have a high update
load, and consequently a large volume of dirty pages (more than
dirty_ratio), I believe all OS level threads for the process are suspended
until the condition is resolved by writing out the pages when the process
is scheduled.

This is bad for latencies in endurance tests. For something like LDAP
this tradeoff is probably a good one; for Riak it concerns me. Other
platforms may be better behaved, which would make the option interesting,
and the Linux kernel may have begun...

Second, I worry about crash resilience - can the internal memory structures
tolerate a kernel panic where the dirty pages are not written, or,
potentially worse, torn by a partial write?


LMDB uses copy-on-write. It is impossible to corrupt the DB structure in a 
crash. This has already been proven in heavy-duty testing at a couple of 
telcos over the course of several months. Judging from the traffic on this 
list, LevelDB is nowhere near carrier-grade reliability.


http://lists.basho.com/pipermail/riak-users_lists.basho.com/2013-March/011320.html
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2013-February/011246.html
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2013-January/010692.html
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2012-November/010171.html
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2012-November/010287.html
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2012-September/009351.html



Good luck with your experiments,

Jon



On Wed, May 1, 2013 at 11:05 AM, Brian Hong  wrote:


OpenLDAP Lightning Memory-Mapped Database seems to be getting traction for
its high performance and its query (iteration) functionality similar to
leveldb:
http://symas.com/mdb/

There seems to be an experimental backend for Riak:

https://github.com/alepharchives/emdb

Does anybody know of its usefulness? Are there any benchmarks on it?

I've heard that leveldb suffers write performance issues:
https://news.ycombinator.com/item?id=5621884

Any chance of the Basho guys supporting lmdb as a leveldb alternative for
Riak? It would be awesome! :D

--
Brian(Sung-jin) Hong



--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: LMDB Backend for Riak

2013-05-03 Thread Howard Chu

Date: Fri, 3 May 2013 14:43:33 EDT
From: Kresten Krab Thorup krab at trifork.com
To: Brian Hong 



I tried lmdb in an iOS project recently, and while it is indeed fast I also
ran into several issues. I needed something seriously faster than SQLite
and after some experimentation ended up embedding leveldb instead.


1. When storing objects larger than the page size (default is 4k), those
pages are not taken from the free list (since they need to be consecutive)
but always add to the file size. This means you can easily run out of space.

That restriction was removed a few months ago (December 2012).


2. The fact that it can only have one write transaction open means that you
have to be very careful not to cause serious lock congestion.


So in my experience it only makes sense to use it in a read-heavy application
with mostly small values.


Yes, LMDB is designed for read-heavy applications.

--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Migration from memcachedb to riak

2013-07-10 Thread Howard Chu

On 10 July 2013 10:49, Edgar Veiga <edgarmve...@gmail.com> wrote:

 Hello all!

 I have a couple of questions that I would like to put to all of
 you guys, in order to start this migration as well as possible.

 Context:
 - I'm responsible for the migration of a pure key/value store that
 is currently stored on memcacheDB.
 - We're serializing php objects and storing them.
 - The total size occupied is ~2TB.

 - The idea is to migrate this data to a riak cluster with an
 elevelDB backend (starting with 6 nodes, 256 partitions; this
 thing is scaling very fast).
 - We only need to access the information by key. *We won't need
 map/reduces, searches or secondary indexes*. It's a pure
 key/value store!

 My questions are:
 - Do you have any riak fine-tuning tips for this use case
 (given that we will only use the key/value capabilities of riak)?


If you only need a pure key/value store, you should consider memcacheDB using 
LMDB as its backing store. It's far faster than memcacheDB using BerkeleyDB.

http://symas.com/mdb/memcache/

I doubt LevelDB accessed through any interpreted language will be anywhere 
near its performance either, though I haven't tested. (Is there a LevelDB 
backend for modular memcache yet?)


Also if you're serializing language objects, you should consider using LMDB as 
an embedded data store. With the FIXEDMAP option you can copy objects to the 
store and then execute the objects directly from the store on future 
retrievals, no deserialization required.
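
For illustration only — MDB_FIXEDMAP is documented as experimental, and this
only works if the map really lands at the same address on every run — a
sketch of the idea:

#include <string.h>
#include <lmdb.h>

/* A struct stored verbatim; its raw pointer is only meaningful if the map is
 * at a fixed address (MDB_FIXEDMAP) so a later mdb_get() sees it unchanged. */
struct node {
    int          value;
    struct node *next;
};

int store_node(MDB_env *env, MDB_dbi dbi, const char *k, const struct node *n)
{
    MDB_txn *txn;
    MDB_val key  = { strlen(k), (void *)k };
    MDB_val data = { sizeof(*n), (void *)n };
    int rc;

    if ((rc = mdb_txn_begin(env, NULL, 0, &txn)) != MDB_SUCCESS)
        return rc;
    rc = mdb_put(txn, dbi, &key, &data, 0);
    if (rc != MDB_SUCCESS) {
        mdb_txn_abort(txn);
        return rc;
    }
    return mdb_txn_commit(txn);
}

/* The environment would be opened with the MDB_FIXEDMAP flag, e.g.
 *   mdb_env_open(env, path, MDB_FIXEDMAP, 0664);
 * so that the node returned by a later mdb_get() sits at the same address and
 * its ->next pointer is still usable -- no deserialization step. */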



 - It's expected that those 2TB would be reduced due to the levelDB
 compression. Do you think we should also compress our objects on the
 client?


--
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com