Hi Sebastien,

I don't have any answers for you. Only more questions and only about lmdb. 
So, this might be venturing into areas inappropriate for this forum. 
Apologies to any sticklers.

Uncannily, I too am investigating the possibility of using lmdb and Cap'n 
Proto to store data for a new project. I was thinking of doing something 
much simpler but your post has got me thinking. You certainly have an 
interesting idea here.

Can you elaborate more on how you were planning to both read and update 
your data while limiting the need to make memory copies? My understanding 
of the lmdb api leads me to the conclusion that you will need to make at 
least one copy, and probably two in most cases. I am new to both projects 
so perhaps I am missing something.

It appears to me from the lmdb documentation thats its possible to read a 
key-value pair with mdb_get() or mdb_cursor_get() without a copy. However, 
the data returned is only valid until the end of the transaction so I 
expect that most use cases would require a copy so the transaction doesn't 
need to stay open. More importantly, it also appears that the only way one 
can change a value is to call mdb_put() or mdb_cursor_put(), both of which 
involve a copy.

It might be possible to modify the data returned by mdb_get() or 
mdb_cursor_get() even though the documentation is clear that one should not 
do this. Maybe its actually okay in a read-write transaction. My guess is 
that its not. I would expect that modifying the data, even in a read-write 
transaction, could be visible by a concurrent read-only transaction.

Another possibility is to call mdb_put() with the MDB_RESERVE flag. The 
documentation says that "the caller is expected to modify all of the space 
requested". However, its possible that this causes the pages covered by the 
value to be copied thus the data returned may actually be valid if you 
didn't change the value size. Sounds plausible but feels risky.

Do you have some other trick up your sleeve? Is there some part of the api 
that I have missed?

Frank 

On Saturday, September 23, 2017 at 12:59:34 PM UTC-5, Sebastien Diot wrote:
>
> Hi.
>
> Short version:
>
> Can I "drop the last segment", if it only contains orphans?
>
> Long version:
>
> I've got kind of a "mental dead-lock" trying to decide how to store data 
> for my new project. Currently, I'm considering if I can use Lightning 
> Memory-Mapped Database (LMDB), together with Cap'n Proto.
>
> I've seen "mmap-friendly mutable storage format" in the RoadMap, but I'm 
> not sure what it precisely means.
>
> My concept would look about like this:
>
> 1) To be able to store multiple "tables" and "indices" and update them in 
> a transactional way in LMDB, they must all be in the same LMDB file. So, I 
> would first add a fixed-size prefix (i.e. 2 bytes) to my keys, as a 
> table/index-ID.
> 2) Then comes the row/entity ID, say 8 bytes.
> 3) Then comes the *segment ID* (4 bytes, AFAIK, based on the Cap'n Proto 
> documentation).
>
> That way, I'm hoping I could spread a multi-segment Cap'n Proto message 
> over multiple keys in LMDB, and be able to both read and update my data 
> while limiting the need to make memory copies. I haven't actually tried 
> this, so it's all theoretical, but it "sounds possible" (correct me if I'm 
> wrong).
>
> Currently, the main problem I see, is that you cannot append to lists. You 
> have to replace the list with a bigger one, thereby creating an orphan, 
> thereby "wasting memory". Or possibly, if that fits your use-case, create a 
> "linked-list" of the lists, one after the other. But that way you loose 
> O(1) access to lists elements, as you first have to iterate over the 
> "linked list", to get at the desired list.
>
> One way for me to map the data of my "largest" entity, and the changes to 
> it, would be to have a variable-size map (key-value pair), containing 
> "changes".
>
> I would start by allocating a segment that would be exactly the right size 
> for all the data, without the changes map.
>
> Then I would add a "change map" with some small start size, for example 64 
> entries. This would cause the creation of a new segment (and a second 
> key-value pair in the DB). When the changes map fills, I would make a 
> temporary copy, make the map an orphan, drop the last segment (which only 
> contains an orphan), create a new map twice the size, causing the 
> allocation of a new segment, and copy the temp data back into it.
>
> While the "changes map" would need an occasional copy, the main part of 
> the message, the first segment, would not, and by "replacing" the last 
> segment, I would not be "wasting memory", while also not needing to make a 
> full copy of the message to eliminate space wasted by orphans.
>
> So,
>
> A) Can I "drop the last segment", if it only contains orphans?
> B) Is there any obvious reason I could not (effectively) use Cap'n Proto 
> with a memory-mapped DB, assuming A) is possible, and my "entities" have 
> maximum one "variable-sized" part?
>
> Regards,
> Sebastien Diot
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Cap'n Proto" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
Visit this group at https://groups.google.com/group/capnproto.

Reply via email to