On Thu, Jul 20, 2017 at 3:40 PM, Farid Zakaria <[email protected]>
 wrote:

> Is MMAP the only way to randomly seek to an offset in the file?
>
> I can't seem to find a way to do that with kj::FdInputStream ?
>
>
> I'm trying to create an index of the elements in the file.
>

kj::InputStream doesn't assume the stream is seekable and doesn't track the
current location. You could create a custom wrapper around InputStream or
around BufferedInputStream that remembers how many bytes have been read.
You can also lseek() the underlying fd directly, though of course you'll
have to discard any buffers after that.

But indeed, if you use mmap() this will all be a lot easier, and faster. I
highly recommend using mmap() here.
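
A rough sketch of how such an index might be built over a mapped file, assuming the file holds back-to-back messages in Cap'n Proto's standard unpacked stream framing (a 4-byte segment count minus one, then one 4-byte size in words per segment, padded to an 8-byte boundary, then the segments). It parses the segment table by hand only so that it compiles without the library; in real code you would more likely construct a capnp::FlatArrayMessageReader at each position and use its getEnd() to find where the next message begins. The function names are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Reads a little-endian 32-bit value from a possibly unaligned address.
uint32_t readLe32(const unsigned char* p) {
  return uint32_t(p[0]) | (uint32_t(p[1]) << 8) |
         (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
}

// Size in bytes (segment table plus all segments) of the framed message
// starting at `p`, or 0 if it does not fit within `avail` bytes.
size_t framedMessageSize(const unsigned char* p, size_t avail) {
  if (avail < 8) return 0;
  uint64_t segmentCount = uint64_t(readLe32(p)) + 1;
  // 4-byte count plus 4 bytes per segment, rounded up to a whole word.
  uint64_t tableBytes = (4 + 4 * segmentCount + 7) & ~uint64_t(7);
  if (tableBytes > avail) return 0;
  uint64_t words = 0;
  for (uint64_t i = 0; i < segmentCount; ++i) {
    words += readLe32(p + 4 + 4 * i);
  }
  if (words > (avail - tableBytes) / 8) return 0;
  return size_t(tableBytes + 8 * words);
}

// Returns the starting byte offset of every complete message in `data`.
std::vector<size_t> indexMessages(const unsigned char* data, size_t size) {
  std::vector<size_t> offsets;
  size_t pos = 0;
  while (pos < size) {
    size_t n = framedMessageSize(data + pos, size - pos);
    if (n == 0) break;  // truncated or malformed tail: stop indexing
    offsets.push_back(pos);
    pos += n;
  }
  return offsets;
}

// Maps `path` read-only and indexes it. Returns an empty index on failure.
std::vector<size_t> indexFile(const char* path) {
  std::vector<size_t> offsets;
  int fd = open(path, O_RDONLY);
  if (fd < 0) return offsets;
  struct stat st;
  if (fstat(fd, &st) == 0 && st.st_size > 0) {
    size_t size = static_cast<size_t>(st.st_size);
    void* map = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map != MAP_FAILED) {
      offsets = indexMessages(static_cast<const unsigned char*>(map), size);
      munmap(map, size);
    }
  }
  close(fd);
  return offsets;
}
```

This version unmaps the file once the index is built. To go back and read messages by offset later, keep the mapping alive instead of calling munmap() right away.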

On Thu, Jul 20, 2017 at 4:14 PM, Farid Zakaria <[email protected]>
wrote:

> One more question =)
>
> I need to copy the root from a FdStream to a vector
> Do I need to copy it into a MallocMessageBuilder ?
>

With InputStreamMessageReader, yes. You have to destroy the
InputStreamMessageReader before you can read the next message, and that
invalidates the root Reader and all other Readers pointing into it.

However, with the mmap strategy, you don't need to delete the
FlatArrayMessageReader before reading the next message. So, you can
allocate them on the heap and put them into your vector, and then all the
Readers pointing into them remain valid, as long as the
FlatArrayMessageReaders exist and the memory is still mapped. (In this case
you should remove the madvise() line since you plan to go back and randomly
access the data later.)

Again, I *highly* recommend this strategy instead of using a stream. With
the mmap strategy, not only do you avoid copying into a builder, but you
avoid copying the underlying data when you read it. The operating system
maps the memory addresses directly onto its in-memory cache of the file
data. If multiple programs mmap() the same file, they share the
memory, rather than creating their own copies. Moreover, the operating
system is free to evict the data from memory and then load it again later
on-demand. There are tons of advantages to this approach and it is exactly
what Cap'n Proto is designed to enable.

-Kenton
