Finally (sorry I keep making separate messages) --
The reason I was looking for an FdInputStream solution is that it seems
to be much faster than the mmap solution.
Although my file is quite large (10GB), memory is not much of a concern.
How does one copy from an InputStreamMessageReader into a
MallocMessageBuilder?
On Thursday, July 20, 2017 at 5:30:30 PM UTC-7, Farid Zakaria wrote:
>
> I actually had to store the FlatArrayMessageReader rather than the
> Message::Reader for it to work?
> I don't think I'm grokking why that matters -- I thought
> FlatArrayMessageReader is just a pointer into the mmap'd file.
> Why would it matter if I cast it to the reader?
>
>
> hmm.
>
> On Thursday, July 20, 2017 at 5:25:00 PM UTC-7, Farid Zakaria wrote:
>>
>> All the items in my message array always seem to point to the last
>> item read.
>> I'm not sure what I'm doing wrong here.
>>
>>
>> auto messages = std::make_unique<std::deque<Message::Reader *> >(10);
>>
>> while (words.size() > 0) {
>>   capnp::FlatArrayMessageReader * reader =
>>       new capnp::FlatArrayMessageReader(words);
>>   Message::Reader message = reader->getRoot<Message>();
>>   words = kj::arrayPtr(reader->getEnd(), words.end());
>>   messages->at(index++) = & message;
>> }
>>
>>
>> On Thursday, July 20, 2017 at 4:35:29 PM UTC-7, Kenton Varda wrote:
>>>
>>> On Thu, Jul 20, 2017 at 3:40 PM, Farid Zakaria <[email protected]>
>>> wrote:
>>>
>>>> Is MMAP the only way to randomly seek to an offset in the file?
>>>>
>>>> I can't seem to find a way to do that with kj::FdInputStream ?
>>>>
>>>>
>>>> I'm trying to create an index of the elements in the file.
>>>>
>>>
>>> kj::InputStream doesn't assume the stream is seekable and doesn't track
>>> the current location. You could create a custom wrapper around InputStream
>>> or around BufferedInputStream that remembers how many bytes have been read.
>>> You can also lseek() the underlying fd directly, though of course you'll
>>> have to discard any buffers after that.
>>>
>>> But indeed, if you use mmap() this will all be a lot easier, and faster.
>>> I highly recommend using mmap() here.
>>>
>>> On Thu, Jul 20, 2017 at 4:14 PM, Farid Zakaria <[email protected]>
>>> wrote:
>>>
>>>> One more question =)
>>>>
>>>> I need to copy the root from an FdInputStream into a vector.
>>>> Do I need to copy it into a MallocMessageBuilder?
>>>>
>>>
>>> With InputStreamMessageReader, yes. You have to destroy the
>>> InputStreamMessageReader before you can read the next message, and that
>>> invalidates the root Reader and all other Readers pointing into it.
>>>
>>> However, with the mmap strategy, you don't need to delete the
>>> FlatArrayMessageReader before reading the next message. So, you can
>>> allocate them on the heap and put them into your vector, and then all the
>>> Readers pointing into them remain valid, as long as the
>>> FlatArrayMessageReaders exist and the memory is still mapped. (In this case
>>> you should remove the madvise() line since you plan to go back and randomly
>>> access the data later.)
>>>
>>> Again, I *highly* recommend this strategy instead of using a stream.
>>> With the mmap strategy, not only do you avoid copying into a builder, but
>>> you avoid copying the underlying data when you read it. The operating
>>> system causes the memory addresses to point directly at its in-memory cache
>>> of the file data. If multiple programs mmap() the same file, they share the
>>> memory, rather than creating their own copies. Moreover, the operating
>>> system is free to evict the data from memory and then load it again later
>>> on-demand. There are tons of advantages to this approach and it is exactly
>>> what Cap'n Proto is designed to enable.
>>>
>>> -Kenton
>>>
>>
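
For reference, here is a minimal sketch of the "index of the elements in the
file" idea from the thread, written against a plain byte buffer so it runs
without Cap'n Proto installed. It assumes the file holds unpacked messages
written back-to-back in the standard stream framing (what writeMessageToFd()
produces): a little-endian segment table, padded to an 8-byte word, followed
by the segments. The helper names readU32, framedMessageSize and
buildMessageIndex are made up for illustration and are not part of the capnp
library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Read a little-endian uint32 from raw bytes (the framing is little-endian).
static std::uint32_t readU32(const unsigned char* p) {
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}

// Hypothetical helper: total size in bytes (segment table + segments) of the
// framed message starting at `p`, given `avail` bytes remaining in the buffer.
static std::size_t framedMessageSize(const unsigned char* p, std::size_t avail) {
  if (avail < 8) throw std::runtime_error("truncated segment table");

  // First 4 bytes: number of segments minus one.
  std::uint64_t segCount = static_cast<std::uint64_t>(readU32(p)) + 1;

  // 4 bytes for the count, 4 bytes per segment size, padded to a full word.
  std::uint64_t tableBytes = (4 + 4 * segCount + 7) / 8 * 8;
  if (tableBytes > avail) throw std::runtime_error("truncated segment table");

  // Segment sizes are given in 8-byte words.
  std::uint64_t words = 0;
  for (std::uint64_t i = 0; i < segCount; ++i) {
    words += readU32(p + 4 + 4 * i);
  }

  std::uint64_t total = tableBytes + words * 8;
  if (total > avail) throw std::runtime_error("truncated message body");
  return static_cast<std::size_t>(total);
}

// Hypothetical helper: byte offset of every message in the buffer, in order.
static std::vector<std::size_t> buildMessageIndex(const unsigned char* data,
                                                  std::size_t size) {
  assert(data != nullptr || size == 0);
  std::vector<std::size_t> offsets;
  std::size_t pos = 0;
  while (pos < size) {
    offsets.push_back(pos);
    pos += framedMessageSize(data + pos, size - pos);
  }
  return offsets;
}
```

Every offset produced this way is a multiple of 8, so with a page-aligned
mmap() the position can be viewed as capnp words and handed to a
FlatArrayMessageReader, which is the approach Kenton describes above. Packed
files would need to be unpacked first; this sketch does not handle them.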
--
You received this message because you are subscribed to the Google Groups
"Cap'n Proto" group.
Visit this group at https://groups.google.com/group/capnproto.