Finally (sorry I keep making separate messages) -- 

I was looking for an FdInputStream solution because it seems to be much 
faster than an mmap solution.
Although my file is quite large (10GB), memory is not much of a concern.
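
For the stream route, the index mentioned in the quoted thread below needs the byte offset of each message. Here is a minimal sketch of one way to track it, assuming plain POSIX read() and lseek() on the file descriptor rather than kj::FdInputStream. OffsetTrackingFdReader and its methods are hypothetical names, not part of kj or Cap'n Proto, and the descriptor is assumed to be positioned at offset 0 when the reader is created.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <stdexcept>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not part of kj): wraps a file descriptor, counts the
// bytes consumed so far, and can jump back to a previously recorded offset.
// Assumes the descriptor is positioned at offset 0 on construction.
class OffsetTrackingFdReader {
public:
  explicit OffsetTrackingFdReader(int fd) : fd_(fd) { assert(fd_ >= 0); }

  // Reads up to `size` bytes, stopping early only at end-of-file.
  // Returns the number of bytes actually read.
  std::size_t read(void* buffer, std::size_t size) {
    auto* out = static_cast<unsigned char*>(buffer);
    std::size_t total = 0;
    while (total < size) {
      ssize_t n = ::read(fd_, out + total, size - total);
      if (n < 0) {
        if (errno == EINTR) continue;  // interrupted by a signal, retry
        throw std::runtime_error("read() failed");
      }
      if (n == 0) break;  // end of file
      total += static_cast<std::size_t>(n);
    }
    offset_ += total;
    return total;
  }

  // Byte offset of the next read, suitable for storing in an index.
  std::uint64_t offset() const { return offset_; }

  // Jumps to an absolute offset. This class keeps no buffer, so there is
  // nothing to discard; a buffered wrapper would have to drop its buffer here.
  void seekTo(std::uint64_t target) {
    if (::lseek(fd_, static_cast<off_t>(target), SEEK_SET) ==
        static_cast<off_t>(-1)) {
      throw std::runtime_error("lseek() failed");
    }
    offset_ = target;
  }

private:
  int fd_;
  std::uint64_t offset_ = 0;
};
```

Recording offset() before each message is read gives one index entry per message, and seekTo() returns to that entry later.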

How does one copy from an InputStreamMessageReader into a 
MallocMessageBuilder?
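
The general shape of such a copy can be sketched with standard-library stand-ins instead of Cap'n Proto types. Each message is deep-copied into heap storage before the source invalidates it, and the owners are kept in a container. ReusedBufferSource, OwnedMessage and copyAll are hypothetical names invented for this illustration. In Cap'n Proto itself, I believe the deep copy is done by passing the reader's root to setRoot() on a MallocMessageBuilder, but that call is neither shown nor tested here.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a stream reader whose data is only valid until
// the next message is read (the role InputStreamMessageReader plays here).
class ReusedBufferSource {
public:
  explicit ReusedBufferSource(std::vector<std::string> messages)
      : messages_(std::move(messages)) {}

  // Returns a pointer into an internal buffer that the next call overwrites,
  // or nullptr when no messages are left.
  const std::string* next() {
    if (position_ == messages_.size()) return nullptr;
    buffer_ = messages_[position_++];
    return &buffer_;
  }

private:
  std::vector<std::string> messages_;
  std::size_t position_ = 0;
  std::string buffer_;
};

// Hypothetical stand-in for a heap-allocated message that owns its own data
// (the role a MallocMessageBuilder would play).
struct OwnedMessage {
  std::string data;
};

// Deep-copies every message out of the source before it is overwritten.
// Each copy lives at a stable heap address for as long as its unique_ptr
// exists, so pointers into it remain valid while the vector is kept.
std::vector<std::unique_ptr<OwnedMessage>> copyAll(ReusedBufferSource& source) {
  std::vector<std::unique_ptr<OwnedMessage>> owned;
  while (const std::string* view = source.next()) {
    owned.push_back(std::make_unique<OwnedMessage>(OwnedMessage{*view}));
  }
  return owned;
}
```

Storing the raw pointer returned by next() instead of a copy would reproduce the symptom from the loop quoted below: every stored entry would refer to the same buffer, which holds only the last message read.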

On Thursday, July 20, 2017 at 5:30:30 PM UTC-7, Farid Zakaria wrote:
>
> I actually had to store the FlatArrayMessageReader rather than the 
> Message::Reader for it to work?
> I think I'm not grokking why that matters -- I thought 
> FlatArrayMessageReader is just a pointer into the MMAP file. 
> Why would it matter if I cast it to the reader?
>
>
> hmm.
>
> On Thursday, July 20, 2017 at 5:25:00 PM UTC-7, Farid Zakaria wrote:
>>
>> All the items in my message array always seem to point to the last 
>> item read.
>> I'm not sure what I'm doing wrong here.
>>
>>
>> auto messages = std::make_unique<std::deque<Message::Reader *> >(10);
>>
>> while (words.size() > 0) {
>>     capnp::FlatArrayMessageReader * reader = new capnp::FlatArrayMessageReader(words);
>>     Message::Reader message = reader->getRoot<Message>();
>>     words = kj::arrayPtr(reader->getEnd(), words.end());
>>     messages->at(index++) = & message;
>> }
>>
>>
>> On Thursday, July 20, 2017 at 4:35:29 PM UTC-7, Kenton Varda wrote:
>>>
>>> On Thu, Jul 20, 2017 at 3:40 PM, Farid Zakaria wrote:
>>>
>>>> Is MMAP the only way to randomly seek to an offset in the file?
>>>>
>>>> I can't seem to find a way to do that with kj::FdInputStream ?
>>>>
>>>>
>>>> I'm trying to create an index of the elements in the file.
>>>>
>>>
>>> kj::InputStream doesn't assume the stream is seekable and doesn't track 
>>> the current location. You could create a custom wrapper around InputStream 
>>> or around BufferedInputStream that remembers how many bytes have been read. 
>>> You can also lseek() the underlying fd directly, though of course you'll 
>>> have to discard any buffers after that.
>>>
>>> But indeed, if you use mmap() this will all be a lot easier, and faster. 
>>> I highly recommend using mmap() here.
>>>
>>> On Thu, Jul 20, 2017 at 4:14 PM, Farid Zakaria wrote:
>>>
>>>> One more question =)
>>>>
>>>> I need to copy the root from an FdInputStream into a vector.
>>>> Do I need to copy it into a MallocMessageBuilder?
>>>>
>>>
>>> With InputStreamMessageReader, yes. You have to destroy the 
>>> InputStreamMessageReader before you can read the next message, and that 
>>> invalidates the root Reader and all other Readers pointing into it.
>>>
>>> However, with the mmap strategy, you don't need to delete the 
>>> FlatArrayMessageReader before reading the next message. So, you can 
>>> allocate them on the heap and put them into your vector, and then all the 
>>> Readers pointing into them remain valid, as long as the 
>>> FlatArrayMessageReaders exist and the memory is still mapped. (In this case 
>>> you should remove the madvise() line since you plan to go back and randomly 
>>> access the data later.)
>>>
>>> Again, I *highly* recommend this strategy instead of using a stream. 
>>> With the mmap strategy, not only do you avoid copying into a builder, but 
>>> you avoid copying the underlying data when you read it. The operating 
>>> system causes the memory addresses to point directly at its in-memory cache 
>>> of the file data. If multiple programs mmap() the same file, they share the 
>>> memory, rather than creating their own copies. Moreover, the operating 
>>> system is free to evict the data from memory and then load it again later 
>>> on-demand. There are tons of advantages to this approach and it is exactly 
>>> what Cap'n Proto is designed to enable.
>>>
>>> -Kenton
>>>
>>
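
The mmap approach recommended in the quoted reply above can be sketched with plain POSIX calls. This is a minimal illustration that assumes a POSIX system. MappedFile is a hypothetical helper name, not a kj or Cap'n Proto class, and handing the mapped bytes to a FlatArrayMessageReader is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <fcntl.h>
#include <stdexcept>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical RAII helper: maps a whole file read-only and unmaps it on
// destruction. Pointers into data() are valid only while this object lives.
class MappedFile {
public:
  explicit MappedFile(const char* path) {
    int fd = ::open(path, O_RDONLY);
    if (fd < 0) throw std::runtime_error("open() failed");

    struct stat info;
    if (::fstat(fd, &info) != 0) {
      ::close(fd);
      throw std::runtime_error("fstat() failed");
    }
    if (info.st_size == 0) {
      ::close(fd);
      throw std::runtime_error("cannot map an empty file");
    }
    size_ = static_cast<std::size_t>(info.st_size);

    // No bytes are copied here; pages are loaded from the OS cache on demand.
    void* mapping = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
    ::close(fd);  // the mapping stays valid after the descriptor is closed
    if (mapping == MAP_FAILED) throw std::runtime_error("mmap() failed");
    data_ = static_cast<const unsigned char*>(mapping);
  }

  ~MappedFile() { ::munmap(const_cast<unsigned char*>(data_), size_); }

  // Copying would lead to a double munmap(), so it is disabled.
  MappedFile(const MappedFile&) = delete;
  MappedFile& operator=(const MappedFile&) = delete;

  const unsigned char* data() const { return data_; }
  std::size_t size() const { return size_; }

private:
  const unsigned char* data_ = nullptr;
  std::size_t size_ = 0;
};
```

As with the FlatArrayMessageReaders discussed above, anything that points into data() must not outlive the MappedFile object, because the destructor unmaps the memory.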
