On Thu, 2017-08-24 at 15:33 +0530, Dennis Francis wrote: > Hello, > > As this bug report (https://bugs.documentfoundation.org/show_bug.cgi? > id=109097) suggests > it is highly desirable to improve the lookup time in > multi_type_vector (currently in worst case > it takes O(N)) without position hints. One way to improve the > situation is to store absolute > position of first element of each block in the block instead of the > block size member. This > can improve the lookup time up to log(log(N)) using linear > interpolation search or at least log(N) > if we fallback to binary search after a few initial passes of linear- > interpolation search.
Agreed. That's certainly one way to improve the situation, and I'm pretty sure that's the direction you guys are going (or hoping to go, with our blessing). But I have to point out that, the bug you are referencing may not be the best one to reference to push your agenda, since that performance issue described in the bug report can be solved by switching to using position hints, and modifying the existing code that accesses and updates the matrix content. That's precisely what Markus was suggesting and what we've been doing traditionally prior to this conversation. I'd rather use the argument that, the change you are proposing does make sense from the point of view of making mdds::multi_type_vector more friendly to concurrent read-only access from multiple threads. In that light, use of position hints may not be desirable since it would incur additional costs due to the position hints being additional states that we would need to keep and share between multiple threads. That would require thread locking semantics, which you guys are understandably trying to avoid. > > Down side of this approach is that in the worst case, insert/deletes > incur O(N) cost because > absolute positions stored in each block subsequent to the > delete/insert block have to be changed, This downside is also correct. We should also note that in this scenario, N refers to the number of blocks, not the number of cells, the former of which can be substantially lower in typical real-life scenarios than the latter. One good news is that, this position update can potentially can concurrently if necessary since such operation only modifies the size of one block at any given moment, and the delta size can be applied to all the subsequent blocks independently. I'm not suggesting we should make it multi-threaded from the get-go (I'd rather you not do that, at least not without some benchmarking), but when the worst comes to worst we could go that route. > Before working on any change from my side, I would love to hear > everyone's thoughts on the topic first. Thanks ! So, my short answer is "it makes sense". There are some risks involved especially with regard to the potential cost on insertions and deletions which we just don't know ahead of time. But my gut feeling tells me that it'll probably be "fine for most common use cases". But I have to warn you; making this change would not be a walk in the park. :-) multi_type_vector has tons of code branches to deal with all sorts of situations, and you need to be prepared to tackle all of those branches. I hope you are mentally prepared for it. :-) That's all I gotta say. Please keep me in the loop. Kohei _______________________________________________ LibreOffice mailing list LibreOffice@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/libreoffice