Paul Rubin <http://[EMAIL PROTECTED]> writes:
> > > How does it do that? It has to scan every page in the entire wiki?! That's totally impractical for a large wiki.
> > So you want to say that c2 is not a large wiki? :-)
> I don't know how big c2 is. My idea of a large wiki is Wikipedia. My guess is that c2 is smaller than that.
I just looked at c2; it has about 30k pages (I'd call that medium sized) and finds incoming links pretty fast. Is it using MoinMoin? It doesn't look like other MoinMoin wikis that I know of. I'd like to think it's not finding those incoming links by scanning 30k separate files in the file system.

Sometimes I think a wiki could get by with just a few large files. Have one file containing all the wiki pages. When someone adds or updates a page, append the new page contents to the end of the big file. That's also a good time to pre-render the page and put the rendered version in the big file as well. Take note of the byte position in the big file (e.g. with ftell()) where the page starts, and remember that location in an in-memory structure (a Python dict) indexed on the page name. Also append the same info to a second file, find the location of that entry, and store it in the in-memory structure as well. If there was already a dict entry for that page, record a link to the old offset in the 2nd file; that way the previous revisions of a page can be found by following the links backwards through the 2nd file. Finally, on restart, scan the 2nd file to rebuild the in-memory structure.

With a Wikipedia-sized wiki, the in-memory structure will be a few hundred MB and the 2nd file might be a few GB. On current 64-bit PCs, neither of these is a big deal. The 1st file might be several TB, which might not be so great; a better strategy is needed there, left as an exercise (various straightforward approaches suggest themselves).

Also, the active pages should be cached in RAM. For a small wiki (up to 1-2 GB) that's no big deal: just let the OS kernel do it, or use some LRU scheme in the application. For a large wiki, the cache and possibly the page store might have to be spread across multiple servers using some pseudo-RDMA scheme.

I think the MediaWiki software is only barely able to support Wikipedia right now; it's pushing its scaling limits. Within a year or two, if those limits can be removed, Wikipedia is likely to reach at least 10 times its present size and 1000 times its traffic volume, so the question of how to implement big, high-traffic wikis has been on my mind lately. Specifically, I ask myself how Google would do it. I think it's quite feasible to write wiki software that can handle that amount of load, but none of the current stuff can really do it.
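I don't know how c2 actually does it, but the obvious way to answer "what links here" without touching every page is to keep a reverse index that gets updated on each save. A minimal sketch, assuming CamelCase-style links; the regex and the function names are made up for illustration:

import re
from collections import defaultdict

# Hypothetical reverse index: updated on every save, so "what links here"
# becomes a dict lookup instead of a scan over all page files.
WIKIWORD = re.compile(r'\b[A-Z][a-z]+(?:[A-Z][a-z]+)+\b')   # CamelCase links

backlinks = defaultdict(set)   # target page -> pages that link to it
outlinks = {}                  # page -> pages it linked to at last save

def page_saved(name, text):
    """Call whenever a page is created or edited."""
    new = set(WIKIWORD.findall(text))
    old = outlinks.get(name, set())
    for target in old - new:
        backlinks[target].discard(name)
    for target in new - old:
        backlinks[target].add(name)
    outlinks[name] = new

def incoming_links(name):
    return sorted(backlinks[name])

Since the index can always be rebuilt from the page store, it doesn't have to be especially durable.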
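To make the append-only scheme concrete, here's a rough sketch. The file names, the record format (one JSON line per save in the 2nd file) and the PageStore class are all my own inventions for illustration; the point is just the two appends, the ftell()-style offsets, the back-links to earlier revisions, and the rebuild-on-restart scan:

import json, os

class PageStore:
    """Illustrative sketch only; a single writer is assumed (no locking)."""
    def __init__(self, datafile='pages.dat', journal='pages.idx'):
        self.data = open(datafile, 'a+b')     # big file: raw page contents
        self.journal = open(journal, 'a+b')   # 2nd file: one record per save
        self.index = {}                       # page name -> offset of latest journal record
        self._rebuild()

    def _rebuild(self):
        """On restart, scan the 2nd file to rebuild the in-memory dict."""
        self.journal.seek(0)
        pos = 0
        for line in self.journal:
            rec = json.loads(line)
            self.index[rec['name']] = pos
            pos += len(line)

    def save(self, name, text):
        # Append the page text to the big file and note where it starts.
        self.data.seek(0, os.SEEK_END)
        data_off = self.data.tell()           # the ftell() of the page start
        body = text.encode('utf-8')
        self.data.write(body)
        self.data.flush()
        # Append an index record to the 2nd file, linking back to the page's
        # previous record so old revisions can be walked in reverse.
        rec = {'name': name, 'offset': data_off, 'length': len(body),
               'prev': self.index.get(name)}
        self.journal.seek(0, os.SEEK_END)
        journal_off = self.journal.tell()
        self.journal.write((json.dumps(rec) + '\n').encode('utf-8'))
        self.journal.flush()
        self.index[name] = journal_off

    def load(self, name, journal_off=None):
        """Return (text, offset of the previous revision's record)."""
        off = self.index[name] if journal_off is None else journal_off
        self.journal.seek(off)
        rec = json.loads(self.journal.readline())
        self.data.seek(rec['offset'])
        return self.data.read(rec['length']).decode('utf-8'), rec['prev']

Concurrent edits would need locking around save(), and as noted above the single multi-TB data file wants a better strategy, but the offsets and the backwards revision chain are the whole idea.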
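And for the "LRU scheme in the application" part, a dozen lines of Python is about all it takes for a single process; the 10000-page limit below is an arbitrary number, not a recommendation:

from collections import OrderedDict

class PageCache:
    """Tiny in-process LRU cache for rendered pages (illustrative only)."""
    def __init__(self, maxpages=10000):
        self.maxpages = maxpages
        self.pages = OrderedDict()          # page name -> rendered HTML

    def get(self, name):
        try:
            self.pages.move_to_end(name)    # mark as most recently used
        except KeyError:
            return None
        return self.pages[name]

    def put(self, name, html):
        self.pages[name] = html
        self.pages.move_to_end(name)
        if len(self.pages) > self.maxpages:
            self.pages.popitem(last=False)  # evict least recently used

For the multi-server case described above, the same get/put interface would front a distributed cache instead of a local dict.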