Hi all,

Just to add a simple hack: I posted an entry on my blog, "Uploading WikiPedia Dumps to Oracle databases":

http://marceloochoa.blogspot.com/2007_12_01_archive.html

It has instructions for uploading Wikipedia dumps to Oracle XMLDB, that is, transforming the XML file into object-relational storage. At the end, I added instructions for indexing the result with the Lucene Domain Index.

Best regards, Marcelo.
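For readers who want to see what "transforming an XML file" into rows can look like outside the database, here is a minimal Python sketch. It is not the Oracle XMLDB procedure from the blog entry. It only assumes the usual MediaWiki export layout (`page` elements containing a `title` and a `revision`/`text`), and the names `iter_pages`, `local_name` and `SAMPLE` are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) pairs from a MediaWiki XML export stream.

    Assumes each <page> holds one <title> and one <revision><text>.
    """
    title, text = None, None
    # iterparse streams the file, so a multi-gigabyte dump is not loaded at once.
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # drop the finished page's children to save memory


# Tiny hand-written sample; the namespace URI is illustrative only.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Lucene</title>
    <revision><text>'''Lucene''' is a search library.</text></revision>
  </page>
  <page>
    <title>Oracle</title>
    <revision><text>[[Oracle Database]] is an RDBMS.</text></revision>
  </page>
</mediawiki>"""

rows = list(iter_pages(io.BytesIO(SAMPLE)))
```

Each tuple in `rows` is one candidate table row (title, raw wikitext). The text is still wiki markup, so the parsing problem discussed in the quoted thread below remains.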
On Dec 14, 2007 5:08 AM, Dawid Weiss <[EMAIL PROTECTED]> wrote:
>
> Good pointers, thanks. I asked because I did have a problem like this a few
> months ago -- none of the existing parsers solved it for me (back then).
>
> D.
>
>
> Petite Abeille wrote:
> >
> > On Dec 13, 2007, at 8:39 AM, Dawid Weiss wrote:
> >
> >> Just incidentally -- do you know of something that would parse the
> >> wikipedia markup (to plain text, for example)?
> >
> > If you find out, let us know :)
> >
> > You may want to check the partial ANTLR grammar for Wikitext:
> >
> > http://www.mediawiki.org/wiki/User:Stevage/ANTLR
> > http://lists.wikimedia.org/pipermail/wikitext-l/2007-December/000117.html
> >
> > This also might be of interest:
> >
> > http://www.softlab.ntua.gr/~ttsiod/buildWikipediaOffline.html
> >
> > "the nice people over at woc.fslab.de have created a standalone
> > wiki-markup parser which is ready for use"
> > http://fslab.de/svn/wpofflineclient/trunk/mediawiki_sa
> >
> > There is also Text::MediawikiFormat:
> >
> > http://search.cpan.org/~dprice/Text-MediawikiFormat-0.05/lib/Text/MediawikiFormat.pm
> >
> > Perhaps you will be better off processing the Wikipedia static HTML
> > dump, instead of the XML one:
> >
> > http://static.wikipedia.org/
> >
> > Not a piece of cake one way or another :(
> >
> > Cheers,
> >
> > PA.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
>

--
Marcelo F. Ochoa
http://marceloochoa.blogspot.com/
http://marcelo.ochoa.googlepages.com/home
______________
Do you Know DBPrism? Look @ DB Prism's Web Site
http://www.dbprism.com.ar/index.html
More info?
Chapter 17 of the book "Programming the Oracle Database using Java & Web Services"
http://www.amazon.com/gp/product/1555583296/
Chapter 21 of the book "Professional XML Databases" - Wrox Press
http://www.amazon.com/gp/product/1861003587/
Chapter 8 of the book "Oracle & Open Source" - O'Reilly
http://www.oreilly.com/catalog/oracleopen/