> > How would I search for an equation on the web, and how could I "grep"
> > through a set of papers using a regular expression containing
> > mathematics?
> >
> > The semantics of mathematical notation has to be considered in regex
> > as well: a*b is the same as b*a, and the regex engine would have to
> > know that.
>
> XPath/XQuery would work pretty well for regex-like queries of
> hierarchical MathML (XML) expressions. The bigger problem is getting
> people to publish it. As a starting point, arXiv archives and parses
> LaTeX, which it renders to PDF. A system like arXiv that is set up to
> process user input for a search engine could be elaborated to do, for
> example, LaTeX -> MathML conversions (just for the sake of making search
> systematic) and over time to encourage authors to provide MathML input
> that was machine readable by Mathematica, etc.
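For what it's worth, here is a rough sketch of the XPath idea quoted above, including the a*b = b*a point. It assumes the papers are available as Content MathML (not Presentation MathML) with plain identifiers in <ci> elements; the names find_products and SAMPLE are made up for illustration. It uses the limited XPath subset in Python's standard xml.etree.ElementTree and handles only commutativity of a single product, nothing deeper such as associativity or algebraic equivalence.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the MathML specification.
MATHML_NS = "http://www.w3.org/1998/Math/MathML"
NS = {"m": MATHML_NS}

# Hypothetical sample: four expressions, two of which are a*b in some order.
SAMPLE = """
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <apply><times/><ci>a</ci><ci>b</ci></apply>
  <apply><times/><ci>b</ci><ci>a</ci></apply>
  <apply><plus/><ci>a</ci><ci>b</ci></apply>
  <apply><times/><ci>a</ci><ci>c</ci></apply>
</math>
"""


def find_products(root, operands):
    """Return <apply> elements that multiply exactly `operands`, in any order."""
    wanted = sorted(operands)
    matches = []
    # XPath: every descendant <apply> that has a <times/> child.
    for apply_el in root.findall(".//m:apply[m:times]", NS):
        names = sorted((ci.text or "").strip()
                       for ci in apply_el.findall("m:ci", NS))
        # Skip products with non-identifier operands (nested <apply>, numbers).
        all_identifiers = len(apply_el) - 1 == len(names)
        # Comparing sorted operand lists makes the match order-insensitive.
        if all_identifiers and names == wanted:
            matches.append(apply_el)
    return matches


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(len(find_products(root, ["a", "b"])))
```

The XPath expression does the structural "grep"; the commutativity check is ordinary code layered on top, since plain XPath has no notion that operand order is irrelevant.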
The lack of a common format is the real problem, and that's cultural
rather than technological. You see math on the web as GIFs, PDFs,
MathML, all kinds of different things.

There's a company called Powerset which is working on Web search via
natural language processing. That might result in some improvements for
math searches, but it doesn't really solve the problem. Google Code and
koders.com both deal with problems that are kind of similar, but not
really.

--
Giles Bowkett
http://www.gilesgoatboy.org
http://gilesbowkett.blogspot.com
http://gilesgoatboy.blogspot.com

============================================================
FRIAM Applied Complexity Group listserv
Meets Fridays 9a-11:30 at cafe at St. John's College
lectures, archives, unsubscribe, maps at http://www.friam.org