I worked out a new version of the swish-e org indexer that you may find interesting for your project. It indexes a custom XML representation of the org file:
http://kitchingroup.cheme.cmu.edu/blog/2015/07/04/An-xml-representation-of-an-org-document-for-indexing-with-swish-e/

It enables a search like this:

    swish-e -f index-org2xml.swish-e -w src-block.language=python -w src-block=diffusion

to find org files with a python source block containing the word
diffusion.

I think swish-e supports ranking
(http://swish-e.org/docs/swish-faq.html#how_is_ranking_calculated_)
too, but I have not tried it.

It is pretty interesting overall!

Oleg Sivokon writes:

> John Kitchin <jkitc...@andrew.cmu.edu> writes:
>
>> You would use org-element. Try org-element-parse-buffer and
>> org-element-map and maybe org-element-interpret-data. There are also
>> a bunch of regexps for identifying/finding particular types of
>> elements.
>
> Thanks! I'm already looking into it.
>
>> That sounds really cool. I recently hacked a swish-e index of my org
>> files (there might have been 3000+!):
>> http://kitchingroup.cheme.cmu.edu/blog/2015/06/25/Integrating-swish-e-and-Emacs/
>>
>> I just updated it to index the html version of an org-file so that I
>> can take advantage of the structure in the search:
>> http://kitchingroup.cheme.cmu.edu/blog/2015/07/03/Using-swish-e-to-index-org-files-as-html/
>> It would be cool to have more granular searching though.
>>
>> Is your info project visible anywhere? I can imagine a close-file
>> hook function that updates the database automatically.
>
> Whoa, that's a lot of Org files :) What I wrote so far is on Github,
> but it's in a very early stage, so it's not something you could just
> drop into your Emacs directory and start using right away.
> https://github.com/wvxvw/sphinx-mode
> I also looked into Swish some time ago. I also thought about using
> Nepomuk, but in the latter case, I have to admit, I didn't make it
> through the documentation.
>
> The difference in using Sphinx is that it has ranking, and it has a
> relatively terse way of specifying search criteria. For example, you
> could ask to search for "some words in this phrase"/3 and it would
> look for occurrences of 3 of the 5 words given between the quotes. Or
> you could ask it to search for @node "R" @contents "printf" "format",
> and this would search for node titles mentioning "R" and having
> contents with the words "printf" and "format".
> I have to admit I didn't master it fully (there are far more options
> and settings), but it does something that seems reasonable (if I
> compare it to M-x info-apropos).
>
> I'm also still trying to learn what the best way to do indexing is,
> so the project is still very raw, but I'll get there one day :)
>
> The ultimate goal is also to write a more human-friendly interface to
> Sphinx, where one could ask questions in a subset of natural language
> :) (but that's a very long way into the future!)
>
> PS. I see that many posts on this list are titled with [O]. What does
> it mean, should I do that too?
>
> Best.
>
> Oleg

--
Professor John Kitchin
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
@johnkitchin
http://kitchingroup.cheme.cmu.edu
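
A note on the quorum search Oleg describes ("some words in this phrase"/3): the idea can be illustrated with a small Python sketch. This is only a toy model of the matching rule, assuming simple lowercase word tokenization; it is not how Sphinx implements the operator, and the function name quorum_match and the sample file names are hypothetical.

```python
import re


def quorum_match(phrase: str, threshold: int, text: str) -> bool:
    """Return True if at least `threshold` distinct words of `phrase` occur in `text`.

    Toy model only: words are lowercased runs of word characters, and
    word order and proximity are ignored.
    """
    wanted = set(re.findall(r"\w+", phrase.lower()))
    present = set(re.findall(r"\w+", text.lower()))
    return len(wanted & present) >= threshold


# Hypothetical documents standing in for indexed org or info nodes.
docs = {
    "a.org": "Some notes on this phrase structure",
    "b.org": "Completely unrelated text",
}

hits = [name for name, body in docs.items()
        if quorum_match("some words in this phrase", 3, body)]
print(hits)
```

Here a.org matches because it contains three of the five words (some, this, phrase), while b.org contains none of them.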