Yup, the C/C++ code is parsed using some templates I wrote with CodeWorker. The same thing could be done for any other language, such as Java, PHP or Perl, although you'd need an expert understanding of that language's syntax to parse it correctly :)

Initially, Lucene wasn't part of the site at all.
I was using MySQL to store the data, and MySQL's FULLTEXT searching to query it.
However, once I reached 25 million+ rows in a single table, MySQL's FULLTEXT searching ground to a halt. When I spoke with the MySQL folks, they told me to use Lucene, since their FULLTEXT support doesn't scale well and Lucene is supposed to be one of the best engines around for that.

Since I was already several months into the project with the vast majority of the website written to use the MySQL database, converting entirely over to Lucene would have meant a complete code re-write.

I didn't want to do that, so I combined MySQL and Lucene and use both.
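
Roughly, that kind of split looks like the sketch below: the search goes through the index, which only hands back row ids, and the rows that get displayed come from the database. This is not the site's code. It's Python, with a plain dict standing in for the Lucene index and sqlite3 standing in for MySQL, and the table, column and function names are all made up for illustration.

```python
import re
import sqlite3
from collections import defaultdict

# Stand-in for MySQL: holds the full parsed records (hypothetical schema).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE functions (id INTEGER PRIMARY KEY, project TEXT, signature TEXT)"
)

# Stand-in for the Lucene index: maps each term to the ids of matching rows.
index = defaultdict(set)


def tokenize(text):
    """Split text into lowercase identifier-like terms."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def add_function(project, signature):
    """Store the record in the database, then index its text by row id."""
    cur = db.execute(
        "INSERT INTO functions (project, signature) VALUES (?, ?)",
        (project, signature),
    )
    for term in tokenize(signature):
        index[term].add(cur.lastrowid)
    return cur.lastrowid


def search(query):
    """Ask the index for ids matching every term, then load those rows."""
    terms = tokenize(query)
    if not terms:
        return []
    ids = set.intersection(*(index.get(t, set()) for t in terms))
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    rows = db.execute(
        "SELECT id, project, signature FROM functions "
        f"WHERE id IN ({placeholders}) ORDER BY id",
        sorted(ids),
    )
    return rows.fetchall()


# Sample data, for illustration only.
add_function("zlib", "int deflate(z_streamp strm, int flush)")
add_function("zlib", "int inflate(z_streamp strm, int flush)")
print(search("deflate"))
```

The point of the split is that the existing database code keeps working as-is. Only the query step changes: it asks the index for ids instead of running a FULLTEXT query.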

It took over 5 FULL MONTHS of 24/7 100% CPU time to PARSE the C/C++ code and insert it into the database.
And that covered only 3,200 of the more than 25,000 projects I need to parse.

In hindsight, I might have chosen to house everything in Lucene. However, it would be a major re-write at this point, and I'm happy enough right now with my 'merged' approach of PHP, MySQL and Lucene.

Chris Lu wrote:
This is cool!

Seems you parsed the C/C++ code. Is this easy to extend to other
languages, like Java?

And you choose to display the data stored in database, any reason for
that compared to reading it from Lucene index itself?

I feel using Lucene's highlighter may make it easier to read the search results.

