You should really look at Nutch. From the website http://lucene.apache.org/nutch:

"Nutch is open source web-search software. It builds on Lucene Java <http://lucene.apache.org/java/>, adding web-specifics, such as a crawler, a link-graph database, parsers for HTML and other document formats, etc."

Sounds like a good place to start, doesn't it? :)

simon

On Mon, Jan 11, 2010 at 2:47 AM, <jyzhou...@yahoo.com> wrote:
> Hi,
>
> Have you implemented such a web search in your own web application development?
> Please be as detailed as possible. Example:
> 1) index: ?
> 2) search: Lucene
>
> Please do advise.
>
> Thanks.
>
> --- On Sat, 9/1/10, Simon Willnauer <simon.willna...@googlemail.com> wrote:
>
> From: Simon Willnauer <simon.willna...@googlemail.com>
> Subject: Re: a complete solution for building a website search with lucene
> To: java-user@lucene.apache.org
> Date: Saturday, 9 January, 2010, 6:16 PM
>
> I don't know that much about Nutch, but Hadoop shouldn't really run
> under Windows in production. If you use Windows for development this
> should not be a big issue.
> Otis is right, you should use Cygwin together with Hadoop. Look at
> http://wiki.apache.org/hadoop/FAQ for initial info.
>
> simon
>
> On Sat, Jan 9, 2010 at 5:20 AM, Otis Gospodnetic
> <otis_gospodne...@yahoo.com> wrote:
> > Nutch is written in Java, so Nutch itself *should* work on other
> > non-Linux OSs that the JVM supports.
> > But it does contain some shell scripts, as does Hadoop, which Nutch uses.
> > Oh, I guess Windows people run it under Cygwin?
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
> >
> > ----- Original Message ----
> >> From: "jyzhou...@yahoo.com" <jyzhou...@yahoo.com>
> >> To: java-user@lucene.apache.org
> >> Sent: Fri, January 8, 2010 5:03:41 AM
> >> Subject: Re: a complete solution for building a website search with lucene
> >>
> >> Hi Paul,
> >>
> >> Thanks.
> >> So: use Nutch to do the crawling, and integrate Lucene into the web
> >> application so that it can do the search online.
> >>
> >> BTW, Nutch seems to have only a Linux version, while my development is on Windows.
> >> Am I right?
> >>
> >> Zhou
> >>
> >> --- On Fri, 8/1/10, Paul Libbrecht wrote:
> >>
> >> From: Paul Libbrecht
> >> Subject: Re: a complete solution for building a website search with lucene
> >> To: java-user@lucene.apache.org
> >> Date: Friday, 8 January, 2010, 4:27 PM
> >>
> >> Zhou,
> >>
> >> Lucene is a back-end library. It is very useful for developers, but it is
> >> not a complete site search engine.
> >> Nutch is a Lucene-based site search engine, and it does crawl.
> >> Solr also provides functions close to these, with a large amount of thought
> >> put into flexible integration; its crawling is instead based on feeds or
> >> other acquisition methods (see DIH for example).
> >>
> >> paul
> >>
> >> On 08-Jan-10 at 08:08, wrote:
> >>
> >> > Hi,
> >> >
> >> > I am new to Lucene.
> >> >
> >> > To build a web search function, it needs to have a backend indexing function.
> >> > But before that, should I run a crawler? Because Lucene indexes HTML
> >> > documents, while a crawler can turn the website's pages into HTML documents.
> >> > Am I right?
> >> >
> >> > If so, can anyone please suggest a crawler to me? Like Nutch?
> >> > Thanks
> >> > Zhou
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: java-user-h...@lucene.apache.org