Re: Search in HTML code

2006-10-04 Thread Erick Erickson

Don't interpret my responses as *recommending* a database, since I don't know much about your problem space. It may or may not be the right choice. Mostly, I was thinking that your particular use of Lucene as stated wasn't playing to Lucene's strengths. It may well be that Lucene is a fine choice

Re: Search in HTML code

2006-10-04 Thread John Bugger

Thanks, Erick! I'll try using a LIKE query against the database.

Re: Search in HTML code

2006-10-03 Thread Erick Erickson

Sure, anything's possible. Whether Lucene is your best bet may be another question. But in this example, you're not using Lucene to do anything except store the strings. By storing all the data as UN_TOKENIZED, all you're doing is a regex match on the entire HTML text of each document. You might
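To make the UN_TOKENIZED point concrete, here is a toy sketch in plain Java. It is not Lucene's API: the class and method names are hypothetical, and the tag-stripping regex is a crude stand-in for a real analyzer. It assumes the usual model of an inverted index, where an untokenized field contributes its whole value as a single term, while a tokenized field is split into individual words first. An exact-term lookup for "lucene" therefore finds nothing in the untokenized index, because the only term there is the entire HTML string.

```java
import java.util.*;

public class TokenizedVsUntokenized {
    // Toy inverted index: term -> sorted set of document ids.
    public static Map<String, Set<Integer>> index(List<String> docs, boolean tokenize) {
        Map<String, Set<Integer>> inv = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            String text = docs.get(id);
            // Untokenized: the whole field value is one term (like Index.UN_TOKENIZED).
            List<String> terms = tokenize ? tokenize(text) : List.of(text);
            for (String t : terms) {
                inv.computeIfAbsent(t, k -> new TreeSet<>()).add(id);
            }
        }
        return inv;
    }

    // Crude analyzer (assumption): strip tags, lowercase, split on non-alphanumerics.
    public static List<String> tokenize(String html) {
        String text = html.replaceAll("<[^>]*>", " ").toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        for (String t : text.split("[^a-z0-9]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Exact term lookup, roughly what a single-term query does.
    public static Set<Integer> lookup(Map<String, Set<Integer>> inv, String term) {
        return inv.getOrDefault(term, Collections.emptySet());
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
            "<html><body><b>Lucene</b> in action</body></html>",
            "<html><body>Search with a database</body></html>");
        Map<String, Set<Integer>> whole = index(docs, false);
        Map<String, Set<Integer>> tokens = index(docs, true);
        System.out.println(lookup(whole, "lucene"));    // []
        System.out.println(lookup(tokens, "lucene"));   // [0]
        System.out.println(lookup(tokens, "database")); // [1]
    }
}
```

With the untokenized index, the only way to hit document 0 is a pattern that matches the entire HTML string, which is the "regex match on the entire HTML text" described above.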

Re: Search in HTML code

2006-10-03 Thread John Bugger

My crawler indexes crawled pages with this code:

    Document doc = new Document();
    doc.add(new Field("body", page.getHtmlData(), Store.YES, Index.UN_TOKENIZED));
    doc.add(new Field("url", page.getUrl(), Store.YES, Index.UN_TOKENIZED));
    doc.add(new Field("title", page.getTitle(), Store.YES, Index.TO

Re: Search in HTML code

2006-10-02 Thread Erick Erickson

I guess the thundering silence is rooted in the problem statement. I have a hard time understanding how this index is used. By storing things this way, you'll force the user to know the *exact* format of anything she's looking for. That is, it's hard to search for and get docs containing both an
