Re: Search in HTML code

2006-10-04 Thread Erick Erickson
Don't interpret my responses as *recommending* a database, since I don't know much about your problem space. It may or may not be the right choice. Mostly, I was thinking that your particular use of Lucene as stated wasn't playing to Lucene's strengths. It may well be that Lucene is a fine choice

Re: Search in HTML code

2006-10-04 Thread John Bugger
Thanks, Erick! I'll try using a LIKE query against the database.

Re: Search in HTML code

2006-10-03 Thread Erick Erickson
Sure, anything's possible. Whether Lucene is your best bet may be another question. But in this example, you're not using Lucene to do anything except store the strings. By storing all the data as UN_TOKENIZED, all you're doing is a regex match on the entire HTML text of each document. You might
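To make the "regex match on the entire HTML text" point concrete, here is a minimal plain-Java sketch of that kind of scan done without any index at all. It is only an illustration: the class name HtmlTagScan, the method urlsWithTag, and the sample URLs are made up for this example, and it assumes the raw HTML of each page is already available in memory keyed by URL.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene or of the original poster's crawler.
class HtmlTagScan {
    // Returns the URLs of pages whose raw HTML contains an opening tag with the given name.
    static List<String> urlsWithTag(Map<String, String> htmlByUrl, String tagName) {
        // Matches "<tagName" followed by whitespace, ">" or "/", ignoring case.
        Pattern opening = Pattern.compile(
                "<" + Pattern.quote(tagName) + "[\\s>/]", Pattern.CASE_INSENSITIVE);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> page : htmlByUrl.entrySet()) {
            // Every page's full text is scanned; nothing here benefits from an inverted index.
            if (opening.matcher(page.getValue()).find()) {
                hits.add(page.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, String> pages = new LinkedHashMap<>();
        pages.put("http://example.com/a", "<html><body><TABLE border=\"1\"></TABLE></body></html>");
        pages.put("http://example.com/b", "<html><body><p>no tables here</p></body></html>");
        System.out.println(urlsWithTag(pages, "table")); // prints [http://example.com/a]
    }
}
```

The cost grows with the total size of the stored HTML, which is the same trade-off a database LIKE query makes.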

Re: Search in HTML code

2006-10-03 Thread John Bugger
My crawler indexes crawled pages with this code: Document doc = new Document(); doc.add(new Field("body", page.getHtmlData(), Store.YES, Index.UN_TOKENIZED )); doc.add(new Field("url", page.getUrl(), Store.YES, Index.UN_TOKENIZED)); doc.add(new Field("title", page.getTitle(), Store.YES, Index.TO

Re: Search in HTML code

2006-10-02 Thread Erick Erickson
I guess the thundering silence is rooted in the problem statement. I have a hard time understanding how this index is used. By storing things this way, you'll force the user to know the *exact* format of anything she's looking for. That is, it's hard to search for and get docs containing both an
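The "exact format" problem can be pictured with a small plain-Java analogy. This is not Lucene code: it only models the idea that an untokenized field contributes its whole value as one term, while a tokenized field contributes many small terms. The class TermAnalogy and the lower-casing, split-on-non-alphanumerics "analyzer" are assumptions made up for the illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical model of indexed terms; real analyzers behave differently in detail.
class TermAnalogy {
    // Untokenized: the whole field value is the one and only indexed term.
    static Set<String> untokenizedTerms(String value) {
        Set<String> terms = new HashSet<>();
        terms.add(value);
        return terms;
    }

    // Tokenized (assumed analyzer): lower-case, then split on anything that is not a letter or digit.
    static Set<String> tokenizedTerms(String value) {
        Set<String> terms = new HashSet<>(
                Arrays.asList(value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")));
        terms.remove(""); // split can yield an empty leading token
        return terms;
    }

    public static void main(String[] args) {
        String html = "<p>Hello <b>World</b></p>";
        System.out.println(untokenizedTerms(html).contains("hello")); // false: only the full string is a term
        System.out.println(untokenizedTerms(html).contains(html));    // true: exact value required
        System.out.println(tokenizedTerms(html).contains("hello"));   // true: individual words are terms
    }
}
```

In this model, an exact-term lookup against the untokenized value succeeds only when the query is the entire HTML string, character for character.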

Search in HTML code

2006-10-02 Thread John Bugger
Hello! I've indexed HTML pages and stored the HTML code as UN_TOKENIZED fields. Now I need to search for specific tags in those documents, for example: Do I need to write some custom analyzer or something like that? Please help me!