Question about Field.TermVector

2006-07-18 Thread Liao Xuefeng
Hi, I'm using Lucene 2.0. To index a very long text I use Field.Index.TOKENIZED and Field.Store.NO. I don't know how to get its content (actually, I only need the words near the keywords, like Google's results: ... found this keyword here...) without querying the database. Someone told me to use a term vector to p

RE: HTML text extraction

2006-06-22 Thread Liao Xuefeng
Hi all, I wrote my own HTML parser because it meets my requirements and does not depend on third-party libraries, and I'd like to share it (in the attachment). This class provides some static methods for HTML <-> text conversion: HtmlUtil.html2text(String html); HtmlUtil.text2html(String text); a
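
For readers without the attachment, here is a rough sketch of what two such static methods might look like. It is not the HtmlUtil class from the message: the class name HtmlUtilSketch, the regex-based tag stripping, and the small entity table are all assumptions made for illustration, using only the Java standard library.

```java
// Hypothetical sketch; NOT the HtmlUtil class attached to the original message.
final class HtmlUtilSketch {

    private HtmlUtilSketch() {}

    /** Escapes plain text for HTML and marks line breaks with <br>. */
    static String text2html(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\n': sb.append("<br>\n"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Strips tags with a regex, then decodes a handful of common entities. */
    static String html2text(String html) {
        // Turn <br> (plus an optional following newline) into one line break.
        String s = html.replaceAll("(?i)<br\\s*/?>\\r?\\n?", "\n");
        // Replace every remaining tag with a space so adjacent words stay apart.
        s = s.replaceAll("(?s)<[^>]*>", " ");
        // Decode entities; &amp; goes last so "&amp;lt;" becomes "&lt;", not "<".
        s = s.replace("&nbsp;", " ")
             .replace("&lt;", "<")
             .replace("&gt;", ">")
             .replace("&quot;", "\"")
             .replace("&amp;", "&");
        // Collapse runs of spaces and tabs, keeping line breaks.
        return s.replaceAll("[ \\t]+", " ").trim();
    }
}
```

A regex-based approach like this does not skip script or style bodies, and it breaks on a '>' inside an attribute value, so it only suits simple, well-formed markup.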

RE: HTML text extraction

2006-06-21 Thread Liao Xuefeng
Hi, I wrote my own HTML parser to do html2text and it works well. I can send you my code if it meets your requirements. -----Original Message----- From: John Wang [mailto:[EMAIL PROTECTED] Sent: Wednesday, June 21, 2006 1:40 PM To: java-user@lucene.apache.org Subject: HTML text extraction Can someo

Use one or more indexes?

2006-06-13 Thread Liao Xuefeng
Hi, I'm new to Lucene. I want to add full-text search to my website to search articles, images, and BBS topics. I'm not sure whether to use a single index for all of these types or to create three indexes, one per type. If I use only one index, do I have to add a 'type' field to identify document