conditional High Freq Terms in Lucene index

2012-03-29 Thread starz10de
Hi, I am using the HighFreqTerms class to compute the most frequent terms in the Lucene index, and it works well. However, I would like to compute the most frequent terms under a condition: not for all documents in the index, but only for documen

Query expansion in lucene

2011-12-20 Thread starz10de
Hi, does Lucene have a query expansion class that works regardless of the language (e.g., it shouldn’t be based on WordNet)? It doesn’t matter whether the expanded terms are stored in the index or obtained at run time. I googled and found SynonymAnalyzer; however, I couldn

Re: highlighter by using term offsets

2011-11-24 Thread starz10de
Hi, here is the full part of the code: public static void doPagingSearch(BufferedReader in, Searcher searcher, Query query, int hitsPerPage, boolean raw, boolean interactive) throws IOException, ParseException, InvalidTokenOffsetsException {

Re: highlighter by using term offsets

2011-11-24 Thread starz10de
Hi, no, hits is not null; I can print all retrieved documents without problem. -- View this message in context: http://lucene.472066.n3.nabble.com/highlighter-by-using-term-offsets-tp3527712p3533380.html Sent from the Lucene - Java Users mailing list archive at Nabble.com.

highlighter by using term offsets

2011-11-22 Thread starz10de
I'm writing a highlighter by using term offsets as follows: IndexReader reader = IndexReader.open( indexPath ); TermPositionVector tpv = (TermPositionVector)reader.getTermFreqVector( hits[i].doc,"contents"); When I run the searcher, I face this error in TermPositionVector t

RE: lucene highlighter

2011-11-20 Thread starz10de
Hi, thanks for your useful comments. Here is how I got the highlighter to do what I want with Lucene 3: QueryScorer scorer = new QueryScorer(query, reader, "contents"); Highlighter highlighter = new Highlighter(scorer); String fragment = highlighter.get

RE: lucene highlighter

2011-11-20 Thread starz10de
Hi Uwe, thanks for your answer. I am now using lucene-highlighter-3.0.3, but I get this error: “SpanScorer can’t be resolved as a type” > SpanScorer scorer = new SpanScorer(query, fieldName, new > CachingTokenFilter(stream)); I checked the classpath and there were no old versi

lucene highlighter

2011-11-20 Thread starz10de
Hi, I have a problem with the Lucene highlighter: I couldn’t get it to run. It compiles without errors, but when I run it I get this error: “Exception in thread "main" java.lang.NoSuchMethodError:org.apache.lucene.analysis.TokenStream.next(Lorg/apache/lucene/analysis/Token;)Lorg/apache/lucene/analys

Re: Index one huge text file

2011-07-22 Thread starz10de
I have no problem with indexing performance: I indexed the 60 000 text files (one sentence each) in only a few minutes. My performance problem is splitting the huge file that contains 60 000 sentences into 60 000 text files so that I can have an index at sentence level. I asked if I could read the one huge fi

Re: Index one huge text file

2011-07-22 Thread starz10de
I can save the sentences in the Lucene index as an extra field, which I can call, for example, "sentence_content".

Re: Index one huge text file

2011-07-22 Thread starz10de
I am interested in searching at sentence level. It is a parallel corpus: each sentence in the first language is equivalent to a sentence in the second language. I want to index each sentence and give it an id, so that when I retrieve it I can easily retrieve its equivalent in th

Index one huge text file

2011-07-22 Thread starz10de
Hi, I have one text file that contains 60 000 sentences. Is it possible to index this file sentence by sentence, with each sentence treated as one document? What I do now is split the huge text file into 60 000 sentence files and then index them. This is not easy because I have a few hug
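A minimal sketch of the sentence-per-document idea, in plain Java with no Lucene calls. It assumes the file holds one sentence per line, which the post does not state; the class and method names are hypothetical. Each record could then be turned into one index document, with the id stored next to the sentence text.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper: turns one big file into (id, sentence) records.
final class SentenceRecords {
    /** One sentence plus the id that would be stored with it in the index. */
    static final class Record {
        final int id;
        final String sentence;
        Record(int id, String sentence) { this.id = id; this.sentence = sentence; }
    }

    /**
     * Assumption: one sentence per line. The id is the 0-based line number,
     * so the same id in two parallel files points at the same line.
     * Blank lines produce no record but still consume a line number.
     */
    static List<Record> read(BufferedReader in) {
        List<Record> records = new ArrayList<>();
        Iterator<String> lines = in.lines().iterator(); // streams the file, no full load
        for (int lineNo = 0; lines.hasNext(); lineNo++) {
            String sentence = lines.next().trim();
            if (!sentence.isEmpty()) records.add(new Record(lineNo, sentence));
        }
        return records;
    }

    public static void main(String[] args) {
        BufferedReader in = new BufferedReader(new StringReader("First sentence.\nSecond sentence.\n"));
        for (Record r : read(in)) {
            // In an indexer, this is where one document per record would be added.
            System.out.println(r.id + "\t" + r.sentence);
        }
    }
}
```

Using the line number as the id avoids writing 60 000 separate files and keeps the link between the two sides of a parallel corpus, provided both files are aligned line by line.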

Re: Store the documents content in the index

2011-07-19 Thread starz10de
thanks for your kind answer

Re: Store the documents content in the index

2011-07-18 Thread starz10de
thanks for your reply

RE: Store the documents content in the index

2011-07-18 Thread starz10de
thanks for your reply

Store the documents content in the index

2011-07-17 Thread starz10de
Hi, currently my source text files (800 000) are stored in a folder, which makes retrieval by many users somewhat slow. I heard that the content of these files might be stored in the index itself, although I find this unrealistic. Is it possible to store the source text files conten

Re: problem with the lucene and tomcat server

2011-02-16 Thread starz10de
I found the solution: WEB-INF\lib contained the old version, so I replaced it with the new one.

problem with the lucene and tomcat server

2011-02-16 Thread starz10de
Hi all, I have a Java application that uses Lucene 3.0.3 and runs fine. I wanted to use a servlet to turn it into a web application. However, I got this error: java.lang.NoSuchMethodError: org.apache.lucene.store.FSDirectory.open(Ljava/io/File;)Lorg/apache/lucene/store/FSDirectory; I se

Re: java.lang.NoClassDefFoundError:org/apache/lucene/search/similar/MoreLikeThis

2010-12-07 Thread starz10de
Dear Erick, thanks a lot. I placed the jar file in WEB-INF\lib and it works. Best,

java.lang.NoClassDefFoundError:org/apache/lucene/search/similar/MoreLikeThis

2010-12-07 Thread starz10de
Hi all, I am using the MoreLikeThis class in Lucene to find documents in the index that are similar to a given one. It works fine when I run it directly from Eclipse, but when I call it from my servlet I get this error: “java.lang.NoClassDefFoundError:org/apache/lucene/search/similar/MoreLikeThis”

Re: high frequent terms in the search result set

2010-11-09 Thread starz10de
Thanks for the answer. My request might be simpler. I will describe it in a basic way: 1- I submit a query 2- I retrieve the matched documents 3- From these matched documents I need to have a list of terms based on their high co-occurrence. Currently I can do this for the whole index, but I still
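The three steps above can be sketched in plain Java once the text of the matched documents is at hand. This is only an illustration under assumptions: the class and method names are hypothetical, the tokenizer is a naive ASCII split rather than the analyzer used at index time, and no Lucene API is involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: ranks the terms that occur in the matched documents.
final class ResultSetTerms {
    /**
     * Counts lower-cased tokens across the matched texts, skips the query
     * terms themselves, and returns the n most frequent remaining terms.
     */
    static List<String> topTerms(List<String> matchedTexts, Set<String> queryTerms, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String text : matchedTexts) {
            // Naive tokenizer: splits on anything that is not an ASCII word character.
            for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (token.isEmpty() || queryTerms.contains(token)) continue;
                counts.merge(token, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Highest count first; ties are broken alphabetically so the order is deterministic.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) top.add(entries.get(i).getKey());
        return top;
    }

    public static void main(String[] args) {
        List<String> matched = Arrays.asList("bank tax credits", "bank tax", "withdraw from bank");
        System.out.println(topTerms(matched, Collections.singleton("bank"), 3));
    }
}
```

Counting raw frequency favours common words, so a stop-word list or a weighting such as document frequency in the whole index would probably be needed before the ranking is useful for query expansion.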

high frequent terms in the search result set

2010-11-08 Thread starz10de
Extract the most frequent terms in the search result set. I need to know how to extract the most frequent terms in the search result set after submitting the query. Here is the class you can use to extract the most frequent terms from the index: int j=0; int numTerms=5;

Re: High frequency term for the searched query

2010-11-06 Thread starz10de
Hi Mic, I tried it like this: String indexName = "path"; IndexReader r = IndexReader.open(indexName); MoreLikeThis mlt = new MoreLikeThis(r); . . . . . . . . BooleanQuery result = (BooleanQuery) mlt.like(docNum); result.add(query, BooleanClause.Occur.MUST_NOT); How can I print t

Re: High frequency term for the searched query

2010-11-05 Thread starz10de
Hi Mike, I implemented MoreLikeThis, but I couldn't figure out where or how to print the terms related to the given query. All I got is the documents relevant to the query, with their scores. Any idea how to get the related terms?

RE: High frequency term for the searched query

2010-11-05 Thread starz10de
Hi, I did as explained on the website: final Set<Term> terms = new HashSet<Term>(); query = searcher.rewrite(query); query.extractTerms(terms); for (Term t : terms) { int frequency = searcher.docFreq(t); } However, I can't understa

Re: High frequency term for the searched query

2010-11-05 Thread starz10de
Hi Chris, I tried your solution and ran into one problem: "the method extractterms(Set) is undefined for the type Query". This is the code: Query query = QueryParser.parse(line, "contents", analyzer); //System.out.println("Searching for: " + query.toString("contents")); Hits hits = s

RE: High frequency term for the searched query

2010-11-05 Thread starz10de
Hi, I need to expand the query with the terms that most often occur with it in documents. For example, the words credits, tax, and withdraw frequently appear with Bank. So if my query is “Bank”, the result should be a ranked list of the terms that most frequently occur with "Bank". I could do that as I explained, but not

High frequency term for the searched query

2010-11-04 Thread starz10de
I need to find the most frequent terms that appear with a query. HighFreqTerms.java can only be used to obtain the high-frequency terms of the whole index; I just need the high-frequency terms for the submitted query. What I do now is: I search the index with the query and retr

Problem with „AND“ operator to search Chinese text

2010-01-27 Thread starz10de
Hello, I successfully set up the Chinese analyzer (CJKAnalyzer) and can search Chinese text. However, when I use the Boolean operator AND, I always get 0 hits. Searching for the 2 Chinese terms without the “AND” operator is no problem. When I want to count only the

Index html sites using IndexHtml

2009-07-26 Thread starz10de
Hi, I am indexing a set of HTML websites using Lucene (IndexHtml). The indexer works fine and I can also find the indexed terms, but the problem is that this class (IndexHtml) indexes all text inside the HTML page, even the advertisements. I am interested only in the body text and not in the adverti

Re: Cosine similarity

2009-07-25 Thread starz10de
p, HBase, UIMA, NLP, NER, IR > > > > - Original Message >> From: starz10de >> To: java-user@lucene.apache.org >> Sent: Friday, July 24, 2009 4:50:22 PM >> Subject: Cosine similarity >> >> >> Does lucene use cosine smiliarity measu

most frquent term in the index

2009-07-24 Thread starz10de
How can I get the most frequent terms in the index in descending order? Thanks

Cosine similarity

2009-07-24 Thread starz10de
Does Lucene use the cosine similarity measure to compute the similarity between the query and the indexed documents? Thanks
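For reference, the textbook cosine similarity between two term-frequency vectors can be sketched as below. This is plain Java with hypothetical names and a naive ASCII tokenizer; it illustrates the measure the question refers to and is not Lucene's scoring code.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: cosine similarity over raw term frequencies.
final class Cosine {
    /** Builds a term -> frequency map with a naive ASCII tokenizer. */
    static Map<String, Integer> termFreq(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) tf.merge(token, 1, Integer::sum);
        }
        return tf;
    }

    /** cos(a, b) = (a . b) / (|a| * |b|); returns 0 if either vector is empty. */
    static double similarity(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) dot += e.getValue() * other; // only shared terms contribute
        }
        for (int v : b.values()) normB += v * v;
        if (normA == 0 || normB == 0) return 0.0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        Map<String, Integer> query = termFreq("bank tax");
        Map<String, Integer> doc = termFreq("the bank raised the tax on bank credits");
        System.out.println(similarity(query, doc));
    }
}
```

Raw frequencies are used here to keep the arithmetic visible; practical systems usually weight the vector components (for example by inverse document frequency) before taking the cosine.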

Document path in lucene index

2008-07-25 Thread starz10de
Hi all, I am reading the index and printing the index terms and their corresponding paths. I can print the index terms, but I don't know whether it is possible to print the corresponding paths. I can only print the doc ID, but I need to print the paths, as is possible in the searcher (query).

Re: storing the contents of a document in the lucene index

2008-07-24 Thread starz10de
> > You can add to the same field as often as you want and it just appends the > content of calls 2 to N to the same field. > > > Best > Erick > > > On Wed, Jul 23, 2008 at 3:42 AM, starz10de <[EMAIL PROTECTED]> wrote: > >> >> Hi Erik, >>

Re: storing the contents of a document in the lucene index

2008-07-23 Thread starz10de
? I am new to lucene and I don't know how to use this "Field.Store.YES" to store whole text. Best regards Farag starz10de wrote: > > Could any one tell me please how to print the content of the document > after reading the index. > for example if

storing the contents of a document in the lucene index

2008-07-22 Thread starz10de
Could anyone please tell me how to print the content of a document after reading the index? For example, to print the index terms I do: IndexReader ir = IndexReader.open(index); TermEnum termEnum = ir.terms(); while (termEnum.next()) { TermDocs dok =

Re: Return the sentence number in the indexed files

2008-07-20 Thread starz10de
19, 2008, at 6:00 AM, starz10de wrote: > >> >> Hi All, >> >> I have a text files that contain several sentences, there is space >> between >> each sentence. >> When searching the index , i get the path for the documents that >> match the

Return the sentence number in the indexed files

2008-07-19 Thread starz10de
Hi all, I have text files that contain several sentences, with a space between each sentence. When searching the index, I get the path of the documents that match the query: String path = doc.get("path"); Is it possible to get the number of the sentence that matches the query inside the
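One way to get a sentence number, sketched in plain Java, is to re-read the matched file (found via the stored path) and scan its sentences for the query term. The names are hypothetical and the sketch assumes a sentence ends at '.', '!' or '?' followed by whitespace, which may not match the actual files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: finds which sentences of a text contain a term.
final class SentenceLocator {
    /** Returns the 1-based numbers of the sentences that contain the term as a whole word. */
    static List<Integer> matchingSentences(String text, String term) {
        // Naive splitter: break on whitespace that follows '.', '!' or '?'.
        String[] sentences = text.split("(?<=[.!?])\\s+");
        String needle = term.toLowerCase(Locale.ROOT);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < sentences.length; i++) {
            for (String token : sentences[i].toLowerCase(Locale.ROOT).split("\\W+")) {
                if (token.equals(needle)) {
                    hits.add(i + 1); // report each sentence once
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "The bank is open. I like tea. Go to the bank now!";
        System.out.println(matchingSentences(text, "bank"));
    }
}
```

This matches single terms only and ignores whatever analysis was applied at index time; indexing each sentence as its own document with a stored sentence number is the alternative that avoids re-scanning files.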

Print the text files before indexing them in lucene

2008-07-15 Thread starz10de
Hi all, it might be an easy question, but for someone new to Lucene like me it is not that easy. I want to print the text files before indexing them in Lucene. I tried, but I could only print the index content, where we see the keywords, document number, and frequency. Besides that, I need to pr

Re: My own nalyzer in lucene

2008-07-09 Thread starz10de
the constructor IndexWriter(string, myanalyzer, boolean) is not defined " I think there is no problem inside the code of myAnalyzer.java, as I ran a test where I just changed the name of StandardAnalyzer and got the same error. Thanks Farag Marcelo Schneider wrote: > > starz10de esc

My own nalyzer in lucene

2008-07-09 Thread starz10de
Hi all, I am new to Lucene! I am trying to write my own analyzer (myAnalyzer) in Lucene. I wrote it and compiled it, then added myAnalyzer.class to the folder \org\apache\lucene\analysis and created a new jar file that contains myAnalyzer and the other files. Then I imported myanalyzer i

RE: Index different files in different folders in lucene

2008-07-06 Thread starz10de
root-directory' you specify. So what you are trying to do won't work > unless > you modify the source to do what you want. It would not be that difficult > to > do. > > JohnG. > > -Original Message- > From: starz10de [mailto:[EMAIL PROTECTED] > Sen

RE: Index different files in different folders in lucene

2008-07-06 Thread starz10de
possible for lucene to index multiple folders at the same time and put them in several indexes? thanks John Griffin-3 wrote: > > Starz, > > How about your code so we can see what you are doing? We're flying blind > here. > > John G. > > -----Original Message-

Index different files in different folders in lucene

2008-07-05 Thread starz10de
Hi all, I am new to Lucene. Is it possible to index different files in different folders in Lucene? For example, I have two folders, a and b, each containing several files. In the Lucene args I wrote: c:\a\ , c:\b\ but it indexes only the first files in folder A and doesn't index any files
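A minimal sketch of gathering files from several root folders before indexing, in plain Java. The class and method names are hypothetical, and the folders come from the command line as separate arguments (for example c:\a c:\b) rather than one comma-separated string; the resulting list could then be fed to whatever indexing code is in use.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: lists every regular file under several root folders.
final class MultiFolderFiles {
    /** Walks each root recursively and returns its files, root by root, sorted within a root. */
    static List<Path> collect(List<Path> roots) {
        List<Path> files = new ArrayList<>();
        for (Path root : roots) {
            try (Stream<Path> walk = Files.walk(root)) {
                files.addAll(walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList()));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return files;
    }

    public static void main(String[] args) {
        List<Path> roots = new ArrayList<>();
        for (String arg : args) roots.add(Paths.get(arg)); // one argument per folder
        for (Path file : collect(roots)) {
            // Each path here would be handed to the indexer.
            System.out.println(file);
        }
    }
}
```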

Lucene index content

2007-06-01 Thread starz10de
Hello all, I am printing the Lucene index content and succeeded, but I don't know how to print the indexed file names. System.out.println(dok.doc()); prints the doc ID, but I need the document name. For example, if doc ID = 1 and the file name = F1, how do I print the file name F1? than

Re: [ANN] Printing lucene index content

2007-03-03 Thread starz10de
karl wettin-3 wrote: > > > 3 mar 2007 kl. 23.18 skrev starz10de: > >>>>> >>>>> IndexReader ir = IndexReader.open("index"); >>>>> >>>>> TermEnum terms=ir.terms(); >>>>> >>>>>

Re: [ANN] Printing lucene index content

2007-03-03 Thread starz10de
karl wettin-3 wrote: > > > 3 mar 2007 kl. 22.31 skrev starz10de: > >>> >>> hi Karl , >>> >>> but the problem is that the getReader is not defined for type >>> indexReader >>> !! >>> >>> this is my co

Re: [ANN] Printing lucene index content

2007-03-03 Thread starz10de
karl wettin-3 wrote: > > > 3 mar 2007 kl. 21.25 skrev starz10de: >>> how i can implement aprioriIndex ? > > Oh sorry. That should just be your IndexReader. > > -- > karl > > hi Karl , > > but the problem is that the getReader is not defined f

Re: [ANN] Printing lucene index content

2007-03-03 Thread starz10de
karl wettin-3 wrote: > > > 3 mar 2007 kl. 17.06 skrev starz10de: > >> >> I did try this but it is still not working >> >> IndexReader ir = IndexReader.open("index"); >> >> TermDocs dok=ir.termDocs(); >> while (dok.nex

Re: [ANN] Printing lucene index content

2007-03-03 Thread starz10de
karl wettin-3 wrote: > > > 3 mar 2007 kl. 13.54 skrev starz10de: > >> How i can print the index content in order to use them for some >> application. >> I did use >> TermEnum terms=ir.terms(); >> while (terms.next()) { >>

Printing lucene index content

2007-03-03 Thread starz10de
Hi all, how can I print the index content in order to use it in an application? I used TermEnum terms=ir.terms(); while (terms.next()) { System.out.println(terms.term().text()); } I still need to print the document ID and the term frequency inside each document.
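To make the requested output concrete (term, document ID, and the term's frequency inside that document), here is a toy in-memory version in plain Java. It builds its own postings from a list of texts with a naive ASCII tokenizer; the names are hypothetical and it does not read a Lucene index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy index: term -> (docId -> frequency of the term in that document).
final class ToyPostings {
    static Map<String, Map<Integer, Integer>> build(List<String> docs) {
        Map<String, Map<Integer, Integer>> postings = new TreeMap<>(); // terms in sorted order
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String token : docs.get(docId).toLowerCase(Locale.ROOT).split("\\W+")) {
                if (token.isEmpty()) continue;
                postings.computeIfAbsent(token, t -> new TreeMap<>()).merge(docId, 1, Integer::sum);
            }
        }
        return postings;
    }

    /** One line per (term, document) pair: the term, the doc id, and the in-document frequency. */
    static List<String> lines(List<String> docs) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Map<Integer, Integer>> term : build(docs).entrySet()) {
            for (Map.Entry<Integer, Integer> posting : term.getValue().entrySet()) {
                out.add(term.getKey() + " doc=" + posting.getKey() + " freq=" + posting.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : lines(Arrays.asList("bank tax bank", "tax credits"))) {
            System.out.println(line);
        }
    }
}
```

The nested loop mirrors the shape of the task: an outer pass over terms and an inner pass over the documents that contain each term.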