Hi,
I am using the HighFreqTerms class to compute the most frequent terms in a
Lucene index and it works well. However, I am interested in computing the most
frequent terms under a condition: not over all documents in the index, but
only over the documents that match a query.
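For reference, here is a sketch of one way to do this with per-document term
vectors. It assumes the field was indexed with term vectors
(Field.TermVector.YES); names like "contents", searcher and reader are
illustrative, not taken from the original code:

Map<String, Integer> counts = new HashMap<String, Integer>();
TopDocs hits = searcher.search(query, 1000);
for (ScoreDoc sd : hits.scoreDocs) {
    // per-document term vector: the terms of this document and their frequencies
    TermFreqVector tfv = reader.getTermFreqVector(sd.doc, "contents");
    if (tfv == null) continue;   // document was indexed without term vectors
    String[] terms = tfv.getTerms();
    int[] freqs = tfv.getTermFrequencies();
    for (int i = 0; i < terms.length; i++) {
        Integer old = counts.get(terms[i]);
        counts.put(terms[i], old == null ? freqs[i] : old + freqs[i]);
    }
}
// sorting the map entries by value in descending order then gives the most
// frequent terms restricted to the result set
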
Hi,
Does Lucene have a query expansion class that works regardless of the
intended language (i.e., it should not be based on WordNet)?
It doesn't matter whether the expanded terms are stored in the index or
obtained at run time.
I googled and found SynonymAnalyzer; however, I couldn't
Hi,
here is the relevant part of the code:

public static void doPagingSearch(BufferedReader in, Searcher searcher, Query query,
        int hitsPerPage, boolean raw, boolean interactive)
        throws IOException, ParseException, InvalidTokenOffsetsException {
Hi,
no, the hits are not null; I can print all retrieved documents without a problem.
I'm writing a highlighter using term offsets as follows:

IndexReader reader = IndexReader.open(indexPath);
TermPositionVector tpv = (TermPositionVector) reader.getTermFreqVector(hits[i].doc, "contents");

When I run the searcher, I get this error in
TermPositionVector t
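A note in case it helps: the cast to TermPositionVector can only succeed if
the field was indexed with term vectors that include positions and offsets.
A minimal sketch of the indexing side, assuming the field is called
"contents" and writer is an open IndexWriter:

Document doc = new Document();
// store positions and offsets in the term vector so the highlighter can use them
doc.add(new Field("contents", text, Field.Store.YES, Field.Index.ANALYZED,
        Field.TermVector.WITH_POSITIONS_OFFSETS));
writer.addDocument(doc);
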
Hi,
Thanks for your useful comments.
Here is how I could do what I want with the highlighter that works with Lucene 3:

QueryScorer scorer = new QueryScorer(query, reader, "contents");
Highlighter highlighter = new Highlighter(scorer);
String fragment = highlighter.get
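For reference, a minimal end-to-end sketch of that approach (assuming analyzer
is the analyzer used at indexing time and text holds the stored value of the
"contents" field for the hit being highlighted):

QueryScorer scorer = new QueryScorer(query, reader, "contents");
Highlighter highlighter = new Highlighter(scorer);
// build fragments sized around the matching spans
highlighter.setTextFragmenter(new SimpleSpanFragmenter(scorer));
String fragment = highlighter.getBestFragment(analyzer, "contents", text);
System.out.println(fragment);
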
Hi Uwe,
Thanks for your answer. I am now using lucene-highlighter-3.0.3, but the
problem is that I get this error:
"SpanScorer cannot be resolved to a type"
> SpanScorer scorer = new SpanScorer(query, fieldName, new
> CachingTokenFilter(stream));
I checked the classpath and there were no old versions
Hi,
I have a problem with the Lucene highlighter: I couldn't make it run. It
compiles without errors, but when I run it I get this error: "Exception
in thread "main"
java.lang.NoSuchMethodError:org.apache.lucene.analysis.TokenStream.next(Lorg/apache/lucene/analysis/Token;)Lorg/apache/lucene/analys
I have no problem with indexing performance: I indexed the 60 000 sentences
(as text files) in only a few minutes.
The performance problem is splitting the huge file that contains the 60 000
sentences into 60 000 text files just so I can have a sentence-level index.
I asked whether I could read the one huge file
I can save the sentences in the Lucene index as an extra field, which I can
call, for example, "sentence_content".
I am interested in searching at the sentence level.
It is a parallel corpus: each sentence in the first language corresponds to a
sentence in the second language. I want to index each sentence and give each
sentence an ID so that when I retrieve it I can easily retrieve its
equivalent in the other language.
Hi,
I have one text file that contains 60 000 sentences. Is it possible to index
this file sentence by sentence, where each sentence is treated as one
document? What I do now is split the huge text file into 60 000 sentence
files and then index them. This is not easy because I have a few huge files.
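One way to avoid splitting the file at all is to read it line by line and add
one Document per sentence. A sketch, assuming Lucene 3.0.x, one sentence per
line, and illustrative field names (the "id" field would also let you line up
the parallel sentences mentioned earlier):

IndexWriter writer = new IndexWriter(FSDirectory.open(new File("index")),
        new StandardAnalyzer(Version.LUCENE_30), true, IndexWriter.MaxFieldLength.UNLIMITED);
BufferedReader br = new BufferedReader(
        new InputStreamReader(new FileInputStream("corpus.txt"), "UTF-8"));
String sentence;
int id = 0;
while ((sentence = br.readLine()) != null) {
    if (sentence.trim().length() == 0) continue;   // skip blank separator lines, if any
    Document doc = new Document();
    // sentence number, stored so the parallel sentence can be looked up later
    doc.add(new Field("id", Integer.toString(id++), Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.add(new Field("sentence_content", sentence, Field.Store.YES, Field.Index.ANALYZED));
    writer.addDocument(doc);
}
br.close();
writer.close();
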
thanks for your kind answer
thanks for your reply
thanks for your reply
Hi,
Currently my source text files (800 000 of them) are stored in a folder,
which makes retrieving them for many users somewhat slow. I heard it might be
possible to store the content of these files in the index itself, although I
found this unrealistic.
Is it possible to store the source text files' content in the index?
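It is possible; stored fields are meant for exactly this. A minimal sketch of
the indexing side (the field names are illustrative, and writer is assumed to
be an open IndexWriter):

Document doc = new Document();
// the path is stored but not tokenized, so it identifies the file
doc.add(new Field("path", file.getPath(), Field.Store.YES, Field.Index.NOT_ANALYZED));
// Field.Store.YES keeps the full text in the index so it can be read back later
doc.add(new Field("contents", fileText, Field.Store.YES, Field.Index.ANALYZED));
writer.addDocument(doc);

After a search, searcher.doc(scoreDoc.doc).get("contents") returns the stored
text without touching the original file.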
I found the solution:
in WEB-INF\lib there was an old version of the jar, so I replaced it with the
new one.
Hi All,
I have a Java application using Lucene 3.0.3 which runs fine. I wanted to use
a servlet to turn this application into a web application. However, I got
this error:
java.lang.NoSuchMethodError:
org.apache.lucene.store.FSDirectory.open(Ljava/io/File;)Lorg/apache/lucene/store/FSDirectory;
I se
Dear Erick,
thanks a lot, I placed the jar file in WEB-INF\lib and it works.
Best
Hi All,
I am using the MoreLikeThis class in Lucene to find documents in the index
similar to a given one. It works fine when I run it directly from Eclipse,
but when I call it from my servlet I get this error:
"java.lang.NoClassDefFoundError: org/apache/lucene/search/similar/MoreLikeThis"
Thanks for the answer. My request might be simpler.
I will describe it in a basic way:
1- I submit a query.
2- I retrieve the matched documents.
3- From these matched documents I need to have a list of terms ranked by how
frequently they co-occur.
Currently I can do this for the whole index, but I still
Extract the most frequent terms in the search result set.
I need to know how to extract the most frequent terms in the result set
returned after submitting the query.
Here is the class you can use to extract the most frequent terms from the
whole index:
int j = 0;
int numTerms = 5;
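For the whole-index case, a do-it-yourself sketch that does not rely on
HighFreqTerms looks like this (it assumes reader is an open IndexReader and
ranks terms by document frequency in descending order):

int numTerms = 5;
List<Map.Entry<String, Integer>> all = new ArrayList<Map.Entry<String, Integer>>();
TermEnum termEnum = reader.terms();
while (termEnum.next()) {
    // docFreq() is the number of documents containing the term
    all.add(new AbstractMap.SimpleEntry<String, Integer>(
            termEnum.term().text(), termEnum.docFreq()));
}
termEnum.close();
Collections.sort(all, new Comparator<Map.Entry<String, Integer>>() {
    public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
        return b.getValue() - a.getValue();   // descending by document frequency
    }
});
for (int i = 0; i < numTerms && i < all.size(); i++) {
    System.out.println(all.get(i).getKey() + " " + all.get(i).getValue());
}
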
Hi Mic,
I tried it like this:

String indexName = "path";
IndexReader r = IndexReader.open(indexName);
MoreLikeThis mlt = new MoreLikeThis(r);
...
BooleanQuery result = (BooleanQuery) mlt.like(docNum);
result.add(query, BooleanClause.Occur.MUST_NOT);

How can I print the terms?
Hi Mike,
I implemented MoreLikeThis, but I couldn't figure out where or how to print
the terms related to the given query. All I get is the documents relevant to
the query with their scores.
Any idea how to get the related terms?
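One way to see which terms MoreLikeThis selected, rather than only the scored
documents, is to look at the clauses of the BooleanQuery it builds. A sketch,
assuming mlt and docNum as in the earlier snippet:

BooleanQuery likeQuery = (BooleanQuery) mlt.like(docNum);
for (BooleanClause clause : likeQuery.getClauses()) {
    if (clause.getQuery() instanceof TermQuery) {
        // each TermQuery clause carries one of the "interesting" terms
        Term t = ((TermQuery) clause.getQuery()).getTerm();
        System.out.println(t.field() + ":" + t.text());
    }
}

MoreLikeThis also has a retrieveInterestingTerms(docNum) method that returns
the selected terms directly as a String[], which may be simpler here.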
Hi,
I did it as explained on the website:

final Set<Term> terms = new HashSet<Term>();
query = searcher.rewrite(query);
query.extractTerms(terms);
for (Term t : terms) {
    int frequency = searcher.docFreq(t);
}

however I can't understand
Hi Chris,
I tried your solution and got one problem: "the method
extractTerms(Set) is undefined for the type Query".
This is the code:

Query query = QueryParser.parse(line, "contents", analyzer);
//System.out.println("Searching for: " + query.toString("contents"));
Hits hits = s
Hi,
I need to expand the query with the terms that most often occur with it in
the documents. For example, the words credits, tax, and withdraw frequently
appear together with bank. So my query is "Bank" and the result should be a
ranked list of the terms that occur most frequently with "Bank".
I could do that as I explained, but not
I need to find the terms that appear most frequently together with a query.
HighFreqTerms.java can only be used to obtain the high-frequency terms in
the whole index.
I just need to find the high-frequency terms for the submitted query.
What I do now is:
I search the index with the query and retrieve
Hello,
I could successfully use the Chinese analyzer (CJKAnalyzer) to index and
search Chinese text. However, I have a problem when I use the Boolean
operator AND: I always get 0 hits. Searching for the two Chinese terms
without the AND operator is no problem. When I want to count only the
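A common cause of zero hits with AND is that the query is parsed with a
different analyzer than the one used at indexing time, so this is only a
guess. A sketch of parsing the query with the same CJKAnalyzer, assuming
Lucene 3.x and a field called "contents":

Analyzer analyzer = new CJKAnalyzer(Version.LUCENE_30);
QueryParser parser = new QueryParser(Version.LUCENE_30, "contents", analyzer);
parser.setDefaultOperator(QueryParser.Operator.AND);   // implicit AND between terms
Query query = parser.parse(queryText);
System.out.println(query);   // inspect how the Chinese text was tokenized into bigrams
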
Hi,
I am indexing a set of HTML websites using Lucene (IndexHtml). The indexer
works fine and I can also find the indexed terms, but the problem is that
this class (IndexHtml) indexes all text inside the HTML page, even the
advertisements. I am only interested in the body text, not in the
advertisements.
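One possible approach, sketched here with the jsoup HTML parser (an
assumption; any HTML parser would do), is to extract only the text you want
before handing it to Lucene. Dropping advertisement blocks would still need
site-specific selectors on top of this:

org.jsoup.nodes.Document html = org.jsoup.Jsoup.parse(rawHtml);
html.select("script, style").remove();      // drop non-visible content
String bodyText = html.body().text();       // plain text of the <body> only
doc.add(new Field("contents", bodyText, Field.Store.NO, Field.Index.ANALYZED));
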
> - Original Message
>> From: starz10de
>> To: java-user@lucene.apache.org
>> Sent: Friday, July 24, 2009 4:50:22 PM
>> Subject: Cosine similarity
>>
>>
>> Does Lucene use the cosine similarity measure
How can I get the most frequent terms in the index in descending order?
Thanks
Does Lucene use the cosine similarity measure to measure the similarity
between the query and the indexed documents?
Thanks
Hi All,
I am reading the index and printing the index terms and their corresponding
paths.
I can print the index terms, but I don't know whether it is possible to also
print the corresponding paths. I can only print the doc ID, but I need to
print the paths, as is possible with the searcher (a query).
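A sketch of one way to do this, assuming each document was indexed with a
stored "path" field and r is the open IndexReader:

TermEnum terms = r.terms();
while (terms.next()) {
    Term term = terms.term();
    TermDocs docs = r.termDocs(term);          // all documents containing this term
    while (docs.next()) {
        String path = r.document(docs.doc()).get("path");   // stored field of that doc
        System.out.println(term.text() + " -> " + path);
    }
    docs.close();
}
terms.close();
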
>
> You can add to the same field as often as you want and it just appends the
> content of calls 2 to N to the same field.
>
>
> Best
> Erick
>
>
> On Wed, Jul 23, 2008 at 3:42 AM, starz10de <[EMAIL PROTECTED]> wrote:
>
>>
>> Hi Erik,
>>
I am new to Lucene and I don't know how to use "Field.Store.YES" to store the
whole text.
Best regards
Farag
starz10de wrote:
>
> Could any one tell me please how to print the content of the document
> after reading the index.
> for example if
Could anyone tell me please how to print the content of a document after
reading the index?
For example, if I want to print the index terms I do:

IndexReader ir = IndexReader.open(index);
TermEnum termEnum = ir.terms();
while (termEnum.next()) {
    TermDocs dok =
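To print the document content rather than the terms, the field has to have
been stored (Field.Store.YES) at indexing time. Then a sketch like this works,
assuming ir is the open IndexReader and the stored field is called "contents":

for (int i = 0; i < ir.maxDoc(); i++) {
    if (ir.isDeleted(i)) continue;             // skip deleted documents
    Document d = ir.document(i);               // load the stored fields of document i
    System.out.println(i + ": " + d.get("contents"));
}
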
19, 2008, at 6:00 AM, starz10de wrote:
>
>>
>> Hi All,
>>
>> I have a text files that contain several sentences, there is space
>> between
>> each sentence.
>> When searching the index , i get the path for the documents that
>> match the
Hi All,
I have a text file that contains several sentences; there is a space between
each sentence.
When searching the index, I get the path of the documents that match the
query:
String path = doc.get("path");
Is it possible to get the number of the sentence that matches the query
inside the
Hi All,
It might be an easy question, but for someone as new to Lucene as me it is
not that easy. I want to print the text files before indexing them in Lucene.
I tried to do it, but I could only print the index content, where we see the
keywords, document numbers, and frequencies. Besides that, I also need to print
the constructor IndexWriter(String, MyAnalyzer, boolean) is
not defined"
I think there is no problem inside the code of MyAnalyzer.java, because I did
a test where I just changed the name of StandardAnalyzer and then I got the
same error.
Thanks
Farag
Marcelo Schneider wrote:
>
> starz10de esc
Hi All,
I am new to Lucene!
I am trying to write my own analyzer (MyAnalyzer) in Lucene. I wrote it and
compiled it, then I added MyAnalyzer.class to the folder
\org\apache\lucene\analysis and then created a new jar file which
contains MyAnalyzer and the other files; then I imported MyAnalyzer in
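For what it's worth, a custom analyzer does not need to be added to the
Lucene jar at all; compiling it in your own package and putting it on the
classpath next to the Lucene jar is enough. A minimal sketch (package name
and filter choice are illustrative, not from the original code):

package my.analysis;

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardTokenizer;

public class MyAnalyzer extends Analyzer {
    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        // tokenize with the standard grammar, then lowercase the tokens
        return new LowerCaseFilter(new StandardTokenizer(reader));
    }
}

It can then be passed to IndexWriter like any other analyzer, for example
new IndexWriter("index", new MyAnalyzer(), true) with the 2.x constructors.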
root-directory' you specify. So what you are trying to do won't work
> unless
> you modify the source to do what you want. It would not be that difficult
> to
> do.
>
> JohnG.
>
> -Original Message-
> From: starz10de [mailto:[EMAIL PROTECTED]
> Sen
possible for
Lucene to index multiple folders at the same time and put them in several
indexes?
Thanks
John Griffin-3 wrote:
>
> Starz,
>
> How about your code so we can see what you are doing? We're flying blind
> here.
>
> John G.
>
> -----Original Message-
Hi all,
I am new to Lucene. Is it possible to index files from different folders in
Lucene?
For example, I have two folders, a and b, each containing several files.
In the Lucene args I wrote c:\a\ and c:\b\, but it only indexes the files in
folder a and doesn't index any files from folder b.
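Since the demo indexer only walks the single root directory it is given, one
workaround is to drive a single IndexWriter over several roots yourself. A
sketch with illustrative field names (Lucene 2.x style constructors are
assumed, to match the era of this thread):

IndexWriter writer = new IndexWriter("c:\\index", new StandardAnalyzer(), true);
indexFolder(writer, new File("c:\\a"));
indexFolder(writer, new File("c:\\b"));
writer.optimize();
writer.close();

static void indexFolder(IndexWriter writer, File dir) throws IOException {
    for (File f : dir.listFiles()) {
        if (f.isDirectory()) {
            indexFolder(writer, f);                       // recurse into subfolders
        } else {
            Document doc = new Document();
            doc.add(new Field("path", f.getPath(), Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.add(new Field("contents", new FileReader(f)));
            writer.addDocument(doc);
        }
    }
}
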
Hello all,
I am printing the Lucene index content and I succeeded, but I don't know how
to print the indexed file names.
System.out.println(dok.doc());
Here it prints the doc ID, but I need the document name. For example, for doc
ID = 1 the file name is F1; how do I print the file name F1?
Thanks
karl wettin-3 wrote:
>
>
> 3 mar 2007 kl. 23.18 skrev starz10de:
>
>>>>>
>>>>> IndexReader ir = IndexReader.open("index");
>>>>>
>>>>> TermEnum terms=ir.terms();
>>>>>
>>>>>
karl wettin-3 wrote:
>
>
> 3 mar 2007 kl. 22.31 skrev starz10de:
>
>>>
>>> hi Karl ,
>>>
>>> but the problem is that getReader is not defined for the type
>>> IndexReader!
>>>
>>>
>>> this is my co
karl wettin-3 wrote:
>
>
> 3 mar 2007 kl. 21.25 skrev starz10de:
>>> how i can implement aprioriIndex ?
>
> Oh sorry. That should just be your IndexReader.
>
> --
> karl
>
> hi Karl ,
>
> but the problem is that the getReader is not defined f
karl wettin-3 wrote:
>
>
> 3 mar 2007 kl. 17.06 skrev starz10de:
>
>>
>> I did try this but it is still not working
>>
>> IndexReader ir = IndexReader.open("index");
>>
>> TermDocs dok=ir.termDocs();
>> while (dok.nex
karl wettin-3 wrote:
>
>
> 3 mar 2007 kl. 13.54 skrev starz10de:
>
>> How i can print the index content in order to use them for some
>> application.
>> I did use
>> TermEnum terms=ir.terms();
>> while (terms.next()) {
>>
Hi all,
How can I print the index content in order to use it for some application?
I used:

TermEnum terms = ir.terms();
while (terms.next()) {
    System.out.println(terms.term().text());
}

I still need to print the document ID and the term frequency inside each
document.
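A sketch that adds the document IDs and per-document frequencies, assuming ir
is the same open IndexReader as above:

TermEnum terms = ir.terms();
while (terms.next()) {
    Term term = terms.term();
    System.out.println(term.text());
    TermDocs docs = ir.termDocs(term);
    while (docs.next()) {
        // doc() is the internal document ID, freq() the term frequency in that document
        System.out.println("  doc=" + docs.doc() + " freq=" + docs.freq());
    }
    docs.close();
}
terms.close();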