Can anybody provide me some information about this? Even a small clue would help. I'm
kind of stuck on this, and the owner of the library does not answer emails.
Thanks
On 28 April 2011 13:49, Patrick Diviacco wrote:
Does Okapi BM25 (its implementation in Lucene:
nlp.uned.es/~jperezi/Lucene-BM25) return normalized query scores (between
0 and 1)?
According to the Okapi formula, the final score should be normalized. Could you
give some information about that?
thanks
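For reference, here is a sketch of the textbook Okapi BM25 weight for a single query term, assuming the common defaults k1 = 1.2 and b = 0.75 and a non-negative idf variant. I haven't checked which constants or idf form the Lucene-BM25 library linked above actually uses. Each term contributes idf times a saturating tf factor (which stays below k1 + 1), and a query's score is the sum over its terms, so nothing in the formula itself caps the result at 1. Bm25Sketch is a hypothetical name.

```java
// Textbook Okapi BM25 weight for ONE query term in ONE document.
// Assumptions: k1 = 1.2, b = 0.75 and the non-negative idf variant below;
// the Lucene-BM25 library may use different constants or idf.
class Bm25Sketch {
    static final double K1 = 1.2;
    static final double B = 0.75;

    // idf = ln(1 + (N - n + 0.5) / (n + 0.5)); positive whenever n <= N.
    static double idf(long numDocs, long docFreq) {
        return Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Saturating tf factor: grows with tf but stays below K1 + 1.
    static double tfPart(double tf, double docLen, double avgDocLen) {
        double lengthNorm = 1.0 - B + B * (docLen / avgDocLen);
        return tf * (K1 + 1.0) / (tf + K1 * lengthNorm);
    }

    // Per-term score; a query's BM25 score is the SUM of these over its terms.
    static double termScore(double tf, long numDocs, long docFreq,
                            double docLen, double avgDocLen) {
        return idf(numDocs, docFreq) * tfPart(tf, docLen, avgDocLen);
    }
}
```

For example, termScore(3, 1000, 1, 100, 100) (a term occurring 3 times in an average-length document, with docFreq 1 out of 1,000 docs) comes out around 10, well above 1. Any [0, 1] scores would therefore have to come from an extra normalization step in the implementation.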
Never mind, I've solved it by indexing the fields with Field.TermVector.YES:
doc.add(new Field("tags", "foo bar", Field.Store.NO, Field.Index.ANALYZED,
Field.TermVector.YES));
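Why the vector was null: as far as I know, Lucene 3.x only stores a term vector for a field indexed with a TermVector option other than NO, and getTermFreqVector returns null otherwise. Conceptually, what gets stored per document and field is each distinct term with its frequency. The toy below (plain Java, not Lucene code; TermVectorToy is a hypothetical name) builds that map for a whitespace-split field, roughly what WhitespaceAnalyzer would produce.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model (plain Java, not Lucene) of a term vector for one document field:
// each distinct token with its frequency. Splitting on whitespace loosely
// mimics WhitespaceAnalyzer; real analyzers may tokenize differently.
class TermVectorToy {
    static Map<String, Integer> termFreqs(String fieldText) {
        Map<String, Integer> freqs = new TreeMap<>(); // kept sorted by term
        for (String token : fieldText.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                freqs.merge(token, 1, Integer::sum); // count occurrences
            }
        }
        return freqs;
    }
}
```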
On 21 April 2011 10:57, Patrick Diviacco wrote:
Hi,
For every document, the TermFreqVector is always null.
I'm sure the documents are in the collection and the field exists. So where
is the problem?
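for (int i = 0; i < reader.maxDoc(); i++) { // doc ids run up to maxDoc() - 1
    if (reader.isDeleted(i)) continue; // skip deleted documents
    TermFreqVector tfv = reader.getTermFreqVector(i, "tags"); // null if no term vector was stored
}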
thanks
ack-trace?
> Also, the query.toString()
>
> --
> Anshum Gupta
> http://ai-cafe.blogspot.com
>
>
> On Tue, Apr 19, 2011 at 7:40 PM, Patrick Diviacco <
> patrick.divia...@gmail.com> wrote:
I get the following error message, java.lang.UnsupportedOperationException,
with the Lucene search method:
topDocs = searcher.search(booleanQuery, null, 100);
I'm using an old version of Lucene, Lucene 2.4.1 (I cannot upgrade!).
Can you help me understand why I get this error?
thanks
This is the c
I've also tried older Lucene versions, Lucene 3.1 and Lucene 2.9.4, with no luck.
Thanks
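The IncompatibleClassChangeError: Implementing class reported in this thread is thrown while a class is being defined, when it declares that it implements something which, at runtime, turns out to be a class rather than an interface. One common cause (an assumption here, not a confirmed diagnosis) is mixing jars built against different Lucene versions, e.g. the BM25 jar compiled against one release and a different lucene-core on the classpath. A stdlib-only way to check where a class is actually loaded from (ClassOrigin is a hypothetical name):

```java
import java.security.CodeSource;

// Diagnostic sketch: report which jar or directory a class was loaded from,
// to spot two versions of the same library (e.g. two Lucene jars) on the classpath.
class ClassOrigin {
    static String locationOf(Class<?> c) {
        CodeSource source = c.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            // JDK classes loaded by the bootstrap loader have no code source.
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }
}
```

For instance, printing ClassOrigin.locationOf(org.apache.lucene.search.Query.class) from inside the application shows which Lucene jar is really in use.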
On 19 April 2011 14:48, Patrick Diviacco wrote:
Hi, I get this error:
Exception in thread "main" java.lang.IncompatibleClassChangeError: Implementing class
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:632)
at java.lang.ClassLoader.defineClass(ClassLoader.java:616)
at java.securit
I'm using the BM25 Okapi query from here:
http://nlp.uned.es/~jperezi/Lucene-BM25/
I have a quick question. I have a list of tags, "tag1 tag2 tag3", and I'm
currently passing them to the query in this way:
BM25BooleanQuery okapiQuery = new BM25BooleanQuery("tag1 tag2 tag3",
"tags",
new WhitespaceAnalyzer(
Is there a way to update only one document in the index rather than reindexing
the entire collection every time there is a change?
Given a specific field of my indexed document (which I use as an ID), how can I
select that document and update its other fields?
thanks
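In Lucene 3.x, IndexWriter.updateDocument(Term, Document) deletes every document containing the given term (e.g. your ID field) and then adds the new document, so you pass the complete new version of the document, not just the changed field. The toy model below (plain Java, not Lucene; ToyIndex is a hypothetical name) mirrors those delete-then-add semantics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy in-memory model (not Lucene) of "update = delete by id term, then add".
// The replacement document must carry ALL fields, not only the changed one.
class ToyIndex {
    final List<Map<String, String>> docs = new ArrayList<>();

    void addDocument(Map<String, String> doc) {
        docs.add(doc);
    }

    // Deletes every document whose idField equals idValue, then adds newDoc.
    void updateDocument(String idField, String idValue, Map<String, String> newDoc) {
        docs.removeIf(d -> idValue.equals(d.get(idField)));
        docs.add(newDoc);
    }
}
```

With Lucene itself, the corresponding call would be roughly writer.updateDocument(new Term("id", "42"), newDoc), assuming a hypothetical ID field "id" indexed as a single untokenized term (e.g. with Field.Index.NOT_ANALYZED).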
OK, I've now found the RAMDirectory class and I'm using it together with the
IndexWriter... it should be OK now, thanks
On 4 April 2011 13:10, Patrick Diviacco wrote:
> ok Thanks,
>
> When I use IndexWriter, I call addDocument method to add a new instance to
> the index.
Since I need to overwrite an old RAMDirectory and I don't want memory leaks, I
have the following lines, which first close the existing RAMDirectory and then
create a new one:
INDEX_DIR.close();
INDEX_DIR = new RAMDirectory();
However, I get the following exception. Should I remove the close() line
RAMDirectory. The clue is in the name ...
> >
> >
> > --
> > Ian.
> >
> >
> > On Fri, Apr 1, 2011 at 11:08 AM, Patrick Diviacco
> > wrote:
Is there a way to index data in memory, without writing to disk, in Lucene?
This is my current code, which stores the index on disk:
writer = new IndexWriter(FSDirectory.open(index_dir), new
IndexWriterConfig(org.apache.lucene.util.Version.LUCENE_40, new
WhitespaceAnalyzer(org.apache.lucene.util.Version.LUCEN
> >> Plan B.
> >>
> >> Reverse your MUST NOT search to get a list of docids that you don't
> >> want, then loop round Random.nextInt(indexreader.numDocs()), selecting
> >> those that are not deleted (!indexreader.isDeleted(docid)) and are not
>
probably better.
> On Tue, Mar 29, 2011 at 8:00 PM, Patrick Diviacco
> wrote:
> > Ok I've solved the first part of the problem. I'm now selecting all
> > documents that do not contain a given term with a BooleanFilter
> >
2011 20:40, Patrick Diviacco wrote:
Is there a Filter to get a limited number of random documents from the
collection that DO NOT contain a specific term?
E.g. term="pizza"
I want to run the query against 10 random documents of the collection that
do not contain the term "pizza".
thanks
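The "Plan B" quoted above (collect the docids you don't want, then draw Random.nextInt(...) and keep ids that are neither deleted nor unwanted) can be sketched in plain Java. Here maxDoc, deleted and excluded are stand-ins for what IndexReader.maxDoc(), isDeleted() and the reversed MUST NOT search would give you; RandomSampler is a hypothetical name. Drawing from maxDoc() rather than numDocs() is a deliberate change: with deletions, valid docids run up to maxDoc() - 1.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Sketch of "Plan B": draw random docids, keep those that are neither deleted
// nor in the unwanted set (e.g. docs that DO contain "pizza").
class RandomSampler {
    static List<Integer> sample(int maxDoc, Set<Integer> deleted, Set<Integer> excluded,
                                int k, Random rnd) {
        // Count eligible ids first, so the loop never waits for ids that don't exist.
        int eligible = 0;
        for (int id = 0; id < maxDoc; id++) {
            if (!deleted.contains(id) && !excluded.contains(id)) eligible++;
        }
        int target = Math.min(k, eligible);

        Set<Integer> picked = new LinkedHashSet<>(); // distinct ids, draw order kept
        while (picked.size() < target) {
            int id = rnd.nextInt(maxDoc); // uniform in [0, maxDoc)
            if (!deleted.contains(id) && !excluded.contains(id)) picked.add(id);
        }
        return new ArrayList<>(picked);
    }
}
```

Rejection sampling like this is cheap when most documents are eligible; if only a few are, shuffling the list of eligible ids and taking the first k would avoid many wasted draws.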
Never mind, I've compiled it using Ant. Solved, thanks
On 29 March 2011 17:41, Patrick Diviacco wrote:
> Ok, the svn repository I can only find the source files. Should I build the
> jar by myself or is there a packaged jar to download ?
>
> thanks
>
>
> On 29 March
think it is contrib-queries, so should be lucene-queries.jar).
>
> Uwe
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
> > -Original Message-
> > From: Patrick Diviacco [mailto:patrick.di
packaged
> the nightly build and aren't inadvertently getting older jars?
>
> Best
> Erick
>
> On Tue, Mar 29, 2011 at 7:21 AM, Patrick Diviacco
> wrote:
I've downloaded the nightly build of Lucene (TRUNK) and I'm referring to the
following documentation:
https://hudson.apache.org/hudson/view/G-L/view/Lucene/job/Lucene-trunk/javadoc/all/index.html
But I get:
cannot find symbol
symbol : class TermsFilter
location: package org.apache.lucene.search
Hi,
Can I run a query against only a few specific documents of the collection?
Can I filter the indexed collection according to the content of document fields?
For example, I would like to query only documents having field2 = "abc".
thanks
> But do the figuring out first - there is little point in speeding up
> the bit that is already quick.
> On Tue, Mar 29, 2011 at 10:22 AM, Patrick Diviacco
> wrote:
My machine is an Intel Dual Duo Core with 4GB of RAM... is there something wrong
here?
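Before moving the queries into MySQL, it may be worth following the advice quoted above and measuring which part of the 2-hour run is actually slow. A minimal stdlib timer that accumulates wall-clock time per named phase (PhaseTimer and the phase names are hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Minimal sketch for finding where a long batch run spends its time
// (e.g. parsing the queries XML vs. running the searches).
class PhaseTimer {
    private final Map<String, Long> totalsNanos = new LinkedHashMap<>();

    // Runs the work and adds its elapsed wall-clock time to the named phase.
    <T> T time(String phase, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            totalsNanos.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    long millis(String phase) {
        return totalsNanos.getOrDefault(phase, 0L) / 1_000_000;
    }

    Map<String, Long> totalsNanos() {
        return totalsNanos;
    }
}
```

In the batch loop you would wrap each step in time(...), e.g. one phase for reading and parsing a query from the XML file and one for the search call. Checked exceptions such as IOException would need wrapping, since Supplier cannot throw them.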
On 29 March 2011 11:22, Patrick Diviacco wrote:
Hi,
I'm performing multiple queries (stored in a 100MB XML file) against a
collection (indexed with Lucene; it was previously stored in a 100MB XML
file).
The process takes quite long on my machine (more than 2 hours), so I was
wondering if importing the 100MB queries XML file into a MySQL dataset
Hey Uwe, so from your last answer I understand I'm done: no need to do
anything, I can already compare the queries.
However, there is actually a misunderstanding: my BooleanQueries have a
variable number of clauses, because the fields are fixed but the number of
terms per field is not. So, for exampl
Hey Hoss,
thanks for your reply. I thought I had solved the issue following Uwe's advice
(the queries without the coord function were reasonably comparable), but you
have actually reopened it.
So, I need to be sure I'm making them comparable, and I would like to ask the
following.
My BooleanQueries have simil
n change the Similarity to only have the cosine
> similarity left over - if you only want to use that one.
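Assuming the default Similarity (DefaultSimilarity in Lucene 3.x), two factors depend on the query as a whole rather than on the document: coord(overlap, maxOverlap) = overlap / maxOverlap and queryNorm = 1 / sqrt(sumOfSquaredWeights). This is why a variable number of BooleanQuery clauses matters: a document matching exactly one term gets coord 1/2 in a two-clause query but 1/4 in a four-clause query. The arithmetic, as plain Java (QueryFactors is a hypothetical name):

```java
// Arithmetic sketch of two query-dependent factors in Lucene 3.x
// DefaultSimilarity. Both depend on the whole query, so a BooleanQuery
// with more clauses rescales the contribution of the same matching term.
class QueryFactors {
    // coord = fraction of the query's clauses that matched the document.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // queryNorm = 1 / sqrt(sum of squared clause weights).
    static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }
}
```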
scoring:
>
> http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/search/Similarity.html
Hi,
sorry, I already asked a few days ago, but I got no reply and I really need
some help with this.
I'm running several queries against a document collection. The queries are
documents of the collection itself: I need to measure how similar each
document is to the rest of the collection.
Now, Lucene
"));
>
> Unfortunately, you cannot give a charset to FileWriter itself.
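Since FileWriter (before Java 11) always uses the platform default encoding, MacRoman in this case, the usual workaround is to wrap a FileOutputStream in an OutputStreamWriter with an explicit charset. A minimal sketch, assuming Java 7+ for StandardCharsets and try-with-resources (Utf8Writer is a hypothetical name):

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Sketch: write text with an explicit UTF-8 encoding instead of FileWriter,
// which uses the platform default (e.g. MacRoman) on older Java versions.
class Utf8Writer {
    static void writeUtf8(File file, String text) throws IOException {
        try (Writer out = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8))) {
            out.write(text); // the writer is flushed and closed by try-with-resources
        }
    }
}
```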
hich java app are you using?
>
> paul
>
>
> On 28 March 2011 at 09:03, Patrick Diviacco wrote:
When I run my Lucene app and parse an XML file, I get the following error
due to some characters such as "é" in the text file.
If I save the text file as UTF-8 with my text editor, I don't have this
issue, but when I create it with a Java app, it is saved as MacRoman.
How can I specify a differ
from the collection: I compare 1 doc from
the Collection against all other docs.
I need some more info about this...
thanks
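If the goal is a similarity value that is comparable across pairs of documents, one option (a swapped-in technique, not what Lucene's scorer returns) is plain cosine similarity between term-weight vectors, for example built from stored term vectors. With non-negative weights the result always lies in [0, 1]. A stdlib sketch, with Cosine as a hypothetical name and the weighting (raw tf, tf*idf, ...) left to you:

```java
import java.util.Map;

// Sketch: cosine similarity between two sparse term-weight vectors
// (term -> weight). With non-negative weights the result is in [0, 1].
class Cosine {
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) dot += e.getValue() * w; // only shared terms contribute
        }
        double normA = norm(a), normB = norm(b);
        if (normA == 0.0 || normB == 0.0) return 0.0; // empty vector: define as 0
        return dot / (normA * normB);
    }

    private static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double w : v.values()) sum += w * w;
        return Math.sqrt(sum);
    }
}
```

Comparing one document against all the others is then a loop over the collection calling cosine(docVector, otherVector).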
On 26 March 2011 15:57, Patrick Diviacco wrote:
I'm performing several queries and I get a score for each document, which I
have been told is not comparable across queries.
For example, if I get a score of 8.234234 for a specific document from query
A, I cannot compare that score with the document score of 3.342432 from
query B.
However I need
't be 0.
> On Wed, Mar 23, 2011 at 1:38 PM, Patrick Diviacco <
> patrick.divia...@gmail.com> wrote:
>
> > yeah it is clear. However I don't just want all documents, I still want
> to
> > per
high enough for the numDocs to match param)
>
> *Query query = new MatchAllDocsQuery();*
> *searcher.search(query.);*
>
> Hope this clarifies your doubt.
> On Wed, Mar 23, 2011 at 1:14 PM, Patrick Diviacco
ot supported by the used Luke version.
419924 All-text score:0.0018638512
I was expecting the score to be 0 instead.
thanks
On 23 March 2011 08:44, Patrick Diviacco wrote:
> The issue with
>
>
> My confusion about MatchAllDocsQuery is that I cannot specify which terms
> in which fields to search with it. I'm p
ou may have a completely different option that you
> haven't read which someone could advice if they know the exact intent.
>
> Hope this helps.
> On Tue, Mar 22, 2011 at 4:59 PM, Patrick Diviacco <
> patr
s:f14 tags:usm tags:canonef50mmf14 tags:canonef50mmf14usm
I can see the tags field repeated multiple times, so it seems to me that it is
parsed correctly... right?
On 23 March 2011 07:50, Patrick Diviacco wrote:
> Your answer is quite clear, but my question is a bit more specific:
> as you s
d is probably preferred.
>
> On Tue, Mar 22, 2011 at 3:41 AM, Patrick Diviacco
> wrote:
> you only cared about this for debugging. What is the use-case
> for having it on all the time?
>
> On Tue, Mar 22, 2011 at 12:40 PM, Patrick Diviacco
> wrote:
I've been told Searcher.explain() should be used for debugging only because it
slows computations down a lot. Is that true?
On 22 March 2011 14:29, Erick Erickson wrote:
> Try Searcher.explain.
> On Tue, Mar 22, 2011 at 4:34 AM, Patrick Diviacco
> wr
the queries after they're assembled. I believe you'll
> find that the difference is that the PhraseQuery would find text like
> "Term1 Term2 Term3" but not text like "Term1 some stuff Term2 more
> stuff Term3" whereas BooleanQuery would.
ll' documents or only docs matching your query?
> 2. if its about fetching all docs, why not use the matchalldocs query?
> 3. did you try using a collector instead of topdocs?
> On Tue, Mar 22, 2011 at 4:46
w
> this
> > is scored by extending MatchAllDocsQuery and writing a custom scorer.
> >
> > Karl
Is there a way to display Lucene scores per field instead of the global one?
Both my query and my documents have 3 fields.
I would like to see the score for each field in the results. Can I?
Or should I run the query 3 times, once for each field?
thanks
I'm using the following code because I want to see the entire collection in
my query results:
//adding a wildcard term to see all results
rest = new TermQuery(new Term("*","*"));
booleanQuery.add(rest, BooleanClause.Occur.SHOULD);
But it doesn't work: I only see the relevant docs and not all the o
OK, so I'm currently doing this:
booleanQuery.add(new QueryParser(org.apache.lucene.util.Version.LUCENE_40,
"tags", new
WhitespaceAnalyzer(org.apache.lucene.util.Version.LUCENE_40)).parse(phrase[i]),
BooleanClause.Occur.SHOULD);
I just want to add single terms to my booleanQuery. If I pass a q
I'm new to Lucene and I would like to know what the difference is (if there
is any) between
PhraseQuery.add(Term1)
PhraseQuery.add(Term2)
PhraseQuery.add(Term3)
and
term1 = new TermQuery(new Term(...));
booleanQuery.add(term1, BooleanClause.Occur.SHOULD);
term2 = new TermQuery(new Term(...));
bo
One more thing: it is actually not clear to me how to use PhraseQuery... I
thought I could just pass a phrase to it, but I only see the add(Term) method...
should I split the string into single terms myself?
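A toy illustration (plain Java, not Lucene's implementation) of the difference Erick describes: a slop-0 phrase match needs the terms consecutively and in order, while SHOULD clauses match each term anywhere in the field. Splitting on whitespace loosely mimics WhitespaceAnalyzer; MatchToy is a hypothetical name. As far as I know, with Lucene 3.x you either add one Term per token to a PhraseQuery yourself, or let QueryParser build the PhraseQuery from quoted input such as "term1 term2" (with the double quotes included in the query string).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy matcher contrasting phrase matching with SHOULD-clause matching
// on whitespace-split text. Not Lucene code.
class MatchToy {
    static List<String> tokens(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? List.<String>of() : Arrays.asList(trimmed.split("\\s+"));
    }

    // PhraseQuery-like (slop 0): the terms must appear consecutively, in order.
    static boolean phraseMatches(List<String> doc, List<String> phrase) {
        if (phrase.isEmpty()) return false;
        for (int start = 0; start + phrase.size() <= doc.size(); start++) {
            if (doc.subList(start, start + phrase.size()).equals(phrase)) return true;
        }
        return false;
    }

    // BooleanQuery with SHOULD clauses: how many distinct terms occur anywhere.
    static int shouldClauseMatches(List<String> doc, List<String> terms) {
        Set<String> docTerms = new HashSet<>(doc);
        int matched = 0;
        for (String t : new HashSet<>(terms)) {
            if (docTerms.contains(t)) matched++;
        }
        return matched;
    }
}
```

For "Term1 some stuff Term2 more stuff Term3", the phrase "Term1 Term2 Term3" does not match, while all three SHOULD clauses do.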
On 21 March 2011 18:05, Patrick Diviacco wrote:
> If description field is tokenized/analyzed during indexing you need to use
> PhraseQuery.
>
Uhm, yeah, I'm using a WhitespaceAnalyzer. This is the code I'm using for
indexing:
writer = new IndexWriter(FSDirectory.open(INDEX_DIR), new
IndexWriterConfig(org.apache.lucene.util.Version.LUCENE_40, new
I'm combining several scores for my queries, computed with Lucene and other
software.
My issue is that I have Lucene scores plus other scores (not related to Lucene)
for each query result.
The other scores are all normalized between 0 and 1.
I need to normalize Lucene scores (over all queries) beca
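One simple way to bring Lucene scores into the same [0, 1] range as the other scores is min-max scaling. Whether to use one global min and max over all queries (as below) or to scale each query's results separately is a design choice the thread doesn't settle; note that rescaling changes the range, not what the raw scores mean. ScoreNormalizer is a hypothetical name.

```java
// Sketch: min-max scaling of raw scores into [0, 1], using one global
// min and max over all the scores passed in (an assumption; per-query
// scaling is the other obvious option).
class ScoreNormalizer {
    static double[] minMax(double[] scores) {
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double s : scores) {
            min = Math.min(min, s);
            max = Math.max(max, s);
        }
        double[] out = new double[scores.length];
        double range = max - min;
        for (int i = 0; i < scores.length; i++) {
            // If all scores are equal, map them to 1.0 rather than divide by 0.
            out[i] = range == 0.0 ? 1.0 : (scores[i] - min) / range;
        }
        return out;
    }
}
```

With the two scores mentioned earlier in this thread (8.234234 and 3.342432) plus a third score of 5.0, the result is 1.0, 0.0 and roughly 0.34.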
I'm new to Lucene. If I use
description = new TermQuery(new Term("description", "my string"));
am I asking Lucene to treat "my string" as a single term?
I actually need each word to be considered separately: should I use PhraseQuery
instead? Or is this correct?
thanks
:07, Simon Willnauer wrote:
> Why do you want to replace the WhitespaceAnalyzer? I don't really
> understand what you are up to.
>
> simon
>
> On Fri, Mar 4, 2011 at 3:21 PM, Patrick Diviacco
> wrote:
Never mind, I've finally solved it.
Now I just need to figure out how to retrieve the per-field scores in my
results.
I need to know how similar each field is. I know I can use explain(),
but it slows down computations...
thanks
On 4 March 2011 21:21, Patrick Diviacco wrote:
> ok tha
20:39, Robert Muir wrote:
> On Fri, Mar 4, 2011 at 2:12 PM, Patrick Diviacco
> wrote:
> > hey Robert,
> >
> > I know there is the documentation, I'm sorry I've confused setSimilarity
> > with setSimilarityProvider.
> >
> > However, my questio
ass implementing the SimilarityProvider
and then implement the get method?
Also, inside the get method, should I check the passed field name and
return different custom Similarity classes?
thanks
Patrick
On 4 March 2011 19:57, Robert Muir wrote:
> On Fri, Mar 4, 2011 at 1:18 PM, Patrick Di
)
thanks
On 3 March 2011 16:34, Robert Muir wrote:
> On Thu, Mar 3, 2011 at 10:25 AM, Patrick Diviacco
> wrote:
What's the best way to replace WhitespaceAnalyzer in this line with the Lucene
4.0 nightly build? Is there a generic analyzer I can use?
writer = new IndexWriter(FSDirectory.open(INDEX_DIR), new
WhitespaceAnalyzer(), true, IndexWriter.MaxFieldLength.LIMITED);
thanks
the modules as JARs):
>
> https://hudson.apache.org/hudson/job/Lucene-Solr-Maven-trunk/lastSuccessfulBuild/artifact/maven_artifacts/org/apache/lucene/
>
> Uwe
I've downloaded the Lucene nightly build because I need to customize the
similarity *per field*.
However, I don't see a field parameter passed to the methods that compute the
score, such as "tf" and "idf"...
How can I implement a different similarity score per document field, then?
thanks
Can I read the javadocs for the Lucene 4.0 nightly build?
How?
thanks
> return sim.lengthNorm(fieldName, numTokens);
>}
> }
> // same for scorePayload. For the others, I just delegate
> // to defaultSimilarity (all I really need is scorePayload in
> // my case).
> }
>
> and in the schema.xml, I just set this class to be the similarity
I need to define different similarity scores per document field.
For example, for field A I want to use Lucene's tf-idf score, for the numeric
field B I want to use a different metric (the difference between values), and
so on...
thanks
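Whatever hook a given Lucene version offers (such as the SimilarityProvider get(field) method discussed above for the nightly build), the idea reduces to looking up a scoring function by field name. A stdlib-only sketch of that dispatch; FieldScorer, PerFieldScoring, the exact-match fallback and the numeric closeness metric 1 / (1 + |q - d|) are all hypothetical stand-ins, not Lucene APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-field scoring function (not a Lucene interface).
interface FieldScorer {
    double score(String queryValue, String docValue);
}

// Dispatches to a scorer by field name, with a fallback for unregistered fields.
class PerFieldScoring {
    private final Map<String, FieldScorer> byField = new HashMap<>();
    private final FieldScorer fallback;

    PerFieldScoring(FieldScorer fallback) {
        this.fallback = fallback;
    }

    PerFieldScoring register(String field, FieldScorer scorer) {
        byField.put(field, scorer);
        return this;
    }

    // Look up the scorer by field name, like a get(field) method would.
    FieldScorer get(String field) {
        return byField.getOrDefault(field, fallback);
    }

    // Example metric for a numeric field: closer values score higher, in (0, 1].
    static double numericCloseness(String q, String d) {
        double diff = Math.abs(Double.parseDouble(q) - Double.parseDouble(d));
        return 1.0 / (1.0 + diff);
    }
}
```

In this sketch field B uses numericCloseness, while field A would get a text score such as tf-idf; an exact-match fallback stands in for it here.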