Re: Question about scoring normalisation

2005-11-06 Thread Otis Gospodnetic
> if so the top score should always be 1.0. Isn't so.

Perhaps you are right.  Can you send some code that shows this,
preferably written as a JUnit test and attached to a JIRA issue?

> Or does boosting multiple individual fields wreck that ?

I didn't think so, but I could be wrong.

Otis

> sameer
> 
> On 11/6/05, Chris Lamprecht <[EMAIL PROTECTED]> wrote:
> > Lucene just takes the highest score returned, and divides all scores
> > by this max_score.  So max_score / max_score = 1.0, and voila.
> >
> >
> --
> Sameer Shisodia  Bangalore
> 
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: require new comment for IndexWriter.mergeFactor

2005-11-06 Thread Otis Gospodnetic
I think the best description of various IndexWriter parameters is in
Lucene in Action (sorry if this sounds like a plug, but it's not the
intention) in section 2.7.1., page 42 (
http://www.lucenebook.com/search?query=mergeFactor )

mergeFactor controls how often Lucene merges segments while indexing.
minMergeDocs (I think it's renamed now) controls how many documents to
buffer in RAM before writing them to disk.

Only the latter affects RAM usage.  Both affect speed (check out the free
code from Lucene in Action at http://lucenebook.com/ to see how
changing various parameters changes indexing performance).
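
To make the distinction concrete, here is a toy Java model of the
incremental merge loop, loosely following the maybeMergeSegments() change
quoted further down in this thread. It is only a sketch, not Lucene code:
the names MergeModel and peakBufferedDocs are made up for illustration,
each added document is assumed to start out as a one-document segment held
in RAM, and every merged segment is assumed to be written to disk.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: MergeModel and peakBufferedDocs are hypothetical names,
// not part of Lucene. Assumes mergeFactor >= 2 and minMergeDocs >= 2.
class MergeModel {

    // Returns the largest number of one-document segments that were
    // waiting in RAM at any point while adding `docs` documents.
    static int peakBufferedDocs(int docs, int mergeFactor, int minMergeDocs) {
        List<Long> segments = new ArrayList<>(); // doc counts, oldest first
        int peak = 0;
        for (int d = 0; d < docs; d++) {
            segments.add(1L); // assumption: a new document is a 1-doc RAM segment

            // Count the trailing one-document segments (the RAM buffer).
            int buffered = 0;
            for (int i = segments.size() - 1; i >= 0 && segments.get(i) == 1L; i--) {
                buffered++;
            }
            peak = Math.max(peak, buffered);

            // The merge target starts at minMergeDocs and grows by mergeFactor.
            long target = minMergeDocs;
            while (true) {
                int min = segments.size();
                long mergeDocs = 0;
                // Walk backwards over segments smaller than the current target.
                while (--min >= 0) {
                    long count = segments.get(min);
                    if (count >= target) {
                        break;
                    }
                    mergeDocs += count;
                }
                if (mergeDocs < target) {
                    break; // nothing to merge at this level
                }
                // Replace the small trailing segments with one merged segment.
                segments.subList(min + 1, segments.size()).clear();
                segments.add(mergeDocs);
                target *= mergeFactor;
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        System.out.println(peakBufferedDocs(1000, 10, 10));
        System.out.println(peakBufferedDocs(1000, 100, 10));
        System.out.println(peakBufferedDocs(1000, 10, 100));
    }
}
```

In this model the peak number of buffered documents follows minMergeDocs
and does not move when only mergeFactor changes; mergeFactor decides how
many segments pile up before they are merged into a larger one.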

Otis


--- Kerang Lv <[EMAIL PROTECTED]> wrote:

> Does IndexWriter.mergeFactor still have the same
> effect on RAM use after the introduction of
> IndexWriter.minMergeDocs?
> 
> minMergeDocs was introduced into IndexWriter
> (revision 1.21 in CVS) in order to control the number of
> Documents merged in RAMDirectory independently of
> mergeFactor (see
> http://issues.apache.org/bugzilla/show_bug.cgi?id=23754).
> IndexWriter.maybeMergeSegments has changed since then:
> 
> @@ -375,7 +385,7 @@
>  
>    /** Incremental segment merger.  */
>    private final void maybeMergeSegments() throws IOException {
> -    long targetMergeDocs = mergeFactor;
> +    long targetMergeDocs = minMergeDocs;
> 
> But the comment on mergeFactor remains unchanged.
> 
> This is the comment on IndexWriter.mergeFactor in 1.2 RC6:
>   /** Determines how often segment indexes are merged by addDocument().  With
>    * smaller values, less RAM is used while indexing, and searches on
>    * unoptimized indexes are faster, but indexing speed is slower.  With larger
>    * values more RAM is used while indexing and searches on unoptimized indexes
>    * are slower, but indexing is faster.  Thus larger values (> 10) are best
>    * for batched index creation, and smaller values (< 10) for indexes that are
>    * interactively maintained.
>    *
>    * This must never be less than 2.  The default value is 10.*/
>   public int mergeFactor = 10;
> 
> 
> And this is the comment in 1.4.3:
>   /** Determines how often segment indices are merged by addDocument().  With
>    * smaller values, less RAM is used while indexing, and searches on
>    * unoptimized indices are faster, but indexing speed is slower.  With larger
>    * values, more RAM is used during indexing, and while searches on unoptimized
>    * indices are slower, indexing is faster.  Thus larger values (> 10) are best
>    * for batch index creation, and smaller values (< 10) for indices that are
>    * interactively maintained.
>    *
>    * This must never be less than 2.  The default value is 10.*/
>   public int mergeFactor = DEFAULT_MERGE_FACTOR;
> 
> Does IndexWriter.mergeFactor still have the same
> effect on RAM use?
> 
> 



Re: lucene and jsp

2005-11-06 Thread Valerio Schiavoni
it seems like the problem is from here:

java.security.AccessControlException: access denied
(java.util.PropertyPermission org.apache.lucene.writeLockTimeout read)

Try setting this property:
System.setProperty("disableLuceneLocks", "true");



2005/11/5, Gaston <[EMAIL PROTECTED]>:
>
> Hello,
>
> I know my question is a little off topic, but I have been trying and
> trying without success. I have a very simple application. I tested it
> on my home PC with Tomcat 3.3.2 and it worked, but on my web hosting
> agency's server it does not work. I put the jar in the right directory
> and so on, so I have no idea why it doesn't work. Perhaps one of you
> has had the same problem and has a hint for me about what the reason
> for the failure might be.
>
> My code:
> <%@ page import="java.io.*,javax.servlet.*,
> javax.servlet.http.*,org.apache.lucene.analysis.Analyzer,
> org.apache.lucene.analysis.standard.StandardAnalyzer,
> org.apache.lucene.document.Document,org.apache.lucene.document.Field,
> org.apache.lucene.index.IndexWriter"
> %>
> <%
>
> try
> {
>     String[] text = { "Indexierung mit Lucene", "Suche mit Lucene" };
>     String indexDir = application.getRealPath("/")+"myindex";
>     Analyzer analyzer = new StandardAnalyzer();
>     boolean create = true;
>
>     IndexWriter writer = new IndexWriter(indexDir, analyzer, create);
>     out.println(indexDir);
>     for (int i = 0; i < text.length; i++)
>     {
>         Document document = new Document();
>         document.add(Field.Text("textfeld", text[i]));
>         writer.addDocument(document);
>         out.println("Es klappt");
>     }
>     writer.close();
>     out.println("hallozwei");
> }
> catch(IOException e)
> {
>     e.printStackTrace();
> }
> catch(Exception e)
> {
>     e.printStackTrace();
> }
>
> %>
>
> Error:
>
> http://gasizwei.meintestaccount.de:9080/gagamodi/indexaufserver.jsp
>
>
> Thank you in advance.
>
> Greetings
>
> Gaston
>
> P.S. I asked this in J2EE forums but the answers I got didn't help me.
>


Re: Is There Other Ports of Nutch?

2005-11-06 Thread Stefan Groschupf

No! Porting Nutch makes no sense in general.
Nutch is not a library like Lucene but a complete, ready-to-use
application that you can download and start.
There is a kind of 'web service' (OpenSearch RSS) that lets you
integrate Nutch search results into third-party applications.


Stefan
... and no, from a performance point of view it also makes no sense to
port Nutch :-)





On 06.11.2005 at 07:22, Victor Lee wrote:


Hi,
  I know that there are several ports of Lucene, like
cLucene, pLucene, etc.  Are there other ports of Nutch
besides java?

Many thanks.






Re: Question about scoring normalisation

2005-11-06 Thread Yonik Seeley
On 11/5/05, Sameer Shisodia <[EMAIL PROTECTED]> wrote:
> if so the top score should always be 1.0. Isn't so.
> Or does boosting multiple individual fields wreck that ?
> sameer

The top score is scaled back to 1.0 *only* if it's greater than 1.0

So hits with scores of 4.0,2.0 will be normalized to 1.0,0.5
while hits of 0.4,0.2 will be normalized to 0.4,0.2
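
As a rough illustration of that rule, here is a small standalone Java
sketch. ScoreNormalizer is a made-up helper, not Lucene's API; it only
assumes that the scores arrive sorted in descending order, so the first
entry is the top score.

```java
import java.util.Arrays;

// Hypothetical helper, not Lucene's API: it mirrors the rule described above.
class ScoreNormalizer {

    // Expects scores sorted in descending order (top hit first).
    static float[] normalize(float[] scores) {
        float norm = 1.0f;
        if (scores.length > 0 && scores[0] > 1.0f) {
            norm = 1.0f / scores[0]; // scale so that the top hit becomes 1.0
        }
        float[] out = new float[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = scores[i] * norm;
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [1.0, 0.5]
        System.out.println(Arrays.toString(normalize(new float[] {4.0f, 2.0f})));
        // prints [0.4, 0.2]
        System.out.println(Arrays.toString(normalize(new float[] {0.4f, 0.2f})));
    }
}
```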

-Yonik




Re: Question about scoring normalisation

2005-11-06 Thread Karl Koch
Hello Ira,

I am not sure I know exactly what pivoted normalisation is. I can tell
you what I do; in the meantime I will have a look at your paper, and I hope
that we can discuss this issue further.

I work in personalised searching, where a second model - a user model -
expresses extra relevance of a document, but from a different
perspective. I also value the use of TF/IDF to represent the content-side
view of relevance. Therefore, I would like to combine both scores
(the one from Lucene and the one from my model) into a single score.
Furthermore, I need to normalise this score so that it cannot exceed
1.0. At the moment, I have a visualisation scheme that relies on this premise.

In Lucene, scores seem to be normalised only if they exceed the 1.0 maximum
border. If they do not exceed this max value, they are left where they are. 

Based on that, I have now solved my problem: I combine my two scores (Lucene
and mine) and, if they exceed 1.0, I normalise them the way Lucene does. I
think this is the most accurate thing I could do in this case, since it does not
violate the overall meaning of scoring for the user.

I realise that if I always normalise (to a 0.0 to 1.0 range) I will
introduce a dangerous feature that basically boosts a bunch of low scored
documents (e.g. all between 0.1 and 0.2) unnecessarily high (in this case
right up to between 0.5 and 1.0).
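
A minimal Java sketch of the two options, to show the difference in
numbers. The class and method names are made up for illustration, and
combining the two scores by simple addition is only an assumption for this
example; any other combination would be normalised the same way.

```java
// Hypothetical sketch: class and method names are illustrative only.
class CombinedScore {

    // Assumption: the Lucene score and the user-model score are simply added.
    static double[] combine(double[] lucene, double[] user) {
        double[] out = new double[lucene.length];
        for (int i = 0; i < lucene.length; i++) {
            out[i] = lucene[i] + user[i];
        }
        return out;
    }

    static double max(double[] scores) {
        double m = 0.0;
        for (double s : scores) {
            m = Math.max(m, s);
        }
        return m;
    }

    static double[] divideBy(double[] scores, double divisor) {
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = scores[i] / divisor;
        }
        return out;
    }

    // Scales down only when the best score exceeds 1.0.
    static double[] normalizeIfAbove(double[] scores) {
        double m = max(scores);
        return m > 1.0 ? divideBy(scores, m) : scores.clone();
    }

    // Always stretches the scores so that the best one becomes 1.0.
    static double[] normalizeAlways(double[] scores) {
        double m = max(scores);
        return m > 0.0 ? divideBy(scores, m) : scores.clone();
    }

    public static void main(String[] args) {
        double[] low = {0.1, 0.2};
        System.out.println(java.util.Arrays.toString(normalizeIfAbove(low)));
        System.out.println(java.util.Arrays.toString(normalizeAlways(low)));
    }
}
```

With the low scores 0.1 and 0.2, the conditional variant leaves them
untouched, while the unconditional one lifts them to 0.5 and 1.0.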

What would you say about that? Does it make sense?

Kind Regards,
Karl


> --- Original Message ---
> From: Ira Goldstein <[EMAIL PROTECTED]>
> To: "Karl Koch" <[EMAIL PROTECTED]>
> Subject: Re: Question about scoring normalisation
> Date: Sun, 06 Nov 2005 08:08:59 -0500
> 
> Karl --
>   Hi.  I've been thinking about adding pivoted normalization to Lucene (see
> attached paper).  I've just started to look at the current code to see how
> the tf-idf (sum of squares?) has been implemented.  I wanted to do that
> before beginning any coding.
>   Is pivoted normalization the sort of thing you were asking about?
>   Take care
> --Ira
> 
> -- Original Message --
> Received: Sat, 05 Nov 2005 03:26:12 PM EST
> From: "Karl Koch" <[EMAIL PROTECTED]>
> To: "User " 
> Subject: Question about scoring normalisation
> 
> > Hello all,
> > 
> > I am wondering how many of you actually work with your own scoring mechanism
> > (overriding Lucene's standard scoring) and how many of you work on how to
> > normalise this score.
> > 
> > I would like to add a second score on top of Lucene's TF/IDF score. The
> > resulting score is most likely higher than 1.0. However, the score should be
> > between 0.0 and 1.0. What is the best way to do that? If Lucene is
> > normalising its score (if no boosting is applied) to a maximum of 1.0, how
> > is this done (in Lucene 1.2 and/or beyond)?
> > 
> > Regards,
> > Karl
> > 
> > 



Re: Multiple terms with the same position in PhraseQuery

2005-11-06 Thread Ahmed El-dawy
I have used PhrasePrefixQuery, but it has some problems. It
sometimes throws an exception (OperationNotAllowed) when it is added to
a boolean query with the required flag set. It also sometimes throws a
null pointer exception and I don't know why.
  I am trying to get the latest version (from SVN) but I can't.
I need your help. How can I check out the project?
  I use Eclipse and it asks me for this information:
Host: -
Repository Path: -
User: -
Password: --
Connection type: (pserver | ext | extssh)
port: (default | custom)

Thanks very much

On 11/5/05, Erik Hatcher <[EMAIL PROTECTED]> wrote:
> On 4 Nov 2005, at 23:08, Ahmed El-dawy wrote:
> > BTW, I think there's a newer version of Lucene that I can't get, my
> > version is 1.4.3 and I didn't find any newer version at the site. For
> > example, the QueryParser in my version doesn't care with term position
> > and I had to override it by myself to support this.
> > You may be referring to the CVS version, but I want to release my app.
> > with a stable version.
>
> For the record, Subversion trunk (no longer CVS) is stable and being
> used in many production projects already.
>
> The only difference between Subversion trunk and a released version
> is the time and effort someone has taken to build it, package it,
> sign it, and upload it (and of course a consensus vote authorizing
> it).  While I know that many environments demand that such blessing
> has occurred, I cannot say that I altogether understand it.  I much
> prefer, personally, to be on the trunk and know that any issues I do
> happen to encounter can be easily reported, likely fixed if
> identified specifically enough, and integrated back into my
> projects right away.
>
> I certainly do feel a bit bad that I'm not personally being
> aggressive about pushing a new release, but please don't let my
> insane schedule hold you back from using the latest and best version
> of Lucene.
>
> Erik
>
>


--
regards,
Ahmed Saad




Re: Multiple terms with the same position in PhraseQuery

2005-11-06 Thread Erik Hatcher


On 6 Nov 2005, at 16:13, Ahmed El-dawy wrote:


I have used PhrasePrefixQuery, but it has some problems. It
sometimes throws an exception (OperationNotAllowed) when it is added to
a boolean query with the required flag set. It also sometimes throws a
null pointer exception and I don't know why.
  I am trying to get the latest version (from SVN) but I can't.
I need your help. How can I check out the project?
  I use Eclipse and it asks me for this information:
Host: -
Repository Path: -
User: -
Password: --
Connection type: (pserver | ext | extssh)
port: (default | custom)


It looks like this is a CVS dialog as none of those connection types  
apply.  Subversion is over HTTP(S).


There are lots of details about accessing the Subversion repository  
at Apache here:




Nothing Eclipse-specific that I've found, but this is the command
line used in a shell:


svn co http://svn.apache.org/repos/asf/lucene/java/trunk/ lucene

Erik





Re: Multiple terms with the same position in PhraseQuery

2005-11-06 Thread Chris Hostetter
:  
:
: Nothing Eclipse-specific that I've found, but this is the command
: line used in a shell:
:
:  svn co http://svn.apache.org/repos/asf/lucene/java/trunk/ lucene

I've never used eclipse, but a google search for "subversion eclipse"
lists many promising options.


-Hoss





Best Way To Index Database Using Lucene?

2005-11-06 Thread Victor Lee
Hi,
  I use PHP and MySQL.  Visitors enter data
through the web and the data is stored in the
database.  I want to make portions of that data
searchable using Lucene.

I am thinking of giving that data to Lucene for
indexing at the same time as inserting it
into the database.  Is that a good idea?  Or should I
just have PHP insert all the data into the MySQL db
without indexing, and then just periodically update the
index?

What's your suggestion?   Many thanks.







Re: Best Way To Index Database Using Lucene?

2005-11-06 Thread Victor Lee
I forgot to mention that if I use php-java-bridge to
have Lucene index the data at the same time as I insert it
into the MySQL db, I don't even need to use JDBC.  If
I index inside the business logic layer, which is
Java, then I will have to use JDBC.

--- Victor Lee <[EMAIL PROTECTED]> wrote:

> Hi,
>   I use PHP and MySQL.  Visitors enter data
> through the web and the data is stored in the
> database.  I want to make portions of that data
> searchable using Lucene.
> 
> I am thinking of giving that data to Lucene for
> indexing at the same time as inserting it
> into the database.  Is that a good idea?  Or should I
> just have PHP insert all the data into the MySQL db
> without indexing, and then just periodically update
> the index?
> 
> What's your suggestion?   Many thanks.
> 
> 



Re: Multiple terms with the same position in PhraseQuery

2005-11-06 Thread Ken Krugler

I have used PhrasePrefixQuery, but it has some problems. It
sometimes throws an exception (OperationNotAllowed) when it is added to
a boolean query with the required flag set. It also sometimes throws a
null pointer exception and I don't know why.
  I am trying to get the latest version (from SVN) but I can't.
I need your help. How can I check out the project?
  I use Eclipse and it asks me for this information


Subclipse is the Eclipse plug-in for Subversion. Though I've found 
that things work best if I:


a. Use the command line or a client like SmartSVN to get the project 
files - and I don't put them into the Eclipse Workspace directory.


b. Then launch Eclipse and create a new Java project, importing the 
files from the external (SVN-controlled) location.


-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-470-9200




Re: Best Way To Index Database Using Lucene?

2005-11-06 Thread Manoj Kr. Sheoran
It depends on your requirements. If you want real-time searching, then you
should go with your first approach; otherwise the second is fine.
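
To show the trade-off, here is a toy Java sketch in which a List stands in
for the MySQL table and a Map stands in for the Lucene index. All names
(IndexingDemo, insert, refreshIndex, search) are made up for illustration;
no real Lucene or JDBC calls are involved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins: a List plays the database, a Map plays the index.
class IndexingDemo {
    private final List<String> database = new ArrayList<>();
    private final Map<String, Set<Integer>> index = new HashMap<>();
    private final boolean indexOnInsert;
    private int indexedUpTo = 0; // rows [0, indexedUpTo) are already indexed

    IndexingDemo(boolean indexOnInsert) {
        this.indexOnInsert = indexOnInsert;
    }

    // Stores a row; in the first approach it is indexed in the same step.
    void insert(String text) {
        database.add(text);
        if (indexOnInsert) {
            refreshIndex();
        }
    }

    // Second approach: call this from a periodic job to index the new rows.
    void refreshIndex() {
        for (; indexedUpTo < database.size(); indexedUpTo++) {
            String row = database.get(indexedUpTo).toLowerCase();
            for (String word : row.split("\\s+")) {
                index.computeIfAbsent(word, w -> new HashSet<>()).add(indexedUpTo);
            }
        }
    }

    // Returns the row numbers that contain the given word.
    Set<Integer> search(String word) {
        return index.getOrDefault(word.toLowerCase(), new HashSet<>());
    }

    public static void main(String[] args) {
        IndexingDemo batch = new IndexingDemo(false);
        batch.insert("hello lucene");
        System.out.println(batch.search("lucene")); // nothing until the next refresh
        batch.refreshIndex();
        System.out.println(batch.search("lucene")); // now the row is found
    }
}
```

With indexOnInsert set to true a row is searchable as soon as it is stored;
with false it only shows up after the next refreshIndex() run.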

--Manoj


- Original Message - 
From: "Victor Lee" <[EMAIL PROTECTED]>
To: 
Sent: Monday, November 07, 2005 7:21 AM
Subject: Re: Best Way To Index Database Using Lucene?


> I forgot to mention that if I use php-java-bridge to
> have Lucene index the data at the same time as I insert it
> into the MySQL db, I don't even need to use JDBC.  If
> I index inside the business logic layer, which is
> Java, then I will have to use JDBC.
>
> --- Victor Lee <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >   I use PHP and MySQL.  Visitors enter data
> > through the web and the data is stored in the
> > database.  I want to make portions of that data
> > searchable using Lucene.
> >
> > I am thinking of giving that data to Lucene for
> > indexing at the same time as inserting it
> > into the database.  Is that a good idea?  Or should I
> > just have PHP insert all the data into the MySQL db
> > without indexing, and then just periodically update
> > the index?
> >
> > What's your suggestion?   Many thanks.
> >





Re: lucene and jsp

2005-11-06 Thread Manoj Kr. Sheoran
You should check your logs on the server side (in Tomcat). I am sure an
exception is being logged there.
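
The catch blocks in the quoted JSP call e.printStackTrace(), which writes
to the standard error stream rather than to the page, so the browser shows
nothing. If the logs are hard to reach, one option is to print the trace
into the response instead. A minimal sketch: ErrorReport is a made-up
helper, and a StringWriter stands in here for the JSP's implicit out
object, which is also a java.io.Writer.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical helper; in a JSP the Writer would be the implicit `out` object.
class ErrorReport {

    // Writes the full stack trace to the given writer instead of System.err.
    static void report(Throwable t, Writer out) {
        PrintWriter pw = new PrintWriter(out);
        t.printStackTrace(pw);
        pw.flush();
    }

    public static void main(String[] args) {
        StringWriter page = new StringWriter(); // stands in for the JSP `out`
        try {
            throw new IOException("cannot create index directory");
        } catch (Exception e) {
            report(e, page);
        }
        System.out.println(page);
    }
}
```

Inside the JSP this would amount to calling report(e, out) in each catch
block, so that the exception type and message appear in the browser.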


--Manoj

- Original Message - 
From: "Gaston" <[EMAIL PROTECTED]>
To: 
Sent: Saturday, November 05, 2005 10:51 PM
Subject: lucene and jsp


> Hello,
>
> I know my question is a little off topic, but I have been trying and
> trying without success. I have a very simple application. I tested it
> on my home PC with Tomcat 3.3.2 and it worked, but on my web hosting
> agency's server it does not work. I put the jar in the right directory
> and so on, so I have no idea why it doesn't work. Perhaps one of you
> has had the same problem and has a hint for me about what the reason
> for the failure might be.
>
> My code:
> <%@ page import="java.io.*,javax.servlet.*,
> javax.servlet.http.*,org.apache.lucene.analysis.Analyzer,
> org.apache.lucene.analysis.standard.StandardAnalyzer,
> org.apache.lucene.document.Document,org.apache.lucene.document.Field,
> org.apache.lucene.index.IndexWriter"
> %>
> <%
>
> try
> {
>     String[] text = { "Indexierung mit Lucene", "Suche mit Lucene" };
>     String indexDir = application.getRealPath("/")+"myindex";
>     Analyzer analyzer = new StandardAnalyzer();
>     boolean create = true;
>
>     IndexWriter writer = new IndexWriter(indexDir, analyzer, create);
>     out.println(indexDir);
>     for (int i = 0; i < text.length; i++)
>     {
>         Document document = new Document();
>         document.add(Field.Text("textfeld", text[i]));
>         writer.addDocument(document);
>         out.println("Es klappt");
>     }
>     writer.close();
>     out.println("hallozwei");
> }
> catch(IOException e)
> {
>     e.printStackTrace();
> }
> catch(Exception e)
> {
>     e.printStackTrace();
> }
>
> %>
>
> Error:
>
> http://gasizwei.meintestaccount.de:9080/gagamodi/indexaufserver.jsp
>
>
> Thank you in advance.
>
> Greetings
>
> Gaston
>
> P.S. I asked this in J2EE forums but the answers I got didn't help me.
>