implementation of lucene into opencms

2007-03-30 Thread mohamed hadj taieb

Hi,
I have set up Lucene with Tomcat.
The demo application provides a form for entering a search word, and when
the search runs it returns the file-system path of each page that contains
the word, like this:
Document Summary  C:\Program Files\Apache Software Foundation\Tomcat 5.5\webapps\jsp-examples\jsp2\simpletag\hello.jsp null
But I want it to give me the result as a URL, like this:
http://localhost/jsp-examples/jsp2/simpletag/hello.jsp


If you have any ideas about this, it would help me.
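
One possible way to turn the indexed file path into a URL is to strip the Tomcat webapps directory from the front of the path and append the rest to the server's base URL. The sketch below is plain Java with no Lucene calls; the class name PathToUrl, the method toUrl, and the base URL http://localhost are assumptions made for illustration, and it does not URL-encode special characters in the relative part.

```java
public class PathToUrl {
    /**
     * Maps a file path under the Tomcat webapps directory to the URL it
     * would be served from. Hypothetical helper, not part of Lucene or Tomcat.
     */
    static String toUrl(String filePath, String webappsRoot, String baseUrl) {
        // Normalise Windows separators so the prefix check is separator-independent.
        String path = filePath.replace('\\', '/');
        String root = webappsRoot.replace('\\', '/');
        if (!root.endsWith("/")) {
            root = root + "/";
        }
        if (!path.startsWith(root)) {
            throw new IllegalArgumentException("Not under webapps root: " + filePath);
        }
        // Everything after the webapps directory is the context path plus the page.
        String relative = path.substring(root.length());
        String base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        return base + relative;
    }

    public static void main(String[] args) {
        String root = "C:\\Program Files\\Apache Software Foundation\\Tomcat 5.5\\webapps";
        String file = root + "\\jsp-examples\\jsp2\\simpletag\\hello.jsp";
        System.out.println(toUrl(file, root, "http://localhost"));
    }
}
```

The conversion could be applied either when the path is stored in the index or when the hit is displayed; which is better depends on whether the server address is known at indexing time.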

My real objective, though, is to integrate Lucene into OpenCms.
Is there any documentation on how to do this,
and do you have any recommendations? If there are simple examples, just
for testing, I would be happy to get them.
My Skype ID: mohamed.hadj.taieb
Thanks a lot


Re: normalized scores

2007-03-30 Thread Donna L Gresh
I'm well aware that some queries will return no results due to my
filtering by 0.3. That's the point. I expect that some of my input
queries will not be a good match to *any* of the documents in my
second index.

I'm really doing something much like the "Books Like This" example in
Chapter 5 of Lucene in Action (which I saw after I wrote this).
It is unfortunate that some scores are being normalized and some may not
be. Is there a way to obtain the unnormalized score?


Donna Gresh





Chris Hostetter <[EMAIL PROTECTED]> wrote on 03/29/2007 06:26 PM
(to java-user@lucene.apache.org, subject "Re: normalized scores"):

: For a given query (for a single input document), the highest score is
: *not* always 1 (which is just how
: I want it). Is this because I am using a Boolean query? Here is my code
: snippet.

the Hits class only normalizes scores if the highest score is greater than
one; if it's less than 1, no normalization happens.
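
The rule just described can be sketched in plain Java as follows. This illustrates the described behaviour only and is not the source of the Hits class; the class name HitsNormalization and the method normalize are made up for the example, and the input is assumed to be sorted with the best hit first.

```java
import java.util.Arrays;

public class HitsNormalization {
    /**
     * Divides every score by the top score, but only when the top score is
     * greater than 1.0; otherwise the raw scores are returned unchanged.
     * rawScores is assumed to be in descending order.
     */
    static float[] normalize(float[] rawScores) {
        float scoreNorm = 1.0f;
        if (rawScores.length > 0 && rawScores[0] > 1.0f) {
            scoreNorm = 1.0f / rawScores[0];
        }
        float[] out = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            out[i] = rawScores[i] * scoreNorm;
        }
        return out;
    }

    public static void main(String[] args) {
        // Top score above 1: every score is scaled so the best hit becomes 1.0.
        System.out.println(Arrays.toString(normalize(new float[] {2.0f, 1.0f, 0.5f})));
        // Top score below 1: scores pass through untouched.
        System.out.println(Arrays.toString(normalize(new float[] {0.8f, 0.4f})));
    }
}
```

Under this rule a fixed cutoff such as 0.3 is applied to rescaled values for some queries and to raw values for others.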

as to your more general question...

: Recent questions about whether/how scores are normalized got me wondering
: how my application (happily) seems to be doing what I want. I have two

it's all a question of what you want ... what you've got is throwing
things out with a score less than 0.3 ... but that's an arbitrary
decision -- there is no mathematical basis for assuming a
document which scores "0.31" against query A is a better match on A than a
doc which scores 0.29 against query B is for B ... they are apples and
oranges.

you can be as arbitrary as you want ... you could decide to ignore every
even numbered hit if you want -- it's entirely your choice, but it's not a
rational choice.


BTW: i hope you realize based on your comment about not all Hits having a
max score of 1, for some queries, the highest scoring doc might not even
have a score above 0.3, in which case you would be ignoring all matches.


-Hoss


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




Re: normalized scores

2007-03-30 Thread Erik Hatcher


On Mar 30, 2007, at 8:48 AM, Donna L Gresh wrote:
> It is unfortunate that some scores are being normalized and some may not
> be. Is there a way to obtain the unnormalized score?


Any IndexSearcher.search method that does not return Hits keeps the  
raw scores.  Try out the TopDocs returning ones or use a HitCollector.


Erik





setBoost on Field

2007-03-30 Thread DECAFFMEYER MATHIEU
Hi,

I am parsing this file called Logistics.htm
I have a field named "headlines" that contains word "clients" among others.

When I don't put a boost on this field, I get a score of 0.06 when searching
for "clients".
When I put a boost of 10 on it, I get a score of 0.21,
yet I was expecting a score of 0.60.

Could anyone explain this behaviour to me?
Thank you for any help.

__

   Matt





Retrieving the index format

2007-03-30 Thread Dan Climan
Is there a way to tell which format an index is in?  The file formats
documentation
http://lucene.apache.org/java/docs/fileformats.html#Segments%20File indicates
that the segments file stores a Format value that can be used to determine the
type.

Format is -1 as of Lucene 1.4 and -3 (SegmentInfos.FORMAT_SINGLE_NORM_FILE) as
of Lucene 2.1.

However, there doesn't seem to be an API to retrieve this value. Is it not 
exposed because it is intended only for internal code to maintain backward 
compatibility?  Since I have a mix of old and new indices in test environments, 
I thought
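
Absent a public API, one workaround might be to read the value straight from the file. The sketch below assumes, following the file formats documentation cited above, that Format is the first 32-bit integer in the segments file and that it is stored high byte first, which is what DataInputStream.readInt expects. The class name SegmentsFormatReader and the method readFormat are made up for illustration, and the caller has to supply the path to the right segments file for the index in question.

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class SegmentsFormatReader {
    /**
     * Reads the first four bytes of the given file as a big-endian signed int.
     * Assumption: for a segments file this is the Format value.
     */
    static int readFormat(File segmentsFile) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(segmentsFile))) {
            return in.readInt();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SegmentsFormatReader <path-to-segments-file>");
            return;
        }
        System.out.println("Format = " + readFormat(new File(args[0])));
    }
}
```

This bypasses Lucene entirely, so it depends on the on-disk layout staying as documented and should be treated as a diagnostic aid for test environments only.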

Re: normalized scores

2007-03-30 Thread Donna L Gresh
Thanks Erik, that works great--
Donna



>> It is unfortunate that some scores are being normalized and some 
>> may not
>> be. Is there a
>> way to obtain the unnormalized score?

>Any IndexSearcher.search method that does not return Hits keeps the 
>raw scores.  Try out the TopDocs returning ones or use a HitCollector.

Donna L. Gresh
Services Research, Mathematical Sciences Department
IBM T.J. Watson Research Center
(914) 945-2472
http://www.research.ibm.com/people/g/donnagresh
[EMAIL PROTECTED]


Re: normalized scores

2007-03-30 Thread Chris Hostetter

: I'm well aware that some queries will return no results due to my
: filtering by 0.3.
: That's the point. I expect that some of my input queries will not be a
: good match
: to *any* of the documents in my second index.

what i'm trying to make sure you understand is that picking 0.3 as an
arbitrary number might make sense for some queries, but not others ... the
scores are inherently not comparable between queries, if you can't
compare score(queryA) with score(queryB) then you also can't fairly
compare score(queryA) with a constant N which you also compare to the
score(queryB).

with so many similar threads, i get confused as to what's already been
said sometimes, it doesn't look like i ever pointed out the FAQ on this
(assuming you haven't already seen it)...

http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-912c1f237bb00259185353182948e5935f0c2f03
http://article.gmane.org/gmane.comp.jakarta.lucene.user/10810


-Hoss





how to index a large database

2007-03-30 Thread Mohammad Norouzi

Hi all,
I am going to index our database. One approach is to join the tables and then
index the fields, but the data is very large, more than 3 million
records, so SQL Server fails to select it.

I would like to know whether anyone has experience indexing a very large
database with Lucene.

Can anyone give me some advice?

--
Regards,
Mohammad


Re: how to search over another search

2007-03-30 Thread Mohammad Norouzi

Hi Erick,


> Why not combine the indexes? That would be the "lucene way"...

I combined them by joining the tables, but the result gets very, very large and
the ResultSet fails to retrieve the fields!

On 3/26/07, Erick Erickson <[EMAIL PROTECTED]> wrote:


The short form is no. Lucene is emphatically NOT a relational database.
Of course, you could take the results of the first search, collect the IDs
and query the second, but for large sets this may not be practical
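
The "collect the IDs and query the second" approach can be sketched roughly as below. This is plain Java with maps standing in for the two indexes, not Lucene code: the class name TwoIndexJoin and the method join are made up for illustration, and in a real setup the lookup in the second map would be a search on the id field of the second index.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TwoIndexJoin {
    /**
     * Merges the fields of one hit from the first index with the fields stored
     * under the same id in the second index. If the second index has nothing
     * for that id, the hit's own fields are returned unchanged.
     */
    static Map<String, String> join(Map<String, String> firstHit,
                                    Map<String, Map<String, String>> secondIndexById) {
        Map<String, String> merged = new LinkedHashMap<>(firstHit);
        Map<String, String> extra = secondIndexById.get(firstHit.get("id"));
        if (extra != null) {
            merged.putAll(extra);
        }
        return merged;
    }

    public static void main(String[] args) {
        // A hit from the first index for the query name:Michael.
        Map<String, String> hit = new LinkedHashMap<>();
        hit.put("id", "2");
        hit.put("name", "Michael");
        hit.put("somefield1", "value2");

        // The second index, keyed by the shared id field.
        Map<String, String> row = new LinkedHashMap<>();
        row.put("field1", "fval1-1");
        row.put("field2", "fval2-1");
        row.put("field3", "fval3-1");
        Map<String, Map<String, String>> second = new LinkedHashMap<>();
        second.put("2", row);

        System.out.println(join(hit, second));
    }
}
```

Each hit from the first search costs one lookup in the second index, which is why this may not be practical for large result sets.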

Why not combine the indexes? That would be the "lucene way"...

There has been extensive discussion of embedding Lucene in a DB,
but I can't remember who. Search the archive for Oracle and you'll find
an extensive discussion

Best
Erick


On 3/26/07, Mohammad Norouzi <[EMAIL PROTECTED]> wrote:
>
> I mean when I get result from the first index, find the common records
> from the second index depending on first result.
> something like relation between two database tables, relation by primary
> key
>
> index1:
> id   name      somefield1
> 1    jack      value1
> 2    Michael   value2
> 3    Sara      value3
> 4    Joseph    value4
> ...
>
>
> index2:
> id   field1    field2    field3
> 2    fval1-1   fval2-1   fval3-1
> 4    fval1-2   fval2-2   fval3-2
> ...
>
> now a user puts query : name:Michael
>
> I should return the result wrapped in following document:
> document:
> id   name      somefield1   field1    field2    field3
> --
> 2    Michael   value2       fval1-1   fval2-1   fval3-1
>
>
>
> On 3/26/07, jafarim <[EMAIL PROTECTED]> wrote:
> >
> > what do you mean by "applying the result to the second one"?
> >
> > On 3/26/07, Mohammad Norouzi <[EMAIL PROTECTED]> wrote:
> > >
> > > hi
> > > I have two separated index but there are some fields that are common
> > > between them. now I want to search from one index and then apply the
> > > result to the second one. what solution do you suggest?
> > > what happens on fields? I mean the first document has some fields that
> > > are not present in the second one so I need the final document has all
> > > the fields of both indexes.
> > >
> > > thanks
> > >
> > >
> > > --
> > > Regards,
> > > Mohammad
> > >
> >
>
>
>
> --
> Regards,
> Mohammad
>





--
Regards,
Mohammad