Re: Terms given a filter?

2005-09-16 Thread mark harwood
Erik,
It may be worth looking at the code here:

http://issues.apache.org/jira/browse/LUCENE-328

The BitSets in your example are likely to be very
sparse (I imagine you know only too well how long it
takes to write a book and therefore how many books
there are likely to be per author! :)). With such a
sparse set per author, BitSets could use a lot of
memory. In this example I imagine a SortedVIntList per
author would be a much more compact format.
The code in the link contains a standard interface for
a sorted list of ints, with bitset, int array and VInt
encoded implementations. The AndDocNrSkipper and
OrDocNrSkipper classes can be used to perform set
operations (intersections and unions) on any combination
of these int sets.
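
To make the memory argument concrete, here is a rough sketch in plain Java. It is not the LUCENE-328 code: the class name VIntSketch and its methods are made up for illustration. It delta-encodes an ascending list of document numbers as variable-length ints (7 payload bits per byte) and gives the size a plain bitset would need for the same index.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration only; not the SortedVIntList from LUCENE-328.
class VIntSketch {
    /** Delta-encodes ascending doc numbers as variable-length ints. */
    static byte[] encode(int[] sortedDocs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int doc : sortedDocs) {
            int delta = doc - previous;   // gaps are small when docs are close together
            previous = doc;
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);  // high bit set: more bytes follow
                delta >>>= 7;
            }
            out.write(delta);
        }
        return out.toByteArray();
    }

    /** Decodes the first count doc numbers from an encoded list. */
    static int[] decode(byte[] bytes, int count) {
        int[] docs = new int[count];
        int pos = 0;
        int previous = 0;
        for (int i = 0; i < count; i++) {
            int delta = 0;
            int shift = 0;
            int b;
            do {
                b = bytes[pos++] & 0xFF;
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += delta;
            docs[i] = previous;
        }
        return docs;
    }

    /** Bytes a plain bitset needs to cover doc numbers 0..maxDoc-1. */
    static int bitSetBytes(int maxDoc) {
        return (maxDoc + 7) / 8;
    }
}
```

For an author with four books in a million-document index, encode() produces a handful of bytes, while bitSetBytes(1000000) is 125000 bytes per author regardless of how few bits are set.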




Cheers,
Mark





-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Question: force a field must be matched?

2005-09-16 Thread Miles Barr
On Thu, 2005-09-15 at 11:56 -0700, James Huang wrote:
> Yes, "+" is what I missed! Thanks.
> 
> Suppose there is a book published by 3 publishers (I
> don't know how that works in real world):
> 
> // At index time:
>   doc.add( Field.Keyword("publisher", "Manning") );
>   doc.add( Field.Keyword("publisher", "SAMS") );
>   doc.add( Field.Keyword("publisher", "O'Reilly") );
> 
> // At search time:
>   queryString += " +publisher:SAMS";
>   ...
> 
> should find me that Document.

That may or may not work depending on your analyzer. 

If you're using the query parser with the standard analyzer it will
search the 'publisher' field for 'sams' not 'SAMS', and hence get no
matches back.
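
The mismatch can be shown without Lucene at all. The following is only a toy model, and every name in it is hypothetical: a keyword field stores its values verbatim, while a lowercasing analyzer changes the query text before the lookup.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Toy model of the indexed terms; not Lucene code.
class AnalyzerMismatchSketch {
    /** Stores values verbatim, the way an untokenized keyword field would. */
    static Set<String> indexAsKeywords(String... values) {
        Set<String> terms = new HashSet<>();
        for (String value : values) {
            terms.add(value);
        }
        return terms;
    }

    /** Stand-in for an analyzer that lowercases the query text. */
    static String lowercasingAnalyzer(String queryText) {
        return queryText.toLowerCase(Locale.ROOT);
    }

    /** Stand-in for an analyzer that passes the query text through untouched. */
    static String keywordAnalyzer(String queryText) {
        return queryText;
    }
}
```

With terms = indexAsKeywords("Manning", "SAMS", "O'Reilly"), the lookup terms.contains(lowercasingAnalyzer("SAMS")) fails because "sams" was never indexed, while terms.contains(keywordAnalyzer("SAMS")) succeeds.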

If you want to use the query parser instead of building the query by
hand you can use the PerFieldAnalyzerWrapper class and write a
KeywordAnalyzer, i.e.:

package org.apache.lucene.analysis;

import java.io.IOException;
import java.io.Reader;

/** "Tokenizes" the entire stream as a single token. */
public class KeywordAnalyzer extends Analyzer {
    public TokenStream tokenStream(String fieldName, final Reader reader) {

        return new TokenStream() {
            private boolean done;
            private final char[] buffer = new char[1024];

            public Token next() throws IOException {
                if (!done) {
                    done = true;
                    StringBuffer sb = new StringBuffer();
                    int length;
                    while (true) {
                        length = reader.read(this.buffer);
                        if (length == -1) break;

                        sb.append(this.buffer, 0, length);
                    }
                    String text = sb.toString();
                    return new Token(text, 0, text.length());
                }
                return null;
            }
        };
    }
}



PerFieldAnalyzerWrapper result =
new PerFieldAnalyzerWrapper(new StandardAnalyzer());

result.addAnalyzer("publisher", new KeywordAnalyzer());

QueryParser parser = new QueryParser(, result);




-- 
Miles Barr <[EMAIL PROTECTED]>
Runtime Collective Ltd.





some general question about Nutch Search engine.

2005-09-16 Thread Legolas Woodland
Hi,
Thank you for reading my post.
I have some general questions:
1- Does Nutch support multilanguage indexing and searching?
2- Is it capable of indexing and searching more than 500,000 sites in a
timely manner?
3- Can it be extended with an ads system, sponsored results shown first, and
other features that, for example, the Google search engine has?
4- Does the licensing allow me to use/modify it for my own purposes without
sharing the source?
5- Does its robot support a site list / domain-extension list (for example,
searching and indexing all UK extensions)?

Thank you.


Re: some general question about Nutch Search engine.

2005-09-16 Thread Andrzej Bialecki

Legolas Woodland wrote:
> Hi,
> Thank you for reading my post.
> I have some general questions:

Please see http://nutch.org for information about Nutch.

> 1- Does Nutch support multilanguage indexing and searching?

Yes, to a large degree (there are always issues when making assumptions
about the query language).

> 2- Is it capable of indexing and searching more than 500,000 sites in a
> timely manner?

Sure, no problem. I typically work with instances that collect data from
5 million pages; others run installations that have ~100 million pages.

> 3- Can it be extended with an ads system, sponsored results shown first, and
> other features that, for example, the Google search engine has?

Requires coding, but not so complicated.

> 4- Does the licensing allow me to use/modify it for my own purposes without
> sharing the source?

Yes, ASL-2.0 license, same as Lucene.

> 5- Does its robot support a site list / domain-extension list (for example,
> searching and indexing all UK extensions)?

Yes.

--
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com





RE: Small problem in searching

2005-09-16 Thread Vanlerberghe, Luc
You could also add a field containing all the terms reversed at indexing
time.

So documents containing "tirupathireddy" or "venkatreddy" would have
"ydderihtapurit" and "yddertaknev" in the reversed field.
If you detect that the user entered a suffix query like "*reddy",
transform it into a prefix query like "ydder*" on the reversed field.
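
Here is a small sketch of the idea in plain Java, with a sorted set standing in for the reversed field's term dictionary. The class ReversedFieldSketch and its method names are invented for this example; in Lucene itself the lookup would be a prefix query on the reversed field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Stand-in for a "reversed" field; hypothetical names, not Lucene API.
class ReversedFieldSketch {
    private final TreeSet<String> reversedTerms = new TreeSet<>();

    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    /** Index time: store the reversed form of each term. */
    void add(String term) {
        reversedTerms.add(reverse(term));
    }

    /** Query time: answer "*suffix" as a prefix scan over the reversed terms. */
    List<String> endingWith(String suffix) {
        String prefix = reverse(suffix);
        List<String> matches = new ArrayList<>();
        // All terms sharing a prefix are contiguous in sorted order.
        for (String reversed : reversedTerms.tailSet(prefix)) {
            if (!reversed.startsWith(prefix)) {
                break;
            }
            matches.add(reverse(reversed));
        }
        return matches;
    }
}
```

The scan touches only the terms that actually match (plus one), which is why the prefix form is so much cheaper than testing every term in the index.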

Luc


-Original Message-
From: jian chen [mailto:[EMAIL PROTECTED] 
Sent: donderdag 15 september 2005 18:22
To: java-user@lucene.apache.org
Subject: Re: Small problem in searching

Hi,

I think Lucene transforms a prefix query into sub-queries: searching for
a prefix results in a search for all terms that begin with that prefix.

For a "postfix" match, I think you need to do more work than relying on
Lucene's query parser.

You can iterate over the terms and do an "endsWith()" call, and if there
is a match, then perform a normal Lucene search for that term.

So, effectively, you do the same thing as a prefix match: conceptually,
loop over all available terms in your dictionary and find all the terms
to be prepared for the actual search.

This might be slow. To speed things up, you can store all the available
terms in memory, so that looping through all unique terms is a breeze.
This is what Google used for their prototype search engine way back in
1998. (I guess :-)
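
A bare-bones version of that loop, in plain Java over an in-memory list of terms. The class and method names are hypothetical, and in a real application the terms would come from the index's term enumeration rather than a list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the terms are assumed to be held in memory already.
class TermScanSketch {
    /** Linear scan over every unique term, keeping those that end with the suffix. */
    static List<String> termsEndingWith(Iterable<String> allTerms, String suffix) {
        List<String> matches = new ArrayList<>();
        for (String term : allTerms) {
            if (term.endsWith(suffix)) {
                matches.add(term);
            }
        }
        return matches;
    }
}
```

Each matching term would then be searched for as an ordinary term. The cost is one endsWith() call per unique term in the index, so it grows with the size of the dictionary rather than with the number of matches.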

Cheers,

Jian

On 9/15/05, tirupathi reddy <[EMAIL PROTECTED]> wrote:
> 
> Hi guys,
>
> I have a problem while searching using Lucene. Say I have something
> like "tirupathireddy" or "venkatreddy" in the index. When I search for
> the string "reddy" I have to get those things (i.e. "tirupathireddy" and
> "venkatreddy"). I have read in the Lucene query syntax that * cannot be
> given at the start of the search string. So how can I achieve that? I
> am very much in need of that, so please help me out.
>
> With regards,
> TirupatiReddy Manyam.
> 
> 
> Tirupati Reddy Manyam
> 24-06-08,
> Sundugaullee-24,
> 79110 Freiburg
> GERMANY.
> 
> Phone: 00497618811257
> cell : 004917624649007
> 
> 





Sorting results by both score and date

2005-09-16 Thread Tim.Wright
Hi,

I'm working in an industry which is fairly time sensitive, and older
documents are inherently less valuable. I'd like to be able to "weight"
the score of search results, so that older documents score lower. I
don't just want to sort by date, though - I'd still like results to be
ordered by score, just an "adjusted" score. 

I've read the excellent LIA, including the chapter on custom sort
methods, but from what I can tell that still only implements a sort on
one field - I really want to be able to sort on a "blend" of fields (one
of these is the actual document score).

Could anyone suggest how I could implement this? I considered explicitly
weighting the documents with a function of their date at index time, but
this would mean the "weight" of the new documents would have to increase
exponentially over time, and I suspect things would get messy! (Our
dataset is around 250k documents, growing by a few thousand a month.)

Cheers,

Tim.




The information contained in this email message may be confidential. If you are 
not the intended recipient, any use, interference with, disclosure or copying 
of this material is unauthorised and prohibited. Although this message and any 
attachments are believed to be free of viruses, no responsibility is accepted 
by T&F Informa for any loss or damage arising in any way from receipt or use 
thereof.  Messages to and from the company are monitored for operational 
reasons and in accordance with lawful business practices. 
If you have received this message in error, please notify us by return and 
delete the message and any attachments.  Further enquiries/returns can be sent 
to [EMAIL PROTECTED]





RE: Sorting results by both score and date

2005-09-16 Thread Mordo, Aviran (EXP N-NANNATEK)
You can write a query and add date range clauses to it, giving each date
range a boost.

For instance you can do "+content:foo date:[{Today's Date} TO null]^5
date:[{Yesterday's Date} TO {Today's Date}]^4 date:[{Last Week's Date}
TO {Yesterday's Date}]^3" and so on.

Aviran
http://www.aviransplace.com




Re: Sorting results by both score and date

2005-09-16 Thread Erik Hatcher
Tim, check out p. 155 in LIA where we discuss "Sorting by multiple  
fields".


However, what you're really after it seems is boosting documents.   
Check out TheServerSide's case study (online or in LIA) - Dion  
discusses how he implemented boosting for more recent documents.  If  
you're indexing documents in ascending date order, perhaps you could  
leverage the document id in such a boosting factor?


Erik





RE: Sorting results by both score and date

2005-09-16 Thread Tim.Wright
Ah - the one bit of LIA I haven't read yet is the case studies section!
Many thanks, I'll check it out. Sorting by multiple fields isn't quite
what I want - that sorts entirely by field A, then uses field B for
records where A is identical, correct? 

What I really want to do is sort by "A * (1-(B/700))", where A is the
score, and B is the age (in days) of the document. I.e., the score is
basically "scaled down" with age.
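
Written out in plain Java, the formula looks roughly like this. It is a sketch that re-sorts hits after the search has run, not something plugged into Lucene's scoring. The Hit class and all names are invented for the example, and clamping the factor at zero for documents older than 700 days is my own assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical post-search re-ranking; not part of Lucene.
class AgeScaledRanking {
    /** A search hit: raw relevance score plus document age in days. */
    static final class Hit {
        final String id;
        final double score;
        final int ageDays;

        Hit(String id, double score, int ageDays) {
            this.id = id;
            this.score = score;
            this.ageDays = ageDays;
        }
    }

    /** A * (1 - B/700), clamped at zero (assumption) so old documents never go negative. */
    static double adjusted(double score, int ageDays) {
        return score * Math.max(0.0, 1.0 - ageDays / 700.0);
    }

    /** Returns a copy of the hits ordered by adjusted score, best first. */
    static List<Hit> rerank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> adjusted(h.score, h.ageDays)).reversed());
        return sorted;
    }
}
```

A document scoring 0.9 that is 350 days old ends up with an adjusted score of 0.45, so a fresh document scoring 0.6 ranks above it. The catch is that this only reorders the hits you have already fetched.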

Cheers,

Tim.




Re: Sorting results by both score and date

2005-09-16 Thread Erik Hatcher


On Sep 16, 2005, at 10:14 AM, <[EMAIL PROTECTED]> wrote:
> Ah - the one bit of LIA I haven't read yet is the case studies section!
> Many thanks, I'll check it out. Sorting by multiple fields isn't quite
> what I want - that sorts entirely by field A, then uses field B for
> records where A is identical, correct?

Correct.

> What I really want to do is sort by "A * (1-(B/700))", where A is the
> score, and B is the age (in days) of the document. IE - the score is
> basically "scaled down" with date.


Maybe the TSS case study will help, though they rebuild their index  
nightly and can adjust the boost based on the current day.


I've not come across a really clean way to do this sort of age-based  
boosting other than how TSS does it.


Erik





Deleting documents

2005-09-16 Thread Bogdan Munteanu
I have a problem when deleting documents.
Let's say I have a Document object doc:
doc.add(Field.Text("id","index1,DML"));
doc.add(Field.Text("contents","some records"));
IndexWriter.addDocument(doc);
Now if I want to delete the document with id:index1,DML I do something like
this:
IndexReader.delete(new Term("id", "index1,DML"));
And it is not deleted.
I have debugged it and noticed that Lucene compares my "index1,DML"
parameter with its internal value "index1,dml".
So when I do:
IndexReader.delete(new Term("id", "index1,dml"));
the document is deleted.
Now please explain to me why there is a lower-case value for my "id".
And excuse my poor English!


RE: Sorting results by both score and date

2005-09-16 Thread Tim.Wright
>> What I really want to do is sort by "A * (1-(B/700))", where A is the
>> score, and B is the age (in days) of the document. IE - the score is
>> basically "scaled down" with date.

> Maybe the TSS case study will help, though they rebuild their index  
> nightly and can adjust the boost based on the current day.

Just read this - it looks like the best option for us. I think we could 
get away with only periodically reindexing by just inflating the boost
marginally over time. Are there limits to boost? Any reason we can't 
use a boost of, say, 0.0001 or 10,000? 

Cheers,

Tim.







RE: Deleting documents

2005-09-16 Thread Tim.Wright
If you're indexing a field like this in order to be able to use it as a
reference later, you should normally index it using Field.Keyword
instead of Field.Text - if you use Text, it will go through your
Analyzer, which is probably what's changing the case. (I think this is
right - I'm sure someone will correct me if I'm wrong!)

Cheers,

Tim.




RE: Deleting documents

2005-09-16 Thread Mordo, Aviran (EXP N-NANNATEK)
Because when you add a document, the id goes through an Analyzer, which
in your case uses a lowercase filter, but when you create a Term object
the term is not lowercased by an Analyzer.

If you use Field.Keyword instead of Field.Text for your ID, the Analyzer
will not be applied and the ID will keep its case.

HTH

Aviran
http://www.aviransplace.com




Re: Question: force a field must be matched?

2005-09-16 Thread Erik Hatcher


On Sep 15, 2005, at 12:55 PM, James Huang wrote:
> Thanks Jason.
>
> I wonder if that's the same as
>
>   queryString + " publisher:Manning"
>
> and pass on to the query parser?


I'll echo the other comments made on this regarding the Analyzer.
I recommend against programmatically adding to the string passed to
QueryParser because of these types of issues.  Instead, you can combine
the Query parsed from the user's expression with other programmatically
created Query objects (such as a TermQuery in this case) in a BooleanQuery.


Erik





> -James
>
> --- Jason Haruska <[EMAIL PROTECTED]> wrote:
>> On 9/15/05, James Huang <[EMAIL PROTECTED]> wrote:
>>> Suppose I have a book index with field="publisher", field="title", etc.
>>> I want to search for books only from "Manning", do I have to do
>>> anything special? how?
>>
>> add new BooleanClause(new TermQuery(new Term("publisher","Manning")),
>> true, false) to your BooleanQuery








Re: Small problem in searching

2005-09-16 Thread Erik Hatcher
Lucene's WildcardQuery *does* support "postfix" queries - however  
QueryParser does not allow such an expression to pass through.  You  
can create a WildcardQuery with a Term("field", "*whatever") and  
search with that.  All caveats about WildcardQuery, performance, and  
maximum number of boolean clauses apply.


Erik





Text is not indexed when passed as a StringReader

2005-09-16 Thread Matthias Bräuer

Hello,

this question seems to have occurred on the mailing list before but I
wasn't able to find a satisfying answer. So please excuse me if I'm asking
something that has already been discussed.


My problem is as follows:
If I use the Field.Text(String,Reader) method to create an indexed, but
unstored field and the passed in Reader happens to be a StringReader
(e.g. when extracting Word documents using the Textmining library) the
field is not indexed at all. That means Luke shows no terms for this
field and, consequently, searches do not yield any result. For
FileReaders, however, everything seems to work fine.


Of course, I could just convert the reader back into a string (e.g. with
Jakarta Commons IO - IOUtils.toString()) and use the
Field.UnStored(String,String) method but then again it wouldn't make sense to
use a StringReader in the first place.


Thanks for your help,
Matthias






Re: Text is not indexed when passed as a StringReader

2005-09-16 Thread Chris Hostetter

I think you may be having another problem somewhere; using a StringReader
works just fine for me. (In fact, when you create a field with
a plain String, it is wrapped in a StringReader to pass to
your analyzer.)

Note the following demo works just fine...

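import java.io.StringReader;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.*;
import org.apache.lucene.search.*;
import org.apache.lucene.store.RAMDirectory;

public class StringReaderDemo {
    public static void main(String[] args) throws Exception {
        // Index one document whose only field is fed from a StringReader.
        RAMDirectory index = new RAMDirectory();
        IndexWriter writer = new IndexWriter(index,
                                             new WhitespaceAnalyzer(),
                                             true);
        Document doc = new Document();
        doc.add(Field.Text("foo", new StringReader("a b c d")));
        writer.addDocument(doc);
        writer.close();
        // Search for one of the tokens; exactly one hit means it was indexed.
        IndexSearcher s = new IndexSearcher(IndexReader.open(index));
        Hits h = s.search(new TermQuery(new Term("foo","a")));
        System.out.println(h.length() == 1 ? "FOUND" : "ERROR");
    }
}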









-Hoss





JIRA bug messages

2005-09-16 Thread Yonik Seeley
I just updated a bug via JIRA,
http://issues.apache.org/jira/browse/LUCENE-383
and I didn't see it come to any mailing list like it used to with bugzilla. 
Should it have? Is there a new mailing list to sign up for?

-Yonik


problems with lucene on a webhost account

2005-09-16 Thread Gasi
Hello everybody,

I had a problem with the Lucene demo on my webhosting account. Because I think
other people have the same problem, and somebody may run into it in the
future, I want to describe how I solved it.

In my case I used the Lucene web demo on my home PC with Windows XP and Tomcat
3.3.2. The web demo worked perfectly on my home PC. After uploading it
to a real web server, it didn't work: every search returned no
results. I found a solution (not a good one, but it works): I built the index
on the webhosting account itself. Of course this is a bad solution, because for
big amounts of data it is complicated to upload all the documents you need.
But for test cases it works. Here are my scripts:

The one for indexing:

<%@ page import=" 
org.apache.lucene.analysis.Analyzer,org.apache.lucene.analysis.standard.StandardAnalyzer,org.apache.lucene.document.Document,org.apache.lucene.document.Field,org.apache.lucene.index.IndexWriter"
 %>
<%

 
  String[] text = { "index", "lucene","ramon","gasi" };
  String indexDir = "path/onthe/webserver";
  Analyzer analyzer = new StandardAnalyzer();
  boolean create = true;
  IndexWriter writer = new IndexWriter(indexDir, analyzer, create);
  for (int i = 0; i < text.length; i++)
  {
   Document document = new Document();
   document.add(Field.Text("textfeld", text[i]));
   writer.addDocument(document);
  }
  writer.close();
 
%>


The other one, for searching:


<%@ page import = "  javax.servlet.*, javax.servlet.http.*, java.io.*, 
org.apache.lucene.analysis.*, org.apache.lucene.document.*, 
org.apache.lucene.index.*, org.apache.lucene.search.*, 
org.apache.lucene.queryParser.*,java.net.URLEncoder" %>
<%

String indexName ="path/onthe/webserver";   //local copy of the configuration variable
IndexSearcher searcher = null;  //the searcher used to open/search the index
Query query = null;
String myQuery="lucene";
Hits hits = null;   

searcher = new IndexSearcher(IndexReader.open(indexName));
Analyzer analyzer = new StopAnalyzer(); 
query = QueryParser.parse(myQuery,"textfeld",analyzer);
hits = searcher.search(query);
   
if (hits.length() == 0) { 

%>
 Nothing found 
<%
   }
   else
   {
%>
Some results found
<%
for(int i=0;i


This is a very simple example for Lucene newcomers; I hope it will be of 
some help to somebody.


Greetings



Gaston

Re: Small problem in searching

2005-09-16 Thread tirupathi reddy
Hello,
 
 I read the following statement :
Note: You cannot use a * or ? symbol as the first character of a search.

in this page:  http://lucene.apache.org/java/docs/queryparsersyntax.html

So that's why I thought of that. At present I am using QueryParser, and it 
gives an error for *reddy*. I am very new to this, and I have to submit my 
application by next week, so please help me: how can I use WildcardQuery 
instead of QueryParser?
 
At the moment I have a query like:  id:manyam* AND author:*reddy* OR 
title:"measurement procedure".
 
and I am passing it to QueryParser as follows
 
query = QueryParser.parse(query1,"ALL",analyzer);
 
and calling the search method of Searcher class as follows
 
Hits hits = searcher.search(query);
 
So can you please help me modify this code to use WildcardQuery, so that I 
can search for *reddy*?
 
  Thanx,
MTREDDY


Tirupati Reddy Manyam 
24-06-08, 
Sundugaullee-24, 
79110 Freiburg 
GERMANY. 

Phone: 00497618811257 
cell : 004917624649007
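
As a hedged illustration of what a pattern such as *reddy* asks for at the 
term level, here is a plain-Java sketch of wildcard matching ('*' for any run 
of characters, '?' for exactly one). It uses no Lucene classes; the names 
WildcardSketch, matches and filter are hypothetical, and this is not Lucene's 
own implementation. The point it shows: with a leading '*', no prefix narrows 
the search, so every term in the field has to be examined.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only (not part of Lucene).
class WildcardSketch {

    // True if text matches pattern: '*' matches any run of characters
    // (including none), '?' matches exactly one character.
    static boolean matches(String pattern, String text) {
        int p = 0, t = 0;         // current positions in pattern and text
        int star = -1, mark = 0;  // last '*' seen, and the text position tried for it
        while (t < text.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                star = p++;       // remember the '*' and first let it match nothing
                mark = t;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == text.charAt(t))) {
                p++;
                t++;
            } else if (star >= 0) {
                p = star + 1;     // backtrack: let the last '*' absorb one more character
                t = ++mark;
            } else {
                return false;
            }
        }
        // Only trailing '*' characters may remain in the pattern.
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }

    // Scans every term, which is what a pattern with a leading '*' requires.
    static List<String> filter(String pattern, List<String> terms) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (matches(pattern, term)) {
                hits.add(term);
            }
        }
        return hits;
    }
}
```

For example, WildcardSketch.matches("*reddy*", "manyamreddyvenkat") is true, 
while WildcardSketch.matches("*reddy*", "venkat") is false.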


RE: Small problem in searching

2005-09-16 Thread tirupathi reddy
Hello Luc,
 
   You are correct in that case. But suppose I have a string like 
manyamreddyvenkat and I want to search for reddy: I can't find it, even if I 
index all the entries in reverse order. Is there any other way?
 
Thanx,
MTREDDY


Tirupati Reddy Manyam 
24-06-08, 
Sundugaullee-24, 
79110 Freiburg 
GERMANY. 

Phone: 00497618811257 
cell : 004917624649007
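
One possible direction, sketched here in plain Java under stated assumptions: 
index every suffix of each term, so that an infix search such as reddy inside 
manyamreddyvenkat becomes a prefix search over the suffixes. This is a 
general technique, not something the thread or Lucene prescribes; the class 
SuffixIndexSketch and its methods are hypothetical, and the storage cost 
grows with the square of the term length.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, for illustration only (not part of Lucene).
class SuffixIndexSketch {

    // Maps each suffix to the original terms it was taken from, in sorted order.
    private final TreeMap<String, Set<String>> suffixToTerms = new TreeMap<>();

    // Registers every suffix of the term, e.g. "reddy", "eddy", "ddy", "dy", "y".
    void add(String term) {
        for (int i = 0; i < term.length(); i++) {
            suffixToTerms
                .computeIfAbsent(term.substring(i), k -> new TreeSet<>())
                .add(term);
        }
    }

    // Returns all terms containing the infix: a term contains it exactly when
    // one of its suffixes starts with it, and those suffixes are contiguous
    // in sorted order starting at the first key >= infix.
    Set<String> containing(String infix) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e
                : suffixToTerms.tailMap(infix, true).entrySet()) {
            if (!e.getKey().startsWith(infix)) {
                break; // left the block of suffixes that share the prefix
            }
            result.addAll(e.getValue());
        }
        return result;
    }
}
```

After adding manyamreddyvenkat, reddy and venkat, containing("reddy") returns 
manyamreddyvenkat and reddy, without scanning the unrelated suffixes.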


Re: Text is not indexed when passed as a StringReader

2005-09-16 Thread Daniel Naber
On Friday 16 September 2005 21:51, Matthias Bräuer wrote:

> but
> unstored field and the passed in Reader happens to be a StringReader
> (e.g. when extracting Word documents using the Textmining library) the
> field is not indexed at all. That means Luke shows no terms for this
> field and, consequently, searches do not yield any result.

Luke only shows terms if the field is *stored* (which it isn't for a 
reader). You need to click the "Reconstruct & Edit" button to see if the 
text really isn't *indexed*.

Regards
 Daniel

-- 
http://www.danielnaber.de




Re: problems with lucene on a webhost account

2005-09-16 Thread Daniel Naber
On Friday 16 September 2005 23:32, Gasi wrote:

>  After uploading these on a real webserver , it didn't work because for
> every search I had null results. So I found a solution-not a good
> one-but it works: I indexed my data on the webhostingaccount.

There must have been a different problem. Lucene indexes should be 
system-independent, i.e. it should be possible to index on e.g. Windows 
and upload to Unix or vice versa. Maybe the fields were different in your 
searcher than the ones in the index (see the FAQ at 
http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-3558e5121806fb4fce80fc022d889484a9248b71).
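
To make the field-name point concrete, here is a toy model in plain Java (not 
Lucene's data structures; FieldedIndexSketch and its methods are hypothetical 
names). It assumes only that postings are keyed by field name and term text 
together, as a Lucene Term is, so a search against a field name that differs 
from the one used at indexing time finds nothing even though the word is in 
the index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy index, for illustration only (not part of Lucene).
class FieldedIndexSketch {

    // Postings keyed by "field:text", each holding the matching document ids.
    private final Map<String, List<Integer>> postings = new HashMap<>();

    private static String key(String field, String text) {
        return field + ":" + text;
    }

    // Records that document docId contains text in the given field.
    void add(int docId, String field, String text) {
        postings.computeIfAbsent(key(field, text), k -> new ArrayList<>()).add(docId);
    }

    // Looks up by field AND text; a different field name is a different key.
    List<Integer> search(String field, String text) {
        return postings.getOrDefault(key(field, text), Collections.<Integer>emptyList());
    }
}
```

With a document indexed under the field textfeld, search("textfeld", 
"lucene") finds it, while search("text", "lucene") returns an empty list.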

Regards
 Daniel

-- 
http://www.danielnaber.de




Lucene database bindings

2005-09-16 Thread markharw00d
I know there have been some posts discussing how to integrate Lucene 
with Derby recently.


I've added an example project that works with both HSQLDB and Derby 
here: http://issues.apache.org/jira/browse/LUCENE-434


The bindings allow you to use SQL that mixes database and Lucene 
functionality in ways like this:


   select top 10 lucene_score(id) as SCORE,
          lucene_highlight(adText)
   from ads
   where pricePounds < 200 and pricePounds > 1
     and lucene_query('"drum kit"', id) > 0
   order by SCORE DESC, pricePounds ASC

See the readme.txt in the zip file for details.

Cheers,
Mark











Re: JIRA bug messages

2005-09-16 Thread Paul Elschot
Yonik,

On Friday 16 September 2005 23:30, Yonik Seeley wrote:
> I just updated a bug via JIRA,
> http://issues.apache.org/jira/browse/LUCENE-383
> and I didn't see it come to any mailing list like it used to with bugzilla. 
> Should it have? Is there a new mailing list to sign up for?

I had a similar experience with this (SpanNotQuery not patched,
but previous bug is in fixed status):
http://issues.apache.org/jira/browse/LUCENE-433
and I would also prefer to have a mailing list for changes to
Lucene issues in JIRA.

Btw. the list general@lucene.apache.org might be better for this subject.

Regards,
Paul Elschot

