Re: Terms given a filter?
Erik,

It may be worth looking at the code here:
http://issues.apache.org/jira/browse/LUCENE-328

The BitSets in your example are likely to be very sparse (I imagine you know only too well how long it takes to write a book, and therefore how many books there are likely to be per author! :)). With such a sparse set per author, BitSets could use a lot of memory. In this example I imagine a SortedVIntList per author would be a much more compact format.

The code in the link contains a standard interface for a sorted list of ints, with bitset, int array and VInt-encoded implementations. The AndDocNrSkipper and OrDocNrSkipper classes can be used to perform set intersections on any combination of these int sets.

Cheers,
Mark

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
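To make the sparse-set point concrete, here is a minimal sketch of intersecting two sorted doc-number lists with a linear merge. It is plain Java and is not the LUCENE-328 code: the class name SortedIntIntersection and the method intersect are made up for illustration, and a plain int[] stands in for a VInt-encoded list.

```java
import java.util.Arrays;

/** Illustration only: intersects two sorted doc-number lists (hypothetical helper, not from LUCENE-328). */
final class SortedIntIntersection {

    /** Both inputs are assumed to be sorted ascending and free of duplicates. */
    static int[] intersect(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;                 // a is behind, advance it
            } else if (a[i] > b[j]) {
                j++;                 // b is behind, advance it
            } else {
                out[n++] = a[i];     // doc number present in both lists
                i++;
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        int[] booksByAuthor = {3, 17, 4096};          // sparse: three docs for one author
        int[] docsPassingFilter = {1, 3, 8, 17, 20};  // docs accepted by some filter
        System.out.println(Arrays.toString(intersect(booksByAuthor, docsPassingFilter)));
    }
}
```

An author with three books costs three ints in this representation, whereas a bit set sized to a million-document index holds roughly a million bits per author no matter how few of them are set.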
Re: Question: force a field must be matched?
On Thu, 2005-09-15 at 11:56 -0700, James Huang wrote:
> Yes, "+" is what I missed! Thanks.
>
> Suppose there is a book published by 3 publishers (I
> don't know how that works in real world):
>
> // At index time:
> doc.add( Field.Keyword("publisher", "Manning") );
> doc.add( Field.Keyword("publisher", "SAMS") );
> doc.add( Field.Keyword("publisher", "O'Reilly") );
>
> // At search time:
> queryString += " +publisher:SAMS";
> ...
>
> should find me that Document.

That may or may not work depending on your analyzer. If you're using the query parser with the standard analyzer, it will search the 'publisher' field for 'sams', not 'SAMS', and hence get no matches back.

If you want to use the query parser instead of building the query by hand, you can use the PerFieldAnalyzerWrapper class and write a KeywordAnalyzer, i.e.:

    package org.apache.lucene.analysis;

    import java.io.IOException;
    import java.io.Reader;

    /** "Tokenizes" the entire stream as a single token. */
    public class KeywordAnalyzer extends Analyzer {
      public TokenStream tokenStream(String fieldName, final Reader reader) {
        return new TokenStream() {
          private boolean done;
          private final char[] buffer = new char[1024];

          public Token next() throws IOException {
            if (!done) {
              done = true;
              // Read the whole field value into one string.
              StringBuffer sb = new StringBuffer();
              int length;
              while (true) {
                length = reader.read(this.buffer);
                if (length == -1) break;
                sb.append(this.buffer, 0, length);
              }
              String text = sb.toString();
              // Emit the untouched value as a single token.
              return new Token(text, 0, text.length());
            }
            return null;
          }
        };
      }
    }

    // Standard analysis everywhere except the 'publisher' field.
    PerFieldAnalyzerWrapper result = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
    result.addAnalyzer("publisher", new KeywordAnalyzer());
    QueryParser parser = new QueryParser(, result);

--
Miles Barr <[EMAIL PROTECTED]>
Runtime Collective Ltd.
some general question about Nutch Search engine.
Hi,

Thank you for reading my post. I have some general questions:

1- Does Nutch support multilanguage indexing and searching?
2- Does it have the capability to index and search more than 500,000 sites in a timely manner?
3- Does it have the capability to add an ads system, sponsored results first, and other features that, for example, the Google search engine has?
4- Does the licensing allow me to use/modify it for my own purposes without sharing the source?
5- Does its robot support a site list / domain extension list (for example, searching and indexing all UK extensions)?

Thank you.
Re: some general question about Nutch Search engine.
Legolas Woodland wrote:
> Hi
> Thank you for reading my post
> I have some general question :

Please see http://nutch.org for information about Nutch.

> 1-does Nutch support multilanguage indexing and searching ?

Yes, to a large degree (there are always issues when making assumptions about the query language).

> 2-does it has capability to index and search more than 500,000 site in a timely manner?

Sure, no problem. I typically work with instances that collect data from 5 million pages; others run installations that have ~100 million pages.

> 3-does it have capabilities to add ADs System , sponsored result first and other features that for example google search engine has?

Requires coding, but not so complicated.

> 4-does licensing allow me to use/modefy it for my own purpose without sharing the source ?

Yes, ASL-2.0 license, same as Lucene.

> 5-does its robot support site list / domainextension list (for example searching and indexing all UK extension)

Yes.

--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
RE: Small problem in searching
You could also add a field with all the terms reversed during indexing. So documents containing "tirupathireddy" or "venkatreddy" would have "ydderihtapurit" and "yddertaknev" in the reversed field.

If you detect that the user entered a suffix query like "*reddy", transform it into a prefix query like "ydder*" on the reversed field.

Luc

-----Original Message-----
From: jian chen [mailto:[EMAIL PROTECTED]
Sent: Thursday, 15 September 2005 18:22
To: java-user@lucene.apache.org
Subject: Re: Small problem in searching

Hi,

I think Lucene expands a prefix query into subqueries, one for each term that begins with that prefix.

For a "postfix" match, I think you need to do more work than relying on Lucene's query parser. You can iterate over the terms and do an "endsWith()" call, and if there is a match, perform a normal Lucene search for that term. So, effectively, you do the same thing as a prefix match: conceptually, loop over all available terms in your dictionary and collect the terms to use for the actual search.

This might be slow. To speed it up, you can store all the available terms in memory, and then looping through all unique terms is a breeze. This is what Google used for their prototype search engine way back in 1998. (I guess :-)

Cheers,
Jian

On 9/15/05, tirupathi reddy <[EMAIL PROTECTED]> wrote:
>
> Hi guys,
>
> I have some problem while searching using Lucene. Say I have some thing
> like "tirupathireddy" or "venkatreddy" in the index. When I search for
> string "reddy" I have to get those things (i.e. "tirupathireddy" and
> "venkatreddy"). I have read in the Query syntax of Lucene that * will not be
> given at the start of the search string. So how can I achieve that? I am
> in very much need of that. So please help me out.
>
>
> With Regards,
> TirupatiReddy Manyam.
>
>
> Tirupati Reddy Manyam
> 24-06-08,
> Sundugaullee-24,
> 79110 Freiburg
> GERMANY.
>
> Phone: 00497618811257
> cell : 004917624649007
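The reversed-field idea from the reply above can be sketched in plain Java. This only illustrates the string handling: ReversedTerms, reverse and toReversedPrefix are hypothetical names, and wiring the result into an actual Lucene prefix query on the reversed field is left out.

```java
/** Illustration only: string handling for a "reversed field" (hypothetical helper names). */
final class ReversedTerms {

    /** Value to index in the reversed field, e.g. "venkatreddy" becomes "yddertaknev". */
    static String reverse(String term) {
        return new StringBuilder(term).reverse().toString();
    }

    /**
     * Turns a pure suffix query such as "*reddy" into the prefix query "ydder*".
     * Returns null when the input is not of the form "*text" (assumption: such
     * queries are handled elsewhere).
     */
    static String toReversedPrefix(String userQuery) {
        if (userQuery.length() < 2 || userQuery.charAt(0) != '*') {
            return null;
        }
        String suffix = userQuery.substring(1);
        if (suffix.indexOf('*') >= 0 || suffix.indexOf('?') >= 0) {
            return null; // more wildcards than a single leading '*'
        }
        return reverse(suffix) + "*";
    }

    public static void main(String[] args) {
        String prefixQuery = toReversedPrefix("*reddy");
        String prefix = prefixQuery.substring(0, prefixQuery.length() - 1);
        // Simulate the prefix match against the reversed form of each indexed term.
        for (String term : new String[] {"tirupathireddy", "venkatreddy", "lucene"}) {
            System.out.println(term + " matches: " + reverse(term).startsWith(prefix));
        }
    }
}
```

The price of this approach is a second field per document and a check at query time for the leading wildcard.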
Sorting results by both score and date
Hi,

I'm working in an industry which is fairly time sensitive, and older documents are inherently less valuable. I'd like to be able to "weight" the score of search results, so that older documents score lower. I don't just want to sort by date, though - I'd still like results to be ordered by score, just an "adjusted" score.

I've read the excellent LIA, including the chapter on custom sort methods, but from what I can tell that still only implements a sort on one field - I really want to be able to sort on a "blend" of fields (one of which is the actual document score).

Could anyone suggest how I could implement this? I considered explicitly weighting the documents with a function of their date at index time, but this would mean the "weight" of the new documents would have to increase exponentially over time, and I suspect things would get messy!

(Our dataset is around 250k documents, growing by a few thousand a month.)

Cheers,
Tim.

The information contained in this email message may be confidential. If you are not the intended recipient, any use, interference with, disclosure or copying of this material is unauthorised and prohibited. Although this message and any attachments are believed to be free of viruses, no responsibility is accepted by T&F Informa for any loss or damage arising in any way from receipt or use thereof. Messages to and from the company are monitored for operational reasons and in accordance with lawful business practices. If you have received this message in error, please notify us by return and delete the message and any attachments. Further enquiries/returns can be sent to [EMAIL PROTECTED]
RE: Sorting results by both score and date
You can write a query and add date ranges to it, giving each date range a boost. For instance, you can do:

    +content:foo date:[{Today's Date} TO null]^5 date:[{Yesterday's Date} TO {Today's Date}]^4 date:[{Last Week's Date} TO {Yesterday's Date}]^3

and so on.

Aviran
http://www.aviransplace.com

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Friday, September 16, 2005 9:43 AM
To: java-user@lucene.apache.org
Subject: Sorting results by both score and date

Hi,

I'm working in an industry which is fairly time sensitive, and older documents are inherently less valuable. I'd like to be able to "weight" the score of search results, so that older documents score lower. I don't just want to sort by date, though - I'd still like results to be ordered by score, just an "adjusted" score.

I've read the excellent LIA, including the chapter on custom sort methods, but from what I can tell that still only implements a sort on one field - I really want to be able to sort on a "blend" of fields (one of which is the actual document score).

Could anyone suggest how I could implement this? I considered explicitly weighting the documents with a function of their date at index time, but this would mean the "weight" of the new documents would have to increase exponentially over time, and I suspect things would get messy!

(Our dataset is around 250k documents, growing by a few thousand a month.)

Cheers,
Tim.
Re: Sorting results by both score and date
Tim, check out p. 155 in LIA where we discuss "Sorting by multiple fields". However, what you're really after, it seems, is boosting documents. Check out TheServerSide's case study (online or in LIA) - Dion discusses how he implemented boosting for more recent documents.

If you're indexing documents in ascending date order, perhaps you could leverage the document id in such a boosting factor?

    Erik

On Sep 16, 2005, at 9:43 AM, <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I'm working in an industry which is fairly time sensitive, and older documents are inherently less valuable. I'd like to be able to "weight" the score of search results, so that older documents score lower. I don't just want to sort by date, though - I'd still like results to be ordered by score, just an "adjusted" score.
>
> I've read the excellent LIA, including the chapter on custom sort methods, but from what I can tell that still only implements a sort on one field - I really want to be able to sort on a "blend" of fields (one of which is the actual document score).
>
> Could anyone suggest how I could implement this? I considered explicitly weighting the documents with a function of their date at index time, but this would mean the "weight" of the new documents would have to increase exponentially over time, and I suspect things would get messy!
>
> (Our dataset is around 250k documents, growing by a few thousand a month.)
>
> Cheers,
> Tim.
RE: Sorting results by both score and date
Ah - the one bit of LIA I haven't read yet is the case studies section! Many thanks, I'll check it out.

Sorting by multiple fields isn't quite what I want - that sorts entirely by field A, then uses field B for records where A is identical, correct?

What I really want to do is sort by "A * (1-(B/700))", where A is the score, and B is the age (in days) of the document. I.e., the score is basically "scaled down" with date.

Cheers,
Tim.

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: 16 September 2005 14:54
To: java-user@lucene.apache.org
Subject: Re: Sorting results by both score and date

Tim, check out p. 155 in LIA where we discuss "Sorting by multiple fields". However, what you're really after it seems is boosting documents. Check out TheServerSide's case study (online or in LIA) - Dion discusses how he implemented boosting for more recent documents.

If you're indexing documents in ascending date order, perhaps you could leverage the document id in such a boosting factor?

    Erik
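As a rough illustration of the "A * (1-(B/700))" formula, here is a plain-Java sketch that re-ranks hits after the search has run. It does not touch Lucene's own scoring: AgeScaledScore, Hit, adjust and rerank are hypothetical names, the scores and ages are made-up sample values, and clamping the factor at zero for documents older than 700 days is an added assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: post-search re-ranking by score scaled down with age (hypothetical names). */
final class AgeScaledScore {
    static final double MAX_AGE_DAYS = 700.0;

    /** A * (1 - B/700); the clamp at zero for B > 700 is an assumption, not part of the formula. */
    static double adjust(double score, int ageDays) {
        double factor = 1.0 - (ageDays / MAX_AGE_DAYS);
        return score * Math.max(0.0, factor);
    }

    /** Minimal stand-in for a search hit: id, raw score, and age in days. */
    static final class Hit {
        final String id;
        final double score;
        final int ageDays;

        Hit(String id, double score, int ageDays) {
            this.id = id;
            this.score = score;
            this.ageDays = ageDays;
        }

        double adjusted() {
            return adjust(score, ageDays);
        }
    }

    /** Returns a copy of the hits ordered by adjusted score, highest first. */
    static List<Hit> rerank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort((a, b) -> Double.compare(b.adjusted(), a.adjusted()));
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("old-but-relevant", 0.9, 600));
        hits.add(new Hit("recent", 0.5, 30));
        for (Hit h : rerank(hits)) {
            System.out.println(h.id + " -> " + h.adjusted());
        }
    }
}
```

Re-ranking after the fact only reorders the hits that were retrieved, so it suits result sets small enough to hold in memory; index-time boosting avoids that limit at the cost of reindexing.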
Re: Sorting results by both score and date
On Sep 16, 2005, at 10:14 AM, <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> wrote:
> Ah - the one bit of LIA I haven't read yet is the case studies section! Many thanks, I'll check it out.
>
> Sorting by multiple fields isn't quite what I want - that sorts entirely by field A, then uses field B for records where A is identical, correct?

Correct.

> What I really want to do is sort by "A * (1-(B/700))", where A is the score, and B is the age (in days) of the document. IE - the score is basically "scaled down" with date.

Maybe the TSS case study will help, though they rebuild their index nightly and can adjust the boost based on the current day. I've not come across a really clean way to do this sort of age-based boosting other than how TSS does it.

    Erik
Deleting documents
I have a problem when deleting documents. Let's say I have a Document object doc.

    doc.add(Field.Text("id","index1,DML"));
    doc.add(Field.Text("contents","some records"));
    IndexWriter.addDocument(doc);

Now if I want to delete the document with id:index1,DML I do something like this:

    IndexReader.delete(new Term("id", "index1,DML"));

And it is not deleted. I have debugged it and noticed that Lucene compares my "index1,DML" parameter with its internal value "index1,dml". So when I do:

    IndexReader.delete(new Term("id", "index1,dml"));

the document is deleted. Now please explain to me why there is a lower-case value for my "id".

And excuse my poor English!
RE: Sorting results by both score and date
>> What I really want to do is sort by "A * (1-(B/700))", where A is the
>> score, and B is the age (in days) of the document. IE - the score is
>> basically "scaled down" with date.

> Maybe the TSS case study will help, though they rebuild their index
> nightly and can adjust the boost based on the current day.

Just read this - it looks like the best option for us. I think we could get away with only periodically reindexing by just inflating the boost marginally over time.

Are there limits to boost? Any reason we can't use a boost of, say, 0.0001 or 10,000?

Cheers,
Tim.
RE: Deleting documents
If you're indexing a field like this in order to be able to use it as a reference later, you should normally index it using Field.Keyword instead of Field.Text - if you use Text, it will go through your Analyzer, which is probably what's changing the case.

(I think this is right - I'm sure someone will correct me if I'm wrong!)

Cheers,
Tim.

-----Original Message-----
From: Bogdan Munteanu [mailto:[EMAIL PROTECTED]
Sent: 16 September 2005 15:40
To: java-user@lucene.apache.org
Subject: Deleting documents

I have a problem when deleting documents. Lets say I have a Document object doc.

    doc.add(Field.Text("id","index1,DML"));
    doc.add(Field.Text("contents","some records"));
    IndexWriter.addDocument(doc);

Now if I want to delete the document with id:index1,DML I do something like this:

    IndexReader.delete(new Term("id", "index1,DML"));

And it is not deleted. I have debuged it and noticed that lucene compares my "index1,DML" parameter with it's internal value "index1,dml". So when I do:

    IndexReader.delete(new Term("id", "index1,dml"));

the document is deleted. Now please explain me why is there a lower case value for my "id"?

And excuse my poor english!
RE: Deleting documents
Because when you add a document, the id goes through an Analyzer, which in your case uses a lowercase filter, but when you create a Term object the term is not lowercased by an Analyzer.

If you use Field.Keyword instead of Field.Text for your ID, the Analyzer will not lowercase the ID.

HTH

Aviran
http://www.aviransplace.com

-----Original Message-----
From: Bogdan Munteanu [mailto:[EMAIL PROTECTED]
Sent: Friday, September 16, 2005 10:40 AM
To: java-user@lucene.apache.org
Subject: Deleting documents

I have a problem when deleting documents. Lets say I have a Document object doc.

    doc.add(Field.Text("id","index1,DML"));
    doc.add(Field.Text("contents","some records"));
    IndexWriter.addDocument(doc);

Now if I want to delete the document with id:index1,DML I do something like this:

    IndexReader.delete(new Term("id", "index1,DML"));

And it is not deleted. I have debuged it and noticed that lucene compares my "index1,DML" parameter with it's internal value "index1,dml". So when I do:

    IndexReader.delete(new Term("id", "index1,dml"));

the document is deleted. Now please explain me why is there a lower case value for my "id"?

And excuse my poor english!
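The mismatch described above can be mimicked without Lucene. In this sketch a set of strings stands in for the terms stored in the index; analyzed and keyword are hypothetical stand-ins for an analyzed field and a keyword field, and the "analyzer" only lower-cases, which is a simplification (a real analyzer may also split the value into several tokens).

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Illustration only: why an exact-term delete misses an analyzed id (no Lucene involved). */
final class ExactTermMatch {

    /** Stand-in for an analyzed field: the indexed term is lower-cased (simplified). */
    static String analyzed(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    /** Stand-in for a keyword field: the value is indexed verbatim. */
    static String keyword(String value) {
        return value;
    }

    public static void main(String[] args) {
        Set<String> analyzedIndex = new HashSet<>();
        analyzedIndex.add(analyzed("index1,DML"));

        Set<String> keywordIndex = new HashSet<>();
        keywordIndex.add(keyword("index1,DML"));

        // A delete-by-term is an exact lookup: nothing lower-cases the term you pass in.
        System.out.println(analyzedIndex.contains("index1,DML")); // false
        System.out.println(analyzedIndex.contains("index1,dml")); // true
        System.out.println(keywordIndex.contains("index1,DML"));  // true
    }
}
```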
Re: Question: force a field must be matched?
On Sep 15, 2005, at 12:55 PM, James Huang wrote:
> Thanks Jason. I wonder if that's the same as
>
>     queryString + " publisher:Manning"
>
> and pass on to the query parser?

I will emphasize the other comments made on this regarding the Analyzer. I recommend against programmatically adding to the string passed to QueryParser because of these types of issues. You can aggregate a parsed expression Query into a BooleanQuery with other programmatically created Query objects (such as TermQuery in this case).

    Erik

> -James
>
> --- Jason Haruska <[EMAIL PROTECTED]> wrote:
>> On 9/15/05, James Huang <[EMAIL PROTECTED]> wrote:
>>> Suppose I have a book index with field="publisher", field="title", etc. I want to search for books only from "Manning", do I have to do anything special? how?
>>
>> add
>>     new BooleanClause(new TermQuery(new Term("publisher","Manning")), true, false)
>> to your BooleanQuery
Re: Small problem in searching
Lucene's WildcardQuery *does* support "postfix" queries - however QueryParser does not allow such an expression to pass through. You can create a WildcardQuery with a Term("field", "*whatever") and search with that. All caveats about WildcardQuery, performance, and maximum number of boolean clauses apply.

    Erik

On Sep 15, 2005, at 12:22 PM, jian chen wrote:
> Hi,
>
> I think Lucene transforms the prefix match query into all sub queries where the searching for a prefix could result into search for all terms that begin with that prefix.
>
> For "postfix" match, I think you need to do more work than relying on Lucene's query parser. You can iterate over the terms and do an "endsWith()" call, and if there is a match, then, perform a normal Lucene search for that term. So, effectively, you do the same thing as prefix match, conceptually loop over all available terms in your dictionary and find all the terms to be prepared for actual searching.
>
> This might be slow. What you might want to speed up the performance is, you can store all the available terms in-memory, and looping through all unique terms is a breeze. This is what google used for their prototype search engine when they were way back in the 1998s. (I guess :-)
>
> Cheers,
> Jian
>
> On 9/15/05, tirupathi reddy <[EMAIL PROTECTED]> wrote:
>> Hi guys,
>>
>> I have some problem while searching using Lucene. Say I have some thing like "tirupathireddy" or "venkatreddy" in the index. When i search for string "reddy" I have to get those things (i.e. "tirupathireddy" and "venkatreddy"). I have read in Query syntax of Lucene that * will not be given at the starting of the search string. SO how can I achiev that. I am in very much need of that. So please help me out.
>>
>> WIth Regards,
>> TirupatiReddy Manyam.
Text is not indexed when passed as a StringReader
Hello,

this question seems to have occurred on the mailing list before, but I wasn't able to find a satisfying answer. So please excuse me if I'm asking something that has already been discussed.

My problem is as follows:
If I use the Field.Text(String,Reader) method to create an indexed but unstored field, and the passed-in Reader happens to be a StringReader (e.g. when extracting Word documents using the Textmining library), the field is not indexed at all. That means Luke shows no terms for this field and, consequently, searches do not yield any result. For FileReaders, however, everything seems to work fine.

Of course, I could just convert the reader back into a string (e.g. with Jakarta Commons IO - IOUtils.toString()) and use the Unstored(String,String) method, but then again it wouldn't make sense to use a StringReader in the first place.

Thanks for your help,
Matthias
Re: Text is not indexed when passed as a StringReader
I think you may be having another problem somewhere; using a StringReader works just fine for me (in fact, when you create a field with a plain String, it is wrapped in a StringReader to pass to your analyzer). Note the following demo works just fine...

    public static void main(String[] args) throws Exception {
        // Build a one-document in-memory index from a StringReader field.
        RAMDirectory index = new RAMDirectory();
        IndexWriter writer = new IndexWriter(index, new WhitespaceAnalyzer(), true);
        Document doc = new Document();
        doc.add(Field.Text("foo", new StringReader("a b c d")));
        writer.addDocument(doc);
        writer.close();

        // Search for one of the terms that came from the reader.
        IndexSearcher s = new IndexSearcher(IndexReader.open(index));
        Hits h = s.search(new TermQuery(new Term("foo","a")));
        System.out.println(h.length() == 1 ? "FOUND" : "ERROR");
    }

: Date: Sat, 17 Sep 2005 03:51:28 +0800
: From: "[ISO-8859-15] Matthias Bräuer" <[EMAIL PROTECTED]>
: Reply-To: java-user@lucene.apache.org, [EMAIL PROTECTED]
: To: java-user@lucene.apache.org
: Subject: Text is not indexed when passed as a StringReader
:
: Hello,
:
: this question seems to have occured in the mailing list before but I
: wasn't able to find a satisfying answer. So please excuse if I'm asking
: something that has already been discussed.
:
: My problem is as follows:
: If I use the Field.Text(String,Reader) method to create an indexed, but
: unstored field and the passed in Reader happens to be a StringReader
: (e.g. when extracting Word documents using the Textmining library) the
: field is not indexed at all. That means Luke shows no terms for this
: field and, consequently, searches do not yield any result. For
: FileReaders, however, everything seems to work fine.
:
: Of course, I could just convert the reader back into a string (e.g. with
: Jakarta Commons IO - IOTools.toString()) and use the
: Unstored(String,String) method but then again it wouldn't make sense to
: use a StringReader in the first place.
:
: Thanks for your help,
: Matthias

-Hoss
JIRA bug messages
I just updated a bug via JIRA,
http://issues.apache.org/jira/browse/LUCENE-383
and I didn't see it come to any mailing list like it used to with Bugzilla.

Should it have? Is there a new mailing list to sign up for?

-Yonik
problems with lucene on a webhost account
Hello everybody,

I had a problem with the Lucene demo on my web hosting account. I think more people have the same problem, and perhaps somebody will run into it in the future, so I want to describe how I solved it.

In my case I used the Lucene web demo on my home PC with Windows XP and Tomcat 3.3.2, where it worked perfectly. After uploading it to a real web server, it didn't work: every search returned no results. So I found a solution - not a good one, but it works: I indexed my data on the web hosting account. Of course it is a bad solution, because for big amounts of data it is complicated to upload all the documents you need. But for test cases it works.

Here are my scripts. The one for indexing:

    <%@ page import=" org.apache.lucene.analysis.Analyzer,org.apache.lucene.analysis.standard.StandardAnalyzer,org.apache.lucene.document.Document,org.apache.lucene.document.Field,org.apache.lucene.index.IndexWriter" %>
    <%
    String[] text = { "index", "lucene","ramon","gasi" };
    String indexDir = "path/onthe/webserver";
    Analyzer analyzer = new StandardAnalyzer();
    boolean create = true;
    IndexWriter writer = new IndexWriter(indexDir, analyzer, create);
    for (int i = 0; i < text.length; i++) {
        Document document = new Document();
        document.add(Field.Text("textfeld", text[i]));
        writer.addDocument(document);
    }
    writer.close();
    %>

The other one, for searching:

    <%@ page import = " javax.servlet.*, javax.servlet.http.*, java.io.*, org.apache.lucene.analysis.*, org.apache.lucene.document.*, org.apache.lucene.index.*, org.apache.lucene.search.*, org.apache.lucene.queryParser.*,java.net.URLEncoder" %>
    <%
    String indexName ="path/onthe/webserver"; //local copy of the configuration variable
    IndexSearcher searcher = null; //the searcher used to open/search the index
    Query query = null;
    String myQuery="lucene";
    Hits hits = null;
    searcher = new IndexSearcher(IndexReader.open(indexName));
    Analyzer analyzer = new StopAnalyzer();
    query = QueryParser.parse(myQuery,"textfeld",analyzer);
    hits = searcher.search(query);
    if (hits.length() == 0) {
    %>
    Nothing found
    <%
    } else {
    %>
    Some results found
    <% for(int i=0;i

This is a very simple example for newcomers to Lucene; I hope it will be a little helpful for somebody.

Greetings
Gaston
Re: Small problem in searching
Hello,

I read the following statement:

    Note: You cannot use a * or ? symbol as the first character of a search.

on this page: http://lucene.apache.org/java/docs/queryparsersyntax.html

So that's why I thought of that. At present I am using QueryParser, so it gives an error for *reddy*. I am very new to this, and I have to submit my application by next week, so please help me: how can I use WildcardQuery instead of QueryParser?

At the moment I have a query like:

    id:manyam* AND author:*reddy* OR title:"measurement procedure"

and I am passing it to QueryParser as follows:

    query = QueryParser.parse(query1,"ALL",analyzer);

and calling the search method of the Searcher class as follows:

    Hits hits = searcher.search(query);

So can you please help me modify this code to use WildcardQuery, so that I can use *reddy*?

Thanks,
MTREDDY
RE: Small problem in searching
Hello Luc,

You are correct in that case. But suppose I have a string like manyamreddyvenkat. If I want to search for reddy, I can't find it even if I index all the entries in reverse order. Is there any other way?

Thanks,
MTREDDY
Re: Text is not indexed when passed as a StringReader
On Friday 16 September 2005 21:51, Matthias Bräuer wrote:
> but
> unstored field and the passed in Reader happens to be a StringReader
> (e.g. when extracting Word documents using the Textmining library) the
> field is not indexed at all. That means Luke shows no terms for this
> field and, consequently, searches do not yield any result.

Luke only shows terms if the field is *stored* (which it isn't for a reader). You need to click the "Reconstruct & Edit" button to see if the text really isn't *indexed*.

Regards
 Daniel

--
http://www.danielnaber.de
Re: problems with lucene on a webhost account
On Friday 16 September 2005 23:32, Gasi wrote:
> After uploading these on a real webserver , it didn't work because for
> every search I had null results. So I found a solution-not a good
> one-but it works: I indexed my data on the webhostingaccount.

There must have been a different problem. Lucene indexes should be system-independent, i.e. it should be possible to index on e.g. Windows and upload to Unix or vice versa. Maybe the fields were different in your searcher than the ones in the index (see the FAQ at
http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-3558e5121806fb4fce80fc022d889484a9248b71).

Regards
 Daniel

--
http://www.danielnaber.de
Lucene database bindings
I know there have been some posts discussing how to integrate Lucene with Derby recently. I've added an example project that works with both HSQLDB and Derby here:
http://issues.apache.org/jira/browse/LUCENE-434

The bindings allow you to use SQL that mixes database and Lucene functionality in ways like this:

    select top 10 lucene_score(id) as SCORE, lucene_highlight(adText)
    from ads
    where pricePounds < 200 and pricePounds > 1
      and lucene_query('"drum kit"', id) > 0
    order by SCORE DESC, pricePounds ASC

See the readme.txt in the zip file for details.

Cheers,
Mark
Re: JIRA bug messages
Yonik,

On Friday 16 September 2005 23:30, Yonik Seeley wrote:
> I just updated a bug via JIRA,
> http://issues.apache.org/jira/browse/LUCENE-383
> and I didn't see it come to any mailing list like it used to with bugzilla.
> Should it have? Is there a new mailing list to sign up for?

I had a similar experience with this (SpanNotQuery not patched, but previous bug is in fixed status):
http://issues.apache.org/jira/browse/LUCENE-433
and I would also prefer to have a mailing list for changes to Lucene issues in JIRA.

Btw. the list general@lucene.apache.org might be better for this subject.

Regards,
Paul Elschot