Writing and searching at the same time
Hello,

I am having trouble writing to and searching a Lucene index at the same time. So far I have a class with two methods:

    private String indexLocation;

    public void addDocument(int id, String title, String body) throws IOException {
        // Opens the existing index (create = false), adds one document, closes again.
        IndexWriter indexWriter = new IndexWriter(indexLocation, new SimpleAnalyzer(), false);
        Document doc = new Document();
        doc.add(new Field("id", Integer.toString(id), Store.YES, Index.NO));
        doc.add(new Field("title", title, Store.NO, Index.TOKENIZED));
        doc.add(new Field("body", body, Store.NO, Index.TOKENIZED));
        indexWriter.addDocument(doc);
        indexWriter.close();
    }

    public List search(String query) throws IOException, ParseException {
        // A new searcher is opened for every query and closed afterwards.
        IndexSearcher indexSearcher = new IndexSearcher(indexLocation);
        MultiFieldQueryParser queryparser =
            new MultiFieldQueryParser(new String[]{"title", "body"}, new SimpleAnalyzer());
        Query q = queryparser.parse(query);
        Hits hits = indexSearcher.search(q);
        Iterator it = hits.iterator();
        List output = new ArrayList();
        while (it.hasNext()) {
            output.add(Integer.parseInt(((Hit) it.next()).getDocument().get("id")));
        }
        indexSearcher.close();
        return output;
    }

What I don't like is that each method opens its own IndexWriter or IndexSearcher. I tried to open them once and keep them open throughout the whole lifecycle of the application (which will be very long, because it will be a news search running as a web service), but when I did not close the IndexWriter, the IndexSearcher did not see any new documents in the index. The next step was keeping the IndexWriter open and reopening only the IndexSearcher, but in that case too the IndexSearcher saw the old index without the new documents.

So my final version is the one above. Could it be done better, without closing the IndexWriter after each addition and opening an IndexSearcher before each search query? What is the best pattern for building such systems?

Another question: do I need to provide any synchronization around the indexWriter.addDocument(doc) call? I see that the method isn't synchronized, so maybe the programmer needs to do it himself?
Best regards, Adr - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Lucene changes field values to wrong ones when indexing
Hello,

I have a problem with my search code. I am trying to index some data while searching simultaneously. Everything goes fine until a certain amount of data has been indexed; then my fields get corrupted. For example, I have a document whose title is indexed as "Nowitzki führt "Mavs" zum ersten Heimsieg" and whose inner id is "15" (not the doc id, just a field called id). At the end of indexing this field disappears, and some other values appear in the id field.

Below is the full listing of my program, without the AXIS part that is responsible for transmitting the data. If you can look at my code, maybe the locking mechanism is wrong somewhere, or there is some other bug. Please help me if you can.

    import java.io.IOException;
    import java.io.InputStream;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import javax.servlet.ServletContext;
    import javax.servlet.http.HttpServlet;
    import org.apache.axis.MessageContext;
    import org.apache.axis.transport.http.HTTPConstants;
    import org.apache.lucene.analysis.SimpleAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.Field.Index;
    import org.apache.lucene.document.Field.Store;
    import org.apache.lucene.index.IndexModifier;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.queryParser.MultiFieldQueryParser;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.RangeQuery;
    import org.apache.lucene.search.Sort;

    public class SNewsSearch {

        private static SNewsSearch search = new SNewsSearch();
        private static IndexSearcher indexSearcher;
        private String indexLocation;
        private String adminLogin;
        private String adminPass;

        private SNewsSearch() {
            System.out.println("Constructor SNewsSearch");
            Properties props = new Properties();
            HttpServlet srv = (HttpServlet) MessageContext.getCurrentContext()
                    .getProperty(HTTPConstants.MC_HTTP_SERVLET);
            ServletContext context = srv.getServletContext();
            InputStream is = context.getResourceAsStream("/WEB-INF/lucene.properties");
            try {
                props.load(is);
                System.out.println("Using index: " + props.getProperty("index.location"));
                this.indexLocation = props.getProperty("index.location");
                this.adminLogin = props.getProperty("admin.login");
                this.adminPass = props.getProperty("admin.pass");
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        public static SNewsSearch getInstance() {
            return search;
        }

        public void setIndexLocation(String indexLocation) {
            this.indexLocation = indexLocation;
        }

        private static Object lock = new Object();

        public void addDocument(int id, String title, String body, int pubDate) throws IOException {
            Document doc = new Document();
            doc.add(new Field("id", Integer.toString(id), Store.YES, Index.UN_TOKENIZED));
            doc.add(new Field("title", title, Store.NO, Index.TOKENIZED));
            doc.add(new Field("body", body, Store.NO, Index.TOKENIZED));
            doc.add(new Field("pubDate", Integer.toString(pubDate), Store.NO, Index.UN_TOKENIZED));
            synchronized (lock) {
                // Write the document, then drop the shared searcher so it gets reopened.
                IndexWriter indexWriter = new IndexWriter(indexLocation, new SimpleAnalyzer(), false);
                indexWriter.addDocument(doc);
                indexWriter.close();
                if (indexSearcher != null) indexSearcher.close();
                indexSearcher = null;
            }
        }

        public Map search(String query, int pubDate, int page, int results, boolean reverse)
                throws IOException, ParseException {
            MultiFieldQueryParser queryparser =
                new MultiFieldQueryParser(new String[]{"title", "body"}, new SimpleAnalyzer());
            Query q; // = queryparser.parse(query);
            if (pubDate != 0) {
                BooleanQuery bq = new BooleanQuery();
                bq.add(queryparser.parse(query), BooleanClause.Occur.MUST);
                bq.add(new RangeQuery(
                        new Term("pubDate", Integer.toString(pubDate)),
                        new Term("pubDate", ""),
                        true), BooleanClause.
Speedup indexing process
Hi,

Maybe this question is trivial, but I need to ask it. I have a problem with indexing a large number of documents, and I am looking for a better solution. The task is to index about 33 GB of CSV text data (each record is about 30 kB). It is of course possible to index this data, but I am not very happy with the timing (about 26 hours), so I want to know how I can speed up the process.

My first idea is to split the CSV file into smaller ones, e.g. 5 GB each, and index them on 6 indexing computers. Now my question: can I join such parts into one index after the indexing job on each computer has finished? I saw an example with a RAMDirectory that could be merged with an FSDirectory, but that example used the same IndexWriter; in my case I need separate IndexWriters on several computers. Is this possible with Lucene?

Thanks in advance for any hints,
Adrian
Hitmaps of results (number of results per category/filter, or grouping results)
Hello,

I'm quite new to Lucene, but I'm pretty amazed by its capabilities. My question concerns something commonly called a "hitmap": is it possible to do this in Lucene? Maybe someone has done it already, or maybe it is even built in?

By the term hitmap we mean, for example, the ability to group results by category. When searching eBay, you also get a hitmap of the results per category (Computers [10], Sports [111], etc.). So is it possible to do this in Lucene without huge CPU/time consumption?

I would appreciate any hints on this problem.

Best Regards,
Adrian
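To make the "hitmap" idea concrete, here is a minimal sketch in plain Java of counting hits per category. It is only an illustration of the counting step, not a Lucene feature: the class name CategoryCounts and the input list are hypothetical, and it assumes the category value of every matching document is already available in memory (for example read from a stored field or a cache).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: counts hits per category.
class CategoryCounts {

    // categoriesOfHits holds the category of every matching document, in hit order.
    // Assumption: these values were already fetched for each hit.
    static Map<String, Integer> count(List<String> categoriesOfHits) {
        // LinkedHashMap keeps categories in the order they first appear.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String category : categoriesOfHits) {
            counts.merge(category, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("Computers", "Sports", "Sports", "Computers", "Sports");
        // Prints the category counts, e.g. for rendering "Computers [2], Sports [3]".
        System.out.println(count(hits));
    }
}
```

The cost of this approach grows with the number of hits, so whether it avoids "huge CPU/time consumption" depends on how cheaply the category of each hit can be looked up.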
Searching in paths
Hello,

I have a problem with indexing / querying paths. For example, I put "/home/users/apache/txt/qqq__docu.txt" in a field called "path". I wanted to submit a query to find all documents that belong to my user apache, so I tried to query Lucene with

    AND path:/home/users/*

but no results were found by this query. If I query any other field without a /, results are returned, e.g.

    AND title natio*

Where am I making a mistake? What can I do to query for paths (and everything below them)?

Best Regards,
Adr
Re: Searching in paths
On 3/14/06, Mordo, Aviran (EXP N-NANNATEK) <[EMAIL PROTECTED]> wrote:
> You need to index the field as a keyword, or use an analyzer that will
> not strip the / from the string
>
> Aviran
> http://www.aviransplace.com

The field is indexed as Keyword. I was using StandardAnalyzer(), but currently I am trying to send queries directly via the API, so I use this code to do the job:

    PrefixQuery query1 = null;
    // Compare string contents; != would only compare object references.
    if (!cat.equals("")) {
        Term term = new Term("category", cat);
        query1 = new PrefixQuery(term);
        Hits hits = is.search(query1);
    }

The variable cat takes path-like strings as arguments, so it gets e.g. Top/World/Poland/, which is translated into:

    category:Top/World/Poland/*

Everything started to work, but I get this exception:

    org.apache.lucene.search.BooleanQuery$TooManyClauses
        org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:79)
        org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:71)
        org.apache.lucene.search.PrefixQuery.rewrite(PrefixQuery.java:50)
        org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:166)
        org.apache.lucene.search.Query.weight(Query.java:84)
        org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:85)
        org.apache.lucene.search.Hits.getMoreDocs(Hits.java:64)
        org.apache.lucene.search.Hits.<init>(Hits.java:43)
        org.apache.lucene.search.Searcher.search(Searcher.java:33)
        org.apache.lucene.search.Searcher.search(Searcher.java:27)

when I shorten the path. In other words, I get no exception when the query is Top/World/Poland/, but I start to get it when the query is Top/World/. getMaxClauseCount() returns 1024.

Lucene probably tries to find all keywords that match Top/World/* and puts them into a boolean query, am I right? If so, I have a problem, because below the root category there are about 3-4 million child categories, which I also want to search (and show each category as a URL, so the user can go deeper into the category tree if he wants). What can I do about this, besides setting a higher value for maxClauseCount?
Best Regards,
Adr
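The behaviour suspected in the message above (one boolean clause per indexed term that starts with the prefix, with a hard limit) can be modelled in a few lines of plain Java. This is a toy model only, not Lucene's implementation: the class PrefixExpansion, the TreeSet standing in for the term dictionary, and the generated category names are all assumptions made for illustration. Only the limit of 1024 comes from the message.

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Toy model of prefix expansion; this is NOT Lucene code.
class PrefixExpansion {

    static final int MAX_CLAUSES = 1024; // the getMaxClauseCount() value quoted in the message

    // Counts the terms starting with prefix, failing once the limit is exceeded.
    static int expand(SortedSet<String> terms, String prefix) {
        int clauses = 0;
        // tailSet starts at the first term >= prefix; matching terms are contiguous from there.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            if (++clauses > MAX_CLAUSES) {
                throw new IllegalStateException("too many clauses for prefix " + prefix);
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>();
        for (int i = 0; i < 2000; i++) {
            terms.add("Top/World/Country" + i + "/"); // made-up category names
        }
        terms.add("Top/World/Poland/");

        System.out.println(expand(terms, "Top/World/Poland/")); // a narrow prefix stays under the limit
        try {
            expand(terms, "Top/World/"); // a broad prefix matches more than MAX_CLAUSES terms
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model, raising the limit only moves the failure point, since the number of clauses still grows with the number of distinct category values under the prefix.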
Re: Searching in paths
Replying to myself, I hate this :(

What about this solution: split the path-like string into smaller tokens and index them as separate words, e.g.:

    #Top/World/Poland/#
    #Top/World/#
    #Top/#

So if I ask for the word #Top/# I will get all the results for this category, without creating so many boolean clauses. Is there a better solution for my problem?

Best Regards,
Adr
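A minimal sketch of the token-splitting step proposed above, in plain Java. The class and method names (PathTokens, ancestors) are hypothetical, the # delimiters from the message are left out, and it assumes every category path ends with a slash; a trailing segment without a slash is ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: expands a path into itself plus all of its ancestor paths.
class PathTokens {

    // "Top/World/Poland/" -> ["Top/", "Top/World/", "Top/World/Poland/"]
    static List<String> ancestors(String path) {
        List<String> tokens = new ArrayList<>();
        int slash = path.indexOf('/');
        while (slash >= 0) {
            // Each prefix ending in a slash becomes one token.
            tokens.add(path.substring(0, slash + 1));
            slash = path.indexOf('/', slash + 1);
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : ancestors("Top/World/Poland/")) {
            System.out.println(token);
        }
    }
}
```

Each returned token would be indexed as its own untokenized value of the category field, so a search for everything under Top/World/ becomes a single exact-term lookup instead of a prefix expansion. The trade-off is a larger index, since every document carries one token per level of its path.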
Grouping results by chosen field
Hello,

I tried to find a solution myself, but without any good result, so I want to ask the group. My problem concerns result grouping. The best example is Google search, where results are sorted by relevance and also grouped by domain (grouped results have a little indent/margin). In my project I want similar functionality, without very high CPU consumption.

Can you share any helpful hints?

Best Regards,
Adr
Re: Grouping results by chosen field
On 3/17/06, Java Programmer <[EMAIL PROTECTED]> wrote:
>
> Hello,
> I tried to search myself for solution, but without any good result, so I want
> to ask group.
> My problem concerns result grouping, the best example will be Google search
> where you have results sorted by relevance, and also grouped by domain (they
> have little indent/margin). In my project I want to get similar
> functionality, without very huge CPU consumption.
>
> Can you share any helpful hints ?
>
> Best Regards,
> Adr
>

Hello,

I have written some code to do the sorting for me (it's not perfect, maybe it's even a very poor solution, but I'm still learning). If you have time, please take a look:

    long sort_start = new Date().getTime();
    Map<String, List<Integer>> domains = new HashMap<String, List<Integer>>();
    List<String> results = new ArrayList<String>();
    int i = 0;
    while (i [...] new ArrayList<Integer>());
            results.add(url);
        }
        domains.get(url).add(i);
        i++;
    }
    long sort_end = new Date().getTime();

So I group the results for each domain in Lists to preserve the score order. I put these ordered groups into a Map, and I put the keys of that Map into another List to preserve the order of the highest-scoring domains. As a result I get:

    - domain A score 1.0
    -- domain A score 0.6
    - domain B score 0.9
    etc.

I put this code into a servlet (Tomcat 5.5) and it works, but when I run a query for the first time the whole sorting process takes a long time, e.g. 4900 ms, while running the same query again (e.g. with paging) is very quick, e.g. 40 ms. Why does this happen? Is there any optimization in Lucene, or any kind of caching? After restarting the servlet, queries that were already asked return results at once, but new queries always take a long time to process. Maybe I missed something when reading the documentation. Can someone give me an explanation?

Best regards,
Adr
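The grouping described in the message above can be sketched more compactly in plain Java. This is only one possible variant under stated assumptions, not the poster's code: the class DomainGrouping is hypothetical, and the input is assumed to be the domain of each hit, already listed in descending score order. A LinkedHashMap replaces the separate HashMap plus key List, because it remembers the order in which domains were first seen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: groups hit positions by domain while keeping score order.
class DomainGrouping {

    // domainsInScoreOrder.get(i) is the domain of the i-th best hit (assumption).
    static Map<String, List<Integer>> group(List<String> domainsInScoreOrder) {
        // Insertion order = order of each domain's best-scoring hit.
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (int i = 0; i < domainsInScoreOrder.size(); i++) {
            groups.computeIfAbsent(domainsInScoreOrder.get(i), d -> new ArrayList<>()).add(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hit 0 and hit 2 belong to domain A, hit 1 to domain B.
        List<String> hits = Arrays.asList("a.example", "b.example", "a.example");
        for (Map.Entry<String, List<Integer>> e : group(hits).entrySet()) {
            System.out.println(e.getKey() + " -> hits " + e.getValue());
        }
    }
}
```

The grouping itself is linear in the number of hits, so in a setup like the one described, most of the time is more likely spent fetching the domain value for each hit than in the map operations.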
Return all distinct values
Hello,

I created a small Lucene application which stores a lot of information about my users. One piece of it is the zip code in numeric format, e.g. 50501, 63601. Zip codes are stored in Text fields, so they are fully searchable. What I want to do now is get all the unique zip codes that have been stored so far, something like the SQL:

    select distinct zipcode from ...

Is it possible?

Best regards,
Adr