This is the code for searching:

    String index = "index";
    String field = "contents";
    IndexReader reader = IndexReader.open(index);
    Searcher searcher = new IndexSearcher(reader);

    System.out.println("Enter query: ");
    String line = ".IN."; // in Jakarta Regexp this is like * IN *
    RegexQuery rxquery = new RegexQuery(new Term(field, line));
    Hits hits = searcher.search(rxquery);
    if (hits != null) {
        for (int k = 0; k < 100 && k < hits.length(); k++) {
            if (hits.doc(k) != null)
                System.out.println(hits.doc(k).getField("contents").stringValue());
        }
    }

And this is the part that creates the index:

    File directory = new File("index");
    IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(),
            true, IndexWriter.MaxFieldLength.LIMITED);
    // getRecords() returns a list of record values from the database;
    // all of them are phrases
    List<String> records = getRecords();
    Iterator<String> i = records.iterator();
    while (i.hasNext()) {
        Document doc = new Document();
        doc.add(new Field(field, i.next(), Field.Store.YES, Field.Index.NOT_ANALYZED));
        writer.addDocument(doc);
    }
    writer.optimize();
    writer.close();

This code works as I want, except that it only matches the first word of
the phrase. I think the problem is in how the index is built, but I don't
know how to fix it... Any ideas?

Thank you so much!!


Steven A Rowe wrote:
>
> On 5/8/2009 at 9:13 AM, Ian Lee wrote:
>> I'm surprised that it matches either - don't you need ".*in" where .*
>> means match any character zero or more times?  See the javadoc for
>> java.util.regex.Pattern, or for Jakarta Regexp if you are using that
>> package.
>>
>> Unless you're an expert in regexps it is probably worth playing with
>> them outside your lucene code to start with e.g. with simple
>> String.matches(regexp) calls.  They can take some getting used to.
>> And try to avoid anything with backslashes if you can!
>
> The java.util.regex.Pattern implementation (the default RegexQuery
> implementation) actually uses Matcher.lookingAt(), which is equivalent to
> prepending a "^" anchor to the beginning of the pattern, so if Huntsman84
> is using the default implementation, then I agree with Ian: I'm surprised
> it matches either.
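To make the anchoring point above concrete, here is a small sketch that
uses only java.util.regex (no Lucene) on the two sample records from the
original question. The class name and the two helper methods are made up
for illustration, and Matcher.find() is used only as a stand-in for an
unanchored search; it is not the Jakarta Regexp API.

```java
import java.util.regex.Pattern;

public class RegexAnchoringDemo {

    // Anchored at the start of the input, as if "^" were prepended to the regex.
    static boolean matchesFromStart(String regex, String text) {
        return Pattern.compile(regex).matcher(text).lookingAt();
    }

    // Unanchored: succeeds if the regex matches anywhere in the input.
    static boolean matchesAnywhere(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // Sample records taken from the original question.
        String[] records = {"Knowing yourself", "Old clinic"};
        for (String record : records) {
            System.out.println(record
                    + " | lookingAt(.in.)=" + matchesFromStart(".in.", record)
                    + " | find(.in.)=" + matchesAnywhere(".in.", record)
                    + " | lookingAt(.*in.*)=" + matchesFromStart(".*in.*", record));
        }
    }
}
```

With these inputs, the anchored ".in." matches neither record, the
unanchored search matches both, and the anchored ".*in.*" also matches
both, which is the behaviour Ian's suggested pattern relies on.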
>
> However, the Jakarta Regexp implementation uses RE.match(), which does
> *not* require a beginning-of-string match.
>
> Huntsman84, are you using the Jakarta Regexp implementation?  If so, then
> like you, I'm surprised it's not matching both :).
>
> It would be useful to see some real code, including how you index your
> records.
>
> Steve
>
>> On Fri, May 8, 2009 at 1:42 PM, Huntsman84 <tpgarci...@gmail.com>
>> wrote:
>> >
>> > Hi,
>> >
>> > I am using RegexQuery for searching in a set of records which are
>> > phrases of several words each. My aim is to find any phrase that
>> > contains the given group of letters (e.g. "in"). For that case,
>> > I am building the query with the regular expression ".in.", so it
>> > should return all phrases which contain "in", but the search only
>> > matches with the first word of the phrase.
>> >
>> > For example, if my records are "Knowing yourself" and "Old
>> > clinic", the correct search would return 2 matches, but it only
>> > matches with "Knowing yourself".
>> >
>> > How could I fix this?
>

--
View this message in context: http://www.nabble.com/RegexQuery-Incomplete-Results-tp23445235p23478720.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org