That's it! The problem was the regular expression: the one I need is ".*IN".
Thank you so much, I was going mad... =)


Ian Lea wrote:
>
> The little self-contained program below runs regex queries for a few
> regexps against a few phrases for both the java.util and jakarta
> regexp packages.
>
> Output when run with lucene 2.4.1 and jakarta-regexp 1.5 is
>
> Added Knowing yourself
> Added Old clinic
> Added INSIDE
> Added Not INSIDE
>
> Default regexcapabilities=org.apache.lucene.search.regex.javautilregexcapabilit...@0
>
> org.apache.lucene.search.regex.javautilregexcapabilit...@0
> 0 hits for text:.in
> 2 hits for text:.*in
> 0 hits for text:.IN
> 2 hits for text:.*IN
> org.apache.lucene.search.regex.jakartaregexpcapabilit...@0
> 2 hits for text:.in
> 2 hits for text:.*in
> 1 hits for text:.IN
> 2 hits for text:.*IN
>
> Hope that helps.
>
> --
> Ian.
>
>
> import org.apache.lucene.index.*;
> import org.apache.lucene.store.*;
> import org.apache.lucene.document.*;
> import org.apache.lucene.analysis.*;
> import org.apache.lucene.analysis.standard.*;
> import org.apache.lucene.search.*;
> import org.apache.lucene.search.regex.*;
>
> public class luctest {
>
>     public static void main(String[] _args) throws Exception {
>         RAMDirectory rdir = new RAMDirectory();
>         IndexWriter writer = new IndexWriter(rdir, new StandardAnalyzer(),
>                                              true);
>         String[] docterms = { "Knowing yourself",
>                               "Old clinic",
>                               "INSIDE",
>                               "Not INSIDE" };
>
>         // Index each phrase as a single untokenized, case-sensitive term.
>         for (String s : docterms) {
>             Document d = new Document();
>             d.add(new Field("text",
>                             s,
>                             Field.Store.YES,
>                             Field.Index.NOT_ANALYZED));
>             writer.addDocument(d);
>             System.out.printf("Added %s\n", s);
>         }
>         writer.close();
>
>         IndexSearcher searcher = new IndexSearcher(rdir);
>         String[] queries = { ".in", ".*in", ".IN", ".*IN" };
>         RegexCapabilities[] rcaps = { new JavaUtilRegexCapabilities(),
>                                       new JakartaRegexpCapabilities() };
>         RegexQuery qx = new RegexQuery(new Term("x", "x"));
>         System.out.printf("\nDefault RegexCapabilities=%s\n\n",
>                           qx.getRegexImplementation());
>         // Run every regexp against both regex implementations.
>         for (RegexCapabilities rcap : rcaps) {
>             System.out.println(rcap);
>             for (String s : queries) {
>                 Term t = new Term("text", s);
>                 RegexQuery q = new RegexQuery(t);
>                 q.setRegexImplementation(rcap);
>                 Hits h = searcher.search(q);
>                 System.out.printf("%s hits for %s\n",
>                                   h.length(),
>                                   q.toString());
>             }
>         }
>     }
> }
>
>
> On Mon, May 11, 2009 at 1:39 PM, Huntsman84 <tpgarci...@gmail.com> wrote:
>>
>> The RegexQuery class uses that package, and for that reason the
>> expression matches.
>>
>> If my records contained only one word each, this code would work, but I
>> need to apply that regular expression to a phrase...
>>
>>
>> Ian Lea wrote:
>>>
>>> The default regex package is java.util.regex and I can't see anywhere
>>> that you tell it to use the Jakarta regexp package. So I don't think
>>> that ".in" will match. Also, you are storing your contents field as
>>> NOT_ANALYZED so you will need to be wary of case sensitivity. Maybe
>>> this is what you want, but maybe not.
>>>
>>>
>>> --
>>> Ian.
>>>
>>>
>>> On Mon, May 11, 2009 at 9:00 AM, Huntsman84 <tpgarci...@gmail.com> wrote:
>>>>
>>>> This is the code for searching:
>>>>
>>>> String index = "index";
>>>> String field = "contents";
>>>> IndexReader reader = IndexReader.open(index);
>>>> Searcher searcher = new IndexSearcher(reader);
>>>>
>>>> System.out.println("Enter query: ");
>>>> String line = ".IN."; // in jakarta regexp this is like * IN *
>>>> RegexQuery rxquery = new RegexQuery(new Term(field, line));
>>>> Hits hits = searcher.search(rxquery);
>>>>
>>>> if (hits != null) {
>>>>     for (int k = 0; k < 100 && k < hits.length(); k++) {
>>>>         if (hits.doc(k) != null)
>>>>             System.out.println(hits.doc(k).getField("contents").stringValue());
>>>>     }
>>>> }
>>>>
>>>>
>>>> And this is the part that creates the index:
>>>>
>>>> File directory = new File("index");
>>>> IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(),
>>>>                                      true,
>>>>                                      IndexWriter.MaxFieldLength.LIMITED);
>>>> // returns a list of record values from the database, all of them phrases
>>>> List<String> records = getRecords();
>>>> Iterator<String> i = records.iterator();
>>>> while (i.hasNext()) {
>>>>     Document doc = new Document();
>>>>     doc.add(new Field(field, i.next(), Field.Store.YES,
>>>>                       Field.Index.NOT_ANALYZED));
>>>>     writer.addDocument(doc);
>>>> }
>>>> writer.optimize();
>>>> writer.close();
>>>>
>>>>
>>>> This code works as I want, but it only matches the first word of the
>>>> phrase. I think the problem is in how the index is built, but I don't
>>>> know how to fix it...
>>>>
>>>> Any ideas?
>>>>
>>>> Thank you so much!!
>>>>
>>>>
>>>> Steven A Rowe wrote:
>>>>>
>>>>> On 5/8/2009 at 9:13 AM, Ian Lea wrote:
>>>>>> I'm surprised that it matches either - don't you need ".*in" where .*
>>>>>> means match any character zero or more times? See the javadoc for
>>>>>> java.util.regex.Pattern, or for Jakarta Regexp if you are using that
>>>>>> package.
>>>>>>
>>>>>> Unless you're an expert in regexps it is probably worth playing with
>>>>>> them outside your lucene code to start with e.g. with simple
>>>>>> String.matches(regexp) calls. They can take some getting used to.
>>>>>> And try to avoid anything with backslashes if you can!
>>>>>
>>>>> The java.util.regex.Pattern implementation (the default RegexQuery
>>>>> implementation) actually uses Matcher.lookingAt(), which is equivalent
>>>>> to prepending a "^" anchor to the beginning of the pattern, so if
>>>>> Huntsman84 is using the default implementation, then I agree with Ian:
>>>>> I'm surprised it matches either.
>>>>>
>>>>> However, the Jakarta Regexp implementation uses RE.match(), which does
>>>>> *not* require a beginning-of-string match.
>>>>>
>>>>> Huntsman84, are you using the Jakarta Regexp implementation? If so,
>>>>> then like you, I'm surprised it's not matching both :).
>>>>>
>>>>> It would be useful to see some real code, including how you index your
>>>>> records.
>>>>>
>>>>> Steve
>>>>>
>>>>>> On Fri, May 8, 2009 at 1:42 PM, Huntsman84 <tpgarci...@gmail.com>
>>>>>> wrote:
>>>>>> >
>>>>>> > Hi,
>>>>>> >
>>>>>> > I am using RegexQuery for searching in a set of records which are
>>>>>> > phrases of several words each. My aim is to find any phrase that
>>>>>> > contains the given group of letters (e.g. "in"). For that case,
>>>>>> > I am building the query with the regular expression ".in.", so it
>>>>>> > should return all phrases which contain "in", but the search only
>>>>>> > matches with the first word of the phrase.
>>>>>> >
>>>>>> > For example, if my records are "Knowing yourself" and "Old
>>>>>> > clinic", the correct search would return 2 matches, but it only
>>>>>> > matches with "Knowing yourself".
>>>>>> >
>>>>>> > How could I fix this?
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://www.nabble.com/RegexQuery-Incomplete-Results-tp23445235p23478720.html
>>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>>>
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/RegexQuery-Incomplete-Results-tp23445235p23482532.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>

--
View this message in context: http://www.nabble.com/RegexQuery-Incomplete-Results-tp23445235p23486350.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
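
For anyone who wants to try the anchoring difference discussed in this thread without setting up Lucene, here is a minimal sketch using only java.util.regex. It compares Matcher.lookingAt(), which Steve says the default RegexQuery implementation uses, with Matcher.find(). Note the assumptions: find() is only a stand-in for an unanchored match and is not the Jakarta Regexp package itself, and the class and method names (RegexAnchorDemo, countPrefix, countAnywhere) are made up for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Lucene or Jakarta Regexp.
class RegexAnchorDemo {

    // The same four phrases indexed by Ian's test program.
    static final String[] PHRASES = {
        "Knowing yourself", "Old clinic", "INSIDE", "Not INSIDE"
    };

    // Counts phrases where the regex matches starting at index 0
    // (lookingAt anchors at the start but need not consume the whole string).
    public static int countPrefix(String regex, String[] phrases) {
        Pattern p = Pattern.compile(regex);
        int hits = 0;
        for (String s : phrases) {
            if (p.matcher(s).lookingAt()) {
                hits++;
            }
        }
        return hits;
    }

    // Counts phrases where the regex matches anywhere in the string.
    // Assumption: used here as a stand-in for an unanchored match.
    public static int countAnywhere(String regex, String[] phrases) {
        Pattern p = Pattern.compile(regex);
        int hits = 0;
        for (String s : phrases) {
            if (p.matcher(s).find()) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] queries = { ".in", ".*in", ".IN", ".*IN" };
        for (String q : queries) {
            System.out.printf("%-5s lookingAt=%d find=%d%n",
                              q,
                              countPrefix(q, PHRASES),
                              countAnywhere(q, PHRASES));
        }
    }
}
```

With these four phrases the lookingAt() counts come out as 0, 2, 0, 2 and the find() counts as 2, 2, 1, 2, which line up with the hit counts in Ian's output for the two implementations. This is why a leading ".*" is needed with the default implementation when the letters can appear anywhere in the phrase.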