I need similar functionality, but when I run the code quoted below it breaks after outputting the following:

========================================================================
Added Knowing yourself
Added Old clinic
Added INSIDE
Added Not INSIDE

Default regexcapabilities=org.apache.lucene.search.regex.javautilregexcapabilit...@0

org.apache.lucene.search.regex.javautilregexcapabilit...@0
0 hits for text:.in
2 hits for text:.*in
0 hits for text:.IN
2 hits for text:.*IN
org.apache.lucene.search.regex.jakartaregexpcapabilit...@0
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/regexp/RE
        at org.apache.lucene.search.regex.JakartaRegexpCapabilities.compile(JakartaRegexpCapabilities.java:32)
        at org.apache.lucene.search.regex.RegexTermEnum.<init>(RegexTermEnum.java:47)
        at org.apache.lucene.search.regex.RegexQuery.getEnum(RegexQuery.java:59)
        at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:55)
        at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:162)
        at org.apache.lucene.search.Query.weight(Query.java:94)
        at org.apache.lucene.search.Hits.<init>(Hits.java:76)
        at org.apache.lucene.search.Searcher.search(Searcher.java:50)
        at org.apache.lucene.search.Searcher.search(Searcher.java:40)
        at Regex2.main(Regex2.java:43)
Caused by: java.lang.ClassNotFoundException: org.apache.regexp.RE
        at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
        ... 10 more
===================================================================

thanks a lot

On 5/11/09, Huntsman84 <tpgarci...@gmail.com> wrote:
>
> That's it!!!
>
> The problem was with the regular expression, the one I need is ".*IN"!!
>
> Thank you so much, I was turning mad... =)
>
>
> Ian Lea wrote:
>>
>> The little self-contained program below runs regex queries for a few
>> regexps against a few phrases for both the java.util and jakarta
>> regexp packages.
>>
>> Output when run with lucene 2.4.1 and jakarta-regexp 1.5 is
>>
>> Added Knowing yourself
>> Added Old clinic
>> Added INSIDE
>> Added Not INSIDE
>>
>> Default regexcapabilities=org.apache.lucene.search.regex.javautilregexcapabilit...@0
>>
>> org.apache.lucene.search.regex.javautilregexcapabilit...@0
>> 0 hits for text:.in
>> 2 hits for text:.*in
>> 0 hits for text:.IN
>> 2 hits for text:.*IN
>> org.apache.lucene.search.regex.jakartaregexpcapabilit...@0
>> 2 hits for text:.in
>> 2 hits for text:.*in
>> 1 hits for text:.IN
>> 2 hits for text:.*IN
>>
>> Hope that helps.
>>
>> --
>> Ian.
>>
>>
>> import org.apache.lucene.index.*;
>> import org.apache.lucene.store.*;
>> import org.apache.lucene.document.*;
>> import org.apache.lucene.analysis.*;
>> import org.apache.lucene.analysis.standard.*;
>> import org.apache.lucene.search.*;
>> import org.apache.lucene.search.regex.*;
>>
>> public class luctest {
>>
>>     public static void main(String[] _args) throws Exception {
>>         RAMDirectory rdir = new RAMDirectory();
>>         IndexWriter writer = new IndexWriter(rdir, new StandardAnalyzer(),
>>                                              true);
>>         String[] docterms = { "Knowing yourself",
>>                               "Old clinic",
>>                               "INSIDE",
>>                               "Not INSIDE" };
>>
>>         for (String s : docterms) {
>>             Document d = new Document();
>>             d.add(new Field("text",
>>                             s,
>>                             Field.Store.YES,
>>                             Field.Index.NOT_ANALYZED));
>>             writer.addDocument(d);
>>             System.out.printf("Added %s\n", s);
>>         }
>>         writer.close();
>>
>>         IndexSearcher searcher = new IndexSearcher(rdir);
>>         String[] queries = { ".in", ".*in", ".IN", ".*IN" };
>>         RegexCapabilities[] rcaps = { new JavaUtilRegexCapabilities(),
>>                                       new JakartaRegexpCapabilities() };
>>         RegexQuery qx = new RegexQuery(new Term("x", "x"));
>>         System.out.printf("\nDefault RegexCapabilities=%s\n\n",
>>                           qx.getRegexImplementation());
>>         for (RegexCapabilities rcap : rcaps) {
>>             System.out.println(rcap);
>>             for (String s : queries) {
>>                 Term t = new Term("text", s);
>>                 RegexQuery q = new RegexQuery(t);
>>                 q.setRegexImplementation(rcap);
>>                 Hits h = searcher.search(q);
>>                 System.out.printf("%s hits for %s\n",
>>                                   h.length(),
>>                                   q.toString());
>>             }
>>         }
>>     }
>> }
>>
>>
>> On Mon, May 11, 2009 at 1:39 PM, Huntsman84 <tpgarci...@gmail.com> wrote:
>>>
>>> The RegexQuery class uses that package, and for that reason the
>>> expression matches.
>>>
>>> If my records contained only one word each, this code would work, but I
>>> need to apply that regular expression to a phrase...
>>>
>>>
>>> Ian Lea wrote:
>>>>
>>>> The default regex package is java.util.regex and I can't see anywhere
>>>> that you tell it to use the Jakarta regexp package. So I don't think
>>>> that ".in" will match. Also, you are storing your contents field as
>>>> NOT_ANALYZED so you will need to be wary of case sensitivity. Maybe
>>>> this is what you want, but maybe not.
>>>>
>>>>
>>>> --
>>>> Ian.
>>>>
>>>>
>>>> On Mon, May 11, 2009 at 9:00 AM, Huntsman84 <tpgarci...@gmail.com> wrote:
>>>>>
>>>>> This is the code for searching:
>>>>>
>>>>> String index = "index";
>>>>> String field = "contents";
>>>>> IndexReader reader = IndexReader.open(index);
>>>>> Searcher searcher = new IndexSearcher(reader);
>>>>>
>>>>> System.out.println("Enter query: ");
>>>>> String line = ".IN."; // in jakarta regexp this is like * IN *
>>>>> RegexQuery rxquery = new RegexQuery(new Term(field, line));
>>>>> Hits hits = searcher.search(rxquery);
>>>>>
>>>>> if (hits != null) {
>>>>>     for (int k = 0; k < 100 && k < hits.length(); k++) {
>>>>>         if (hits.doc(k) != null)
>>>>>             System.out.println(hits.doc(k).getField("contents").stringValue());
>>>>>     }
>>>>> }
>>>>>
>>>>>
>>>>>
>>>>> And this is the part that creates the index:
>>>>>
>>>>>
>>>>> File directory = new File("index");
>>>>> IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(),
>>>>>                                      true,
>>>>>                                      IndexWriter.MaxFieldLength.LIMITED);
>>>>> // returns a list of record values from database, all of them are phrases
>>>>> List<String> records = getRecords();
>>>>> Iterator<String> i = records.iterator();
>>>>> while (i.hasNext()) {
>>>>>     Document doc = new Document();
>>>>>     doc.add(new Field(field, i.next(), Field.Store.YES,
>>>>>                       Field.Index.NOT_ANALYZED));
>>>>>     writer.addDocument(doc);
>>>>> }
>>>>> writer.optimize();
>>>>> writer.close();
>>>>>
>>>>>
>>>>>
>>>>> This code works as I want, but it only matches the first word of the
>>>>> phrase. I think the problem is in the index building, but I don't know
>>>>> how to fix it...
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> Thank you so much!!
>>>>>
>>>>>
>>>>>
>>>>> Steven A Rowe wrote:
>>>>>>
>>>>>> On 5/8/2009 at 9:13 AM, Ian Lea wrote:
>>>>>>> I'm surprised that it matches either - don't you need ".*in" where .*
>>>>>>> means match any character zero or more times? See the javadoc for
>>>>>>> java.util.regex.Pattern, or for Jakarta Regexp if you are using that
>>>>>>> package.
>>>>>>>
>>>>>>> Unless you're an expert in regexps it is probably worth playing with
>>>>>>> them outside your lucene code to start with e.g. with simple
>>>>>>> String.matches(regexp) calls. They can take some getting used to.
>>>>>>> And try to avoid anything with backslashes if you can!
>>>>>>
>>>>>> The java.util.regex.Pattern implementation (the default RegexQuery
>>>>>> implementation) actually uses Matcher.lookingAt(), which is equivalent
>>>>>> to prepending a "^" anchor to the beginning of the pattern, so if
>>>>>> Huntsman84 is using the default implementation, then I agree with Ian:
>>>>>> I'm surprised it matches either.
>>>>>>
>>>>>> However, the Jakarta Regexp implementation uses RE.match(), which does
>>>>>> *not* require a beginning-of-string match.
>>>>>>
>>>>>> Huntsman84, are you using the Jakarta Regexp implementation? If so,
>>>>>> then like you, I'm surprised it's not matching both :).
>>>>>>
>>>>>> It would be useful to see some real code, including how you index your
>>>>>> records.
>>>>>>
>>>>>> Steve
>>>>>>
>>>>>>> On Fri, May 8, 2009 at 1:42 PM, Huntsman84 <tpgarci...@gmail.com> wrote:
>>>>>>> >
>>>>>>> > Hi,
>>>>>>> >
>>>>>>> > I am using RegexQuery for searching in a set of records which are
>>>>>>> > phrases of several words each. My aim is to find any phrase that
>>>>>>> > contains the given group of letters (e.g. "in"). For that case,
>>>>>>> > I am building the query with the regular expression ".in.", so it
>>>>>>> > should return all phrases which contain "in", but the search only
>>>>>>> > matches with the first word of the phrase.
>>>>>>> >
>>>>>>> > For example, if my records are "Knowing yourself" and "Old
>>>>>>> > clinic", the correct search would return 2 matches, but it only
>>>>>>> > matches with "Knowing yourself".
>>>>>>> >
>>>>>>> > How could I fix this?
>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://www.nabble.com/RegexQuery-Incomplete-Results-tp23445235p23478720.html
>>>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
--
"RABI ZIDNI ILMA"
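
The different hit counts for the two implementations in Ian's output can be reproduced without Lucene. The sketch below uses only java.util.regex. The class name RegexAnchorDemo and the helpers countLookingAt and countFind are made up for illustration. It assumes, as Steve describes, that the default implementation behaves like Matcher.lookingAt(), and it uses Matcher.find() as a stand-in for the unanchored Jakarta RE.match(); that is an approximation, not the Jakarta code itself. Each phrase is treated as one string because the field is indexed NOT_ANALYZED, so the whole phrase is a single term.

```java
import java.util.regex.Pattern;

// Illustration only: the names in this class are hypothetical, not Lucene API.
class RegexAnchorDemo {

    // Counts phrases where the pattern matches starting at the first
    // character (Matcher.lookingAt), i.e. as if "^" were prepended.
    static int countLookingAt(String regex, String[] phrases) {
        Pattern p = Pattern.compile(regex);
        int hits = 0;
        for (String s : phrases) {
            if (p.matcher(s).lookingAt()) {
                hits++;
            }
        }
        return hits;
    }

    // Counts phrases where the pattern matches anywhere in the phrase
    // (Matcher.find). Assumption: this approximates an unanchored match.
    static int countFind(String regex, String[] phrases) {
        Pattern p = Pattern.compile(regex);
        int hits = 0;
        for (String s : phrases) {
            if (p.matcher(s).find()) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Same phrases and patterns as in Ian's test program.
        String[] phrases = { "Knowing yourself", "Old clinic", "INSIDE", "Not INSIDE" };
        String[] regexes = { ".in", ".*in", ".IN", ".*IN" };
        for (String r : regexes) {
            System.out.printf("%-5s lookingAt=%d find=%d%n",
                              r,
                              countLookingAt(r, phrases),
                              countFind(r, phrases));
        }
    }
}
```

For these four phrases the anchored counts come out as 0, 2, 0, 2 and the unanchored counts as 2, 2, 1, 2, which agrees with the java.util and Jakarta sections of Ian's output. With the default implementation the pattern therefore needs a leading ".*" to match text that is not at the start of the phrase.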