Thanks, it works.
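
For anyone who hits the same NoClassDefFoundError shown in the quoted output: it means org.apache.regexp.RE (the class named in the stack trace) was not on the classpath when JakartaRegexpCapabilities tried to compile the pattern. A minimal sketch of a check you could run first is below. The class name RegexpJarCheck and the helper isOnClasspath are made up for illustration and are not part of Lucene or Jakarta Regexp; it uses only the JDK.

```java
// Hypothetical helper (not part of Lucene): checks whether a class can be
// located on the classpath before code that depends on it is run.
final class RegexpJarCheck {

    /** Returns true if the named class can be found by this class's loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers.
            Class.forName(className, false, RegexpJarCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The class the stack trace reports as missing.
        String jakartaRe = "org.apache.regexp.RE";
        if (isOnClasspath(jakartaRe)) {
            System.out.println("Jakarta Regexp found on the classpath");
        } else {
            System.out.println("Jakarta Regexp missing: add the jar, "
                    + "or use JavaUtilRegexCapabilities instead");
        }
    }
}
```

If the check reports the class as missing, either add the jakarta-regexp jar to the classpath or leave JakartaRegexpCapabilities out, as Mark suggested.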

On 5/12/09, Mark Miller <markrmil...@gmail.com> wrote:
> Use JavaUtilRegexCapabilities or put the Jakarta Regexp jar on your
> classpath: http://jakarta.apache.org/regexp/index.html
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
> Seid Mohammed wrote:
>> I need similar functionality, but while running the above code it
>> breaks after outputting the following
>> ========================================================================
>> Added Knowing yourself
>> Added Old clinic
>> Added INSIDE
>> Added Not INSIDE
>>
>> Default regexcapabilities=org.apache.lucene.search.regex.javautilregexcapabilit...@0
>>
>> org.apache.lucene.search.regex.javautilregexcapabilit...@0
>> 0 hits for text:.in
>> 2 hits for text:.*in
>> 0 hits for text:.IN
>> 2 hits for text:.*IN
>> org.apache.lucene.search.regex.jakartaregexpcapabilit...@0
>> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/regexp/RE
>>     at org.apache.lucene.search.regex.JakartaRegexpCapabilities.compile(JakartaRegexpCapabilities.java:32)
>>     at org.apache.lucene.search.regex.RegexTermEnum.<init>(RegexTermEnum.java:47)
>>     at org.apache.lucene.search.regex.RegexQuery.getEnum(RegexQuery.java:59)
>>     at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:55)
>>     at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:162)
>>     at org.apache.lucene.search.Query.weight(Query.java:94)
>>     at org.apache.lucene.search.Hits.<init>(Hits.java:76)
>>     at org.apache.lucene.search.Searcher.search(Searcher.java:50)
>>     at org.apache.lucene.search.Searcher.search(Searcher.java:40)
>>     at Regex2.main(Regex2.java:43)
>> Caused by: java.lang.ClassNotFoundException: org.apache.regexp.RE
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
>>     at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
>>     ... 10 more
>> ===================================================================
>>
>> thanks a lot
>>
>> On 5/11/09, Huntsman84 <tpgarci...@gmail.com> wrote:
>>
>>> That's it!!!
>>>
>>> The problem was with the regular expression, the one I need is ".*IN"!!
>>>
>>> Thank you so much, I was turning mad... =)
>>>
>>>
>>> Ian Lea wrote:
>>>
>>>> The little self-contained program below runs regex queries for a few
>>>> regexps against a few phrases for both the java.util and jakarta
>>>> regexp packages.
>>>>
>>>> Output when run with lucene 2.4.1 and jakarta-regexp 1.5 is
>>>>
>>>> Added Knowing yourself
>>>> Added Old clinic
>>>> Added INSIDE
>>>> Added Not INSIDE
>>>>
>>>> Default regexcapabilities=org.apache.lucene.search.regex.javautilregexcapabilit...@0
>>>>
>>>> org.apache.lucene.search.regex.javautilregexcapabilit...@0
>>>> 0 hits for text:.in
>>>> 2 hits for text:.*in
>>>> 0 hits for text:.IN
>>>> 2 hits for text:.*IN
>>>> org.apache.lucene.search.regex.jakartaregexpcapabilit...@0
>>>> 2 hits for text:.in
>>>> 2 hits for text:.*in
>>>> 1 hits for text:.IN
>>>> 2 hits for text:.*IN
>>>>
>>>> Hope that helps.
>>>>
>>>> --
>>>> Ian.
>>>>
>>>>
>>>> import org.apache.lucene.index.*;
>>>> import org.apache.lucene.store.*;
>>>> import org.apache.lucene.document.*;
>>>> import org.apache.lucene.analysis.*;
>>>> import org.apache.lucene.analysis.standard.*;
>>>> import org.apache.lucene.search.*;
>>>> import org.apache.lucene.search.regex.*;
>>>>
>>>> public class luctest {
>>>>
>>>>     public static void main(String[] _args) throws Exception {
>>>>         RAMDirectory rdir = new RAMDirectory();
>>>>         IndexWriter writer = new IndexWriter(rdir, new StandardAnalyzer(), true);
>>>>         String[] docterms = { "Knowing yourself",
>>>>                               "Old clinic",
>>>>                               "INSIDE",
>>>>                               "Not INSIDE" };
>>>>
>>>>         for (String s : docterms) {
>>>>             Document d = new Document();
>>>>             d.add(new Field("text",
>>>>                             s,
>>>>                             Field.Store.YES,
>>>>                             Field.Index.NOT_ANALYZED));
>>>>             writer.addDocument(d);
>>>>             System.out.printf("Added %s\n", s);
>>>>         }
>>>>         writer.close();
>>>>
>>>>         IndexSearcher searcher = new IndexSearcher(rdir);
>>>>         String[] queries = { ".in", ".*in", ".IN", ".*IN" };
>>>>         RegexCapabilities[] rcaps = { new JavaUtilRegexCapabilities(),
>>>>                                       new JakartaRegexpCapabilities() };
>>>>         RegexQuery qx = new RegexQuery(new Term("x", "x"));
>>>>         System.out.printf("\nDefault RegexCapabilities=%s\n\n",
>>>>                           qx.getRegexImplementation());
>>>>         for (RegexCapabilities rcap : rcaps) {
>>>>             System.out.println(rcap);
>>>>             for (String s : queries) {
>>>>                 Term t = new Term("text", s);
>>>>                 RegexQuery q = new RegexQuery(t);
>>>>                 q.setRegexImplementation(rcap);
>>>>                 Hits h = searcher.search(q);
>>>>                 System.out.printf("%s hits for %s\n",
>>>>                                   h.length(),
>>>>                                   q.toString());
>>>>             }
>>>>         }
>>>>     }
>>>> }
>>>>
>>>>
>>>> On Mon, May 11, 2009 at 1:39 PM, Huntsman84 <tpgarci...@gmail.com> wrote:
>>>>
>>>>> The RegexQuery class uses that package, and for that reason the
>>>>> expression matches.
>>>>>
>>>>> If my records contained only one word each, this code would work, but I
>>>>> need to apply that regular expression to a phrase...
>>>>>
>>>>>
>>>>> Ian Lea wrote:
>>>>>
>>>>>> The default regex package is java.util.regex and I can't see anywhere
>>>>>> that you tell it to use the Jakarta regexp package. So I don't think
>>>>>> that ".in" will match. Also, you are storing your contents field as
>>>>>> NOT_ANALYZED so you will need to be wary of case sensitivity. Maybe
>>>>>> this is what you want, but maybe not.
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ian.
>>>>>>
>>>>>>
>>>>>> On Mon, May 11, 2009 at 9:00 AM, Huntsman84 <tpgarci...@gmail.com> wrote:
>>>>>>
>>>>>>> This is the code for searching:
>>>>>>>
>>>>>>> String index = "index";
>>>>>>> String field = "contents";
>>>>>>> IndexReader reader = IndexReader.open(index);
>>>>>>> Searcher searcher = new IndexSearcher(reader);
>>>>>>>
>>>>>>> System.out.println("Enter query: ");
>>>>>>> String line = ".IN."; // in jakarta regexp this is like * IN *
>>>>>>> RegexQuery rxquery = new RegexQuery(new Term(field, line));
>>>>>>> Hits hits = searcher.search(rxquery);
>>>>>>>
>>>>>>> if (hits != null) {
>>>>>>>     for (int k = 0; k < 100 && k < hits.length(); k++) {
>>>>>>>         if (hits.doc(k) != null)
>>>>>>>             System.out.println(hits.doc(k).getField("contents").stringValue());
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> And this is the part that creates the index:
>>>>>>>
>>>>>>> File directory = new File("index");
>>>>>>> IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(),
>>>>>>>         true, IndexWriter.MaxFieldLength.LIMITED);
>>>>>>> // returns a list of record values from database, all of them are phrases
>>>>>>> List<String> records = getRecords();
>>>>>>> Iterator<String> i = records.iterator();
>>>>>>> while (i.hasNext()) {
>>>>>>>     Document doc = new Document();
>>>>>>>     doc.add(new Field(field, i.next(), Field.Store.YES,
>>>>>>>             Field.Index.NOT_ANALYZED));
>>>>>>>     writer.addDocument(doc);
>>>>>>> }
>>>>>>> writer.optimize();
>>>>>>> writer.close();
>>>>>>>
>>>>>>>
>>>>>>> This code works as I want, but it only matches the first word of the
>>>>>>> phrase. I think the problem is the index building, but I don't know
>>>>>>> how to fix it...
>>>>>>>
>>>>>>> Any ideas?
>>>>>>>
>>>>>>> Thank you so much!!
>>>>>>>
>>>>>>>
>>>>>>> Steven A Rowe wrote:
>>>>>>>
>>>>>>>> On 5/8/2009 at 9:13 AM, Ian Lea wrote:
>>>>>>>>
>>>>>>>>> I'm surprised that it matches either - don't you need ".*in" where
>>>>>>>>> .* means match any character zero or more times? See the javadoc for
>>>>>>>>> java.util.regex.Pattern, or for Jakarta Regexp if you are using that
>>>>>>>>> package.
>>>>>>>>>
>>>>>>>>> Unless you're an expert in regexps it is probably worth playing with
>>>>>>>>> them outside your lucene code to start with e.g. with simple
>>>>>>>>> String.matches(regexp) calls. They can take some getting used to.
>>>>>>>>> And try to avoid anything with backslashes if you can!
>>>>>>>>>
>>>>>>>> The java.util.regex.Pattern implementation (the default RegexQuery
>>>>>>>> implementation) actually uses Matcher.lookingAt(), which is equivalent
>>>>>>>> to prepending a "^" anchor to the beginning of the pattern, so if
>>>>>>>> Huntsman84 is using the default implementation, then I agree with Ian:
>>>>>>>> I'm surprised it matches either.
>>>>>>>>
>>>>>>>> However, the Jakarta Regexp implementation uses RE.match(), which does
>>>>>>>> *not* require a beginning-of-string match.
>>>>>>>>
>>>>>>>> Huntsman84, are you using the Jakarta Regexp implementation? If so,
>>>>>>>> then like you, I'm surprised it's not matching both :).
>>>>>>>>
>>>>>>>> It would be useful to see some real code, including how you index your
>>>>>>>> records.
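
[Inline note] The anchoring difference Steve describes above can be tried outside Lucene with plain java.util.regex. The sketch below is an illustration under assumptions, not Lucene's code: it treats each phrase as a single un-analyzed term (as with NOT_ANALYZED), uses Matcher.lookingAt() for the default behaviour Steve describes, and uses Matcher.find() only as a stand-in for an unanchored match such as Jakarta's RE.match(). Jakarta Regexp itself is not used here, and the class and method names are made up.

```java
import java.util.regex.Pattern;

// Illustration only: compares anchored (lookingAt) and unanchored (find)
// matching on the four phrases indexed in Ian's test program.
final class RegexAnchoringDemo {

    static final String[] PHRASES = {
        "Knowing yourself", "Old clinic", "INSIDE", "Not INSIDE"
    };

    /** Counts phrases the pattern matches starting at the first character. */
    static int countAnchored(String regex) {
        Pattern p = Pattern.compile(regex);
        int hits = 0;
        for (String s : PHRASES) {
            if (p.matcher(s).lookingAt()) {
                hits++;
            }
        }
        return hits;
    }

    /** Counts phrases the pattern matches anywhere in the string. */
    static int countUnanchored(String regex) {
        Pattern p = Pattern.compile(regex);
        int hits = 0;
        for (String s : PHRASES) {
            if (p.matcher(s).find()) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        for (String regex : new String[] { ".in", ".*in", ".IN", ".*IN" }) {
            System.out.printf("%-5s lookingAt=%d find=%d%n",
                    regex, countAnchored(regex), countUnanchored(regex));
        }
    }
}
```

With lookingAt() the counts come out as 0, 2, 0, 2 and with find() as 2, 2, 1, 2, which lines up with the hit counts in Ian's output for the two implementations. That is why ".*in" is needed with the default implementation: the leading ".*" absorbs everything before "in" so the match can still start at the first character.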
>>>>>>>>
>>>>>>>> Steve
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Fri, May 8, 2009 at 1:42 PM, Huntsman84 <tpgarci...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I am using RegexQuery for searching in a set of records which are
>>>>>>>>>> phrases of several words each. My aim is to find any phrase that
>>>>>>>>>> contains the given group of letters (e.g. "in"). For that case,
>>>>>>>>>> I am building the query with the regular expression ".in.", so it
>>>>>>>>>> should return all phrases which contain "in", but the search only
>>>>>>>>>> matches with the first word of the phrase.
>>>>>>>>>>
>>>>>>>>>> For example, if my records are "Knowing yourself" and "Old
>>>>>>>>>> clinic", the correct search would return 2 matches, but it only
>>>>>>>>>> matches with "Knowing yourself".
>>>>>>>>>>
>>>>>>>>>> How could I fix this?
>>>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>>>>>>
>>>>>>> --
>>>>>>> View this message in context:
>>>>>>> http://www.nabble.com/RegexQuery-Incomplete-Results-tp23445235p23478720.html
>>>>>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.

--
"RABI ZIDNI ILMA"