One way to do it is to use the RegexTermEnum and iterate through your terms manually.
like the following pseudo code: te = RegexTermEnum(reader, Term("^T.*"), regexpCapabilities) while( te.next() ): t = te.term() td = reader.termDocs(t) while(td.next()): freqOfTermInCurrentDoc = td.freq() doc = td.doc() #...do something with it does that make sense to you? simon On Fri, Jan 15, 2010 at 2:35 PM, Altimatic <chris.stuckl...@gmail.com> wrote: > > Hi All, > > I have an application that has to count the frequency that a specific > regular expression is matched on a particular field for each document in an > indexed directory. > > For example. > > Lets say I have 2 documents in the directory and each document has 3 fields, > "table", "column" and "data". > > Example Doc(s): > //*************************************************************** > Document doc1 = new Document(); > doc1.add(new Field("table", "EMPLOYEE_US", Field.Store.NO, > Field.Index.ANALYZED); > doc1.add(new Field("column", "F_NAME", Field.Store.NO, > Field.Index.ANALYZED); > doc1.add(new Field("data", "Chris Hank Tony Cody Tom Tina Crystal", > Field.Store.NO, Field.Index.ANALYZED, > Field.TermVector.WITH_POSITIONS_OFFSETS); > > Document doc2 = new Document(); > doc2.add(new Field("table", "EMPLOYEE_CA", Field.Store.NO, > Field.Index.ANALYZED); > doc2.add(new Field("column", "F_NAME", Field.Store.NO, > Field.Index.ANALYZED); > doc2.add(new Field("data", "Bob Billy Tom Toby Charles Krista Madonna", > Field.Store.NO, Field.Index.ANALYZED, > Field.TermVector.WITH_POSITIONS_OFFSETS); > > //I know I can create a query to search for a regular expression and that > will return each > //document that contains a match. > > IndexWriter writer = new IndexWriter(directory, new WhitespaceAnalyzer(), > true, > > IndexWriter.MaxFieldLength.LIMITED); > writer.addDocument(doc); > writer.optimize(); > writer.close(); > searcher = new IndexSearcher(directory); > > RegexQuery query = new RegexQuery( newTerm("data", "^T.*)); > ScoreDoc[] hits = searcher.search(query, null, > maxNumOfHits).scoreDocs;//grab the score docs and go through them to find > the documents that contain a match > > //***************************************************** > > > The code above will tell me that both doc1 and doc2 contain a match for the > constructed query. > > However I need to know how many times the regular expression was matched in > each document. ie. > > doc1 = 3 > doc2 = 2 > > I hope I am being clear...and thanks in advance. > > > Cheers > > -- > View this message in context: > http://old.nabble.com/Finding-frequency-of-regex-query-match-in-a-field-tp27175303p27175303.html > Sent from the Lucene - Java Users mailing list archive at Nabble.com. > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org