I think so. I'll try to write up a quick demo app to see if it will work for what I require.
Thanks for the prompt reply.

Simon Willnauer wrote:
>
> One way to do it is to use the RegexTermEnum and iterate through your
> terms manually, like the following pseudo code:
>
> te = RegexTermEnum(reader, Term("data", "^T.*"), regexpCapabilities)
> t = te.term()  # the enum starts out positioned on the first matching term
> while t is not None:
>     td = reader.termDocs(t)
>     while td.next():
>         freqOfTermInCurrentDoc = td.freq()
>         doc = td.doc()
>         # ...do something with it, e.g. add freqOfTermInCurrentDoc to doc's total
>     t = te.term() if te.next() else None
>
> Does that make sense to you?
>
> simon
>
> On Fri, Jan 15, 2010 at 2:35 PM, Altimatic <chris.stuckl...@gmail.com>
> wrote:
>>
>> Hi All,
>>
>> I have an application that has to count how many times a specific
>> regular expression matches a particular field in each document of an
>> indexed directory.
>>
>> For example:
>>
>> Let's say I have 2 documents in the directory and each document has 3
>> fields: "table", "column" and "data".
>>
>> Example Doc(s):
>> //***************************************************************
>> Document doc1 = new Document();
>> doc1.add(new Field("table", "EMPLOYEE_US", Field.Store.NO,
>>     Field.Index.ANALYZED));
>> doc1.add(new Field("column", "F_NAME", Field.Store.NO,
>>     Field.Index.ANALYZED));
>> doc1.add(new Field("data", "Chris Hank Tony Cody Tom Tina Crystal",
>>     Field.Store.NO, Field.Index.ANALYZED,
>>     Field.TermVector.WITH_POSITIONS_OFFSETS));
>>
>> Document doc2 = new Document();
>> doc2.add(new Field("table", "EMPLOYEE_CA", Field.Store.NO,
>>     Field.Index.ANALYZED));
>> doc2.add(new Field("column", "F_NAME", Field.Store.NO,
>>     Field.Index.ANALYZED));
>> doc2.add(new Field("data", "Bob Billy Tom Toby Charles Krista Madonna",
>>     Field.Store.NO, Field.Index.ANALYZED,
>>     Field.TermVector.WITH_POSITIONS_OFFSETS));
>>
>> // I know I can create a query to search for a regular expression, and
>> // that will return each document that contains a match.
>>
>> IndexWriter writer = new IndexWriter(directory, new WhitespaceAnalyzer(),
>>     true, IndexWriter.MaxFieldLength.LIMITED);
>> writer.addDocument(doc1);
>> writer.addDocument(doc2);
>> writer.optimize();
>> writer.close();
>> IndexSearcher searcher = new IndexSearcher(directory);
>>
>> RegexQuery query = new RegexQuery(new Term("data", "^T.*"));
>> // grab the score docs and go through them to find the documents that
>> // contain a match
>> ScoreDoc[] hits = searcher.search(query, null, maxNumOfHits).scoreDocs;
>>
>> //*****************************************************
>>
>> The code above will tell me that both doc1 and doc2 contain a match for
>> the constructed query.
>>
>> However, I need to know how many times the regular expression was
>> matched in each document, i.e.
>>
>> doc1 = 3
>> doc2 = 2
>>
>> I hope I am being clear... and thanks in advance.
>>
>> Cheers
>>
>> --
>> View this message in context:
>> http://old.nabble.com/Finding-frequency-of-regex-query-match-in-a-field-tp27175303p27175303.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org

--
View this message in context: http://old.nabble.com/Re%3A-Finding-frequency-of-regex-query-match-in-a-field-tp27177425p27178763.html
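For reference, here is a minimal, self-contained sketch of the counting that the term-enumeration approach performs. For every indexed term that matches the regex, it walks that term's postings and adds the term's per-document frequency to that document's total. The sketch deliberately avoids Lucene so it can run on its own. The class `RegexTermFrequency` and its methods are hypothetical. A sorted map stands in for the term dictionary plus `TermDocs`. Splitting on whitespace without lowercasing is an assumption meant to mimic `WhitespaceAnalyzer`. Terms are matched in full with `Matcher.matches()`.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.regex.Pattern;

/**
 * Hypothetical plain-Java model of "enumerate matching terms, then sum freq()
 * over each term's postings". This is NOT Lucene's API; the map below only
 * plays the role of the term dictionary (keys) and TermDocs (doc -> freq).
 */
final class RegexTermFrequency {

    // term -> (docId -> number of occurrences of the term in that doc)
    private final SortedMap<String, SortedMap<Integer, Integer>> postings = new TreeMap<>();
    private int nextDocId = 0;

    /** Indexes one field value; tokens are split on whitespace and case is kept. */
    public int addDocument(String text) {
        int docId = nextDocId++;
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // only happens for an empty or all-blank value
            }
            postings.computeIfAbsent(token, t -> new TreeMap<>())
                    .merge(docId, 1, Integer::sum);
        }
        return docId;
    }

    /** Returns docId -> total occurrences of all terms that fully match the regex. */
    public SortedMap<Integer, Integer> countMatches(String regex) {
        Pattern pattern = Pattern.compile(regex);
        SortedMap<Integer, Integer> perDoc = new TreeMap<>();
        for (Map.Entry<String, SortedMap<Integer, Integer>> term : postings.entrySet()) {
            if (!pattern.matcher(term.getKey()).matches()) {
                continue; // term does not match, skip its postings
            }
            // walk the postings of a matching term, like td.doc() / td.freq()
            for (Map.Entry<Integer, Integer> posting : term.getValue().entrySet()) {
                perDoc.merge(posting.getKey(), posting.getValue(), Integer::sum);
            }
        }
        return perDoc;
    }

    public static void main(String[] args) {
        RegexTermFrequency index = new RegexTermFrequency();
        int doc1 = index.addDocument("Chris Hank Tony Cody Tom Tina Crystal");
        int doc2 = index.addDocument("Bob Billy Tom Toby Charles Krista Madonna");
        SortedMap<Integer, Integer> counts = index.countMatches("^T.*");
        System.out.println("doc1 = " + counts.getOrDefault(doc1, 0)); // prints "doc1 = 3"
        System.out.println("doc2 = " + counts.getOrDefault(doc2, 0)); // prints "doc2 = 2"
    }
}
```

With the two example "data" values from the question, `main` prints `doc1 = 3` and `doc2 = 2`. In Lucene itself, the outer loop would be over the `RegexTermEnum` and the inner loop over `reader.termDocs(t)`, as in the pseudo code. A term that occurs twice in one document contributes 2 through `freq()`, just as the map stores 2 here.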