Hello,

I am trying to compute the following. Let's say I have 3 XML files that I parse using SAX:

<?xml version="1.0" encoding="UTF-8"?>
<person>
  <name>bob bob bob </name>
  <name>3m </name>
  <height>3m </height>
  <height>bob </height>
</person>

<?xml version="1.0" encoding="UTF-8"?>
<person>
  <name>bob </name>
  <name>bob </name>
  <name>bob bob </name>
  <height>3m </height>
  <height>bob </height>
</person>

<?xml version="1.0" encoding="UTF-8"?>
<person>
  <name>bob </name>
  <name>bob </name>
  <height>bob </height>
</person>

I am currently indexing the duplicate <name> tags under separate fields, so I have 3 /person/name fields in total: /person/name0, /person/name1, /person/name2.

I want to compute how many times a query appears in a given unique field (/person/name). Let's say the query is "bob". I want to see:

  - the total number of times it appears: 9
  - how many <name> elements it appears in, across all documents: 6

My current solution is to use TermDocs for the first question and, for the second, to iterate over the fields (/person/namex), summing docFreq() for each one (so there are two loops). This gets very slow. Ideally I would like to index them all under /person/name, but I still really need these answers.

Does anyone have any ideas? I can offer more clarification and some source code, but my current method is very slow: I need to index ~4 million files and compute these quantities, which is very slow when you have 150 fields of /person/actor/movie_acted_in and 4 million documents.

Thank you very much!

--
View this message in context: http://lucene.472066.n3.nabble.com/Computing-document-frequencies-for-specific-queries-in-Lucene-tp3101450p3101450.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
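
P.S. To pin down the two numbers being asked for, here is a minimal plain-Java sketch that computes them from the <name> values of the three sample documents. It does not use Lucene at all; the class and method names (NameCounts, totalOccurrences, elementsContaining) are made up for illustration, and it assumes simple whitespace tokenization and an exact match on the term.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java model of the two counts; no Lucene involved.
// All names here are hypothetical and only serve the illustration.
class NameCounts {

    // Each inner list holds the text of the <name> elements of one document.
    static List<List<String>> sampleDocs() {
        return Arrays.asList(
            Arrays.asList("bob bob bob", "3m"),
            Arrays.asList("bob", "bob", "bob bob"),
            Arrays.asList("bob", "bob"));
    }

    // Number of whitespace-separated tokens in one element equal to the term.
    static int occurrencesIn(String elementText, String term) {
        int count = 0;
        for (String token : elementText.trim().split("\\s+")) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    // First quantity: total occurrences of the term over all <name> elements.
    static int totalOccurrences(List<List<String>> docs, String term) {
        int total = 0;
        for (List<String> names : docs) {
            for (String name : names) {
                total += occurrencesIn(name, term);
            }
        }
        return total;
    }

    // Second quantity: number of <name> elements containing the term at least once.
    static int elementsContaining(List<List<String>> docs, String term) {
        int elements = 0;
        for (List<String> names : docs) {
            for (String name : names) {
                if (occurrencesIn(name, term) > 0) {
                    elements++;
                }
            }
        }
        return elements;
    }

    public static void main(String[] args) {
        List<List<String>> docs = sampleDocs();
        System.out.println("total occurrences: " + totalOccurrences(docs, "bob"));
        System.out.println("elements containing: " + elementsContaining(docs, "bob"));
    }
}
```

On the sample data this gives 9 and 6 for "bob". The second number is what summing docFreq() over /person/name0, /person/name1 and /person/name2 produces (3 + 2 + 1), which is why it differs from the number of documents (3).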