Monday, June 20, 2005, 3:55:36 PM, Erik Hatcher wrote: > Filters reduce the search space to a subset of the documents in the > index. Which document would you want returned when there are > multiple documents in the index with the same MD5? Or do you want to > cluster them by MD5?
i think cluster by md5 is more appropriate. > Do you want to cluster them by MD5 perhaps, but still return multiple > documents back from a search? i want to return just the 1st image (the more relevant one). no use to show duplicates in an image search app. > I'm not sure if a Filter is the appropriate technique for this > scenario or not. well, i am not sure either. one solution would be when i iterate through the hits collection and send them to the webapp, to group them by md5 or some. is this a good way to do it ? (the bad thing is i would have to do lots of hits.doc(index) in advance, to make this group by md5 thing, and if the results are paginated << which is the case >>, on the 2nd page i would need to keep in session the last "index" or to recalculate it again.. - oh nein !:) in sql this would be: select distinct md5, url, alt from table group by md5 order by score asc; if i had the score in the DB (which is not the case). -- Catalin Constantin --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]