I built a small demonstration application using Lucene's range query, and it worked fine. I didn't use a DB at all.
"Mosul_Iraq.html", "E043.13535"
"Mosul_Iraq.html", "N36.33608"

Having the direction prefix (E, W, N, S) worked out well.

Andrew

-----Original Message-----
From: Barry Carter <[EMAIL PROTECTED]>
Sent: Jul 27, 2005 8:42 PM
To: java-user@lucene.apache.org
Subject: Lucene vs Derby (vs MySQL) for spatial indexing

Does Lucene optimize range queries that use Sort and/or limit the number of hits?

My situation: I have a listing of 2 million cities, with the name, latitude, longitude, and population of each city. I want to efficiently find the 50 most populous cities between (for example) latitudes 35.2 and 41.7 and longitudes 19.8 and 27.9.

Assuming I normalize the data to be lexically sorted (in other words, I'll write 7.52 as 007.52, so it comes before 111.01 instead of after it), can I use a range query on the latitude and longitude fields (limiting the number of hits to 50, and sorting by population descending) to efficiently find what I want?

If sorting isn't efficient, can I simply boost each record by its population (so that high-population cities are returned first) and then limit the number of hits (so I see only the 50 most populous cities in a given area)?

I tried this in Derby, with the following code:

    Statement s = DriverManager.getConnection("jdbc:derby:test;create=false").createStatement();
    s.setMaxRows(50);
    ResultSet rs = s.executeQuery("SELECT * FROM cities where lat>35.2 and lat<41.7 and lon>19.8 and lon<27.9 ORDER BY population desc");

The problem is that Derby inefficiently examines ALL the cities matching my criteria (even with indexes on lat, lon, and population) before returning the top 50. This is really bad when the condition is, for example, "lat>-90 and lat<90 and lon>-180 and lon<180".

The MySQL equivalent ("SELECT * FROM cities where lat>35.2 and lat<41.7 and lon>19.8 and lon<27.9 ORDER BY population desc LIMIT 50") with the same indexes is more efficient (it uses the LIMIT condition to optimize the query), and using MySQL with spatial indexes is even more efficient.
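The LIMIT optimization described above amounts to keeping only the current 50 best candidates while scanning the matches, instead of sorting every match and then discarding all but 50. A minimal plain-Java sketch of that idea, assuming the cities are held in an in-memory list; the `City` class and `topK` method are hypothetical names, and the sketch is not meant to show how Derby, MySQL, or Lucene implement this internally:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopCities {
    // Hypothetical in-memory row: name, latitude, longitude, population.
    static final class City {
        final String name;
        final double lat;
        final double lon;
        final long population;

        City(String name, double lat, double lon, long population) {
            this.name = name;
            this.lat = lat;
            this.lon = lon;
            this.population = population;
        }
    }

    // Returns up to k cities strictly inside the box, most populous first.
    // A min-heap holding at most k cities means each match costs O(log k)
    // and memory stays O(k), instead of sorting every match.
    static List<City> topK(List<City> cities, double minLat, double maxLat,
                           double minLon, double maxLon, int k) {
        List<City> result = new ArrayList<>();
        if (k <= 0) {
            return result;
        }
        Comparator<City> byPopulation = Comparator.comparingLong(c -> c.population);
        PriorityQueue<City> heap = new PriorityQueue<>(byPopulation);
        for (City c : cities) {
            boolean inBox = c.lat > minLat && c.lat < maxLat
                    && c.lon > minLon && c.lon < maxLon;
            if (!inBox) {
                continue;
            }
            if (heap.size() < k) {
                heap.add(c);
            } else if (c.population > heap.peek().population) {
                heap.poll(); // evict the least populous of the current top k
                heap.add(c);
            }
        }
        result.addAll(heap);
        result.sort(byPopulation.reversed());
        return result;
    }

    public static void main(String[] args) {
        List<City> cities = List.of(
                new City("A", 36.0, 20.0, 500),
                new City("B", 40.0, 25.0, 900),
                new City("C", 38.0, 22.0, 100),
                new City("D", 50.0, 25.0, 10_000)); // latitude outside the box
        for (City c : topK(cities, 35.2, 41.7, 19.8, 27.9, 2)) {
            System.out.println(c.name + " " + c.population);
        }
    }
}
```

With the sample data the program prints "B 900" and then "A 500"; city D has the largest population but falls outside the latitude range, so it never enters the heap.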
However, I'm doing this as part of a Java application, so I need something that can be embedded in Java. Is this a reasonable use of Lucene? Or is coercing Lucene into doing range-based numeric queries a bad idea?

(In case anyone's interested, I'm writing a zoomable/pannable world map, so finding the biggest cities in a given area quickly is important.)

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Andrew Boyd
Software Architect
Sun Certified J2EE Architect
B&B Technical Services Inc.
205.422.2557
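Andrew's reply shows terms such as "E043.13535" and "N36.33608". One likely reason the direction prefix works well: plain zero padding, as proposed in the original question, does not order negative numbers correctly as strings, while moving the sign into a letter leaves only non-negative magnitudes to pad. Below is a minimal Java sketch of such an encoding. The field widths (3 integer digits for longitude, 2 for latitude, 5 decimals) are inferred from those two sample values, the method names are hypothetical, and only the string preparation is shown, not the Lucene indexing or range query itself:

```java
import java.util.Locale;

public class CoordinateEncoding {
    // Longitude: E/W prefix, then |lon| padded to 3 integer digits and 5 decimals.
    // Example: 43.13535 -> "E043.13535", -7.52 -> "W007.52000".
    static String encodeLon(double lon) {
        return (lon < 0 ? "W" : "E") + String.format(Locale.ROOT, "%09.5f", Math.abs(lon));
    }

    // Latitude: N/S prefix, then |lat| padded to 2 integer digits and 5 decimals.
    // Example: 36.33608 -> "N36.33608".
    static String encodeLat(double lat) {
        return (lat < 0 ? "S" : "N") + String.format(Locale.ROOT, "%08.5f", Math.abs(lat));
    }

    // Exclusive lexical range check on encoded strings, a stand-in for the
    // string comparison a term range query performs. Only meaningful when all
    // three values share the same prefix letter.
    static boolean inLexicalRange(String value, String lower, String upper) {
        return lower.compareTo(value) < 0 && value.compareTo(upper) < 0;
    }

    public static void main(String[] args) {
        System.out.println(encodeLon(43.13535)); // E043.13535
        System.out.println(encodeLat(36.33608)); // N36.33608
        // Zero padding is what makes string order agree with numeric order:
        System.out.println("007.52".compareTo("111.01") < 0); // true
        System.out.println("7.52".compareTo("111.01") < 0);   // false
        System.out.println(inLexicalRange(encodeLat(38.0), encodeLat(35.2), encodeLat(41.7))); // true
    }
}
```

Within W or S the strings sort by magnitude, so a longitude range of -20 to -10 becomes the string range W010.00000 to W020.00000, and a box that straddles the equator or the prime meridian has to be split into two range queries, one per prefix letter.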