Barry,

You may also want to consider PostgreSQL, for a few reasons: 1) it has a long track record with geo-spatial data, 2) it has GIS/geo-spatial data types and related functions, and 3) it seems that recent versions let you embed Java directly in the database (perhaps something like Oracle's embedded Java).
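On the zero-padding you describe below: plain padding stops sorting correctly once negative latitudes or longitudes show up (as strings, "-35.20" and "007.52" do not compare the way the numbers do). Here is a minimal sketch of one way around that, assuming you shift each value by a fixed offset (+90 for latitude, +180 for longitude) before padding. The names GeoKey, encodeLatitude and encodeLongitude are made up for illustration and are not part of Lucene, and the two-decimal precision is just an assumption about your data.

```java
import java.util.Locale;

// Hypothetical helper (not a Lucene class): turns coordinates into
// fixed-width strings whose lexical order matches their numeric order.
final class GeoKey {
    private GeoKey() {}

    // Shift the value so it is never negative, then zero-pad to a fixed
    // width of 6 characters with 2 decimals (e.g. 7.52 + 90 becomes "097.52").
    private static String encode(double value, double offset, double limit) {
        if (Double.isNaN(value) || value < -limit || value > limit) {
            throw new IllegalArgumentException("out of range: " + value);
        }
        // Locale.ROOT keeps the decimal separator a '.' on every machine.
        return String.format(Locale.ROOT, "%06.2f", value + offset);
    }

    static String encodeLatitude(double lat) {
        return encode(lat, 90.0, 90.0);
    }

    static String encodeLongitude(double lon) {
        return encode(lon, 180.0, 180.0);
    }

    public static void main(String[] args) {
        // Bounds from the example in the question.
        System.out.println(encodeLatitude(35.2) + " .. " + encodeLatitude(41.7));
        System.out.println(encodeLongitude(19.8) + " .. " + encodeLongitude(27.9));
    }
}
```

You would index the encoded strings and build the range bounds with the same two methods, so both sides of the comparison go through identical formatting. Whether a range query over such terms is fast enough for 2 million cities is a separate question that this sketch does not answer.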
Otis

--- Barry Carter <[EMAIL PROTECTED]> wrote:

> Does Lucene optimize range queries that use Sort and/or limit the
> number of hits?
>
> My situation: I have a listing of 2 million cities, with the name,
> latitude, longitude, and population of each city. I want to
> efficiently find the 50 most populous cities between (for example)
> latitudes 35.2 and 41.7 and longitudes 19.8 and 27.9.
>
> Assuming I normalize the data to be lexically sorted (in other words,
> I'll write 7.52 as 007.52, so it comes before 111.01 instead of after
> it), can I use a range query on the latitude and longitude fields
> (limiting the number of hits to 50, and sorting by population
> descending) to efficiently find what I want?
>
> If sorting isn't efficient, can I simply boost each record by its
> population (so that high population cities are returned first) and
> then limit the number of hits (so I see only the 50 most populous
> cities in a given area)?
>
> I tried this in Derby, the code being:
>
> Statement s =
>   DriverManager.getConnection("jdbc:derby:test;create=false").createStatement();
> s.setMaxRows(50);
> ResultSet rs = s.executeQuery("SELECT * FROM cities where lat>35.2 and lat<41.7 "
>   + "and lon>19.8 and lon<27.9 ORDER BY population desc");
>
> but Derby inefficiently looks at ALL the cities matching my criteria
> (even with indexes on lat and lon and population) before returning
> the top 50 (this is really bad when the condition is "lat>-90 and
> lat<90 and lon>-180 and lon<180", for example).
>
> The MySQL equivalent ("SELECT * FROM cities where lat>35.2 and
> lat<41.7 and lon>19.8 and lon<27.9 ORDER BY population desc LIMIT 50")
> with the same indexes is more efficient (it uses the LIMIT condition
> to optimize the query), and using MySQL w/ spatial indexes is even
> more efficient. However, I'm doing this as part of a Java
> application, so I need something that can be embedded in Java.
>
> Is this a reasonable use of Lucene? Or is coercing Lucene into doing
> range-based numeric queries a bad idea?
>
> (In case anyone's interested, I'm writing a zoomable/pannable world
> map, so finding the biggest cities in a given area quickly is
> important.)
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]