Re: problem in using distanceFilter in booleanFilter (using FilterClause)

2014-04-09 Thread david.w.smi...@gmail.com
You'll be lucky to get help with the spatial module from Lucene 3.x, which
is what you are using.  It was outright replaced in 4.0 because it was
buggy.

p.s. please don't blast emails to multiple lists

Good luck,
~ David


On Thu, Apr 10, 2014 at 1:58 AM, kumaran  wrote:

>
> Hi All,
>
> I am trying to add a TermsFilter and a DistanceFilter to a BooleanFilter using
> FilterClause, but I am getting the error shown below. Please check my
> code and guide me.
>
>
>
>
> *Code:*
>
>> DistanceQueryBuilder queryBuilder = new DistanceQueryBuilder(latLong[0],
>> latLong[1], radius, "lat", "lon", CartesianTierPlotter.DEFALT_FIELD_PREFIX,
>> true);
>> DistanceFieldComparatorSource distComp = new
>> DistanceFieldComparatorSource(queryBuilder.getDistanceFilter());
>> Sort distSort = new Sort(new SortField("", distComp,true));
>> QueryParser parser = new QueryParser(Version.LUCENE_30, "city",
>> new StandardAnalyzer(Version.LUCENE_30));
>> Query query = parser.parse(strQuery);
>> System.out.println(" distance sort details ::: "+ distSort);
>> BooleanFilter boolFilter = new BooleanFilter();
>> FilterClause filterClause2 = new
>> FilterClause(queryBuilder.getFilter(), BooleanClause.Occur.MUST);
>> boolFilter.add(filterClause2);
>>
>> Term term = new Term("city", "chengalpat");
>> TermsFilter filter = new TermsFilter();
>> filter.addTerm(term);
>> FilterClause filterClause = new FilterClause(filter,
>> BooleanClause.Occur.SHOULD);
>> boolFilter.add(filterClause);
>>
>> TopDocs topDocs = searcher.search(query,boolFilter, 20,distSort);
>
>
>
> *ErrorTrace:*
>
>  java.lang.NullPointerException at
>> org.apache.lucene.spatial.tier.DistanceFieldComparatorSource$DistanceScoreDocLookupComparator.copy(DistanceFieldComparatorSource.java:105)
>> at
>> org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.collect(TopFieldCollector.java:89)
>> at
>> org.apache.lucene.search.IndexSearcher.searchWithFilter(IndexSearcher.java:258)
>> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:218) at
>> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:199) at
>> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:177) at
>> org.apache.lucene.search.Searcher.search(Searcher.java:49) at
>> com.zoho.training.RadialSearch.search(RadialSearch.java:246) at
>> com.zoho.training.RadialSearch.main(RadialSearch.java:281) Exception in
>> thread "main" java.lang.NullPointerException at
>> org.apache.lucene.spatial.tier.DistanceFieldComparatorSource$DistanceScoreDocLookupComparator.copy(DistanceFieldComparatorSource.java:105)
>> at
>> org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.collect(TopFieldCollector.java:89)
>> at
>> org.apache.lucene.search.IndexSearcher.searchWithFilter(IndexSearcher.java:258)
>> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:218) at
>> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:199) at
>> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:177) at
>> org.apache.lucene.search.Searcher.search(Searcher.java:49) at
>> com.zoho.training.RadialSearch.search(RadialSearch.java:246) at
>> com.zoho.training.RadialSearch.main(RadialSearch.java:281)
>
>
>
>
>
> Kumaran R
>
>
>
>


Re: problem in using distanceFilter in booleanFilter (using FilterClause)

2014-04-11 Thread david.w.smi...@gmail.com
I suggest either finding example code out there (try googling or searching
https://code.ohloh.net) and/or looking at any tests, which can often serve
as illustrative examples.  Failing those approaches, grab a coffee and
break out the debugger.  This is fairly generic advice, I admit, but it's
all I have to offer on the old spatial module.
~ David


On Fri, Apr 11, 2014 at 5:26 AM, kumaran  wrote:

> Hi David,
>
> Thanks for your response. I didn't know where to post; that's why I sent it
> to several lists. I will make sure to send to only one list in future. Could
> you please direct me on my question?
>
> - Kumaran
>
>
> > You'll be lucky to get help with the spatial module from Lucene 3.x,
> which
> > is what you are using.  It was outright replaced in 4.0 because it was
> > buggy.
> > p.s. please don't blast emails to multiple lists
> > Good luck,
> > ~ David
>
>
>

Re: Lucene Spatial Question: How to retrieve all results within a bounding box?

2014-06-08 Thread david.w.smi...@gmail.com
Hi.

Your question is actually not particularly spatial; it’s more
circumstantial to your particular query. You want to know how to do a query
and collect *all* the results, in no particular order.  To do this
efficiently, you need to use a Collector.  Also, I noticed you are using
the “IsWithin” predicate.  If all of your data consists of points, then
“Intersects” is semantically equivalent and faster.  Here’s some sample
code I temporarily threw into SpatialExample.java that works on Lucene
trunk.  You’ll see a difference of Document vs StoredDocument with 4x:

{
  SpatialArgs args = new SpatialArgs(SpatialOperation.IsWithin,
      ctx.makeRectangle(-90, -60, 30, 40));

  indexSearcher.search(strategy.makeQuery(args), new SimpleCollector() {
    public AtomicReader reader;

    @Override
    public boolean acceptsDocsOutOfOrder() {
      return true;
    }

    @Override
    protected void doSetNextReader(AtomicReaderContext context)
        throws IOException {
      this.reader = context.reader();
    }

    @Override
    public void collect(int docId) throws IOException {
      StoredDocument doc = reader.document(docId);
      System.out.println(doc.get("id") + "\t" + doc.get("myGeoField"));
    }
  });
}



~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


On Sun, Jun 8, 2014 at 1:10 AM, parth_n  wrote:

> Hi everyone,
>
> I am trying to retrieve all results within a given bounding box in a 2-D
> space. I understand that the scoring function is based on the distance from
> the center of the query. I am not looking to retrieve top-k results, but
> all
> of them.
>
> I have read previous forums on this similar question, and the solutions are
> either out-dated (for previous versions) or inefficient (Option 1: input k
> as Integer.MAX_VALUE, Option 2: use a TotalHitCountCollector and get the
> total number of results using getTotalHits and then pass on this number to
> the top-k search).
>
> I am looking for all the results in the bounding box, and do not care for
> the order. I do not want to waste any computation, if possible, on any
> sorting needed for top-k functionality.
>
> Question: Is there any better solution out there that I can use instead of
> the above mentioned solutions?
>
> Any reply is much appreciated. Thanks!
>
>
> Snippet of the code of the above mentioned Option 1:
>
> SpatialArgs args = new SpatialArgs(SpatialOperation.IsWithin,
> ctx.makeRectangle(minX, maxX, minY, maxY));
>
> Filter filter = strategy.makeFilter(args);
> TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), filter,
> Integer.MAX_VALUE);
>
> ScoreDoc[] scoreDocs = topDocs.scoreDocs;
> for (ScoreDoc s : scoreDocs)
> {
>Document doc = searcher.doc(s.doc);
>System.out.println(doc.get("id") + "\t" + doc.get("name"));
> }
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Lucene-Spatial-Question-How-to-retrieve-all-results-within-a-bounding-box-tp4140616.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Re: Lucene Spatial Question: How to retrieve all results within a bounding box?

2014-06-08 Thread david.w.smi...@gmail.com
Yes; as I said in my last sentence: "You’ll see a difference of Document vs
StoredDocument with 4x”.

As to SimpleCollector not being in 4x (I didn’t check but I’ll take your
word for it) — the bottom line is that you need to write a Collector, and a
simple one at that.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


On Sun, Jun 8, 2014 at 5:00 PM, parth_n  wrote:

> Thanks a lot for the reply David!
>
> I am having some problems executing this code. I am using 4.8.1. I tried
> looking for StoredDocument and SimpleCollector in the source code but
> couldn't find them. Am I missing something?
>
>
>
>
>


Re: Lucene Spatial Question: Is a tree structure explicitly created in the QuadPrefixTree implementation?

2014-10-01 Thread david.w.smi...@gmail.com
Hi Parth,

Lucene’s “terms dictionary” (an inverted index) is the physical
instantiation of the actual PrefixTree/Trie for numeric and spatial data.
It doesn’t know that it is; it’s just a sorted list of keys pointing to
matching documents.  It just so happens that the keys aren’t textual words
in this case; they are encoded prefixes.  SpatialPrefixTree (base class of
QuadPrefixTree and some others) encodes points and other spatial regions
into one or more prefixes that get indexed during indexing, and that
are looked up during search.  The “Cell” class (aka a node) is indeed used
during spatial processing in an iterator-like way.  The index & search time
processing iterates the prefixes.  It’s not typically fully materialized
into a tree structure of Cells.
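
To make the "sorted list of keys" idea concrete, here is a minimal plain-Java sketch.  It is not Lucene code: the class name, the A-D quadrant labels, the world bounds, and the use of a TreeSet as a stand-in for the terms dictionary are all assumptions for illustration.  It encodes a point into one prefix per level and then finds everything under an internal node with a range scan over the sorted keys, without ever building a tree of node objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Toy quad-prefix encoder; labels and bounds are illustrative assumptions, not Lucene's implementation. */
class QuadPrefixSketch {
    // World bounds in degrees (x = longitude, y = latitude).
    static final double MIN_X = -180, MAX_X = 180, MIN_Y = -90, MAX_Y = 90;

    /** Returns the prefix of every cell containing the point, from level 1 down to maxLevel. */
    static List<String> encode(double x, double y, int maxLevel) {
        List<String> prefixes = new ArrayList<>();
        double minX = MIN_X, maxX = MAX_X, minY = MIN_Y, maxY = MAX_Y;
        StringBuilder sb = new StringBuilder();
        for (int level = 0; level < maxLevel; level++) {
            double midX = (minX + maxX) / 2, midY = (minY + maxY) / 2;
            boolean east = x >= midX, north = y >= midY;
            // A = north-west, B = north-east, C = south-west, D = south-east
            sb.append(north ? (east ? 'B' : 'A') : (east ? 'D' : 'C'));
            // Shrink the current cell to the chosen quadrant.
            if (east) minX = midX; else maxX = midX;
            if (north) minY = midY; else maxY = midY;
            prefixes.add(sb.toString());
        }
        return prefixes;
    }

    /** Range-scans the sorted keys for every term that starts with the given prefix. */
    static NavigableSet<String> termsWithPrefix(NavigableSet<String> terms, String prefix) {
        return terms.subSet(prefix, true, prefix + Character.MAX_VALUE, false);
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>(); // stand-in for the terms dictionary
        terms.addAll(encode(-73.99, 40.73, 4));
        terms.addAll(encode(151.21, -33.87, 4));
        System.out.println("all terms: " + terms);
        System.out.println("under AD:  " + termsWithPrefix(terms, "AD"));
    }
}
```

The internal nodes (for example AD under A) exist only as shorter keys in the same sorted set, which is the sense in which the tree is never fully materialized.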

Thanks for the MX-Quad tree pointer; the “MX” notion is new to me.  It
appears that any prefix-tree/trie is also in effect an MX-tree as well,
making Prefix-Tree/Tries a particular case of MX trees.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Wed, Oct 1, 2014 at 2:17 AM, parth_n  wrote:

> Hi everyone,
>
> I have a question regarding the quadtree implementation of the spatial
> module of Lucene. Does the quadtree implementation (QuadPrefixTree)
> explicitly build a tree structure and store this information? I have gone
> over the QuadPrefixTree class, but from what I understand it mainly uses
> the
> spatial prefix partitioning strategy of the MX-quadtree (but not the
> internal nodes).
> Are the internal nodes (e.g. for a 4-level tree prefix ABDA, are the
> spatial
> regions of ABD and/or AB) of the quadtree used during query processing?
>
> Any replies are much appreciated.
>
> Thanks!
>
> Parth
>
>
>
>
>


Re: Lucene Spatial Question: Is there a primary and a secondary filter?

2014-10-02 Thread david.w.smi...@gmail.com
Hi Parth,

Since Lucene 4.7 spatial, there is a “SerializedDVStrategy” for serialized
geometries.  It’s used as a second-pass after RPT (or perhaps
BBoxStrategy).  There was a presentation at FOSS4G about it (I was there
and helped with this one too):
http://vimeo.com/106843184
There’s a small code sample in there.  It’s pretty easy to use.

This article I wrote is also relevant:
http://opensourceconnections.com/blog/2014/04/11/indexing-polygons-in-lucene-with-accuracy/
I’m hoping to have time after the new year to get to the optimization
referenced at the end, but who knows.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Thu, Oct 2, 2014 at 12:47 PM, parth_n  wrote:

> Hi everyone,
>
> I have a Lucene Spatial code where I query (bounding box) the given data.
> Does Lucene have a primary and a secondary filter (like MS SQL or PostGIS)
> (where the primary filter returns the regions in the index to be looked at,
> and the secondary filter removes the false positives in these regions)?
>
> I am trying to query Lucene such that only the primary filter results are
> returned. With this (since false positives are sometimes okay for faster
> query processing), I can avoid further I/O of accessing the data.
>
> Currently, I have the following code (which was written with the help of
> David Smiley -
>
> http://lucene.472066.n3.nabble.com/Lucene-Spatial-Question-How-to-efficiently-retrieve-all-results-within-a-bounding-box-td4140616.html
> ):
>
> SpatialArgs args = new SpatialArgs(SpatialOperation.Intersects,
> ctx.makeRectangle(minX, maxX, minY, maxY));
> Collector collector = new SimpleCollector();
> searcher.search(strategy.makeQuery(args), collector);
>
> Any reply is much appreciated!
>
> Thanks,
> Parth
>
>
>
>
>


Re: Lucene spatial for grid clusters

2014-11-06 Thread david.w.smi...@gmail.com
FYI I plan to implement this in Lucene-spatial & Solr in January.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Wed, Nov 5, 2014 at 10:52 PM, Shahak Nagiel 
wrote:

> I need a way to perform a spatial aggregation query against a potentially
> large document store in order to display summary clusters on a map.  The
> query would slice the current map extents (e.g. -180,-90,180,90) into a
> number of X and Y bins (e.g. 20 x 16) and, for each, seek a summary count,
> so that a heat map or series of clusters could be rendered for the grids.
>
> Given Lucene's native usage of prefix trees/geohashes, this seems to be a
> good fit.  As a user pans and zooms the map, new map extents would apply,
> so this would need to support dynamic grids.  However, snapping
> results/bins to existing geohashes (of the appropriate depth/level) would
> be fine, assuming that lines up with how the indexes are structured.
>
> Naively, I could just issue a series of spatial queries, one for each
> grid, and get the count.  But I wonder if there's a better way...
>
> Has anyone encountered this use case?  Any suggestions on the best/most
> efficient way to achieve?
>
> Thanks!


Re: Lucene spatial for grid clusters

2014-11-06 Thread david.w.smi...@gmail.com
From an API standpoint, I envision you would supply a rectangular region of
interest and some means of specifying the resolution.  It could be the
so-called “grid level” in lucene spatial (1 is the coarsest;
larger numbers yield progressively smaller cells), or it might be expressed
in terms of the minimum number of cells you want.  The response would
include the *actual* rectangular region of the grid that minimally encloses
the region you asked for.  And it would include the width and height of the
cells at this resolution, and the grid level (a #).  Of course it would
include the 2D grid of numbers.  On the Solr side, I’m thinking of
optionally returning a PNG, but I’m not sure if that will turn out to be a
good idea or not.
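
As a rough picture of the "2D grid of numbers" in such a response, here is a brute-force plain-Java sketch.  It only illustrates the shape of the result; the class and method names are hypothetical, and it loops over raw points rather than walking indexed prefix-tree terms as the planned implementation would.

```java
/** Counts points per grid cell over a rectangular region; a brute-force stand-in, not the planned Lucene code. */
class GridCountSketch {
    /**
     * @param points array of {x, y} pairs
     * @return counts[row][col], where row 0 is the southernmost row and col 0 the westernmost column
     */
    static int[][] count(double[][] points, double minX, double minY,
                         double maxX, double maxY, int cols, int rows) {
        int[][] counts = new int[rows][cols];
        double cellW = (maxX - minX) / cols;
        double cellH = (maxY - minY) / rows;
        for (double[] p : points) {
            // Skip points outside the region of interest.
            if (p[0] < minX || p[0] > maxX || p[1] < minY || p[1] > maxY) continue;
            // Clamp so that points lying exactly on the max edge land in the last cell.
            int col = Math.min(cols - 1, (int) ((p[0] - minX) / cellW));
            int row = Math.min(rows - 1, (int) ((p[1] - minY) / cellH));
            counts[row][col]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        double[][] points = { {-170, -80}, {0, 0}, {180, 90} };
        int[][] grid = count(points, -180, -90, 180, 90, 20, 16);
        System.out.println("south-west cell count: " + grid[0][0]);
    }
}
```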

The implementation will re-use a lot of code already in Lucene-spatial.  In
particular, it will *not* be necessary to write any low-level TermsEnum
iteration code because it can
re-use AbstractVisitingPrefixTreeFilter.VisitorTemplate.  I am in fact
dog-fooding that now because I’m currently working on date-range faceting of
a DateRangePrefixTree (it’s in trunk).  This is a single-dimensional
heatmap capability.  I’ll tell attendees of my talk about this at
Lucene/Solr Revolution next Friday in DC.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Thu, Nov 6, 2014 at 8:44 AM, Shahak Nagiel  wrote:

> Thanks, David.  In the meantime, care to share any thoughts about your
> planned implementation?
>
>
>


Re: A question on implementing new operators

2014-12-02 Thread david.w.smi...@gmail.com
Hi Prasad,

Firstly, the Lucene ‘general’ list is not the appropriate list; it’s the
java-user lucene list so I’m replying there instead.

This is mostly about query parsing.  If you look at Lucene’s modules,
you’ll see a “queryparser” module.  In there, there’s a “flexible” package
which is named as such because it’s a flexible query parsing framework.  It
comes with a pre-built instantiation that emulates Lucene’s
standard/classic query parser syntax.  You’ll need to modify the javacc
syntax definition (a ‘.jj’ file) and plug a variety of pieces into the
framework.  Ultimately, when you get to the ‘builder’ part, you'll create a
suitable Lucene Query by calling spatialStrategy.makeQuery(…).
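
Before touching the grammar, it can help to pin down what the new syntax should mean.  The plain-Java sketch below is not the flexible query parser; it is a regex stand-in with hypothetical names (InBoxSyntaxSketch, Box) that parses the IN_BOX(...) form from your question.  It assumes the four numbers are two corners given as (lat1, lon1) and (lat2, lon2), which your message does not state explicitly.  In a real integration the parsed values would be handed to the spatial strategy rather than checked by hand.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Regex stand-in for a grammar rule; a real integration would extend the query parser instead. */
class InBoxSyntaxSketch {
    /** Parsed bounding box; corner order is normalised to min/max. */
    static final class Box {
        final String latField, lonField;
        final double minLat, maxLat, minLon, maxLon;

        Box(String latField, String lonField, double lat1, double lon1, double lat2, double lon2) {
            this.latField = latField;
            this.lonField = lonField;
            this.minLat = Math.min(lat1, lat2);
            this.maxLat = Math.max(lat1, lat2);
            this.minLon = Math.min(lon1, lon2);
            this.maxLon = Math.max(lon1, lon2);
        }

        /** True if the coordinate lies inside the box (edges inclusive). */
        boolean contains(double lat, double lon) {
            return lat >= minLat && lat <= maxLat && lon >= minLon && lon <= maxLon;
        }
    }

    // A decimal number surrounded by optional whitespace.
    private static final String NUM = "\\s*(-?\\d+(?:\\.\\d+)?)\\s*";
    private static final Pattern IN_BOX = Pattern.compile(
        "IN_BOX\\(\\s*(\\w+)\\s*,\\s*(\\w+)\\s*," + NUM + "," + NUM + "," + NUM + "," + NUM + "\\)");

    /** Parses one IN_BOX clause, assuming the argument order (latField, lonField, lat1, lon1, lat2, lon2). */
    static Box parse(String clause) {
        Matcher m = IN_BOX.matcher(clause.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an IN_BOX clause: " + clause);
        }
        return new Box(m.group(1), m.group(2),
            Double.parseDouble(m.group(3)), Double.parseDouble(m.group(4)),
            Double.parseDouble(m.group(5)), Double.parseDouble(m.group(6)));
    }

    public static void main(String[] args) {
        Box box = parse("IN_BOX(latitude, longitude, 23.00, 26.00, 50.00, 60.00)");
        System.out.println(box.latField + " in [" + box.minLat + ", " + box.maxLat + "]");
    }
}
```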

I can’t offer much more help on this so I hope this is enough to get you
going.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Tue, Dec 2, 2014 at 4:53 PM, rama44ster  wrote:

> Hi all,
> I have started trying out lucene for one of my projects and I have a table
> where the document has a latitude, longitude associated with it. I would
> like to retrieve only those documents whose latitude and longitude fall
> inside a bounding box that has two co-ordinates, (x_1, y_1) and (x_2, y_2).
>
> Is there any way, I can implement this in lucene? Is it possible to extend
> the parser to understand a new syntax where I can express this in the text
> query and have the parser build a query that will contain all the logic
> necessary to do this check. Something like IN_BOX(latitude, longitude,
> 23.00, 26.00, 50.00, 60.00). Here latitude and longitude are two stored
> fields and 23,26 and 50,60 are the two co-ordinates for bounding box.
>
> I also checked the spatial package of lucene, but couldn’t find any
> pointers.
>
> Thanks,
> Prasad.
>


Re: Lucene Spatial Implementation for Points within Polygon.

2014-12-22 Thread david.w.smi...@gmail.com
Hello.

You have stated the use-case so generically that it’s not clear if you
should index the polygon set and query by the point set, or the reverse.
Generally, you should index the set that is known in advance and then query
by the other, the set that is generally not known.  Assuming this is the
case, index the stable set with RecursivePrefixTreeStrategy, *and*, for
accuracy, if that set is also the polygon set, use SerializedDVStrategy
*or* simply keep them all in-memory keyed by an identifier (call
JtsGeometry.index() on each as well) that you check against at runtime.  If
you don’t have enough RAM then you’ll do the former.  If neither set seems
to be “stable”, you could really index either, but I would definitely choose
to index the points.  The predicate you should use is INTERSECTS; the others
are intended for polygon against polygons (basically any non-point shape
against another non-point shape).

If your scenario is quite simple (you have a bunch of points and polygons
that you get all at once to make this computation, and then that’s it, with no
long-term need to query again by the same polygons or points in the
future), I suggest using JTS directly in-memory, and its PreparedGeometry
to optimize each polygon, then iterate through your points to see which
polygons they are in.  You might even use JTS's STRtree to index polygon
bounding boxes to avoid looping over all polygons.
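
The shape of that in-memory computation can be sketched in plain Java.  This is deliberately not JTS: the hand-written ray-casting test stands in for PreparedGeometry, and the per-polygon bounding-box check stands in (linearly, with no tree) for the coarse filtering an STRtree would give you.  All names are hypothetical, and points exactly on a polygon edge are not handled specially.

```java
/** Counts points per polygon with a bounding-box pre-check; a plain-Java stand-in for the JTS approach. */
class PointsInPolygonsSketch {
    /** Ray-casting (even-odd) test; poly is an array of {x, y} vertices, implicitly closed. */
    static boolean contains(double[][] poly, double x, double y) {
        boolean inside = false;
        for (int i = 0, j = poly.length - 1; i < poly.length; j = i++) {
            double xi = poly[i][0], yi = poly[i][1];
            double xj = poly[j][0], yj = poly[j][1];
            // Toggle when the edge j->i straddles the horizontal line at y and crosses it to the right of x.
            if ((yi > y) != (yj > y) && x < (xj - xi) * (y - yi) / (yj - yi) + xi) {
                inside = !inside;
            }
        }
        return inside;
    }

    /** Returns {minX, minY, maxX, maxY} of a polygon. */
    static double[] bbox(double[][] poly) {
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        for (double[] v : poly) {
            minX = Math.min(minX, v[0]);
            minY = Math.min(minY, v[1]);
            maxX = Math.max(maxX, v[0]);
            maxY = Math.max(maxY, v[1]);
        }
        return new double[] {minX, minY, maxX, maxY};
    }

    /** counts[i] is the number of points inside polygons[i]. */
    static int[] countPointsPerPolygon(double[][][] polygons, double[][] points) {
        int[] counts = new int[polygons.length];
        for (int i = 0; i < polygons.length; i++) {
            double[] b = bbox(polygons[i]);
            for (double[] p : points) {
                // Cheap rejection first; only candidates get the exact test.
                if (p[0] < b[0] || p[0] > b[2] || p[1] < b[1] || p[1] > b[3]) continue;
                if (contains(polygons[i], p[0], p[1])) counts[i]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        double[][][] polygons = {
            { {0, 0}, {10, 0}, {10, 10}, {0, 10} },   // square
            { {20, 0}, {30, 0}, {25, 10} }            // triangle
        };
        double[][] points = { {5, 5}, {25, 3}, {15, 5}, {21, 9} };
        int[] counts = countPointsPerPolygon(polygons, points);
        System.out.println("square: " + counts[0] + ", triangle: " + counts[1]);
    }
}
```

The point (21, 9) shows why the second, exact pass matters: it falls inside the triangle's bounding box but outside the triangle itself.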

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Mon, Dec 22, 2014 at 12:30 AM,  wrote:
>
> Hello Team,
>
> We are starting off with Lucene Spatial implementation for some of the use
> cases:
>
> A . Given "N" polygons and "M" points, find how many points lie inside
> each of the polygon.
>
> 1st Approach :
>
> For A, we indexed Polygons using WKT and using JtsSpatial strategy. I set
> the Level at 22 . This has resulted in huge number of terms. This was
> needed as I need the search to be near perfect.
>
> For Indexing, I used Point(Supplied as WKT) using Jts again with Level at
> 22 (Although I think specifying level at query time does not make much
> difference).
>
> For this, we used "CONTAINS".  Output is coming but I am not sure if I
> am doing it the right way. Need suggestion.
>
> I am having following confusion:
>
> a.   Will CONTAINS and IS WITHIN both work in the same way for the
> given scenario. I am ruling OUT INTERSECTS as that scenario is not
> appropriate.
>
> b.  Second, are we missing something  in getting the correct output.
>
>
> 2nd Approach : (Reversed)
>
> Indexed POINTS in WKT format.
> Passed Polygons in WKT using JTs as query and fired as INTERSECTS and
> WITHIN.
>
> In second approach, we are getting more output than the 1st approach.
>
> However, we are still not sure which is the best way to tackle this
> problem. Please suggest.
>
> "Confidentiality Warning: This message and any attachments are intended
> only for the use of the intended recipient(s),
> are confidential and may be privileged. If you are not the intended
> recipient, you are hereby notified that any
> review, re-transmission, conversion to hard copy, copying, circulation or
> other use of this message and any attachments is
> strictly prohibited. If you are not the intended recipient, please notify
> the sender immediately by return email,
> and delete this message and any attachments from your system.
>
> Virus Warning: Although the company has taken reasonable precautions to
> ensure no viruses are present in this email,
> the company cannot accept responsibility for any loss or damage arising
> from the use of this email or attachment."
>


Re: Distance between 2 points Lucene Spatial

2014-12-22 Thread david.w.smi...@gmail.com
Hi Ankit,

Vincenty is the most accurate one; the tests for the other 2 use it as the
benchmark for the true answer.  In theory it produces the same answers as the
other 2 simpler formulas you mention but is “numerically robust” for
computers.  Note that the world model used by Spatial4j when in “geo” mode
is a spherical model.  For more accurate distance computation on Earth, use
an ellipsoidal model.  If you google “Vincenty”, it's easy to find
Vincenty’s ellipsoidal formula with the constants for Earth; that is most
often what he is associated with.
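
For reference, the two simpler spherical formulas can be written in a few lines of plain Java.  This is a textbook sketch with hypothetical names, not Spatial4j's own code; both methods return the distance as degrees of arc on a sphere.

```java
/** Textbook spherical distance formulas returning degrees of arc; not Spatial4j's implementation. */
class SphericalDistanceSketch {
    /** Great-circle distance via the haversine formula. */
    static double haversineDeg(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1), phi2 = Math.toRadians(lat2);
        double dPhi = phi2 - phi1;
        double dLam = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                 + Math.cos(phi1) * Math.cos(phi2) * Math.sin(dLam / 2) * Math.sin(dLam / 2);
        // Guard against rounding pushing sqrt(a) slightly above 1.
        double c = 2 * Math.asin(Math.min(1.0, Math.sqrt(a)));
        return Math.toDegrees(c);
    }

    /** Great-circle distance via the spherical law of cosines; loses precision for very small distances. */
    static double lawOfCosinesDeg(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1), phi2 = Math.toRadians(lat2);
        double dLam = Math.toRadians(lon2 - lon1);
        double cosC = Math.sin(phi1) * Math.sin(phi2)
                    + Math.cos(phi1) * Math.cos(phi2) * Math.cos(dLam);
        // Clamp to the valid acos domain in case of rounding error.
        return Math.toDegrees(Math.acos(Math.max(-1.0, Math.min(1.0, cosC))));
    }

    public static void main(String[] args) {
        System.out.println(haversineDeg(40.73, -73.99, 51.5, -0.12));
        System.out.println(lawOfCosinesDeg(40.73, -73.99, 51.5, -0.12));
    }
}
```

On a sphere the two agree closely for well-separated points; they diverge mainly for very small separations, where the law of cosines suffers from rounding.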

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Mon, Dec 22, 2014 at 12:35 AM,  wrote:
>
> Dear All,
>
> We are using lucene spatial strategy to find out the distance between a
> pair of Lat/Long.
>
> Given a pair of Lat/Long I need to find the near accurate distance between
> these 2 points.
>
> I have used Haversine, LawOfCosines and Vincenty, but I am unable to
> decide which will provide the best (most accurate) output.
>
> There is not just 1 point but millions of points, each of which will need to be
> compared against a set of points to find the closest point.
>
> Which might be the best approach? Additionally, I observed from the API
> that the output of these 3 algorithms is in degrees. Is there any API in
> lucene which can return the output in double, long, int etc. formats?
>
>


Re: Distance between 2 points Lucene Spatial

2014-12-22 Thread david.w.smi...@gmail.com
I forgot this part of your question.

To go from degrees to KM, multiply by DistanceUtils.DEG_TO_KM.
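
In case it helps to see what such a constant amounts to, here is a plain-Java sketch of the conversion.  The class name and the radius value are assumptions for illustration (a commonly used mean Earth radius); the library defines its own constants, which may differ slightly, so prefer DistanceUtils.DEG_TO_KM in real code.

```java
/** Converts degrees of arc on a spherical Earth to kilometres or metres; the radius is an assumed value. */
class DegreesToKmSketch {
    // Assumed mean Earth radius in km; the library's own constant may differ slightly.
    static final double EARTH_MEAN_RADIUS_KM = 6371.0087714;
    // Kilometres per degree of arc on a sphere of that radius (circumference / 360).
    static final double DEG_TO_KM = 2 * Math.PI * EARTH_MEAN_RADIUS_KM / 360.0;

    static double degreesToKm(double degrees) {
        return degrees * DEG_TO_KM;
    }

    static double degreesToMetres(double degrees) {
        return degreesToKm(degrees) * 1000.0;
    }

    public static void main(String[] args) {
        System.out.println("1 degree is about " + degreesToKm(1.0) + " km");
    }
}
```

Note this is a spherical conversion, so it carries the same model error as the spherical distance itself.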

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Mon, Dec 22, 2014 at 9:35 AM,  wrote:
>
> Thanks for the suggestion.
>
> I am using Lucene Vincenty to find the distance but the output is strange.
> I cannot figure out how to convert the output to metres/kilo metres.
>
> After extensive search on google, I found GeoDesy source code which gives
> me distance in metres. This is also the implementation of Vincenty.
>
> However, I do not intend to use GeoDesy.
>
>  I would prefer to use inbuilt Vincenty of Lucene to get the distance in
> metres but I am unable to find this.
>
> Please suggest.
>
>
> -Original Message-
> From: david.w.smi...@gmail.com [mailto:david.w.smi...@gmail.com]
> Sent: 22 December 2014 19:33
> To: java-user@lucene.apache.org
> Subject: Re: Distance between 2 points Lucene Spatial
>
> Hi Ankit,
>
> Vincenty is the most accurate one — it is the benchmark for the other 2’s
> tests for the true answer.  In theory it produces the same answers as the
> other 2 simpler formulas you mention but is “numerically robust” for
> computers.  Note that the world model used by Spatial4j when in “geo” mode
> is a spherical model.  For more accurate distance computation on Earth, use
> an ellipsoidal model.  If you google “Vincenty”, it's easy to find
> Vincenty’s ellipsoidal formula with the constants for Earth; that is most
> often what he is associated with.
>
> ~ David Smiley
> Freelance Apache Lucene/Solr Search Consultant/Developer
> http://www.linkedin.com/in/davidwsmiley
>
> On Mon, Dec 22, 2014 at 12:35 AM,  wrote:
> >
> > Dear All,
> >
> > We are using lucene spatial strategy to find out the distance between
> > a pair of Lat/Long.
> >
> > Given a pair of Lat/Long I need to find the near accurate distance
> > between these 2 points.
> >
> > I have used Haversine, LawOfCosines and Vincenty but am unable to
> > decide which will provide the most accurate output.
> >
> > There is not just 1 point but millions of points, each of which needs to
> > be compared against a set of points to find the closest one.
> >
> > Which might be the best approach? Additionally, I observed from the
> > API that the output of these 3 algorithms is in degrees. Is there
> > any API in Lucene which can return the output in double, long, int etc.
> > formats?
> >
> >
> > "Confidentiality Warning: This message and any attachments are
> > intended only for the use of the intended recipient(s).
> > are confidential and may be privileged. If you are not the intended
> > recipient. you are hereby notified that any review. re-transmission.
> > conversion to hard copy. copying. circulation or other use of this
> > message and any attachments is strictly prohibited. If you are not the
> > intended recipient. please notify the sender immediately by return
> > email.
> > and delete this message and any attachments from your system.
> >
> > Virus Warning: Although the company has taken reasonable precautions
> > to ensure no viruses are present in this email.
> > The company cannot accept responsibility for any loss or damage
> > arising from the use of this email or attachment."
> >
>
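A minimal sketch of the degrees-to-kilometres conversion asked about in this
thread, in plain Java with no Lucene dependency. It assumes the spherical
world model mentioned above: a great-circle distance reported in degrees of
arc becomes a length once the angle (in radians) is multiplied by an Earth
radius. The class, the method names and the radius constant are illustrative
assumptions, not Lucene or Spatial4j API; the DistanceUtils.degrees2Dist call
that appears in a later message of this archive serves the same purpose.

```java
// Hypothetical helper for illustration; not part of Lucene or Spatial4j.
final class ArcDistance {
    // Assumed mean Earth radius in kilometres (spherical model).
    static final double EARTH_MEAN_RADIUS_KM = 6371.0088;

    /** Converts an angular great-circle distance in degrees to kilometres. */
    static double degreesToKm(double degrees) {
        return Math.toRadians(degrees) * EARTH_MEAN_RADIUS_KM;
    }

    /** Haversine great-circle distance between two lat/lon points, in degrees of arc. */
    static double haversineDegrees(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = Math.toRadians(lat2 - lat1);
        double dLambda = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                + Math.cos(phi1) * Math.cos(phi2)
                * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
        double c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
        return Math.toDegrees(c);
    }

    public static void main(String[] args) {
        double deg = haversineDegrees(0, 0, 0, 90); // a quarter of the equator
        double km = degreesToKm(deg);
        System.out.println(deg + " degrees = " + km + " km = " + (km * 1000) + " m");
    }
}
```

Because this is a spherical approximation, it will differ slightly from an
ellipsoidal Vincenty result for the same pair of points.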


Re: Lucene Spatial Implementation for Points within Polygon.

2014-12-24 Thread david.w.smi...@gmail.com
One problem is the classic x/y, lat/lon mix-up.  WKT is “x y” order, and so
are Spatial4j methods for that matter.   If you consistently made this
mistake then it might yield correct results provided the point data is
within -90 and +90 longitude.  Maybe this will do it.  Otherwise your code
looks like it should work.
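To make the “x y” ordering concrete, here is a small plain-Java sketch that
builds WKT strings from latitude/longitude values with longitude written
first. The class and method names are hypothetical and exist only for this
illustration; the sketch also closes the polygon ring, since a WKT polygon
ring must end on the coordinate it starts with.

```java
// Hypothetical helper for illustration; not part of Spatial4j or JTS.
final class WktOrder {
    /** Formats a WKT point. WKT expects "x y", i.e. longitude first, then latitude. */
    static String pointWkt(double lat, double lon) {
        return "POINT(" + lon + " " + lat + ")";
    }

    /** Formats a WKT polygon from {lat, lon} pairs, closing the ring if needed. */
    static String polygonWkt(double[][] latLonPairs) {
        StringBuilder sb = new StringBuilder("POLYGON((");
        for (double[] p : latLonPairs) {
            // p[1] is the longitude (x), p[0] is the latitude (y)
            sb.append(p[1]).append(' ').append(p[0]).append(',');
        }
        double[] first = latLonPairs[0];
        double[] last = latLonPairs[latLonPairs.length - 1];
        if (first[0] != last[0] || first[1] != last[1]) {
            // repeat the first vertex so the ring is closed
            sb.append(first[1]).append(' ').append(first[0]).append(',');
        }
        sb.setLength(sb.length() - 1); // drop the trailing comma
        return sb.append("))").toString();
    }
}
```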

If you want to construct a point then don’t create WKT, simply call
ctx.makePoint(x,y).  There isn’t a makePolygon… but you can use Spatial4j’s
JtsGeometry constructor, which takes a JTS “Geometry”, which in turn can
be constructed from a JTS GeometryFactory.  That will avoid needless String
WKT encoding and then parsing.

For the accuracy that you clearly want, call SpatialArgs.setDistErr(0.0).

What Lucene version are you using?

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Wed, Dec 24, 2014 at 5:46 AM,  wrote:
>
> Thanks for the suggestions David..
> However I am in a fix.. Although I am indexing and searching both using
> JTS, I am still getting very few hits. I am very sure that the points which
> are indexed fall inside a lot of polygons, but the hits are not giving me the
> proper result.
>
> For approx. 8 lac polygons, I am getting 4.5 lacs polygons having points.
> For remaining 3.5 lacs I am not getting any HITS. Providing a small snippet
> of the code. Please suggest.
>
> I am indexing points as WKT Shape using the following Code.
>
> JtsSpatialContext spatialContext=JtsSpatialContext.GEO;
> SpatialPrefixTree grid=new GeohashPrefixTree(spatialContext,22);
> spatialStrategy=new RecursivePrefixTreeStrategy(grid,"position");
>
> Shape point = spatialContext.readShape("POINT("+lat+" "+lon+")");
> doc.add(new StoredField("FieldName",value));
> for(IndexableField f: spatialStrategy.createIndexableFields(point))
> {
> doc.add(f);
> }
>
> doc.add(new
> StoredField(spatialStrategy.getFieldName(),lat+";"+lon+";"+value));
>
> indexWriter.addDocument(doc);
>
>
> For Searching, since I have polygons, I am using the following code:
>
> JtsSpatialContext spatialContext=JtsSpatialContext.GEO;
> SpatialPrefixTree grid=new GeohashPrefixTree(spatialContext,22);
> spatialStrategy=new RecursivePrefixTreeStrategy(grid,"position");
>
>
> StringBuffer to create polygons like this.
>
> POLYGON((Lat Long,Lat Long pairs))
>
> SpatialArgs args=new
> SpatialArgs(SpatialOperation.Intersects,spatialContext.readShape(StringBuffer.toString()));
> ConstantScoreQuery csq=new
> ConstantScoreQuery(spatialStrategy.makeQuery(args));
>
>
> TopDocs docs=indexSearcher.search(csq,10);
>
> if (docs.totalHits > 0)
> {
> // process data
> }
> else
> {
> System.out.println("NO DATA FOUND");
> }
>
> Problem is for most of the polygons (approx. 50%) , I am getting NO DATA
> FOUND indicating no HITS. Now, I am pretty sure that there are Lat/Long
> pair's indexed which fall within the supplied polygon but I am unable to
> get all the Hits.
>
> Please help me in identifying where I am going wrong. For every incorrect
> polygon which is present (boundaries intersecting, incomplete), I am printing
> the exception and excluding the polygon. This is not the worry.
>
> The worry is that I am getting hits for very few of the polygons which
> actually have points inside them.
>
> Please correct me where I am going wrong.
>
>
> -Original Message-
> From: david.w.smi...@gmail.com [mailto:david.w.smi...@gmail.com]
> Sent: 22 December 2014 19:19
> To: java-user@lucene.apache.org
> Subject: Re: Lucene Spatial Implementation for Points within Polygon.
>
> Hello.
>
> You have stated the use-case so generically that it’s not clear if you
> should index the polygon set and query by the point set, or the reverse.
> Generally, you should index the set that is known in-advance and then
> query by the other, the set that is generally not known.  Assuming this is
> the case, index the stable set with RecursivePrefixTreeStrategy, *and*, for
> accuracy, if that set is also the polygon set, use SerializedDVStrategy
> *or* simply keep them all in-memory keyed by an identifier (call
> JtsGeometry.index() on each as well) that you check against at runtime.
> If you don’t have enough RAM then you’ll do the former.  If neither set
> seems to be “stable”, you could really index either, definitely choose to
> index the points.  The predicate you should use is INTERSECTS; the others
> are intended for polygon against polygons (basically any non-point shape
> against another non-point shape).
>
> If your scenario is quite simply, you have a bunch of points and polygons
> you get all at once to make this computation and then that’s it (no
> long-term need

Re: Searching for DateRangeField in Lucene 5.0.0

2015-02-25 Thread david.w.smi...@gmail.com
Yeah, Uwe has it basically right.  I was on vacation when the release notes
were developed and missed the opportunity to review them before they were
published.  This bullet references “DateRangeField” but that’s the Solr
side of this feature.  The Lucene side is the combination
of NumberRangePrefixTreeStrategy constructed with DateRangePrefixTree.  See
the docs for the spatial module generally, but note that this
SpatialStrategy & SpatialPrefixTree pair is unique in that it’s 1-dimensional,
and so you have to do things a little differently — namely you need to
construct Shape instances (equivalent to Calendar or Calendar ranges) via
utility methods on DateRangePrefixTree instead of using a Spatial4j
SpatialContext.

I’ll tweak the wiki release notes.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Wed, Feb 25, 2015 at 4:55 AM, Torsten Krah  wrote:

> Hi,
>
> reading the release notes from here:
>
> https://wiki.apache.org/lucene-java/ReleaseNote50
>
> it's written that Lucene got a new DateRangeField:
>
> * New DateRangeField type enables Indexing and searching of date ranges,
> particularly multi-valued ones.
>
> However - in which package is this field? Searched in the libraries and
> the source code and found nothing.
> Are the release notes wrong or was the field renamed?
>
> kind regards
>
> Torsten
>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Re: Can't get distance sorting to work in Lucene Spatial 4.10.3

2015-02-25 Thread david.w.smi...@gmail.com
Hi Rainer,

I see two issues.  One is that you call makePoint with latitude (Y) then
longitude (X).  Spatial4j is X then Y order.  The second issue is more
stylistic (though in this case it may explain your symptom, given the X & Y
mix-up): since you already have a ‘point’, when you call makeCircle you
ought to pass it in (the method is overloaded) instead of re-referencing
X & Y.  I suggest renaming “point” to “queryPoint” to help clarify what
it’s for.

Without me spending more time trying your code, I’m not sure if what I
found fixes what you see or if there is another bug.  If there is still a
bug, perhaps start with the SpatialExample.java in a working state and
iteratively modify it to work how you want it to, testing each time that
the sort works.
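As a plain-Java illustration of why the argument order matters for distance
sorting, the sketch below sorts points by great-circle distance from a query
point built X (longitude) first, then Y (latitude). Pt, its factory method
and the sort helper are hypothetical stand-ins written for this example; they
are not Spatial4j or Lucene classes. Building the query point with the two
values swapped places it somewhere else on the globe, so the resulting order
no longer reflects distance from the intended location.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java illustration; Pt is a hypothetical stand-in, not a Spatial4j class.
final class DistanceSortSketch {
    static final class Pt {
        final double x; // longitude in degrees
        final double y; // latitude in degrees
        private Pt(double x, double y) { this.x = x; this.y = y; }
        /** X (longitude) first, then Y (latitude). */
        static Pt of(double x, double y) { return new Pt(x, y); }
    }

    /** Great-circle angle between two points in radians (haversine formula). */
    static double arc(Pt a, Pt b) {
        double dLat = Math.toRadians(b.y - a.y);
        double dLon = Math.toRadians(b.x - a.x);
        double h = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(a.y)) * Math.cos(Math.toRadians(b.y))
                * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * Math.asin(Math.min(1.0, Math.sqrt(h)));
    }

    /** Returns a copy of docs sorted by distance from queryPoint. */
    static List<Pt> sortByDistance(List<Pt> docs, Pt queryPoint, boolean reverse) {
        Comparator<Pt> byDistance = Comparator.comparingDouble(p -> arc(queryPoint, p));
        List<Pt> sorted = new ArrayList<>(docs);
        sorted.sort(reverse ? byDistance.reversed() : byDistance);
        return sorted;
    }
}
```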

~ David

On Wed, Feb 25, 2015 at 7:18 AM, Simon Rainer 
wrote:

> Hi!
>
> I have problems getting distance sorting to work in Lucene Spatial. (I'm
> using v4.10.3.) I'm following the SpatialExample.java from the Lucene docs.
> My code is below (it's Scala, but translates 1:1 into Java). When I run the
> query, results don't seem to be affected by the sorting at all. Changing
> the sort order from ascending to descending has no effect either.
>
> I can't see any difference between what I'm doing and the official example
> (except the use of the SearcherManager, but I've checked and that doesn't
> make a difference). Any hints appreciated!
>
> Cheers,
> Rainer
>
>
> --- Code
>
> Here's my code (Scala - but translates 1:1 into Java):
>
> val searcher = placeSearcherManager.acquire()
> val point = spatialCtx.makePoint(lat, lon)
> val args =
>   new SpatialArgs(SpatialOperation.Intersects,
> spatialCtx.makeCircle(lon, lat,
> DistanceUtils.dist2Degrees(100,
> DistanceUtils.EARTH_MEAN_RADIUS_KM)))
>
> val filter = spatialStrategy.makeFilter(args)
>
> // Here's what's supposed to set up distance sorting
> val valueSource = spatialStrategy.makeDistanceValueSource(point)
> val distanceSort = new
> Sort(valueSource.getSortField(false)).rewrite(searcher)
>
> try {
>   val topDocs = searcher.search(new MatchAllDocsQuery(), filter,
> limit, distanceSort)
>   val scoreDocs = topDocs.scoreDocs
>
>   // Print the results
>   scoreDocs.foreach(scoreDoc => {
> val doc = searcher.doc(scoreDoc.doc)
> val docPoint =
> spatialCtx.readShape(doc.get(spatialStrategy.getFieldName())).asInstanceOf[Point]
> val distance =
> spatialCtx.getDistCalc().distance(args.getShape.getCenter, docPoint)
> val distanceKM = DistanceUtils.degrees2Dist(distance,
> DistanceUtils.EARTH_EQUATORIAL_RADIUS_KM)
> Logger.info("distance: " + distanceKM)
>   })
> } finally {
>   placeSearcherManager.release(searcher)
> }
>
> --- Log output
>
> [info] application - distance: 406.01578203364323
> [info] application - distance: 327.67269076509876
> [info] application - distance: 218.94951150657565
> [info] application - distance: 251.37927074183852
> [info] application - distance: 140.6570939383426
> [info] application - distance: 460.47502999630586
> [info] application - distance: 462.37676932762116
> [info] application - distance: 489.49001138999256
> [info] application - distance: 392.0773262500455
> [info] application - distance: 227.8864179949065
>
>
>


Re: Time range facets on documents associated with a time interval

2015-03-25 Thread david.w.smi...@gmail.com
Hi Rainer,

If Solr is an option, then as of 5.0 you can use “DateRangeField” and use
Solr’s standard faceting on that.

If this is at the Lucene level, you can do the same approach as Solr — use
NumberRangePrefixTreeStrategy configured with a DateRangePrefixTree.  Then
for each interval, generate a query to calculate the hits.
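The "one query per interval" idea can be sketched without Lucene as follows.
Each facet bucket is an interval, and a document counts toward a bucket when
its own start/end interval intersects it. In a real index each bucket count
would come from running an intersects query; here an in-memory check stands
in for that query. All names below are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual stand-in: the intersects() check plays the role of one range
// query per facet interval against a real index.
final class IntervalFacets {
    static final class Interval {
        final long start; // inclusive, e.g. epoch millis or a year
        final long end;   // inclusive
        Interval(long start, long end) { this.start = start; this.end = end; }
        boolean intersects(Interval other) {
            return start <= other.end && other.start <= end;
        }
    }

    /** Counts, per facet bucket, the documents whose interval intersects it. */
    static Map<String, Integer> count(Map<String, Interval> buckets, List<Interval> docs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, Interval> bucket : buckets.entrySet()) {
            int hits = 0;
            for (Interval doc : docs) {
                if (doc.intersects(bucket.getValue())) {
                    hits++;
                }
            }
            counts.put(bucket.getKey(), hits);
        }
        return counts;
    }
}
```

Note that a document spanning several buckets is counted once in each of
them, so the counts need not add up to the number of documents.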

FYI There is a more efficient approach in-progress here:
https://issues.apache.org/jira/browse/LUCENE-5735

~ David

On Wed, Mar 25, 2015 at 5:17 AM, Simon Rainer 
wrote:

> Hi,
>
> I'm trying to implement dynamic range facets in Lucene, along the same
> lines as in the org.apache.lucene.demo.facet.RangeFacetsExample. However,
> in my case I'm dealing with documents that don't have a single timestamp,
> but an interval defined by a start- and end-timestamp.
>
> What I'm trying to end up with, I guess, is exactly this
>
>
> http://blog.mikemccandless.com/2013/12/fast-range-faceting-using-segment-trees.html
>
> But I couldn't figure out whether this functionality has really gone into
> Lucene. In discussion, someone also pointed out this to me:
>
> http://wiki.apache.org/solr/SpatialForTimeDurations
>
> i.e. using (start/end) as point coordinates and then use spatial indexing.
> Is this still the recommended approach? Any pointers on how to best
> approach this would be highly appreciated.
>
> Thanks,
> Rainer
>
>


Re: Spatial Search with Nested Polygons

2015-03-26 Thread david.w.smi...@gmail.com
Hi Mike,

The second (non-easy) part seems like it could be pretty slow:

> Additionally, I'd like to have access to the
> numerical value of the smallest polygon which contains the point
> (something like makeDistanceValueSource).


To determine “the smallest polygon which contains the point” for the
current matching document, you’d have to iterate over them in
smallest-to-largest-1 order and check containment, so that you know which
corresponding value to return.  There will be a performance hit for sure.
This sounds like a custom ValueSource/FunctionValues that does that logic…
perhaps by grabbing the shapes from SerializedDVStrategy’s shape providing
ValueSource.  If you provide the shapes using a Spatial4j ShapeCollection
with the order from biggest to smallest, you can know the index of which
shape matches, and then pull the i-th numeric value you need from a list of
numbers in BinaryDocValues.  The largest shape could be kept out of here
since you don’t need it.
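The containment scan described above might look roughly like this in plain
Java. It is only a sketch of the lookup logic: the Region interface stands in
for whatever shape type is used, the double[] stands in for the per-shape
numbers that would be read from doc values, and none of these names come
from Lucene or Spatial4j. The outermost shape is left out of the list, as
suggested, because a matching document is already known to contain the point.

```java
import java.util.List;

// Hypothetical sketch of the lookup logic only; Region is not a Spatial4j type.
final class SmallestContaining {
    /** Minimal containment test standing in for a real shape. */
    interface Region {
        boolean contains(double x, double y);
    }

    /**
     * regions are ordered smallest to largest and exclude the outermost one.
     * values has one more entry than regions; its last entry belongs to the
     * outermost region.
     */
    static double valueAt(List<Region> regions, double[] values, double x, double y) {
        for (int i = 0; i < regions.size(); i++) {
            if (regions.get(i).contains(x, y)) {
                return values[i]; // first hit is the smallest containing region
            }
        }
        return values[values.length - 1]; // only the outermost region contains the point
    }
}
```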

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Thu, Mar 26, 2015 at 11:47 PM, Mike Hansen  wrote:

> I was wondering about the feasibility / difficultly of implementing a
> solution to the following problem with Lucene.
>
> For each document, I have a series of nested polygons each associated
> with a numerical value.  My search query gives a point, and I want to
> return all of the documents whose largest polygon contains the point
> (that part is easy).  Additionally, I'd like to have access to the
> numerical value of the smallest polygon which contains the point
> (something like makeDistanceValueSource).
>
> Thanks,
> --Mike
>
>
>


Re: Spatial Search with Nested Polygons

2015-03-27 Thread david.w.smi...@gmail.com
On Fri, Mar 27, 2015 at 12:27 AM, Mike Hansen  wrote:

> There are a few things which could probably help with performance.
> Each document has only around say 30 polygons. You could do a binary
> search which would help reduce the cost. Additionally, I have a
> distinguished point contained inside of all the nested polygons so I
> can pre-compute the minimum and maximum distances from that point to
> the edge of the polygon and use that to also reduce the number of
> containment checks to do.  I expect that there will be on the order of
> 500-1000 documents considered for each search.
>

Oh right, they are *nested*; I overlooked that.  I like your binary-search
plan — makes sense.
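A sketch of that binary search, assuming the polygons are indexed from
smallest to largest and strictly nested, so that once one polygon contains
the point every larger one does too. The containment test is passed in as a
predicate on the polygon's index; the class and method names are made up for
this example.

```java
import java.util.function.IntPredicate;

// Hypothetical helper; relies on the nesting assumption stated above.
final class NestedSearch {
    /**
     * Returns the smallest index in [0, n) for which containsAt is true, or n
     * if none is. Assumes containsAt is false for every index below the
     * answer and true from the answer upward.
     */
    static int firstContaining(int n, IntPredicate containsAt) {
        int lo = 0;
        int hi = n; // the answer lies in [lo, hi]
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (containsAt.test(mid)) {
                hi = mid;     // mid contains the point; a smaller polygon might too
            } else {
                lo = mid + 1; // mid and everything smaller miss the point
            }
        }
        return lo;
    }
}
```

With around 30 polygons per document this needs at most 5 containment checks
per document instead of up to 30.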

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley


Re: question about spatial module in lucene 5

2015-03-30 Thread david.w.smi...@gmail.com
Hi Anton.

I think you’re right.  PointVectorStrategy has been overlooked.  The
work-around is pretty simple though.  In addition to calling
createIndexableFields, also create two DoubleDocValuesField instances, one
for each dimension that uses the identical names the strategy generates.
Lucene will merge the doc-values from these fields with the inverted-index
data of the same-named fields returned by createIndexableFields; sharing a
field name between the two is fine.

I filed an issue: https://issues.apache.org/jira/browse/LUCENE-6376  I
think the solution should use the same API approach as BBoxStrategy, which
allows you to customize the indexing options via a setFieldType method — to
turn on/off DocValues or the inverted index (IndexOptions).

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Mon, Mar 30, 2015 at 7:49 AM, Anton Lyska 
wrote:

> Hi
>
> I have tried to upgrade lucene from 4.x to 5.0 recently. I found out from
> documentation that FieldCache is removed from lucene 5, and I should use
> DocValues fields for sorting.
> I upgraded my sources, and everything works fine except spatial sorting by
> distance.
> When I looked into PointVectorStrategy I saw that createIndexableFields()
> creates DoubleFields, but the class that is used during sorting
> (DistanceValueSource) works with NumericDocValues. At runtime I got an
> exception, and advice to use UninvertingReader.
>
> I'm a little confused with that.
> So, in my opinion, there is spatial sorting, which is broken by default. Am
> I right or missing something?
>
> The option to use UninvertingReader doesn't work for me because according to
> https://issues.apache.org/jira/browse/LUCENE-6370 there is no easy way to
> use UninvertingReader with an NRT index, which I use.
>


Re: Lucene Spatial: sort by best fit

2015-04-01 Thread david.w.smi...@gmail.com
Hi Rainer,

The BBoxStrategy is pretty close to this.  It does assume indexed
rectangles and not other shapes, and it’s limited to one rect value per
field, but perhaps this is fine for you nonetheless?  See
the makeOverlapRatioValueSource() method.  If this feature was non-obvious,
I think I may need to make this more prominent from the BBoxStrategy class
level javadocs.  Did you at least find this strategy?
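To illustrate what "sorting by overlap" means for two rectangles, here is a
plain-Java sketch that scores a document rectangle against a query rectangle
by intersection area divided by union area. This is a deliberately simple
measure chosen for the example and should not be read as the formula
BBoxStrategy uses; the Rect class and method names are hypothetical, and the
sketch ignores rectangles that cross the dateline.

```java
// Illustrative only: a simple overlap measure, not BBoxStrategy's scoring formula.
final class OverlapSketch {
    /** Axis-aligned rectangle given by its min/max corners. */
    static final class Rect {
        final double minX, minY, maxX, maxY;
        Rect(double minX, double minY, double maxX, double maxY) {
            this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
        }
        double area() { return (maxX - minX) * (maxY - minY); }
    }

    /** Area of the intersection of a and b, or 0 if they do not overlap. */
    static double intersectionArea(Rect a, Rect b) {
        double w = Math.min(a.maxX, b.maxX) - Math.max(a.minX, b.minX);
        double h = Math.min(a.maxY, b.maxY) - Math.max(a.minY, b.minY);
        return (w <= 0 || h <= 0) ? 0.0 : w * h;
    }

    /** Intersection over union: 1.0 for identical rectangles, 0.0 for disjoint ones. */
    static double overlapRatio(Rect query, Rect doc) {
        double inter = intersectionArea(query, doc);
        double union = query.area() + doc.area() - inter;
        return union == 0 ? 0.0 : inter / union;
    }
}
```

Sorting documents by this ratio in descending order puts the closest "fit"
to the query rectangle first.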

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Wed, Apr 1, 2015 at 1:49 PM, Simon Rainer  wrote:

> Hi,
>
> I'm trying to implement sorting by 'best fit' in Lucene spatial. I.e. I
> want to query my index for documents that intersect a query rectangle, and
> get my results sorted by the amount overlap between the query rectangle and
> the document shape. I was wondering whether this is a use case that has
> been solved before, but couldn't find anything obvious when googling for it.
>
> I guess I can solve this with a custom FieldComparator? Any hints,
> pointers to prior work, recommended practices, etc. greatly appreciated!
>
> Thanks,
> Rainer
>
>
>


Re: Lucene Spatial: sort by best fit

2015-04-01 Thread david.w.smi...@gmail.com
On Wed, Apr 1, 2015 at 3:21 PM, Simon Rainer  wrote:

> Hi David,
>
> ouch - no, missed that. I'm indexing points and polygons with the
> RecursivePrefixTreeStrategy right now, so simply didn't look properly at
> the BBoxStrategy. (I need to use exact polygons, so that I can make use of
> the ultra-cool 2D facet heatmap feature :-)
>

:-)


> So the best way, I guess, would be to index an additional field, using
> BBoxStrategy, that holds just the bounding box?
>

Absolutely.  A key reason there are multiple SpatialStrategy classes is
that there are multiple ways to index “spatial data”, each suited to
different use-cases.  So by all means, use as many of them as you need.
For example, perhaps you might also use PointVectorStrategy for
distance-sorting to a center-point.  (BBoxStrategy can do that too…
albeit with a bit more work, since it must calculate the center-point from
twice as many values: 4 instead of 2.)

p.s. I’ve seen your code that uses Lucene spatial on GitHub (our friend
Bruce pointed me at it) and commented on a commit but I haven’t seen a
response from you.  Perhaps you don’t have notifications set up to alert
you?
https://github.com/pelagios/pelagios-api-v3/commit/f958fe5d8542cf55ddf6382940c5fe3ffa1a9fa6

~ David


Quiz question: Which Character.isSpaceChar but not isWhitespace?

2015-10-30 Thread david.w.smi...@gmail.com
One would think that all “space characters” are by definition
“whitespace”.  Not true!:
http://www.fileformat.info/info/unicode/char/00a0/index.htm

So I’m working on an app where I can no longer use WhitespaceTokenizer
since I need to check for isSpaceChar *OR* isWhitespace.  Alternatively I
could use MappingCharFilter, I realize.
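The difference is easy to reproduce with the JDK alone. The sketch below
checks U+00A0 (NO-BREAK SPACE, the character linked above) against both
methods and splits a string on either kind of space. SpaceSplit is a
hypothetical helper written for this illustration; it is not a Lucene
tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not a Lucene Tokenizer.
final class SpaceSplit {
    /** True if the code point is a Unicode space separator or Java whitespace. */
    static boolean isAnySpace(int codePoint) {
        return Character.isSpaceChar(codePoint) || Character.isWhitespace(codePoint);
    }

    /** Splits text on runs of characters for which isAnySpace is true. */
    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (isAnySpace(cp)) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.appendCodePoint(cp);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        int nbsp = 0x00A0; // NO-BREAK SPACE
        System.out.println(Character.isSpaceChar(nbsp));  // true
        System.out.println(Character.isWhitespace(nbsp)); // false
        System.out.println(split("foo\u00A0bar\tbaz"));   // [foo, bar, baz]
    }
}
```

The reverse mismatch exists too: a tab is whitespace according to
isWhitespace but is not a space character according to isSpaceChar.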

This had trickle-down effects on a search platform I’m working on that was
triggered by a user’s search.  It’s caused all sorts of head-scratching
till we discovered what’s going on.

Craziness.

~ David
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: [spatial] Indexing polygons?

2016-01-10 Thread david.w.smi...@gmail.com
Hi Robert,

First, for the basics of the spatial module, see SpatialExample.java in the
tests area.  It doesn't include a polygon example but shows the concepts of
a SpatialContext, Shape, and SpatialStrategy which are the key abstractions.

The default SpatialContext implementation doesn't handle polygons -- not
yet anyway.  So use JtsSpatialContext.GEO or some other configured variant
created via JtsSpatialContextFactory and add JTS to your classpath (LGPL
licensed, by the way).  With that, you can create polygons parsed from WKT
or GeoJSON -- see the SpatialContext.getFormats().getWktReader() for
example.  That'll produce a Shape instance parsed from a polygon WKT string
you give it.  Then re-acquaint yourself with SpatialExample.java to see how
to index a Shape and how to query by a Shape.  An important difference with
the example is the choice of an appropriate SpatialStrategy.  The example
uses RecursivePrefixTreeStrategy which is best for points-only data;
otherwise I recommend CompositeSpatialStrategy.

I'm actively working on a SpatialContext implementation for "Geo3d" which
will be another option.  Your code wouldn't change other than choosing a
different SpatialContext impl.

By the way, the one-liner in your email I see you found in
SpatialExample.java:
  doc.add(new StoredField(strategy.getFieldName, pt.getX + " " + pt.getY))
-- is purely for the "stored" version (for document retrieval in search
results); not for indexing/search.  See the comments preceding it ;-)

Good luck.  And sorry for leaving you hanging for a few days; I overlooked
your email.
~ David

On Wed, Jan 6, 2016 at 10:59 PM Robert Nix  wrote:

> Hi,
>
> Is there an example in the lucene-solr source to show how to index polygons
> and how to search with and for indexed polys? I'm looking in
> lucene/spatial/src/test/ and I see an example of a point and it seems
> obvious:
>
> doc.add(new StoredField(strategy.getFieldName, pt.getX + " " + pt.getY))
>
> But nothing regarding polygons is jumping out at me. If there isn't such an
> example, can someone provide a short one?
>
> Thanks
> --
> --nix
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Is MemoryIndex and Spatial stuff combination supported?

2016-01-20 Thread david.w.smi...@gmail.com
Yup.
Just to clarify for the O.P., after getting the SpatialStrategy instance,
call createIndexableFields() which returns a list of Field instances, which
you can then call tokenStream() on as Alan indicated.  This should work
fine for any of the SpatialStrategy instances.

On Wed, Jan 20, 2016 at 2:09 PM Alan Woodward  wrote:

> Depending on the type of field, you can normally do:
>
> Field myField = …
> index.addField(fieldName, myField.tokenStream(null, null))
>
> I agree that this could be a bit nicer, though.  MemoryIndex doesn't
> support DocValues yet either, although I think there is an open ticket to
> add that.
>
> Alan Woodward
> www.flax.co.uk
>
>
> On 20 Jan 2016, at 18:32, xavi jmlucjav wrote:
>
> > Hi,
> >
> > I am on lucene 5.3.1
> >
> > I am running a doc at a time through a bunch of Queries. This is working
> > nicely with the MemoryIndex and combinations of
> > TermQuery/NumericRangeQuery. Now I wanted to add spatial stuff, so I
> > happily added more queries with SpatialStrategy.makeQuery(...).
> >
> > But, when I go to add the corresponding field to the MemoryIndex...I am
> not
> > sure how to do this, the only way to add fields to a MemoryIndex seems:
> >
> > MemoryIndex index = new MemoryIndex();
> > index.addField("content", "Readings about Salmons and other select
> > Alaska fishing Manuals", analyzer);
> > index.addField("author", "Tales of James", analyzer);
> >
> > Is this combination possible?? I expected (naively?) it would work out of
> > the box.
> >
> > thanks
> > xavi
>
> --
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com