[jira] [Updated] (SOLR-4212) Let facet queries hang off of pivots

Hoss Man (JIRA) Thu, 02 Jul 2015 14:12:19 -0700

     [ 
https://issues.apache.org/jira/browse/SOLR-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Hoss Man updated SOLR-4212:
---------------------------
    Attachment: SOLR-6353-4212.patch

Updated patch wit ha few small changes, details mentioned below.

Comments in no particular order...

----

I reiterate my earlier comment about javadocs -- it's really hard to review a 
patch this size with so many new methods that don't have any javadocs at all.  
Everytime I see a method mentioned in the new code, I look it up to jog my 
memory about what exactly it does and i wind up needing to re-read/skim each 
method over and over to remind myself what it does becaase they don't have any 
javadocs.  (I can't imagine how how any developer will ever be able to make 
sense of any of this code down the road w/o more method level docs)

On the flip side: PivotFacetHelper.mergeQueryCounts _does_ have javadocs, but 
it doesn't look like they match what the method does -- I'm pretty sure either 
the javadocs are wrong, or the method has a bug in it and isn't working as 
documented.

----

bq. I further refactored the code into into three separate classes to divide 
the responsibilities clearly. 

Cool ... This organization makes a lot more sense then previous patches.

----

bq. Fixed. I am not sure why that happened but once I refactored the code to do 
all the parsing and calculation in prepare, I could not reproduce the issue.

Well, the best way to be sure it's fixed is to add a test showing that it works.

I've updated your patch to add a request like this to 
DistributedFacetPivotLargeTest.

In order to get it to work, I had to also modify the 
{{Map<String,RangeFacetRequest>}} created by FacetComponent.prepare to be a 
LinkedHashMap (see more comments on this below) because otherwise it was a huge 
pain in the ass to try and write code to do anything with teh response in a 
clean way.

----

bq. Though I do not agree about the user counting on the same order because 
this is not a list but a complex object in all our responses. But I digress.

I don't really understand the basis for your comment.

My argument is simple:

* Given:
** (top level) Range Facet reults have always been returned as a NamedList, in 
the same order as specified by the facet.range params that requested them.
** This order is modeled in SolrJ's QueryResponse class as {{List<RangeFacet> 
getFacetRanges()}}
* Therefore:
** We should be consistent and return Range Facet results with the same 
predictible order when they are nested under pivots.
** This is already true in the SolrJ modeling in your patch, which adds 
{{List<RangeFacet> getFacetRanges()}} to PivotField.

If nothing else, having the RangeFacets returned in the same order for both the 
top level results and the per-pivot results is the only sane/simple way for 
users to compute ratios between the top level range counts and the per pivot 
value counts when dealing with many/variable ranges -- otherwise they have to 
sort them themselves to corrolate.

----

As I mentioned above regarding my test additions, I fixed the consistent 
ordering problem by changing the {{Map<String,RangeFacetRequest>}} created by 
FacetComponent.prepare to also be a LinkedHashMap.  But I still don't 
understand the purpose of this being a Map (where the keys are the original 
values of the facet.range params, not to be confused with having a Map keyed 
off of the _tags_ associated with the list of RangeFacetRequests that use that 
tag discussed below) at all.

As far as i can tell all usages of this {{Map<String,RangeFacetRequest>}} 
either:

* iterate over every entry in the Map and only look at the values.
* iterate over every facet.range param string (in order), and use it to "get()" 
the RangeFacetRequest object that corrisponds with that param string.

...wouldn't both of those usages be much simplier if instead of this Map there 
was just a {{List<RangeFacetRequest>}} that could be iterated over when the 
caller cares about looping over all RangeFacetRequest instances?

Is there some other purpose for having this {{Map<String,RangeFacetRequest>}} 
that i'm not understanding?

(see below for comments about creating a reused 
{{Map<String,List<RangeFacetRequest>>}} for when dealing with tags ... that's 
one of the current "loop over all RangeFacetRequest" usages that seems wrong to 
me anyway)

----

I think you should revive your earlier idea of having a "FacetContext" object 
to put in the request context and use it to track the RangeFacetRequests and 
FacetBase (FACET_QUERY) objects.  (Either that, or just use the existing 
FacetInfo class for this since that's pretty much it's current purpose.  Is the 
reason you didn't already do that because FacetInfo isn't currently in the 
single node code path?)

Either way (with or w/o a new FacetContext class, with or w/o reusing 
FacetInfo) the code to access the Map/List of all RangeFacetRequests and 
FacetBase objects should really be encapsulated in some way -- if nothing else 
to reduce the amount of casting when consumers have to pull them out of 
req.getContext(...), but also to help future proof us against attempts to 
access those structures before they've been initialized by 
FacetComponent.prepare.  (ie: throw IllegalAccessException if 
{{!req.getContext().containsKey(WHATEVER_CONTEXT_KEY)}}

example...

{code}
class FacetComponent {
  // ...
  public static List<RangeFacetRequest> 
getAllRangeFacetRequestsFromContext(SolrQueryRequest req) {
    List<RangeFacetRequest> result = (List<RangeFacetRequest>) 
req.getContext().get(MAGIC_CONTEXT_KEY);
    if (null == result) {
      throw new IllegalAccessException("RangeFacetRequests can't be accessed 
from context they are initalized");
    }
    return result;
  }
  // etc...
{code}

...or better still, with the FacetContext container...

{code}
class FacetContext {
  public static FacetContext getFacetContext(SolrQueryRequest req) {
    FacetContext result = (FacetContext) 
req.getContext().get(MAGIC_CONTEXT_KEY);
    if (null == result) {
      throw new IllegalAccessException("FacetContext can't be accessed before 
it's initalized in request context");
    }
    return result;
  }
  private final List<RangeFacetRequest> allRangeFacets; // init in constructor
  private final List<FacetBase> allQueryFacets; // init in constructor
  public List<RangeFacetRequest> getAllRangeFacetRequests() { ... }
  public List<FacetBase> getAllQueryFacets() { ... }
 
  // // (see next comment about PivotFacetProcessor.getTaggedRangeFacets)
  // private final Map<String,List<RangeFacetRequest>> taggedRangeFacets; // 
init in constructor
  // private final Map<String,List<FacetBase>> taggedQueryFacets; // init in 
constructor
  // public List<RangeFacetRequest> getRangeFacetRequestsForTag(String tag) { 
... }
  // public List<FacetBase> getAllQueryFacetsForTag(String tag) { ... }
}
{code}

----

PivotFacetProcessor.getTaggedRangeFacets (and getTaggedQueryFacets) seem really 
inefficient to me...

* unless i'm missreading these, every facet.pivot param is going to cause the 
the list(s) of all facet.range RangeFacetRequest objects (and all facet.query 
FacetBase objects) to be iterated over to say "do you match this tag"
** in the case of facet.pivot refinement requests, that's going to mean a *lot* 
of looping over the same lists over and over.

* why doesn't this follow the same pattern as 
PivotFacetProcessor.getTaggedStatsFields (and StatsInfo.getStatsFieldsByTag)?
** when the RangeFacetRequest (or facet.query FacetBase objects) are 
constructed, why isn't there a {{Map<String,List<RangeFacetRequest>>}} (and 
{{Map<String,List<FacetBase>>}}) keyed off of the "tag" values returning each 
each RangeFacetRequest (and each FacetBase) built so that getTaggedRangeFacets 
and getTaggedQueryFacets can just be fast/simple lookups on these Maps instead 
of constantly looping over the same lists repeatedly?

(this "tag->RangeFacetRequest" mapping could easily be accessed via the same 
FacetContext mentioned above, see commented out part of that example code)

----

The way RangeFacetProcessor deals with FacetRangeOther seems really confusing 
and sketchy.  It took me a while to understand why those Exceptions were "safe 
to ignore"

Rather then rely on "expected exceptions" when looping over the List of 
FacetRanges, wouldn't the code be alot easier to understand if we changed 
FacetRange to look something like...

{code}
/** null if this range doesn't corrispond to an {@link FacetRangeOther} */
public final FacetRange other;
// ...other existing vars...
private FacetRange(FacetRangeOther other, String name, ...) {
  ...
}
public FacetRange(String name,...) {
  this(null, name, ...);
}
public FacetRange(FacetRangeOther other, ...) {
  this(other, other.name, ...);
}
{code}

...And then simplified the response building RangeFacetProcessor to look 
something like...

{code}
final NamedList<Object> res = new SimpleOrderedMap<>();
final NamedList<Integer> counts = new NamedList<>();
res.add("counts", counts); // built up below

// explicitly return the gap.  compute this early so we are more
// likely to catch parse errors before attempting math
res.add("gap", rfr.getGapObj());

// explicitly return the start and end so all the counts
// (including before/after/between) are meaningful - even if mincount
// has removed the neighboring ranges
res.add("start", rfr.getStartObj());
res.add("end", rfr.getEndObj());

for (RangeFacetRequest.FacetRange range : rfr.getFacetRanges()) {
  final int count = rangeCount(rfr, range);
  if (null == range.getFacetRangeOther()) {
    // normal range, added to count depending on mincount conditions...
    if (count >= rfr.getMinCount()) {
      counts.add(range.name, count);
    }
  } else {
    // special 'other' count, added to higher level res
    res.add(range.name, count);
  }
}

return res;
{code}

?

If I'm missing something, and there's a reason the current dual loops over all 
FacetRanges w/exception checking is better for some reason, then at a bare 
minimum those empty catch blocks need to make it a lot more clear why they are 
there and why the exceptions are there.

----

{code}
    // TODO: slight optimization would prevent double-parsing of any localParams
    Query qobj = QParser.getParser(parsed.facetValue, null, req).getQuery();
{code}

I added a nocommit here to ensure we don't forget to file a jira to track this.

It's not just an issue of localparams, but also the parsing & construction of 
the Query object -- doing that once per request (just like the way you've 
changed the range faceting to do all the bucket calculation once) could give us 
some serious performance savings when facet.query params are hung of of many 
and/or large pivot trees.  (should just mean a pretty trivial new subclass of 
FacetBase that also has a {{public Query getQuery()}} accessor)

----

RangeFacetRequest implementation weirdness...

* It's really weird that none of the public accessor methods (getStart, getEnd, 
getGap, getGapObj, etc...) have any javadocs, even though many of the protected 
variables they return _do_ have jdocs (and the info in those jdocs seems pretty 
important to the method callers -- like the distinction betwen end vs endObj).

* The method signature & jdocs for createCalculator look dangerous -- it says 
it will return a calculator based on the "rangeFacetRequest" argument, but it's 
a non static method and in a couple places (schemaField, facetOn) it ignores 
the "rangeFacetRequest" arg and uses properties of "this" (and in other places 
vice vesa) -- suggesting either this method should be static so it doesn't 
accidently use "this", or it should be changed to not take any arguments and 
always refer to "this" (which seems to be the intention based on the place i 
see this method being called currently.)

* There's a weird bit of "format and immediately reparse" logic in 
RangeFacetRequest + RangeEndpointCalculator that makes no sense to me...
** RangeEndpointCalculator knows the type specific "end" after looping over the 
ranges in {{computeRanges()}}
** RangeEndpointCalculator uses "formatValue(end)" to build "endStr"
** RangeEndpointCalculator.getComputedEnd() returns "endStr"
** RangeFacetRequest calls getComputedEnd() and then passes that String to 
RangeEndpointCalculator.getValue(String) to parse it back into a type specific 
object
** why not just have a {{public T getComputedEnd()}} method return the original 
(type specific) "end" object the calculator already figured out?
* likewise: the type specific start value...
** RangeEndpointCalculator.computeRanges() already figures out {{final T start 
= getValue(rfr.getStart());}}
** so why make RangeFacetRequest ask the calculator to reparse the original 
start string with {{calculator.getValue(start);}} ?
** why not just have a {{public T getStart()}} method on the calculator to 
return that value it already parsed?
* likewise: gap
** should change RangeEndpointCalculator.getGap(String) to just getGap()
** won't save us any parsing round trips like the start / end changes suggested 
above, but will keep the API consistent
** for the same reason the old code had this comment...{code}
// explicitly return the gap.  compute this early so we are more 
// likely to catch parse errors before attempting math
{code}...RangeEndpointCalculator should call & remember the value of parseGap() 
early in it's constructor, and then getGap() can just be a trivial accessor 
method.

----

PivotFacetProcessor.addPivotQueriesAndRanges concerns...

* this method was doing a lot of null checking against the facetQueries and 
facetRanges Lists -- even though there was never any possibility of those lists 
being null
* (partly because of these null checks that always passed) things like 
SimpleFacets instances and RangeFacetProcessors were getting constructed for 
every Pivot value (even on refinement) even when there are no ranges or facet 
queries hanging off of a pivot.

So in the updated patch I also cleaned this method up a bit to assert the lists 
are non null and keep the logicc minimal when they are empty. (had to try it 
myself to be certain i wans't missing anything)

I also tightened up the error handling to be more specific about where the 
SyntaxErrors are coming from.



> Let facet queries hang off of pivots
> ------------------------------------
>
>                 Key: SOLR-4212
>                 URL: https://issues.apache.org/jira/browse/SOLR-4212
>             Project: Solr
>          Issue Type: Sub-task
>          Components: search
>    Affects Versions: 4.0
>            Reporter: Steve Molloy
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 5.3, Trunk
>
>         Attachments: SOLR-4212-multiple-q.patch, SOLR-4212-multiple-q.patch, 
> SOLR-4212.patch, SOLR-4212.patch, SOLR-4212.patch, SOLR-4212.patch, 
> SOLR-4212.patch, SOLR-4212.patch, SOLR-6353-4212.patch, SOLR-6353-4212.patch, 
> SOLR-6353-6686-4212.patch, SOLR-6353-6686-4212.patch, 
> SOLR-6353-6686-4212.patch, SOLR-6353-6686-4212.patch, 
> SOLR-6353-6686-4212.patch, patch-4212.txt
>
>
> Facet pivot provide hierarchical support for computing data used to populate 
> a treemap or similar visualization. TreeMaps usually offer users extra 
> information by applying an overlay color on top of the existing square sizes 
> based on hierarchical counts. This second count is based on user choices, 
> representing, usually with gradient, the proportion of the square that fits 
> the user's choices.
> The proposition is to use local parameters to specify facet query to apply 
> for pivot which matches a tag set on facet query. Parameter format would look 
> like:
> facet.pivot={!query=r1}category,manufacturer
> facet.query={!tag=r1}somequery
> facet.query={!tag=r1}somedate:[NOW-1YEAR TO NOW]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Updated] (SOLR-4212) Let facet queries hang off of pivots

Reply via email to