[
https://issues.apache.org/jira/browse/SOLR-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hoss Man updated SOLR-4212:
---------------------------
Attachment: SOLR-6353-4212.patch
Updated patch wit ha few small changes, details mentioned below.
Comments in no particular order...
----
I reiterate my earlier comment about javadocs -- it's really hard to review a
patch this size with so many new methods that don't have any javadocs at all.
Everytime I see a method mentioned in the new code, I look it up to jog my
memory about what exactly it does and i wind up needing to re-read/skim each
method over and over to remind myself what it does becaase they don't have any
javadocs. (I can't imagine how how any developer will ever be able to make
sense of any of this code down the road w/o more method level docs)
On the flip side: PivotFacetHelper.mergeQueryCounts _does_ have javadocs, but
it doesn't look like they match what the method does -- I'm pretty sure either
the javadocs are wrong, or the method has a bug in it and isn't working as
documented.
----
bq. I further refactored the code into into three separate classes to divide
the responsibilities clearly.
Cool ... This organization makes a lot more sense then previous patches.
----
bq. Fixed. I am not sure why that happened but once I refactored the code to do
all the parsing and calculation in prepare, I could not reproduce the issue.
Well, the best way to be sure it's fixed is to add a test showing that it works.
I've updated your patch to add a request like this to
DistributedFacetPivotLargeTest.
In order to get it to work, I had to also modify the
{{Map<String,RangeFacetRequest>}} created by FacetComponent.prepare to be a
LinkedHashMap (see more comments on this below) because otherwise it was a huge
pain in the ass to try and write code to do anything with teh response in a
clean way.
----
bq. Though I do not agree about the user counting on the same order because
this is not a list but a complex object in all our responses. But I digress.
I don't really understand the basis for your comment.
My argument is simple:
* Given:
** (top level) Range Facet reults have always been returned as a NamedList, in
the same order as specified by the facet.range params that requested them.
** This order is modeled in SolrJ's QueryResponse class as {{List<RangeFacet>
getFacetRanges()}}
* Therefore:
** We should be consistent and return Range Facet results with the same
predictible order when they are nested under pivots.
** This is already true in the SolrJ modeling in your patch, which adds
{{List<RangeFacet> getFacetRanges()}} to PivotField.
If nothing else, having the RangeFacets returned in the same order for both the
top level results and the per-pivot results is the only sane/simple way for
users to compute ratios between the top level range counts and the per pivot
value counts when dealing with many/variable ranges -- otherwise they have to
sort them themselves to corrolate.
----
As I mentioned above regarding my test additions, I fixed the consistent
ordering problem by changing the {{Map<String,RangeFacetRequest>}} created by
FacetComponent.prepare to also be a LinkedHashMap. But I still don't
understand the purpose of this being a Map (where the keys are the original
values of the facet.range params, not to be confused with having a Map keyed
off of the _tags_ associated with the list of RangeFacetRequests that use that
tag discussed below) at all.
As far as i can tell all usages of this {{Map<String,RangeFacetRequest>}}
either:
* iterate over every entry in the Map and only look at the values.
* iterate over every facet.range param string (in order), and use it to "get()"
the RangeFacetRequest object that corrisponds with that param string.
...wouldn't both of those usages be much simplier if instead of this Map there
was just a {{List<RangeFacetRequest>}} that could be iterated over when the
caller cares about looping over all RangeFacetRequest instances?
Is there some other purpose for having this {{Map<String,RangeFacetRequest>}}
that i'm not understanding?
(see below for comments about creating a reused
{{Map<String,List<RangeFacetRequest>>}} for when dealing with tags ... that's
one of the current "loop over all RangeFacetRequest" usages that seems wrong to
me anyway)
----
I think you should revive your earlier idea of having a "FacetContext" object
to put in the request context and use it to track the RangeFacetRequests and
FacetBase (FACET_QUERY) objects. (Either that, or just use the existing
FacetInfo class for this since that's pretty much it's current purpose. Is the
reason you didn't already do that because FacetInfo isn't currently in the
single node code path?)
Either way (with or w/o a new FacetContext class, with or w/o reusing
FacetInfo) the code to access the Map/List of all RangeFacetRequests and
FacetBase objects should really be encapsulated in some way -- if nothing else
to reduce the amount of casting when consumers have to pull them out of
req.getContext(...), but also to help future proof us against attempts to
access those structures before they've been initialized by
FacetComponent.prepare. (ie: throw IllegalAccessException if
{{!req.getContext().containsKey(WHATEVER_CONTEXT_KEY)}}
example...
{code}
class FacetComponent {
// ...
public static List<RangeFacetRequest>
getAllRangeFacetRequestsFromContext(SolrQueryRequest req) {
List<RangeFacetRequest> result = (List<RangeFacetRequest>)
req.getContext().get(MAGIC_CONTEXT_KEY);
if (null == result) {
throw new IllegalAccessException("RangeFacetRequests can't be accessed
from context they are initalized");
}
return result;
}
// etc...
{code}
...or better still, with the FacetContext container...
{code}
class FacetContext {
public static FacetContext getFacetContext(SolrQueryRequest req) {
FacetContext result = (FacetContext)
req.getContext().get(MAGIC_CONTEXT_KEY);
if (null == result) {
throw new IllegalAccessException("FacetContext can't be accessed before
it's initalized in request context");
}
return result;
}
private final List<RangeFacetRequest> allRangeFacets; // init in constructor
private final List<FacetBase> allQueryFacets; // init in constructor
public List<RangeFacetRequest> getAllRangeFacetRequests() { ... }
public List<FacetBase> getAllQueryFacets() { ... }
// // (see next comment about PivotFacetProcessor.getTaggedRangeFacets)
// private final Map<String,List<RangeFacetRequest>> taggedRangeFacets; //
init in constructor
// private final Map<String,List<FacetBase>> taggedQueryFacets; // init in
constructor
// public List<RangeFacetRequest> getRangeFacetRequestsForTag(String tag) {
... }
// public List<FacetBase> getAllQueryFacetsForTag(String tag) { ... }
}
{code}
----
PivotFacetProcessor.getTaggedRangeFacets (and getTaggedQueryFacets) seem really
inefficient to me...
* unless i'm missreading these, every facet.pivot param is going to cause the
the list(s) of all facet.range RangeFacetRequest objects (and all facet.query
FacetBase objects) to be iterated over to say "do you match this tag"
** in the case of facet.pivot refinement requests, that's going to mean a *lot*
of looping over the same lists over and over.
* why doesn't this follow the same pattern as
PivotFacetProcessor.getTaggedStatsFields (and StatsInfo.getStatsFieldsByTag)?
** when the RangeFacetRequest (or facet.query FacetBase objects) are
constructed, why isn't there a {{Map<String,List<RangeFacetRequest>>}} (and
{{Map<String,List<FacetBase>>}}) keyed off of the "tag" values returning each
each RangeFacetRequest (and each FacetBase) built so that getTaggedRangeFacets
and getTaggedQueryFacets can just be fast/simple lookups on these Maps instead
of constantly looping over the same lists repeatedly?
(this "tag->RangeFacetRequest" mapping could easily be accessed via the same
FacetContext mentioned above, see commented out part of that example code)
----
The way RangeFacetProcessor deals with FacetRangeOther seems really confusing
and sketchy. It took me a while to understand why those Exceptions were "safe
to ignore"
Rather then rely on "expected exceptions" when looping over the List of
FacetRanges, wouldn't the code be alot easier to understand if we changed
FacetRange to look something like...
{code}
/** null if this range doesn't corrispond to an {@link FacetRangeOther} */
public final FacetRange other;
// ...other existing vars...
private FacetRange(FacetRangeOther other, String name, ...) {
...
}
public FacetRange(String name,...) {
this(null, name, ...);
}
public FacetRange(FacetRangeOther other, ...) {
this(other, other.name, ...);
}
{code}
...And then simplified the response building RangeFacetProcessor to look
something like...
{code}
final NamedList<Object> res = new SimpleOrderedMap<>();
final NamedList<Integer> counts = new NamedList<>();
res.add("counts", counts); // built up below
// explicitly return the gap. compute this early so we are more
// likely to catch parse errors before attempting math
res.add("gap", rfr.getGapObj());
// explicitly return the start and end so all the counts
// (including before/after/between) are meaningful - even if mincount
// has removed the neighboring ranges
res.add("start", rfr.getStartObj());
res.add("end", rfr.getEndObj());
for (RangeFacetRequest.FacetRange range : rfr.getFacetRanges()) {
final int count = rangeCount(rfr, range);
if (null == range.getFacetRangeOther()) {
// normal range, added to count depending on mincount conditions...
if (count >= rfr.getMinCount()) {
counts.add(range.name, count);
}
} else {
// special 'other' count, added to higher level res
res.add(range.name, count);
}
}
return res;
{code}
?
If I'm missing something, and there's a reason the current dual loops over all
FacetRanges w/exception checking is better for some reason, then at a bare
minimum those empty catch blocks need to make it a lot more clear why they are
there and why the exceptions are there.
----
{code}
// TODO: slight optimization would prevent double-parsing of any localParams
Query qobj = QParser.getParser(parsed.facetValue, null, req).getQuery();
{code}
I added a nocommit here to ensure we don't forget to file a jira to track this.
It's not just an issue of localparams, but also the parsing & construction of
the Query object -- doing that once per request (just like the way you've
changed the range faceting to do all the bucket calculation once) could give us
some serious performance savings when facet.query params are hung of of many
and/or large pivot trees. (should just mean a pretty trivial new subclass of
FacetBase that also has a {{public Query getQuery()}} accessor)
----
RangeFacetRequest implementation weirdness...
* It's really weird that none of the public accessor methods (getStart, getEnd,
getGap, getGapObj, etc...) have any javadocs, even though many of the protected
variables they return _do_ have jdocs (and the info in those jdocs seems pretty
important to the method callers -- like the distinction betwen end vs endObj).
* The method signature & jdocs for createCalculator look dangerous -- it says
it will return a calculator based on the "rangeFacetRequest" argument, but it's
a non static method and in a couple places (schemaField, facetOn) it ignores
the "rangeFacetRequest" arg and uses properties of "this" (and in other places
vice vesa) -- suggesting either this method should be static so it doesn't
accidently use "this", or it should be changed to not take any arguments and
always refer to "this" (which seems to be the intention based on the place i
see this method being called currently.)
* There's a weird bit of "format and immediately reparse" logic in
RangeFacetRequest + RangeEndpointCalculator that makes no sense to me...
** RangeEndpointCalculator knows the type specific "end" after looping over the
ranges in {{computeRanges()}}
** RangeEndpointCalculator uses "formatValue(end)" to build "endStr"
** RangeEndpointCalculator.getComputedEnd() returns "endStr"
** RangeFacetRequest calls getComputedEnd() and then passes that String to
RangeEndpointCalculator.getValue(String) to parse it back into a type specific
object
** why not just have a {{public T getComputedEnd()}} method return the original
(type specific) "end" object the calculator already figured out?
* likewise: the type specific start value...
** RangeEndpointCalculator.computeRanges() already figures out {{final T start
= getValue(rfr.getStart());}}
** so why make RangeFacetRequest ask the calculator to reparse the original
start string with {{calculator.getValue(start);}} ?
** why not just have a {{public T getStart()}} method on the calculator to
return that value it already parsed?
* likewise: gap
** should change RangeEndpointCalculator.getGap(String) to just getGap()
** won't save us any parsing round trips like the start / end changes suggested
above, but will keep the API consistent
** for the same reason the old code had this comment...{code}
// explicitly return the gap. compute this early so we are more
// likely to catch parse errors before attempting math
{code}...RangeEndpointCalculator should call & remember the value of parseGap()
early in it's constructor, and then getGap() can just be a trivial accessor
method.
----
PivotFacetProcessor.addPivotQueriesAndRanges concerns...
* this method was doing a lot of null checking against the facetQueries and
facetRanges Lists -- even though there was never any possibility of those lists
being null
* (partly because of these null checks that always passed) things like
SimpleFacets instances and RangeFacetProcessors were getting constructed for
every Pivot value (even on refinement) even when there are no ranges or facet
queries hanging off of a pivot.
So in the updated patch I also cleaned this method up a bit to assert the lists
are non null and keep the logicc minimal when they are empty. (had to try it
myself to be certain i wans't missing anything)
I also tightened up the error handling to be more specific about where the
SyntaxErrors are coming from.
> Let facet queries hang off of pivots
> ------------------------------------
>
> Key: SOLR-4212
> URL: https://issues.apache.org/jira/browse/SOLR-4212
> Project: Solr
> Issue Type: Sub-task
> Components: search
> Affects Versions: 4.0
> Reporter: Steve Molloy
> Assignee: Shalin Shekhar Mangar
> Fix For: 5.3, Trunk
>
> Attachments: SOLR-4212-multiple-q.patch, SOLR-4212-multiple-q.patch,
> SOLR-4212.patch, SOLR-4212.patch, SOLR-4212.patch, SOLR-4212.patch,
> SOLR-4212.patch, SOLR-4212.patch, SOLR-6353-4212.patch, SOLR-6353-4212.patch,
> SOLR-6353-6686-4212.patch, SOLR-6353-6686-4212.patch,
> SOLR-6353-6686-4212.patch, SOLR-6353-6686-4212.patch,
> SOLR-6353-6686-4212.patch, patch-4212.txt
>
>
> Facet pivot provide hierarchical support for computing data used to populate
> a treemap or similar visualization. TreeMaps usually offer users extra
> information by applying an overlay color on top of the existing square sizes
> based on hierarchical counts. This second count is based on user choices,
> representing, usually with gradient, the proportion of the square that fits
> the user's choices.
> The proposition is to use local parameters to specify facet query to apply
> for pivot which matches a tag set on facet query. Parameter format would look
> like:
> facet.pivot={!query=r1}category,manufacturer
> facet.query={!tag=r1}somequery
> facet.query={!tag=r1}somedate:[NOW-1YEAR TO NOW]
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]