The confusing thing about the block mask is that it actually defines the set of documents that are "not children", as opposed to the set that "are parents". That distinction matters when you have more than two levels and want to treat a middle level as the parent of the lower levels, and also when the index mixes hierarchical documents with other, non-block-join docs. In both of those cases the block mask must match everything above the level that is to be treated as children, and it must also match any non-hierarchical documents. It needs to mask (hide) anything that is not a child.
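To make that concrete, here is a toy model in Python. It is only a sketch, not Solr or Lucene code: the function name `to_parent` and its parameters are made up for illustration, and the document order is an assumption (p1 block first, then the t2 block, which would at least be consistent with the "range from 0 to 3" in the debug output quoted below). The only behaviour it models is that the join walks from a matching child to the next document matched by the mask, whatever that document is.

```python
# Toy model of a "to parent" block join; illustrative only, not Solr/Lucene code.

# ASSUMPTION: index order with the p1 block before the t2 block.
# Within a block the children come first and the parent closes the block.
DOCS = [
    {"doc_id": "/p1/chld/1/", "doc_type": "chld", "chld_name": "ABC"},
    {"doc_id": "/p1/chld/2/", "doc_type": "chld", "chld_name": "DEF"},
    {"doc_id": "/p1/1/", "doc_type": "p1"},
    {"doc_id": "/t2/chld/1/", "doc_type": "chld", "chld_name": "DEF"},
    {"doc_id": "/t2/1/", "doc_type": "t2"},
]


def to_parent(docs, mask_types, child_name, parent_type=None):
    """Return the doc_ids of the parents reached from matching children.

    mask_types  -- doc_type values matched by the block mask (`which`)
    child_name  -- chld_name value the child query looks for
    parent_type -- optional parent-level filter (plays the role of `fq`)
    """
    mask = [d["doc_type"] in mask_types for d in docs]
    hits = []
    for i, doc in enumerate(docs):
        if mask[i] or doc.get("chld_name") != child_name:
            continue
        # The join only looks for the next masked doc after the child.
        # In this model nothing checks that the two docs share a _root_.
        j = next((k for k in range(i + 1, len(docs)) if mask[k]), None)
        if j is None:
            continue
        parent = docs[j]
        if parent_type is not None and parent["doc_type"] != parent_type:
            continue
        if parent["doc_id"] not in hits:
            hits.append(parent["doc_id"])
    return hits


# Mask matches only t2, so p1 is "not masked" and its block runs into t2's.
print(to_parent(DOCS, {"t2"}, "ABC"))                          # ['/t2/1/']
# Mask matches every parent type; a separate filter then picks t2.
print(to_parent(DOCS, {"t2", "p1"}, "ABC", parent_type="t2"))  # []
```

The second call corresponds to the shape Mikhail suggests below: put every parent type in `which` and narrow down to the one you want with an `fq` on the parent level.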
On Fri, Apr 29, 2022 at 8:57 AM Mikhail Khludnev <m...@apache.org> wrote:
> Hello, James.
>
> Excuse me if I didn't fully get all points of your inquiry.
> As I grasped the challenge. One can not filter/select certain parents
> (types) with `which` param, because block join is a plain nextBitSet() over
> dense ordinals.
> So, parents bitset should include all parents - disjunct all parent types,
> and then, a parent level filter should select a certain parent type.
> q={!parent which=$dads}chld_name:ABC&dads=doc_type:(t2 p2)&fq=doc_type:t2
> It should be explained somewhere around
> https://solr.apache.org/guide/8_8/other-parsers.html#block-mask pls let me
> know if we can add some more caveats there covering your case.
>
> Have a good join!
>
> On Thu, Apr 28, 2022 at 5:43 PM James Greene <ja...@jamesaustingreene.com>
> wrote:
>
> > My team is in the process of moving from solr 6.6 to 8.11.1 and have
> > noticed some weirdness (wrong parent docs in result) when using the
> > {!parent blockjoin query parser. We have multiple 'root' entities
> > configured in DIH and i'm wondering if this could be a causation or if
> > there is a bug at play with the blockjoin. Any more info on how to
> > diagnose the issue is appreciated!
> >
> > -----------------------------------
> > Example data:
> >
> > [
> >   {
> >     "_root_": "/t2/1/",
> >     "doc_id": "/t2/1/",
> >     "doc_type": "t2",
> >     "t2_id": 1,
> >     "chldrn": [
> >       {
> >         "_root_": "/t2/1/",
> >         "_nest_path_": "/chldrn#1",
> >         "doc_id": "/t2/chld/1/",
> >         "doc_type": "chld",
> >         "chld_name": "DEF",
> >         "chld_t2_id": 1
> >       }
> >     ]
> >   },
> >   {
> >     "_root_": "/p1/1/",
> >     "doc_id": "/p1/1/",
> >     "doc_type": "p1",
> >     "p1_id": 1,
> >     "chldrn": [
> >       {
> >         "_root_": "/p1/1/",
> >         "_nest_path_": "/chldrn#1",
> >         "doc_id": "/p1/chld/1/",
> >         "doc_type": "chld",
> >         "chld_name": "ABC",
> >         "chld_p1_id": 1
> >       },
> >       {
> >         "_root_": "/p1/1/",
> >         "_nest_path_": "/chldrn#2",
> >         "doc_id": "/p1/chld/2/",
> >         "doc_type": "chld",
> >         "chld_name": "DEF",
> >         "chld_p1_id": 1
> >       }
> >     ]
> >   }
> > ]
> >
> >
> > -----------------------------------
> > Queries giving the wrong result:
> >
> > q={!parent which=doc_type:t2}chld_name:ABC
> >
> > q={!parent which=doc_type:t2}(doc_type:chld AND chld_name:ABC)
> >
> > q={!parent which=doc_type:t2 v=$qq}chld_name:ABC
> > ?qq=doc_type:chld
> >
> >
> > -----------------------------------
> > I found an old thread saying that child docs shouldn't have the same
> > field name as the parent doc (even with different values) here:
> >
> > https://stackoverflow.com/questions/36602638/solr-returning-incorrect-results-when-filtering-child-docuements
> >
> > But I got the same results when trying to filter by children using a
> > different field:
> >
> > q={!parent which=doc_type:t2}(_nest_path_:/chldrn AND chld_name:ABC)
> >
> > I would expect there would be no match since the parent (doc_type:t2)
> > does not have a child (chld_name:ABC) but I'm actually getting t2 in the
> > result:
> > [
> >   {
> >     "_root_": "/t2/1/",
> >     "doc_id": "/t2/1/",
> >     "doc_type": "t2",
> >     "t2_id": 1,
> >     "chldrn": [
> >       {
> >         "_root_": "/t2/1/",
> >         "_nest_path_": "/chldrn#1",
> >         "doc_id": "/t2/chld/1/",
> >         "doc_type": "chld",
> >         "chld_name": "DEF",
> >         "chld_t2_id": 1
> >       }
> >     ]
> >   }
> > ]
> >
> > -----------------------------------
> > Debug for query returning the wrong document when 0 docs are expected:
> >
> > "debug":{
> >   "rawquerystring":"{!parent which=doc_type:t2}chld_name:ABC",
> >   "querystring":"{!parent which=doc_type:t2}chld_name:ABC",
> >   "parsedquery":"AllParentsAware(ToParentBlockJoinQuery (+chld_name:abc))",
> >   "parsedquery_toString":"ToParentBlockJoinQuery (+chld_name:abc)",
> >   "explain":{
> >     "/t2/1/":"\n0.0 = Score based on 1 child docs in range from 0 to 3, best match:\n 0.0 = ConstantScore(chld_name:abc)^0.0\n"},
> >   "QParser":"BlockJoinParentQParser",
> >   ...
> > }
> >
> >
> > -----------------------------------
> > If I query using a different parent doc_type (doc_type:p1) and child name
> > (chld_name:DEF) I get the expected result (0 docs returned) using query:
> >
> > q={!parent which=doc_type:p1}chld_name:DEF
> >
> >
> > -----------------------------------
> > If I query using a different parent doc_type (doc_type:p1) and child name
> > (chld_name:ABC) I get the expected result (1 doc returned) using query:
> >
> > q={!parent which=doc_type:p1}chld_name:ABC
> >
> > ^^Debug query of getting expected 1 doc back (docs in range is 2 to 3 but
> > yet the original problematic query has 0 to 3 whatever that means):
> > "debug":{
> >   "rawquerystring":"{!parent which=doc_type:p1}chld_name:ABC",
> >   "querystring":"{!parent which=doc_type:p1}chld_name:ABC",
> >   "parsedquery":"AllParentsAware(ToParentBlockJoinQuery (+chld_name:abc))",
> >   "parsedquery_toString":"ToParentBlockJoinQuery (+chld_name:abc)",
> >   "explain":{
> >     "/t2/1/":"\n0.0 = Score based on 2 child docs in range from 2 to 3, best match:\n 0.0 = ConstantScore(chld_name:abc)^0.0\n"},
> >   "QParser":"BlockJoinParentQParser",
> >   ...
> > }
> >
> >
> > -----------------------------------
> > I have a 'work around' which seems to do the trick but it feels hacky and
> > I wonder if having to qualify the child docs more will affect query
> > performance. If I further qualify the child doc using a field that
> > doesn't exist in the other child docs I get the expected (0 matches)
> > result with query:
> >
> > q={!parent which=doc_type:t2}(chld_name:ABC AND chld_t2_id:*)
> >
> >
> > -----------------------------------
> > What's also interesting is that if I remove the child doc
> > {"doc_id":"/p1/chld/1/","chld_name":"ABC"} of parent
> > {"doc_id":"/p1/1/","doc_type":"p1"} out of the index so that my
> > collection has:
> >
> > [
> >   {
> >     "_root_": "/t2/1/",
> >     "doc_id": "/t2/1/",
> >     "doc_type": "t2",
> >     "t2_id": 1,
> >     "chldrn": [
> >       {
> >         "_root_": "/t2/1/",
> >         "_nest_path_": "/chldrn#1",
> >         "doc_id": "/t2/chld/1/",
> >         "doc_type": "chld",
> >         "chld_name": "DEF",
> >         "chld_t2_id": 1
> >       }
> >     ]
> >   },
> >   {
> >     "_root_": "/p1/1/",
> >     "doc_id": "/p1/1/",
> >     "doc_type": "p1",
> >     "p1_id": 1,
> >     "chldrn": [
> >       {
> >         "_root_": "/p1/1/",
> >         "_nest_path_": "/chldrn#2",
> >         "doc_id": "/p1/chld/2/",
> >         "doc_type": "chld",
> >         "chld_name": "DEF",
> >         "chld_p1_id": 1
> >       }
> >     ]
> >   }
> > ]
> >
> > I get the expected results (no matches found) when I use the query:
> >
> > q={!parent which=doc_type:t2}chld_name:ABC
> >
> >
> > -----------------------------------
> > Other Notes:
> >
> > - I've blown away and recreated the index multiple times (always using
> >   DIH to re-import that data) which should rule out an anomaly with index
> >   linking/block merge.
> > - SolrCloud mode is not being used.
> > - I have <uniqueKey>doc_id</uniqueKey> in managed-schema and have no docs
> >   with duplicate doc_id in the index (sample config below).
> > - I have _root_ as indexed only (changed it to stored=true for debugging
> >   but the issue remains).
> > - We use the DIH (data import handler) to import the data (sample config
> >   below).
> > - The 't2' doc_type appears as first entity in the DIH so I *think* it's
> >   the doc that gets indexed first during the DIH full import (may be
> >   relevant in identifying a bug with block join/indexing?).
> >
> >
> > -----------------------------------
> > Relevant entries in managed-schema:
> >
> > <uniqueKey>doc_id</uniqueKey>
> > ...
> > <fieldType name="nest_path" class="solr.NestPathField" stored="false" />
> > <fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
> >   <analyzer>
> >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> >     <filter class="solr.LengthFilterFactory" min="1" max="32766"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >   </analyzer>
> > </fieldType>
> > <fieldType name="plong" class="solr.LongPointField" docValues="true" stored="false"/>
> > <fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true" stored="false"/>
> > ...
> > <field name="_root_" type="string" docValues="false"/>
> > <field name="_nest_path_" type="nest_path"/>
> > <field name="_version_" type="plong" indexed="false"/>
> > ...
> > <field name="doc_id" type="string" stored="true" docValues="false"/>
> > <field name="doc_type" type="string"/>
> > <field name="chld_name" type="lowercase" stored="true" docValues="false"/>
> > ...
> > <dynamicField name="*_id" type="plong"/>
> >
> >
> > -----------------------------------
> > Relevant entries in data-config.xml:
> >
> > <?xml version="1.0"?>
> > <dataConfig>
> >   <dataSource name="mariadb" driver="org.mariadb.jdbc.Driver"
> >               batchSize="-1"
> >               url="jdbc:mysql://host:3306/db?sessionVariables=net_write_timeout=3600"
> >               user="" password="" />
> >   <document>
> >     <entity dataSource="mariadb" pk="id" name="t2"
> >             deletedPkQuery="select concat('/t2/',`id`,'/') as id from `t2` where `deleted_at` >= convert_tz('${dataimporter.last_index_time}', '+00:00', @@global.time_zone)"
> >             query="select concat('/t2/',`id`,'/') as `doc_id`, 't2' as `doc_type`, `id` as `t2_id` where `deleted_at` is null"
> >             deltaImportQuery="select concat('/t2/',`id`,'/') as `doc_id`, 't2' as `doc_type`, `id` as `t2_id` where `deleted_at` is null and `id` = '${dataimporter.delta.id}'"
> >             deltaQuery="select `id` from `t2` where `updated_at` > convert_tz('${dataimporter.last_index_time}', '+00:00', @@global.time_zone)">
> >       <entity name="chldrn" child="true"
> >               query="select concat('/t2/chld/',`id`,'/') as `doc_id`, 'chld' as `doc_type`, concat('/chldrn#',`id`) as `_nest_path_`, `name` as `chld_name`, `t2_id` as `chld_t2_id` where `t2_id` = ${t2.t2_id} and `deleted_at` is null" />
> >     </entity>
> >     <entity dataSource="mariadb" pk="id" name="p1"
> >             deletedPkQuery="select concat('/p1/',`id`,'/') as `id` from `p1` where `deleted_at` >= convert_tz('${dataimporter.last_index_time}', '+00:00', @@global.time_zone)"
> >             query="select concat('/p1/',`id`,'/') as `doc_id`, 'p1' as `doc_type`, `id` as `p1_id` where `deleted_at` is null"
> >             deltaImportQuery="select concat('/p1/',`id`,'/') as `doc_id`, 'p1' as `doc_type`, `id` as `p1_id` where `deleted_at` is null and `id` = '${dataimporter.delta.id}'"
> >             deltaQuery="select `id` from `p1` where `updated_at` > convert_tz('${dataimporter.last_index_time}', '+00:00', @@global.time_zone)">
> >       <entity name="chldrn" child="true"
> >               query="select concat('/p1/chld/',`id`,'/') as `doc_id`, 'chld' as `doc_type`, concat('/chldrn#',`id`) as `_nest_path_`, `name` as `chld_name`, `p1_id` as `chld_p1_id` where `p1_id` = ${p1.p1_id} and `deleted_at` is null" />
> >     </entity>
> >   </document>
> > </dataConfig>
> >
> >
>
> --
> Sincerely yours
> Mikhail Khludnev
>

--
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)