Of course it's possible. The block mask ("Not Children") just has to include *:* -_nest_path_:* (all documents without a _nest_path_) so that it also catches non-block-join docs, as well as a filter for the levels in the hierarchy that are to be considered "parents" (not children).
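A minimal sketch of that request form, in Python purely for illustration. The helper name `build_parent_query` and the `parents` request-parameter name are made up for this example, `doc_type:t2` and `chld_name:ABC` are borrowed from the sample data quoted further down, and it assumes `_nest_path_` is defined in the schema:

```python
from urllib.parse import urlencode

# Matches every document that is NOT a nested child (assumes _nest_path_ exists).
ALL_NON_CHILDREN = "*:* -_nest_path_:*"


def build_parent_query(child_query, parent_filter, block_mask=ALL_NON_CHILDREN):
    """Build Solr request params for a block join {!parent} query.

    block_mask    -- must match all non-child docs, not just one parent type
    parent_filter -- narrows the returned parents afterwards, sent as fq
    """
    return {
        # $parents is dereferenced by Solr from the request param of that name
        "q": "{!parent which=$parents}" + child_query,
        "parents": block_mask,
        "fq": parent_filter,
    }


params = build_parent_query("chld_name:ABC", "doc_type:t2")
query_string = urlencode(params)  # ready to append to /select?
print(query_string)
```

Keeping the block mask in its own parameter means only the child query and the parent-level fq change from one query to the next.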
If you don't define _nest_path_ then you have to plan even more carefully and supply some other custom way to distinguish parents from children, which may be hard, so I'd definitely recommend using _nest_path_ unless you have a very good reason not to. It has to be possible, since even a pure parent/child data set can contain parents with no children.

On Fri, Apr 29, 2022 at 1:21 PM Mikhail Khludnev <m...@apache.org> wrote:

> Hello, Gus.
>
> On Fri, Apr 29, 2022 at 6:55 PM Gus Heck <gus.h...@gmail.com> wrote:
>
> > Also if you have an index with a mixture of
> > hierarchical documents and other non block/join docs.
> >
> Such a mix is not an option; it should be declared somewhere. Any
> standalone docs should be marked as parents.
>
> >
> > On Fri, Apr 29, 2022 at 8:57 AM Mikhail Khludnev <m...@apache.org>
> > wrote:
> >
> > > Hello, James.
> > >
> > > Excuse me if I didn't fully get all points of your inquiry.
> > > As I understand the challenge: one cannot filter/select certain
> > > parents (types) with the `which` param, because block join is a
> > > plain nextSetBit() over dense ordinals.
> > > So the parents bitset should include all parents (a disjunction of
> > > all parent types), and then a parent-level filter should select a
> > > certain parent type.
> > > q={!parent which=$dads}chld_name:ABC&dads=doc_type:(t2 p1)&fq=doc_type:t2
> > > It should be explained somewhere around
> > > https://solr.apache.org/guide/8_8/other-parsers.html#block-mask
> > > Please let me know if we can add some more caveats there covering
> > > your case.
> > >
> > > Have a good join!
> > >
> > > On Thu, Apr 28, 2022 at 5:43 PM James Greene
> > > <ja...@jamesaustingreene.com> wrote:
> > >
> > > > My team is in the process of moving from Solr 6.6 to 8.11.1 and
> > > > has noticed some weirdness (wrong parent docs in the result) when
> > > > using the {!parent} block join query parser.
> > > > We have multiple 'root' entities configured in DIH and I'm
> > > > wondering if this could be the cause or if there is a bug at play
> > > > with the block join. Any more info on how to diagnose the issue
> > > > is appreciated!
> > > >
> > > > -----------------------------------
> > > > Example data:
> > > >
> > > > [
> > > >   {
> > > >     "_root_": "/t2/1/",
> > > >     "doc_id": "/t2/1/",
> > > >     "doc_type": "t2",
> > > >     "t2_id":1,
> > > >     "chldrn": [
> > > >       {
> > > >         "_root_": "/t2/1/",
> > > >         "_nest_path_": "/chldrn#1",
> > > >         "doc_id": "/t2/chld/1/",
> > > >         "doc_type": "chld",
> > > >         "chld_name": "DEF",
> > > >         "chld_t2_id":1
> > > >       }
> > > >     ]
> > > >   },
> > > >   {
> > > >     "_root_": "/p1/1/",
> > > >     "doc_id": "/p1/1/",
> > > >     "doc_type": "p1",
> > > >     "p1_id":1,
> > > >     "chldrn": [
> > > >       {
> > > >         "_root_": "/p1/1/",
> > > >         "_nest_path_": "/chldrn#1",
> > > >         "doc_id": "/p1/chld/1/",
> > > >         "doc_type": "chld",
> > > >         "chld_name": "ABC",
> > > >         "chld_p1_id":1
> > > >       },
> > > >       {
> > > >         "_root_": "/p1/1/",
> > > >         "_nest_path_": "/chldrn#2",
> > > >         "doc_id": "/p1/chld/2/",
> > > >         "doc_type": "chld",
> > > >         "chld_name": "DEF",
> > > >         "chld_p1_id": 1
> > > >       }
> > > >     ]
> > > >   }
> > > > ]
> > > >
> > > > -----------------------------------
> > > > Queries giving the wrong result:
> > > >
> > > > q={!parent which=doc_type:t2}chld_name:ABC
> > > >
> > > > q={!parent which=doc_type:t2}(doc_type:chld AND chld_name:ABC)
> > > >
> > > > q={!parent which=doc_type:t2 v=$qq}chld_name:ABC
> > > > ?qq=doc_type:chld
> > > >
> > > > -----------------------------------
> > > > I found an old thread saying that child docs shouldn't have the
> > > > same field name as the parent doc (even with different values)
> > > > here:
> > > >
> > > > https://stackoverflow.com/questions/36602638/solr-returning-incorrect-results-when-filtering-child-docuements
> > > >
> > > > But I got the same results when trying to filter by children
> > > > using a different field:
> > > >
> > > > q={!parent which=doc_type:t2}(_nest_path_:/chldrn AND chld_name:ABC)
> > > >
> > > > I would expect there to be no match, since the parent
> > > > (doc_type:t2) does not have a child (chld_name:ABC), but I'm
> > > > actually getting t2 in the result:
> > > >
> > > > [
> > > >   {
> > > >     "_root_": "/t2/1/",
> > > >     "doc_id": "/t2/1/",
> > > >     "doc_type": "t2",
> > > >     "t2_id":1,
> > > >     "chldrn": [
> > > >       {
> > > >         "_root_": "/t2/1/",
> > > >         "_nest_path_": "/chldrn#1",
> > > >         "doc_id": "/t2/chld/1/",
> > > >         "doc_type": "chld",
> > > >         "chld_name": "DEF",
> > > >         "chld_t2_id":1
> > > >       }
> > > >     ]
> > > >   }
> > > > ]
> > > >
> > > > -----------------------------------
> > > > Debug for the query returning the wrong document when 0 docs are
> > > > expected:
> > > >
> > > > "debug":{
> > > >   "rawquerystring":"{!parent which=doc_type:t2}chld_name:ABC",
> > > >   "querystring":"{!parent which=doc_type:t2}chld_name:ABC",
> > > >   "parsedquery":"AllParentsAware(ToParentBlockJoinQuery (+chld_name:abc))",
> > > >   "parsedquery_toString":"ToParentBlockJoinQuery (+chld_name:abc)",
> > > >   "explain":{
> > > >     "/t2/1/":"\n0.0 = Score based on 1 child docs in range from 0 to 3, best match:\n  0.0 = ConstantScore(chld_name:abc)^0.0\n"},
> > > >   "QParser":"BlockJoinParentQParser",
> > > >   ...
> > > > }
> > > >
> > > > -----------------------------------
> > > > If I query using a different parent doc_type (doc_type:p1) and
> > > > child name (chld_name:DEF) I get the expected result (0 docs
> > > > returned) using query:
> > > >
> > > > q={!parent which=doc_type:p1}chld_name:DEF
> > > >
> > > > -----------------------------------
> > > > If I query using a different parent doc_type (doc_type:p1) and
> > > > child name (chld_name:ABC) I get the expected result (1 doc
> > > > returned) using query:
> > > >
> > > > q={!parent which=doc_type:p1}chld_name:ABC
> > > >
> > > > ^^ Debug output for the query that returns the expected 1 doc
> > > > (the docs in range are 2 to 3, yet the original problematic query
> > > > has 0 to 3, whatever that means):
> > > >
> > > > "debug":{
> > > >   "rawquerystring":"{!parent which=doc_type:p1}chld_name:ABC",
> > > >   "querystring":"{!parent which=doc_type:p1}chld_name:ABC",
> > > >   "parsedquery":"AllParentsAware(ToParentBlockJoinQuery (+chld_name:abc))",
> > > >   "parsedquery_toString":"ToParentBlockJoinQuery (+chld_name:abc)",
> > > >   "explain":{
> > > >     "/t2/1/":"\n0.0 = Score based on 2 child docs in range from 2 to 3, best match:\n  0.0 = ConstantScore(chld_name:abc)^0.0\n"},
> > > >   "QParser":"BlockJoinParentQParser",
> > > >   ...
> > > > }
> > > >
> > > > -----------------------------------
> > > > I have a workaround which seems to do the trick, but it feels
> > > > hacky and I wonder if having to qualify the child docs more will
> > > > affect query performance.
> > > > If I further qualify the child doc using a field that doesn't
> > > > exist in the other child docs I get the expected (0 matches)
> > > > result with query:
> > > >
> > > > q={!parent which=doc_type:t2}(chld_name:ABC AND chld_t2_id:*)
> > > >
> > > > -----------------------------------
> > > > What's also interesting is that if I remove the child doc
> > > > {"doc_id":"/p1/chld/1/","chld_name":"ABC"} of parent
> > > > {"doc_id":"/p1/1/","doc_type":"p1"} from the index so that my
> > > > collection has:
> > > >
> > > > [
> > > >   {
> > > >     "_root_": "/t2/1/",
> > > >     "doc_id": "/t2/1/",
> > > >     "doc_type": "t2",
> > > >     "t2_id":1,
> > > >     "chldrn": [
> > > >       {
> > > >         "_root_": "/t2/1/",
> > > >         "_nest_path_": "/chldrn#1",
> > > >         "doc_id": "/t2/chld/1/",
> > > >         "doc_type": "chld",
> > > >         "chld_name": "DEF",
> > > >         "chld_t2_id":1
> > > >       }
> > > >     ]
> > > >   },
> > > >   {
> > > >     "_root_": "/p1/1/",
> > > >     "doc_id": "/p1/1/",
> > > >     "doc_type": "p1",
> > > >     "p1_id":1,
> > > >     "chldrn": [
> > > >       {
> > > >         "_root_": "/p1/1/",
> > > >         "_nest_path_": "/chldrn#2",
> > > >         "doc_id": "/p1/chld/2/",
> > > >         "doc_type": "chld",
> > > >         "chld_name": "DEF",
> > > >         "chld_p1_id": 1
> > > >       }
> > > >     ]
> > > >   }
> > > > ]
> > > >
> > > > I get the expected results (no matches found) when I use the
> > > > query:
> > > >
> > > > q={!parent which=doc_type:t2}chld_name:ABC
> > > >
> > > > -----------------------------------
> > > > Other notes:
> > > >
> > > > - I've blown away and recreated the index multiple times (always
> > > >   using DIH to re-import the data), which should rule out an
> > > >   anomaly with index linking/block merge.
> > > > - SolrCloud mode is not being used.
> > > > - I have <uniqueKey>doc_id</uniqueKey> in managed-schema and have
> > > >   no docs with a duplicate doc_id in the index (sample config
> > > >   below).
> > > > - I have _root_ as indexed only (changed it to stored=true for
> > > >   debugging but the issue remains).
> > > > - We use the DIH (data import handler) to import the data (sample
> > > >   config below).
> > > > - The 't2' doc_type appears as the first entity in the DIH so I
> > > >   *think* it's the doc that gets indexed first during the DIH full
> > > >   import (may be relevant in identifying a bug with block
> > > >   join/indexing?).
> > > >
> > > > -----------------------------------
> > > > Relevant entries in managed-schema:
> > > >
> > > > <uniqueKey>doc_id</uniqueKey>
> > > > ...
> > > > <fieldType name="nest_path" class="solr.NestPathField" stored="false" />
> > > > <fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
> > > >   <analyzer>
> > > >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> > > >     <filter class="solr.LengthFilterFactory" min="1" max="32766"/>
> > > >     <filter class="solr.LowerCaseFilterFactory"/>
> > > >   </analyzer>
> > > > </fieldType>
> > > > <fieldType name="plong" class="solr.LongPointField" docValues="true" stored="false"/>
> > > > <fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true" stored="false"/>
> > > > ...
> > > > <field name="_root_" type="string" docValues="false"/>
> > > > <field name="_nest_path_" type="nest_path"/>
> > > > <field name="_version_" type="plong" indexed="false"/>
> > > > ...
> > > > <field name="doc_id" type="string" stored="true" docValues="false"/>
> > > > <field name="doc_type" type="string"/>
> > > > <field name="chld_name" type="lowercase" stored="true" docValues="false"/>
> > > > ...
> > > > <dynamicField name="*_id" type="plong"/>
> > > >
> > > > -----------------------------------
> > > > Relevant entries in data-config.xml:
> > > >
> > > > <?xml version="1.0"?>
> > > > <dataConfig>
> > > >   <dataSource name="mariadb" driver="org.mariadb.jdbc.Driver"
> > > >       batchSize="-1"
> > > >       url="jdbc:mysql://host:3306/db?sessionVariables=net_write_timeout=3600"
> > > >       user="" password="" />
> > > >   <document>
> > > >     <entity dataSource="mariadb" pk="id" name="t2"
> > > >         deletedPkQuery="select concat('/t2/',`id`,'/') as id from `t2` where `deleted_at` >= convert_tz('${dataimporter.last_index_time}', '+00:00', @@global.time_zone)"
> > > >         query="select concat('/t2/',`id`,'/') as `doc_id`, 't2' as `doc_type`, `id` as `t2_id` where `deleted_at`is null"
> > > >         deltaImportQuery="select concat('/t2/',`id`,'/') as `doc_id`, 't2' as `doc_type`, `id` as `t2_id` where `deleted_at` is null and `id` = '${dataimporter.delta.id}'"
> > > >         deltaQuery="select `id` from `t2` where `updated_at` > convert_tz('${dataimporter.last_index_time}', '+00:00', @@global.time_zone)">
> > > >       <entity name="chldrn" child="true"
> > > >           query="select concat('/t2/chld/',`id`,'/') as `doc_id`, 'chld' as `doc_type`, concat('/chldrn#',`id`) as `_nest_path_`, `name` as `chld_name`, `t2_id` as `chld_t2_id` where `t2_id` = ${t2.t2_id} and `deleted_at` is null" />
> > > >     </entity>
> > > >     <entity dataSource="mariadb" pk="id" name="p1"
> > > >         deletedPkQuery="select concat('/p1/',`id`,'/') as `id` from `p1` where `deleted_at` >= convert_tz('${dataimporter.last_index_time}', '+00:00', @@global.time_zone)"
> > > >         query="select concat('/p1/',`id`,'/') as `doc_id`, 'p1' as `doc_type`, `id` as `p1_id` where `deleted_at`is null"
> > > >         deltaImportQuery="select concat('/p1/',`id`,'/') as `doc_id`, 'p1' as `doc_type`, `id` as `p1_id` where `deleted_at` is null and `id` = '${dataimporter.delta.id}'"
> > > >         deltaQuery="select `id` from `p1` where `updated_at` > convert_tz('${dataimporter.last_index_time}', '+00:00', @@global.time_zone)">
> > > >       <entity name="chldrn" child="true"
> > > >           query="select concat('/p1/chld/',`id`,'/') as `doc_id`, 'chld' as `doc_type`, concat('/chldrn#',`id`) as `_nest_path_`, `name` as `chld_name`, `p1_id` as `chld_p1_id` where `p1_id` = ${p1.p1_id} and `deleted_at` is null" />
> > > >     </entity>
> > > >   </document>
> > > > </dataConfig>
> > > >
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> > >
> >
> > --
> > http://www.needhamsoftware.com (work)
> > http://www.the111shift.com (play)
> >
>
> --
> Sincerely yours
> Mikhail Khludnev
>

--
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)
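
P.S. For anyone finding this thread later, here is a toy Python model of the "next set bit" lookup described in the quoted replies. It is a simplification and not Lucene's actual code: the document order is an assumption (children indexed before their parent, with the p1 block landing before the t2 block), `to_parent` is a made-up helper, and only a few fields from the sample data are kept.

```python
from bisect import bisect_left

# Assumed index order: each block is children first, then the parent.
DOCS = [
    {"doc_id": "/p1/chld/1/", "doc_type": "chld", "chld_name": "ABC"},
    {"doc_id": "/p1/chld/2/", "doc_type": "chld", "chld_name": "DEF"},
    {"doc_id": "/p1/1/", "doc_type": "p1"},
    {"doc_id": "/t2/chld/1/", "doc_type": "chld", "chld_name": "DEF"},
    {"doc_id": "/t2/1/", "doc_type": "t2"},
]


def to_parent(docs, is_parent, child_matches):
    """Attribute each matching child to the next doc flagged by the mask."""
    parent_ids = [i for i, d in enumerate(docs) if is_parent(d)]  # ascending
    hits = []
    for i, d in enumerate(docs):
        if is_parent(d) or not child_matches(d):
            continue
        pos = bisect_left(parent_ids, i)  # stand-in for a nextSetBit(i) call
        if pos < len(parent_ids) and docs[parent_ids[pos]] not in hits:
            hits.append(docs[parent_ids[pos]])
    return hits


def child_abc(d):
    return d.get("chld_name") == "ABC"


# Mask with only one parent type: p1's ABC child is attributed to t2.
partial = to_parent(DOCS, lambda d: d["doc_type"] == "t2", child_abc)

# Mask with every non-child doc, then a parent-level filter (the fq step).
full = to_parent(DOCS, lambda d: d["doc_type"] != "chld", child_abc)
filtered = [d for d in full if d["doc_type"] == "t2"]

print([d["doc_id"] for d in partial])   # ['/t2/1/']  (the wrong parent)
print([d["doc_id"] for d in full])      # ['/p1/1/']
print([d["doc_id"] for d in filtered])  # []
```

With the partial mask nothing marks where the p1 block ends, so the lookup runs past it to the next flagged doc. With the full mask the child resolves to its real parent, and the parent-level filter then correctly leaves no t2 match.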