Of course it's possible. The block mask ("Not Children") just has to include
*:* -_nest_path_:* (all documents without a _nest_path_) to catch
non-block join docs, as well as a filter for the levels in the hierarchy
that are to be considered "parents" (not children).

If you don't define _nest_path_ then you have to plan even more carefully
and supply some other custom way to distinguish parents from children,
which may be hard, so I'd definitely recommend using _nest_path_ unless you
have a very good reason not to.
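
As a rough sketch of what that looks like in request terms, the snippet below
builds the parameters for a {!parent} query whose mask covers every doc
without a _nest_path_, with the parent type narrowed in a separate fq. The
helper name parent_query_params and the parameter name "mask" are made up for
illustration, and it assumes _nest_path_ is populated on every child doc.

```python
from urllib.parse import urlencode

# Block mask: every doc that is not a nested child (no _nest_path_ value).
ALL_ROOTS_MASK = "*:* -_nest_path_:*"


def parent_query_params(child_query: str, parent_filter: str) -> dict:
    """Build request params for a block join parent query.

    Hypothetical helper: the mask is passed through a dereferenced
    parameter ($mask) and the parent type is selected with fq, so the
    mask itself never excludes any parent.
    """
    return {
        "q": "{!parent which=$mask}" + child_query,
        "mask": ALL_ROOTS_MASK,
        "fq": parent_filter,
    }


params = parent_query_params("chld_name:ABC", "doc_type:t2")
query_string = urlencode(params)  # URL-encoded form for the /select handler
```

The point of the split is that "which" only says where blocks end, while fq
says which parents you actually want back.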

It has to be possible, since even in a pure parent/child data set there
could be parents with no children.
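
To see why a too-narrow mask hands back the wrong parent, here is a toy model
of the lookup: each matching child is attributed to the next doc whose bit is
set in the parent bitset. This is not Lucene's code, only a simulation. The
docid order (p1 block before the t2 block, children stored before their
parent) is my assumption, and to_parent is a hypothetical helper.

```python
from typing import Callable, Dict, List

Doc = Dict[str, str]

# Assumed docid order: children precede their parent within a block,
# and the p1 block is assumed to come before the t2 block.
INDEX: List[Doc] = [
    {"doc_id": "/p1/chld/1/", "doc_type": "chld", "chld_name": "ABC",
     "_nest_path_": "/chldrn#1"},
    {"doc_id": "/p1/chld/2/", "doc_type": "chld", "chld_name": "DEF",
     "_nest_path_": "/chldrn#2"},
    {"doc_id": "/p1/1/", "doc_type": "p1"},
    {"doc_id": "/t2/chld/1/", "doc_type": "chld", "chld_name": "DEF",
     "_nest_path_": "/chldrn#1"},
    {"doc_id": "/t2/1/", "doc_type": "t2"},
]


def to_parent(index: List[Doc],
              mask: Callable[[Doc], bool],
              child_query: Callable[[Doc], bool]) -> List[Doc]:
    """Toy model of a to-parent block join (not Lucene's implementation)."""
    parent_bits = [mask(doc) for doc in index]
    parents: List[Doc] = []
    for i, doc in enumerate(index):
        # Docs covered by the mask are treated as parents, never as children.
        if parent_bits[i] or not child_query(doc):
            continue
        # Stand-in for nextSetBit(i): first masked doc at or after the child.
        j = next((k for k in range(i, len(index)) if parent_bits[k]), None)
        if j is not None and index[j] not in parents:
            parents.append(index[j])
    return parents


def is_abc(doc: Doc) -> bool:
    return doc.get("chld_name") == "ABC"


# Mask with only t2 parents: the ABC child under p1 "falls through" to t2.
narrow = to_parent(INDEX, lambda d: d["doc_type"] == "t2", is_abc)

# Mask with every doc lacking _nest_path_: the child resolves to p1,
# and a parent-level filter on doc_type:t2 then leaves nothing.
full = to_parent(INDEX, lambda d: "_nest_path_" not in d, is_abc)
filtered = [d for d in full if d["doc_type"] == "t2"]

print([d["doc_id"] for d in narrow])  # ['/t2/1/']
print([d["doc_id"] for d in full])    # ['/p1/1/']
print(filtered)                       # []
```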

On Fri, Apr 29, 2022 at 1:21 PM Mikhail Khludnev <m...@apache.org> wrote:

> Hello, Gus.
>
> On Fri, Apr 29, 2022 at 6:55 PM Gus Heck <gus.h...@gmail.com> wrote:
>
> >  Also if you have an index with a mixture of
> > hierarchical documents and other non block/join docs.
> >
> Such a mix is not an option; it should be declared somewhere. Any standalone
> docs should be marked as parents.
>
>
> >
> > On Fri, Apr 29, 2022 at 8:57 AM Mikhail Khludnev <m...@apache.org> wrote:
> >
> > > Hello, James.
> > >
> > > Excuse me if I didn't fully get all points of your inquiry.
> > > As I understand the challenge: one cannot filter/select certain parent
> > > types with the `which` param, because block join is a plain nextSetBit()
> > > over dense doc ordinals.
> > > So the parents bitset should include all parents (a disjunction of all
> > > parent types), and then a parent-level filter should select the desired
> > > parent type:
> > > q={!parent which=$dads}chld_name:ABC&dads=doc_type:(t2 p1)&fq=doc_type:t2
> > > It should be explained somewhere around
> > > https://solr.apache.org/guide/8_8/other-parsers.html#block-mask
> > > Please let me know if we can add some more caveats there covering your
> > > case.
> > >
> > > Have a good join!
> > >
> > > On Thu, Apr 28, 2022 at 5:43 PM James Greene <ja...@jamesaustingreene.com>
> > > wrote:
> > >
> > > > My team is in the process of moving from Solr 6.6 to 8.11.1 and has
> > > > noticed some weirdness (wrong parent docs in the result) when using the
> > > > {!parent} block join query parser. We have multiple 'root' entities
> > > > configured in DIH, and I'm wondering if this could be the cause or if
> > > > there is a bug at play with the block join. Any more info on how to
> > > > diagnose the issue is appreciated!
> > > >
> > > > -----------------------------------
> > > > Example data:
> > > >
> > > > [
> > > >     {
> > > >         "_root_": "/t2/1/",
> > > >         "doc_id": "/t2/1/",
> > > >         "doc_type": "t2",
> > > >         "t2_id":1,
> > > >         "chldrn": [
> > > >             {
> > > >                 "_root_": "/t2/1/",
> > > >                 "_nest_path_": "/chldrn#1",
> > > >                 "doc_id": "/t2/chld/1/",
> > > >                 "doc_type": "chld",
> > > >                 "chld_name": "DEF",
> > > >                 "chld_t2_id":1
> > > >             }
> > > >         ]
> > > >     },
> > > >     {
> > > >         "_root_": "/p1/1/",
> > > >         "doc_id": "/p1/1/",
> > > >         "doc_type": "p1",
> > > >         "p1_id":1,
> > > >         "chldrn": [
> > > >             {
> > > >                 "_root_": "/p1/1/",
> > > >                 "_nest_path_": "/chldrn#1",
> > > >                 "doc_id": "/p1/chld/1/",
> > > >                 "doc_type": "chld",
> > > >                 "chld_name": "ABC",
> > > >                 "chld_p1_id":1
> > > >             },
> > > >             {
> > > >                 "_root_": "/p1/1/",
> > > >                 "_nest_path_": "/chldrn#2",
> > > >                 "doc_id": "/p1/chld/2/",
> > > >                 "doc_type": "chld",
> > > >                 "chld_name": "DEF",
> > > >                 "chld_p1_id": 1
> > > >             }
> > > >         ]
> > > >     }
> > > > ]
> > > >
> > > >
> > > > -----------------------------------
> > > > Queries giving the wrong result:
> > > >
> > > > q={!parent which=doc_type:t2}chld_name:ABC
> > > >
> > > > q={!parent which=doc_type:t2}(doc_type:chld AND chld_name:ABC)
> > > >
> > > > q={!parent which=doc_type:t2 v=$qq}chld_name:ABC
> > > > ?qq=doc_type:chld
> > > >
> > > >
> > > > -----------------------------------
> > > > I found an old thread saying that child docs shouldn't have the same
> > > > field name as the parent doc (even with different values) here:
> > > >
> > > > https://stackoverflow.com/questions/36602638/solr-returning-incorrect-results-when-filtering-child-docuements
> > > >
> > > > But I got the same results when trying to filter by children using a
> > > > different field:
> > > >
> > > > q={!parent which=doc_type:t2}(_nest_path_:/chldrn AND chld_name:ABC)
> > > >
> > > > I would expect no match, since the parent (doc_type:t2) does not have a
> > > > child (chld_name:ABC), but I'm actually getting t2 in the result:
> > > > [
> > > >     {
> > > >         "_root_": "/t2/1/",
> > > >         "doc_id": "/t2/1/",
> > > >         "doc_type": "t2",
> > > >         "t2_id":1,
> > > >         "chldrn": [
> > > >             {
> > > >                 "_root_": "/t2/1/",
> > > >                 "_nest_path_": "/chldrn#1",
> > > >                 "doc_id": "/t2/chld/1/",
> > > >                 "doc_type": "chld",
> > > >                 "chld_name": "DEF",
> > > >                 "chld_t2_id":1
> > > >             }
> > > >         ]
> > > >     }
> > > > ]
> > > >
> > > > -----------------------------------
> > > > Debug for the query returning the wrong document when 0 docs are expected:
> > > >
> > > > "debug":{
> > > >     "rawquerystring":"{!parent which=doc_type:t2}chld_name:ABC",
> > > >     "querystring":"{!parent which=doc_type:t2}chld_name:ABC",
> > > >     "parsedquery":"AllParentsAware(ToParentBlockJoinQuery (+chld_name:abc))",
> > > >     "parsedquery_toString":"ToParentBlockJoinQuery (+chld_name:abc)",
> > > >     "explain":{
> > > >       "/t2/1/":"\n0.0 = Score based on 1 child docs in range from 0 to 3, best match:\n  0.0 = ConstantScore(chld_name:abc)^0.0\n"},
> > > >     "QParser":"BlockJoinParentQParser",
> > > >     ...
> > > > }
> > > >
> > > >
> > > > -----------------------------------
> > > > If I query using a different parent doc_type (doc_type:p1) and child name
> > > > (chld_name:DEF) I get the expected result (0 docs returned) using query:
> > > >
> > > > q={!parent which=doc_type:p1}chld_name:DEF
> > > >
> > > >
> > > > -----------------------------------
> > > > If I query using a different parent doc_type (doc_type:p1) and child name
> > > > (chld_name:ABC) I get the expected result (1 doc returned) using query:
> > > >
> > > > q={!parent which=doc_type:p1}chld_name:ABC
> > > >
> > > > ^^Debug output for the query returning the expected 1 doc (the docs in
> > > > range are 2 to 3, yet the original problematic query has 0 to 3, whatever
> > > > that means):
> > > > "debug":{
> > > >     "rawquerystring":"{!parent which=doc_type:p1}chld_name:ABC",
> > > >     "querystring":"{!parent which=doc_type:p1}chld_name:ABC",
> > > >     "parsedquery":"AllParentsAware(ToParentBlockJoinQuery (+chld_name:abc))",
> > > >     "parsedquery_toString":"ToParentBlockJoinQuery (+chld_name:abc)",
> > > >     "explain":{
> > > >       "/t2/1/":"\n0.0 = Score based on 2 child docs in range from 2 to 3, best match:\n  0.0 = ConstantScore(chld_name:abc)^0.0\n"},
> > > >     "QParser":"BlockJoinParentQParser",
> > > >     ...
> > > > }
> > > >
> > > >
> > > > -----------------------------------
> > > > I have a 'workaround' which seems to do the trick, but it feels hacky and
> > > > I wonder if having to qualify the child docs further will affect query
> > > > performance. If I further qualify the child doc using a field that doesn't
> > > > exist in the other child docs, I get the expected (0 matches) result with
> > > > query:
> > > >
> > > > q={!parent which=doc_type:t2}(chld_name:ABC AND chld_t2_id:*)
> > > >
> > > >
> > > > -----------------------------------
> > > > What's also interesting is that if I remove the child doc
> > > > {"doc_id":"/p1/chld/1/","chld_name":"ABC"} of parent
> > > > {"doc_id":"/p1/1/","doc_type":"p1"} from the index so that my collection
> > > > has:
> > > >
> > > > [
> > > >     {
> > > >         "_root_": "/t2/1/",
> > > >         "doc_id": "/t2/1/",
> > > >         "doc_type": "t2",
> > > >         "t2_id":1,
> > > >         "chldrn": [
> > > >             {
> > > >                 "_root_": "/t2/1/",
> > > >                 "_nest_path_": "/chldrn#1",
> > > >                 "doc_id": "/t2/chld/1/",
> > > >                 "doc_type": "chld",
> > > >                 "chld_name": "DEF",
> > > >                 "chld_t2_id":1
> > > >             }
> > > >         ]
> > > >     },
> > > >     {
> > > >         "_root_": "/p1/1/",
> > > >         "doc_id": "/p1/1/",
> > > >         "doc_type": "p1",
> > > >         "p1_id":1,
> > > >         "chldrn": [
> > > >             {
> > > >                 "_root_": "/p1/1/",
> > > >                 "_nest_path_": "/chldrn#2",
> > > >                 "doc_id": "/p1/chld/2/",
> > > >                 "doc_type": "chld",
> > > >                 "chld_name": "DEF",
> > > >                 "chld_p1_id": 1
> > > >             }
> > > >         ]
> > > >     }
> > > > ]
> > > >
> > > > I get the expected results (no matches found) when I use the query:
> > > >
> > > > q={!parent which=doc_type:t2}chld_name:ABC
> > > >
> > > >
> > > > -----------------------------------
> > > > Other Notes:
> > > >
> > > > - I've blown away and recreated the index multiple times (always using DIH
> > > > to re-import the data), which should rule out an anomaly with index
> > > > linking/block merge.
> > > > - SolrCloud mode is not being used.
> > > > - I have <uniqueKey>doc_id</uniqueKey> in managed-schema and have no docs
> > > > with a duplicate doc_id in the index (sample config below).
> > > > - I have _root_ as indexed only (changed it to stored=true for debugging,
> > > > but the issue remains).
> > > > - We use the DIH (Data Import Handler) to import the data (sample config
> > > > below).
> > > > - The 't2' doc_type appears as the first entity in the DIH, so I *think*
> > > > it's the doc that gets indexed first during the DIH full import (may be
> > > > relevant in identifying a bug with block join/indexing?).
> > > >
> > > >
> > > > -----------------------------------
> > > > Relevant entries in managed-schema:
> > > >
> > > > <uniqueKey>doc_id</uniqueKey>
> > > > ...
> > > > <fieldType name="nest_path" class="solr.NestPathField" stored="false" />
> > > > <fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
> > > >     <analyzer>
> > > >         <tokenizer class="solr.KeywordTokenizerFactory"/>
> > > >         <filter class="solr.LengthFilterFactory" min="1" max="32766"/>
> > > >         <filter class="solr.LowerCaseFilterFactory"/>
> > > >     </analyzer>
> > > > </fieldType>
> > > > <fieldType name="plong" class="solr.LongPointField" docValues="true" stored="false"/>
> > > > <fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true" stored="false"/>
> > > > ...
> > > > <field name="_root_" type="string" docValues="false"/>
> > > > <field name="_nest_path_" type="nest_path"/>
> > > > <field name="_version_" type="plong" indexed="false"/>
> > > > ...
> > > > <field name="doc_id" type="string" stored="true" docValues="false"/>
> > > > <field name="doc_type" type="string"/>
> > > > <field name="chld_name" type="lowercase" stored="true" docValues="false"/>
> > > > ...
> > > > <dynamicField name="*_id" type="plong"/>
> > > >
> > > >
> > > > -----------------------------------
> > > > Relevant entries in data-config.xml:
> > > >
> > > > <?xml version="1.0"?>
> > > > <dataConfig>
> > > >     <dataSource name="mariadb" driver="org.mariadb.jdbc.Driver" batchSize="-1"
> > > >         url="jdbc:mysql://host:3306/db?sessionVariables=net_write_timeout=3600"
> > > >         user="" password="" />
> > > >     <document>
> > > >         <entity dataSource="mariadb" pk="id" name="t2"
> > > >             deletedPkQuery="select concat('/t2/',`id`,'/') as id from `t2` where `deleted_at` &gt;= convert_tz('${dataimporter.last_index_time}', '+00:00', @@global.time_zone)"
> > > >             query="select concat('/t2/',`id`,'/') as `doc_id`, 't2' as `doc_type`, `id` as `t2_id` where `deleted_at` is null"
> > > >             deltaImportQuery="select concat('/t2/',`id`,'/') as `doc_id`, 't2' as `doc_type`, `id` as `t2_id` where `deleted_at` is null and `id` = '${dataimporter.delta.id}'"
> > > >             deltaQuery="select `id` from `t2` where `updated_at` &gt; convert_tz('${dataimporter.last_index_time}', '+00:00', @@global.time_zone)">
> > > >             <entity name="chldrn" child="true" query="select concat('/t2/chld/',`id`,'/') as `doc_id`, 'chld' as `doc_type`, concat('/chldrn#',`id`) as `_nest_path_`, `name` as `chld_name`, `t2_id` as `chld_t2_id` where `t2_id` = ${t2.t2_id} and `deleted_at` is null" />
> > > >         </entity>
> > > >         <entity dataSource="mariadb" pk="id" name="p1"
> > > >             deletedPkQuery="select concat('/p1/',`id`,'/') as `id` from `p1` where `deleted_at` &gt;= convert_tz('${dataimporter.last_index_time}', '+00:00', @@global.time_zone)"
> > > >             query="select concat('/p1/',`id`,'/') as `doc_id`, 'p1' as `doc_type`, `id` as `p1_id` where `deleted_at` is null"
> > > >             deltaImportQuery="select concat('/p1/',`id`,'/') as `doc_id`, 'p1' as `doc_type`, `id` as `p1_id` where `deleted_at` is null and `id` = '${dataimporter.delta.id}'"
> > > >             deltaQuery="select `id` from `p1` where `updated_at` &gt; convert_tz('${dataimporter.last_index_time}', '+00:00', @@global.time_zone)">
> > > >             <entity name="chldrn" child="true" query="select concat('/p1/chld/',`id`,'/') as `doc_id`, 'chld' as `doc_type`, concat('/chldrn#',`id`) as `_nest_path_`, `name` as `chld_name`, `p1_id` as `chld_p1_id` where `p1_id` = ${p1.p1_id} and `deleted_at` is null" />
> > > >         </entity>
> > > >     </document>
> > > > </dataConfig>
> > > >
> > >
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> > >
> >
> >
> > --
> > http://www.needhamsoftware.com (work)
> > http://www.the111shift.com (play)
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)
