Re: ToChildBlockJoinQuery question

Michael Sokolov Thu, 22 Jan 2015 12:33:41 -0800

Great! Thanks for letting us know.

-Mike


On 1/22/15 2:07 PM, McKinley, James T wrote:

Hi Mike,

I guess, given the difficulty I've had getting the block join query to work, it
didn't occur to me to try combining it in a BooleanQuery. :P  Using the BJQ
in a BooleanQuery with other TermQuerys works fine and does exactly what I
wanted!  Thanks very much for your help!

Jim
________________________________________
From: Michael Sokolov [msoko...@safaribooksonline.com]
Sent: Thursday, January 22, 2015 11:45 AM
To: java-user@lucene.apache.org
Subject: Re: ToChildBlockJoinQuery question

I think the idea is that you create a block join query that encapsulates
the join relation, and then you can add further constraints in the
result document space. In the case of ToChildBJQ, the result documents
are child documents, so any additional query constraints will be applied
to child documents.  For example, you could create:

ToChildBlockJoinQuery bjq = jamesBJQ();
TermQuery tq = new TermQuery(new Term("title", "doctor"));
BooleanQuery bq = new BooleanQuery();
bq.add(bjq, Occur.MUST); // children of the parents matched by jamesBJQ()
bq.add(tq, Occur.MUST);  // ...that also have "doctor" in the title

bq would then match books (the child docs) whose parents (i.e. authors)
satisfy the restrictions defined in jamesBJQ(), and that also satisfy
child (i.e. book) restrictions defined by other queries like tq
(title:doctor).
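To make the combined query concrete, here is a toy model in plain Java (no Lucene dependency; the class and method names are hypothetical). It assumes the usual block layout, in which each parent is indexed immediately after its children and a bitset marks the parent doc IDs. The join maps each matching parent to the doc IDs of its block's children, and the required TermQuery clause then intersects that set with the children that match title:doctor.

```java
import java.util.BitSet;

/** Toy model, not Lucene code: a parent-to-child join ANDed with a child-side clause. */
public class JoinPlusChildClause {

    /** Children of each hit parent p are the docs between the previous parent and p. */
    static BitSet joinToChildren(BitSet parentHits, BitSet allParents) {
        BitSet children = new BitSet();
        for (int p = parentHits.nextSetBit(0); p >= 0; p = parentHits.nextSetBit(p + 1)) {
            // previousSetBit(-1) returns -1, so the first block starts at doc 0
            int firstChild = allParents.previousSetBit(p - 1) + 1;
            children.set(firstChild, p); // half-open range [firstChild, p)
        }
        return children;
    }

    /** Two required (MUST) clauses keep only the docs matched by both. */
    static BitSet and(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }

    /** Builds a bitset from a list of doc IDs. */
    static BitSet bits(int... docs) {
        BitSet b = new BitSet();
        for (int d : docs) {
            b.set(d);
        }
        return b;
    }

    public static void main(String[] args) {
        // Docs 0-2 are books by author 3; docs 4-5 are books by author 6.
        BitSet allParents = bits(3, 6);
        BitSet authorHits = bits(3);        // stands in for the parent side of jamesBJQ()
        BitSet titleDoctor = bits(1, 2, 5); // stands in for title:doctor
        BitSet joined = joinToChildren(authorHits, allParents);
        System.out.println(joined);                   // prints {0, 1, 2}
        System.out.println(and(joined, titleDoctor)); // prints {1, 2}
    }
}
```

Modelling MUST as a bitwise AND is a simplification: it ignores scoring and only shows which child docs survive both clauses.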

-Mike

On 1/22/15 11:27 AM, McKinley, James T wrote:

Hi Greg,

Thanks for describing how block join queries were intended to work.  Your
description makes sense to me; however, according to the API docs:

http://lucene.apache.org/core/4_8_0/join/org/apache/lucene/search/join/ToChildBlockJoinQuery.html

and particularly the naming of the parameters, I don't think the API actually
works as you described:

       ToChildBlockJoinQuery(Query parentQuery, Filter parentsFilter, boolean doScores)

If the filter were intended to filter the child docs, wouldn't it be called
childFilter?
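The parameter names point at a division of labour: parentQuery chooses which parents to join from, while parentsFilter identifies parent documents so the join can tell where each block of children begins. The toy model below (plain Java, hypothetical names, assuming children are indexed immediately before their parent) shows why that filter should match every parent document, not just the wanted ones. Looking back from a parent to the previous one is cheap on a random-access bitset, which is presumably why the constructor docs ask for a filter such as FixedBitSetCachingWrapperFilter.

```java
import java.util.BitSet;

/**
 * Toy model, not Lucene code: the parents filter marks block boundaries,
 * so it should mark every parent; the parent query picks which parents to use.
 */
public class ParentsFilterSketch {

    /** Builds a bitset from a list of doc IDs. */
    static BitSet bits(int... docs) {
        BitSet b = new BitSet();
        for (int d : docs) {
            b.set(d);
        }
        return b;
    }

    /** Child doc IDs of a parent, using the filter as the only record of block boundaries. */
    static BitSet childrenOf(int parent, BitSet parentsFilter) {
        BitSet kids = new BitSet();
        int firstChild = parentsFilter.previousSetBit(parent - 1) + 1;
        kids.set(firstChild, parent); // half-open range [firstChild, parent)
        return kids;
    }

    public static void main(String[] args) {
        // Two blocks: docs 0-1 are children of parent 2; docs 3-4 are children of parent 5.
        BitSet everyParent = bits(2, 5);
        BitSet onlyWanted = bits(5); // a filter that marks only the parent we want

        System.out.println(childrenOf(5, everyParent)); // prints {3, 4}: the correct block
        System.out.println(childrenOf(5, onlyWanted));  // prints {0, 1, 2, 3, 4}: parent 2 and its children leak in
    }
}
```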

I think the use of CachingWrapperFilter in the example I got from Mike
McCandless' blog post was the real cause of the exception I was seeing (maybe
things have changed internally since that post).  I finally noticed a mention
of FixedBitSetCachingWrapperFilter in the description of the
ToChildBlockJoinQuery constructor in the API docs.  When I switched to a
filter produced by the FixedBitSetCachingWrapperFilter class, the
IllegalStateException no longer occurred: ToChildBlockJoinQuery with a parent
doc filter and a parent doc query returns the child docs, and the results
look correctly limited by the parent constraints.  For example:

...
Gub-Gub's Book: An Encyclopedia of Food (Fictional work), Fictional work, 119320101
       by: Lofting, Hugh - NP, American, Writer

The Story of Doctor Dolittle, Being the History of His Peculiar Life at Home and Astonishing Adventures in Foreign Parts (Novel), Novel, 119200101
       by: Lofting, Hugh - NP, American, Writer

The Voyages of Doctor Dolittle (Novel), Novel, 119220101
       by: Lofting, Hugh - NP, American, Writer

The Story of Doctor Dolittle (Novel), Novel, 119200101
       by: Lofting, Hugh - NP, American, Writer

...
Mister Beers (Poem), Poem, null
       by: Lofting, Hugh - NP, American, Writer

The Twilight of Magic (Novel), Novel, 119300101
       by: Lofting, Hugh - NP, American, Writer

Picnic (Lofting, Hugh) (Poem), Poem, null
       by: Lofting, Hugh - NP, American, Writer

The Impossible Patriotism Project (Picture story), Picture story, 120070101

A Skeleton in God's Closet: A Novel (Novel), Novel, 119940101
       by: Maier, Paul Luther - NP, American, null

Pontius Pilate (Novel), Novel, 119680101
       by: Maier, Paul Luther - NP, American, null

...
Josephus: The Essential Writings (Collection), Collection, 119880101
       by: Maier, Paul Luther - NP, American, null

She Said the Geese (Poem), Poem, null
       by: Lifshin, Lyn - NP, American, Poet

She Said She Could See Music (Poem), Poem, null
       by: Lifshin, Lyn - NP, American, Poet
...

However, I see no way to further limit the children as you describe.  If I use "a
query that matches the set of parents and a filter that matches the set of children"
as you suggest, I get no results back.  Your description of how it should work
makes complete sense, but that is not what I see when I try it.  Here's the code
that produced the above output:

    private void runToChildBlockJoinQuery(String indexPath) throws IOException {
        // try-with-resources closes the reader and the directory when done
        try (FSDirectory dir = FSDirectory.open(new File(indexPath));
             IndexReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);

            // Query behind the filter that identifies parent (creator) docs
            TermQuery parentFilterQuery = new TermQuery(new Term("AGTY", "np"));

            // Parents to join from: AGTY:np AND NT:american
            BooleanQuery parentQuery = new BooleanQuery();
            parentQuery.add(new TermQuery(new Term("AGTY", "np")), Occur.MUST);
            parentQuery.add(new TermQuery(new Term("NT", "american")), Occur.MUST);

            // The join needs a filter that produces a FixedBitSet per segment
            Filter parentFilter = new FixedBitSetCachingWrapperFilter(
                    new QueryWrapperFilter(parentFilterQuery));

            ToChildBlockJoinQuery tcbjq =
                    new ToChildBlockJoinQuery(parentQuery, parentFilter, true);

            TopDocs worksDocs = searcher.search(tcbjq, 5000);

            System.out.println("\n*ToChildBlockJoinQuery hit count = "
                    + worksDocs.scoreDocs.length);
            displayWorks(reader, searcher, worksDocs);
        }
    }

    private void displayWorks(IndexReader reader, IndexSearcher searcher,
            TopDocs worksDocs) throws IOException {
        for (ScoreDoc hit : worksDocs.scoreDocs) {
            // Load the stored fields once per hit rather than once per field
            Document work = reader.document(hit.doc);
            String agdn = work.get("AGDN");
            String tw = work.get("TW");
            String pd = work.get("PD");
            String crid = work.get("CRID");
            // Look up the work's creator(s) by ID with a separate query
            TopDocs creatorDocs = searcher.search(
                    new TermQuery(new Term("ABID", crid)), Integer.MAX_VALUE);
            System.out.println("\n" + agdn + ", " + tw + ", " + pd);
            displayCreators(reader, searcher, creatorDocs);
        }
    }

    private void displayCreators(IndexReader reader, IndexSearcher searcher,
            TopDocs creatorDocs) throws IOException {
        for (ScoreDoc hit : creatorDocs.scoreDocs) {
            Document creator = reader.document(hit.doc);
            String agdn = creator.get("AGDN");
            String agty = creator.get("AGTY");
            String nt = creator.get("NT");
            String poc = creator.get("POC");
            System.out.println("\tby: " + agdn + " - " + agty + ", " + nt + ", " + poc);
        }
    }

When I try ToParentBlockJoinQuery I don't get any results either, and it is
not what I really want anyway: I want the child documents limited by the
parent documents.

ToChildBlockJoinQuery almost gives me what I want, but I really need to be able
to filter the returned child docs as well as the parents they came from.
If you (or anybody) still think I'm doing it wrong, please let me know.  If I
should file a bug report, let me know that too; I have a small index I can
provide if it would be useful.  Thanks again for your help.

Jim

________________________________________
From: Gregory Dearing [gregdear...@gmail.com]
Sent: Wednesday, January 21, 2015 6:59 PM
To: java-user@lucene.apache.org
Subject: Re: ToChildBlockJoinQuery question

Jim,

I think you hit the nail on the head... that's not what BlockJoinQueries do.

If you want to search for children and join to their parents, then
use ToParentBlockJoinQuery, with a query that matches the set of children
and a filter that matches the set of parents.

If you're searching for parents and joining to their children, then use
ToChildBlockJoinQuery, with a query that matches the set of parents and a
filter that matches the set of children.

When you add related documents to the index (via addDocuments), make sure
that children are added before their parents.
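The indexing-order rule can be modelled in a few lines of plain Java. This is a toy sketch, not Lucene code: it assumes each block is passed to IndexWriter.addDocuments as one list whose last element is the parent, and relies on addDocuments giving the documents of a block consecutive doc IDs in list order. The class and method names are hypothetical, and the titles are only illustrative. Nothing stored in a child points at its parent; the link is purely positional.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Toy model, not Lucene code: doc IDs for blocks added children-first, parent-last. */
public class BlockLayoutSketch {

    /** Appends each block in order and marks its last doc as a parent; list index = doc ID. */
    static List<String> layout(List<List<String>> blocks, BitSet parentsOut) {
        List<String> docs = new ArrayList<>();
        for (List<String> block : blocks) {
            docs.addAll(block);              // children first, parent last
            parentsOut.set(docs.size() - 1); // the block's final doc is its parent
        }
        return docs;
    }

    /** A child's parent is simply the next parent doc ID at or after it. */
    static int parentOf(int doc, BitSet parents) {
        return parents.nextSetBit(doc);
    }

    public static void main(String[] args) {
        List<List<String>> blocks = Arrays.asList(
                Arrays.asList("book:Doctor Dolittle", "book:Twilight of Magic", "author:Lofting"),
                Arrays.asList("book:Pontius Pilate", "author:Maier"));
        BitSet parents = new BitSet();
        List<String> docs = layout(blocks, parents);
        System.out.println(docs);                           // doc ID = position in this list
        System.out.println(parents);                        // prints {2, 4}
        System.out.println(docs.get(parentOf(3, parents))); // prints author:Maier
    }
}
```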

All of the above is necessary because it makes it possible to have
a nested hierarchy of relationships (i.e. Parents have Children, which have
Children of their own).  You need a query to indicate which part of the
hierarchy you're starting from, and a filter indicating which part of the
hierarchy you're joining to.

Also, you will always get an exception if your query and your filter both
match the same document.  A child can't be its own parent.

BlockJoin is a very powerful feature, but what it's really doing is
modelling relationships using an index that doesn't know what a
relationship is.  The relationships are determined by a combination of the
order in which you indexed the block and the format of your query.  This
disconnect can lead to some weird behavior if you're not absolutely sure how
it works.

Thanks,
Greg





On Wed, Jan 21, 2015 at 4:34 PM, McKinley, James T <james.mckin...@cengage.com> wrote:

Am I understanding how this is supposed to work?  What I think I am (and
should be) doing is providing a query and a filter that specify the parent
docs, and the ToChildBlockJoinQuery should return all the child docs for
the resulting parent docs.  Is this correct?  The reason I think I'm
misunderstanding is that I don't see why I need both a filter and a query to
specify the parent docs when a single query or filter should suffice.  Am I
misunderstanding what parentQuery and parentFilter mean?  They both refer to
parent docs, right?

Jim

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

