Hi Greg,
Thanks for describing how block join queries were intended to work. Your
description makes sense to me; however, according to the API docs:
http://lucene.apache.org/core/4_8_0/join/org/apache/lucene/search/join/ToChildBlockJoinQuery.html
and particularly the naming of the parameters, I don't think the API actually
works as you described:
ToChildBlockJoinQuery(Query parentQuery, Filter parentsFilter, boolean doScores)
If the filter were intended to filter the child docs, I think it would be called
childFilter, no?
I think the use of CachingWrapperFilter in the example I got from Mike
McCandless' blog post was the real cause of the exception I was seeing (maybe
things have changed internally since that post). I finally noticed a mention
of FixedBitSetCachingWrapperFilter in the description of the
ToChildBlockJoinQuery constructor in the API docs. When I switched to a
filter produced by the FixedBitSetCachingWrapperFilter class, the
IllegalStateException no longer occurred. I now get the child docs using
ToChildBlockJoinQuery with a parent doc filter and a parent doc query, and the
results look correctly limited by the parent constraints. For example:
...
Gub-Gub's Book: An Encyclopedia of Food (Fictional work), Fictional work,
119320101
by: Lofting, Hugh - NP, American, Writer
The Story of Doctor Dolittle, Being the History of His Peculiar Life at Home
and Astonishing Adventures in Foreign Parts (Novel), Novel, 119200101
by: Lofting, Hugh - NP, American, Writer
The Voyages of Doctor Dolittle (Novel), Novel, 119220101
by: Lofting, Hugh - NP, American, Writer
The Story of Doctor Dolittle (Novel), Novel, 119200101
by: Lofting, Hugh - NP, American, Writer
...
Mister Beers (Poem), Poem, null
by: Lofting, Hugh - NP, American, Writer
The Twilight of Magic (Novel), Novel, 119300101
by: Lofting, Hugh - NP, American, Writer
Picnic (Lofting, Hugh) (Poem), Poem, null
by: Lofting, Hugh - NP, American, Writer
The Impossible Patriotism Project (Picture story), Picture story, 120070101
A Skeleton in God's Closet: A Novel (Novel), Novel, 119940101
by: Maier, Paul Luther - NP, American, null
Pontius Pilate (Novel), Novel, 119680101
by: Maier, Paul Luther - NP, American, null
...
Josephus: The Essential Writings (Collection), Collection, 119880101
by: Maier, Paul Luther - NP, American, null
She Said the Geese (Poem), Poem, null
by: Lifshin, Lyn - NP, American, Poet
She Said She Could See Music (Poem), Poem, null
by: Lifshin, Lyn - NP, American, Poet
...
However, I see no way to further limit the children as you describe. If I use "a
query that matches the set of parents and a filter that matches the set of children"
as you suggest, I get no results back. Your description of how it should work
makes complete sense, but that is not what I'm seeing when I try it. Here's the code
that produced the above output:
private void runToChildBlockJoinQuery(String indexPath) throws IOException {
    FSDirectory dir = FSDirectory.open(new File(indexPath));
    IndexReader reader = DirectoryReader.open(dir);
    IndexSearcher searcher = new IndexSearcher(reader);

    // Parent doc filter, wrapped so that it yields a FixedBitSet.
    TermQuery parentFilterQuery = new TermQuery(new Term("AGTY", "np"));
    Filter parentFilter = new FixedBitSetCachingWrapperFilter(
            new QueryWrapperFilter(parentFilterQuery));

    // Parent doc query: both clauses must match.
    BooleanQuery parentQuery = new BooleanQuery();
    parentQuery.add(new TermQuery(new Term("AGTY", "np")), Occur.MUST);
    parentQuery.add(new TermQuery(new Term("NT", "american")), Occur.MUST);

    ToChildBlockJoinQuery tcbjq = new ToChildBlockJoinQuery(parentQuery, parentFilter, true);
    TopDocs worksDocs = searcher.search(tcbjq, 5000);
    System.out.println("\n*ToChildBlockJoinQuery hit count = " + worksDocs.scoreDocs.length);
    displayWorks(reader, searcher, worksDocs);
}
private void displayWorks(IndexReader reader, IndexSearcher searcher, TopDocs worksDocs)
        throws IOException {
    for (ScoreDoc hit : worksDocs.scoreDocs) {
        // Load the stored fields once per hit instead of once per field.
        Document work = reader.document(hit.doc);
        String agdn = work.get("AGDN");
        String tw = work.get("TW");
        String pd = work.get("PD");
        String crid = work.get("CRID");
        // Look up the docs whose ABID matches this hit's CRID.
        TopDocs creatorDocs = searcher.search(
                new TermQuery(new Term("ABID", crid)), Integer.MAX_VALUE);
        System.out.println("\n" + agdn + ", " + tw + ", " + pd);
        displayCreators(reader, searcher, creatorDocs);
    }
}
private void displayCreators(IndexReader reader, IndexSearcher searcher, TopDocs creatorDocs)
        throws IOException {
    for (ScoreDoc hit : creatorDocs.scoreDocs) {
        // Load the stored fields once per hit instead of once per field.
        Document creator = reader.document(hit.doc);
        String agdn = creator.get("AGDN");
        String agty = creator.get("AGTY");
        String nt = creator.get("NT");
        String poc = creator.get("POC");
        System.out.println("\tby: " + agdn + " - " + agty + ", " + nt + ", " + poc);
    }
}
When I try to use ToParentBlockJoinQuery I don't get any results either, and it
is not what I really want anyway: I want the child documents limited by the
parent documents.
ToChildBlockJoinQuery almost gives me what I want, but I really need to be able
to filter the child docs returned as well as the parents from which they came.
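One way to picture the missing piece, as a sketch rather than something verified against this index: the join yields a set of child docs, and a child-level constraint is a second set that has to be intersected with it. In Lucene terms that would presumably mean requiring both the ToChildBlockJoinQuery and an ordinary child query as MUST clauses of an outer BooleanQuery. The plain-Java model below uses java.util.BitSet in place of real doc-ID sets; the class name, method name, and doc IDs are hypothetical.

```java
import java.util.BitSet;

/** Toy model: doc IDs are bit positions; no Lucene classes are involved. */
final class ChildConstraintSketch {

    /** Keeps only the joined children that also satisfy the child-level constraint. */
    static BitSet limitChildren(BitSet joinedChildren, BitSet childMatches) {
        BitSet result = (BitSet) joinedChildren.clone(); // leave the inputs untouched
        result.and(childMatches);                        // conjunction, like two MUST clauses
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical doc IDs: children produced by the join.
        BitSet joinedChildren = new BitSet();
        for (int doc : new int[] {0, 1, 3, 5}) {
            joinedChildren.set(doc);
        }
        // Hypothetical doc IDs: children matching some child-level term.
        BitSet childMatches = new BitSet();
        for (int doc : new int[] {0, 3, 9}) {
            childMatches.set(doc);
        }
        System.out.println(limitChildren(joinedChildren, childMatches)); // prints {0, 3}
    }
}
```

Doc 9 matches the child constraint but was not produced by the join, so it is dropped; docs 1 and 5 were joined but fail the child constraint.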
If you (or anybody) still think I'm doing it wrong, please let me know. If I
should file a bug report, let me know that too; I have a small index I can
provide if it is useful. Thanks again for your help.
Jim
________________________________________
From: Gregory Dearing [gregdear...@gmail.com]
Sent: Wednesday, January 21, 2015 6:59 PM
To: java-user@lucene.apache.org
Subject: Re: ToChildBlockJoinQuery question
Jim,
I think you hit the nail on the head... that's not what BlockJoinQueries do.
If you're wanting to search for children and join to their parents... then
use ToParentBlockJoinQuery, with a query that matches the set of children
and a filter that matches the set of parents.
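The child-to-parent direction can be pictured with a toy model, assuming children sit immediately before their parent in doc-ID order and the filter marks the parent docs. This is plain Java over java.util.BitSet, not Lucene code; the class and method names are hypothetical.

```java
import java.util.BitSet;

/** Toy model of a to-parent join: doc IDs are bit positions. */
final class ToParentJoinSketch {

    /** Maps each matching child to the parent that closes its block. */
    static BitSet joinToParents(BitSet parentBits, BitSet matchingChildren) {
        BitSet parents = new BitSet();
        for (int child = matchingChildren.nextSetBit(0); child >= 0;
                child = matchingChildren.nextSetBit(child + 1)) {
            // Children precede their parent, so the next parent bit owns this child.
            int parent = parentBits.nextSetBit(child);
            if (parent >= 0) {
                parents.set(parent);
            }
        }
        return parents;
    }

    public static void main(String[] args) {
        BitSet parentBits = new BitSet();
        for (int doc : new int[] {2, 4, 8}) {
            parentBits.set(doc);
        }
        BitSet matchingChildren = new BitSet();
        for (int doc : new int[] {1, 5, 6}) {
            matchingChildren.set(doc);
        }
        System.out.println(joinToParents(parentBits, matchingChildren)); // prints {2, 8}
    }
}
```

Children 5 and 6 belong to the same block, so parent 8 appears only once in the result.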
If you're searching for parents, then joining to their children... then use
ToChildBlockJoinQuery, with a query that matches the set of parents and a
filter that matches the set of children.
When you add related documents to the index (via addDocuments), make sure that
children are added before their parents.
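The ordering rule can be sketched without Lucene, assuming each block is appended as one contiguous run. The list below stands in for the index and the BitSet records which positions hold parents; with the real API the same ordered list would be handed to IndexWriter.addDocuments in a single call. The class and method names are hypothetical, and the sample names are taken from the output earlier in this thread.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Toy model of block indexing order: a list position plays the role of a doc ID. */
final class BlockOrderSketch {

    /** Appends one block to the index: all children first, the parent last. */
    static void addBlock(List<String> index, BitSet parentBits,
                         String parent, String... children) {
        for (String child : children) {
            index.add(child);
        }
        parentBits.set(index.size()); // the parent's doc ID is the next free position
        index.add(parent);
    }

    public static void main(String[] args) {
        List<String> index = new ArrayList<>();
        BitSet parentBits = new BitSet();
        addBlock(index, parentBits, "Lofting, Hugh", "The Twilight of Magic", "Picnic");
        addBlock(index, parentBits, "Maier, Paul Luther", "Pontius Pilate");
        System.out.println(index.size() + " docs, parents at " + parentBits);
        // prints: 5 docs, parents at {2, 4}
    }
}
```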
The reason all the above is necessary is that it makes it possible to have
a nested hierarchy of relationships (i.e. Parents have Children, which have
Children of their own). You need a query to indicate which part of the
hierarchy you're starting from, and a filter indicating which part of the
hierarchy you're joining to.
Also, you will always get an exception if your query and your filter both
match the same document. A child can't be its own parent.
BlockJoin is a very powerful feature, but what it's really doing is
modelling relationships using an index that doesn't know what a
relationship is. The relationships are determined by a combination of the
order that you indexed the block, and the format of your query. This
disconnect can lead to some weird behavior if you're not absolutely sure how
it works.
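How index order alone can encode the relationship is easiest to see in a toy model. The sketch below assumes, following the parentsFilter parameter name in the API docs linked earlier in this thread, that the filter marks the parent docs and that each parent's children are the docs between it and the previous parent. It is plain Java over java.util.BitSet, not Lucene's implementation; the names are hypothetical, and the IllegalStateException is only the model's way of flagging a query that selects a non-parent doc.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Toy model of a to-child join: doc IDs are bit positions, parents close each block. */
final class ToChildJoinSketch {

    /**
     * Returns the child doc IDs of every matching parent.
     * parentBits marks all parent docs; matchingParents is the subset chosen by the query.
     */
    static List<Integer> joinToChildren(BitSet parentBits, BitSet matchingParents) {
        List<Integer> children = new ArrayList<>();
        for (int parent = matchingParents.nextSetBit(0); parent >= 0;
                parent = matchingParents.nextSetBit(parent + 1)) {
            if (!parentBits.get(parent)) {
                throw new IllegalStateException("query matched non-parent doc " + parent);
            }
            // The block starts right after the previous parent, or at doc 0 if there is none.
            int firstChild = parentBits.previousSetBit(parent - 1) + 1;
            for (int doc = firstChild; doc < parent; doc++) {
                children.add(doc);
            }
        }
        return children;
    }

    public static void main(String[] args) {
        BitSet parentBits = new BitSet();
        for (int doc : new int[] {2, 4, 8}) {
            parentBits.set(doc);
        }
        BitSet matchingParents = new BitSet();
        matchingParents.set(2);
        matchingParents.set(8);
        System.out.println(joinToChildren(parentBits, matchingParents));
        // prints [0, 1, 5, 6, 7]
    }
}
```

In this model the filter has to mark every parent, not just the ones being searched for: parent 4 is not selected by the query, yet its bit is what stops the block of parent 8 from reaching back to doc 3.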
Thanks,
Greg
On Wed, Jan 21, 2015 at 4:34 PM, McKinley, James T <
james.mckin...@cengage.com> wrote:
Am I understanding how this is supposed to work? What I think I am (and
should be) doing is providing a query and filter that specifies the parent
docs and the ToChildBlockJoinQuery should return me all the child docs for
the resulting parent docs. Is this correct? The reason I think I'm not
understanding is that I don't see why I need both a filter and a query to
specify the parent docs when a single query or filter should suffice. Am I
misunderstanding what parentQuery and parentFilter mean? They both refer to
parent docs, right?
Jim