[
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937732#comment-13937732
]
Alexander S. commented on SOLR-4787:
------------------------------------
Thank you, Kranti Parisa. I am not a Java developer; how can I apply this
patch and build Solr for Linux? I applied the patch, which created a new folder
"joins" in solr/contrib, then installed Ivy and ran "ant compile", but got this
error:
{quote}
common.compile-core:
[mkdir] Created dir:
/home/heaven/Desktop/solr-4.7.0/solr/build/contrib/solr-joins/classes/java
[javac] Compiling 3 source files to
/home/heaven/Desktop/solr-4.7.0/solr/build/contrib/solr-joins/classes/java
[javac] warning: [options] bootstrap class path not set in conjunction with
-source 1.6
[javac]
/home/heaven/Desktop/solr-4.7.0/solr/contrib/joins/src/java/org/apache/solr/joins/HashSetJoinQParserPlugin.java:883:
error: reached end of file while parsing
[javac] return this.delegate.acceptsDocsOutOfOrder();
[javac] ^
[javac]
/home/heaven/Desktop/solr-4.7.0/solr/contrib/joins/src/java/org/apache/solr/joins/HashSetJoinQParserPlugin.java:884:
error: reached end of file while parsing
[javac] 2 errors
[javac] 1 warning
BUILD FAILED
/home/heaven/Desktop/solr-4.7.0/build.xml:106: The following error occurred
while executing this line:
/home/heaven/Desktop/solr-4.7.0/solr/common-build.xml:458: The following error
occurred while executing this line:
/home/heaven/Desktop/solr-4.7.0/solr/common-build.xml:449: The following error
occurred while executing this line:
/home/heaven/Desktop/solr-4.7.0/lucene/common-build.xml:471: The following
error occurred while executing this line:
/home/heaven/Desktop/solr-4.7.0/lucene/common-build.xml:1736: Compile failed;
see the compiler error output for details.
Total time: 8 minutes 55 seconds
{quote}
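For context, javac reports "reached end of file while parsing" when the source file ends before the parser expects it to, most often because a brace is never closed. One possible cause, though not confirmed here, is that the patch was applied only partially or against a different source tree. A minimal sketch for checking the patched file is shown below; `BraceBalance` is a hypothetical helper, not part of Solr or of the SOLR-4787 patch, and the count is naive because braces inside string literals and comments are counted too.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Solr or the SOLR-4787 patch.
final class BraceBalance {

    // Returns (number of '{') minus (number of '}') in the source text.
    // Naive on purpose: braces in string literals and comments are counted too.
    static int unclosedBraces(String source) {
        int depth = 0;
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
            }
        }
        return depth;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java BraceBalance <file.java>");
            return;
        }
        // Read the whole file and report how many braces are left open.
        String source = new String(
                Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println("unclosed braces: " + unclosedBraces(source));
    }
}
```

A positive result for HashSetJoinQParserPlugin.java would be consistent with a truncated file, in which case re-applying the patch to a clean checkout may be worth trying.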
> Join Contrib
> ------------
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
> Issue Type: New Feature
> Components: search
> Affects Versions: 4.2.1
> Reporter: Joel Bernstein
> Priority: Minor
> Fix For: 4.8
>
> Attachments: SOLR-4787-deadlock-fix.patch,
> SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch,
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch,
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch,
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch,
> SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch,
> SOLR-4797-hjoin-multivaluekeys-trunk.patch
>
>
> This contrib provides a place where different join implementations can be
> contributed to Solr. This contrib currently includes 3 join implementations.
> The initial patch was generated from the Solr 4.3 tag. Because of changes in
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *HashSetJoinQParserPlugin aka hjoin*
> The hjoin provides a join implementation that filters results in one core
> based on the results of a search in another core. This is similar in
> functionality to the JoinQParserPlugin but the implementation differs in a
> couple of important ways.
> The first way is that the hjoin is designed to work with int and long join
> keys only. So, in order to use hjoin, int or long join keys must be included
> in both the to and from core.
> The second difference is that the hjoin builds memory structures that are
> used to quickly connect the join keys. So, the hjoin will need more memory
> than the JoinQParserPlugin to perform the join.
> The main advantage of the hjoin is that it can scale to join millions of keys
> between cores and provide sub-second response time. The hjoin should work
> well with up to two million results from the fromIndex and tens of millions
> of results from the main query.
> The hjoin supports the following features:
> 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will
> turn on the PostFilter. The PostFilter will typically outperform the Lucene
> query when the main query results have been narrowed down.
> 2) With the lucene query implementation there is an option to build the
> filter with threads. This can greatly improve the performance of the query if
> the main query index is very large. The "threads" parameter turns on
> threading. For example *threads=6* will use 6 threads to build the filter.
> This will set up a fixed threadpool with six threads to handle all hjoin
> requests. Once the threadpool is created the hjoin will always use it to
> build the filter. Threading does not come into play with the PostFilter.
> 3) The *size* local parameter can be used to set the initial size of the
> hashset used to perform the join. If this is set above the number of results
> from the fromIndex then you can avoid hashset resizing, which improves
> performance.
> 4) Nested filter queries. The local parameter "fq" can be used to nest a
> filter query within the join. The nested fq will filter the results of the
> join query. This can point to another join to support nested joins.
> 5) Full caching support for the lucene query implementation. The filterCache
> and queryResultCache should work properly even with deep nesting of joins.
> Only the queryResultCache comes into play with the PostFilter implementation
> because PostFilters are not cacheable in the filterCache.
> The syntax of the hjoin is similar to the JoinQParserPlugin except that the
> plugin is referenced by the string "hjoin" rather than "join".
> fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6
> fq=$qq\}user:customer1&qq=group:5
> The example filter query above will search the fromIndex (collection2) for
> "user:customer1" applying the local fq parameter to filter the results. The
> lucene filter query will be built using 6 threads. This query will generate a
> list of values from the "from" field that will be used to filter the main
> query. Only records from the main query, where the "to" field is present in
> the "from" list will be included in the results.
> The solrconfig.xml in the main query core must contain the reference to the
> hjoin.
> <queryParser name="hjoin"
> class="org.apache.solr.joins.HashSetJoinQParserPlugin"/>
> And the join contrib lib jars must be registered in the solrconfig.xml.
> <lib dir="../../../contrib/joins/lib" regex=".*\.jar" />
> After issuing the "ant dist" command from inside the solr directory the joins
> contrib jar will appear in the solr/dist directory. Place the
> solr-joins-4.*-.jar in the WEB-INF/lib directory of the Solr web application.
> This will ensure that the top level Solr classloader loads these classes
> rather than the core's classloader.
> *BitSetJoinQParserPlugin aka bjoin*
> The bjoin behaves exactly like the hjoin but uses a BitSet instead of a
> HashSet to perform the underlying join. Because of this the bjoin is much
> faster and can provide sub-second response times on result sets of tens of
> millions of records from the fromIndex and hundreds of millions of records
> from the main query.
> But there are limitations to how the bjoin can be used. The bjoin treats the
> join keys as addresses in a BitSet and uses the Lucene OpenBitSet
> implementation which performs very well but is not sparse. So the BitSet
> memory is dictated by the size of the join keys. For example a bitset with a
> max join key of 200,000,000 will need 25 MB of memory. For this reason the
> BitSet join does not support long join keys. In order to keep memory usage
> down the join keys should also be packed at the low end, for example from 1
> to 50,000,000.
> Below is a sample bjoin:
> fq=\{!bjoin fromIndex=collection2 from=id_i to=id_i threads=6
> fq=$qq\}user:customer1&qq=group:5
> To register the bjoin the solrconfig.xml in the main query core must contain
> the reference to the bjoin.
> <queryParser name="bjoin"
> class="org.apache.solr.joins.BitSetJoinQParserPlugin"/>
> *ValueSourceJoinParserPlugin aka vjoin*
> The third implementation is the ValueSourceJoinParserPlugin aka "vjoin".
> This implements a ValueSource function query that can return a value from a
> second core based on join keys and a limiting query. The limiting query can be
> used to select a specific subset of data from the join core. This allows
> customer specific relevance data to be stored in a separate core and then
> joined in the main query.
> The vjoin is called using the "vjoin" function query. For example:
> bf=vjoin(fromCore, fromKey, fromVal, toKey, query)
> This example shows "vjoin" being called by the edismax boost function
> parameter. This example will return the "fromVal" from the "fromCore". The
> "fromKey" and "toKey" are used to link the records from the main query to the
> records in the "fromCore". The "query" is used to select a specific set of
> records to join with in fromCore.
> Currently the fromKey and toKey must be longs but this will change in future
> versions. Like the pjoin, the "join" SolrCache is used to hold the join
> memory structures.
> To configure the vjoin you must register the ValueSource plugin in the
> solrconfig.xml as follows:
> <valueSourceParser name="vjoin"
> class="org.apache.solr.joins.ValueSourceJoinParserPlugin" />
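To make the difference between the hjoin and the bjoin described above more concrete, here is a rough sketch of the two ideas in plain Java. It is only an illustration under stated assumptions: `JoinSketch` and its methods are hypothetical names, and `java.util.HashSet` and `java.util.BitSet` stand in for whatever structures the patch actually uses (the description mentions Lucene's OpenBitSet for the bjoin). The last method reproduces the memory arithmetic from the description: a dense bit set needs one bit per possible key, so a max join key of 200,000,000 comes to roughly 25 MB.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: plain-Java stand-ins, not the classes used by the patch.
final class JoinSketch {

    // hjoin idea: collect the "from" keys in a hash set, then keep every
    // main-query key that is present. initialSize is passed as the set's
    // initial capacity, loosely mirroring the "size" local parameter.
    static List<Long> hashSetJoin(long[] fromKeys, long[] toKeys, int initialSize) {
        Set<Long> keys = new HashSet<Long>(initialSize);
        for (long k : fromKeys) {
            keys.add(k);
        }
        List<Long> matches = new ArrayList<Long>();
        for (long k : toKeys) {
            if (keys.contains(k)) {
                matches.add(k);
            }
        }
        return matches;
    }

    // bjoin idea: treat each join key as a bit address.
    // Assumes all "from" keys are non-negative ints.
    static List<Integer> bitSetJoin(int[] fromKeys, int[] toKeys, int maxKey) {
        BitSet bits = new BitSet(maxKey + 1);
        for (int k : fromKeys) {
            bits.set(k);
        }
        List<Integer> matches = new ArrayList<Integer>();
        for (int k : toKeys) {
            if (k >= 0 && bits.get(k)) {
                matches.add(k);
            }
        }
        return matches;
    }

    // Bytes needed for a dense (non-sparse) bit set addressing keys 0..maxKey:
    // one bit per key, rounded up to whole bytes.
    static long denseBitSetBytes(long maxKey) {
        return maxKey / 8 + 1;
    }
}
```

The memory of the hash set grows with the number of keys stored, while the memory of the dense bit set grows with the largest key, which is why the description recommends packing bjoin keys at the low end of the range.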
--
This message was sent by Atlassian JIRA
(v6.2#6252)