[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920402#action_12920402 ]
Dhruv Bansal commented on SOLR-1301:
------------------------------------
I am unable to compile SOLR 1.4.1 after patching with the latest (2010-09-20
04:40 AM) SOLR-1301.patch.
{code:borderStyle=solid}
$ wget http://mirror.cloudera.com/apache//lucene/solr/1.4.1/apache-solr-1.4.1.tgz
...
$ tar -xzf apache-solr-1.4.1.tgz
$ cd apache-solr-1.4.1/contrib
apache-solr-1.4.1/contrib$ wget https://issues.apache.org/jira/secure/attachment/12455023/SOLR-1301.patch
apache-solr-1.4.1/contrib$ patch -p2 -i SOLR-1301.patch
...
apache-solr-1.4.1/contrib$ mkdir lib
apache-solr-1.4.1/contrib$ cd lib
apache-solr-1.4.1/contrib/lib$ wget .. # download hadoop, log4j, commons-logging, commons-logging-api jars from top of this page
...
apache-solr-1.4.1/contrib/lib$ cd ../..
apache-solr-1.4.1$ ant dist -k
...
compile:
    [javac] Compiling 9 source files to /home/dhruv/projects/infochimps/search/apache-solr-1.4.1/contrib/hadoop/build/classes
Target 'compile' failed with message 'The following error occurred while executing this line:
/home/dhruv/projects/infochimps/search/apache-solr-1.4.1/common-build.xml:159: Reference lucene.classpath not found.'.
Cannot execute 'build' - 'compile' failed or was not executed.
Cannot execute 'dist' - 'build' failed or was not executed.
   [subant] File '/home/dhruv/projects/infochimps/search/apache-solr-1.4.1/contrib/hadoop/build.xml' failed with message 'The following error occurred while executing this line:
   [subant] /home/dhruv/projects/infochimps/search/apache-solr-1.4.1/contrib/hadoop/build.xml:65: The following error occurred while executing this line:
   [subant] /home/dhruv/projects/infochimps/search/apache-solr-1.4.1/common-build.xml:159: Reference lucene.classpath not found.'.
....
{code}
Am I following the procedure properly? I'm able to build SOLR just fine out
of the box, as well as after applying
[SOLR-1395|https://issues.apache.org/jira/browse/SOLR-1395].
> Solr + Hadoop
> -------------
>
> Key: SOLR-1301
> URL: https://issues.apache.org/jira/browse/SOLR-1301
> Project: Solr
> Issue Type: Improvement
> Affects Versions: 1.4
> Reporter: Andrzej Bialecki
> Fix For: Next
>
> Attachments: commons-logging-1.0.4.jar,
> commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar,
> hadoop-0.20.1-core.jar, hadoop.patch, log4j-1.2.15.jar, README.txt,
> SOLR-1301-hadoop-0-20.patch, SOLR-1301-hadoop-0-20.patch, SOLR-1301.patch,
> SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch,
> SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java
>
>
> This patch contains a contrib module that provides distributed indexing
> (using Hadoop) through Solr's EmbeddedSolrServer. The idea behind this
> module is twofold:
> * provide an API that is familiar to Hadoop developers, i.e. that of
> OutputFormat
> * avoid unnecessary export and (de)serialization of data maintained on HDFS.
> SolrOutputFormat consumes data produced by reduce tasks directly, without
> storing it in intermediate files. Furthermore, by using an
> EmbeddedSolrServer, the indexing task is split into as many parts as there
> are reducers, and the data to be indexed is not sent over the network.
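> To make the first point concrete, below is a minimal sketch of a reduce
> task as it might look on the Hadoop side (the class name and the Text
> types are illustrative, not part of the patch); note that nothing
> Solr-specific appears in user code, because the indexing happens inside
> the output format:
> {code:borderStyle=solid}
> import java.io.IOException;
> import java.util.Iterator;
>
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapred.MapReduceBase;
> import org.apache.hadoop.mapred.OutputCollector;
> import org.apache.hadoop.mapred.Reducer;
> import org.apache.hadoop.mapred.Reporter;
>
> // Illustrative reducer: the (key, value) pairs it emits are consumed
> // directly by SolrOutputFormat/SolrRecordWriter, without intermediate files.
> public class PassThroughReducer extends MapReduceBase
>     implements Reducer<Text, Text, Text, Text> {
>
>   public void reduce(Text key, Iterator<Text> values,
>       OutputCollector<Text, Text> output, Reporter reporter)
>       throws IOException {
>     while (values.hasNext()) {
>       output.collect(key, values.next());
>     }
>   }
> }
> {code}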
> Design
> ----------
> Key/value pairs produced by reduce tasks are passed to SolrOutputFormat,
> which in turn uses SolrRecordWriter to write the data. SolrRecordWriter
> instantiates an EmbeddedSolrServer, and it also instantiates an
> implementation of SolrDocumentConverter, which is responsible for turning a
> Hadoop (key, value) pair into a SolrInputDocument. Documents are added to a
> batch, which is periodically submitted to the EmbeddedSolrServer. When the
> reduce task completes and the OutputFormat is closed, SolrRecordWriter calls
> commit() and optimize() on the EmbeddedSolrServer.
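> As a sketch of what a converter might look like (the exact
> SolrDocumentConverter contract is defined by the patch; the method shape
> and the field names below are assumptions, and the fields must match your
> schema.xml):
> {code:borderStyle=solid}
> import java.util.Collection;
> import java.util.Collections;
>
> import org.apache.hadoop.io.Text;
> import org.apache.solr.common.SolrInputDocument;
>
> // Hypothetical converter: assumes SolrDocumentConverter exposes a single
> // convert(key, value) method returning the documents to index; check the
> // patch for the exact signature.
> public class TextConverter extends SolrDocumentConverter<Text, Text> {
>
>   public Collection<SolrInputDocument> convert(Text key, Text value) {
>     SolrInputDocument doc = new SolrInputDocument();
>     doc.addField("id", key.toString());     // field names depend on schema.xml
>     doc.addField("text", value.toString());
>     return Collections.singletonList(doc);
>   }
> }
> {code}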
> The API provides facilities to specify an arbitrary existing solr.home
> directory, from which the conf/ and lib/ files will be taken.
> This process results in the creation of as many partial Solr home directories
> as there were reduce tasks. The output shards are placed in the output
> directory on the default filesystem (e.g. HDFS). Such part-NNNNN directories
> can be used to run N shard servers. Additionally, users can specify the
> number of reduce tasks; with a single reduce task, the output will consist
> of a single shard.
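> Putting it together, a driver using the old (0.19/0.20) mapred API might
> look roughly like the sketch below; the "solr.home" property name is a
> placeholder for whatever mechanism the patch actually uses to locate the
> Solr home, and PassThroughReducer refers to the sketch above:
> {code:borderStyle=solid}
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapred.FileInputFormat;
> import org.apache.hadoop.mapred.FileOutputFormat;
> import org.apache.hadoop.mapred.JobClient;
> import org.apache.hadoop.mapred.JobConf;
>
> public class SolrIndexDriver {
>   public static void main(String[] args) throws Exception {
>     JobConf conf = new JobConf(SolrIndexDriver.class);
>     conf.setJobName("solr-hadoop-indexing");
>
>     // Placeholder property: point at an existing solr.home whose conf/
>     // and lib/ each EmbeddedSolrServer will use.
>     conf.set("solr.home", "/path/to/solr/home");
>
>     conf.setReducerClass(PassThroughReducer.class);
>     conf.setOutputFormat(SolrOutputFormat.class);  // provided by this patch
>     conf.setOutputKeyClass(Text.class);
>     conf.setOutputValueClass(Text.class);
>
>     // One reduce task => one part-00000 directory => a single shard.
>     conf.setNumReduceTasks(1);
>
>     FileInputFormat.setInputPaths(conf, new Path(args[0]));
>     FileOutputFormat.setOutputPath(conf, new Path(args[1]));  // on HDFS
>
>     JobClient.runJob(conf);
>   }
> }
> {code}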
> An example application is provided that processes large CSV files using
> this API. It uses custom CSV processing to avoid (de)serialization overhead.
> This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this
> issue; you should put it in contrib/hadoop/lib.
> Note: the development of this patch was sponsored by an anonymous contributor
> and approved for release under Apache License.