[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842555#comment-13842555
]
Steve Rowe commented on SOLR-1301:
----------------------------------
The Maven Jenkins build on trunk has been failing for a while because
{{com.sun.jersey:jersey-bundle:1.8}}, a morphlines-core dependency, causes
{{ant validate-maven-dependencies}} to fail - here's a log excerpt from the
most recent failure
[https://builds.apache.org/job/Lucene-Solr-Maven-trunk/1046/console]:
{noformat}
[echo] Building solr-map-reduce...
-validate-maven-dependencies.init:
-validate-maven-dependencies:
[artifact:dependencies] [INFO] snapshot org.apache.solr:solr-cell:5.0-SNAPSHOT:
checking for updates from maven-restlet
[artifact:dependencies] [INFO] snapshot org.apache.solr:solr-cell:5.0-SNAPSHOT:
checking for updates from releases.cloudera.com
[artifact:dependencies] [INFO] snapshot
org.apache.solr:solr-morphlines-cell:5.0-SNAPSHOT: checking for updates from
maven-restlet
[artifact:dependencies] [INFO] snapshot
org.apache.solr:solr-morphlines-cell:5.0-SNAPSHOT: checking for updates from
releases.cloudera.com
[artifact:dependencies] [INFO] snapshot
org.apache.solr:solr-morphlines-core:5.0-SNAPSHOT: checking for updates from
maven-restlet
[artifact:dependencies] [INFO] snapshot
org.apache.solr:solr-morphlines-core:5.0-SNAPSHOT: checking for updates from
releases.cloudera.com
[artifact:dependencies] An error has occurred while processing the Maven
artifact tasks.
[artifact:dependencies] Diagnosis:
[artifact:dependencies]
[artifact:dependencies] Unable to resolve artifact: Unable to get dependency
information: Unable to read the metadata file for artifact
'com.sun.jersey:jersey-bundle:jar': Cannot find parent:
com.sun.jersey:jersey-project for project: null:jersey-bundle:jar:null for
project null:jersey-bundle:jar:null
[artifact:dependencies] com.sun.jersey:jersey-bundle:jar:1.8
[artifact:dependencies]
[artifact:dependencies] from the specified remote repositories:
[artifact:dependencies] central (http://repo1.maven.org/maven2),
[artifact:dependencies] releases.cloudera.com
(https://repository.cloudera.com/artifactory/libs-release),
[artifact:dependencies] maven-restlet (http://maven.restlet.org),
[artifact:dependencies] Nexus (http://repository.apache.org/snapshots)
[artifact:dependencies]
[artifact:dependencies] Path to dependency:
[artifact:dependencies] 1)
org.apache.solr:solr-map-reduce:jar:5.0-SNAPSHOT
[artifact:dependencies]
[artifact:dependencies]
[artifact:dependencies] Not a v4.0.0 POM. for project
com.sun.jersey:jersey-project at
/home/hudson/.m2/repository/com/sun/jersey/jersey-project/1.8/jersey-project-1.8.pom
{noformat}
I couldn't reproduce this locally.
It turns out the parent POM in question, at
{{/home/hudson/.m2/repository/com/sun/jersey/jersey-project/1.8/jersey-project-1.8.pom}},
has the wrong contents:
{noformat}
<html>
<head><title>301 Moved Permanently</title></head>
<body bgcolor="white">
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx/0.6.39</center>
</body>
</html>
{noformat}
I replaced this by manually downloading the correct POM and its checksum file
from Maven Central and putting them in the hudson user's local Maven repository.
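Something along these lines, run as the hudson user, will do it (an illustrative
sketch only - I'm using {{wget}} and the {{.sha1}} checksum file here for
concreteness):
{noformat}
cd /home/hudson/.m2/repository/com/sun/jersey/jersey-project/1.8/
# drop the corrupted POM and its stale checksum
rm jersey-project-1.8.pom jersey-project-1.8.pom.sha1
# re-fetch the correct files from Maven Central
wget http://repo1.maven.org/maven2/com/sun/jersey/jersey-project/1.8/jersey-project-1.8.pom
wget http://repo1.maven.org/maven2/com/sun/jersey/jersey-project/1.8/jersey-project-1.8.pom.sha1
{noformat}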
[[email protected]]: While investigating this failure, I tried dropping
the triggering Ivy dependency com.sun.jersey:jersey-bundle, and all enabled
tests still succeeded. Is it okay with you to drop this dependency? The
description from the POM says:
{code:xml}
<description>
A bundle containing code of all jar-based modules that provide JAX-RS and
Jersey-related features. Such a bundle is *only intended* for developers that
do not use Maven's dependency system. The bundle does not include code for
contributes, tests and samples.
</description>
{code}
That sounds like a sneaky replacement for transitive dependencies. IMHO, if we
need some of the classes this jar provides, we should instead declare direct
dependencies on the appropriate artifacts.
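If it does turn out that we need some of the Jersey classes, a direct declaration
in the morphlines-core {{ivy.xml}} would look roughly like the following (a
hypothetical sketch - {{jersey-core}} and {{jersey-server}} are placeholders; I
haven't verified which artifacts, if any, are actually required):
{code:xml}
<!-- Hypothetical replacement for the jersey-bundle aggregate jar:
     depend only on the specific Jersey artifacts that are actually used. -->
<dependency org="com.sun.jersey" name="jersey-core" rev="1.8" transitive="false"/>
<dependency org="com.sun.jersey" name="jersey-server" rev="1.8" transitive="false"/>
{code}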
> Add a Solr contrib that allows for building Solr indexes via Hadoop's
> Map-Reduce.
> ---------------------------------------------------------------------------------
>
> Key: SOLR-1301
> URL: https://issues.apache.org/jira/browse/SOLR-1301
> Project: Solr
> Issue Type: New Feature
> Reporter: Andrzej Bialecki
> Assignee: Mark Miller
> Fix For: 5.0, 4.7
>
> Attachments: README.txt, SOLR-1301-hadoop-0-20.patch,
> SOLR-1301-hadoop-0-20.patch, SOLR-1301-maven-intellij.patch, SOLR-1301.patch,
> SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch,
> SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch,
> SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch,
> SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch,
> SOLR-1301.patch, SolrRecordWriter.java, commons-logging-1.0.4.jar,
> commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar,
> hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch,
> log4j-1.2.15.jar
>
>
> This patch contains a contrib module that provides distributed indexing
> (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is
> twofold:
> * provide an API that is familiar to Hadoop developers, i.e. that of
> OutputFormat
> * avoid unnecessary export and (de)serialization of data maintained on HDFS.
> SolrOutputFormat consumes data produced by reduce tasks directly, without
> storing it in intermediate files. Furthermore, by using an
> EmbeddedSolrServer, the indexing task is split into as many parts as there
> are reducers, and the data to be indexed is not sent over the network.
> Design
> ----------
> Key/value pairs produced by reduce tasks are passed to SolrOutputFormat,
> which in turn uses SolrRecordWriter to write this data. SolrRecordWriter
> instantiates an EmbeddedSolrServer, and it also instantiates an
> implementation of SolrDocumentConverter, which is responsible for turning
> Hadoop (key, value) into a SolrInputDocument. This data is then added to a
> batch, which is periodically submitted to EmbeddedSolrServer. When the
> reduce task completes and the OutputFormat is closed, SolrRecordWriter calls
> commit() and optimize() on the EmbeddedSolrServer.
> The API provides facilities to specify an arbitrary existing solr.home
> directory, from which the conf/ and lib/ files will be taken.
> This process results in the creation of as many partial Solr home directories
> as there were reduce tasks. The output shards are placed in the output
> directory on the default filesystem (e.g. HDFS). Such part-NNNNN directories
> can be used to run N shard servers. Additionally, users can specify the
> number of reduce tasks, in particular 1 reduce task, in which case the output
> will consist of a single shard.
> An example application is provided that processes large CSV files and uses
> this API. It uses custom CSV processing to avoid (de)serialization overhead.
> This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this
> issue; you should put it in contrib/hadoop/lib.
> Note: the development of this patch was sponsored by an anonymous contributor
> and approved for release under Apache License.