I hit this issue when building with spark-1.5 and hadoop-2.6.
It comes from clashing com.fasterxml.jackson dependencies: the one
provided in the Spark assembly and the one pulled in by
com.amazonaws:aws-java-sdk-s3.
The fix is to exclude fasterxml.jackson in the zeppelin-zengine pom.xml,
or to manually remove the jars (rm zeppelin-server/target/lib/jackson-*
and rm zeppelin-zengine/target/lib/jackson-*).
If you do this, I am not sure the sync to S3 will still work (I didn't test it).
If we want to PR this, we first need to validate that S3 sync still works.
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk-s3</artifactId>
  <version>1.10.1</version>
  <exclusions>
    <exclusion>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>
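After applying the exclusion and rebuilding, a quick way to check that only
one Jackson version is left (the -Dincludes filter is standard
maven-dependency-plugin usage) is:

mvn dependency:tree -Dincludes=com.fasterxml.jackson.core
ls zeppelin-zengine/target/lib/ | grep jackson
ls zeppelin-server/target/lib/ | grep jackson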
On 22/11/15 02:55, Hyung Sung Shim wrote:
Hello.
If you use CDH for Hadoop, can you try a build command like 'mvn clean
package -Pvendor-repo -DskipTests -Pspark-1.5 -Dspark.version=1.5.2
-Dhadoop.version=2.6.0-mr1-cdh5.4.8'?
I hope this helps.
2015-11-22 6:55 GMT+09:00 Timur Shenkao <[email protected]>:
Hi!
I use CentOS 6.7 + Spark 1.5.2 Standalone + Cloudera Hadoop 5.4.8 on
the same cluster. I can't use Mesos or Spark on YARN.
I decided to try Zeppelin. I tried the binaries and built from source
with different parameters.
At last, I built version 0.6.0 like this:
mvn clean package -DskipTests -Pspark-1.5 -Phadoop-2.6 -Pyarn
-Ppyspark -Pbuild-distr
But I constantly get this error:
com.fasterxml.jackson.databind.JsonMappingException: Could not find creator property with name 'id' (in class org.apache.spark.rdd.RDDOperationScope)
 at [Source: {"id":"0","name":"parallelize"}; line: 1, column: 1]
  at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148)
  at com.fasterxml.jackson.databind.DeserializationContext.mappingException(DeserializationContext.java:843)
  at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.addBeanProps(BeanDeserializerFactory.java:533)
  at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.buildBeanDeserializer(BeanDeserializerFactory.java:220)
  at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.createBeanDeserializer(BeanDeserializerFactory.java:143)
  at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer2(DeserializerCache.java:409)
  at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer(DeserializerCache.java:358)
  at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCache2(DeserializerCache.java:265)
  at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCacheValueDeserializer(DeserializerCache.java:245)
  at com.fasterxml.jackson.databind.deser.DeserializerCache.findValueDeserializer(DeserializerCache.java:143)
  at com.fasterxml.jackson.databind.DeserializationContext.findRootValueDeserializer(DeserializationContext.java:439)
  at com.fasterxml.jackson.databind.ObjectMapper._findRootDeserializer(ObjectMapper.java:3666)
  at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3558)
  at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2578)
  at org.apache.spark.rdd.RDDOperationScope$.fromJson(RDDOperationScope.scala:82)
  at org.apache.spark.rdd.RDD$$anonfun$34.apply(RDD.scala:1603)
  at org.apache.spark.rdd.RDD$$anonfun$34.apply(RDD.scala:1603)
  at scala.Option.map(Option.scala:145)
  at org.apache.spark.rdd.RDD.<init>(RDD.scala:1603)
  at org.apache.spark.rdd.ParallelCollectionRDD.<init>(ParallelCollectionRDD.scala:85)
  at org.apache.spark.SparkContext$$anonfun$parallelize$1.apply(SparkContext.scala:725)
  at org.apache.spark.SparkContext$$anonfun$parallelize$1.apply(SparkContext.scala:723)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
  at org.apache.spark.SparkContext.withScope(SparkContext.scala:709)
  at org.apache.spark.SparkContext.parallelize(SparkContext.scala:723)
  at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
  at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
  at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
  at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
  at $iwC$$iwC$$iwC$$iwC$$i
  ...
and so on.
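The failing call is just Jackson deserializing that little JSON string into
RDDOperationScope. A minimal stand-alone version of the same round trip
(a sketch, assuming jackson-module-scala is on the classpath at a version
matching jackson-databind; Scope is only a stand-in for Spark's class) is:

import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

// Stand-in for org.apache.spark.rdd.RDDOperationScope, just for this check.
case class Scope(id: String, name: String)

// Deserializing the JSON from the error above: this succeeds when
// jackson-databind and jackson-module-scala versions match, and fails with
// "Could not find creator property" when an incompatible jackson-databind
// shadows the one Spark expects.
val mapper = new ObjectMapper().registerModule(DefaultScalaModule)
println(mapper.readValue("""{"id":"0","name":"parallelize"}""", classOf[Scope]))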
My code is:
%spark
import org.apache.spark.sql._
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
case class Contact(name: String, phone: String)
case class Person(name: String, age: Int, contacts: Seq[Contact])
val records = (1 to 100).map { i =>
  Person(s"name_$i", i, (0 to 1).map { m => Contact(s"contact_$m", s"phone_$m") })
}
Then, it fails after the following line:
sc.parallelize(records).toDF().write.format("orc").save("people")
In spark-shell, this code works perfectly, so the problem is in Zeppelin.
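One way to compare the two environments (a diagnostic sketch, not a fix) is
to print which jar jackson-databind is loaded from and which version it
reports, in both spark-shell and a Zeppelin paragraph:

// Where was jackson-databind loaded from, and which version is it?
// Different answers in Zeppelin vs. spark-shell point to a classpath clash.
println(classOf[com.fasterxml.jackson.databind.ObjectMapper]
  .getProtectionDomain.getCodeSource.getLocation)
println(com.fasterxml.jackson.databind.cfg.PackageVersion.VERSION)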
By the way, your own tutorial gives the same error:
// load bank data (imports needed by this snippet)
import java.net.URL
import java.nio.charset.Charset
import org.apache.commons.io.IOUtils

val bankText = sc.parallelize(
  IOUtils.toString(
    new URL("https://s3.amazonaws.com/apache-zeppelin/tutorial/bank/bank.csv"),
    Charset.forName("utf8")).split("\n"))

case class Bank(age: Integer, job: String, marital: String,
  education: String, balance: Integer)

val bank = bankText.map(s => s.split(";")).filter(s => s(0) != "\"age\"").map(
  s => Bank(s(0).toInt,
    s(1).replaceAll("\"", ""),
    s(2).replaceAll("\"", ""),
    s(3).replaceAll("\"", ""),
    s(5).replaceAll("\"", "").toInt
  )
).toDF()
bank.registerTempTable("bank")
How do I fix it? Do I need to change some dependency in pom.xml?
--
NFLabs Inc. | Contents Service Team | Team Lead Hyung Sung Shim
E. [email protected]
T. 02-3458-9650  M. 010-4282-1230
A. 2F Harim Bldg., 216-2 Nonhyeon-dong, Gangnam-gu, Seoul, NFLABS