Thank you, Jose. That fixed it.

On Wed, Feb 18, 2015 at 6:31 PM, Jose Fernandez <jfernan...@sdl.com> wrote:
> You need to instantiate the server in the foreachPartition block or Spark
> will attempt to serialize it to the task. See the design patterns section
> in the Spark Streaming guide.
>
>
> Jose Fernandez | Principal Software Developer
> jfernan...@sdl.com |
>
> The information transmitted, including attachments, is intended only for
> the person(s) or entity to which it is addressed and may contain
> confidential and/or privileged material. Any review, retransmission,
> dissemination or other use of, or taking of any action in reliance upon
> this information by persons or entities other than the intended recipient
> is prohibited. If you received this in error, please contact the sender and
> destroy any copies of this information.
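A minimal sketch of the pattern Jose describes, applied to the code quoted below (assuming the same solrDocs, SOLR_SERVER_URL, solrCollection, batchSize, and Utils.sendBatchToSolr from the original post; it also clears the batch after each send, which the original code omits):

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.spark.api.java.function.VoidFunction;

    solrDocs.foreachPartition(new VoidFunction<Iterator<SolrInputDocument>>() {
        public void call(Iterator<SolrInputDocument> solrDocIterator) throws Exception {
            // Construct the server here, on the executor, so the closure Spark
            // serializes never captures an HttpSolrServer (or its DefaultHttpClient).
            HttpSolrServer solrServer = new HttpSolrServer(SOLR_SERVER_URL);
            try {
                List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
                while (solrDocIterator.hasNext()) {
                    batch.add(solrDocIterator.next());
                    if (batch.size() >= batchSize) {
                        Utils.sendBatchToSolr(solrServer, solrCollection, batch);
                        batch.clear(); // assumes sendBatchToSolr does not clear the list itself
                    }
                }
                if (!batch.isEmpty()) {
                    Utils.sendBatchToSolr(solrServer, solrCollection, batch);
                }
            } finally {
                solrServer.shutdown(); // releases the underlying HttpClient connections
            }
        }
    });

Because the server is built inside call(), nothing non-serializable is captured by the task closure; the trade-off is one connection setup per partition.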
> -----Original Message-----
> From: dgoldenberg [mailto:dgoldenberg...@gmail.com]
> Sent: Wednesday, February 18, 2015 1:54 PM
> To: user@spark.apache.org
> Subject: NotSerializableException:
> org.apache.http.impl.client.DefaultHttpClient when trying to send documents
> to Solr
>
> I'm using SolrJ in a Spark program. When I try to send the docs to Solr, I
> get the NotSerializableException on the DefaultHttpClient. Is there a
> possible fix or workaround?
>
> I'm using Spark 1.2.1 with Hadoop 2.4; SolrJ is version 4.0.0.
>
> final HttpSolrServer solrServer = new HttpSolrServer(SOLR_SERVER_URL);
> ...
>
> JavaRDD<SolrInputDocument> solrDocs = rdd.map(new Function<Row, SolrInputDocument>() {
>     public SolrInputDocument call(Row r) {
>         return r.toSolrDocument();
>     }
> });
>
> solrDocs.foreachPartition(new VoidFunction<Iterator<SolrInputDocument>>() {
>     public void call(Iterator<SolrInputDocument> solrDocIterator) throws Exception {
>         List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
>
>         while (solrDocIterator.hasNext()) {
>             SolrInputDocument inputDoc = solrDocIterator.next();
>             batch.add(inputDoc);
>             if (batch.size() >= batchSize) {
>                 Utils.sendBatchToSolr(solrServer, solrCollection, batch);
>             }
>         }
>         if (!batch.isEmpty()) {
>             Utils.sendBatchToSolr(solrServer, solrCollection, batch);
>         }
>     }
> });
>
> ----------------
>
> Exception in thread "main" org.apache.spark.SparkException: Task not serializable
>     at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
>     at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
>     at org.apache.spark.SparkContext.clean(SparkContext.scala:1478)
>     at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:789)
>     at org.apache.spark.api.java.JavaRDDLike$class.foreachPartition(JavaRDDLike.scala:195)
>     at org.apache.spark.api.java.JavaRDD.foreachPartition(JavaRDD.scala:32)
>     at com.kona.motivis.spark.proto.SparkProto.execute(SparkProto.java:158)
>     at com.kona.motivis.spark.proto.SparkProto.main(SparkProto.java:186)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.io.NotSerializableException: org.apache.http.impl.client.DefaultHttpClient
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
>     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
>     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
>     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>     at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
>     at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>     at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
>     at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:42)
>     at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73)
>     at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:164)
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/NotSerializableException-org-apache-http-impl-client-DefaultHttpClient-when-trying-to-send-documentsr-tp21713.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
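For completeness: the design patterns section of the Spark Streaming guide that Jose points to also covers amortizing the per-partition setup cost by reusing one connection per executor JVM. A sketch of that variant, using a hypothetical SolrServerHolder class (not from the original thread):

    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    /** Hypothetical lazily initialized, per-JVM holder for the Solr client. */
    public final class SolrServerHolder {
        private static HttpSolrServer instance;

        private SolrServerHolder() {}

        // Called on the executor from inside foreachPartition, e.g.:
        //   HttpSolrServer solrServer = SolrServerHolder.get(SOLR_SERVER_URL);
        public static synchronized HttpSolrServer get(String url) {
            if (instance == null) {
                // Created once per executor JVM; never serialized with a closure.
                instance = new HttpSolrServer(url);
            }
            return instance;
        }
    }

Each executor then keeps a single HttpSolrServer for its lifetime instead of opening and shutting one down for every partition.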