Thank you, Jose. That fixed it.
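
For anyone who finds this thread later, the working version looks roughly
like this. It is a sketch of the pattern Jose describes, based on my snippet
below, so Utils.sendBatchToSolr, solrCollection, batchSize, and
SOLR_SERVER_URL are the same helpers and values as in my original code:

solrDocs.foreachPartition(new VoidFunction<Iterator<SolrInputDocument>>() {
    public void call(Iterator<SolrInputDocument> solrDocIterator) throws Exception {
        // Create the server here, inside the partition function, so it
        // lives entirely on the executor and is never serialized as part
        // of the task closure.
        HttpSolrServer solrServer = new HttpSolrServer(SOLR_SERVER_URL);
        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        while (solrDocIterator.hasNext()) {
            batch.add(solrDocIterator.next());
            if (batch.size() >= batchSize) {
                Utils.sendBatchToSolr(solrServer, solrCollection, batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            Utils.sendBatchToSolr(solrServer, solrCollection, batch);
        }
        // Release the underlying HTTP connections (SolrJ 4.x API).
        solrServer.shutdown();
    }
});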

On Wed, Feb 18, 2015 at 6:31 PM, Jose Fernandez <jfernan...@sdl.com> wrote:

> You need to instantiate the server inside the foreachPartition block, or
> Spark will attempt to serialize it into the task closure. See the design
> patterns section in the Spark Streaming guide.
>
>
> Jose Fernandez | Principal Software Developer
> jfernan...@sdl.com |
>
> -----Original Message-----
> From: dgoldenberg [mailto:dgoldenberg...@gmail.com]
> Sent: Wednesday, February 18, 2015 1:54 PM
> To: user@spark.apache.org
> Subject: NotSerializableException: org.apache.http.impl.client.DefaultHttpClient when trying to send documents to Solr
>
> I'm using SolrJ in a Spark program. When I try to send the docs to Solr, I
> get a NotSerializableException on the DefaultHttpClient. Is there a
> possible fix or workaround?
>
> I'm using Spark 1.2.1 with Hadoop 2.4; SolrJ is version 4.0.0.
>
> final HttpSolrServer solrServer = new HttpSolrServer(SOLR_SERVER_URL);
> ...
> JavaRDD<SolrInputDocument> solrDocs = rdd.map(new Function<Row, SolrInputDocument>() {
>     public SolrInputDocument call(Row r) {
>         return r.toSolrDocument();
>     }
> });
>
> // solrServer is created on the driver and captured by this closure,
> // which is what triggers the serialization attempt.
> solrDocs.foreachPartition(new VoidFunction<Iterator<SolrInputDocument>>() {
>     public void call(Iterator<SolrInputDocument> solrDocIterator) throws Exception {
>         List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
>
>         while (solrDocIterator.hasNext()) {
>             batch.add(solrDocIterator.next());
>             if (batch.size() >= batchSize) {
>                 Utils.sendBatchToSolr(solrServer, solrCollection, batch);
>                 batch.clear(); // reset the batch so documents aren't sent twice
>             }
>         }
>         if (!batch.isEmpty()) {
>             Utils.sendBatchToSolr(solrServer, solrCollection, batch);
>         }
>     }
> });
>
> ----------------
>
> Exception in thread "main" org.apache.spark.SparkException: Task not serializable
>         at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
>         at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
>         at org.apache.spark.SparkContext.clean(SparkContext.scala:1478)
>         at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:789)
>         at org.apache.spark.api.java.JavaRDDLike$class.foreachPartition(JavaRDDLike.scala:195)
>         at org.apache.spark.api.java.JavaRDD.foreachPartition(JavaRDD.scala:32)
>         at com.kona.motivis.spark.proto.SparkProto.execute(SparkProto.java:158)
>         at com.kona.motivis.spark.proto.SparkProto.main(SparkProto.java:186)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.io.NotSerializableException: org.apache.http.impl.client.DefaultHttpClient
>         at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
>         at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
>         at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
>         at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
>         at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>         at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
>         at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
>         at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
>         at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>         at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
>         at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
>         at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
>         at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>         at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
>         at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:42)
>         at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73)
>         at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:164)