required: org.apache.spark.streaming.dstream.DStream[org.apache.spark.mllib.linalg.Vector]
Hi,

I am trying Spark with some sample programs. In my code, the following items are imported:

import org.apache.spark.mllib.regression.{StreamingLinearRegressionWithSGD, LabeledPoint}
import org.apache.spark.mllib.regression.{StreamingLinearRegressionWithSGD}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.util.Random

I got the following error:

[error] StreamingModel.scala:100: type mismatch;
[error]  found   : org.apache.spark.streaming.dstream.DStream[org.apache.spark.mllib.regression.LabeledPoint]
[error]  required: org.apache.spark.streaming.dstream.DStream[org.apache.spark.mllib.linalg.Vector]
[error]     model.predictOn(labeledStream).print()
[error]                     ^
[error] one error found
[error] (compile:compile) Compilation failed

Any idea?

Regards
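For reference: StreamingLinearRegressionWithSGD.trainOn takes a DStream[LabeledPoint], but predictOn takes a DStream[Vector], which is what the compiler is complaining about. A minimal sketch of the usual pattern, assuming labeledStream is the DStream[LabeledPoint] from the code above and a hypothetical feature count of 3:

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, StreamingLinearRegressionWithSGD}

val model = new StreamingLinearRegressionWithSGD()
  .setInitialWeights(Vectors.zeros(3))   // assumed number of features; adjust to the data

model.trainOn(labeledStream)                              // trainOn expects DStream[LabeledPoint]
model.predictOn(labeledStream.map(_.features)).print()    // map to DStream[Vector] before predicting

// Or keep each label next to its prediction:
model.predictOnValues(labeledStream.map(lp => (lp.label, lp.features))).print()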
Re: required: org.apache.spark.streaming.dstream.DStream[org.apache.spark.mllib.linalg.Vector]
Hi,

line  99: model.trainOn(labeledStream)
line 100: model.predictOn(labeledStream).print()
line 101: ssc.start()
line 102: ssc.awaitTermination()

Regards

On Sun, Jun 28, 2015 at 10:53 PM, Ted Yu wrote:
> Can you show us your code around line 100?
>
> Which Spark release are you compiling against?
>
> Cheers
Re: required: org.apache.spark.streaming.dstream.DStream[org.apache.spark.mllib.linalg.Vector]
Also, my Spark is 1.4.

On Mon, Jun 29, 2015 at 9:02 AM, Arthur Chan wrote:
> Hi,
>
> line  99: model.trainOn(labeledStream)
> line 100: model.predictOn(labeledStream).print()
> line 101: ssc.start()
> line 102: ssc.awaitTermination()
>
> Regards
java.lang.IllegalStateException: unread block data
Hi,

I use Spark 1.4. When saving the model to HDFS, I got the following error. Please help!

Regards

My scala command:

sc.makeRDD(model.clusterCenters, 10).saveAsObjectFile("/tmp/tweets/model")

The error log:

15/07/14 18:27:40 INFO SequenceFileRDDFunctions: Saving as sequence file of type (NullWritable,BytesWritable)
15/07/14 18:27:40 INFO SparkContext: Starting job: saveAsObjectFile at :45
15/07/14 18:27:40 INFO DAGScheduler: Got job 110 (saveAsObjectFile at :45) with 10 output partitions (allowLocal=false)
15/07/14 18:27:40 INFO DAGScheduler: Final stage: ResultStage 174 (saveAsObjectFile at :45)
15/07/14 18:27:40 INFO DAGScheduler: Parents of final stage: List()
15/07/14 18:27:40 INFO DAGScheduler: Missing parents: List()
15/07/14 18:27:40 INFO DAGScheduler: Submitting ResultStage 174 (MapPartitionsRDD[258] at saveAsObjectFile at :45), which has no missing parents
15/07/14 18:27:40 INFO MemoryStore: ensureFreeSpace(135360) called with curMem=14724380, maxMem=280248975
15/07/14 18:27:40 INFO MemoryStore: Block broadcast_256 stored as values in memory (estimated size 132.2 KB, free 253.1 MB)
15/07/14 18:27:40 INFO MemoryStore: ensureFreeSpace(46231) called with curMem=14859740, maxMem=280248975
15/07/14 18:27:40 INFO MemoryStore: Block broadcast_256_piece0 stored as bytes in memory (estimated size 45.1 KB, free 253.1 MB)
15/07/14 18:27:40 INFO BlockManagerInfo: Added broadcast_256_piece0 in memory on localhost:52681 (size: 45.1 KB, free: 263.1 MB)
15/07/14 18:27:40 INFO SparkContext: Created broadcast 256 from broadcast at DAGScheduler.scala:874
15/07/14 18:27:40 INFO DAGScheduler: Submitting 10 missing tasks from ResultStage 174 (MapPartitionsRDD[258] at saveAsObjectFile at :45)
15/07/14 18:27:40 INFO TaskSchedulerImpl: Adding task set 174.0 with 10 tasks
15/07/14 18:27:40 INFO TaskSetManager: Starting task 0.0 in stage 174.0 (TID 4513, localhost, PROCESS_LOCAL, 9486 bytes)
15/07/14 18:27:40 INFO TaskSetManager: Starting task 1.0 in stage 174.0 (TID 4514, localhost, PROCESS_LOCAL, 9486 bytes)
15/07/14 18:27:40 INFO TaskSetManager: Starting task 2.0 in stage 174.0 (TID 4515, localhost, PROCESS_LOCAL, 9486 bytes)
15/07/14 18:27:40 INFO TaskSetManager: Starting task 3.0 in stage 174.0 (TID 4516, localhost, PROCESS_LOCAL, 9486 bytes)
15/07/14 18:27:40 INFO TaskSetManager: Starting task 4.0 in stage 174.0 (TID 4517, localhost, PROCESS_LOCAL, 9486 bytes)
15/07/14 18:27:40 INFO TaskSetManager: Starting task 5.0 in stage 174.0 (TID 4518, localhost, PROCESS_LOCAL, 9486 bytes)
15/07/14 18:27:40 INFO TaskSetManager: Starting task 6.0 in stage 174.0 (TID 4519, localhost, PROCESS_LOCAL, 9486 bytes)
15/07/14 18:27:40 INFO TaskSetManager: Starting task 7.0 in stage 174.0 (TID 4520, localhost, PROCESS_LOCAL, 9486 bytes)
15/07/14 18:27:40 INFO TaskSetManager: Starting task 8.0 in stage 174.0 (TID 4521, localhost, PROCESS_LOCAL, 9486 bytes)
15/07/14 18:27:40 INFO TaskSetManager: Starting task 9.0 in stage 174.0 (TID 4522, localhost, PROCESS_LOCAL, 9486 bytes)
15/07/14 18:27:40 INFO Executor: Running task 0.0 in stage 174.0 (TID 4513)
15/07/14 18:27:40 INFO Executor: Running task 1.0 in stage 174.0 (TID 4514)
15/07/14 18:27:40 INFO Executor: Running task 2.0 in stage 174.0 (TID 4515)
15/07/14 18:27:40 INFO Executor: Running task 3.0 in stage 174.0 (TID 4516)
15/07/14 18:27:40 INFO Executor: Running task 4.0 in stage 174.0 (TID 4517)
15/07/14 18:27:40 INFO Executor: Running task 5.0 in stage 174.0 (TID 4518)
15/07/14 18:27:40 INFO Executor: Running task 6.0 in stage 174.0 (TID 4519)
15/07/14 18:27:40 INFO Executor: Running task 7.0 in stage 174.0 (TID 4520)
15/07/14 18:27:40 INFO Executor: Running task 8.0 in stage 174.0 (TID 4521)
15/07/14 18:27:40 ERROR Executor: Exception in task 1.0 in stage 174.0 (TID 4514)
java.lang.IllegalStateException: unread block data
	at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2424)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1383)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:69)
	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:95)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
	at org.apache.spark.scheduler.Task.run(Task.scala:70)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
15/07/14 18:27:40 ERROR
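For what it's worth, clusterCenters suggests model is an MLlib KMeansModel. If so, a simpler sketch (assuming Spark 1.4+, where KMeansModel supports save/load, and a hypothetical output path) is to persist the whole model rather than hand-rolling saveAsObjectFile on the centers:

import org.apache.spark.mllib.clustering.KMeansModel

// Persist the fitted model to HDFS and read it back later.
model.save(sc, "/tmp/tweets/kmeans-model")
val reloaded = KMeansModel.load(sc, "/tmp/tweets/kmeans-model")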
Re: java.lang.IllegalStateException: unread block data
Hi, below is the log from the worker.

15/07/14 17:18:56 ERROR FileAppender: Error writing stream to file /spark/app-20150714171703-0004/5/stderr
java.io.IOException: Stream closed
	at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:170)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:283)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
	at java.io.FilterInputStream.read(FilterInputStream.java:107)
	at org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70)
	at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
	at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
	at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)
	at org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)
15/07/14 17:18:57 INFO Worker: Executor app-20150714171703-0004/5 finished with state KILLED exitStatus 143
15/07/14 17:18:57 INFO Worker: Cleaning up local directories for application app-20150714171703-0004
15/07/14 17:18:57 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkExecutor@10.10.10.1:52635] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
Re: java.lang.IllegalStateException: unread block data
I found the reason; it was related to sc (the SparkContext). Thanks.

On Tue, Jul 14, 2015 at 9:45 PM, Akhil Das wrote:
> Someone else also reported this error with Spark 1.4.0.
>
> Thanks
> Best Regards
Which Hive version should be used with Spark 1.5.2?
Hi, I plan to upgrade from 1.4.1 (+ Hive 1.1.0) to 1.5.2. Is there any upgrade document available, in particular about which Hive version Spark 1.5.2 should be used with, and whether Hive needs to be upgraded too? Regards
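For context, a hedged note: Spark 1.5.x ships with a built-in Hive 1.2.1 for query execution, and it can connect to an existing Hive metastore of an older version through configuration, so a metastore upgrade is not necessarily required. A minimal sketch, assuming the metastore stays on Hive 1.1.0 and the isolated-client jars are fetched from Maven:

spark-shell --conf spark.sql.hive.metastore.version=1.1.0 \
            --conf spark.sql.hive.metastore.jars=maven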
word2vec cosineSimilarity
Hi,

I am trying the sample word2vec code from http://spark.apache.org/docs/latest/mllib-feature-extraction.html#example

Following are my test results:

scala> for ((synonym, cosineSimilarity) <- synonyms) {
     |   println(s"$synonym $cosineSimilarity")
     | }
taiwan 2.0518918365726297
japan 1.8960962308732054
korea 1.8789320149319788
thailand 1.7549218525671182
mongolia 1.7375501108635814

The cosineSimilarity values I get are all greater than 1. Shouldn't cosine similarity be between 0 and 1? How can I get similarity values in the 0 to 1 range?

Regards
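For reference, as far as I can tell the scores returned by findSynonyms in some Spark 1.x releases are not true cosine similarities (the word vectors are not normalised before the comparison), which is why values above 1 show up. A sketch for computing a bounded cosine similarity directly, assuming model is the fitted Word2VecModel and that the compared words are in its vocabulary:

import org.apache.spark.mllib.linalg.Vector

// Cosine similarity of two MLlib vectors; the result always falls in [-1, 1].
def cosineSimilarity(a: Vector, b: Vector): Double = {
  val x = a.toArray
  val y = b.toArray
  val dot   = x.zip(y).map { case (u, v) => u * v }.sum
  val normX = math.sqrt(x.map(u => u * u).sum)
  val normY = math.sqrt(y.map(u => u * u).sum)
  dot / (normX * normY)
}

// Example: compare the vectors of two words from the results above.
val sim = cosineSimilarity(model.transform("taiwan"), model.transform("japan"))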
Which Hive version should be used for Spark 1.3
Hi, I use Hive 0.12 with Spark 1.2 at the moment and plan to upgrade to Spark 1.3.x. Could anyone advise which Hive version should be used to match Spark 1.3.x? Can I use Hive 1.1.0 with Spark 1.3, or can I use Hive 0.14 with Spark 1.3? Regards Arthur