required: org.apache.spark.streaming.dstream.DStream[org.apache.spark.mllib.linalg.Vector]

2015-06-28 Thread Arthur Chan
Hi,

I am trying out Spark with some sample programs.


In my code, the following items are imported:

import org.apache.spark.mllib.regression.{StreamingLinearRegressionWithSGD,
LabeledPoint}

import org.apache.spark.mllib.regression.{StreamingLinearRegressionWithSGD}

import org.apache.spark.streaming.{Seconds, StreamingContext}

import scala.util.Random

I got the following error:

[error] StreamingModel.scala:100: type mismatch;

[error]  found   :
org.apache.spark.streaming.dstream.DStream[org.apache.spark.mllib.regression.LabeledPoint]

[error]  required:
org.apache.spark.streaming.dstream.DStream[org.apache.spark.mllib.linalg.Vector]

[error] model.predictOn(labeledStream).print()

[error] ^

[error] one error found

[error] (compile:compile) Compilation failed


Any idea?


Regards


Re: required: org.apache.spark.streaming.dstream.DStream[org.apache.spark.mllib.linalg.Vector]

2015-06-28 Thread Arthur Chan
Hi,


line 99:model.trainOn(labeledStream)

line 100: model.predictOn(labeledStream).print()

line 101:ssc.start()

line 102: ssc.awaitTermination()
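
A hedged sketch of the likely fix, for reference: trainOn accepts a
DStream[LabeledPoint], but predictOn expects a DStream[Vector], so the
feature vectors have to be mapped out of the labelled stream first. This
assumes labeledStream is the DStream[LabeledPoint] from the code above:

import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.regression.LabeledPoint

// training consumes the labelled stream as-is
model.trainOn(labeledStream)

// prediction needs bare feature vectors, so strip the labels
model.predictOn(labeledStream.map(_.features)).print()

// alternatively, keep each label next to its prediction
model.predictOnValues(labeledStream.map(lp => (lp.label, lp.features))).print()

predictOnValues pairs a key of your choice with each vector, which is handy
for comparing predictions against the true labels.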


Regards

On Sun, Jun 28, 2015 at 10:53 PM, Ted Yu  wrote:

> Can you show us your code around line 100 ?
>
> Which Spark release are you compiling against ?
>
> Cheers
>


Re: required: org.apache.spark.streaming.dstream.DStream[org.apache.spark.mllib.linalg.Vector]

2015-06-28 Thread Arthur Chan
Also, my Spark version is 1.4.



java.lang.IllegalStateException: unread block data

2015-07-14 Thread Arthur Chan
Hi,

I use Spark 1.4. When saving the model to HDFS, I got an error.

Please help!
Regards



My Scala command:
sc.makeRDD(model.clusterCenters,10).saveAsObjectFile("/tmp/tweets/model")

The error log:

15/07/14 18:27:40 INFO SequenceFileRDDFunctions: Saving as sequence file of
type (NullWritable,BytesWritable)

15/07/14 18:27:40 INFO SparkContext: Starting job: saveAsObjectFile at
<console>:45

15/07/14 18:27:40 INFO DAGScheduler: Got job 110 (saveAsObjectFile at
<console>:45) with 10 output partitions (allowLocal=false)

15/07/14 18:27:40 INFO DAGScheduler: Final stage: ResultStage
174(saveAsObjectFile at <console>:45)

15/07/14 18:27:40 INFO DAGScheduler: Parents of final stage: List()

15/07/14 18:27:40 INFO DAGScheduler: Missing parents: List()

15/07/14 18:27:40 INFO DAGScheduler: Submitting ResultStage 174
(MapPartitionsRDD[258] at saveAsObjectFile at <console>:45), which has no
missing parents

15/07/14 18:27:40 INFO MemoryStore: ensureFreeSpace(135360) called with
curMem=14724380, maxMem=280248975

15/07/14 18:27:40 INFO MemoryStore: Block broadcast_256 stored as values in
memory (estimated size 132.2 KB, free 253.1 MB)

15/07/14 18:27:40 INFO MemoryStore: ensureFreeSpace(46231) called with
curMem=14859740, maxMem=280248975

15/07/14 18:27:40 INFO MemoryStore: Block broadcast_256_piece0 stored as
bytes in memory (estimated size 45.1 KB, free 253.1 MB)

15/07/14 18:27:40 INFO BlockManagerInfo: Added broadcast_256_piece0 in
memory on localhost:52681 (size: 45.1 KB, free: 263.1 MB)

15/07/14 18:27:40 INFO SparkContext: Created broadcast 256 from broadcast
at DAGScheduler.scala:874

15/07/14 18:27:40 INFO DAGScheduler: Submitting 10 missing tasks from
ResultStage 174 (MapPartitionsRDD[258] at saveAsObjectFile at <console>:45)

15/07/14 18:27:40 INFO TaskSchedulerImpl: Adding task set 174.0 with 10
tasks

15/07/14 18:27:40 INFO TaskSetManager: Starting task 0.0 in stage 174.0
(TID 4513, localhost, PROCESS_LOCAL, 9486 bytes)

15/07/14 18:27:40 INFO TaskSetManager: Starting task 1.0 in stage 174.0
(TID 4514, localhost, PROCESS_LOCAL, 9486 bytes)

15/07/14 18:27:40 INFO TaskSetManager: Starting task 2.0 in stage 174.0
(TID 4515, localhost, PROCESS_LOCAL, 9486 bytes)

15/07/14 18:27:40 INFO TaskSetManager: Starting task 3.0 in stage 174.0
(TID 4516, localhost, PROCESS_LOCAL, 9486 bytes)

15/07/14 18:27:40 INFO TaskSetManager: Starting task 4.0 in stage 174.0
(TID 4517, localhost, PROCESS_LOCAL, 9486 bytes)

15/07/14 18:27:40 INFO TaskSetManager: Starting task 5.0 in stage 174.0
(TID 4518, localhost, PROCESS_LOCAL, 9486 bytes)

15/07/14 18:27:40 INFO TaskSetManager: Starting task 6.0 in stage 174.0
(TID 4519, localhost, PROCESS_LOCAL, 9486 bytes)

15/07/14 18:27:40 INFO TaskSetManager: Starting task 7.0 in stage 174.0
(TID 4520, localhost, PROCESS_LOCAL, 9486 bytes)

15/07/14 18:27:40 INFO TaskSetManager: Starting task 8.0 in stage 174.0
(TID 4521, localhost, PROCESS_LOCAL, 9486 bytes)

15/07/14 18:27:40 INFO TaskSetManager: Starting task 9.0 in stage 174.0
(TID 4522, localhost, PROCESS_LOCAL, 9486 bytes)

15/07/14 18:27:40 INFO Executor: Running task 0.0 in stage 174.0 (TID 4513)

15/07/14 18:27:40 INFO Executor: Running task 1.0 in stage 174.0 (TID 4514)

15/07/14 18:27:40 INFO Executor: Running task 2.0 in stage 174.0 (TID 4515)

15/07/14 18:27:40 INFO Executor: Running task 3.0 in stage 174.0 (TID 4516)

15/07/14 18:27:40 INFO Executor: Running task 4.0 in stage 174.0 (TID 4517)

15/07/14 18:27:40 INFO Executor: Running task 5.0 in stage 174.0 (TID 4518)

15/07/14 18:27:40 INFO Executor: Running task 6.0 in stage 174.0 (TID 4519)

15/07/14 18:27:40 INFO Executor: Running task 7.0 in stage 174.0 (TID 4520)

15/07/14 18:27:40 INFO Executor: Running task 8.0 in stage 174.0 (TID 4521)

15/07/14 18:27:40 ERROR Executor: Exception in task 1.0 in stage 174.0 (TID
4514)

java.lang.IllegalStateException: unread block data

at
java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2424)

at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1383)

at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)

at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)

at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)

at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)

at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)

at
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:69)

at
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:95)

at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)

at org.apache.spark.scheduler.Task.run(Task.scala:70)

at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)

at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

15/07/14 18:27:40 ERROR 

Re: java.lang.IllegalStateException: unread block data

2015-07-14 Thread Arthur Chan
Hi, below is the log from the worker.


15/07/14 17:18:56 ERROR FileAppender: Error writing stream to file
/spark/app-20150714171703-0004/5/stderr

java.io.IOException: Stream closed

at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:170)

at java.io.BufferedInputStream.read1(BufferedInputStream.java:283)

at java.io.BufferedInputStream.read(BufferedInputStream.java:345)

at java.io.FilterInputStream.read(FilterInputStream.java:107)

at
org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70)

at
org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)

at
org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)

at
org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)

at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)

at
org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)

15/07/14 17:18:57 INFO Worker: Executor app-20150714171703-0004/5 finished
with state KILLED exitStatus 143

15/07/14 17:18:57 INFO Worker: Cleaning up local directories for
application app-20150714171703-0004

15/07/14 17:18:57 WARN ReliableDeliverySupervisor: Association with remote
system [akka.tcp://sparkExecutor@10.10.10.1:52635] has failed, address is
now gated for [5000] ms. Reason is: [Disassociated].


Re: java.lang.IllegalStateException: unread block data

2015-07-14 Thread Arthur Chan
I found the reason: it was related to the SparkContext (sc). Thanks
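
For reference, a hedged sketch, assuming model here is a KMeansModel (the
clusterCenters call suggests so) and sc is a healthy SparkContext: Spark 1.4
added Saveable support to KMeansModel, which avoids hand-rolling an object
file from the cluster centers.

import org.apache.spark.mllib.clustering.KMeansModel

// saves the whole model (metadata plus centers) under the given path
model.save(sc, "/tmp/tweets/model")

// load it back later
val sameModel = KMeansModel.load(sc, "/tmp/tweets/model")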

On Tue, Jul 14, 2015 at 9:45 PM, Akhil Das 
wrote:

> Someone else also reported this error with spark 1.4.0
>
> Thanks
> Best Regards
>


Which Hive version should be used with Spark 1.5.2?

2015-12-22 Thread Arthur Chan
Hi,

I plan to upgrade from 1.4.1 (+ Hive 1.1.0) to 1.5.2. Is there any upgrade
document available, in particular about which Hive version should be used
with it?

Regards


word2vec cosineSimilarity

2015-10-15 Thread Arthur Chan
Hi,

I am trying the word2vec sample from
http://spark.apache.org/docs/latest/mllib-feature-extraction.html#example

Following are my test results:

scala> for((synonym, cosineSimilarity) <- synonyms) {
 |   println(s"$synonym $cosineSimilarity")
 | }
taiwan 2.0518918365726297
japan 1.8960962308732054
korea 1.8789320149319788
thailand 1.7549218525671182
mongolia 1.7375501108635814


The cosineSimilarity values I get are all greater than 1. Shouldn't the
cosine similarity values be between 0 and 1?

How can I get similarity values in the 0-to-1 range?
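
A hedged workaround sketch: compute the cosine from the model's raw word
vectors with explicit normalization. This assumes model is the Word2VecModel
from the linked example and that "china" was the query word (both are
assumptions here):

import org.apache.spark.mllib.feature.Word2VecModel

// plain cosine similarity with explicit normalization; result is in [-1, 1]
def cosine(a: Array[Float], b: Array[Float]): Double = {
  val dot = a.zip(b).map { case (x, y) => x * y }.sum
  val normA = math.sqrt(a.map(x => x * x).sum.toDouble)
  val normB = math.sqrt(b.map(x => x * x).sum.toDouble)
  dot / (normA * normB)
}

val vectors = model.getVectors  // Map[String, Array[Float]]
println(cosine(vectors("china"), vectors("taiwan")))

Note that cosine similarity naturally ranges over [-1, 1], not [0, 1];
values above 1 would be explained by findSynonyms scoring against vectors
that are not unit-normalized in this Spark version.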

Regards


Which Hive version should be used for Spark 1.3

2015-04-09 Thread Arthur Chan
Hi,

I use Hive 0.12 for Spark 1.2 at the moment and plan to upgrade to Spark
1.3.x

Could anyone advise which Hive version should be used to match Spark 1.3.x?
Can I use Hive 1.1.0 for Spark 1.3? Or can I use Hive 0.14?

Regards
Arthur