Thanks Zhan. I'm also confused by the jstack output: why does the driver get
stuck at "org.apache.spark.SparkContext.clean"?
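From the trace, clean() ends up in ClosureCleaner.ensureSerializable, which pushes the whole closure through Java serialization just to prove it is serializable. If the closure's object graph reaches the big syn0Global/syn1Global arrays, every element gets written out (the writeFloats frames in the trace), which would pin one core. A minimal sketch outside Spark, with a made-up array size, showing the cost of that serialization pass:

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

object CleanCostSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical size, far smaller than a real model, just for illustration.
    val vocabSize  = 1000
    val vectorSize = 100
    val syn0Global = new Array[Float](vocabSize * vectorSize)

    // ensureSerializable effectively does this: serialize the closure's
    // whole object graph. Here we serialize the array directly; in
    // Word2Vec.fit the array is reachable from the captured closure.
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    oos.writeObject(syn0Global)
    oos.close()

    // Every Float is written: at least 4 bytes per element.
    println(bos.size() >= 4 * vocabSize * vectorSize)  // true
  }
}
```

So the driver is not deadlocked, it is busy serializing a very large closure once per iteration of the fit loop.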

On Tue, Jan 6, 2015 at 2:10 PM, Zhan Zhang <zzh...@hortonworks.com> wrote:

> I think it is an integer overflow. The training data is quite big, and the
> algorithm's scalability depends heavily on vocabSize. Even without the
> overflow there are still other bottlenecks, for example syn0Global and
> syn1Global: each of them has vocabSize * vectorSize elements.
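To put rough numbers on that, here is a back-of-the-envelope estimate with hypothetical sizes (plug in your real vocabSize and vectorSize):

```scala
object FootprintSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical sizes, just for illustration.
    val vocabSize  = 10000000L  // 10M distinct words after min-count filtering
    val vectorSize = 100L

    // syn0Global and syn1Global are Array[Float]: 4 bytes per element.
    val bytesPerArray = vocabSize * vectorSize * 4L
    println(f"each array: ${bytesPerArray / math.pow(2, 30)}%.2f GiB")
    // Both arrays live on the driver heap and travel with the closure,
    // so the driver pays roughly twice this much memory plus the
    // serialization time for all of it.
  }
}
```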
>
> Thanks.
>
> Zhan Zhang
>
>
>
> On Jan 5, 2015, at 7:47 PM, Eric Zhen <zhpeng...@gmail.com> wrote:
>
> Hi Xiangrui,
>
> Our dataset is about 80GB(10B lines).
>
> In the driver's log, we found this:
>
> *INFO Word2Vec: trainWordsCount = -1610413239*
>
> It seems there is an integer overflow?
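That negative value is consistent with 32-bit wraparound: if the true total word count exceeds Int.MaxValue, an Int counter wraps modulo 2^32. For example, one 64-bit count whose low 32 bits reproduce the logged number (the actual count could differ by any multiple of 2^32):

```scala
object OverflowSketch {
  def main(args: Array[String]): Unit = {
    // A hypothetical true count; 2684554057 - 2^32 = -1610413239.
    val trueCount = 2684554057L
    println(trueCount > Int.MaxValue)  // true: it cannot fit in an Int
    println(trueCount.toInt)           // prints -1610413239, as in the log
  }
}
```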
>
>
> On Tue, Jan 6, 2015 at 5:44 AM, Xiangrui Meng <men...@gmail.com> wrote:
>
>> How big is your dataset, and what is the vocabulary size? -Xiangrui
>>
>> On Sun, Jan 4, 2015 at 11:18 PM, Eric Zhen <zhpeng...@gmail.com> wrote:
>> > Hi,
>> >
>> > When we run MLlib word2vec (spark-1.1.0), the driver gets stuck at 100%
>> > CPU usage. Here is the jstack output:
>> >
>> > "main" prio=10 tid=0x0000000040112800 nid=0x46f2 runnable
>> > [0x000000004162e000]
>> >    java.lang.Thread.State: RUNNABLE
>> >         at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1847)
>> >         at java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1778)
>> >         at java.io.DataOutputStream.writeInt(DataOutputStream.java:182)
>> >         at java.io.DataOutputStream.writeFloat(DataOutputStream.java:225)
>> >         at java.io.ObjectOutputStream$BlockDataOutputStream.writeFloats(ObjectOutputStream.java:2064)
>> >         at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1310)
>> >         at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1154)
>> >         at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1518)
>> >         at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1483)
>> >         at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1400)
>> >         at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1158)
>> >         at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1518)
>> >         at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1483)
>> >         at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1400)
>> >         at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1158)
>> >         at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1518)
>> >         at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1483)
>> >         at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1400)
>> >         at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1158)
>> >         at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:330)
>> >         at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:42)
>> >         at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73)
>> >         at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:164)
>> >         at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
>> >         at org.apache.spark.SparkContext.clean(SparkContext.scala:1242)
>> >         at org.apache.spark.rdd.RDD.mapPartitionsWithIndex(RDD.scala:610)
>> >         at org.apache.spark.mllib.feature.Word2Vec$$anonfun$fit$1.apply$mcVI$sp(Word2Vec.scala:291)
>> >         at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
>> >         at org.apache.spark.mllib.feature.Word2Vec.fit(Word2Vec.scala:290)
>> >         at com.baidu.inf.WordCount$.main(WordCount.scala:31)
>> >         at com.baidu.inf.WordCount.main(WordCount.scala)
>> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >         at java.lang.reflect.Method.invoke(Method.java:597)
>> >         at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
>> >         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
>> >         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>> >
>> > --
>> > Best Regards
>>
>
>
>
> --
> Best Regards
>
>
>




-- 
Best Regards
