It's possible this was caused by incorrect Graph creation, fixed in
[SPARK-13355].
Could you retry your dataset using the current master to see if the problem
is fixed? Thanks!
On Tue, Jan 19, 2016 at 5:31 AM, Li Li wrote:
I have modified my code. I can now get the total vocabulary size and the
index and frequency arrays from the JsonObject:
JsonArray idxArr = jo.get("idxArr").getAsJsonArray();
JsonArray freqArr = jo.get("freqArr").getAsJsonArray();
int total = jo.get("vocabSize").getAsInt();
I got it. I mistakenly thought that each line was a word-id list.
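For reference, a minimal sketch (plain Java; the `idxArr`/`freqArr`/`vocabSize` field names come from the snippet above, but the helper itself is made up) of turning the parsed index/frequency pairs into a vocabulary-sized count array:

```java
import java.util.Arrays;

public class TermCounts {
    // Build a dense term-count array of length vocabSize from parallel
    // index/frequency arrays (as extracted from idxArr/freqArr above).
    // Terms that never occur in the document keep a zero count.
    static double[] toDenseCounts(int vocabSize, int[] idx, double[] freq) {
        double[] counts = new double[vocabSize];
        for (int i = 0; i < idx.length; i++) {
            counts[idx[i]] += freq[i];
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] idx = {0, 3, 7};
        double[] freq = {2.0, 1.0, 5.0};
        // With vocabSize = 10 the result has length 10, not 3.
        double[] counts = toDenseCounts(10, idx, freq);
        System.out.println(Arrays.toString(counts));
        // In Spark, the same parallel arrays could instead be passed to
        // org.apache.spark.mllib.linalg.Vectors.sparse(10, idx, freq).
    }
}
```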
On Fri, Jan 15, 2016 at 3:24 AM, Bryan Cutler wrote:
What I mean is the input to LDA.run() is an RDD[(Long, Vector)], and the
Vector is a vector of counts of each term; it should be the same size as
the vocabulary (so if the vocabulary, or dictionary, has 10 words, each
vector should have a size of 10). This probably means that there will be
some eleme
> It looks like the problem is the vectors of term counts in the corpus
> are not always the vocabulary size.
Do you mean some integers that do not occur in the corpus?
For example, suppose the dictionary is 0-9 (10 words in total).
The docs are:
0 2 4 6 8
1 3 5 7 9
Then it will be correct.
If the docs are:
0 2
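A quick sketch of how those lines would map to LDA input (plain Java; the format, one document per line of whitespace-separated integer word ids, is assumed from the example above): the short document "0 2" still produces a count vector of length 10, with zeros for the eight absent words.

```java
import java.util.Arrays;

public class BagOfWords {
    // Turn one document line of whitespace-separated integer word ids
    // into a count vector of length vocabSize; absent ids stay at zero.
    static double[] lineToCounts(String line, int vocabSize) {
        double[] counts = new double[vocabSize];
        for (String tok : line.trim().split("\\s+")) {
            counts[Integer.parseInt(tok)] += 1.0;
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] v = lineToCounts("0 2", 10);
        System.out.println(Arrays.toString(v));
        // -> [1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    }
}
```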
I was now able to reproduce the exception using the master branch and local
mode. It looks like the problem is the vectors of term counts in the
corpus are not always the vocabulary size. Once I padded these with zero
counts to the vocab size, it ran without the exception.
Joseph, I also tried c
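The zero-padding step described above can be sketched like this (plain Java; the helper name is made up, and in the real pipeline the padded array would be wrapped in an MLlib Vector before calling LDA.run):

```java
import java.util.Arrays;

public class PadCounts {
    // Pad a term-count array with trailing zeros up to the vocabulary size,
    // so every document vector has the same length.
    static double[] padToVocab(double[] counts, int vocabSize) {
        return counts.length >= vocabSize
                ? counts
                : Arrays.copyOf(counts, vocabSize); // new slots default to 0.0
    }

    public static void main(String[] args) {
        double[] padded = padToVocab(new double[]{3.0, 1.0}, 5);
        System.out.println(Arrays.toString(padded));
        // -> [3.0, 1.0, 0.0, 0.0, 0.0]
    }
}
```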
I will try Spark 1.6.0 to see whether it is a bug in 1.5.2.
On Wed, Jan 13, 2016 at 3:58 PM, Li Li wrote:
> I have set up a standalone Spark cluster and used the same code. It
> still failed with the same exception.
> I also preprocessed the data to lines of integers and used the Scala
> codes of lda ex
I am running it in 1.5.2. I will try running it in a small standalone
cluster to see whether it's correct.
On Sat, Jan 9, 2016 at 6:21 AM, Bryan Cutler wrote:
Hi Li,
I tried out your code and sample data in both local mode and Spark
Standalone and it ran correctly with output that looks good. Sorry, I
don't have a YARN cluster setup right now, so maybe the error you are
seeing is specific to that. Btw, I am running the latest Spark code from
the maste
Could anyone help? The problem is very easy to reproduce. What's wrong?
On Wed, Dec 30, 2015 at 8:59 PM, Li Li wrote:
I used a small dataset and reproduced the problem.
But I don't know whether my code is correct, because I am not familiar
with Spark.
So I will first post my code here. If it's correct, then I will post the data.
One line of my data looks like:
{ "time":"08-09-17","cmtUrl":"2094361"
,"rvId":"rev_1020","webpa
I will try with a portion of the data. Will the HDFS blocks affect
Spark? (If so, it's hard to reproduce.)
On Wed, Dec 30, 2015 at 3:22 AM, Joseph Bradley wrote:
Hi Li,
I'm wondering if you're running into the same bug reported here:
https://issues.apache.org/jira/browse/SPARK-12488
I haven't figured out yet what is causing it. Do you have a small corpus
which reproduces this error, and which you can share on the JIRA? If so,
that would help a lot in de
I ran my LDA example on a YARN 2.6.2 cluster with Spark 1.5.2.
It throws an exception at the line: Matrix topics = ldaModel.topicsMatrix();
But in the YARN job history UI it's successful. What's wrong with it?
I submit the job with
./bin/spark-submit --class Myclass \
--master yarn-client \
--num-execut