Hi Manoj,

Done.

https://issues.apache.org/jira/browse/SPARK-9277
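For reference, the root cause in the repro below is that the dict passed to SparseVector carries indices (up to 5) that exceed the declared size (2), and nothing rejects them at construction time. A minimal sketch of the kind of bounds check the fix would add — the function name here is hypothetical, not actual pyspark API:

```python
def check_sparse_indices(size, entries):
    """Sketch of a construction-time check: every index in `entries`
    (a dict mapping index -> value, as passed to SparseVector)
    must lie in the range [0, size)."""
    bad = [i for i in entries if not (0 <= i < size)]
    if bad:
        raise ValueError(
            "indices %s out of range for vector of size %d" % (sorted(bad), size))

# The vector from the report: declared size 2, but indices run up to 5,
# so this raises ValueError instead of deferring the failure to the JVM.
# check_sparse_indices(2, {1: 1, 2: 2, 3: 3, 4: 4, 5: 5})
```

With a check like this, the bad input fails fast in Python rather than surfacing later as an ArrayIndexOutOfBoundsException inside a Spark stage.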

On Thu, Jul 23, 2015 at 1:02 PM, Manoj Kumar <manojkumarsivaraj...@gmail.com
> wrote:

> Hi,
>
> I think this should raise an error both in the scala code and python API.
>
> Please open a JIRA.
>
> On Thu, Jul 23, 2015 at 4:22 PM, Andrew Vykhodtsev <yoz...@gmail.com>
> wrote:
>
>> Dear Developers,
>>
>> I found that one can create a SparseVector inconsistently, and it will lead
>> to a Java error at runtime, for example when training
>> LogisticRegressionWithSGD.
>>
>> Here is the test case:
>>
>>
>> In [2]:
>> sc.version
>> Out[2]:
>> u'1.3.1'
>> In [13]:
>> from pyspark.mllib.linalg import SparseVector
>> from pyspark.mllib.regression import LabeledPoint
>> from pyspark.mllib.classification import LogisticRegressionWithSGD
>> In [3]:
>> x = SparseVector(2, {1: 1, 2: 2, 3: 3, 4: 4, 5: 5})
>> In [10]:
>> l = LabeledPoint(0, x)
>> In [12]:
>> r = sc.parallelize([l])
>> In [14]:
>> m = LogisticRegressionWithSGD.train(r)
>>
>> Error:
>>
>>
>> Py4JJavaError: An error occurred while calling 
>> o86.trainLogisticRegressionModelWithSGD.
>> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 
>> in stage 11.0 failed 1 times, most recent failure: Lost task 7.0 in stage 
>> 11.0 (TID 47, localhost): *java.lang.ArrayIndexOutOfBoundsException: 2*
>>
>>
>>
>> Attached is the notebook with the scenario and the full message:
>>
>>
>>
>> Should I raise a JIRA for this? (Forgive me if such a JIRA already exists 
>> and I did not notice it.)
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> For additional commands, e-mail: dev-h...@spark.apache.org
>>
>
>
>
> --
> Godspeed,
> Manoj Kumar,
> http://manojbits.wordpress.com
> http://github.com/MechCoder
>
