[
https://issues.apache.org/jira/browse/SPARK-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15651496#comment-15651496
]
Saikat Kanjilal commented on SPARK-9487:
----------------------------------------
OK, I have moved on to Python. I am attaching a log of the test errors that
occur after changing local[2] to local[4] in the Python ml module:
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
[Stage 49:>                  (0 + 3) / 3]
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
**********************************************************************
File "/Users/skanjila/code/opensource/spark/python/pyspark/ml/clustering.py", line 98, in __main__.GaussianMixture
Failed example:
model.gaussiansDF.show()
Expected:
+--------------------+--------------------+
| mean| cov|
+--------------------+--------------------+
|[0.82500000140229...|0.005625000000006...|
|[-0.4777098016092...|0.167969502720916...|
|[-0.4472625243352...|0.167304119758233...|
+--------------------+--------------------+
...
Got:
+--------------------+--------------------+
| mean| cov|
+--------------------+--------------------+
|[-0.6158006194417...|0.132188091748508...|
|[0.54523101952701...|0.159129291449328...|
|[0.54042985246699...|0.161430620150745...|
+--------------------+--------------------+
<BLANKLINE>
**********************************************************************
File "/Users/skanjila/code/opensource/spark/python/pyspark/ml/clustering.py", line 123, in __main__.GaussianMixture
Failed example:
model2.gaussiansDF.show()
Expected:
+--------------------+--------------------+
| mean| cov|
+--------------------+--------------------+
|[0.82500000140229...|0.005625000000006...|
|[-0.4777098016092...|0.167969502720916...|
|[-0.4472625243352...|0.167304119758233...|
+--------------------+--------------------+
...
Got:
+--------------------+--------------------+
| mean| cov|
+--------------------+--------------------+
|[-0.6158006194417...|0.132188091748508...|
|[0.54523101952701...|0.159129291449328...|
|[0.54042985246699...|0.161430620150745...|
+--------------------+--------------------+
<BLANKLINE>
**********************************************************************
File "/Users/skanjila/code/opensource/spark/python/pyspark/ml/clustering.py", line 656, in __main__.LDA
Failed example:
model.describeTopics().show()
Expected:
+-----+-----------+--------------------+
|topic|termIndices| termWeights|
+-----+-----------+--------------------+
| 0| [1, 0]|[0.50401530077160...|
| 1| [0, 1]|[0.50401530077160...|
+-----+-----------+--------------------+
...
Got:
+-----+-----------+--------------------+
|topic|termIndices| termWeights|
+-----+-----------+--------------------+
| 0| [1, 0]|[0.50010191915681...|
| 1| [0, 1]|[0.50010191915681...|
+-----+-----------+--------------------+
<BLANKLINE>
**********************************************************************
File "/Users/skanjila/code/opensource/spark/python/pyspark/ml/clustering.py", line 664, in __main__.LDA
Failed example:
model.topicsMatrix()
Expected:
DenseMatrix(2, 2, [0.496, 0.504, 0.504, 0.496], 0)
Got:
DenseMatrix(2, 2, [0.4999, 0.5001, 0.5001, 0.4999], 0)
**********************************************************************
2 items had failures:
2 of 21 in __main__.GaussianMixture
2 of 20 in __main__.LDA
***Test Failed*** 4 failures.
[~srowen][~holdenk] Thoughts on next steps? Should this pull request also
contain fixes for the test errors that occur when changing local[2] to
local[4], or should we split it into subcomponents: one focused on the Scala
pieces already submitted, and a second focused on fixing the Python code to
work with local[4]?
> Use the same num. worker threads in Scala/Python unit tests
> -----------------------------------------------------------
>
> Key: SPARK-9487
> URL: https://issues.apache.org/jira/browse/SPARK-9487
> Project: Spark
> Issue Type: Improvement
> Components: PySpark, Spark Core, SQL, Tests
> Affects Versions: 1.5.0
> Reporter: Xiangrui Meng
> Labels: starter
> Attachments: ContextCleanerSuiteResults, HeartbeatReceiverSuiteResults
>
>
> In Python we use `local[4]` for unit tests, while in Scala/Java we use
> `local[2]` and `local` for some unit tests in SQL, MLLib, and other
> components. If the operation depends on partition IDs, e.g., random number
> generator, this will lead to different result in Python and Scala/Java. It
> would be nice to use the same number in all unit tests.
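The quoted description points at the root cause of the failures above: any sampler that seeds its random number generator per partition produces different draws when the partition count changes, even with the same base seed. A minimal, Spark-free sketch of that effect (the seeding scheme below is a simplified, hypothetical stand-in, not Spark's actual per-partition XORShiftRandom):

```python
import random

def sample_sum(seed, num_partitions, n=1000):
    """Draw n values spread evenly across num_partitions partitions,
    seeding each partition's RNG from the base seed and partition id --
    a simplified stand-in for Spark's per-partition seeding."""
    per_part = n // num_partitions
    total = 0.0
    for pid in range(num_partitions):
        # Hypothetical per-partition seed derivation; the constant is arbitrary.
        rng = random.Random(seed * 100003 + pid)
        total += sum(rng.random() for _ in range(per_part))
    return total

# Same base seed, different parallelism -> different draws overall,
# which is why doctest outputs recorded under local[2] no longer
# match under local[4].
print(sample_sum(42, 2), sample_sum(42, 4))
```

Either re-recording the doctest outputs for local[4] or standardizing both languages on one worker-thread count, as this issue proposes, removes the mismatch.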
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]