[
https://issues.apache.org/jira/browse/SPARK-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15651496#comment-15651496
]
Saikat Kanjilal commented on SPARK-9487:
----------------------------------------
OK, I have moved on to Python. I am attaching a log of the test errors that
occur after changing local[2] to local[4] in the Python ml module:
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
[Stage 49:>                  (0 + 3) / 3]
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
**********************************************************************
File "/Users/skanjila/code/opensource/spark/python/pyspark/ml/clustering.py", line 98, in __main__.GaussianMixture
Failed example:
model.gaussiansDF.show()
Expected:
+--------------------+--------------------+
| mean| cov|
+--------------------+--------------------+
|[0.82500000140229...|0.005625000000006...|
|[-0.4777098016092...|0.167969502720916...|
|[-0.4472625243352...|0.167304119758233...|
+--------------------+--------------------+
...
Got:
+--------------------+--------------------+
| mean| cov|
+--------------------+--------------------+
|[-0.6158006194417...|0.132188091748508...|
|[0.54523101952701...|0.159129291449328...|
|[0.54042985246699...|0.161430620150745...|
+--------------------+--------------------+
<BLANKLINE>
**********************************************************************
File "/Users/skanjila/code/opensource/spark/python/pyspark/ml/clustering.py", line 123, in __main__.GaussianMixture
Failed example:
model2.gaussiansDF.show()
Expected:
+--------------------+--------------------+
| mean| cov|
+--------------------+--------------------+
|[0.82500000140229...|0.005625000000006...|
|[-0.4777098016092...|0.167969502720916...|
|[-0.4472625243352...|0.167304119758233...|
+--------------------+--------------------+
...
Got:
+--------------------+--------------------+
| mean| cov|
+--------------------+--------------------+
|[-0.6158006194417...|0.132188091748508...|
|[0.54523101952701...|0.159129291449328...|
|[0.54042985246699...|0.161430620150745...|
+--------------------+--------------------+
<BLANKLINE>
**********************************************************************
File "/Users/skanjila/code/opensource/spark/python/pyspark/ml/clustering.py", line 656, in __main__.LDA
Failed example:
model.describeTopics().show()
Expected:
+-----+-----------+--------------------+
|topic|termIndices| termWeights|
+-----+-----------+--------------------+
| 0| [1, 0]|[0.50401530077160...|
| 1| [0, 1]|[0.50401530077160...|
+-----+-----------+--------------------+
...
Got:
+-----+-----------+--------------------+
|topic|termIndices| termWeights|
+-----+-----------+--------------------+
| 0| [1, 0]|[0.50010191915681...|
| 1| [0, 1]|[0.50010191915681...|
+-----+-----------+--------------------+
<BLANKLINE>
**********************************************************************
File "/Users/skanjila/code/opensource/spark/python/pyspark/ml/clustering.py", line 664, in __main__.LDA
Failed example:
model.topicsMatrix()
Expected:
DenseMatrix(2, 2, [0.496, 0.504, 0.504, 0.496], 0)
Got:
DenseMatrix(2, 2, [0.4999, 0.5001, 0.5001, 0.4999], 0)
**********************************************************************
2 items had failures:
2 of 21 in __main__.GaussianMixture
2 of 20 in __main__.LDA
***Test Failed*** 4 failures.
[~srowen][~holdenk] Thoughts on next steps? Should this pull request also
contain fixes for the test errors that occur when changing local[2] to
local[4], or should we split it into subcomponents: one focused on the Scala
pieces already submitted, and a second focused on fixing the Python code to
work with local[4]?
> Use the same num. worker threads in Scala/Python unit tests
> -----------------------------------------------------------
>
> Key: SPARK-9487
> URL: https://issues.apache.org/jira/browse/SPARK-9487
> Project: Spark
> Issue Type: Improvement
> Components: PySpark, Spark Core, SQL, Tests
> Affects Versions: 1.5.0
> Reporter: Xiangrui Meng
> Labels: starter
> Attachments: ContextCleanerSuiteResults, HeartbeatReceiverSuiteResults
>
>
> In Python we use `local[4]` for unit tests, while in Scala/Java we use
> `local[2]` and `local` for some unit tests in SQL, MLLib, and other
> components. If the operation depends on partition IDs, e.g., random number
> generator, this will lead to different result in Python and Scala/Java. It
> would be nice to use the same number in all unit tests.
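The quoted description points at the root cause of the failures above: any sampler that seeds its random number generator per partition produces different draws when the partition count changes, even with the same base seed. A minimal, Spark-free sketch of that effect (the seeding scheme below is a simplified, hypothetical stand-in, not Spark's actual per-partition XORShiftRandom):

```python
import random

def sample_sum(seed, num_partitions, n=1000):
    """Draw n values spread evenly across num_partitions partitions,
    seeding each partition's RNG from the base seed and partition id --
    a simplified stand-in for Spark's per-partition seeding."""
    per_part = n // num_partitions
    total = 0.0
    for pid in range(num_partitions):
        # Hypothetical per-partition seed derivation; the constant is arbitrary.
        rng = random.Random(seed * 100003 + pid)
        total += sum(rng.random() for _ in range(per_part))
    return total

# Same base seed, different parallelism -> different draws overall,
# which is why doctest outputs recorded under local[2] no longer
# match under local[4].
print(sample_sum(42, 2), sample_sum(42, 4))
```

Either re-recording the doctest outputs for local[4] or standardizing both languages on one worker-thread count, as this issue proposes, removes the mismatch.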
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]