Github user rawkintrevo commented on the issue:

https://github.com/apache/zeppelin/pull/928

@bzz, I can't recreate the build failure. What I can say:

- Spark, pySpark, and Mahout notebooks and paragraphs run as expected.
- Spark and pySpark tests pass. The integration tests in `zeppelin-server` also pass.
- The only thing that fails is the Spark Cluster test, and the part that fails is python not being found when testing via the REST API.
- I can also confirm that all of the failing tests work as expected against a built Zeppelin (see the following python script to recreate the tests).

```python
# Build Zeppelin like this:
#
#   mvn clean package -DskipTests -Psparkr -Ppyspark -Pspark-1.6

from requests import post, get, delete
from json import dumps

ZEPPELIN_SERVER = "localhost"
ZEPPELIN_PORT = 8080

base_url = "http://%s:%i" % (ZEPPELIN_SERVER, ZEPPELIN_PORT)


def create_notebook(name_of_new_notebook):
    # POST /api/notebook creates a notebook; the response body carries its id.
    payload = {"name": name_of_new_notebook}
    notebook_url = base_url + "/api/notebook"
    r = post(notebook_url, dumps(payload))
    return r.json()


def delete_notebook(notebook_id):
    target_url = base_url + "/api/notebook/%s" % notebook_id
    r = delete(target_url)
    return r


def create_paragraph(code, notebook_id, title=""):
    # POST /api/notebook/{noteId}/paragraph appends a paragraph and
    # returns the new paragraph's id.
    target_url = base_url + "/api/notebook/%s/paragraph" % notebook_id
    payload = {"title": title, "text": code}
    r = post(target_url, dumps(payload))
    return r.json()["body"]


notebook_id = create_notebook("test1")["body"]

test_codes = [
    "%spark print(sc.parallelize(1 to 10).reduce(_ + _))",
    "%r localDF <- data.frame(name=c(\"a\", \"b\", \"c\"), age=c(19, 23, 18))\n" +
    "df <- createDataFrame(sqlContext, localDF)\n" +
    "count(df)",
    "%pyspark print(sc.parallelize(range(1, 11)).reduce(lambda a, b: a + b))",
    "%pyspark print(sc.parallelize(range(1, 11)).reduce(lambda a, b: a + b))",
    "%pyspark\nfrom pyspark.sql.functions import *\n" +
    "print(sqlContext.range(0, 10).withColumn('uniform', rand(seed=10) * 3.14).count())",
    "%spark z.run(1)"
]

para_ids = [create_paragraph(c, notebook_id) for c in test_codes]

# Run all paragraphs:
post(base_url + "/api/notebook/job/%s" % notebook_id)

# delete_notebook(notebook_id)
```

After two weeks of chasing dead ends and my tail, I call this an issue with the testing environment, not the Mahout interpreter.
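One more sketch, for anyone who wants to verify the outcome of that run rather than just trigger it. It reuses `base_url` and the `get` import from the script above, and assumes the `GET /api/notebook/job/{noteId}` endpoint reports per-paragraph job status; treat it as a sketch, not a tested part of the report above.

```python
# Hedged sketch: poll per-paragraph status after triggering the run.
# Assumes GET /api/notebook/job/<noteId> returns a JSON "body" listing
# each paragraph's job status (e.g. FINISHED, ERROR, RUNNING).
def get_paragraph_statuses(notebook_id):
    target_url = base_url + "/api/notebook/job/%s" % notebook_id
    return get(target_url).json()["body"]

for paragraph_status in get_paragraph_statuses(notebook_id):
    print(paragraph_status)
```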
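And since the Spark Cluster failure boils down to python not being found, here is a minimal check of the environment itself, assuming the test JVM inherits the same `PATH` and `PYSPARK_PYTHON` as the shell it was launched from:

```python
# Hedged sketch: rule out the obvious causes of "python not found" in the
# environment the tests run in.
import os
from distutils.spawn import find_executable

print(find_executable("python"))         # None => python is not on the PATH
print(os.environ.get("PYSPARK_PYTHON"))  # None => no explicit override set
```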