[GitHub] spark pull request: [SPARK-12746][ML] ArrayType(_, true) should al...

2016-02-23 Thread Earthson
Github user Earthson closed the pull request at: https://github.com/apache/spark/pull/11237 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[GitHub] spark pull request: [SPARK-12746][ML] ArrayType(_, true) should al...

2016-02-21 Thread Earthson
Github user Earthson commented on the pull request: https://github.com/apache/spark/pull/11237#issuecomment-186774933 @srowen Is this ok to merge?

[GitHub] spark pull request: [SPARK-12746][ML] ArrayType(_, true) should al...

2016-02-18 Thread Earthson
Github user Earthson commented on the pull request: https://github.com/apache/spark/pull/11237#issuecomment-186041596 Jenkins, test this please.

[jira] [Commented] (SPARK-13359) ArrayType(_, true) should also accept ArrayType(_, false) fix for branch-1.6

2016-02-18 Thread Earthson Lu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153586#comment-15153586 ] Earthson Lu commented on SPARK-13359: - I see:) > ArrayType(_, true) shou

[GitHub] spark pull request: [SPARK-12746][ML] ArrayType(_, true) should al...

2016-02-18 Thread Earthson
Github user Earthson commented on the pull request: https://github.com/apache/spark/pull/11237#issuecomment-186021130 https://issues.apache.org/jira/browse/SPARK-12746

[GitHub] spark pull request: [SPARK-13359][ML] ArrayType(_, true) should al...

2016-02-17 Thread Earthson
Github user Earthson commented on the pull request: https://github.com/apache/spark/pull/11237#issuecomment-185130735 @mengxr Hi, Xiangrui, I've copied the code from master to fix SPARK-12746 for branch-1.6. Related: https://github.com/apache/spark/pull/10697

[GitHub] spark pull request: [ML] ArrayType(_, true) should also accept Arr...

2016-02-17 Thread Earthson
GitHub user Earthson opened a pull request: https://github.com/apache/spark/pull/11237 [ML] ArrayType(_, true) should also accept ArrayType(_, false) fix for branch-1.6 https://issues.apache.org/jira/browse/SPARK-13359 You can merge this pull request into a Git repository by

[jira] [Created] (SPARK-13359) ArrayType(_, true) should also accept ArrayType(_, false) fix for branch-1.6

2016-02-16 Thread Earthson Lu (JIRA)
Earthson Lu created SPARK-13359: --- Summary: ArrayType(_, true) should also accept ArrayType(_, false) fix for branch-1.6 Key: SPARK-13359 URL: https://issues.apache.org/jira/browse/SPARK-13359 Project

[GitHub] spark pull request: [SPARK-12746][ML] ArrayType(_, true) should al...

2016-02-12 Thread Earthson
Github user Earthson commented on the pull request: https://github.com/apache/spark/pull/10697#issuecomment-183275317 @mengxr ok, I'll have a look:)

[jira] [Issue Comment Deleted] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

2016-01-28 Thread Earthson Lu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Earthson Lu updated SPARK-12746: Comment: was deleted (was: Hi Joseph, what is the status of nullability now? It seems someone has

[jira] [Commented] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

2016-01-28 Thread Earthson Lu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15122870#comment-15122870 ] Earthson Lu commented on SPARK-12746: - Hi Joseph, what is the status of nullabi

[jira] [Commented] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

2016-01-28 Thread Earthson Lu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15122871#comment-15122871 ] Earthson Lu commented on SPARK-12746: - Hi Joseph, what is the status of nullabi

[jira] [Commented] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

2016-01-13 Thread Earthson Lu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15097724#comment-15097724 ] Earthson Lu commented on SPARK-12746: - ok, i see:) If there's no nullabil

[jira] [Issue Comment Deleted] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

2016-01-13 Thread Earthson Lu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Earthson Lu updated SPARK-12746: Comment: was deleted (was: I was just wondering if you could do a review:) On Tue, Jan 12, 2016

[jira] [Commented] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

2016-01-13 Thread Earthson Lu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15096120#comment-15096120 ] Earthson Lu commented on SPARK-12746: - I was just wondering if you could do a re

[jira] [Updated] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

2016-01-11 Thread Earthson Lu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Earthson Lu updated SPARK-12746: Shepherd: Joseph K. Bradley (was: Xiangrui Meng) > ArrayType(_, true) should also acc

[GitHub] spark pull request: [ML] ArrayType(_, true) should also accept Arr...

2016-01-11 Thread Earthson
GitHub user Earthson opened a pull request: https://github.com/apache/spark/pull/10697 [ML] ArrayType(_, true) should also accept ArrayType(_, false) https://issues.apache.org/jira/browse/SPARK-12746 You can merge this pull request into a Git repository by running: $ git pull

[jira] [Updated] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

2016-01-11 Thread Earthson Lu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Earthson Lu updated SPARK-12746: Description: I see CountVectorizer has schema check for ArrayType which has ArrayType(StringType
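The issue title suggests a schema check that compares array types while ignoring the nullability flag. A minimal sketch of that idea in plain Python follows. The classes `StringType`, `IntegerType` and `ArrayType` and the helper `same_type_ignoring_nullability` are hypothetical stand-ins written for this illustration; they are not Spark's own classes, and this is not necessarily how the actual patch is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringType:
    """Hypothetical stand-in for a string element type."""


@dataclass(frozen=True)
class IntegerType:
    """Hypothetical stand-in for an integer element type."""


@dataclass(frozen=True)
class ArrayType:
    """Hypothetical stand-in: an element type plus a nullability flag."""
    element_type: object
    contains_null: bool


def same_type_ignoring_nullability(left: object, right: object) -> bool:
    """Compare two types, treating ArrayType(_, True) and ArrayType(_, False) as equal."""
    if isinstance(left, ArrayType) and isinstance(right, ArrayType):
        # Recurse into the element types and deliberately skip contains_null.
        return same_type_ignoring_nullability(left.element_type, right.element_type)
    return left == right


expected = ArrayType(StringType(), True)
actual = ArrayType(StringType(), False)
print(expected == actual)                                # strict equality rejects it
print(same_type_ignoring_nullability(expected, actual))  # relaxed check accepts it
```

A strict equality check rejects a column whose array type only differs in the nullability flag, while the relaxed comparison still rejects arrays with different element types.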

[jira] [Commented] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

2016-01-10 Thread Earthson Lu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15091487#comment-15091487 ] Earthson Lu commented on SPARK-12746: - I could work on this:) I have some idea

[jira] [Comment Edited] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

2016-01-10 Thread Earthson Lu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15091487#comment-15091487 ] Earthson Lu edited comment on SPARK-12746 at 1/11/16 6:1

[jira] [Created] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

2016-01-10 Thread Earthson Lu (JIRA)
Earthson Lu created SPARK-12746: --- Summary: ArrayType(_, true) should also accept ArrayType(_, false) Key: SPARK-12746 URL: https://issues.apache.org/jira/browse/SPARK-12746 Project: Spark

[jira] [Comment Edited] (SPARK-6725) Model export/import for Pipeline API

2015-11-18 Thread Earthson Lu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15012815#comment-15012815 ] Earthson Lu edited comment on SPARK-6725 at 11/19/15 6:3

[jira] [Comment Edited] (SPARK-6725) Model export/import for Pipeline API

2015-11-18 Thread Earthson Lu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15012815#comment-15012815 ] Earthson Lu edited comment on SPARK-6725 at 11/19/15 5:1

[jira] [Commented] (SPARK-6725) Model export/import for Pipeline API

2015-11-18 Thread Earthson Lu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15012815#comment-15012815 ] Earthson Lu commented on SPARK-6725: I'm glad to give help:) > Model e

[jira] [Commented] (SPARK-6727) Model export/import for spark.ml: HashingTF

2015-11-17 Thread Earthson Lu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010111#comment-15010111 ] Earthson Lu commented on SPARK-6727: It’s fine:)  I can give some help when the

[GitHub] spark pull request: [SPARK-6727][ML] Model export/import for spark...

2015-11-17 Thread Earthson
Github user Earthson closed the pull request at: https://github.com/apache/spark/pull/9650

[jira] [Issue Comment Deleted] (SPARK-8332) NoSuchMethodError: com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer

2015-11-16 Thread Earthson Lu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Earthson Lu updated SPARK-8332: --- Comment: was deleted (was: SparkUI does not work after upgrading fasterxml.jackson to 2.5.3

[GitHub] spark pull request: [SPARK-6727][ML] Model export/import for spark...

2015-11-12 Thread Earthson
Github user Earthson commented on the pull request: https://github.com/apache/spark/pull/9650#issuecomment-156036519 I have an idea that may simplify using DefaultParamsReader/Writer ```scala /** * Default Writable using DefaultParamsWriter

[jira] [Commented] (SPARK-6790) Model export/import for spark.ml: LinearRegression

2015-11-11 Thread Earthson Lu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001739#comment-15001739 ] Earthson Lu commented on SPARK-6790: I'm sorry, it's the PR for SPARK-

[GitHub] spark pull request: https://issues.apache.org/jira/browse/SPARK-67...

2015-11-11 Thread Earthson
GitHub user Earthson opened a pull request: https://github.com/apache/spark/pull/9650 https://issues.apache.org/jira/browse/SPARK-6790 based on SPARK-6726. I've implemented a common interface for PipelineStage read/write that only needs Metadata (non-model pipeline

[jira] [Commented] (SPARK-6727) Model export/import for spark.ml: HashingTF

2015-11-11 Thread Earthson Lu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001599#comment-15001599 ] Earthson Lu commented on SPARK-6727: It seems that we could implement a def

[jira] [Commented] (SPARK-6727) Model export/import for spark.ml: HashingTF

2015-11-11 Thread Earthson Lu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001590#comment-15001590 ] Earthson Lu commented on SPARK-6727: Is the API ready? Can I work on this? >

[jira] [Commented] (SPARK-6725) Model export/import for Pipeline API

2015-11-11 Thread Earthson Lu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001578#comment-15001578 ] Earthson Lu commented on SPARK-6725: Can we expect this API to be usable in s

[jira] [Commented] (SPARK-6726) Model export/import for spark.ml: LogisticRegression

2015-11-11 Thread Earthson Lu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000192#comment-15000192 ] Earthson Lu commented on SPARK-6726: Is the API ready for subtasks? I can do

Re: [Yarn-Client]Can not access SparkUI

2015-10-26 Thread Earthson Lu
1:45:36,600 INFO org.apache.commons.httpclient.HttpMethodDirector: Retrying request --  Earthson Lu On October 26, 2015 at 15:30:21, Deng Ching-Mallete (och...@apache.org) wrote: Hi Earthson, Unfortunately, attachments aren't allowed in the list so they seemed to have been removed from you

[Yarn-Client]Can not access SparkUI

2015-10-25 Thread Earthson
We are using Spark 1.5.1 with `--master yarn`, Yarn RM is running in HA mode. direct visit click ApplicationMaster link YARN RM log -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Yarn-Client-Can-not-access-SparkUI-tp25197.html Sent from the Apac

[jira] [Issue Comment Deleted] (SPARK-8332) NoSuchMethodError: com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer

2015-10-19 Thread Earthson Lu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Earthson Lu updated SPARK-8332: --- Comment: was deleted (was: SparkUI does not work after upgrading fasterxml.jackson to 2.5.3

[jira] [Commented] (SPARK-8332) NoSuchMethodError: com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer

2015-10-19 Thread Earthson Lu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964563#comment-14964563 ] Earthson Lu commented on SPARK-8332: SparkUI does not work after upgrading fasterxml.jac

[jira] [Commented] (SPARK-8332) NoSuchMethodError: com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer

2015-10-19 Thread Earthson Lu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964562#comment-14964562 ] Earthson Lu commented on SPARK-8332: SparkUI does not work after upgrading fasterxml.jac

[jira] [Comment Edited] (SPARK-8332) NoSuchMethodError: com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer

2015-07-22 Thread Earthson Lu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636919#comment-14636919 ] Earthson Lu edited comment on SPARK-8332 at 7/22/15 1:4

[jira] [Commented] (SPARK-8332) NoSuchMethodError: com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer

2015-07-22 Thread Earthson Lu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636919#comment-14636919 ] Earthson Lu commented on SPARK-8332: I recompiled spark with fasterxml.jackson 2

Re: [Spark-1.4.0]jackson-databind conflict?

2015-06-14 Thread Earthson Lu
I’ve recompiled spark-1.4.0 with fasterxml-2.5.x, it works fine now:) -- Earthson Lu On June 12, 2015 at 23:24:32, Sean Owen (so...@cloudera.com) wrote: I see the same thing in an app that uses Jackson 2.5. Downgrading to 2.4 made it work. I meant to go back and figure out if there

[Spark-1.4.0]jackson-databind conflict?

2015-06-12 Thread Earthson
I'm using Play-2.4 with play-json-2.4. It works fine with spark-1.3.1, but it failed after I upgraded Spark to spark-1.4.0:( sc.parallelize(1 to 1).count [info] com.fasterxml.jackson.databind.JsonMappingException: Could not find creator property with name 'id' (in class org.apache.spark.rdd

[GitHub] spark pull request: Fallback to GenericRow

2015-03-25 Thread Earthson
Github user Earthson closed the pull request at: https://github.com/apache/spark/pull/5180

[GitHub] spark pull request: Fallback to GenericRow

2015-03-25 Thread Earthson
Github user Earthson commented on the pull request: https://github.com/apache/spark/pull/5180#issuecomment-85940060 @rxin, that does not work, because we would also need a no-arg ctor for StructType, for StructFields, and so on. Is this what we really want to do? I do not really understand

[GitHub] spark pull request: Fallback to GenericRow

2015-03-24 Thread Earthson
GitHub user Earthson opened a pull request: https://github.com/apache/spark/pull/5180 Fallback to GenericRow https://issues.apache.org/jira/browse/SPARK-6465 Cause: com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor

[jira] [Comment Edited] (SPARK-6465) GenericRowWithSchema: KryoException: Class cannot be created (missing no-arg constructor):

2015-03-24 Thread Earthson Lu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377490#comment-14377490 ] Earthson Lu edited comment on SPARK-6465 at 3/25/15 5:2

[jira] [Comment Edited] (SPARK-6465) GenericRowWithSchema: KryoException: Class cannot be created (missing no-arg constructor):

2015-03-24 Thread Earthson Lu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377490#comment-14377490 ] Earthson Lu edited comment on SPARK-6465 at 3/25/15 5:2

[jira] [Commented] (SPARK-6465) GenericRowWithSchema: KryoException: Class cannot be created (missing no-arg constructor):

2015-03-24 Thread Earthson Lu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377490#comment-14377490 ] Earthson Lu commented on SPARK-6465: I'm confused. https://github.com/apa

[jira] [Created] (SPARK-6465) GenericRowWithSchema: KryoException: Class cannot be created (missing no-arg constructor):

2015-03-23 Thread Earthson Lu (JIRA)
Earthson Lu created SPARK-6465: -- Summary: GenericRowWithSchema: KryoException: Class cannot be created (missing no-arg constructor): Key: SPARK-6465 URL: https://issues.apache.org/jira/browse/SPARK-6465

Re: what is the best way to implement mini batches?

2014-12-15 Thread Earthson Lu
large batch for parallelism inside each batch (this seems to be how SGD is implemented in MLlib?). -- Earthson Lu On December 16, 2014 at 04:02:22, Imran Rashid (im...@therashids.com) wrote: I'm a little confused by some of the responses. It seems like there are two different issues

Re: what is the best way to implement mini batches?

2014-12-14 Thread Earthson
I think it could be done like this: 1. use mapPartitions to randomly drop some partitions 2. randomly drop some elements (within the selected partitions) 3. calculate the gradient step for the selected elements. I don't think a fixed step is needed, but a fixed step could be done: 1. zipWithIndex 2. create ShuffleRDD base
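The three sampling steps listed above can be sketched without Spark, using plain Python lists as stand-ins for partitions. Everything here is an assumption made for illustration: the helper names `sample_minibatch` and `gradient_step`, the one-dimensional model y ≈ w * x, and the squared-error loss are not taken from the thread.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) pair for a toy model y ~ w * x


def sample_minibatch(partitions: List[List[Point]],
                     partition_fraction: float,
                     element_fraction: float,
                     rng: random.Random) -> List[Point]:
    """Steps 1 and 2: keep a random subset of partitions, then of their elements."""
    batch: List[Point] = []
    for part in partitions:
        if rng.random() >= partition_fraction:
            continue  # step 1: this partition is dropped
        # step 2: keep each element of a selected partition with the given probability
        batch.extend(p for p in part if rng.random() < element_fraction)
    return batch


def gradient_step(w: float, batch: List[Point], learning_rate: float) -> float:
    """Step 3: one gradient step on the mean squared error over the batch."""
    if not batch:
        return w  # nothing was sampled, so the weight is left unchanged
    grad = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
    return w - learning_rate * grad


# Toy data generated from y = 3 * x, split into four "partitions".
partitions = [[(i / 10.0, 3.0 * i / 10.0) for i in range(1, 11)] for _ in range(4)]
rng = random.Random(0)
w = 0.0
for _ in range(200):
    w = gradient_step(w, sample_minibatch(partitions, 0.5, 0.5, rng), 0.5)
print(w)
```

Because the batch size varies from step to step, the gradient is averaged over the elements actually sampled, which is one way to get by without a fixed batch size.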

Re: How to get applicationId for yarn mode(both yarn-client and yarn-cluster mode)

2014-11-21 Thread Earthson
Finally, I've found two ways: 1. search the output for something like "Submitted application application_1416319392519_0115" 2. use a specific AppName. We could then query YARN for the ApplicationID
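The first approach, searching the submission output, can be sketched with a regular expression. The function name `find_application_id` and the sample log line are assumptions made for illustration; only the "Submitted application application_..." message comes from the post.

```python
import re
from typing import Optional

# Matches the message quoted above and captures the ID, which has the form
# application_<digits>_<digits>.
SUBMITTED_PATTERN = re.compile(r"Submitted application (application_\d+_\d+)")


def find_application_id(log_text: str) -> Optional[str]:
    """Return the first submitted YARN application ID in the text, or None."""
    match = SUBMITTED_PATTERN.search(log_text)
    return match.group(1) if match else None


# Hypothetical log line modelled on the message quoted in the post.
sample = "INFO Client: Submitted application application_1416319392519_0115"
print(find_application_id(sample))
```

In practice the captured driver or spark-submit output would be passed in as `log_text`; the exact log prefix may differ between versions, which is why only the stable part of the message is matched.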

How to get applicationId for yarn mode(both yarn-client and yarn-cluster mode)

2014-11-21 Thread Earthson
Is there any way to get the yarn application_id inside the program?

Re: [SparkSQL] Convert JavaSchemaRDD to SchemaRDD

2014-10-16 Thread Earthson
I'm trying to provide an API for Java users, so I need to accept their JavaSchemaRDDs and convert them to SchemaRDDs for Scala users.

[SparkSQL] Convert JavaSchemaRDD to SchemaRDD

2014-10-15 Thread Earthson
I don't know why JavaSchemaRDD.baseSchemaRDD is private[sql], and I found that DataTypeConversions is protected[sql]. Finally I found this workaround: jrdd.registerTempTable("transform_tmp") jrdd.sqlContext.sql("select * from transform_tmp") Could anyone tell me: is it

Re: [PySpark][Python 2.7.8][Spark 1.0.2] count() with TypeError: an integer is required

2014-08-22 Thread Earthson
Do I have to deploy Python to every machine to make "$PYSPARK_PYTHON" work correctly?

Re: [PySpark][Python 2.7.8][Spark 1.0.2] count() with TypeError: an integer is required

2014-08-22 Thread Earthson
I'm running PySpark with Python 2.7.8 under virtualenv. System Python version: Python 2.6.x

[PySpark][Python 2.7.8][Spark 1.0.2] count() with TypeError: an integer is required

2014-08-22 Thread Earthson
I am using PySpark with IPython notebook. data = sc.parallelize(range(1000), 10) #successful data.map(lambda x: x+1).collect() #Error data.count() Something similar:http://apache-spark-user-list.1001560.n3.nabble.com/Exception-on-simple-pyspark-script-td3415.html But it does not figure out

Re: [Spark 1.0.1][SparkSQL] reduce stage of shuffle is slow。

2014-07-29 Thread Earthson
Too much GC. The task runs much faster with more memory (heap space). The CPU load is still too high, and the network load is about 20+MB/s (not high enough). So what is the correct way to solve this GC problem? Are there other ways besides using more memory?

Re: [Spark 1.0.1][SparkSQL] reduce stage of shuffle is slow。

2014-07-29 Thread Earthson
It's really strange that the CPU load is so high while both disk and network IO load are so low. CLUSTER BY is just something similar to groupBy; why does it need so much CPU?

Re: [Spark 1.0.1][SparkSQL] reduce stage of shuffle is slow。

2014-07-28 Thread Earthson
"spark.MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to" takes too much time, what should I do? What is the correct configuration? blockManager timeout if I using a small number of reduce partition.

[Spark 1.0.1][SparkSQL] reduce stage of shuffle is slow。

2014-07-28 Thread Earthson
I'm using SparkSQL with Hive 0.13, here is the SQL for inserting a partition with 2048 buckets. sqlsc.set("spark.sql.shuffle.partitions", "2048") hql("""|insert %s table mz_log |PARTITION (date='%s') |select * from tmp_mzlog

Re: Why spark-submit command hangs?

2014-07-22 Thread Earthson
That's what my problem is:)

Re: Why spark-submit command hangs?

2014-07-22 Thread Earthson
I've just had the same problem. I'm using $SPARK_HOME/bin/spark-submit --master yarn --deploy-mode client $JOBJAR --class $JOBCLASS It's really strange, because the log shows that 14/07/22 16:16:58 INFO ui.SparkUI: Started SparkUI at http://k1227.mzhen.cn:4040 14/07/22 16:16:58 WARN util.N

Re: How could I set the number of executor?

2014-06-20 Thread Earthson
--num-executors seems to be available only with YARN.

How could I set the number of executor?

2014-06-20 Thread Earthson
"spark-submit" has an arguments "--num-executors" to set the number of executor, but how could I set it from anywhere else? We're using Shark, and want to change the number of executor. The number of executor seems to be same as workers by default? Shall we configure the executor number manually(

How to add jar with SparkSQL HiveContext?

2014-06-16 Thread Earthson
I have a problem with the add jar command: hql("add jar /.../xxx.jar") Error: Exception in thread "main" java.lang.AssertionError: assertion failed: No plan for AddJar ... How can I do this with HiveContext? I can't find any API for it. Does SparkSQL with Hive support UDF/UDAF?

Re: problem about broadcast variable in iteration

2014-05-15 Thread Earthson
Is the RDD not cached? Because recomputation may be required, every broadcast object is included in the dependencies of the RDDs; this may also cause memory issues (when n and kv are too large in your case).

[Suggestion]Strange behavior for broadcast cleaning with spark 0.9

2014-05-15 Thread Earthson
I'm using spark-0.9 with YARN. Q: Why can the spark.cleaner.ttl setting remove broadcasts that are still in use? I think the cleaner should not remove broadcasts that are still in the dependencies of some RDDs. As it is, the value of spark.cleaner.ttl needs to be set more carefully. POINT: cleaner should not crash the

Re: Incredible slow iterative computation

2014-05-05 Thread Earthson
checkpoint seems to just add a checkpoint mark? You need an action after marking it. I have tried it with success:) newRdd = oldRdd.map(myFun).persist(myStorageLevel) newRdd.checkpoint newRdd.foreach(_ => {}) // Force evaluation newRdd.isCheckpointed // true here oldRdd.unpersist(true) If you have
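The behaviour described above (checkpoint only sets a mark, and the lineage is cut once an action runs) can be illustrated with a toy model in plain Python. `LazyDataset` is a hypothetical class written for this sketch; it only mimics the observable behaviour and is not Spark's RDD implementation.

```python
from typing import Callable, List, Optional


class LazyDataset:
    """Toy stand-in for an RDD: source data or a parent link plus a lazy function."""

    def __init__(self, source: Optional[List[int]] = None,
                 parent: Optional["LazyDataset"] = None,
                 fn: Optional[Callable[[int], int]] = None) -> None:
        self._source = source
        self._parent = parent
        self._fn = fn
        self._marked = False
        self._saved: Optional[List[int]] = None

    def map(self, fn: Callable[[int], int]) -> "LazyDataset":
        return LazyDataset(parent=self, fn=fn)  # nothing is computed yet

    def checkpoint(self) -> None:
        self._marked = True  # only a mark; no data is materialised here

    @property
    def is_checkpointed(self) -> bool:
        return self._saved is not None

    def lineage_depth(self) -> int:
        return 0 if self._parent is None else 1 + self._parent.lineage_depth()

    def collect(self) -> List[int]:
        """The 'action': compute the data and honour a pending checkpoint mark."""
        if self._saved is not None:
            return list(self._saved)
        if self._parent is None:
            data = list(self._source or [])
        else:
            data = [self._fn(x) for x in self._parent.collect()]
        if self._marked:
            self._saved = data   # materialise the result
            self._parent = None  # cut the dependency on the old dataset
            self._fn = None
        return list(data)


old = LazyDataset(source=[1, 2, 3])
new = old.map(lambda x: x + 1)
new.checkpoint()
print(new.is_checkpointed)  # still False: only marked
print(new.collect())        # the action forces evaluation
print(new.is_checkpointed, new.lineage_depth())
```

Once the action has run, the new dataset no longer refers to its parent, which is the property that allows the old data (and, in the threads below, old broadcast objects) to be released.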

Re: Cache issue for iteration with broadcast

2014-05-05 Thread Earthson
Yes, I've tried. The problem is that a new broadcast object is generated at every step until they eat up all of the memory. I solved it by using RDD.checkpoint to remove the dependencies on old broadcast objects, and by using cleaner.ttl to clean up these broadcast objects automatically. If there's a more elegant way to

Re: Cache issue for iteration with broadcast

2014-05-05 Thread Earthson
RDD.checkpoint works fine. But spark.cleaner.ttl is really ugly for broadcast cleaning. Maybe broadcasts could be removed automatically when nothing depends on them.

Re: Cache issue for iteration with broadcast

2014-05-05 Thread Earthson
Using checkpoint. It removes dependencies:)

Re: Cache issue for iteration with broadcast

2014-05-05 Thread Earthson
.set("spark.cleaner.ttl", "120") drops broadcast_0, which causes the exception below. It is strange, because broadcast_0 is no longer needed: I have broadcast_3 instead, and the most recent RDD is persisted, so there is no need for recomputing... What is the problem? Need help. ~~~ 14/05/05 17:03:12 INFO stor

Re: Cache issue for iteration with broadcast

2014-05-05 Thread Earthson
How could I do iteration? Because persist is lazy and recomputing may be required, the whole path of the iteration will be saved. Can memory overflow not be avoided?

Re: Cache issue for iteration with broadcast

2014-05-04 Thread Earthson
I tried using serialization instead of broadcast, and my program exited with an error (beyond physical memory limits). Can the large object not be released by GC because it is needed for recomputing? So what is the recommended way to solve this problem?

Re: Cache issue for iteration with broadcast

2014-05-04 Thread Earthson
Code Here <https://github.com/Earthson/sparklda/blob/dev/src/main/scala/net/earthson/nlp/lda/lda.scala#L121> Finally, iteration still runs into recomputing...

Cache issue for iteration with broadcast

2014-05-04 Thread Earthson
A new broadcast object is generated at every iteration step; it may eat up the memory and make persist fail. The broadcast object should not be removed, because the RDD may be recomputed. I am trying to prevent recomputing the RDD, so the old broadcasts need to release some memory. I've tried to set "spar

Re: cache not work as expected for iteration?

2014-05-04 Thread Earthson
Thanks for the help, unpersist is exactly what I want:) I see that Spark will remove some cached data automatically when memory is full; it would be much more helpful if the rule were something like LRU. It seems that persist and cache are lazy?

cache not work as expected for iteration?

2014-05-03 Thread Earthson
code:) <https://github.com/Earthson/sparklda/blob/master/src/main/scala/net/earthson/nlp/lda/lda.scala#L99> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n5292/sparklda_cache1.png> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n5292/sparklda_cache2.png>

Re: parallelize for a large Seq is extreamly slow.

2014-04-29 Thread Earthson
I think the real problem is "spark.akka.frameSize". It is too small for passing the data. Every executor failed, and with no executors left, the task hangs.

Re: Why Spark require this object to be serializerable?

2014-04-29 Thread Earthson
Finally, I'm using files to save the RDDs and then reloading them. It works fine, because Gibbs sampling for LDA is really slow: it takes about 10 min to sample 10k wiki documents for 10 rounds (1 round/min).

Re: Why Spark require this object to be serializerable?

2014-04-28 Thread Earthson
The code is here: https://github.com/Earthson/sparklda/blob/master/src/main/scala/net/earthson/nlp/lda/lda.scala I've changed it from Broadcast to Serializable. Now it works:) But there are too many cached RDDs. Is that the problem?

Re: Why Spark require this object to be serializerable?

2014-04-28 Thread Earthson
I've moved SparkContext and RDD to be parameters of train. And now it tells me that SparkContext needs to be serialized! I think the problem is that the RDD is trying to be lazy, and some Broadcast objects need to be generated dynamically, so the closure has SparkContext inside, so the task complete faile

Re: Why Spark require this object to be serializerable?

2014-04-28 Thread Earthson
Does the RDD hold "this" in its closure? How can I fix such a problem?

Re: Why Spark require this object to be serializerable?

2014-04-28 Thread Earthson
Or what is the action that makes the RDD run? I don't want to save it as a file, and I've tried cache(), but it seems to be lazy too. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Why-Spark-require-this-object-to-be-serializerable-tp5009p5011.html Se

Why Spark require this object to be serializerable?

2014-04-28 Thread Earthson
The problem is that this object can't be Serializable: it holds an RDD field and a SparkContext. But Spark reports an error saying it needs to be serialized. The order of my debug output is really strange. ~ Training Start! Round 0 Hehe? Hehe? started? failed? Round 1 Hehe? ~ Here is my code 69 impo

Re: parallelize for a large Seq is extreamly slow.

2014-04-27 Thread Earthson
It's my fault! I uploaded the wrong jar when I changed the number of partitions, and now it works fine :) The size of word_mapping is 2444185. So does serializing a large object take a very long time? I don't think two million is very large, because the cost locally for such a size is typical

Re: parallelize for a large Seq is extreamly slow.

2014-04-27 Thread Earthson
That doesn't work. I don't think it is just slow; it never finishes (after 30+ hours, I killed it). -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/parallelize-for-a-large-Seq-is-extreamly-slow-tp4801p4900.html Sent from the Apache Spark User List mailing list

Re: parallelize for a large Seq is extreamly slow.

2014-04-25 Thread Earthson
parallelize is still so slow. package com.semi.nlp import org.apache.spark._ import SparkContext._ import scala.io.Source import com.esotericsoftware.kryo.Kryo import org.apache.spark.serializer.KryoRegistrator class MyRegistrator extends KryoRegistrator { override def registerCla

Re: parallelize for a large Seq is extreamly slow.

2014-04-25 Thread Earthson
reduceByKey(_+_).countByKey instead of countByKey seems to be fast. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/parallelize-for-a-large-Seq-is-extreamly-slow-tp4801p4870.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: parallelize for a large Seq is extreamly slow.

2014-04-25 Thread Earthson
This error came up only because I killed my app :( Is there something wrong? The reduceByKey operation is extremely slow (slower than with the default serializer). -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/parallelize-for-a-large-Seq-is-extreamly-slow-tp4801p4869.html Sen

Re: parallelize for a large Seq is extreamly slow.

2014-04-25 Thread Earthson
I've tried setting a larger buffer, but reduceByKey seems to fail. Need help :) 14/04/26 12:31:12 INFO cluster.CoarseGrainedSchedulerBackend: Shutting down all executors 14/04/26 12:31:12 INFO cluster.CoarseGrainedSchedulerBackend: Asking each executor to shut down 14/04/26 12:31:12 INFO schedule

Re: parallelize for a large Seq is extreamly slow.

2014-04-24 Thread Earthson
With Kryo, I get the exception below: com.esotericsoftware.kryo.KryoException (com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, required: 1) com.esotericsoftware.kryo.io.Output.require(Output.java:138) com.esotericsoftware.kryo.io.Output.writeAscii_slow(Output.java:446) com.esotericsof
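A "Buffer overflow" from Kryo usually means the serialization buffer is smaller than the object being written. A minimal configuration sketch follows, assuming a Spark release of that period, where the buffer size is set in MB through `spark.kryoserializer.buffer.mb` (later releases use a different property name). The value 64 is an illustrative assumption; the registrator class name is the one that appears elsewhere in this thread.

```scala
import org.apache.spark.SparkConf

// Assumed buffer value (in MB); pick one large enough for the biggest
// single object that has to be serialized.
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "com.semi.nlp.MyRegistrator")
  .set("spark.kryoserializer.buffer.mb", "64")
```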

parallelize for a large Seq is extreamly slow.

2014-04-24 Thread Earthson Lu
spark.parallelize(word_mapping.value.toSeq).saveAsTextFile("hdfs://ns1/nlp/word_mapping") This line is too slow. There are about 2 million elements in word_mapping. *Is there a good way to write a large collection to HDFS?* import org.apache.spark._ > import SparkContext._ > import scala.io

[Desktop-packages] [Bug 1009879] Re: NetworkManager high CPU usage while nm-applet and Transmission run

2012-07-10 Thread Earthson
Same issue here. When I start Transmission, network-manager, deja-dup and whoopsie run at high CPU usage. -- You received this bug notification because you are a member of Desktop Packages, which is subscribed to network-manager in Ubuntu. https://bugs.launchpad.net/bugs/1009879 Title: Netwo

[Bug 1009879] Re: NetworkManager high CPU usage while nm-applet and Transmission run

2012-07-10 Thread Earthson
Same issue here. When I start Transmission, network-manager, deja-dup and whoopsie run at high CPU usage. -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1009879 Title: NetworkManager high CPU usage

Re: [Ubuntu-zh] empathy's notification mechanism under GNOME 3 is terrible

2012-06-30 Thread Earthson
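It's worth noting that the shell's notifications appear at the bottom center (unless that has been changed), while Ubuntu's notifications appear at the top right. Under gnome-shell the two positions are sometimes unstable, which is rather annoying. On my machine Ubuntu's notifications are more stable, so I would really like to turn off the shell's own notifications, because they are just too laggy. 2012/6/30 Earthson > Actually, I think the shell's notification mechanism is good. It's just that this implementation seems to have some bugs: it is always laggy and always has all sorts of problems. > empathy's tray icon in the bottom-right corner lets you reply directly. Notifications can be replied to as well (if they haven't disappeared; but compared with the tray, replying from the notification is of little use, and notifications are very laggy on my machine) > empathy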

Re: [Ubuntu-zh] empathy's notification mechanism under GNOME 3 is terrible

2012-06-30 Thread Earthson
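Actually, I think the shell's notification mechanism is good. It's just that this implementation seems to have some bugs: it is always laggy and always has all sorts of problems. empathy's tray icon in the bottom-right corner lets you reply directly. Notifications can be replied to as well (if they haven't disappeared; but compared with the tray, replying from the notification is of little use, and notifications are very laggy on my machine). empathy's prompts and the like can be handled with an extension. As for the input method panel being covered by the shell, the fcitx shell plugin seems to fix that problem. 2012/6/30 Harris Wang > On 2012/6/28 17:57, Guxen Dai wrote: > >> You can reply directly >> >> On June 28, 2012 at 4:10 PM, 韩青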
