Hi, you can refer to https://issues.apache.org/jira/browse/SPARK-14083 for
more detail.
For this performance issue, it is better to use the DataFrame API than the Dataset API.
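A minimal sketch of the difference (assuming Spark 2.x; the case class and filter below are made-up examples, not taken from SPARK-14083 itself):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("df-vs-ds").master("local[*]").getOrCreate()
import spark.implicits._

case class Event(id: Long, value: Double)
val ds = Seq(Event(1L, 10.0), Event(2L, 250.0), Event(3L, 500.0)).toDS()

// Typed Dataset lambda: opaque to Catalyst, every row is deserialized into Event.
val typedFilter = ds.filter(e => e.value > 100.0)

// DataFrame/Column expression: Catalyst can analyze, optimize and codegen it.
val untypedFilter = ds.toDF().filter(col("value") > 100.0)

typedFilter.show()
untypedFilter.show()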
On Sat, Feb 25, 2017 at 2:45 AM, Jacek Laskowski wrote:
> Hi Justin,
>
> I have never seen such a list. I think the area is in heav
Hi, John:
    I am very interested in your experiment. How did you determine that RDD
serialization costs a lot of time? From the logs, or from some other tools?
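A minimal sketch of one way to check this (just the usual approach I would guess at, not John's method): enable Kryo, re-run the job, and compare the per-task serialization metrics shown under "Show Additional Metrics" on the stage page of the web UI. The MyVertexData class is a placeholder.

import org.apache.spark.{SparkConf, SparkContext}

case class MyVertexData(id: Long, label: Long)   // placeholder for the real payload type

val conf = new SparkConf()
  .setAppName("serialization-check")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(classOf[MyVertexData]))
val sc = new SparkContext(conf)
// Run the same job once per serializer and compare "Task Deserialization Time"
// and "Result Serialization Time" for the stage in the web UI.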
On Fri, Mar 11, 2016 at 8:46 PM, John Lilley
wrote:
> Andrew,
>
>
>
> We conducted some tests for using Graphx to solve the connected-components
>
Hi, All:
    I modified the Spark code and am trying to use some extra jars in Spark. The
extra jars are published in my local Maven repository using *mvn install*.
However, sbt cannot find these jar files, even though I can find the jar
files under */home/myname/.m2/repository*.
    I can guarantee tha
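A minimal build.sbt sketch that usually fixes this: sbt does not search ~/.m2/repository unless the local Maven resolver is added. The artifact coordinates are placeholders for whatever mvn install published.

// build.sbt (sketch only)
resolvers += Resolver.mavenLocal          // make sbt search ~/.m2/repository

libraryDependencies += "com.example" %% "my-extra-lib" % "0.1.0-SNAPSHOT"   // placeholder coordinates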
sualVM. -Xiangrui
>
> On Wed, Feb 11, 2015 at 1:35 AM, lihu wrote:
> > I just want to make the best use of the CPU, and test the performance of
> > Spark
> > when there are a lot of tasks on a single node.
> >
> > On Wed, Feb 11, 2015 at 5:29 PM, Sean Owen wrote:
> >
ecutors for more information.
>
> On Thu, Feb 12, 2015 at 2:34 AM, lihu wrote:
>
>> I am trying to use multiple threads to run Spark SQL queries.
>> Some sample code looks like this:
>>
>> val sqlContext = new SQLContext(sc)
>> val rdd_query = sc.paralleliz
I am trying to use multiple threads to run Spark SQL queries.
Some sample code looks like this:
val sqlContext = new SQLContext(sc)
val rdd_query = sc.parallelize(data, part)
rdd_query.registerTempTable("MyTable")
sqlContext.cacheTable("MyTable")
val serverPool = Executors.newFixedThreadPool(3)
val
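A minimal, self-contained sketch of this pattern (placeholder data and queries, Spark 1.3-style API, assuming an existing SparkContext sc):

import java.util.concurrent.Executors
import org.apache.spark.sql.SQLContext

case class Record(id: Int, value: String)

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

val table = sc.parallelize((1 to 1000).map(i => Record(i, s"v$i")), 8).toDF()
table.registerTempTable("MyTable")
sqlContext.cacheTable("MyTable")

// SQLContext is thread-safe, so several threads can submit queries concurrently.
val serverPool = Executors.newFixedThreadPool(3)
(0 until 3).foreach { i =>
  serverPool.submit(new Runnable {
    override def run(): Unit = {
      val n = sqlContext.sql(s"SELECT * FROM MyTable WHERE id % 3 = $i").count()
      println(s"query $i matched $n rows")
    }
  })
}
serverPool.shutdown()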
ou have 24 cores?
>
> On Wed, Feb 11, 2015 at 9:03 AM, lihu wrote:
> > I gave 50GB to the executor, so it seems there is no reason for the
> > memory
> > to be insufficient.
> >
> > On Wed, Feb 11, 2015 at 4:50 PM, Sean Owen wrote:
> >>
> >> Meaning, you
Hi,
    I ran k-means (MLlib) on a cluster with 12 workers. Every worker has
128GB RAM and 24 cores. I run 48 tasks on one machine; the total data is just
40GB.
    When the dimension of the data set is about 10^7, every task takes
about 30s, but the cost of GC is about 20s.
    When I
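A minimal sketch of the GC-related knobs one might try for a job like this (the values are placeholders, not a tuned recommendation for this cluster):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("kmeans-gc-tuning")
  .set("spark.executor.memory", "90g")                    // leave headroom for the OS
  .set("spark.executor.extraJavaOptions",
       "-XX:+PrintGCDetails -XX:+PrintGCTimeStamps")      // see where the GC time goes
  .set("spark.storage.memoryFraction", "0.4")             // shrink the cache if execution churns
val sc = new SparkContext(conf)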
orted and much more
> thoroughly tested version under the property
> "spark.shuffle.blockTransferService",
> which is set to netty by default.
>
> On Tue, Jan 13, 2015 at 9:26 PM, lihu wrote:
>
>> Hi,
>> I just test groupByKey method on a 100GB data, the
By the way, I am not sure whether the same shuffle key can go to the same
container.
There is no way to avoid a shuffle if you use combineByKey, no matter whether
your data is cached in memory, because the shuffle write must write the
data to disk. And it seems that Spark cannot guarantee that the same
key (K1) goes to Container_X.
You can use tmpfs for your shuffle dir; this ca
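A minimal sketch of the tmpfs suggestion; /mnt/ramdisk is a placeholder for wherever tmpfs is mounted on the worker nodes:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("shuffle-on-tmpfs")
  .set("spark.local.dir", "/mnt/ramdisk")   // shuffle and spill files land here
val sc = new SparkContext(conf)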
Hi,
    I just tested the groupByKey method on 100GB of data; the cluster has 20
machines, each with 125GB RAM.
    At first I set conf.set("spark.shuffle.use.netty", "false") and ran
the experiment, and then I set conf.set("spark.shuffle.use.netty", "true")
and re-ran the experiment, but at the lat
What about your scenario? Do you need to use many broadcasts? If not, it would
be better to focus more on other things.
At this time, there is no better method than TorrentBroadcast. Although it
transfers one block at a time, once a node has received the data it can act
as a data source immediately.
Can the assembly get faster if we do not need Spark SQL or some other
components of Spark? For example, if we only need the core of Spark.
On Wed, Nov 26, 2014 at 3:37 PM, lihu wrote:
> Matei, sorry for the typo in my last message. And the tip saves about 30s on
> my computer.
>
> On
An RDD is essentially a distributed wrapper around a Scala collection. You can
use the .collect() method to bring it back to the driver as a plain Scala
collection, and then convert that collection to a JSON object using a Scala method.
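A minimal sketch of that suggestion with made-up data (scala.util.parsing.json ships with Scala 2.10/2.11; assumes an existing SparkContext sc):

import scala.util.parsing.json.JSONObject

val rdd = sc.parallelize(Seq(("a", 1), ("b", 2)))
val json = rdd.collect()                            // Array[(String, Int)] on the driver
  .map { case (k, v) => JSONObject(Map("key" -> k, "value" -> v)).toString() }
  .mkString("[", ",", "]")
println(json)   // e.g. [{"key" : "a", "value" : 1},{"key" : "b", "value" : 2}]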
nd to it. Once the old application
>>>>>>>> finishes, the standalone Master renders the after-the-fact application
>>>>>>>> UI
>>>>>>>> and exposes it under a different URL. To see this, go to the Master UI
>>>>>>>> (<master-url>:8080) and click on your application in the "Completed
>>>>>>>> Applications" table.
>>>>>>>>
>>>>>>>>
>>>>>>>> 2014-08-13 10:56 GMT-07:00 Matei Zaharia :
>>>>>>>>
>>>>>>>> Take a look at http://spark.apache.org/docs/latest/monitoring.html
>>>>>>>>> -- you need to launch a history server to serve the logs.
>>>>>>>>>
>>>>>>>>> Matei
>>>>>>>>>
>>>>>>>>> On August 13, 2014 at 2:03:08 AM, grzegorz-bialek (
>>>>>>>>> grzegorz.bia...@codilime.com) wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>> I wanted to access the Spark web UI after the application stops. I set
>>>>>>>>> spark.eventLog.enabled to true and the logs are available
>>>>>>>>> in JSON format in /tmp/spark-event, but the web UI isn't available
>>>>>>>>> under the address
>>>>>>>>> http://<driver-host>:4040
>>>>>>>>> I'm running Spark in standalone mode.
>>>>>>>>>
>>>>>>>>> What should I do to access web UI after application ends?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Grzegorz
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> View this message in context:
>>>>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Viewing-web-UI-after-fact-tp12023.html
>>>>>>>>> Sent from the Apache Spark User List mailing list archive at
>>>>>>>>> Nabble.com.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
--
*Best Wishes!*
*Li Hu(李浒) | Graduate Student*
*Institute for Interdisciplinary Information Sciences(IIIS
<http://iiis.tsinghua.edu.cn/>)*
*Tsinghua University, China*
*Email: lihu...@gmail.com *
*Homepage: http://iiis.tsinghua.edu.cn/zh/lihu/
<http://iiis.tsinghua.edu.cn/zh/lihu/>*
Hi Grzegorz:
    I have a similar scenario to yours, but even though I called sc.stop(),
there is no APPLICATION_COMPLETE file in the log directory. Can you share
some experience with this problem? Thanks very much.
On Mon, Sep 15, 2014 at 4:10 PM, Grzegorz Białek <
grzegorz.bia...@codilime.com>
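A minimal sketch of the event-log settings this thread is about (/tmp/spark-events is a placeholder directory; the APPLICATION_COMPLETE marker should be written when the context shuts down cleanly via sc.stop()):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("event-log-example")
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.dir", "/tmp/spark-events")   // the history server must read the same dir
val sc = new SparkContext(conf)
try {
  sc.parallelize(1 to 100).count()
} finally {
  sc.stop()   // flushes and closes the event log
}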
Matei, sorry for the typo in my last message. And the tip saves about 30s on
my computer.
On Wed, Nov 26, 2014 at 3:34 PM, lihu wrote:
> Matei, thank you very much!
> After taking your advice, the assembly time went from about 20 min down to
> 6 min on my computer. That's a very big impro
rce changes (by just running sbt/sbt with no args). It's a lot faster
> the second time it builds something.
>
> Matei
>
> On Nov 25, 2014, at 8:31 PM, Matei Zaharia
> wrote:
>
> You can do sbt/sbt assembly/assembly to assemble only the main package.
>
> Mate
Hi,
    The Spark assembly is time-costly. I only need
spark-assembly-1.1.0-hadoop2.3.0.jar and do not need
spark-examples-1.1.0-hadoop2.3.0.jar. How can I configure Spark to
avoid assembling the examples jar? I know the *export
SPARK_PREPEND_CLASSES=true* method
can reduce the assembly time, but
Which code did you use? Is the problem caused by your own code or by something
in Spark itself?
On Tue, Jul 22, 2014 at 8:50 AM, hsy...@gmail.com wrote:
> I have the same problem
>
>
> On Sat, Jul 19, 2014 at 12:31 AM, lihu wrote:
>
>> Hi,
>> Everyone. I have a piece of
Hi,
    Everyone, I have the following piece of code. When I run it,
the error below occurs; it seems that the SparkContext is not
serializable, but I do not use the SparkContext except for the broadcast.
[In fact, this code is in MLlib; I am just trying to broadcast the
centerAr
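A minimal sketch of the usual fix for this error: keep the broadcast handle in a local val and reference only that handle inside the closure, never sc or an enclosing object that holds it. The names centerArr and points are placeholders.

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("broadcast-centers"))

val centerArr = Array(Array(0.0, 0.0), Array(1.0, 1.0))
val bcCenters = sc.broadcast(centerArr)          // broadcast from the driver

val points = sc.parallelize(Seq(Array(0.1, 0.2), Array(0.9, 1.1)))
// Only the broadcast handle appears in the closure; capturing sc (or the `this`
// of a class that holds it) is what triggers the NotSerializableException.
val nearest = points.map { p =>
  val dists = bcCenters.value.map { c =>
    math.sqrt(c.zip(p).map { case (a, b) => (a - b) * (a - b) }.sum)
  }
  dists.indexOf(dists.min)
}
nearest.collect().foreach(println)
sc.stop()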
I see that a task will be either a ShuffleMapTask or a ResultTask. I
wonder which functions generate a ShuffleMapTask and which generate a
ResultTask?
Hi,
    I set up a small cluster with 3 machines; every machine has 64GB RAM and 11
cores, and I use Spark 0.9.
    I have set spark-env.sh as follows:
SPARK_MASTER_IP=192.168.35.2
SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=12306
SPARK_WORKER_CORES=3
SPARK_WORKER_MEMORY=2
Hi,
    I just ran a simple example to generate some data for the ALS
algorithm. My Spark version is 0.9, in local mode, and my node has 108GB of
memory.
    But when I set conf.set("spark.akka.frameSize", "4096"), the
following problem occurred, and when I do not set this, it runs
well
Thanks, but I do not want to log my own program's info; I just do not want
Spark to output all the info to my console. I want Spark to write the log to
a file that I specify.
On Tue, Mar 11, 2014 at 11:49 AM, Robin Cjc wrote:
> Hi lihu,
>
> you can extends the org.apache.spar
Hi,
    I use Spark 0.9, and when I run the spark-shell, logging works properly
according to the log4j.properties in the SPARK_HOME/conf directory. But when I
use a standalone app, I do not know how to configure the logging.
    I use SparkConf to set it, such as:
val conf = new SparkConf()
conf.set("log4
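One way to do this that I know of (a sketch only, configuring log4j programmatically before creating the SparkContext; the file path and pattern are placeholders):

import org.apache.log4j.{FileAppender, Level, Logger, PatternLayout}

val root = Logger.getRootLogger
root.removeAllAppenders()                         // drop the console appender
root.addAppender(new FileAppender(
  new PatternLayout("%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n"),
  "/tmp/my-spark-app.log",                        // placeholder log file
  true))                                          // append
root.setLevel(Level.INFO)
// ...then create the SparkConf / SparkContext as usual.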