Hi, you can refer to https://issues.apache.org/jira/browse/SPARK-14083 for
more detail.
For this performance issue, it is better to use the DataFrame API than the Dataset API.
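A minimal sketch of the difference (assuming Spark 2.x; the case class and filter below are made-up examples, not taken from SPARK-14083 itself):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("df-vs-ds").master("local[*]").getOrCreate()
import spark.implicits._

case class Event(id: Long, value: Double)
val ds = Seq(Event(1L, 10.0), Event(2L, 250.0), Event(3L, 500.0)).toDS()

// Typed Dataset lambda: opaque to Catalyst, every row is deserialized into Event.
val typedFilter = ds.filter(e => e.value > 100.0)

// DataFrame/Column expression: Catalyst can analyze, optimize and codegen it.
val untypedFilter = ds.toDF().filter(col("value") > 100.0)

typedFilter.show()
untypedFilter.show()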
On Sat, Feb 25, 2017 at 2:45 AM, Jacek Laskowski wrote:
> Hi Justin,
>
> I have never seen such a list. I think the area is in heav
Hi, John:
    I am very interested in your experiment. How did you determine that RDD
serialization costs a lot of time? From the logs, or from some other tools?
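A minimal sketch of one way to check this (just the usual approach I would guess at, not John's method): enable Kryo, re-run the job, and compare the per-task serialization metrics shown under "Show Additional Metrics" on the stage page of the web UI. The MyVertexData class is a placeholder.

import org.apache.spark.{SparkConf, SparkContext}

case class MyVertexData(id: Long, label: Long)   // placeholder for the real payload type

val conf = new SparkConf()
  .setAppName("serialization-check")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(classOf[MyVertexData]))
val sc = new SparkContext(conf)
// Run the same job once per serializer and compare "Task Deserialization Time"
// and "Result Serialization Time" for the stage in the web UI.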
On Fri, Mar 11, 2016 at 8:46 PM, John Lilley
wrote:
> Andrew,
>
>
>
> We conducted some tests for using Graphx to solve the connected-components
>
Hi, All:
    I modified the Spark code and am trying to use some extra jars in Spark. The
extra jars are published in my local Maven repository using *mvn install*.
However, sbt cannot find these jar files, even though I can find the jar
files under */home/myname/.m2/repository*.
    I can guarantee tha
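A minimal build.sbt sketch that usually fixes this: sbt does not search ~/.m2/repository unless the local Maven resolver is added. The artifact coordinates are placeholders for whatever mvn install published.

// build.sbt (sketch only)
resolvers += Resolver.mavenLocal          // make sbt search ~/.m2/repository

libraryDependencies += "com.example" %% "my-extra-lib" % "0.1.0-SNAPSHOT"   // placeholder coordinates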
sualVM. -Xiangrui
>
> On Wed, Feb 11, 2015 at 1:35 AM, lihu wrote:
> > I just want to make the best use of the CPU, and test the performance of
> > Spark
> > when there are a lot of tasks on a single node.
> >
> > On Wed, Feb 11, 2015 at 5:29 PM, Sean Owen wrote:
> >
ecutors for more information.
>
> On Thu, Feb 12, 2015 at 2:34 AM, lihu wrote:
>
>> I am trying to use multiple threads to run Spark SQL queries.
>> Some sample code looks like this:
>>
>> val sqlContext = new SQLContext(sc)
>> val rdd_query = sc.paralleliz
I am trying to use multiple threads to run Spark SQL queries.
Some sample code looks like this:
val sqlContext = new SQLContext(sc)
val rdd_query = sc.parallelize(data, part)
rdd_query.registerTempTable("MyTable")
sqlContext.cacheTable("MyTable")
val serverPool = Executors.newFixedThreadPool(3)
val
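A minimal, self-contained sketch of this pattern (placeholder data and queries, Spark 1.3-style API, assuming an existing SparkContext sc):

import java.util.concurrent.Executors
import org.apache.spark.sql.SQLContext

case class Record(id: Int, value: String)

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

val table = sc.parallelize((1 to 1000).map(i => Record(i, s"v$i")), 8).toDF()
table.registerTempTable("MyTable")
sqlContext.cacheTable("MyTable")

// SQLContext is thread-safe, so several threads can submit queries concurrently.
val serverPool = Executors.newFixedThreadPool(3)
(0 until 3).foreach { i =>
  serverPool.submit(new Runnable {
    override def run(): Unit = {
      val n = sqlContext.sql(s"SELECT * FROM MyTable WHERE id % 3 = $i").count()
      println(s"query $i matched $n rows")
    }
  })
}
serverPool.shutdown()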
ou have 24 cores?
>
> On Wed, Feb 11, 2015 at 9:03 AM, lihu wrote:
> > I gave 50GB to the executor, so it seems there is no reason for the
> > memory
> > to be insufficient.
> >
> > On Wed, Feb 11, 2015 at 4:50 PM, Sean Owen wrote:
> >>
> >> Meaning, you
Hi,
    I ran k-means (MLlib) on a cluster with 12 workers. Every worker has
128GB RAM and 24 cores. I run 48 tasks on one machine; the total data is just
40GB.
    When the dimension of the data set is about 10^7, every task takes
about 30s, but the cost of GC is about 20s.
    When I
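A minimal sketch of the GC-related knobs one might try for a job like this (the values are placeholders, not a tuned recommendation for this cluster):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("kmeans-gc-tuning")
  .set("spark.executor.memory", "90g")                    // leave headroom for the OS
  .set("spark.executor.extraJavaOptions",
       "-XX:+PrintGCDetails -XX:+PrintGCTimeStamps")      // see where the GC time goes
  .set("spark.storage.memoryFraction", "0.4")             // shrink the cache if execution churns
val sc = new SparkContext(conf)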
orted and much more
> thoroughly tested version under the property
> "spark.shuffle.blockTransferService",
> which is set to netty by default.
>
> On Tue, Jan 13, 2015 at 9:26 PM, lihu wrote:
>
>> Hi,
>> I just test groupByKey method on a 100GB data, the
By the way, I am not sure whether the same shuffle key can go to the same
container.
There is no way to avoid a shuffle if you use combineByKey, no matter whether
your data is cached in memory, because the shuffle write must write the
data to disk. And it seems that Spark cannot guarantee that the same
key (K1) goes to Container_X.
You can use tmpfs for your shuffle dir; this ca
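A minimal sketch of the tmpfs suggestion; /mnt/ramdisk is a placeholder for wherever tmpfs is mounted on the worker nodes:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("shuffle-on-tmpfs")
  .set("spark.local.dir", "/mnt/ramdisk")   // shuffle and spill files land here
val sc = new SparkContext(conf)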
Hi,
    I just tested the groupByKey method on 100GB of data; the cluster has 20
machines, each with 125GB RAM.
    At first I set conf.set("spark.shuffle.use.netty", "false") and ran
the experiment, and then I set conf.set("spark.shuffle.use.netty", "true")
and re-ran the experiment, but at the lat
What about your scenario? Do you need to use many broadcasts? If not, it would
be better to focus more on other things.
At this time, there is no better method than TorrentBroadcast. Although it
transfers one block at a time, once a node has received the data it can act
as a data source immediately.
Can the assembly get faster if we do not need Spark SQL or some other
components of Spark? For example, if we only need the core of Spark.
On Wed, Nov 26, 2014 at 3:37 PM, lihu wrote:
> Matei, sorry for the typo in my last message. And the tip saves about 30s on
> my computer.
>
> On
An RDD is essentially a distributed wrapper around a Scala collection. You can
use the .collect() method to bring it back to the driver as a plain Scala
collection, and then convert that collection to a JSON object using a Scala method.
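A minimal sketch of that suggestion with made-up data (scala.util.parsing.json ships with Scala 2.10/2.11; assumes an existing SparkContext sc):

import scala.util.parsing.json.JSONObject

val rdd = sc.parallelize(Seq(("a", 1), ("b", 2)))
val json = rdd.collect()                            // Array[(String, Int)] on the driver
  .map { case (k, v) => JSONObject(Map("key" -> k, "value" -> v)).toString() }
  .mkString("[", ",", "]")
println(json)   // e.g. [{"key" : "a", "value" : 1},{"key" : "b", "value" : 2}]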
nd to it. Once the old application
>>>>>>>> finishes, the standalone Master renders the after-the-fact application
>>>>>>>> UI
>>>>>>>> and exposes it under a different URL. To see this, go to the Master UI
>>>>>>>> (<master-url>:8080) and click on your application in the "Completed
>>>>>>>> Applications" table.
>>>>>>>>
>>>>>>>>
>>>>>>>> 2014-08-13 10:56 GMT-07:00 Matei Zaharia :
>>>>>>>>
>>>>>>>> Take a look at http://spark.apache.org/docs/latest/monitoring.html
>>>>>>>>> -- you need to launch a history server to serve the logs.
>>>>>>>>>
>>>>>>>>> Matei
>>>>>>>>>
>>>>>>>>> On August 13, 2014 at 2:03:08 AM, grzegorz-bialek (
>>>>>>>>> grzegorz.bia...@codilime.com) wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>> I wanted to access the Spark web UI after the application stops. I set
>>>>>>>>> spark.eventLog.enabled to true and the logs are available
>>>>>>>>> in JSON format in /tmp/spark-event, but the web UI isn't available
>>>>>>>>> under the address
>>>>>>>>> http://<driver-host>:4040
>>>>>>>>> I'm running Spark in standalone mode.
>>>>>>>>>
>>>>>>>>> What should I do to access web UI after application ends?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Grzegorz
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> View this message in context:
>>>>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Viewing-web-UI-after-fact-tp12023.html
>>>>>>>>> Sent from the Apache Spark User List mailing list archive at
>>>>>>>>> Nabble.com.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
--
*Best Wishes!*
*Li Hu(李浒) | Graduate Student*
*Institute for Interdisciplinary Information Sciences(IIIS
<http://iiis.tsinghua.edu.cn/>)*
*Tsinghua University, China*
*Email: lihu...@gmail.com *
*Homepage: http://iiis.tsinghua.edu.cn/zh/lihu/
<http://iiis.tsinghua.edu.cn/zh/lihu/>*
Hi Grzegorz:
    I have a similar scenario to yours, but even though I called sc.stop(),
there is no APPLICATION_COMPLETE file in the log directory. Can you share
some experience with this problem? Thanks very much.
On Mon, Sep 15, 2014 at 4:10 PM, Grzegorz Białek <
grzegorz.bia...@codilime.com>
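A minimal sketch of the event-log settings this thread is about (/tmp/spark-events is a placeholder directory; the APPLICATION_COMPLETE marker should be written when the context shuts down cleanly via sc.stop()):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("event-log-example")
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.dir", "/tmp/spark-events")   // the history server must read the same dir
val sc = new SparkContext(conf)
try {
  sc.parallelize(1 to 100).count()
} finally {
  sc.stop()   // flushes and closes the event log
}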
Matei, sorry for the typo in my last message. And the tip saves about 30s on
my computer.
On Wed, Nov 26, 2014 at 3:34 PM, lihu wrote:
> Matei, thank you very much!
> After taking your advice, the assembly time went from about 20 min down to
> 6 min on my computer. That's a very big impro
rce changes (by just running sbt/sbt with no args). It's a lot faster
> the second time it builds something.
>
> Matei
>
> On Nov 25, 2014, at 8:31 PM, Matei Zaharia
> wrote:
>
> You can do sbt/sbt assembly/assembly to assemble only the main package.
>
> Mate
Hi,
    The Spark assembly is time-costly. I only need
spark-assembly-1.1.0-hadoop2.3.0.jar and do not need
spark-examples-1.1.0-hadoop2.3.0.jar. How can I configure Spark to
avoid assembling the examples jar? I know the *export
SPARK_PREPEND_CLASSES=true* method
can reduce the assembly time, but
Which code did you use? Is the problem caused by your own code or by something
in Spark itself?
On Tue, Jul 22, 2014 at 8:50 AM, hsy...@gmail.com wrote:
> I have the same problem
>
>
> On Sat, Jul 19, 2014 at 12:31 AM, lihu wrote:
>
>> Hi,
>> Everyone. I have a piece of
Hi,
    Everyone, I have the following piece of code. When I run it,
the error below occurs; it seems that the SparkContext is not
serializable, but I do not use the SparkContext except for the broadcast.
[In fact, this code is in MLlib; I am just trying to broadcast the
centerAr
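A minimal sketch of the usual fix for this error: keep the broadcast handle in a local val and reference only that handle inside the closure, never sc or an enclosing object that holds it. The names centerArr and points are placeholders.

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("broadcast-centers"))

val centerArr = Array(Array(0.0, 0.0), Array(1.0, 1.0))
val bcCenters = sc.broadcast(centerArr)          // broadcast from the driver

val points = sc.parallelize(Seq(Array(0.1, 0.2), Array(0.9, 1.1)))
// Only the broadcast handle appears in the closure; capturing sc (or the `this`
// of a class that holds it) is what triggers the NotSerializableException.
val nearest = points.map { p =>
  val dists = bcCenters.value.map { c =>
    math.sqrt(c.zip(p).map { case (a, b) => (a - b) * (a - b) }.sum)
  }
  dists.indexOf(dists.min)
}
nearest.collect().foreach(println)
sc.stop()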
I see that a task will be either a ShuffleMapTask or a ResultTask. I
wonder which functions generate a ShuffleMapTask and which generate a
ResultTask?
Hi,
    I set up a small cluster with 3 machines; every machine has 64GB RAM and 11
cores, and I use Spark 0.9.
    I have set spark-env.sh as follows:
SPARK_MASTER_IP=192.168.35.2
SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=12306
SPARK_WORKER_CORES=3
SPARK_WORKER_MEMORY=2
Hi,
    I just ran a simple example to generate some data for the ALS
algorithm. My Spark version is 0.9, in local mode, and my node has 108GB of
memory.
    But when I set conf.set("spark.akka.frameSize", "4096"), the
following problem occurred, and when I do not set this, it runs
well
Thanks, but I do not want to log my own program's info; I just do not want
Spark to output all the info to my console. I want Spark to write the log to
a file that I specify.
On Tue, Mar 11, 2014 at 11:49 AM, Robin Cjc wrote:
> Hi lihu,
>
> you can extends the org.apache.spar
Hi,
    I use Spark 0.9, and when I run the spark-shell, logging works properly
according to the log4j.properties in the SPARK_HOME/conf directory. But when I
use a standalone app, I do not know how to configure the logging.
    I use SparkConf to set it, such as:
val conf = new SparkConf()
conf.set("log4
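One way to do this that I know of (a sketch only, configuring log4j programmatically before creating the SparkContext; the file path and pattern are placeholders):

import org.apache.log4j.{FileAppender, Level, Logger, PatternLayout}

val root = Logger.getRootLogger
root.removeAllAppenders()                         // drop the console appender
root.addAppender(new FileAppender(
  new PatternLayout("%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n"),
  "/tmp/my-spark-app.log",                        // placeholder log file
  true))                                          // append
root.setLevel(Level.INFO)
// ...then create the SparkConf / SparkContext as usual.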