You should add hub, the command-line wrapper around git for GitHub, to that wiki
page: https://github.com/github/hub -- it doesn't look like I have edit access
to the wiki, or I've forgotten a password, or something.
Once you've got hub installed and aliased, you get some nice additional
options, such as
Hi Chieh,
You can increase the heap size by exporting the Java options (see below; this
will increase the heap size to 10 GB):
export _JAVA_OPTIONS="-Xmx10g"
On Mon, Apr 21, 2014 at 11:43 AM, Chieh-Yen wrote:
> Can anybody help me?
> Thanks.
>
> Chieh-Yen
>
>
> On Wed, Apr 16, 2014 at 5:18 PM, Chie
Can anybody help me?
Thanks.
Chieh-Yen
On Wed, Apr 16, 2014 at 5:18 PM, Chieh-Yen wrote:
> Dear all,
>
> I developed an application whose communication messages
> are sometimes larger than 10 MB.
> For smaller datasets it works fine, but fails for larger datasets.
> Please check the er
Ah great, thanks -- I missed the quotes.
On Sun, Apr 20, 2014 at 9:01 PM, Patrick Wendell wrote:
> I put some notes in this doc:
> https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools
>
>
> On Sun, Apr 20, 2014 at 8:58 PM, Arun Ramakrishnan <
> sinchronized.a...@gmail.com> wrote
For a HadoopRDD, the Spark scheduler first calculates the number of tasks
based on the input splits. Usually people use this with HDFS data, so in that
case it's based on HDFS blocks. If the HDFS datanodes are co-located with
the Spark cluster, then it will try to run each task on the datanode that
contains its block.
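A small sketch of how to see this in practice (the HDFS path is made up): the
number of input splits, and therefore tasks in the first stage, shows up as the
partition count of the RDD.

import org.apache.spark.SparkContext

object InputSplitCount {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[*]", "input-split-count")
    // textFile builds a HadoopRDD underneath; with HDFS input there is
    // normally one input split (and therefore one task) per HDFS block.
    val rdd = sc.textFile("hdfs:///data/some-large-file.txt")
    println("tasks in the first stage = " + rdd.partitions.length)
    sc.stop()
  }
}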
I put some notes in this doc:
https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools
On Sun, Apr 20, 2014 at 8:58 PM, Arun Ramakrishnan <
sinchronized.a...@gmail.com> wrote:
> I would like to run some of the tests selectively. I am in branch-1.0
>
> Tried the following two comm
I would like to run some of the tests selectively. I am in branch-1.0
Tried the following two commands. But, it seems to run everything.
./sbt/sbt testOnly org.apache.spark.rdd.RDDSuite
./sbt/sbt test-only org.apache.spark.rdd.RDDSuite
Also, how do I run the tests of only one of the subprojects?
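(Judging from the "missed the quotes" follow-up elsewhere in this digest, the
commands apparently need to be quoted so sbt sees a single test-only
invocation; the subproject form below is my guess at the usual sbt syntax:)

./sbt/sbt "test-only org.apache.spark.rdd.RDDSuite"
./sbt/sbt "project core" "test-only org.apache.spark.rdd.RDDSuite"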
Hi Ankurdave~
Now I have another question. I realized that GraphX provides four different
graph partition methods: RandomVertexCut, CanonicalRandomVertexCut,
EdgePartition1D and EdgePartition2D. I've tested the running time of these
four methods using PageRank on several different datasets and found th
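For what it's worth, a rough sketch of how such a timing comparison might be
set up (the edge-list path and iteration count are made up):

import org.apache.spark.SparkContext
import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}

object PartitionStrategyTiming {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[*]", "partition-strategy-timing")
    val graph = GraphLoader.edgeListFile(sc, "hdfs:///data/soc-LiveJournal1.txt").cache()
    val strategies = Seq(PartitionStrategy.RandomVertexCut,
                         PartitionStrategy.CanonicalRandomVertexCut,
                         PartitionStrategy.EdgePartition1D,
                         PartitionStrategy.EdgePartition2D)
    for (strategy <- strategies) {
      val start = System.nanoTime()
      // Repartition the edges with the chosen strategy, run a fixed number of
      // PageRank iterations, and force evaluation with count().
      graph.partitionBy(strategy).staticPageRank(10).vertices.count()
      println(s"$strategy: ${(System.nanoTime() - start) / 1e9} s")
    }
    sc.stop()
  }
}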
Hello~
I was running some PageRank tests of GraphX on my 8-node cluster. I
allocated each worker 32 GB of memory and 8 CPU cores. The LiveJournal dataset
took 370 s, which in my mind is reasonable. But when I tried the
com-Friendster dataset ( http://snap.stanford.edu/data/com-Friendster.html )
with 656083
On Sunday, April 20, 2014, Brad Heller wrote:
> Hey list,
>
> I've got some CSV data I'm importing from S3. I can create the external
> table well enough, and I can also do a CREATE TABLE ... AS SELECT ... from
> it to pull the data internal to Spark.
>
> Here's the HQL for my
Hey list,
I've got some CSV data I'm importing from S3. I can create the external
table well enough, and I can also do a CREATE TABLE ... AS SELECT ... from
it to pull the data internal to Spark.
Here's the HQL for my external table:
https://gist.github.com/bradhe/11126024
Now I'd like to add pa
I want to evaluate Spark performance by measuring the running time of
transformation operations such as map and join. To do so, do I need to
materialize them with an action such as count? Because, as far as I know,
transformations are lazy operations and don't do any computation until we call
an action on them, but when I
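The usual pattern is to end the timed region with an action such as count(),
since the map/join themselves schedule no work; a minimal sketch with made-up
data:

import org.apache.spark.SparkContext

object TransformationTiming {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[*]", "transformation-timing")
    val left  = sc.parallelize(1 to 100000).map(i => (i, i)).cache()
    val right = sc.parallelize(1 to 100000).map(i => (i, i * 2)).cache()
    left.count(); right.count()   // materialize the inputs outside the timed region

    val start = System.nanoTime()
    val joined = left.join(right) // lazy: defines the work but runs nothing
    joined.count()                // the action actually executes the join
    println(s"join took ${(System.nanoTime() - start) / 1e9} s")
    sc.stop()
  }
}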
Great, this should give me enough to go on. Appreciate the help!
The homepage for Ooyala's job server is here:
https://github.com/ooyala/spark-jobserver
They decided (I think with input from the Spark team) that it made more
sense to keep the jobserver in a separate repository for now.
Andrew
On Fri, Apr 18, 2014 at 5:42 AM, Azuryy Yu wrote:
> Hi,
> Good t
Hi, I am unable to see how Shark (and eventually Spark) can recover from a bad
node in the cluster. One of my EC2 clusters with 50 nodes ended up with a
single node with datanode corruption, and I can see the following error when
I'm trying to load a simple file into memory using CTAS:
org.apache.hadoop.
Problem solved; Shivaram's answer in the GitHub issue is the perfect solution
for me.
See https://github.com/amplab-extras/SparkR-pkg/issues/46#
Thanks!
Type aliases aren't safe, since you could use any string as a name or ID.
On 20 Apr 2014 14:18, "Surendranauth Hiraman"
wrote:
> If the purpose is only aliasing, rather than adding additional methods and
> avoiding runtime allocation, what about type aliases?
>
> type ID = String
> type Name = String
>
Oh, sorry, I think your point was probably that you wouldn't need runtime
allocation.
I guess that is the key question. I would be interested if this works for
you.
-Suren
On Sun, Apr 20, 2014 at 9:18 AM, Surendranauth Hiraman <
suren.hira...@velos.io> wrote:
> If the purpose is only aliasing, rat
If the purpose is only aliasing, rather than adding additional methods and
avoiding runtime allocation, what about type aliases?
type ID = String
type Name = String
On Sat, Apr 19, 2014 at 9:26 PM, kamatsuoka wrote:
> No, you can wrap other types in value classes as well. You can try it in
>
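To make the trade-off in this thread concrete, a small standalone sketch
(names are illustrative): the alias gives no compile-time protection, while the
value class does and normally avoids the wrapper allocation (it still boxes
when used generically, e.g. inside a collection).

object AliasVsValueClass {
  // Type alias: zero overhead, but interchangeable with any String.
  type Name = String

  // Value class: a distinct type at compile time; the wrapper is usually
  // erased, though it is boxed when stored in a List, used as a type
  // parameter, and so on.
  final case class Id(value: String) extends AnyVal

  def byAlias(name: Name): Unit = println(s"alias: $name")
  def byValueClass(id: Id): Unit = println(s"value class: ${id.value}")

  def main(args: Array[String]): Unit = {
    byAlias("any string compiles here") // no protection against mix-ups
    byValueClass(Id("user-42"))         // only an Id is accepted
    // byValueClass("user-42")          // would not compile
  }
}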
Hi,
I am studying the structure of Spark Streaming (my Spark version is
0.9.0). I have a question about the SocketReceiver. In the onStart function:
---
protected def onStart() {
  logInfo("Connecting to " + host + ":" + port)
  val socket = new Socket(host, port)
I just updated the GitHub issue -- in case anybody is curious, this was a
problem with R resolving the right Java version installed in the VM.
Thanks
Shivaram
On Sat, Apr 19, 2014 at 7:12 PM, tongzzz wrote:
> I can't initialize sc context after a successful install on Cloudera
> quickstart VM.