Solved:
Call spark-submit with
--driver-memory 512m --driver-java-options
"-Dspark.memory.useLegacyMode=true -Dspark.shuffle.memoryFraction=0.2
-Dspark.storage.memoryFraction=0.6 -Dspark.storage.unrollFraction=0.2"
Thanks to:
https://issues.apache.org/jira/browse/SPARK-14367
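For reference, the same legacy-mode fractions can also be set programmatically; a minimal Scala sketch using SparkConf (the app name is a placeholder, and --driver-memory still has to be passed to spark-submit because the driver JVM is already running by the time this code executes):

import org.apache.spark.{SparkConf, SparkContext}

// Same values as the -D flags above; spark.* system properties and
// explicit SparkConf entries end up in the same configuration.
val conf = new SparkConf()
  .setAppName("legacy-memory-sketch")          // placeholder name
  .set("spark.memory.useLegacyMode", "true")   // fall back to the pre-1.6 memory manager
  .set("spark.shuffle.memoryFraction", "0.2")
  .set("spark.storage.memoryFraction", "0.6")
  .set("spark.storage.unrollFraction", "0.2")
val sc = new SparkContext(conf)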
Hi,
I am trying to get the same memory behavior in Spark 1.6 as I had in Spark
1.3 with default settings.
I set
--driver-java-options "--Dspark.memory.useLegacyMode=true
-Dspark.shuffle.memoryFraction=0.2 -Dspark.storage.memoryFraction=0.6
-Dspark.storage.unrollFraction=0.2"
in Spark 1.6.
But
to not use HDFS)
* Bonus question: Should I use a different API to get better performance?
Thanks for any responses!
Tom Hubregtsen
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/50-performance-decrease-when-using-local-file-vs-hdfs-tp23987.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
avoid confusion :)
Best regards,
Tom Hubregtsen
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Info-from-the-event-timeline-appears-to-contradict-dstat-info-tp23862p23865.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
ork included in
any of these 7 labels?
Thanks in advance,
Tom Hubregtsen
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Info-from-the-event-timeline-appears-to-contradict-dstat-info-tp23862.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
I believe that, as you are not persisting anything into the memory space
defined by spark.storage.memoryFraction, you also have nothing to clear from
this area using unpersist().
FYI: the data will be kept in the OS buffer/on disk at the point of the
reduce (as this involves a wide dependency ->
only available on pair RDDs, this might have something to do with it..)
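To illustrate the point about unpersist, a minimal sketch (the data and numbers are made up): only explicitly persisted blocks occupy the spark.storage.memoryFraction area, while the shuffle files written for the reduce live on disk/in the OS buffer cache regardless.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext(new SparkConf().setAppName("unpersist-sketch"))
val pairs = sc.parallelize(1 to 1000).map(i => (i % 10, i))

// Only an explicit persist/cache puts blocks into the area governed by
// spark.storage.memoryFraction; without it, unpersist() has nothing to free.
val cached = pairs.persist(StorageLevel.MEMORY_ONLY)
cached.reduceByKey(_ + _).count()   // wide dependency -> shuffle files on disk / OS buffer
cached.unpersist()                  // releases only the explicitly persisted blocks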
I am using the Spark master branch. The error:
[error]
/home/th/spark-1.5.0/spark/IBM_ARL_teraSort_v4-01/src/main/scala/IBM_ARL_teraSort.scala:107:
value partitionBy is not a member of org.apache.spark.sql.DataFrame
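partitionBy is defined on pair RDDs (PairRDDFunctions), not on DataFrame, so one possible workaround is to drop down to an RDD of key/value pairs first; a minimal sketch, with a made-up "key" column standing in for the real teraSort input:

import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("partitionBy-sketch"))
val sqlContext = new SQLContext(sc)

// Stand-in DataFrame; the real code would load the teraSort input instead.
val df = sqlContext.createDataFrame(Seq(("a", 1), ("b", 2), ("a", 3))).toDF("key", "value")

// Convert to (key, row) pairs so PairRDDFunctions.partitionBy becomes available.
val partitioned = df.rdd
  .map(row => (row.getAs[String]("key"), row))
  .partitionBy(new HashPartitioner(8))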
Thanks,
I've looked a bit into what DataFrames are, and it seems that most posts on
the subject are related to SQL, but it does seem to be very efficient. My
main question is: Are DataFrames also beneficial for non-SQL computations?
For instance, I want to:
- sort k/v pairs (in particular, is the naive v
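For the k/v sorting item above, both APIs can express it; a minimal sketch with made-up data, comparing a pair-RDD sortByKey with the DataFrame equivalent:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("sort-kv-sketch"))
val sqlContext = new SQLContext(sc)

val kv = sc.parallelize(Seq(("b", 2), ("a", 1), ("c", 3)))

val sortedRdd = kv.sortByKey()   // classic pair-RDD sort

// The same sort expressed on a DataFrame, where the SQL engine manages
// serialization and memory for the shuffle.
val sortedDf = sqlContext.createDataFrame(kv).toDF("key", "value").orderBy("key")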
"I'm not sure, but I wonder if because you are using the Spark REPL that it
may not be representing what a normal runtime execution would look like and
is possibly eagerly running a partial DAG once you define an operation that
would cause a shuffle.
What happens if you set up your same set of comm
Thanks for the responses.
"Try removing toDebugString and see what happens. "
The toDebugString is performed after [d] (the action), as [e]. By then all
stages are already executed.
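For what it's worth, toDebugString only prints the lineage and does not launch a job itself, so it can also be called before the action; a minimal sketch with a made-up RDD:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("toDebugString-sketch"))
val rdd = sc.parallelize(1 to 100).map(i => (i % 10, i)).reduceByKey(_ + _)

println(rdd.toDebugString)   // driver-side only: prints the lineage, runs no stages
println(rdd.count())         // the shuffle and its stages execute here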
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Extra-stage-that-execute
a what is running in this
Job/stage 0?
Thanks,
Tom Hubregtsen
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Extra-stage-that-executes-before-triggering-computation-with-an-action-tp22707.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
ce
>> code.
>> I've tried searching the Spark User forum archives and have seen requests
>> from people, indicating a demand, but did not succeed in finding the actual
>> source code.
>>
>> My question:
>> Could you guys please make the source code
exist, (ii) it
> existed but the user could not navigate to it or (iii) it existed but
> was not actually a directory.
>
> So please double-check all that.
>
> On Mon, Mar 30, 2015 at 5:11 PM, Tom Hubregtsen
> wrote:
> > Stack trace:
> > 15/03/30 17:37:30 INFO storage
always helps to show the command line you're actually running, and
> if there's an exception, the first few frames of the stack trace.)
>
> On Mon, Mar 30, 2015 at 4:11 PM, Tom Hubregtsen
> wrote:
> > Updated spark-defaults and spark-env:
> > "Log directory /hom
Updated spark-defaults and spark-env:
"Log directory /home/hduser/spark/spark-events does not exist."
(It also did not work with the default /tmp/spark-events.)
On 30 March 2015 at 18:03, Marcelo Vanzin wrote:
> Are those config values in spark-defaults.conf? I don't think you can
> use "~" t
1.pdf>.
> It is expected to scale sub-linearly; i.e., O(log N), where N is the
> number of machines in your cluster.
> We evaluated up to 100 machines, and it does follow O(log N) scaling.
>
> --
> Mosharaf Chowdhury
> http://www.mosharaf.com/
>
> On Wed, Mar 11, 2015 at
Thanks Mosharaf, for the quick response! Can you maybe give me some
pointers to an explanation of this strategy? Or elaborate a bit more on it?
Which parts are involved in which way? Where are the time penalties and how
scalable is this implementation?
Thanks again,
Tom
On 11 March 2015 at 16:01