Yes, Spark needs to create the RDD first (loading all the data) in order to create the sample. You can split the files into two sets outside of Spark so that only the sample set is loaded.
Thank you,
Dhiraj
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Behaviour-of-RDD-
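A minimal sketch of the two approaches described above, assuming plain text files; the paths and sample fraction are placeholders, not values from the thread:

import org.apache.spark.{SparkConf, SparkContext}

object SampleLoadingSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("sampling-sketch"))

    // Approach 1: sample() the full dataset. The RDD is still backed by all
    // of the input files, so every file is scanned when an action runs.
    val full   = sc.textFile("/data/events/*")                       // hypothetical path
    val sample = full.sample(withReplacement = false, fraction = 0.01, seed = 42L)

    // Approach 2: split the files outside Spark and point Spark only at the
    // sample subset, so only that subset is ever read.
    val preSplit = sc.textFile("/data/events-sample/*")              // hypothetical path

    println(s"sampled: ${sample.count()}, pre-split: ${preSplit.count()}")
    sc.stop()
  }
}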
I have not come across official docs in this regard; however, if you use a 24-hour window size, you will need enough memory to hold the stream data for 24 hours. Memory is usually the limiting factor for the window size.
Dhiraj Peechara
--
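For reference, a hedged sketch of what a 24-hour window looks like in the DStream API; the batch interval, source, and checkpoint directory are assumptions, and the windowed batches have to fit in the executors' memory:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

object DayWindowSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("day-window-sketch")
    val ssc  = new StreamingContext(conf, Seconds(60))   // assumed 60s batch interval
    ssc.checkpoint("/tmp/streaming-checkpoint")          // hypothetical directory

    val lines = ssc.socketTextStream("localhost", 9999)  // hypothetical source

    // A 24-hour window sliding every 10 minutes: Spark retains roughly 24
    // hours of batch data, which is what drives the memory requirement.
    val counts = lines
      .map(line => (line, 1L))
      .reduceByKeyAndWindow(_ + _, Minutes(24 * 60), Minutes(10))

    counts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}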
This is very well explained.
Thank you
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/What-is-difference-btw-reduce-fold-tp22653p25376.html
-
Have you found out how to get the applicationId from the submissionId?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/how-to-get-Application-ID-from-Submission-ID-or-Driver-ID-programmatically-tp24341p24912.html
I am getting the same error. Any resolution on this issue?
Thank you
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Lost-task-connection-closed-tp21361p24082.html
--
I am having the same issue. Have you found any resolution?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Connection-closed-reset-by-peers-error-tp21459p24081.html
--
I had a similar requirement and came up with a small algorithm to determine the number of partitions based on cluster size and input data.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Question-regarding-spark-data-partition-and-coalesce-Need-info-on-my-us
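The algorithm itself is not in the archive; the following is only a sketch of one such heuristic under assumed constants (the partitions-per-core multiplier and target partition size are illustrative, not the author's values):

object PartitionHeuristic {
  // Pick a partition count from cluster size and input size: at least a few
  // partitions per core, and no partition much larger than a target size.
  def numPartitions(totalCores: Int,
                    inputSizeBytes: Long,
                    targetPartitionBytes: Long = 128L * 1024 * 1024,
                    partitionsPerCore: Int = 3): Int = {
    val bySize  = math.ceil(inputSizeBytes.toDouble / targetPartitionBytes).toInt
    val byCores = totalCores * partitionsPerCore
    math.max(1, math.max(bySize, byCores))
  }
}

// Example: a 96-core cluster with 500 GB of input.
// PartitionHeuristic.numPartitions(96, 500L * 1024 * 1024 * 1024)  // => 4000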
I have built a data analytics SaaS platform by creating REST endpoints; based on the type of job request, I invoke the necessary Spark job(s) and return the results as JSON (asynchronously). I used yarn-client mode to submit the jobs to the YARN cluster.
Hope this helps.
--
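The author's code is not in the archive; the sketch below only illustrates the pattern described (a long-lived SparkContext in yarn-client mode, each job run asynchronously and its result rendered as JSON), with the REST layer omitted and all names hypothetical:

import org.apache.spark.{SparkConf, SparkContext}
import scala.concurrent.{ExecutionContext, Future}

class AnalyticsService(implicit ec: ExecutionContext) {
  // One shared SparkContext, submitted to YARN in client mode (Spark 1.x style).
  private val sc = new SparkContext(
    new SparkConf().setMaster("yarn-client").setAppName("analytics-saas"))

  // A REST handler would call this and complete the HTTP response when the
  // Future finishes; "wordcount" is a made-up job type for illustration.
  def runJob(jobType: String, inputPath: String): Future[String] = Future {
    jobType match {
      case "wordcount" =>
        sc.textFile(inputPath)
          .flatMap(_.split("\\s+"))
          .map((_, 1L))
          .reduceByKey(_ + _)
          .take(10)
          .map { case (w, n) => s"""{"word":"$w","count":$n}""" }
          .mkString("[", ",", "]")
      case other =>
        s"""{"error":"unknown job type: $other"}"""
    }
  }
}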
Hi,
I am having similar issues. Have you found any resolution?
Thank you
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-with-Kafka-tp21222p21276.html
---
Have you found any resolution for this issue?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Failed-to-save-RDD-as-text-file-to-local-file-system-tp21050p21067.html
-
I am facing the same exception in saveAsObjectFile. Have you found any solution?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/java-io-IOException-Mkdirs-failed-to-create-file-some-path-myapp-csv-while-using-rdd-saveAsTextFile-k-tp20994p21066.html
I am running into a similar problem. Have you found any resolution to this issue?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Elastic-allocation-spark-dynamicAllocation-enabled-results-in-task-never-being-executed-tp18969p20957.html
I was able to fix it by adding the jars (from the Spark distribution) to the classpath. In my sbt file I changed the scope to provided.
Let me know if you need more details.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/trying-to-understand-yarn-client-m
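A sketch of the sbt change described, assuming a Spark 1.x build (the version number is a placeholder, not the author's):

// build.sbt -- mark the Spark artifacts as "provided" so they are not bundled
// into the application jar; at runtime the jars that ship with the Spark
// distribution (already on the cluster classpath) are used instead.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % "1.4.1" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.4.1" % "provided"
)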
Currently only a standalone cluster is supported by the spark-ec2 script. You can use Cloudera, Ambari, or SequenceIQ to create a YARN cluster.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Using-YARN-on-a-cluster-created-with-spark-ec2-tp20816p20870.html
When I run Spark locally, RDD saveAsObjectFile writes the file to the local file system (e.g. path /data/temp.txt), and when I run Spark on a YARN cluster, RDD saveAsObjectFile writes the file to HDFS (e.g. path /data/temp.txt). Is there a way to explicitly specify the local file system instead of HDFS?
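One way to make the target explicit, assuming the standard Hadoop URI schemes (a sketch, not part of the original thread):

import org.apache.spark.{SparkConf, SparkContext}

object SaveTargetSketch {
  def main(args: Array[String]): Unit = {
    val sc  = new SparkContext(new SparkConf().setAppName("save-target-sketch"))
    val rdd = sc.parallelize(1 to 100)

    // Without a scheme the path is resolved against the default filesystem:
    // the local FS when running locally, HDFS on a typical YARN cluster.
    rdd.saveAsObjectFile("/data/temp.txt")

    // With an explicit URI scheme the target no longer depends on the
    // environment (file:// writes to each node's local disk on a cluster).
    rdd.saveAsObjectFile("file:///data/temp-local")
    rdd.saveAsObjectFile("hdfs:///data/temp-hdfs")

    sc.stop()
  }
}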
I am running a 3-node YARN cluster (32 cores, 60 GB per node) for Spark jobs.
1) Below are my YARN memory settings:
yarn.nodemanager.resource.memory-mb = 52224
yarn.scheduler.minimum-allocation-mb = 40960
yarn.scheduler.maximum-allocation-mb = 52224
2) Apache Spark memory settings:
export SPARK_EXECUTOR_MEMORY=
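The Spark value is cut off in the archive. As a sketch of how the executor settings relate to the YARN limits quoted above (the 48g figure and core count are assumptions, chosen so that 48 * 1024 MB plus 3072 MB of overhead equals the 52224 MB maximum allocation; they are not the author's actual values):

import org.apache.spark.SparkConf

object YarnMemorySketch {
  val conf: SparkConf = new SparkConf()
    .setAppName("yarn-memory-sketch")
    .set("spark.executor.memory", "48g")                  // assumed value
    .set("spark.executor.cores", "16")                    // assumed value
    .set("spark.yarn.executor.memoryOverhead", "3072")    // MB; Spark 1.x property name
}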
Yes, the export worked.
Thank you
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-0-0-on-yarn-cluster-problem-tp7560p17180.html
--
Hi,
I am facing the same problem. My spark-env.sh has the entries below, yet the YARN containers get only 1 GB and YARN spawns only two workers.
SPARK_EXECUTOR_CORES=1
SPARK_EXECUTOR_MEMORY=3G
SPARK_EXECUTOR_INSTANCES=5
Please let me know if you were able to resolve this issue.
Thank you
--
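For comparison, a sketch of the same three settings expressed as Spark configuration properties instead of spark-env.sh variables (assuming Spark 1.x on YARN; which of the two takes effect can depend on how the job is submitted):

import org.apache.spark.{SparkConf, SparkContext}

object ExecutorSettingsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("executor-settings-sketch")
      .set("spark.executor.cores", "1")
      .set("spark.executor.memory", "3g")
      .set("spark.executor.instances", "5")
    val sc = new SparkContext(conf)
    println(sc.getConf.toDebugString)   // shows which settings actually took effect
    sc.stop()
  }
}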