1. If we add more executors to the cluster and the data is already cached in
the system (the RDDs are already there), will jobs run on the new executors
or not, given that the RDD partitions are not present there?
If yes, then how is the performance on the new executors?
2. What is the replication factor in
Hi Cheng,
Is it possible to delete or replicate an RDD?? (see the sketch after the quoted thread below)
> rdd1 = textFile("hdfs...").cache()
>
> rdd2 = rdd1.filter(userDefinedFunc1).cache()
> rdd3 = rdd1.filter(userDefinedFunc2).cache()
Let me reframe the question above: if rdd1 is around 50 GB and after filtering
it comes down to around, say, 4 GB, then to incre
> way, an additional job is required so that you have a chance to
> evict rdd1 as early as possible.
>
>
> On Wed, Apr 16, 2014 at 2:43 PM, Arpit Tak wrote:
>
>> Hi Cheng,
>>
>> Is it possible to delete or replicate an RDD ??
>>
>>
>> >
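Following up on the delete / replicate question above, here is a minimal sketch using the standard RDD API, assuming a spark-shell with an existing SparkContext `sc`; the HDFS path and the filter predicates are placeholders, not taken from the original thread:

import org.apache.spark.storage.StorageLevel

// placeholder input path; assumes an existing SparkContext `sc`
val rdd1 = sc.textFile("hdfs://namenode:8020/data/input").cache()

// "replicate": persist the derived RDDs with a 2x-replicated in-memory level
val rdd2 = rdd1.filter(_.contains("foo")).persist(StorageLevel.MEMORY_ONLY_2)
val rdd3 = rdd1.filter(_.contains("bar")).persist(StorageLevel.MEMORY_ONLY_2)

// materialize rdd2/rdd3 so their cached (and replicated) copies actually exist
rdd2.count()
rdd3.count()

// "delete": drop rdd1's cached blocks once the derived RDDs are materialized
rdd1.unpersist()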
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.MoveTask
Regards,
Arpit Tak
Just set your Java home path properly:
export JAVA_HOME=/usr/lib/jvm/java-7-... (something like this, whichever version you have)
It will work.
Regards,
Arpit
On Wed, Apr 16, 2014 at 1:24 AM, ge ko wrote:
> Hi,
>
>
>
> after starting the shark-shell
> via /opt/shark/shark-0.9.1/bin/sha
I'm stuck on the same issue too, but with Shark 0.9 (with Spark 0.9) on
hadoop-2.2.0.
On the other Hadoop versions it works perfectly.
Regards,
Arpit Tak
On Wed, Apr 16, 2014 at 11:18 PM, Aureliano Buendia wrote:
> Is this resolved in spark 0.9.1?
>
>
> On Tue, Apr 15, 2014 at 6:55 PM,
Also try this ...
http://docs.sigmoidanalytics.com/index.php/How_to_Install_Spark_on_Ubuntu-12.04
http://docs.sigmoidanalytics.com/index.php/How_to_Install_Spark_on_HortonWorks_VM
Regards,
arpit
On Thu, Apr 10, 2014 at 3:04 AM, Pradeep baji
wrote:
> Thanks Prabeesh.
>
>
> On Wed, Apr 9, 2014 a
It's because there is no slf4j directory there; maybe they are updating it.
https://oss.sonatype.org/content/repositories/snapshots/org/
Hard luck, try again after some time...
Regards,
Arpit
On Thu, Apr 17, 2014 at 12:33 AM, Yiou Li wrote:
> Hi all,
>
> I am trying to build spark a
Hi Wei,
Take a look at this post...
http://apache-spark-user-list.1001560.n3.nabble.com/Job-initialization-performance-of-Spark-standalone-mode-vs-YARN-td2016.html
Regards,
Arpit Tak
On Thu, Apr 17, 2014 at 3:42 PM, Wei Wang wrote:
> Hi, there
>
> I would like to know is
Just out of curiosity, since you are using Cloudera Manager Hadoop and Spark:
how did you build Shark for it?
Are you able to read any file from HDFS? Did you try that out?
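For example, a quick sanity check from spark-shell; the namenode host, port, and file path below are just placeholders:

val lines = sc.textFile("hdfs://namenode:8020/user/test/README.md")
lines.count()                    // forces the read; fails fast if HDFS is unreachable
lines.take(5).foreach(println)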
Regards,
Arpit Tak
On Thu, Apr 17, 2014 at 7:07 PM, ge ko wrote:
> Hi,
>
>
?id=0B0Q4Le4DZj5iNUdSZXpFTUJEU0E&export=download
You will love it...
Regards,
Arpit Tak
On Tue, Apr 15, 2014 at 4:28 AM, Nabeel Memon wrote:
> Hi. I found AmpCamp exercises as a nice way to get started with spark.
> However they require amazon ec2 access. Has anyone put together
Download Cloudera VM from here.
https://drive.google.com/file/d/0B7zn-Mmft-XcdTZPLXltUjJyeUE/edit?usp=sharing
Regards,
Arpit Tak
On Fri, Apr 18, 2014 at 1:20 PM, Arpit Tak wrote:
> Hi Nabeel,
>
> I have a Cloudera VM; it has both Spark and Shark installed in it.
> You
Hi all,
If the cluster is running and I want to add slaves to the existing cluster,
which is the best way of doing it:
1.) As Matei said, select a slave and launch more instances like it.
2.) Create an AMI of it and launch more instances from that.
The plus point of the first is that it's faster, but I have to rsync every
1.) How about if the data is in S3 and we cache it in memory, instead of HDFS?
2.) How is the number of reducers determined in each case (see the sketch below)?
Even if I specify SET mapred.reduce.tasks=50, somehow only 2 reducers are
allocated instead of 50, although the query/tasks do complete.
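For reference, in plain Spark (outside Shark) the number of reduce tasks is simply the number of partitions of the shuffled RDD, which can be passed explicitly. A small sketch, with a placeholder S3 bucket and assuming AWS credentials are configured for s3n://:

// bucket and path are placeholders; s3n:// needs fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey set
val logs = sc.textFile("s3n://my-bucket/logs/*").cache()

// the second argument to reduceByKey is the number of reduce tasks (output partitions)
val counts = logs.map(line => (line.split(" ")(0), 1L)).reduceByKey(_ + _, 50)
counts.partitions.size           // 50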
Regards,
Arpit
Also check out this post
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-program-thows-OutOfMemoryError-td4268.html
On Mon, Apr 21, 2014 at 11:49 AM, Akhil Das wrote:
> Hi Chieh,
>
> You can increase the heap size by exporting the java options (See below,
> will increase the heap size
Hi,
You should be able to read it; file:// or file:/// is not even required for
reading locally, just the path is enough.
What error message are you getting in spark-shell while reading?
For local:
Also read the same from an HDFS file: put your README file there and read it,
it works both ways.
.0.jar"))
val tr = sc.textFile(logFile).cache
tr.take(100).foreach(println)
}
}
This will work
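For comparison, a minimal standalone version of the same thing; the master URL, Spark home, jar name, and file path below are placeholders, not from the original message:

import org.apache.spark.SparkContext

object ReadFileApp {
  def main(args: Array[String]) {
    val logFile = "/home/user/README.md"   // placeholder local path
    // master, app name, Spark home, and the application jar built by `sbt package`
    val sc = new SparkContext("local[2]", "ReadFileApp",
      "/opt/spark", List("target/scala-2.10/readfileapp_2.10-1.0.jar"))
    val tr = sc.textFile(logFile).cache()
    tr.take(100).foreach(println)
    sc.stop()
  }
}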
On Thu, Apr 24, 2014 at 3:00 PM, wxhsdp wrote:
> hi Arpit,
> in the spark shell I can read a local file properly,
> but when I use sbt run, an error occurs.
Also try out these examples; all of them work:
http://docs.sigmoidanalytics.com/index.php/MLlib
If you spot any problems in them, let us know.
Regards,
arpit
On Wed, Apr 23, 2014 at 11:08 PM, Matei Zaharia wrote:
> See http://people.csail.mit.edu/matei/spark-unified-docs/ for a more
> re
Try adding the hostname-to-IP mapping in /etc/hosts; it's not able to resolve
the IP to a hostname.
Try this...
192.168.10.220 CHBM220
On Wed, May 7, 2014 at 12:50 PM, Sophia wrote:
> [root@CHBM220 spark-0.9.1]#
>
> SPARK_JAR=.assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop2
Also try this out, we have already done this; it will help you:
http://docs.sigmoidanalytics.com/index.php/Setup_hadoop_2.0.0-cdh4.2.0_and_spark_0.9.0_on_ubuntu_12.04
On Tue, May 6, 2014 at 10:17 PM, Andrew Lee wrote:
> Please check JAVA_HOME. Usually it should point to /usr/java/default