Re: Spark resilience

2014-04-15 Thread Arpit Tak
1. If we add more executors to a cluster where the data is already cached (the RDDs are already in memory), will the job run on the new executors even though the RDD partitions are not present there? If yes, how is the performance on the new executors? 2. What is the replication factor in
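On the replication question: by default Spark keeps a single in-memory copy of each cached partition, and a replicated storage level has to be requested explicitly. A minimal sketch, assuming an existing SparkContext sc and a hypothetical HDFS path:

import org.apache.spark.storage.StorageLevel

// cache() is shorthand for persist(StorageLevel.MEMORY_ONLY): one copy, no replication.
// The *_2 levels keep two copies of each partition, so losing one executor
// does not force recomputation from the lineage.
val rdd = sc.textFile("hdfs://namenode:9000/data/input")  // hypothetical path
            .persist(StorageLevel.MEMORY_ONLY_2)          // replication factor 2

As for new executors: they start with an empty block store, so tasks scheduled on them re-read or recompute partitions until blocks get cached locally.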

Re: Proper caching method

2014-04-15 Thread Arpit Tak
Hi Cheng, Is it possible to delete or replicate an RDD? > rdd1 = textFile("hdfs...").cache() > > rdd2 = rdd1.filter(userDefinedFunc1).cache() > rdd3 = rdd1.filter(userDefinedFunc2).cache() To reframe the question above: if rdd1 is around 50G and after filtering it comes down to around say 4G, then to incre

Re: Proper caching method

2014-04-16 Thread Arpit Tak
way, an additional job is required so that you have a chance to > evict rdd1 as early as possible. > > On Wed, Apr 16, 2014 at 2:43 PM, Arpit Tak wrote: >> Hi Cheng, >> Is it possible to delete or replicate an RDD?
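A minimal sketch of the pattern being discussed: cache the large parent, derive and cache the small filtered RDDs, force them to materialize, then unpersist the parent so the ~50G does not stay in memory. The predicates here are hypothetical placeholders for the user-defined functions:

// Hypothetical placeholder predicates standing in for the user-defined functions.
val userDefinedFunc1 = (line: String) => line.contains("ERROR")
val userDefinedFunc2 = (line: String) => line.contains("WARN")

val rdd1 = sc.textFile("hdfs://namenode:9000/data/big").cache() // ~50G, hypothetical path
val rdd2 = rdd1.filter(userDefinedFunc1).cache()                // ~4G after filtering
val rdd3 = rdd1.filter(userDefinedFunc2).cache()

// Run a cheap action so rdd2 and rdd3 actually get computed and cached...
rdd2.count()
rdd3.count()

// ...and then the large parent can be dropped from the cache.
rdd1.unpersist()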

Create cache fails on first time

2014-04-16 Thread Arpit Tak
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask Regards, Arpit Tak

Re: Shark: class java.io.IOException: Cannot run program "/bin/java"

2014-04-16 Thread Arpit Tak
Just set your Java path properly: export JAVA_HOME=/usr/lib/jvm/java-7-. (something like this, whatever version you have) and it will work. Regards, Arpit On Wed, Apr 16, 2014 at 1:24 AM, ge ko wrote: > Hi, > > after starting the shark-shell > via /opt/shark/shark-0.9.1/bin/sha

Re: Error reading HDFS file using spark 0.9.0 / hadoop 2.2.0 - incompatible protobuf 2.5 and 2.4.1

2014-04-16 Thread Arpit Tak
I am stuck on the same issue too, but with Shark (0.9 with Spark 0.9) on hadoop-2.2.0. On the other Hadoop versions it works perfectly. Regards, Arpit Tak On Wed, Apr 16, 2014 at 11:18 PM, Aureliano Buendia wrote: > Is this resolved in spark 0.9.1? > > On Tue, Apr 15, 2014 at 6:55 PM,

Re: Spark packaging

2014-04-16 Thread Arpit Tak
Also try these ... http://docs.sigmoidanalytics.com/index.php/How_to_Install_Spark_on_Ubuntu-12.04 http://docs.sigmoidanalytics.com/index.php/How_to_Install_Spark_on_HortonWorks_VM Regards, Arpit On Thu, Apr 10, 2014 at 3:04 AM, Pradeep baji wrote: > Thanks Prabeesh. > > On Wed, Apr 9, 2014 a

Re: sbt assembly error

2014-04-16 Thread Arpit Tak
It's because the slf4j directory doesn't exist there; maybe they are updating it. https://oss.sonatype.org/content/repositories/snapshots/org/ Hard luck, try again after some time... Regards, Arpit On Thu, Apr 17, 2014 at 12:33 AM, Yiou Li wrote: > Hi all, > > I am trying to build spark a

Re: Spark on Yarn or Mesos?

2014-04-17 Thread Arpit Tak
Hi Wei, Take a look at this post... http://apache-spark-user-list.1001560.n3.nabble.com/Job-initialization-performance-of-Spark-standalone-mode-vs-YARN-td2016.html Regards, Arpit Tak On Thu, Apr 17, 2014 at 3:42 PM, Wei Wang wrote: > Hi there, > > I would like to know is

Re: Shark: ClassNotFoundException org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat

2014-04-17 Thread Arpit Tak
Just out of curiosity, since you are using Cloudera Manager for Hadoop and Spark: how did you build Shark for it? Are you able to read any file from HDFS? Did you try that out? Regards, Arpit Tak On Thu, Apr 17, 2014 at 7:07 PM, ge ko wrote: > Hi, >

Re: AmpCamp exercise in a local environment

2014-04-18 Thread Arpit Tak
?id=0B0Q4Le4DZj5iNUdSZXpFTUJEU0E&export=download You will love it... Regards, Arpit Tak On Tue, Apr 15, 2014 at 4:28 AM, Nabeel Memon wrote: > Hi. I found the AmpCamp exercises a nice way to get started with Spark. > However they require Amazon EC2 access. Has anyone put together

Re: AmpCamp exercise in a local environment

2014-04-18 Thread Arpit Tak
Download the Cloudera VM from here: https://drive.google.com/file/d/0B7zn-Mmft-XcdTZPLXltUjJyeUE/edit?usp=sharing Regards, Arpit Tak On Fri, Apr 18, 2014 at 1:20 PM, Arpit Tak wrote: > Hi Nabeel, > > I have a Cloudera VM; it has both Spark and Shark installed in it. > You

Re: Having spark-ec2 join new slaves to existing cluster

2014-04-18 Thread Arpit Tak
Hi all, if the cluster is running and I want to add slaves to the existing cluster, which is the best way of doing it: 1.) As Matei said, select a slave and launch more of these. 2.) Create an AMI of it and launch more like it. The plus point of the first is that it's faster, but I have to rsync every

Re: Task splitting among workers

2014-04-21 Thread Arpit Tak
1.) What about when the data is in S3 and we cache it in memory, instead of HDFS? 2.) How is the number of reducers determined in both cases? Even if I specify set mapred.reduce.tasks=50, somehow only 2 reducers are allocated instead of 50, although the query/tasks complete. Regards, Arpit
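Worth noting: mapred.reduce.tasks is a Shark/Hive setting; in plain Spark the equivalent knob is the number of partitions passed to the shuffle operation itself. A minimal sketch, assuming an existing SparkContext sc and a hypothetical S3 bucket:

// Caching an S3-backed RDD works the same way as HDFS; the bucket is hypothetical.
val data = sc.textFile("s3n://my-bucket/input").cache()

val pairs = data.map(line => (line.split("\t")(0), 1))

// The second argument fixes the number of reduce-side partitions ("reducers") at 50.
val counts = pairs.reduceByKey(_ + _, 50)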

Re: Java heap space and spark.akka.frameSize Inbox x

2014-04-21 Thread Arpit Tak
Also check out this post: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-program-thows-OutOfMemoryError-td4268.html On Mon, Apr 21, 2014 at 11:49 AM, Akhil Das wrote: > Hi Chieh, > > You can increase the heap size by exporting the Java options (see below, > will increase the heap size
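For completeness, a minimal sketch of setting the executor heap and Akka frame size programmatically with SparkConf (available since Spark 0.9); the values and master URL are illustrative only, not recommendations:

import org.apache.spark.{SparkConf, SparkContext}

// spark.akka.frameSize is in MB and must be set before the context is created.
val conf = new SparkConf()
  .setMaster("local[2]")                  // illustrative master URL
  .setAppName("FrameSizeExample")
  .set("spark.executor.memory", "4g")     // executor heap
  .set("spark.akka.frameSize", "128")     // raise the 10 MB default

val sc = new SparkContext(conf)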

Re: how to set spark.executor.memory and heap size

2014-04-24 Thread Arpit Tak
Hi, you should be able to read it; file:// or file:/// is not even required for reading locally, just the path is enough. What error message are you getting in spark-shell while reading locally? Also try reading the same file from HDFS: put your README file there and read it. It works both ways.

Re: how to set spark.executor.memory and heap size

2014-04-24 Thread Arpit Tak
.0.jar")) val tr = sc.textFile(logFile).cache tr.take(100).foreach(println) } } This will work On Thu, Apr 24, 2014 at 3:00 PM, wxhsdp wrote: > hi arpit, > on spark shell, i can read local file properly, > but when i use sbt run, error occurs.

Re: error in mllib lr example code

2014-04-24 Thread Arpit Tak
Also try out these examples; all of them work: http://docs.sigmoidanalytics.com/index.php/MLlib If you spot any problems in those, let us know. Regards, Arpit On Wed, Apr 23, 2014 at 11:08 PM, Matei Zaharia wrote: > See http://people.csail.mit.edu/matei/spark-unified-docs/ for a more > re

Re: Is there anything that I need to modify?

2014-05-11 Thread Arpit Tak
Try adding the hostname-to-IP mapping in /etc/hosts; it's not able to resolve the IP to a hostname. Try something like this: 192.168.10.220 CHBM220 On Wed, May 7, 2014 at 12:50 PM, Sophia wrote: > [root@CHBM220 spark-0.9.1]# > > SPARK_JAR=.assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop2

Re: run spark0.9.1 on yarn with hadoop CDH4

2014-05-15 Thread Arpit Tak
Also try this out; we have already done this and it will help you: http://docs.sigmoidanalytics.com/index.php/Setup_hadoop_2.0.0-cdh4.2.0_and_spark_0.9.0_on_ubuntu_12.04 On Tue, May 6, 2014 at 10:17 PM, Andrew Lee wrote: > Please check JAVA_HOME. Usually it should point to /usr/java/default