Re: HDFS file hdfs://127.0.0.1:9000/hdfs/spark/examples/README.txt

2020-04-06 Thread jane thorpe
Hi Som, Did you know that the simple demo program of reading characters from a file didn't work? Who wrote that simple hello-world-type little program? jane thorpe janethor...@aol.com -Original Message- From: jane thorpe To: somplasticllc ; user Sent: Fri, 3 Apr 2020 2:44 S

Re: HDFS file hdfs://127.0.0.1:9000/hdfs/spark/examples/README.txt

2020-04-06 Thread Som Lima
program ? > > jane thorpe > janethor...@aol.com > > > -Original Message- > From: jane thorpe > To: somplasticllc ; user > Sent: Fri, 3 Apr 2020 2:44 > Subject: Re: HDFS file hdfs:// > 127.0.0.1:9000/hdfs/spark/examples/README.txt > > > Thanks darling

Re: HDFS file hdfs://127.0.0.1:9000/hdfs/spark/examples/README.txt

2020-04-02 Thread jane thorpe
0.1:9000/hdfs/spark/examples/README.txt MapPartitionsRDD[91] at textFile at <console>:27 counts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[94] at reduceByKey at <console>:30 scala> :quit jane thorpe janethor...@aol.com -Original Message- From: Som Lima CC: user Sent: Tue, 31 Mar 2020

Re: HDFS file

2020-03-31 Thread Som Lima
Hi Jane Try this example https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/HdfsWordCount.scala Som On Tue, 31 Mar 2020, 21:34 jane thorpe, wrote: > hi, > > Are there setup instructions on the website for > spark-3.0.0-preview2-bin-hadoop2
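
For readers following along, the linked example boils down to the pattern below. This is a sketch, not the exact file: the master is assumed to be supplied via spark-submit, and the HDFS path is a placeholder.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Minimal HdfsWordCount-style sketch: watch a directory for newly
// created files and count words in each micro-batch.
object HdfsWordCountSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("HdfsWordCount")
    val ssc = new StreamingContext(conf, Seconds(2))
    // textFileStream monitors the directory; each new file becomes
    // part of the next batch's RDD.
    val lines = ssc.textFileStream("hdfs://127.0.0.1:9000/hdfs/spark/examples/")
    val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    counts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note that textFileStream only picks up files created after the stream starts; pre-existing files in the directory are ignored.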

Re: HDFS or NFS as a cache?

2017-10-02 Thread Miguel Morales
ran [mailto:ste...@hortonworks.com] > > Sent: Saturday, September 30, 2017 6:10 AM > > To: JG Perrin > > Cc: Alexander Czech ; > user@spark.apache.org > > Subject: Re: HDFS or NFS as a cache? > > > > > > > > > > > > On 29 Sep 2017, at 20:03

Re: HDFS or NFS as a cache?

2017-10-02 Thread Marcelo Vanzin
Steve Loughran [mailto:ste...@hortonworks.com] > Sent: Saturday, September 30, 2017 6:10 AM > To: JG Perrin > Cc: Alexander Czech ; user@spark.apache.org > Subject: Re: HDFS or NFS as a cache? > > > > > > On 29 Sep 2017, at 20:03, JG Perrin wrote: > > > > Y

RE: HDFS or NFS as a cache?

2017-10-02 Thread JG Perrin
ghran [mailto:ste...@hortonworks.com] Sent: Saturday, September 30, 2017 6:10 AM To: JG Perrin Cc: Alexander Czech ; user@spark.apache.org Subject: Re: HDFS or NFS as a cache? On 29 Sep 2017, at 20:03, JG Perrin mailto:jper...@lumeris.com>> wrote: You will collect in the driver (often the mas

Re: HDFS or NFS as a cache?

2017-09-30 Thread Steve Loughran
On 29 Sep 2017, at 20:03, JG Perrin mailto:jper...@lumeris.com>> wrote: You will collect in the driver (often the master) and it will save the data, so for saving, you will not have to set up HDFS. no, it doesn't work quite like that. 1. workers generate their data and save somewhere 2. on "ta

Re: HDFS or NFS as a cache?

2017-09-30 Thread Steve Loughran
On 29 Sep 2017, at 15:59, Alexander Czech mailto:alexander.cz...@googlemail.com>> wrote: Yes I have identified the rename as the problem, that is why I think the extra bandwidth of the larger instances might not help. Also there is a consistency issue with S3 because of how the rename work

RE: HDFS or NFS as a cache?

2017-09-29 Thread JG Perrin
You will collect in the driver (often the master) and it will save the data, so for saving, you will not have to set up HDFS. From: Alexander Czech [mailto:alexander.cz...@googlemail.com] Sent: Friday, September 29, 2017 8:15 AM To: user@spark.apache.org Subject: HDFS or NFS as a cache? I have a

Re: HDFS or NFS as a cache?

2017-09-29 Thread Alexander Czech
Yes I have identified the rename as the problem, that is why I think the extra bandwidth of the larger instances might not help. Also there is a consistency issue with S3 because of how the rename works, so that I probably lose data. On Fri, Sep 29, 2017 at 4:42 PM, Vadim Semenov wrote: > How

Re: HDFS or NFS as a cache?

2017-09-29 Thread Vadim Semenov
How many files do you produce? I believe it spends a lot of time on renaming the files because of the output committer. Also, instead of 5x c3.2xlarge try using 2x c3.8xlarge, because they have 10GbE and you can get good throughput for S3. On Fri, Sep 29, 2017 at 9:15 AM, Alexander Czech < alex
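
For context, the slowness discussed in this thread typically comes from the output committer renaming files out of _temporary into place, which on S3 is a full copy. The setting below is illustrative and version-dependent (commit algorithm version 2 exists in Hadoop 2.7+; check your release before relying on it):

```
# spark-defaults.conf (illustrative, not a recommendation)
# v2 moves task output directly to the destination at task commit,
# skipping the second rename pass done at job commit in v1.
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version  2
```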

Re: hdfs persist rollbacks when spark job is killed

2016-08-08 Thread Gourav Sengupta
There is a mv command in GCS but I am not quite sure (because of limitations of the data I work on and my limited budget) whether the mv command actually copies and deletes or just re-points the files to a new directory by changing its meta-data. Yes the Data Quality checks are done after the

Re: hdfs persist rollbacks when spark job is killed

2016-08-07 Thread Chanh Le
Thank you Gourav, > Moving files from _temp folders to main folders is an additional overhead > when you are working on S3 as there is no move operation. Good catch. Is GCS the same? > I generally have a set of Data Quality checks after each job to ascertain > whether everything went fine

Re: hdfs persist rollbacks when spark job is killed

2016-08-07 Thread Gourav Sengupta
But you have to be careful, that is the default setting. There is a way you can overwrite it so that the writing to _temp folder does not take place and you write directly to the main folder. Moving files from _temp folders to main folders is an additional overhead when you are working on S3 as th

Re: hdfs persist rollbacks when spark job is killed

2016-08-07 Thread Chanh Le
It’s out of the box in Spark. When you write data into HDFS or any storage, it only creates the new parquet folder properly if your Spark job succeeded; otherwise there is only a _temp folder inside to mark that it’s still not successful (Spark was killed), or nothing inside (the Spark job failed). > On Aug 8, 2016,
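
One way to act on this marker behaviour from driver code is to test for the _SUCCESS file that a committed Hadoop output directory contains. A sketch, assuming the default filesystem is configured and using a placeholder path:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// A committed output directory contains a _SUCCESS marker; a killed job
// leaves only _temporary behind, and a failed one may leave nothing.
val fs = FileSystem.get(new Configuration())
val committed = fs.exists(new Path("hdfs:///data/output/_SUCCESS"))
```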

Re: HDFS

2015-12-14 Thread Akhil Das
Try to set the spark.locality.wait to a higher number and see if things change. You can read more about the configuration properties from here http://spark.apache.org/docs/latest/configuration.html#scheduling Thanks Best Regards On Sat, Dec 12, 2015 at 12:16 AM, shahid ashraf wrote: > hi Folks
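
For reference, the property (and its per-level variants) can go in spark-defaults.conf; the values here are arbitrary examples, not recommendations:

```
spark.locality.wait        10s
# finer-grained variants for each locality level also exist:
spark.locality.wait.node   10s
spark.locality.wait.rack   10s
```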

RE: hdfs-ha on mesos - odd bug

2015-11-11 Thread Buttler, David
m: Marcelo Vanzin [mailto:van...@cloudera.com] Sent: Tuesday, September 15, 2015 7:47 PM To: Adrian Bridgett Cc: user Subject: Re: hdfs-ha on mesos - odd bug On Mon, Sep 14, 2015 at 6:55 AM, Adrian Bridgett wrote: > 15/09/14 13:00:25 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID > 0

Re: RE : Re: HDFS small file generation problem

2015-10-03 Thread Jörn Franke
. 2015 à 16:48, a écrit : > Thanks a lot, why you said "the most recent version" ? > > - Mail original - > De: "Jörn Franke" > À: "nibiau" > Cc: banto...@gmail.com, user@spark.apache.org > Envoyé: Samedi 3 Octobre 2015 13:56:43 >

Re: RE : Re: HDFS small file generation problem

2015-10-03 Thread nibiau
Thanks a lot, why you said "the most recent version" ? - Mail original - De: "Jörn Franke" À: "nibiau" Cc: banto...@gmail.com, user@spark.apache.org Envoyé: Samedi 3 Octobre 2015 13:56:43 Objet: Re: RE : Re: HDFS small file generation problem Yes the m

Re: RE : Re: HDFS small file generation problem

2015-10-03 Thread Jörn Franke
gt; After a CONCATENATE I suppose the records are still updatable. >> >> Tks to confirm if it can be solution for my use case. Or any other idea.. >> >> Thanks a lot ! >> Nicolas >> >> >> - Mail original - >> De: "Jörn Franke" >

RE : Re: HDFS small file generation problem

2015-10-03 Thread nibiau
firm if it can be solution for my use case. Or any other idea.. Thanks a lot ! Nicolas - Mail original - De: "Jörn Franke" À: nib...@free.fr, "Brett Antonides" Cc: user@spark.apache.org Envoyé: Samedi 3 Octobre 2015 11:17:51 Objet: Re: HDFS small file generation pro

Re: HDFS small file generation problem

2015-10-03 Thread Jörn Franke
re still updatable. > > Tks to confirm if it can be solution for my use case. Or any other idea.. > > Thanks a lot ! > Nicolas > > > - Mail original - > De: "Jörn Franke" > À: nib...@free.fr, "Brett Antonides" > Cc: user@spark.apache.org >

Re: HDFS small file generation problem

2015-10-03 Thread Jörn Franke
olas > > > - Mail original - > De: "Jörn Franke" > À: nib...@free.fr, "Brett Antonides" > Cc: user@spark.apache.org > Envoyé: Samedi 3 Octobre 2015 11:17:51 > Objet: Re: HDFS small file generation problem > > > > You can update data

Re: HDFS small file generation problem

2015-10-03 Thread nibiau
;Jörn Franke" À: nib...@free.fr, "Brett Antonides" Cc: user@spark.apache.org Envoyé: Samedi 3 Octobre 2015 11:17:51 Objet: Re: HDFS small file generation problem You can update data in hive if you use the orc format Le sam. 3 oct. 2015 à 10:42, < nib...@free.fr > a écrit :
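
The ORC update capability referenced here is Hive's ACID support (Hive 0.14+), which requires a bucketed ORC table marked transactional. The table, bucket count, and values below are illustrative placeholders:

```sql
CREATE TABLE events (id STRING, payload STRING)
CLUSTERED BY (id) INTO 8 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

UPDATE events SET payload = 'corrected' WHERE id = '42';
```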

Re: HDFS small file generation problem

2015-10-03 Thread Jörn Franke
Mail original - > De: nib...@free.fr > À: "Brett Antonides" > Cc: user@spark.apache.org > Envoyé: Vendredi 2 Octobre 2015 18:37:22 > Objet: Re: HDFS small file generation problem > > Ok thanks, but can I also update data instead of insert data ? > >

Re: HDFS small file generation problem

2015-10-03 Thread Jagat Singh
er solutions ? > > Nicolas > > - Mail original - > De: nib...@free.fr > À: "Brett Antonides" > Cc: user@spark.apache.org > Envoyé: Vendredi 2 Octobre 2015 18:37:22 > Objet: Re: HDFS small file generation problem > > Ok thanks, but can I also update data inst

Re: HDFS small file generation problem

2015-10-03 Thread nibiau
7:22 Objet: Re: HDFS small file generation problem Ok thanks, but can I also update data instead of insert data ? - Mail original - De: "Brett Antonides" À: user@spark.apache.org Envoyé: Vendredi 2 Octobre 2015 18:18:18 Objet: Re: HDFS small file generation problem I had a

Re: HDFS small file generation problem

2015-10-02 Thread nibiau
Ok thanks, but can I also update data instead of inserting data? - Mail original - De: "Brett Antonides" À: user@spark.apache.org Envoyé: Vendredi 2 Octobre 2015 18:18:18 Objet: Re: HDFS small file generation problem I had a very similar problem and solved it with Hi

Re: HDFS small file generation problem

2015-10-02 Thread Brett Antonides
t; De: "Jörn Franke" > À: nib...@free.fr, "user" > Envoyé: Lundi 28 Septembre 2015 23:53:56 > Objet: Re: HDFS small file generation problem > > > > Use hadoop archive > > > > Le dim. 27 sept. 2015 à 15:36, < nib...@free.fr > a écr

Re: HDFS small file generation problem

2015-10-02 Thread nibiau
-- De: "Jörn Franke" À: nib...@free.fr, "user" Envoyé: Lundi 28 Septembre 2015 23:53:56 Objet: Re: HDFS small file generation problem Use hadoop archive Le dim. 27 sept. 2015 à 15:36, < nib...@free.fr > a écrit : Hello, I'm still investigating my small fil

Re: HDFS small file generation problem

2015-09-28 Thread Jörn Franke
Use hadoop archive Le dim. 27 sept. 2015 à 15:36, a écrit : > Hello, > I'm still investigating my small file generation problem generated by my > Spark Streaming jobs. > Indeed, my Spark Streaming jobs are receiving a lot of small events (avg > 10kb), and I have to store them inside HDFS in ord

Re: HDFS is undefined

2015-09-28 Thread Ted Yu
Please post the question on vendor's forum. > On Sep 25, 2015, at 7:13 AM, Angel Angel wrote: > > hello, > I am running the spark application. > > I have installed the cloudera manager. > it includes the spark version 1.2.0 > > > But now i want to use spark version 1.4.0. > > its also worki

Re: HDFS is undefined

2015-09-28 Thread Akhil Das
For some reason Spark isn't picking up your hadoop confs. Did you download Spark compiled with the hadoop version that you have in the cluster? Thanks Best Regards On Fri, Sep 25, 2015 at 7:43 PM, Angel Angel wrote: > hello, > I am running the spark application. > > I have installed the cl

Re: HDFS small file generation problem

2015-09-27 Thread Deenar Toraskar
You could try a couple of things a) use Kafka for stream processing, store current incoming events and spark streaming job ouput in Kafka rather than on HDFS and dual write to HDFS too (in a micro batched mode), so every x minutes. Kafka is more suited to processing lots of small events/ b) Coales
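
Option (b), coalescing, can be sketched as a periodic batch job that compacts the small files. This assumes an existing SparkContext sc; the paths and partition count are placeholders:

```scala
// Periodic compaction sketch: read many small files, rewrite them as
// a few larger ones, then (separately) retire the originals.
val small = sc.textFile("hdfs:///events/incoming/*")
small.coalesce(4).saveAsTextFile("hdfs:///events/compacted/batch-001")
```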

Re: HDFS small file generation problem

2015-09-27 Thread ayan guha
I would suggest not writing small files to hdfs. Rather, you can hold them in memory, maybe off heap, and then flush them to hdfs using another job, similar to https://github.com/ptgoetz/storm-hdfs (not sure if spark already has something like it) On Sun, Sep 27, 2015 at 11:36 PM, wrote: >

Re: hdfs-ha on mesos - odd bug

2015-09-15 Thread Marcelo Vanzin
On Mon, Sep 14, 2015 at 6:55 AM, Adrian Bridgett wrote: > 15/09/14 13:00:25 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, > 10.1.200.245): java.lang.IllegalArgumentException: > java.net.UnknownHostException: nameservice1 > at > org.apache.hadoop.security.SecurityUtil.buildTokenServic
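
The UnknownHostException above is the classic symptom of a client that lacks the HA nameservice definition: "nameservice1" is a logical name, not a DNS host, so it must be defined in the client's hdfs-site.xml. Illustrative entries (all names and hosts are placeholders for the cluster's actual values):

```xml
<property><name>dfs.nameservices</name><value>nameservice1</value></property>
<property><name>dfs.ha.namenodes.nameservice1</name><value>nn1,nn2</value></property>
<property><name>dfs.namenode.rpc-address.nameservice1.nn1</name><value>host1:8020</value></property>
<property><name>dfs.namenode.rpc-address.nameservice1.nn2</name><value>host2:8020</value></property>
<property>
  <name>dfs.client.failover.proxy.provider.nameservice1</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```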

Re: hdfs-ha on mesos - odd bug

2015-09-15 Thread Iulian Dragoș
I've seen similar traces, but couldn't track down the failure completely. You are using Kerberos for your HDFS cluster, right? AFAIK Kerberos isn't supported in Mesos deployments. Can you resolve that host name (nameservice1) from the driver machine (ping nameservice1)? Can it be resolved from the

Re: hdfs-ha on mesos - odd bug

2015-09-15 Thread Adrian Bridgett
Thanks Steve - we are already taking the safe route - putting NN and datanodes on the central mesos-masters which are on demand. Later (much later!) we _may_ put some datanodes on spot instances (and using several spot instance types as the spikes seem to only affect one type - worst case we c

Re: hdfs-ha on mesos - odd bug

2015-09-15 Thread Steve Loughran
> On 15 Sep 2015, at 08:55, Adrian Bridgett wrote: > > Hi Sam, in short, no, it's a traditional install as we plan to use spot > instances and didn't want price spikes to kill off HDFS. > > We're actually doing a bit of a hybrid, using spot instances for the mesos > slaves, ondemand for the m

Re: hdfs-ha on mesos - odd bug

2015-09-15 Thread Adrian Bridgett
Hi Sam, in short, no, it's a traditional install as we plan to use spot instances and didn't want price spikes to kill off HDFS. We're actually doing a bit of a hybrid, using spot instances for the mesos slaves, ondemand for the mesos masters. So for the time being, putting hdfs on the master

Re: hdfs-ha on mesos - odd bug

2015-09-14 Thread Sam Bessalah
I don't know about the broken url. But are you running HDFS as a mesos framework? If so is it using mesos-dns? Then you should resolve the namenode via hdfs:/// On Mon, Sep 14, 2015 at 3:55 PM, Adrian Bridgett wrote: > I'm hitting an odd issue with running spark on mesos together with > HA-

Re: HDFS performances + unexpected death of executors.

2015-07-14 Thread Max Demoulin
I will try a fresh setup very soon. Actually, I tried to compile spark by myself, against hadoop 2.5.2, but I had the issue that I mentioned in this thread: http://apache-spark-user-list.1001560.n3.nabble.com/Master-doesn-t-start-no-logs-td23651.html I was wondering if maybe serialization/deseria

Re: HDFS not supported by databricks cloud :-(

2015-06-16 Thread Simon Elliston Ball
You could consider using Zeppelin and spark on yarn as an alternative. http://zeppelin.incubator.apache.org/ Simon > On 16 Jun 2015, at 17:58, Sanjay Subramanian > wrote: > > hey guys > > After day one at the spark-summit SFO, I realized sadly that (indeed) HDFS is > not supported by Databr

Re: HDFS Rest Service not available

2015-06-02 Thread Su She
Ahh, this did the trick; I had to get the name node out of safe mode, however, before it fully worked. Thanks! On Tue, Jun 2, 2015 at 12:09 AM, Akhil Das wrote: > It says your namenode is down (connection refused on 8020), you can restart > your HDFS by going into hadoop directory and typing sbin/

Re: HDFS Rest Service not available

2015-06-02 Thread Akhil Das
It says your namenode is down (connection refused on 8020), you can restart your HDFS by going into hadoop directory and typing sbin/stop-dfs.sh and then sbin/start-dfs.sh Thanks Best Regards On Tue, Jun 2, 2015 at 5:03 AM, Su She wrote: > Hello All, > > A bit scared I did something stupid...I

Re: HDFS Namenode in safemode when I turn off my EC2 instance

2015-01-27 Thread Su She
Thanks Akhil! 1) I had to do sudo -u hdfs hdfs dfsadmin -safemode leave a) I had created a user called hdfs with superuser privileges in Hue, hence the double hdfs. 2) Lastly, I know this is getting a bit off topic, but this is my etc/hosts file: 127.0.0.1 localhost.localdomain loca
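
For reference, the safe-mode commands from this thread side by side (run against a live cluster; the "hdfs" superuser name depends on the install):

```
hadoop dfsadmin -safemode leave              # older syntax, deprecated in Hadoop 2.x
sudo -u hdfs hdfs dfsadmin -safemode leave   # current syntax, as the HDFS superuser
hdfs dfsadmin -safemode get                  # check the current safe-mode status
```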

Re: HDFS Namenode in safemode when I turn off my EC2 instance

2015-01-26 Thread Akhil Das
Command would be: hadoop dfsadmin -safemode leave If you are not able to ping your instances, it can be because you are blocking all the ICMP requests. I'm not quite sure why you are not able to ping google.com from your instances. Make sure the internal IP (ifconfig) is proper in the f

Re: HDFS Namenode in safemode when I turn off my EC2 instance

2015-01-26 Thread Su She
Hello Sean and Akhil, I shut down the services on Cloudera Manager. I shut them down in the appropriate order and then stopped all services of CM. I then shut down my instances. I then turned my instances back on, but I am getting the same error. 1) I tried hadoop fs -safemode leave and it said -

Re: HDFS Namenode in safemode when I turn off my EC2 instance

2015-01-22 Thread Sean Owen
If you are using CDH, you would be shutting down services with Cloudera Manager. I believe you can do it manually using Linux 'services' if you do the steps correctly across your whole cluster. I'm not sure if the stock stop-all.sh script is supposed to work. Certainly, if you are using CM, by far

Re: HDFS Namenode in safemode when I turn off my EC2 instance

2015-01-21 Thread Su She
Hello Sean & Akhil, I tried running the stop-all.sh script on my master and I got this message: localhost: Permission denied (publickey,gssapi-keyex,gssapi-with-mic). chown: changing ownership of `/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/logs': Operation not permitted no org.apa

Re: HDFS Namenode in safemode when I turn off my EC2 instance

2015-01-17 Thread Su She
Thanks Akhil and Sean for the responses. I will try shutting down spark, then storage and then the instances. Initially, when hdfs was in safe mode, I waited for >1 hour and the problem still persisted. I will try this new method. Thanks! On Sat, Jan 17, 2015 at 2:03 AM, Sean Owen wrote: > Y

Re: HDFS Namenode in safemode when I turn off my EC2 instance

2015-01-17 Thread Sean Owen
You would not want to turn off storage underneath Spark. Shut down Spark first, then storage, then shut down the instances. Reverse the order when restarting. HDFS will be in safe mode for a short time after being started before it becomes writeable. I would first check that it's not just that. Ot

Re: HDFS Namenode in safemode when I turn off my EC2 instance

2015-01-17 Thread Akhil Das
Safest way would be to first shut down HDFS and then shut down Spark (calling stop-all.sh would do) and then shut down the machines. You can execute the following command to disable safe mode: *hadoop fs -safemode leave* Thanks Best Regards On Sat, Jan 17, 2015 at 8:31 AM, Su She wrote: > Hello E

RE: hdfs streaming context

2014-12-01 Thread Bui, Tri
@spark.apache.org Subject: Re: hdfs streaming context Yes but you can't follow three slashes with host:port. No host probably defaults to whatever is found in your HDFS config. On Mon, Dec 1, 2014 at 11:02 PM, Bui, Tri wrote: > For the streaming example I am working on, Its accepted ("

Re: hdfs streaming context

2014-12-01 Thread Sean Owen
Yes but you can't follow three slashes with host:port. No host probably defaults to whatever is found in your HDFS config. On Mon, Dec 1, 2014 at 11:02 PM, Bui, Tri wrote: > For the streaming example I am working on, Its accepted ("hdfs:///user/data") > without the localhost info. > > Let me dig
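
Putting the thread's conclusions together, the URI forms behave as follows (8020 is the default NameNode RPC port in many distributions):

```
hdfs://localhost:8020/user/data    # explicit authority: always works
hdfs:///user/data                  # empty authority: host/port taken from the HDFS config
hdfs:///localhost:8020/user/data   # wrong: three slashes cannot be followed by host:port
```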

RE: hdfs streaming context

2014-12-01 Thread Bui, Tri
t Cc: user@spark.apache.org Subject: Re: hdfs streaming context Yes, in fact, that's the only way it works. You need "hdfs://localhost:8020/user/data", I believe. (No it's not correct to write "hdfs:///...") On Mon, Dec 1, 2014 at 10:41 PM, Benjamin Cuthbert wrote: >

Re: hdfs streaming context

2014-12-01 Thread Benjamin Cuthbert
Thanks Sean, That worked just removing the /* and leaving it as /user/data Seems to be streaming in. > On 1 Dec 2014, at 22:50, Sean Owen wrote: > > Yes, in fact, that's the only way it works. You need > "hdfs://localhost:8020/user/data", I believe. > > (No it's not correct to write "hdfs://

Re: hdfs streaming context

2014-12-01 Thread Sean Owen
Yes, in fact, that's the only way it works. You need "hdfs://localhost:8020/user/data", I believe. (No it's not correct to write "hdfs:///...") On Mon, Dec 1, 2014 at 10:41 PM, Benjamin Cuthbert wrote: > All, > > Is it possible to stream on HDFS directory and listen for multiple files? > > I hav

Re: hdfs streaming context

2014-12-01 Thread Andy Twigg
Have you tried just passing a path to ssc.textFileStream()? It monitors the path for new files by looking at mtime/atime; all new/touched files in the time window appear as an rdd in the dstream. On 1 December 2014 at 14:41, Benjamin Cuthbert wrote: > All, > > Is it possible to stream on HDFS d

RE: hdfs streaming context

2014-12-01 Thread Bui, Tri
Try ("hdfs:///localhost:8020/user/data/*") With 3 "/". Thx tri -Original Message- From: Benjamin Cuthbert [mailto:cuthbert@gmail.com] Sent: Monday, December 01, 2014 4:41 PM To: user@spark.apache.org Subject: hdfs streaming context All, Is it possible to stream on HDFS director

Re: HDFS read text file

2014-11-17 Thread Hlib Mykhailenko
Hello Naveen, I think you should first override "toString" method of your sample.spark.test.Student class. -- Cordialement, Hlib Mykhailenko Doctorant à INRIA Sophia-Antipolis Méditerranée 2004 Route des Lucioles BP93 06902 SOPHIA ANTIPOLIS cedex - Original Message - > From: "N

Re: HDFS read text file

2014-11-17 Thread Akhil Das
You can use the sc.objectFile to read it. It will be RDD[Student] type. Thanks Best Regards On Mon, Nov 17, 2014 at 4:03 PM, Naveen Kumar Pokala < npok...@spcapitaliq.com> wrote: > Hi, > > > > > > JavaRDD s
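
A minimal sketch of that suggestion, assuming an existing SparkContext sc, the user's Student class on the classpath, and a placeholder path previously written with saveAsObjectFile:

```scala
// Reads back Java-serialized objects as a typed RDD.
val students = sc.objectFile[Student]("hdfs:///user/data/students")
// Printing each element relies on Student overriding toString.
students.collect().foreach(println)
```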

Re: hdfs read performance issue

2014-08-20 Thread Gurvinder Singh
I got some time to look into it. It appears that Spark (latest git) is doing this operation much more often compared to the Aug 1 version. Here is the log from the operation I am referring to 14/08/19 12:37:26 INFO spark.CacheManager: Partition rdd_8_414 not found, computing it 14/08/19 12:37:26 INFO r

Re: hdfs replication on saving RDD

2014-07-15 Thread Kan Zhang
Andrew, there are overloaded versions of saveAsHadoopFile or saveAsNewAPIHadoopFile that allow you to pass in a per-job Hadoop conf. saveAsTextFile is just a convenience wrapper on top of saveAsHadoopFile. On Mon, Jul 14, 2014 at 11:22 PM, Andrew Ash wrote: > In general it would be nice to be a
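
A sketch of that per-job approach, assuming an RDD[String] named rdd and an existing SparkContext sc; the replication factor and path are example values:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{NullWritable, Text}
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat

// Copy the context's Hadoop conf and override replication for this job
// only, leaving other jobs in the same application untouched.
val jobConf = new Configuration(sc.hadoopConfiguration)
jobConf.set("dfs.replication", "2")

rdd.map(line => (NullWritable.get(), new Text(line)))
  .saveAsNewAPIHadoopFile(
    "hdfs:///out/replicated-twice",
    classOf[NullWritable], classOf[Text],
    classOf[TextOutputFormat[NullWritable, Text]],
    jobConf)
```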

Re: hdfs replication on saving RDD

2014-07-14 Thread Andrew Ash
In general it would be nice to be able to configure replication on a per-job basis. Is there a way to do that without changing the config values in the Hadoop conf/ directory between jobs? Maybe by modifying OutputFormats or the JobConf? On Mon, Jul 14, 2014 at 11:12 PM, Matei Zaharia wrote:

Re: hdfs replication on saving RDD

2014-07-14 Thread Matei Zaharia
You can change this setting through SparkContext.hadoopConfiguration, or put the conf/ directory of your Hadoop installation on the CLASSPATH when you launch your app so that it reads the config values from there. Matei On Jul 14, 2014, at 8:06 PM, valgrind_girl <124411...@qq.com> wrote: > eag
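
Matei's first option looks like this in driver code. A sketch, assuming an existing SparkContext sc; note the setting then applies to every subsequent save from this context:

```scala
// Lower the HDFS replication factor for files written by this application.
sc.hadoopConfiguration.set("dfs.replication", "2")
rdd.saveAsTextFile("hdfs:///out/low-replication")
```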

Re: hdfs replication on saving RDD

2014-07-14 Thread valgrind_girl
Eager to know about this issue too, does anyone know how? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/hdfs-replication-on-saving-RDD-tp289p9700.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

RE: HDFS folder .sparkStaging not deleted and filled up HDFS in yarn mode

2014-06-23 Thread Andrew Lee
t: 5f48721, github.com/apache/spark/pull/586 From: alee...@hotmail.com To: user@spark.apache.org Subject: RE: HDFS folder .sparkStaging not deleted and filled up HDFS in yarn mode Date: Wed, 18 Jun 2014 11:24:36 -0700 Forgot to mention that I am using spark-submit to submit jobs, and a verbose

RE: HDFS folder .sparkStaging not deleted and filled up HDFS in yarn mode

2014-06-18 Thread Andrew Lee
Forgot to mention that I am using spark-submit to submit jobs, and a verbose mode print out looks like this with the SparkPi examples. The .sparkStaging won't be deleted. My thoughts is that this should be part of the staging and should be cleaned up as well when sc gets terminated. [tes

Re: HDFS Server/Client IPC version mismatch while trying to access HDFS files using Spark-0.9.1

2014-06-12 Thread bijoy deb
Hi, The problem was due to a pre-built/binary Tachyon-0.4.1 jar in the SPARK_CLASSPATH, and that Tachyon jar had been built against Hadoop-1.0.4. Building Tachyon against Hadoop-2.0.0 resolved the issue. Thanks On Wed, Jun 11, 2014 at 11:34 PM, Marcelo Vanzin wrote: > The error is saying t

Re: HDFS Server/Client IPC version mismatch while trying to access HDFS files using Spark-0.9.1

2014-06-11 Thread Marcelo Vanzin
The error is saying that your client libraries are older than what your server is using (2.0.0-mr1-cdh4.6.0 is IPC version 7). Try double-checking that your build is actually using that version (e.g., by looking at the hadoop jar files in lib_managed/jars). On Wed, Jun 11, 2014 at 2:07 AM, bijoy

Re: HDFS Server/Client IPC version mismatch while trying to access HDFS files using Spark-0.9.1

2014-06-11 Thread bijoy deb
Any suggestions from anyone? Thanks Bijoy On Tue, Jun 10, 2014 at 11:46 PM, bijoy deb wrote: > Hi all, > > I have built Shark-0.9.1 using sbt with the below command: > > *SPARK_HADOOP_VERSION=2.0.0-mr1-cdh4.6.0 sbt/sbt assembly* > > My Hadoop cluster is also having version 2.0.0-mr1-cdh4.6.0.