Hi Ted,

Thanks for your prompt reply.
I am afraid clearing the DNS cache did not help. Since I do not have nscd, I ran "sudo /etc/init.d/dnsmasq restart" on the two nodes I am using, but I am still getting the same error. I am launching the master from 172.26.49.156 (whose old name was IMPETUS-1466), one worker from each of 172.26.49.156 and 172.26.49.55, and the app through ./bin/pyspark from 172.26.49.55. The detailed stack trace follows the short resolution check sketched below.
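As a first sanity check, here is a minimal sketch (plain Python, nothing Spark-specific; the hostnames and IPs are the ones mentioned above) that can be run on each of the two nodes to confirm which names actually resolve after the rename:

    import socket

    # Try to resolve the old name plus the two node IPs on this machine.
    # The failing task dies on UnknownHostException for IMPETUS-1466, so
    # the old name presumably no longer resolves anywhere.
    for host in ["IMPETUS-1466", "172.26.49.156", "172.26.49.55"]:
        try:
            print("%s -> %s" % (host, socket.gethostbyname(host)))
        except socket.gaierror as e:
            print("%s -> does not resolve (%s)" % (host, e))

If the old name has to keep resolving for now (for example, because it is still baked into a config file somewhere), a stopgap is an /etc/hosts entry on every node mapping IMPETUS-1466 to 172.26.49.156, the machine it used to name.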
Exception in user code:

Traceback (most recent call last):
  File "/home/impadmin/bibudh/healthcare/code/cloudera_challenge/analyze_anomaly_with_spark.py", line 121, in anom_with_lr
    pat_proc = pycsv.csvToDataFrame(sqlContext, plaintext_rdd, sep = ",")
  File "/tmp/spark-0fe22b7c-da8a-4971-8fcf-20b43829504b/userFiles-d9a3c3ae-20d4-4476-8026-a225dd746dc4/pyspark_csv.py", line 53, in csvToDataFrame
    column_types = evaluateType(rdd_sql, parseDate)
  File "/tmp/spark-0fe22b7c-da8a-4971-8fcf-20b43829504b/userFiles-d9a3c3ae-20d4-4476-8026-a225dd746dc4/pyspark_csv.py", line 179, in evaluateType
    return rdd_sql.map(getRowType).reduce(reduceTypes)
  File "/home/impadmin/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/rdd.py", line 797, in reduce
    vals = self.mapPartitions(func).collect()
  File "/home/impadmin/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/rdd.py", line 771, in collect
    port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
  File "/home/impadmin/spark-1.6.0-bin-hadoop2.6/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/home/impadmin/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/sql/utils.py", line 45, in deco
    return f(*a, **kw)
  File "/home/impadmin/spark-1.6.0-bin-hadoop2.6/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
    format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 11, IMPETUS-1466): java.lang.IllegalArgumentException: java.net.UnknownHostException: IMPETUS-1466
        at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
        at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:231)
        at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:139)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:510)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:453)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:136)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2433)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2467)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2449)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:166)
        at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:653)
        at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:389)
        at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:362)
        at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
        at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$33.apply(SparkContext.scala:1015)
        at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
        at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
        at scala.Option.map(Option.scala:145)
        at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:212)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:208)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: IMPETUS-1466
        ... 38 more
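One clue from the trace: the UnknownHostException is thrown while Hadoop builds a NameNode proxy (FileSystem.get -> DFSClient -> NameNodeProxies.createNonHAProxy), i.e. while connecting to whatever host the default filesystem URI names, not while Spark talks to its own master or workers. So the old name is most likely still sitting in the HDFS client configuration (typically fs.defaultFS in core-site.xml, or the deprecated fs.default.name), or in an hdfs:// path passed to the job. A minimal sketch to check, assuming the stock Hadoop config keys and run from the same ./bin/pyspark shell (sc is the SparkContext the shell creates):

    # Inspect the Hadoop configuration the job will hand to its executors.
    # If either key still says hdfs://IMPETUS-1466:<port>, that is where
    # the dead hostname is being picked up from.
    hadoop_conf = sc._jsc.hadoopConfiguration()
    print(hadoop_conf.get("fs.defaultFS"))      # current key name
    print(hadoop_conf.get("fs.default.name"))   # deprecated alias in older setups

Grepping the Hadoop and Spark conf directories on both nodes for the old name should catch any remaining references as well.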
On Tue, Apr 12, 2016 at 5:53 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> FYI
>
> https://documentation.cpanel.net/display/CKB/How+To+Clear+Your+DNS+Cache#HowToClearYourDNSCache-MacOS®10.10
> https://www.whatsmydns.net/flush-dns.html#linux
>
> On Tue, Apr 12, 2016 at 2:44 PM, Bibudh Lahiri <bibudhlah...@gmail.com> wrote:
>
>> Hi,
>>
>> I am trying to run a piece of code with logistic regression on PySpark.
>> I have run it successfully on my laptop, and I have previously run it
>> in standalone cluster mode, but in between, the admin changed the name
>> of the server I am running it on (the old name was "IMPETUS-1466").
>> Now, when I try to run it, it throws the following error:
>>
>> File "/home/impadmin/Nikunj/spark-1.6.0/python/lib/pyspark.zip/pyspark/sql/utils.py", line 53, in deco
>>     raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
>> pyspark.sql.utils.IllegalArgumentException: u'java.net.UnknownHostException: IMPETUS-1466.
>>
>> I have changed a few configuration files and /etc/hosts, regenerated
>> the SSH keys, and updated .ssh/known_hosts and .ssh/authorized_keys,
>> but the error has not gone away. Can someone please point out where
>> this name is being picked up from?
>>
>> --
>> Bibudh Lahiri
>> Data Scientist, Impetus Technologies
>> 5300 Stevens Creek Blvd
>> San Jose, CA 95129
>> http://knowthynumbers.blogspot.com/

--
Bibudh Lahiri
Data Scientist, Impetus Technologies
5300 Stevens Creek Blvd
San Jose, CA 95129
http://knowthynumbers.blogspot.com/