Re: Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.elapsedMillis()J
I see, thank you! -- Henri Maxime Demoulin 2015-06-25 5:54 GMT-04:00 Steve Loughran : > You are using a Guava version on the classpath that your version of > Hadoop can't handle. Try a version < 15 or build Spark against Hadoop 2.7.0 > > > On 24 Jun 2015, at 19:03, maxdml wrote: > > > >Exception in thread "main" java.lang.NoSuchMethodError: > > com.google.common.base.Stopwatch.elapsedMillis()J > >at > > > org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:245) > >at > > > org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313) > >
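A note for readers hitting the same NoSuchMethodError: the rebuild route suggested above would look roughly like the sketch below. The profile flags are assumptions for a 1.4.x-era source tree; check the "Building Spark" page for the exact release before copying them.

  # sketch only: rebuild Spark against a newer Hadoop so the Hadoop client and its Guava match
  build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.7.0 -DskipTests clean package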
Re: Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.elapsedMillis()J
Can I actually include another version of Guava in the classpath when launching the example through spark-submit? -- Henri Maxime Demoulin 2015-06-25 10:57 GMT-04:00 Max Demoulin : > I see, thank you! > > -- > Henri Maxime Demoulin > > 2015-06-25 5:54 GMT-04:00 Steve Loughran : > >> You are using a Guava version on the classpath that your version of >> Hadoop can't handle. Try a version < 15 or build Spark against Hadoop 2.7.0 >> >> > On 24 Jun 2015, at 19:03, maxdml wrote: >> > >> >Exception in thread "main" java.lang.NoSuchMethodError: >> > com.google.common.base.Stopwatch.elapsedMillis()J >> >at >> > >> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:245) >> >at >> > >> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313) >> >> >
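In principle yes: a compatible Guava can be shipped with the job and given precedence over the one already on the classpath. The sketch below is untested here; the class name, jar paths and master URL are placeholders, guava-14.0.1 is simply "a version < 15" as advised above, and spark.executor.userClassPathFirst is an experimental setting that can surface other conflicts.

  ./bin/spark-submit \
    --master spark://master:7077 \
    --class com.example.MyExample \
    --driver-class-path /path/to/guava-14.0.1.jar \
    --jars /path/to/guava-14.0.1.jar \
    --conf spark.executor.userClassPathFirst=true \
    /path/to/my-example.jar hdfs:///input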
Re: Directory creation failed leads to job fail (should it?)
The underlying issue is a filesystem corruption on the workers. In the case where I use hdfs, with a sufficient number of replicas, would Spark try to launch a task on another node where the block replica is present? Thanks :-) -- Henri Maxime Demoulin 2015-06-29 9:10 GMT-04:00 ayan guha : > No, Spark cannot do that as it does not replicate partitions (so no retry > on different worker). It seems your cluster is not provisioned with correct > permissions. I would suggest automating node provisioning. > > On Mon, Jun 29, 2015 at 11:04 PM, maxdml wrote: > >> Hi there, >> >> I have some traces from my master and some workers where for some reason, >> the ./work directory of an application cannot be created on the workers. >> There is also an issue with the master's temp directory creation. >> >> master logs: http://pastebin.com/v3NCzm0u >> workers' logs: http://pastebin.com/Ninkscnx >> >> It seems that some of the executors can create the directories, but as >> some >> others are repeatedly failing, the job ends up failing. Shouldn't Spark >> manage to keep working with a smaller number of executors instead of >> failing? >> >> >> >> >> >> -- >> View this message in context: >> http://apache-spark-user-list.1001560.n3.nabble.com/Directory-creation-failed-leads-to-job-fail-should-it-tp23531.html >> Sent from the Apache Spark User List mailing list archive at Nabble.com. >> >> - >> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org >> For additional commands, e-mail: user-h...@spark.apache.org >> >> > > > -- > Best Regards, > Ayan Guha >
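Since the failures come down to unwritable work directories, one way to rule out permission problems is to give every worker an explicitly owned scratch area. The paths and the service user below are assumptions, not taken from the thread:

  # on each worker; /data/spark and the 'spark' user are illustrative
  mkdir -p /data/spark/work /data/spark/tmp
  chown -R spark:spark /data/spark
  # then point the worker at them in conf/spark-env.sh
  export SPARK_WORKER_DIR=/data/spark/work
  export SPARK_LOCAL_DIRS=/data/spark/tmp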
Re: Directory creation failed leads to job fail (should it?)
I see. Thank you for your help! -- Henri Maxime Demoulin 2015-06-29 11:57 GMT-04:00 ayan guha : > It's a scheduler question. Spark will retry the task on the same worker. > From Spark's standpoint the data is not replicated, because Spark provides fault > tolerance through lineage, not through replication. > On 30 Jun 2015 01:50, "Max Demoulin" wrote: > >> The underlying issue is a filesystem corruption on the workers. >> >> In the case where I use hdfs, with a sufficient number of replicas, would >> Spark try to launch a task on another node where the block replica is >> present? >> >> Thanks :-) >> >> -- >> Henri Maxime Demoulin >> >> 2015-06-29 9:10 GMT-04:00 ayan guha : >> >>> No, Spark cannot do that as it does not replicate partitions (so no >>> retry on different worker). It seems your cluster is not provisioned with >>> correct permissions. I would suggest automating node provisioning. >>> >>> On Mon, Jun 29, 2015 at 11:04 PM, maxdml wrote: >>> >>>> Hi there, >>>> >>>> I have some traces from my master and some workers where for some >>>> reason, >>>> the ./work directory of an application cannot be created on the >>>> workers. >>>> There is also an issue with the master's temp directory creation. >>>> >>>> master logs: http://pastebin.com/v3NCzm0u >>>> workers' logs: http://pastebin.com/Ninkscnx >>>> >>>> It seems that some of the executors can create the directories, but as >>>> some >>>> others are repeatedly failing, the job ends up failing. Shouldn't >>>> Spark >>>> manage to keep working with a smaller number of executors instead of >>>> failing? >>>> >>>> >>>> >>>> >>>> >>>> -- >>>> View this message in context: >>>> http://apache-spark-user-list.1001560.n3.nabble.com/Directory-creation-failed-leads-to-job-fail-should-it-tp23531.html >>>> Sent from the Apache Spark User List mailing list archive at Nabble.com. >>>> >>>> - >>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org >>>> For additional commands, e-mail: user-h...@spark.apache.org >>>> >>>> >>> >>> >>> -- >>> Best Regards, >>> Ayan Guha >>> >> >>
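For completeness, the retry behaviour discussed above is mostly governed by two settings. A hedged spark-defaults.conf sketch follows; the values shown are the defaults documented for this Spark generation, and raising spark.task.maxFailures only papers over the problem if the filesystem itself stays broken.

  # conf/spark-defaults.conf
  # number of times a single task may fail before the whole job is aborted
  spark.task.maxFailures   4
  # how long the scheduler waits for a data-local slot before scheduling the task elsewhere
  spark.locality.wait      3s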
Re: Master doesn't start, no logs
Yes, I do set $SPARK_MASTER_IP. I suspect a more "internal" issue, maybe due to multiple spark/hdfs instances having successively run on the same machine? -- Henri Maxime Demoulin 2015-07-07 4:10 GMT-04:00 Akhil Das : > Strange. What do you have in $SPARK_MASTER_IP? It may happen that it is > not able to bind to the given IP, but again, that should be in the logs. > > Thanks > Best Regards > > On Tue, Jul 7, 2015 at 12:54 AM, maxdml wrote: > >> Hi, >> >> I've been compiling Spark 1.4.0 with SBT, from the source tarball >> available >> on the official website. I cannot run Spark's master, even though I have >> built >> and run several other instances of Spark on the same machine (Spark 1.3, >> master branch, pre-built 1.4, ...) >> >> /starting org.apache.spark.deploy.master.Master, logging to >> >> /mnt/spark-1.4.0/sbin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-xx.out >> failed to launch org.apache.spark.deploy.master.Master: >> full log in >> >> /mnt/spark-1.4.0/sbin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-xx.out/ >> >> But the log file is empty. >> >> After digging up to ./bin/spark-class, and finally trying to start the >> master with: >> >> ./bin/spark-class org.apache.spark.deploy.master.Master --host >> 155.99.144.31 >> >> I still have the same result. Here is the strace output for this command: >> >> http://pastebin.com/bkJVncBm >> >> I'm using a 64-bit Xeon, CentOS 6.5, Spark 1.4.0, compiled against Hadoop >> 2.5.2 >> >> Any idea? :-) >> >> >> >> -- >> View this message in context: >> http://apache-spark-user-list.1001560.n3.nabble.com/Master-doesn-t-start-no-logs-tp23651.html >> Sent from the Apache Spark User List mailing list archive at Nabble.com. >> >> - >> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org >> For additional commands, e-mail: user-h...@spark.apache.org >> >> >
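Two quick sanity checks that are worth running before digging deeper; paths follow the thread, and Scala 2.10 (the default for a 1.4.0 source build) is assumed:

  # did the SBT build actually produce an assembly jar? spark-class needs one to start
  ls /mnt/spark-1.4.0/assembly/target/scala-2.10/spark-assembly-*.jar
  # is the logs directory writable by the user launching the master?
  ls -ld /mnt/spark-1.4.0/logs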
Re: Master doesn't start, no logs
Thanks, I tried that, and the result was the same. I can still start a master from the spark-1.4.0-bin-hadoop2.4 pre-built version, though. I don't really know what to show beyond the strace that I already linked, so I could use any hint for that. -- Henri Maxime Demoulin 2015-07-07 9:53 GMT-04:00 Akhil Das : > Can you try renaming the ~/.ivy2 directory to ~/.ivy2_backup and build > Spark 1.4.0 again and run it? > > Thanks > Best Regards > > On Tue, Jul 7, 2015 at 6:27 PM, Max Demoulin > wrote: > >> Yes, I do set $SPARK_MASTER_IP. I suspect a more "internal" issue, maybe >> due to multiple spark/hdfs instances having successively run on the same >> machine? >> >> -- >> Henri Maxime Demoulin >> >> 2015-07-07 4:10 GMT-04:00 Akhil Das : >> >>> Strange. What do you have in $SPARK_MASTER_IP? It may happen that it >>> is not able to bind to the given IP, but again, that should be in the logs. >>> >>> Thanks >>> Best Regards >>> >>> On Tue, Jul 7, 2015 at 12:54 AM, maxdml wrote: >>> >>>> Hi, >>>> >>>> I've been compiling Spark 1.4.0 with SBT, from the source tarball >>>> available >>>> on the official website. I cannot run Spark's master, even though I have >>>> built >>>> and run several other instances of Spark on the same machine (Spark 1.3, >>>> master branch, pre-built 1.4, ...) >>>> >>>> /starting org.apache.spark.deploy.master.Master, logging to >>>> >>>> /mnt/spark-1.4.0/sbin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-xx.out >>>> failed to launch org.apache.spark.deploy.master.Master: >>>> full log in >>>> >>>> /mnt/spark-1.4.0/sbin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-xx.out/ >>>> >>>> But the log file is empty. >>>> >>>> After digging up to ./bin/spark-class, and finally trying to start the >>>> master with: >>>> >>>> ./bin/spark-class org.apache.spark.deploy.master.Master --host >>>> 155.99.144.31 >>>> >>>> I still have the same result. Here is the strace output for this >>>> command: >>>> >>>> http://pastebin.com/bkJVncBm >>>> >>>> I'm using a 64-bit Xeon, CentOS 6.5, Spark 1.4.0, compiled against >>>> Hadoop >>>> 2.5.2 >>>> >>>> Any idea? :-) >>>> >>>> >>>> >>>> -- >>>> View this message in context: >>>> http://apache-spark-user-list.1001560.n3.nabble.com/Master-doesn-t-start-no-logs-tp23651.html >>>> Sent from the Apache Spark User List mailing list archive at Nabble.com. >>>> >>>> - >>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org >>>> For additional commands, e-mail: user-h...@spark.apache.org >>>> >>>> >>> >> >
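Spelled out, the suggestion above amounts to something like the following. This is a sketch only: the path comes from the thread and the build flags depend on the exact Spark release.

  # move the ivy cache that SBT resolves against out of the way, rebuild, retry
  mv ~/.ivy2 ~/.ivy2_backup
  cd /mnt/spark-1.4.0
  build/sbt -Phadoop-2.4 -Dhadoop.version=2.5.2 clean assembly
  ./sbin/start-master.sh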
Re: Issues when combining Spark and a third party java library
Yes, Thank you. -- Henri Maxime Demoulin 2015-07-12 2:53 GMT-04:00 Akhil Das : > Did you try setting the HADOOP_CONF_DIR? > > Thanks > Best Regards > > On Sat, Jul 11, 2015 at 3:17 AM, maxdml wrote: > >> Also, it's worth noting that I'm using the prebuilt version for hadoop 2.4 >> and higher from the official website. >> >> >> >> -- >> View this message in context: >> http://apache-spark-user-list.1001560.n3.nabble.com/Issues-when-combining-Spark-and-a-third-party-java-library-tp21367p23770.html >> Sent from the Apache Spark User List mailing list archive at Nabble.com. >> >> - >> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org >> For additional commands, e-mail: user-h...@spark.apache.org >> >> >
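For anyone landing here, setting it is a one-liner; the path below is only an example and should point at the directory holding core-site.xml and hdfs-site.xml on each node:

  # in conf/spark-env.sh, or in the shell that launches spark-submit
  export HADOOP_CONF_DIR=/etc/hadoop/conf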
Re: HDFS performances + unexpected death of executors.
I will try a fresh setup very soon. Actually, I tried to compile Spark myself, against Hadoop 2.5.2, but I had the issue that I mentioned in this thread: http://apache-spark-user-list.1001560.n3.nabble.com/Master-doesn-t-start-no-logs-td23651.html I was wondering if maybe serialization/deserialization configuration could be the reason for my executor losses. -- Henri Maxime Demoulin 2015-07-14 3:41 GMT-04:00 Akhil Das : > This is more likely a version mismatch between your Spark binaries and the > Hadoop version. I have not tried accessing Hadoop 2.5.x with Spark 1.4.0 pre-built > against Hadoop 2.4. If possible you could upgrade your Hadoop to 2.6 and > download the Spark binaries for that version, or you can download the Spark > source and compile it with the Hadoop 2.5 version. > > Thanks > Best Regards > > On Tue, Jul 14, 2015 at 2:18 AM, maxdml wrote: > >> Hi, >> >> I have several issues related to HDFS, that may have different roots. I'm >> posting as much information as I can, with the hope that I can get your >> opinion on at least some of them. Basically the cases are: >> >> - HDFS classes not found >> - Connections with some datanodes seem to be slow / unexpectedly closed. >> - Executors become lost (and cannot be relaunched due to an out of memory >> error) >> >> * >> What I'm looking for: >> - HDFS misconfiguration / tuning advice >> - Global setup flaws (impact of VMs and NUMA mismatch, for example) >> - For the last category of issue, I'd like to know why, when the executor >> dies, the JVM's memory is not freed, thus not allowing a new executor to be >> launched.* >> >> My setup is the following: >> 1 hypervisor with 32 cores and 50 GB of RAM, 5 VMs running in this hv. >> Each >> VM has 5 cores and 7GB. >> Each node has 1 worker setup with 4 cores and 6 GB available (the remaining >> resources are intended to be used by hdfs/os) >> >> I run a Wordcount workload with a dataset of 4GB, on a Spark 1.4.0 / HDFS >> 2.5.2 setup. I got the binaries from official websites (no local >> compiling). >> >> (1 & 2 are logged on the worker, in the work/app-id/exec-id/stderr file) >> >> *1) Hadoop class related issues* >> >> /15:34:32: DEBUG HadoopRDD: SplitLocationInfo and other new Hadoop classes >> are unavailable. Using the older Hadoop location info code.
>> java.lang.ClassNotFoundException: >> org.apache.hadoop.mapred.InputSplitWithLocationInfo/ >> >> / >> 15:40:46: DEBUG SparkHadoopUtil: Couldn't find method for retrieving >> thread-level FileSystem input data >> java.lang.NoSuchMethodException: >> org.apache.hadoop.fs.FileSystem$Statistics.getThreadStatistics()/ >> >> >> *2) HDFS performance related issues* >> >> The following error arise: >> >> / 15:43:16: ERROR TransportRequestHandler: Error sending result >> ChunkFetchSuccess{streamChunkId=StreamChunkId{streamId=284992323013, >> chunkIndex=2}, >> >> buffer=FileSegmentManagedBuffer{file=/tmp/spark-b17f3299-99f3-4147-929f-1f236c812d0e/executor-d4ceae23-b9d9-4562-91c2-2855baeb8664/blockmgr-10da9c53-c20a-45f7-a430-2e36d799c7e1/2f/shuffle_0_14_0.data, >> offset=15464702, length=998530}} to /192.168.122.168:59299; closing >> connection >> java.io.IOException: Broken pipe/ >> >> /15:43:16 ERROR TransportRequestHandler: Error sending result >> ChunkFetchSuccess{streamChunkId=StreamChunkId{streamId=284992323013, >> chunkIndex=0}, >> >> buffer=FileSegmentManagedBuffer{file=/tmp/spark-b17f3299-99f3-4147-929f-1f236c812d0e/executor-d4ceae23-b9d9-4562-91c2-2855baeb8664/blockmgr-10da9c53-c20a-45f7-a430-2e36d799c7e1/31/shuffle_0_12_0.data, >> offset=15238441, length=980944}} to /192.168.122.168:59299; closing >> connection >> java.io.IOException: Broken pipe/ >> >> >> /15:44:28 : WARN TransportChannelHandler: Exception in connection from >> /192.168.122.15:50995 >> java.io.IOException: Connection reset by peer/ (note that it's on another >> executor) >> >> Some time later: >> / >> 15:44:52 DEBUG DFSClient: DFSClient seqno: -2 status: SUCCESS status: >> ERROR >> downstreamAckTimeNanos: 0 >> 15:44:52 WARN DFSClient: DFSOutputStream ResponseProcessor exception for >> block BP-845049430-155.99.144.31-1435598542277:blk_1073742427_1758 >> java.io.IOException: Bad response ERROR for block >> BP-845049430-155.99.144.31-1435598542277:blk_1073742427_1758 from datanode >> x.x.x.x:50010 >> at >> >> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:819)/ >> >> The following two errors appears several times: >> >> /15:51:05 ERROR Executor: Exception in task 19.0 in stage 1.0 (TID 51) >> java.nio.channels.ClosedChannelException >> at >> >> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1528) >> at >> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98) >> at >> >> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58) >> at java.io.DataOutputStream.write(DataOutputStream.java:107) >> at >> >> org.apache.hadoop.mapred.TextOutp