Re: Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.elapsedMillis()J

2015-06-25 Thread Max Demoulin
I see, thank you!

--
Henri Maxime Demoulin

2015-06-25 5:54 GMT-04:00 Steve Loughran :

> You are using a Guava version on the classpath which your version of
> Hadoop can't handle. Try a version < 15, or build Spark against Hadoop 2.7.0.
>
> > On 24 Jun 2015, at 19:03, maxdml  wrote:
> >
> >Exception in thread "main" java.lang.NoSuchMethodError:
> > com.google.common.base.Stopwatch.elapsedMillis()J
> >at
> >
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:245)
> >at
> >
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
>
>
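The `()J` suffix in the error is the JVM method descriptor: Hadoop's bytecode expects a `Stopwatch.elapsedMillis()` method returning `long` (`J`), which Guava removed in version 15. A small reflection probe (a sketch; `GuavaProbe` is an illustrative name, not part of Spark or Hadoop) can show which Guava API the runtime classpath actually exposes:

```java
// Sketch: probe the runtime classpath for the Guava method that older
// Hadoop's FileInputFormat links against.
public class GuavaProbe {
    /** True if com.google.common.base.Stopwatch.elapsedMillis() is
     *  resolvable, i.e. Guava <= 14 is on the classpath. */
    public static boolean hasElapsedMillis() {
        try {
            Class.forName("com.google.common.base.Stopwatch")
                 .getMethod("elapsedMillis");
            return true;
        } catch (ReflectiveOperationException e) {
            // Class missing entirely, or method removed (Guava >= 15).
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasElapsedMillis()
            ? "Guava <= 14 API present"
            : "Stopwatch.elapsedMillis() absent: older Hadoop will throw NoSuchMethodError");
    }
}
```

Running this with the same classpath as the failing job shows immediately whether the Guava that Hadoop sees is too new.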


Re: Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.elapsedMillis()J

2015-06-25 Thread Max Demoulin
Can I actually include another version of Guava in the classpath when
launching the example through spark-submit?

--
Henri Maxime Demoulin

2015-06-25 10:57 GMT-04:00 Max Demoulin :

> I see, thank you!
>
> --
> Henri Maxime Demoulin
>
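One way to attempt this (an untested sketch; the jar path and example class are placeholders, and `spark.{driver,executor}.userClassPathFirst` is marked experimental in Spark 1.4) is to ship a compatible Guava with the job and ask Spark to prefer user jars:

```shell
# Sketch: submit with an older Guava and give user jars precedence over
# the Guava bundled with Spark. Paths and the example class are placeholders.
spark-submit \
  --jars /path/to/guava-14.0.1.jar \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  --class org.apache.spark.examples.JavaWordCount \
  my-examples.jar hdfs:///input/data.txt
```

Class-path-first settings can themselves introduce conflicts, so rebuilding Spark against the matching Hadoop version is the more robust fix.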


Re: Directory creation failed leads to job fail (should it?)

2015-06-29 Thread Max Demoulin
The underlying issue is a filesystem corruption on the workers.

In the case where I use hdfs, with a sufficient amount of replica, would
Spark try to launch a task on another node where the block replica is
present?

Thanks :-)

--
Henri Maxime Demoulin

2015-06-29 9:10 GMT-04:00 ayan guha :

> No, Spark cannot do that, as it does not replicate partitions (so no retry
> on a different worker). It seems your cluster is not provisioned with correct
> permissions. I would suggest automating node provisioning.
>
> On Mon, Jun 29, 2015 at 11:04 PM, maxdml  wrote:
>
>> Hi there,
>>
>> I have some traces from my master and some workers where for some reason,
>> the ./work directory of an application can not be created on the workers.
>> There is also an issue with the master's temp directory creation.
>>
>> master logs: http://pastebin.com/v3NCzm0u
>> worker's logs: http://pastebin.com/Ninkscnx
>>
>> It seems that some of the executors can create the directories, but as
>> some others are repeatedly failing, the job ends up failing. Shouldn't
>> Spark manage to keep working with a smaller number of executors instead
>> of failing?
>>
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Directory-creation-failed-leads-to-job-fail-should-it-tp23531.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>
>
> --
> Best Regards,
> Ayan Guha
>
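Independently of the scheduling question, replica placement can be inspected from the HDFS side (a sketch; the path is a placeholder):

```shell
# Sketch: list block locations and replica counts for an input file.
hdfs fsck /user/example/input.txt -files -blocks -locations

# The target replica count comes from dfs.replication in hdfs-site.xml
# (default 3); per-file it can be changed with:
hdfs dfs -setrep -w 3 /user/example/input.txt
```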


Re: Directory creation failed leads to job fail (should it?)

2015-06-29 Thread Max Demoulin
I see. Thank you for your help!

--
Henri Maxime Demoulin

2015-06-29 11:57 GMT-04:00 ayan guha :

> It's a scheduler question. Spark will retry the task on the same worker.
> From Spark's standpoint, data is not replicated: Spark provides fault
> tolerance through lineage, not through replication.
> On 30 Jun 2015 01:50, "Max Demoulin"  wrote:
>
>> The underlying issue is a filesystem corruption on the workers.
>>
>> In the case where I use hdfs, with a sufficient amount of replica, would
>> Spark try to launch a task on another node where the block replica is
>> present?
>>
>> Thanks :-)
>>
>> --
>> Henri Maxime Demoulin


Re: Master doesn't start, no logs

2015-07-07 Thread Max Demoulin
Yes, I do set $SPARK_MASTER_IP. I suspect a more "internal" issue, maybe
due to multiple Spark/HDFS instances having run successively on the same
machine?

--
Henri Maxime Demoulin

2015-07-07 4:10 GMT-04:00 Akhil Das :

> Strange. What do you have in $SPARK_MASTER_IP? It may be that it is
> not able to bind to the given IP, but again, that should be in the logs.
>
> Thanks
> Best Regards
>
> On Tue, Jul 7, 2015 at 12:54 AM, maxdml  wrote:
>
>> Hi,
>>
>> I've been compiling Spark 1.4.0 with SBT, from the source tarball available
>> on the official website. I cannot run Spark's master, even though I have
>> built and run several other instances of Spark on the same machine (Spark
>> 1.3, master branch, pre-built 1.4, ...)
>>
>> starting org.apache.spark.deploy.master.Master, logging to
>> /mnt/spark-1.4.0/sbin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-xx.out
>> failed to launch org.apache.spark.deploy.master.Master:
>> full log in
>> /mnt/spark-1.4.0/sbin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-xx.out
>>
>> But the log file is empty.
>>
>> After digging up to ./bin/spark-class, and finally trying to start the
>> master with:
>>
>> ./bin/spark-class org.apache.spark.deploy.master.Master --host
>> 155.99.144.31
>>
>> I still have the same result. Here is the strace output for this command:
>>
>> http://pastebin.com/bkJVncBm
>>
>> I'm using a 64 bit Xeon, CentOS 6.5, spark 1.4.0, compiled against hadoop
>> 2.5.2
>>
>> Any idea? :-)
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Master-doesn-t-start-no-logs-tp23651.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>
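When the daemon log stays empty, one debugging avenue (a sketch; the IP and paths are taken from the message above, other values are assumptions) is to launch the Master in the foreground so that any JVM startup failure prints to the terminal instead of being swallowed by the daemon wrapper:

```shell
# Sketch: run the Master directly in the foreground; a bad JVM version,
# missing class, or bind error then shows up on the terminal.
cd /mnt/spark-1.4.0
java -version   # confirm the JVM matches the one used to build Spark
./bin/spark-class org.apache.spark.deploy.master.Master \
    --host 155.99.144.31 --port 7077 --webui-port 8080 \
    2>&1 | tee /tmp/master-debug.log
```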


Re: Master doesn't start, no logs

2015-07-07 Thread Max Demoulin
Thanks,

I tried that, and the result was the same.

I can still start a master from the spark-1.4.0-bin-hadoop2.4 pre-built
version, though.

I don't really know what more to show beyond the strace I already
linked, so I could use any hint there.

--
Henri Maxime Demoulin

2015-07-07 9:53 GMT-04:00 Akhil Das :

> Can you try renaming the ~/.ivy2 directory to ~/.ivy2_backup, building
> Spark 1.4.0 again, and running it?
>
> Thanks
> Best Regards
>
> On Tue, Jul 7, 2015 at 6:27 PM, Max Demoulin 
> wrote:
>
>> Yes, I do set $SPARK_MASTER_IP. I suspect a more "internal" issue, maybe
>> due to multiple spark/hdfs instances having successively run on the same
>> machine?
>>
>> --
>> Henri Maxime Demoulin


Re: Issues when combining Spark and a third party java library

2015-07-12 Thread Max Demoulin
Yes,

Thank you.

--
Henri Maxime Demoulin

2015-07-12 2:53 GMT-04:00 Akhil Das :

> Did you try setting the HADOOP_CONF_DIR?
>
> Thanks
> Best Regards
>
> On Sat, Jul 11, 2015 at 3:17 AM, maxdml  wrote:
>
>> Also, it's worth noting that I'm using the prebuilt version for hadoop 2.4
>> and higher from the official website.
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Issues-when-combining-Spark-and-a-third-party-java-library-tp21367p23770.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>
>
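For reference, pointing Spark at the Hadoop client configuration is a one-line environment setting (a sketch; the directory is a typical but assumed location):

```shell
# Sketch: make spark-submit pick up core-site.xml / hdfs-site.xml
# from the Hadoop client configuration directory.
export HADOOP_CONF_DIR=/etc/hadoop/conf
```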


Re: HDFS performances + unexpected death of executors.

2015-07-14 Thread Max Demoulin
I will try a fresh setup very soon.

Actually, I tried to compile Spark myself, against Hadoop 2.5.2, but I
had the issue that I mentioned in this thread:
http://apache-spark-user-list.1001560.n3.nabble.com/Master-doesn-t-start-no-logs-td23651.html

I was wondering if serialization/deserialization configuration could
be the reason for my executor losses.
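To rule serialization out, the relevant knobs live in conf/spark-defaults.conf (a sketch with illustrative values; Spark 1.4 renamed the Kryo buffer keys, so check the docs for your exact version):

```shell
# Sketch: spark-defaults.conf entries sometimes used when chasing
# serialization-related executor failures. Values are illustrative.
spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max  128m
```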

--
Henri Maxime Demoulin

2015-07-14 3:41 GMT-04:00 Akhil Das :

> This looks like a version mismatch between your Spark binaries and your
> Hadoop. I have not tried accessing Hadoop 2.5.x with Spark 1.4.0 pre-built
> against Hadoop 2.4. If possible, upgrade your Hadoop to 2.6 and download
> the Spark binaries for that version, or download the Spark source and
> compile it against Hadoop 2.5.
>
> Thanks
> Best Regards
>
> On Tue, Jul 14, 2015 at 2:18 AM, maxdml  wrote:
>
>> Hi,
>>
>> I have several issues related to HDFS that may have different roots. I'm
>> posting as much information as I can, in the hope of getting your opinion
>> on at least some of them. The cases are:
>>
>> - HDFS classes not found
>> - Connections with some datanodes seem slow or are closed unexpectedly
>> - Executors become lost (and cannot be relaunched due to an out-of-memory
>> error)
>>
>> What I'm looking for:
>> - HDFS misconfiguration/tuning advice
>> - Global setup flaws (the impact of VMs and NUMA mismatch, for example)
>> - For the last category, I'd like to know why, when the executor dies,
>> the JVM's memory is not freed, preventing a new executor from being
>> launched.
>>
>> My setup is the following:
>> 1 hypervisor with 32 cores and 50 GB of RAM, with 5 VMs running on it.
>> Each VM has 5 cores and 7 GB. Each node has 1 worker set up with 4 cores
>> and 6 GB available (the remaining resources are intended for HDFS/the OS).
>>
>> I run a WordCount workload with a dataset of 4 GB, on a Spark 1.4.0 /
>> HDFS 2.5.2 setup. I got the binaries from the official websites (no local
>> compiling).
>>
>> (Issues 1 and 2 below are logged on the worker, in the work/app-id/exec-id/stderr file.)
>>
>> 1) Hadoop class-related issues
>>
>> 15:34:32: DEBUG HadoopRDD: SplitLocationInfo and other new Hadoop classes
>> are unavailable. Using the older Hadoop location info code.
>> java.lang.ClassNotFoundException:
>> org.apache.hadoop.mapred.InputSplitWithLocationInfo
>>
>> 15:40:46: DEBUG SparkHadoopUtil: Couldn't find method for retrieving
>> thread-level FileSystem input data
>> java.lang.NoSuchMethodException:
>> org.apache.hadoop.fs.FileSystem$Statistics.getThreadStatistics()
>>
>>
>> 2) HDFS performance-related issues
>>
>> The following errors arise:
>>
>> 15:43:16: ERROR TransportRequestHandler: Error sending result
>> ChunkFetchSuccess{streamChunkId=StreamChunkId{streamId=284992323013,
>> chunkIndex=2},
>> buffer=FileSegmentManagedBuffer{file=/tmp/spark-b17f3299-99f3-4147-929f-1f236c812d0e/executor-d4ceae23-b9d9-4562-91c2-2855baeb8664/blockmgr-10da9c53-c20a-45f7-a430-2e36d799c7e1/2f/shuffle_0_14_0.data,
>> offset=15464702, length=998530}} to /192.168.122.168:59299; closing
>> connection
>> java.io.IOException: Broken pipe
>>
>> 15:43:16 ERROR TransportRequestHandler: Error sending result
>> ChunkFetchSuccess{streamChunkId=StreamChunkId{streamId=284992323013,
>> chunkIndex=0},
>> buffer=FileSegmentManagedBuffer{file=/tmp/spark-b17f3299-99f3-4147-929f-1f236c812d0e/executor-d4ceae23-b9d9-4562-91c2-2855baeb8664/blockmgr-10da9c53-c20a-45f7-a430-2e36d799c7e1/31/shuffle_0_12_0.data,
>> offset=15238441, length=980944}} to /192.168.122.168:59299; closing
>> connection
>> java.io.IOException: Broken pipe
>>
>>
>> 15:44:28: WARN TransportChannelHandler: Exception in connection from
>> /192.168.122.15:50995
>> java.io.IOException: Connection reset by peer (note that it's on another
>> executor)
>>
>> Some time later:
>>
>> 15:44:52 DEBUG DFSClient: DFSClient seqno: -2 status: SUCCESS status: ERROR
>> downstreamAckTimeNanos: 0
>> 15:44:52 WARN DFSClient: DFSOutputStream ResponseProcessor exception for
>> block BP-845049430-155.99.144.31-1435598542277:blk_1073742427_1758
>> java.io.IOException: Bad response ERROR for block
>> BP-845049430-155.99.144.31-1435598542277:blk_1073742427_1758 from datanode
>> x.x.x.x:50010
>> at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:819)
>>
>> The following two errors appears several times:
>>
>> 15:51:05 ERROR Executor: Exception in task 19.0 in stage 1.0 (TID 51)
>> java.nio.channels.ClosedChannelException
>> at
>>
>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1528)
>> at
>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>> at
>>
>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
>> at java.io.DataOutputStream.write(DataOutputStream.java:107)
>> at
>>
>> org.apache.hadoop.mapred.TextOutp