Re: Spark task hangs infinitely when accessing S3 from AWS

2016-01-27 Thread Gourav Sengupta
Hi, It may be interesting to see this. Can you please create a HiveContext (using the standard AWS Spark stack on EMR 4.0), create a table to read the Avro file, and read the data into a DataFrame using HiveContext SQL? Please let me know if I can be of any help with this. Regards, Gourav On Wed, Jan

Re: Spark task hangs infinitely when accessing S3 from AWS

2016-01-27 Thread Erisa Dervishi
Hi, I think I have the same issue mentioned here: https://issues.apache.org/jira/browse/SPARK-8898 I tried to run the job with 1 core and it didn't hang anymore. I can live with that for now, but any suggestions are welcome. Erisa On Tue, Jan 26, 2016 at 4:51 PM, Erisa Dervishi wrote: > Actu
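
The thread does not say how the job was limited to one core. As a minimal sketch, assuming it was done through ordinary Spark properties, the helper below (hypothetical names `single_core_conf` and `as_properties_lines`) builds the settings that keep each executor to one task at a time and renders them in `spark-defaults.conf` style. The choice between `local[1]` and `spark.executor.cores` is an assumption about the deployment, not something stated in the messages.

```python
def single_core_conf(local_mode=False):
    """Return Spark properties that limit execution to one concurrent task.

    Assumption: the job was restricted via standard Spark properties.
    """
    if local_mode:
        # Local mode with a single worker thread.
        return {"spark.master": "local[1]"}
    # On a cluster: one core per executor and one core per task,
    # so each executor runs a single task at a time.
    return {"spark.executor.cores": "1", "spark.task.cpus": "1"}


def as_properties_lines(conf):
    """Render properties as 'key value' lines, as in spark-defaults.conf."""
    return [f"{key} {conf[key]}" for key in sorted(conf)]


if __name__ == "__main__":
    for line in as_properties_lines(single_core_conf()):
        print(line)
```

This trades throughput for stability, so it is only a stopgap while the underlying hang is investigated.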

Re: Spark task hangs infinitely when accessing S3 from AWS

2016-01-26 Thread Erisa Dervishi
Actually, now that I have taken a closer look at the thread dump, it looks like all the worker threads are in a "Waiting" state:

sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionOb

Re: Spark task hangs infinitely when accessing S3 from AWS

2016-01-26 Thread Gourav Sengupta
Hi, Are you creating RDDs using the textFile option? Can you please let me know the following:
1. Number of partitions
2. Number of files
3. Time taken to create the RDDs
Regards, Gourav Sengupta On Tue, Jan 26, 2016 at 1:12 PM, Gourav Sengupta wrote: > Hi, > > are you creating RDD's out of th

Re: Spark task hangs infinitely when accessing S3 from AWS

2016-01-26 Thread Gourav Sengupta
Hi, Are you creating RDDs out of the data? Regards, Gourav On Tue, Jan 26, 2016 at 12:45 PM, aecc wrote: > Sorry, I have not been able to solve the issue. I used speculation mode as > workaround to this.

Re: Spark task hangs infinitely when accessing S3 from AWS

2016-01-26 Thread aecc
Sorry, I have not been able to solve the issue. I used speculation mode as a workaround. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-task-hangs-infinitely-when-accessing-S3-from-AWS-tp25289p26068.html Sent from the Apache Spark User List mail
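
The message does not list the settings used for speculation mode. The sketch below shows one way to turn it on through the `spark.speculation*` properties and pass them to `spark-submit`. The helper names, the numeric values, and the script name `my_job.py` are illustrative assumptions, not taken from the thread.

```python
def speculation_conf(interval_ms=100, multiplier=1.5, quantile=0.75):
    """Return Spark properties that enable speculative execution.

    The numeric values are illustrative; tune them for the workload.
    """
    return {
        # Re-launch slow tasks on another executor.
        "spark.speculation": "true",
        # How often Spark checks for tasks to speculate.
        "spark.speculation.interval": f"{interval_ms}ms",
        # How many times slower than the median a task must be.
        "spark.speculation.multiplier": str(multiplier),
        # Fraction of tasks that must finish before speculation starts.
        "spark.speculation.quantile": str(quantile),
    }


def to_submit_args(conf):
    """Turn a property dict into a list of spark-submit --conf arguments."""
    args = []
    for key in sorted(conf):
        args.extend(["--conf", f"{key}={conf[key]}"])
    return args


if __name__ == "__main__":
    # 'my_job.py' is a hypothetical application script.
    print(" ".join(["spark-submit"] + to_submit_args(speculation_conf()) + ["my_job.py"]))
```

Speculation only re-runs the stuck task elsewhere; it hides the hang rather than explaining it, which matches its description here as a workaround.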

Re: Spark task hangs infinitely when accessing S3 from AWS

2016-01-26 Thread Erisa Dervishi
Hi, I am in much the same situation now while trying to read from S3. Were you able to find a workaround in the end? Thanks, Erisa On Thu, Nov 12, 2015 at 12:00 PM, aecc wrote: > Some other stats: > > The number of files I have in the folder is 48. > The number of partitions used when reading data

Re: Spark task hangs infinitely when accessing S3 from AWS

2015-11-12 Thread aecc
Some other stats:

The number of files I have in the folder is 48.
The number of partitions used when reading data is 7315.
The maximum size of a file to read is 14G.
The size of the folder is around 270G.
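
To put these figures in proportion, here is a small back-of-the-envelope calculation. It assumes "G" means GiB and that the data is spread evenly, neither of which the message confirms; the function name `partition_stats` is hypothetical.

```python
def partition_stats(total_gib, n_files, n_partitions):
    """Derive rough averages from the folder size, file count and partition count.

    Assumption: sizes are in GiB and data is spread evenly across partitions.
    """
    total_mib = total_gib * 1024
    return {
        "avg_file_gib": total_gib / n_files,
        "avg_partition_mib": total_mib / n_partitions,
        "partitions_per_file": n_partitions / n_files,
    }


if __name__ == "__main__":
    # Figures reported in the message: 270G folder, 48 files, 7315 partitions.
    for name, value in partition_stats(270, 48, 7315).items():
        print(f"{name}: {value:.2f}")
```

Under those assumptions each partition covers a little under 38 MiB and each file is split into roughly 150 partitions on average, so a single job issues thousands of separate reads against S3.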

Re: Spark task hangs infinitely when accessing S3 from AWS

2015-11-12 Thread Alessandro Chacón
Hi Michael, Thanks for your answer. My path is exactly as you mention: s3://my-bucket/// /*.avro For sure I'm not using wildcards in any other part besides the date, so I don't think that could be the issue. The weird thing is that on top of the same data set, randomly in 1 of every 20 jobs one

Re: Spark task hangs infinitely when accessing S3 from AWS

2015-11-12 Thread Michael Cutler
Reading files directly from Amazon S3 can be frustrating, especially if you're dealing with a large number of input files. Could you please elaborate on your use case? Does the S3 bucket in question already contain a large number of files? The implementation of the * wildcard operator in S3 i

Re: Spark task hangs infinitely when accessing S3 from AWS

2015-11-12 Thread aecc
Any hints?

Re: Spark task hangs infinitely when accessing S3 from AWS

2015-11-09 Thread aecc
Any help on this? This is really blocking me and I haven't found a feasible solution yet. Thanks.