M, Gourav Sengupta <
>>> gourav.sengu...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Are you creating RDDs using the textFile option? Can you please let me
>>>> know the following:
>>>> 1. Number of partitions
>>>
>>> On Tue, Jan 26, 2016 at 1:12 PM, Gourav Sengupta <
>>> gourav.sengu...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Are you creating RDDs out of the data?
>>>>
>>>> Regards,
>>>> Gourav
>>>
>>> On Tue, Jan 26, 2016 at 12:45 PM, aecc wrote:
>>>
>>>> Sorry, I have not been able to solve the issue. I used speculation mode
>>>> as a workaround to this.
>>>
Sorry, I have not been able to solve the issue. I used speculation mode as a
workaround to this.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-task-hangs-infinitely-when-accessing-S3-from-AWS-tp25289p26068.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
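Since speculation is the workaround mentioned above, here is a sketch of how it can be switched on via spark-defaults.conf. The property names come from the Spark 1.x configuration documentation; the values shown are the documented defaults, illustrative rather than tuned recommendations:

```properties
# spark-defaults.conf -- enable speculative re-launch of slow tasks
spark.speculation             true
# how often Spark checks for tasks to speculate, in milliseconds
spark.speculation.interval    100
# a task must be this many times slower than the median to qualify
spark.speculation.multiplier  1.5
# fraction of tasks that must finish before speculation starts
spark.speculation.quantile    0.75
```

Note that speculation only masks a single hung S3 read by re-running the task on another executor; it does not address the underlying hang.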
> The number of partitions when reading data is 7315.
> The maximum size of a file to read is 14G.
> The size of the folder is around 270G.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-task-hangs-infinitely-when-accessing-S3-from-AWS-tp25289p25367.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
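A back-of-the-envelope check on the numbers quoted above. This is a sketch only: it assumes a 128 MB input split size, which depends on the actual Hadoop configuration in use:

```python
# Rough split-count estimate from the figures quoted in the thread.
MB_PER_GB = 1024  # work in MB throughout

folder_size_mb  = 270 * MB_PER_GB   # ~270G of input data
largest_file_mb = 14 * MB_PER_GB    # the single largest file, ~14G
split_size_mb   = 128               # assumed split size (configuration-dependent)

# Splits if the folder were read as evenly splittable data:
even_splits = folder_size_mb // split_size_mb
print(even_splits)  # 2160

# Splits contributed by the largest file alone:
largest_file_splits = largest_file_mb // split_size_mb
print(largest_file_splits)  # 112

# The reported 7315 partitions imply an average partition of roughly 38 MB,
# well below the assumed split size -- consistent with many small files,
# where S3 listing/open overhead tends to dominate.
avg_partition_mb = folder_size_mb / 7315
print(round(avg_partition_mb))  # 38
```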
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Hi Michael,
Thanks for your answer.
My path is exactly as you mention: s3://my-bucket////*.avro
For sure I'm not using wildcards in any other part besides the date, so I
don't think the issue could be that.
The weird thing is that on top of the same data set, randomly in 1 of every
20 jobs one
Reading files directly from Amazon S3 can be frustrating, especially if
you're dealing with a large number of input files. Could you please
elaborate more on your use case? Does the S3 bucket in question already
contain a large number of files?
The implementation of the * wildcard operator in S3 i
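For context on that last point: S3's LIST API only filters by key prefix, so a glob such as `*.avro` is generally expanded client-side after listing the keys, which is expensive for buckets with many objects (Hadoop's `globStatus` behaves this way). A minimal pure-Python sketch of the idea, using a made-up key list rather than a real S3 call:

```python
from fnmatch import fnmatch

# Hypothetical listing of keys under a bucket prefix, as a paginated S3
# LIST call would return them. With millions of keys, every page must be
# fetched before the glob can be resolved -- which is where the time goes.
keys = [
    "events/2016-01-25/part-00000.avro",
    "events/2016-01-25/part-00001.avro",
    "events/2016-01-26/part-00000.avro",
    "events/2016-01-26/_SUCCESS",
]

def expand_glob(keys, pattern):
    """Client-side glob expansion: filter the listed keys against the pattern."""
    return [k for k in keys if fnmatch(k, pattern)]

matches = expand_glob(keys, "events/2016-01-26/*.avro")
print(matches)  # only the .avro file(s) for that date
```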
Any hints?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-task-hangs-infinitely-when-accessing-S3-from-AWS-tp25289p25365.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Any help on this? This is really blocking me and I haven't found a feasible
solution yet.
Thanks.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-task-hangs-infinitely-when-accessing-S3-from-AWS-tp25289p25327.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-task-hangs-infinitely-when-accessing-S3-from-AWS-tp25289.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Are you sitting behind a proxy or something? Can you look more into the
executor logs? I have a strange feeling that you are blowing the memory
(and possibly hitting GC etc).
Thanks
Best Regards
On Thu, Sep 10, 2015 at 10:05 PM, Mario Pastorelli <
mario.pastore...@teralytics.ch> wrote:
Dear community,
I am facing a problem accessing data on S3 via Spark. My current
configuration is the following:
- Spark 1.4.1
- Hadoop 2.7.1
- hadoop-aws-2.7.1
- mesos 0.22.1
I am accessing the data using the s3a protocol but it just hangs. The job
runs through the whole data set but
systematically
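For readers debugging similar s3a hangs: stalled reads are often tied to the S3A client's HTTP connection pool or timeouts. A sketch of the relevant core-site.xml knobs follows; the property names exist in Hadoop 2.7's s3a support, but the values here are illustrative only, not recommendations:

```xml
<!-- core-site.xml: illustrative S3A client tuning (Hadoop 2.7.x) -->
<property>
  <name>fs.s3a.connection.maximum</name>
  <value>100</value>   <!-- size of the HTTP connection pool -->
</property>
<property>
  <name>fs.s3a.connection.timeout</name>
  <value>50000</value> <!-- socket timeout, in milliseconds -->
</property>
<property>
  <name>fs.s3a.attempts.maximum</name>
  <value>10</value>    <!-- retries before a request is failed -->
</property>
```

A pool that is too small relative to the number of concurrent tasks per executor can make reads appear to hang while they wait for a free connection.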