starting "export
> CLASSPATH" - you can double check that your jar looks to be included
> correctly there. If it is I think you have a really "interesting" issue on
> your hands!
>
> - scrypso
>
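As a complementary check (not something suggested in the thread), a minimal PySpark sketch that prints the classpath the driver JVM actually started with, so the jar can be grepped for; "my-app" is a placeholder for the jar's name:

    from pyspark.sql import SparkSession

    # Not from the thread - a quick way to see the classpath the driver JVM is
    # actually running with. "my-app" is a placeholder for your jar's name.
    spark = SparkSession.builder.getOrCreate()
    driver_cp = spark.sparkContext._jvm.java.lang.System.getProperty("java.class.path")
    print([entry for entry in driver_cp.split(":") if "my-app" in entry])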
> On Wed, Dec 14, 2022, 05:17 Hariharan wrote:
>
and hope it picks that up? (Wild
> guess, no idea if that works or how hard it would be.)
>
>
> On Tue, Dec 13, 2022, 17:29 Hariharan wrote:
>
>> Thanks for the response, scrypso! I will try adding the extraClassPath
>> option. Meanwhile, please f
I can't immediately tell how your error might arise, unless there is some
> timing issue with the Spark and Hadoop setup. Can you share the full
> stacktrace of the ClassNotFound exception? That might tell us when Hadoop
> is looking up this class.
>
> Good luck!
> - scrypso
>
Forgot to mention it above, but just to add: the error is coming from the
driver. I tried using *--driver-class-path /path/to/my/jar* as well, but no
luck.
Thanks!
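For reference, a minimal PySpark sketch of the same settings expressed as Spark config keys (the jar path is the placeholder from above; in cluster mode the driver-side entries normally have to be supplied at submit time, e.g. via --conf or --jars, rather than from inside the application):

    from pyspark.sql import SparkSession

    # Sketch only: the standard config keys behind --jars and --driver-class-path.
    # /path/to/my/jar is the placeholder path from the message above. In cluster
    # mode the driver-side entries normally have to be given at submit time,
    # since the driver JVM is already up by the time application code runs.
    spark = (
        SparkSession.builder
        .config("spark.jars", "/path/to/my/jar")
        .config("spark.driver.extraClassPath", "/path/to/my/jar")
        .config("spark.executor.extraClassPath", "/path/to/my/jar")
        .getOrCreate()
    )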
On Mon, Dec 12, 2022 at 4:21 PM Hariharan wrote:
> Hello folks,
>
> I have a spark app with a custom implementation of *fs.s3a.s3.client.factory.impl* which is packaged into the same jar.
Hello folks,
I have a spark app with a custom implementation of
*fs.s3a.s3.client.factory.impl* which is packaged into the same jar.
Output of *jar tf*
*2620 Mon Dec 12 11:23:00 IST 2022 aws/utils/MyS3ClientFactory.class*
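For context, a minimal sketch of how such a factory is typically wired up from PySpark, using the property key and class name shown in this thread (spark.hadoop.* is Spark's standard prefix for passing Hadoop settings through):

    from pyspark.sql import SparkSession

    # Sketch only, using the property key and class name from this thread.
    spark = (
        SparkSession.builder
        .config("spark.hadoop.fs.s3a.s3.client.factory.impl",
                "aws.utils.MyS3ClientFactory")
        .getOrCreate()
    )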
However, when I run my Spark app with spark-submit in cluster mode, it fails with a ClassNotFoundException.
trock/blob/master/flintrock/core.py#L47>.
Thanks,
Hariharan
On Sat, Apr 10, 2021 at 9:00 AM Dhruv Kumar wrote:
> Hello
>
> I am new to Apache Spark and am looking for some close guidance or
> collaboration for my Spark Project which has the following main components:
>
> 1.
check these recommendations from Cloudera
<https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.5/bk_cloud-data-access/content/s3-performance.html>
for optimal use of S3A.
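For a flavor of what those recommendations cover, an illustrative PySpark sketch with a few commonly tuned S3A settings (the values are placeholders, not recommendations; defer to the linked page):

    from pyspark.sql import SparkSession

    # Illustrative only - placeholder values, not recommendations; see the linked
    # page for actual guidance on sizing these.
    spark = (
        SparkSession.builder
        .config("spark.hadoop.fs.s3a.connection.maximum", "100")
        .config("spark.hadoop.fs.s3a.threads.max", "64")
        .config("spark.hadoop.fs.s3a.fast.upload", "true")
        .config("spark.hadoop.fs.s3a.multipart.size", "104857600")
        .getOrCreate()
    )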
Thanks,
Hariharan
On Wed, Apr 7, 2021 at 12:15 AM Tzahi File wrote:
> Hi All,
>
> We have a spark cluster on aws ec2 tha
throwing the error.
~ Hariharan
On Thu, Oct 15, 2020 at 8:56 PM Devi P V wrote:
> hadoop_conf.set("fs.s3a.multipart.size", 104857600L)
>
> .set only allows string values. It's throwing an invalid syntax error.
>
> I tried following also. But issue not fixed.
>
> hadoop_conf.
fs.s3a.multipart.size needs to be a long value, not a string, so you
will need to use
hadoop_conf.set("fs.s3a.multipart.size", 104857600L)
~ Hariharan
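A sketch of one way to set this from PySpark without the Python 2-style long literal (which is a syntax error in Python 3), assuming an active SparkSession named spark:

    # Assumes an active SparkSession named `spark`. The 104857600L literal is a
    # syntax error in Python 3, but the underlying Hadoop Configuration object
    # reached through py4j exposes setLong() directly.
    hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
    hadoop_conf.setLong("fs.s3a.multipart.size", 104857600)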
On Thu, Oct 15, 2020 at 6:32 PM Devi P V wrote:
>
> Hi All,
>
> I am trying to write a pyspark dataframe into a KMS-encrypted S3 bucket.
.impl=org.apache.hadoop.fs.s3a.S3A
Hadoop 2.8 and above would have these set by default.
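For illustration, a minimal PySpark sketch of the usual S3A bindings being referred to, passed through Spark config rather than core-site.xml (assumed equivalent here; per the note above, recent Hadoop ships these as defaults):

    from pyspark.sql import SparkSession

    # Sketch of the usual S3A bindings, passed through Spark config rather than
    # core-site.xml; on recent Hadoop these match the shipped defaults.
    spark = (
        SparkSession.builder
        .config("spark.hadoop.fs.s3a.impl",
                "org.apache.hadoop.fs.s3a.S3AFileSystem")
        .config("spark.hadoop.fs.AbstractFileSystem.s3a.impl",
                "org.apache.hadoop.fs.s3a.S3A")
        .getOrCreate()
    )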
Thanks,
Hariharan
On Thu, Mar 5, 2020 at 2:41 AM Devin Boyer wrote:
>
> Hello,
>
> I'm attempting to run Spark within a Docker container with the hope of
> eventually running Spark on Kubernetes.
Akshay Bhardwaj
> +91-97111-33849
>
>
> On Mon, May 20, 2019 at 1:29 PM Hariharan wrote:
>
>> Hi Akshay,
>>
>> I believe HDP uses the capacity scheduler by default. In the capacity
>> scheduler, assignment of multiple containers on the same node is
>> d
Hi Akshay,
I believe HDP uses the capacity scheduler by default. In the capacity
scheduler, assignment of multiple containers on the same node is
determined by the option
yarn.scheduler.capacity.per-node-heartbeat.multiple-assignments-enabled,
which is true by default. If you would like YARN to spread containers across
nodes instead, you can try setting this option to false.
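For reference, the corresponding capacity-scheduler.xml fragment (illustrative only; false is the non-default value, which limits the scheduler to one container assignment per node heartbeat so allocations spread across nodes):

    <property>
      <name>yarn.scheduler.capacity.per-node-heartbeat.multiple-assignments-enabled</name>
      <value>false</value>
    </property>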
Hi Huizhe,
You can set the "fs.defaultFS" field in core-site.xml to some path on s3.
That way your spark job will use S3 for all operations that need HDFS.
Intermediate data will still be stored on local disk though.
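A sketch of the core-site.xml change being described (the bucket name is a placeholder):

    <property>
      <name>fs.defaultFS</name>
      <value>s3a://your-bucket/</value>
    </property>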
Thanks,
Hari
On Mon, May 20, 2019 at 10:14 AM Abdeali Kothari wrote:
> While
Hi Jack,
You can try sparklens (https://github.com/qubole/sparklens). I think it
won't give details at as low a level as you're looking for, but it can help
you identify and remove performance bottlenecks.
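If it helps, a rough sketch of enabling sparklens from PySpark, based on a reading of its README (the package coordinate and listener class below are assumptions to verify against the repo):

    from pyspark.sql import SparkSession

    # Rough sketch based on the sparklens README (from memory) - verify the package
    # coordinate and listener class against the repo before relying on them.
    spark = (
        SparkSession.builder
        .config("spark.jars.packages", "qubole:sparklens:0.3.2-s_2.11")
        .config("spark.extraListeners", "com.qubole.sparklens.QuboleJobListener")
        .getOrCreate()
    )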
~ Hariharan
On Fri, Mar 29, 2019 at 12:01 AM bo yang wrote:
> Yeah, thes