Hi,
maybe this is useful in case someone is testing Spark in containers for
Spark development.
*From a production-scale point of view:*
But if I am on AWS, I will just use Glue if I want to run Spark in
containers, without unnecessarily and massively increasing my operational costs.
Also, in
Hi Gaurav, All,
I'm doing a spark-submit from my local system to a GCP Dataproc cluster.
This is more for dev/testing.
I can run a 'gcloud dataproc jobs submit' command as well, which is what
will be done in Production.
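For reference, a minimal sketch of such a submission; the cluster, region,
bucket, class, and jar names below are placeholders, not taken from the
thread:

    gcloud dataproc jobs submit spark \
        --cluster=my-cluster \
        --region=us-central1 \
        --class=com.example.MyApp \
        --jars=gs://my-bucket/jars/my-app.jar \
        -- arg1 arg2

The same subcommand family also covers 'pyspark' and 'spark-sql' jobs.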
Hope that clarifies.
regds,
Karan Alang
On Sat, Feb 12, 2022 at 10:31
Hi Holden,
when you mention the GS access jar, which jar is this?
Can you please clarify?
thanks,
Karan Alang
On Sat, Feb 12, 2022 at 11:10 AM Holden Karau wrote:
> You can also put the GS access jar with your Spark jars — that’s what the
> class not found exception is pointing you towards.
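The jar in question is presumably the GCS connector; a minimal sketch of
what "putting it with your Spark jars" might look like, assuming Google's
publicly hosted connector build (the URL and version are assumptions, not
confirmed in the thread):

    # download the GCS connector (publicly hosted by Google)
    wget https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-hadoop3-latest.jar
    # place it alongside the Spark jars so spark-submit picks it up
    cp gcs-connector-hadoop3-latest.jar "$SPARK_HOME/jars/"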
Thanks, Mich - will check this and update.
regds,
Karan Alang
On Sat, Feb 12, 2022 at 1:57 AM Mich Talebzadeh
wrote:
> BTW I also answered you on Stack Overflow:
>
>
> https://stackoverflow.com/questions/71088934/unable-to-access-google-buckets-using-spark-submit
>
> HTH
>
>
Hi, Danilo.
Do you have only a single large file?
If so, I guess you can use tools like sed/awk to split it into multiple
files based on the layout, so you can read those files into Spark.
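A minimal sketch of that kind of split with awk, assuming a hypothetical
layout where each record group starts with a header line beginning with
"HDR":

    # start a new output file at each header line, closing the previous
    # one so we don't exhaust open file descriptors
    awk '/^HDR/ { if (out) close(out); out = "part_" ++n ".txt" }
         out    { print > out }' big_file.txt

Spark can then read the resulting part_*.txt files in parallel.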
On Wed, Feb 9, 2022 at 09:30, Bitfox wrote:
> Hi
>
> I am not sure about the whole situation.
> But if you w
Putting the GS access jar alongside the Spark jars may technically resolve
the spark-submit issue, but creating a local copy of the jar files is not
a recommended practice.
The approach that the thread owner adopted, putting the files in a Google
Cloud bucket, is correct. Indeed this is what he states a
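A minimal sketch of that bucket-based approach, with hypothetical bucket
and jar names; on Dataproc the GCS connector is preinstalled, so gs://
paths resolve directly and no local copies are needed:

    # reference the artifacts from the bucket instead of local copies
    spark-submit \
        --master yarn \
        --class com.example.MyApp \
        --jars gs://my-bucket/jars/extra-lib.jar \
        gs://my-bucket/jars/my-app.jar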