Please find the answers inline.
1) Can I apply predicate pushdown filters if I have data stored in S3, or
should they be used only when reading from DBs?
It can be applied on S3 if you store the data in Parquet, CSV, JSON or Avro
format. It does not depend on the DB; it's supported in object store li
There are a few solutions:
1. Please make sure your driver has enough memory to broadcast the smaller
dataframe.
2. Change the config "spark.sql.autoBroadcastJoinThreshold", for example to
"2g" (this is only an example value).
3. Use a hint in the join; you need to scroll down a bit:
https://spark.apache.org/docs/l
It mainly depends on what your cluster manager is: YARN or Kubernetes?
Best
Tufan
On Sun, 10 Jul 2022 at 14:38, Sean Owen wrote:
> Jobs consist of tasks, each of which consumes a core (can be set to >1
> too, but that's a different story). If there are more tasks ready to
> execute than available cores,
So on average, for every 4 cores you get back 3.6 cores in YARN, but you can
use only 3.
In Kubernetes you get back 3.6 and can also use 3.6.
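
The arithmetic behind those numbers can be sketched as below. The 0.9 fraction is only derived from the "3.6 out of 4" figure above and will differ per cluster; the function name is hypothetical. The sketch assumes YARN hands out whole vcores while Kubernetes accepts fractional CPU requests.

```python
import math


def usable_cores(node_cores, allocatable_fraction=0.9, whole_cores_only=True):
    """Cores an executor can actually use on one node.

    allocatable_fraction is an assumption taken from the 3.6-of-4 example;
    whole_cores_only models a scheduler that allocates integer cores.
    """
    allocatable = node_cores * allocatable_fraction
    if whole_cores_only:
        # Integer allocation: the fractional remainder is left unused.
        return math.floor(allocatable)
    # Fractional allocation: the full allocatable amount can be requested.
    return allocatable


yarn_cores = usable_cores(4, whole_cores_only=True)
k8s_cores = usable_cores(4, whole_cores_only=False)
print(yarn_cores, k8s_cores)
```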
Best
Tufan
On Mon, 11 Jul 2022 at 11:02, Yong Walt wrote:
> We were using Yarn. thanks.
>
> On Sun, Jul 10, 2022 at 9:02 PM Tufan Raksh
Maybe try IntelliJ or some other IDE with SBT. Maven has always been
magical for me.
Best
Tufan
On Sat, 16 Jul 2022 at 00:11, Sean Owen wrote:
> Java 8 binaries are probably on your PATH
>
> On Fri, Jul 15, 2022, 5:01 PM Szymon Kuryło
> wrote:
>
>> Hello,
>>
>> I'm trying to build a Java 11 Sp
Hey,
Could you provide some pseudocode?
Also, what kind of machine are you using per executor? How many cores per
executor?
What's the size of the input data and what's the size of the output?
What kind of errors are you getting?
Best
Tufan
On Sun, 17 Jul 2022 at 00:31, Orkhan Dadashov
wrote:
>
Just use Delta
Best
Tufan
> On 24 Jul 2022, at 12:20, Shay Elbaz wrote:
>
>
> This is a known issue. Apache Iceberg, Hudi and Delta Lake are among the
> possible solutions.
> Alternatively, instead of writing the output directly to the "official"
> location, write it t
That's total nonsense, EMR is total crap. Use Kubernetes, I will help
you.
Can you please tell me the size of the shuffle file that is generated in
each task?
What's the total number of partitions that you have?
What machines are you using? Are you using an SSD?
Best
Tufan
On
I use M1 Apple silicon with Java 11 from Zulu, and run SBT-based build
jobs in Kubernetes.
Best
Tufan
On Tue, 7 Mar 2023 at 16:11, Sean Owen wrote:
> No, it's that JAVA_HOME wasn't set to .../Home. It is simply not finding
> javac, in the error. Zulu supports M1.
>
> On Tue, Mar 7, 2023 at 9:0