Ah sorry, I made a mistake. "Spark can only pick BroadcastNestedLoopJoin to
implement left/right join" should be "left/right non-equi join".
On Thu, Oct 24, 2019 at 6:32 AM zhangliyun wrote:
Hi Herman:
I guess what you mentioned before
```
if you are OK with slightly different NULL semantics then you could use NOT
EXISTS(subquery). The latter should perform a lot better.
```
is that the NULL key1 of the left table will be retained if a NULL key2 is not
found in the right table ( join
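To make the NULL-semantics difference concrete, here is a minimal sketch using SQLite via Python's standard library (the tables, columns, and values are illustrative, not from the thread): with a NULL in the subquery, NOT IN returns nothing, while NOT EXISTS keeps the unmatched left rows, including the NULL one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE left_t (key1 INTEGER);
    CREATE TABLE right_t (key2 INTEGER);
    INSERT INTO left_t VALUES (1), (2), (NULL);
    INSERT INTO right_t VALUES (1), (NULL);
""")

# NOT IN: the NULL in the subquery makes the predicate UNKNOWN for every
# non-matching row, so no rows are returned at all.
not_in = conn.execute(
    "SELECT key1 FROM left_t "
    "WHERE key1 NOT IN (SELECT key2 FROM right_t)"
).fetchall()

# NOT EXISTS with an equality correlation: a NULL key never equals anything,
# so the NULL row of the left table (and key1 = 2) is retained.
not_exists = conn.execute(
    "SELECT key1 FROM left_t l WHERE NOT EXISTS "
    "(SELECT 1 FROM right_t r WHERE r.key2 = l.key1)"
).fetchall()

print(not_in)      # []
print(not_exists)  # [(2,), (None,)]
```

This is the "slightly different NULL semantics" being traded for performance: the anti-join rewrite of NOT EXISTS keeps NULL keys that NOT IN drops.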
Hi all:
From Google, I know that Spark can only pick BroadcastNestedLoopJoin to
implement left/right join.
But in the following case, why does BroadcastNestedLoopJoin become
SortMergeJoin when I set spark.sql.autoBroadcastJoinThreshold=-1?
{code}
set spark
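A minimal sketch of the behavior being asked about (Spark SQL; the tables t1/t2 and column key are hypothetical): only a non-equi join condition forces BroadcastNestedLoopJoin, so with broadcasting disabled an equi left join can still fall back to SortMergeJoin.

```
SET spark.sql.autoBroadcastJoinThreshold=-1;
-- equi-join condition: Spark can shuffle on the key, so SortMergeJoin is chosen
EXPLAIN SELECT * FROM t1 LEFT JOIN t2 ON t1.key = t2.key;
-- non-equi condition: there is no key to shuffle on, so BroadcastNestedLoopJoin is used
EXPLAIN SELECT * FROM t1 LEFT JOIN t2 ON t1.key < t2.key;
```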
Hello,
In a non streaming application, I am using the checkpoint feature to
truncate the lineage of complex datasets. At the end of the job, the
checkpointed data, which is stored in HDFS, is deleted.
I am looking for a way to delete the unused checkpointed data earlier than
the end of the job. If
Thanks Abhisehk, I was able to resolve the issue.
I was building an assembly jar that contained some unwanted Spring and Netty
classes, which caused that exception.
Regards
Manish Gupta
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
Hi
I am trying to run spark-submit on Kubernetes. I am able to achieve the
desired results: the driver and executors are launched as per the given
configuration, and my job runs successfully.
*But even after job completion the Spark driver pod is always in the Running
state and
I haven't looked into your query yet, just want to let you know that: Spark
can only pick BroadcastNestedLoopJoin to implement left/right join. If the
table is very big, then OOM happens.
Maybe there is an algorithm to implement left/right join in a distributed
environment without broadcast, but c
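As a rough illustration of why a non-equi join needs a broadcast, here is the nested-loop algorithm in plain Python (toy data, not Spark code): every left row must be compared against every right row, since the inequality predicate gives no key to hash or sort on, so the smaller side is shipped whole to every executor.

```python
# Toy nested-loop LEFT OUTER join for a non-equi predicate (a < b).
# In Spark, `right` would be the broadcast side: each executor needs the
# *entire* right side because no join key exists to partition by.
def nested_loop_left_join(left, right, predicate):
    result = []
    for l in left:
        matched = False
        for r in right:  # full scan of the broadcast side for every left row
            if predicate(l, r):
                result.append((l, r))
                matched = True
        if not matched:
            result.append((l, None))  # left outer: keep unmatched left rows
    return result

rows = nested_loop_left_join([1, 5], [2, 3], lambda a, b: a < b)
print(rows)  # [(1, 2), (1, 3), (5, None)]
```

The O(|left| x |right|) comparison cost, plus holding one whole side in memory on each executor, is why a very large broadcast side leads to OOM.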
Hi,
Were you able to check the executor logs for this? If the executors are
running in separate JVMs/machines, they will have log files separate from the
driver's. If the OOME is due to concatenation of the large string, it may be
reported in the executor logs first.
How are you running this spark job?
Hi all:
I want to ask a question about broadcast nested loop join. From Google I know
that left outer/semi joins and right outer/semi joins will use broadcast
nested loop join, and in some cases, when the input data is very small, it is
suitable to use.
So here is my question: how do we define "very small" input data?
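For reference, a hedged sketch of the relevant knob: in Spark SQL, "small" is usually governed by spark.sql.autoBroadcastJoinThreshold, which defaults to 10 MB; a side whose estimated size is below it is considered broadcastable.

```
-- a table whose estimated size is below this threshold is considered small
-- enough to broadcast (default: 10485760 bytes = 10 MB)
SET spark.sql.autoBroadcastJoinThreshold=10485760;
-- -1 disables size-based broadcasting entirely
SET spark.sql.autoBroadcastJoinThreshold=-1;
```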