Hello,
I'm attempting to run a query on Spark 3.4.0 through the Spark
ThriftServer. The cluster has 64 cores, 250GB RAM, and operates in
standalone mode using HDFS for storage.
The query is as follows:
SELECT ME.*, MB.BenefitID
FROM MemberEnrollment ME
JOIN MemberBenefits MB
ON ME.ID = MB.EnrollmentID
WHERE MB.BenefitID = 5
LIMIT 10
The tables are defined as follows:
-- Contains about 3M rows
CREATE TABLE MemberEnrollment
(
ID INT
, MemberID VARCHAR(50)
, StartDate DATE
, EndDate DATE
-- Other columns, but these are the most important
) STORED AS ORC;
-- Contains about 25m rows
CREATE TABLE MemberBenefits
(
EnrollmentID INT
, BenefitID INT
) STORED AS ORC;
When I execute the query, it runs a single broadcast exchange stage, which
completes after a few seconds. Then everything just hangs. The JDBC/ODBC
tab in the UI shows the query state as COMPILED, but no stages or tasks are
executing or pending:
[image: image.png]
I've let the query run for as long as 30 minutes with no additional stages,
progress, or errors. I'm not sure where to start troubleshooting.
Thanks for your help,
Patrick