Hi Mich,

I don't believe Hive is installed. I set up this cluster from scratch and installed Hadoop and Spark by downloading them from their project websites. If Hive isn't bundled with Hadoop or Spark, then I don't have it. I'm running the Thrift server distributed with Spark, like so:
~/spark/sbin/start-thriftserver.sh --master spark://10.0.50.1:7077

I can look into installing Hive, but it might take some time. I tried to set up Hive when I first started evaluating distributed data processing solutions, but I encountered many issues. Spark was much simpler, which was part of the reason why I chose it.
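In the meantime, one check I can try on my end is running the same query through the spark-sql shell that ships with Spark, to take the Thrift server out of the picture. A rough sketch, assuming that session points at the same master and picks up the same metastore/warehouse as the Thrift server:

# Run the query file directly with the spark-sql CLI bundled with Spark,
# bypassing the Thrift server / JDBC path entirely.
# Assumes this session sees the same metastore/warehouse the Thrift server uses.
~/spark/bin/spark-sql --master spark://10.0.50.1:7077 -f command.sql

If that completes, the problem is more likely in the Thrift server layer than in the query plan itself.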
Thanks again for the reply, I truly appreciate your help.

Patrick

On Thu, Aug 10, 2023 at 3:43 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> sorry host is 10.0.50.1
>
> Mich Talebzadeh,
> Solutions Architect/Engineering Lead
> London
> United Kingdom
>
> view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
> On Thu, 10 Aug 2023 at 20:41, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> Hi Patrick,
>>
>> That beeline connection on port 10000 is a Hive Thrift server running
>> on host 10.0.50.1:10000.
>>
>> If you can access that host, you should be able to log into Hive by
>> typing hive. The OS user is hadoop in your case, and it sounds like
>> there is no password!
>>
>> Once inside that host, the Hive logs are kept in /tmp/hadoop/hive.log in
>> your case, or go to /tmp and run
>>
>> find ./ -name hive.log
>>
>> It should be under /tmp/hive.log.
>>
>> Try running the SQL inside Hive and see what it says.
>>
>> HTH
>>
>> Mich Talebzadeh,
>> Solutions Architect/Engineering Lead
>> London
>> United Kingdom
>>
>> view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>> https://en.everybodywiki.com/Mich_Talebzadeh
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>> On Thu, 10 Aug 2023 at 20:02, Patrick Tucci <patrick.tu...@gmail.com>
>> wrote:
>>
>>> Hi Mich,
>>>
>>> Thanks for the reply. Unfortunately I don't have Hive set up on my
>>> cluster. I can explore this if there are no other ways to troubleshoot.
>>>
>>> I'm using beeline to run commands against the Thrift server. Here's the
>>> command I use:
>>>
>>> ~/spark/bin/beeline -u jdbc:hive2://10.0.50.1:10000 -n hadoop -f
>>> command.sql
>>>
>>> Thanks again for your help.
>>>
>>> Patrick
>>>
>>> On Thu, Aug 10, 2023 at 2:24 PM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> Can you run this SQL query through Hive itself?
>>>>
>>>> Are you using this command or similar for your thrift server?
>>>>
>>>> beeline -u jdbc:hive2://<hostname>:10000/default -d
>>>> org.apache.hive.jdbc.HiveDriver -n hadoop -p xxx
>>>>
>>>> HTH
>>>>
>>>> Mich Talebzadeh,
>>>> Solutions Architect/Engineering Lead
>>>> London
>>>> United Kingdom
>>>>
>>>> view my Linkedin profile
>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>
>>>> https://en.everybodywiki.com/Mich_Talebzadeh
>>>>
>>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>>> any loss, damage or destruction of data or any other property which may
>>>> arise from relying on this email's technical content is explicitly
>>>> disclaimed. The author will in no case be liable for any monetary damages
>>>> arising from such loss, damage or destruction.
>>>>
>>>> On Thu, 10 Aug 2023 at 18:39, Patrick Tucci <patrick.tu...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I'm attempting to run a query on Spark 3.4.0 through the Spark
>>>>> Thrift Server. The cluster has 64 cores, 250GB RAM, and operates in
>>>>> standalone mode using HDFS for storage.
>>>>>
>>>>> The query is as follows:
>>>>>
>>>>> SELECT ME.*, MB.BenefitID
>>>>> FROM MemberEnrollment ME
>>>>> JOIN MemberBenefits MB
>>>>>   ON ME.ID = MB.EnrollmentID
>>>>> WHERE MB.BenefitID = 5
>>>>> LIMIT 10
>>>>>
>>>>> The tables are defined as follows:
>>>>>
>>>>> -- Contains about 3M rows
>>>>> CREATE TABLE MemberEnrollment
>>>>> (
>>>>>     ID INT
>>>>>     , MemberID VARCHAR(50)
>>>>>     , StartDate DATE
>>>>>     , EndDate DATE
>>>>>     -- Other columns, but these are the most important
>>>>> ) STORED AS ORC;
>>>>>
>>>>> -- Contains about 25M rows
>>>>> CREATE TABLE MemberBenefits
>>>>> (
>>>>>     EnrollmentID INT
>>>>>     , BenefitID INT
>>>>> ) STORED AS ORC;
>>>>>
>>>>> When I execute the query, it runs a single broadcast exchange stage,
>>>>> which completes after a few seconds. Then everything just hangs. The
>>>>> JDBC/ODBC tab in the UI shows the query state as COMPILED, but no
>>>>> stages or tasks are executing or pending:
>>>>>
>>>>> [image: image.png]
>>>>>
>>>>> I've let the query run for as long as 30 minutes with no additional
>>>>> stages, progress, or errors. I'm not sure where to start
>>>>> troubleshooting.
>>>>>
>>>>> Thanks for your help,
>>>>>
>>>>> Patrick
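P.S. If it would help, I can also capture the plan the Thrift server compiles for this query. A rough sketch, reusing the same beeline connection as above (EXPLAIN only returns the plan, so it shouldn't kick off the join itself):

# Ask the Thrift server for the formatted physical plan without executing the query.
~/spark/bin/beeline -u jdbc:hive2://10.0.50.1:10000 -n hadoop \
  -e "EXPLAIN FORMATTED SELECT ME.*, MB.BenefitID FROM MemberEnrollment ME JOIN MemberBenefits MB ON ME.ID = MB.EnrollmentID WHERE MB.BenefitID = 5 LIMIT 10;"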