Hi Mich,

I don't believe Hive is installed. I set up this cluster from scratch and installed Hadoop and Spark by downloading them from their project websites. If Hive isn't bundled with Hadoop or Spark, then I don't have it. I'm running the Thrift server distributed with Spark, like so:
~/spark/sbin/start-thriftserver.sh --master spark://10.0.50.1:7077

I can look into installing Hive, but it might take some time. I tried to set up Hive when I first started evaluating distributed data processing solutions, but I encountered many issues. Spark was much simpler, which was part of the reason why I chose it.
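In the meantime, one check I can try on my end is running the same query through the spark-sql shell that ships with Spark, to take the Thrift server out of the picture. A rough sketch, assuming that session points at the same master and picks up the same metastore/warehouse as the Thrift server:

# Run the query file directly with the spark-sql CLI bundled with Spark,
# bypassing the Thrift server / JDBC path entirely.
# Assumes this session sees the same metastore/warehouse the Thrift server uses.
~/spark/bin/spark-sql --master spark://10.0.50.1:7077 -f command.sql

If that completes, the problem is more likely in the Thrift server layer than in the query plan itself.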
Thanks again for the reply, I truly appreciate your help.

Patrick

On Thu, Aug 10, 2023 at 3:43 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> sorry host is 10.0.50.1
>
> Mich Talebzadeh,
> Solutions Architect/Engineering Lead
> London
> United Kingdom
>
> view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
> On Thu, 10 Aug 2023 at 20:41, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> Hi Patrick,
>>
>> That beeline connection on port 10000 is a Hive Thrift server running
>> on host 10.0.50.1:10000.
>>
>> If you can access that host, you should be able to log into Hive by
>> typing hive. The OS user is hadoop in your case, and it sounds like
>> there is no password!
>>
>> Once inside that host, the Hive logs are kept in /tmp/hadoop/hive.log in
>> your case, or go to /tmp and run
>>
>> find ./ -name hive.log
>>
>> It should be under /tmp/hive.log.
>>
>> Try running the SQL inside Hive and see what it says.
>>
>> HTH
>>
>> Mich Talebzadeh,
>> Solutions Architect/Engineering Lead
>> London
>> United Kingdom
>>
>> view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>> https://en.everybodywiki.com/Mich_Talebzadeh
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>> On Thu, 10 Aug 2023 at 20:02, Patrick Tucci <patrick.tu...@gmail.com>
>> wrote:
>>
>>> Hi Mich,
>>>
>>> Thanks for the reply. Unfortunately I don't have Hive set up on my
>>> cluster. I can explore this if there are no other ways to troubleshoot.
>>>
>>> I'm using beeline to run commands against the Thrift server. Here's the
>>> command I use:
>>>
>>> ~/spark/bin/beeline -u jdbc:hive2://10.0.50.1:10000 -n hadoop -f
>>> command.sql
>>>
>>> Thanks again for your help.
>>>
>>> Patrick
>>>
>>> On Thu, Aug 10, 2023 at 2:24 PM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> Can you run this SQL query through Hive itself?
>>>>
>>>> Are you using this command or similar for your thrift server?
>>>>
>>>> beeline -u jdbc:hive2://<hostname>:10000/default -d
>>>> org.apache.hive.jdbc.HiveDriver -n hadoop -p xxx
>>>>
>>>> HTH
>>>>
>>>> Mich Talebzadeh,
>>>> Solutions Architect/Engineering Lead
>>>> London
>>>> United Kingdom
>>>>
>>>> view my Linkedin profile
>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>
>>>> https://en.everybodywiki.com/Mich_Talebzadeh
>>>>
>>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>>> any loss, damage or destruction of data or any other property which may
>>>> arise from relying on this email's technical content is explicitly
>>>> disclaimed. The author will in no case be liable for any monetary damages
>>>> arising from such loss, damage or destruction.
>>>>
>>>> On Thu, 10 Aug 2023 at 18:39, Patrick Tucci <patrick.tu...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I'm attempting to run a query on Spark 3.4.0 through the Spark
>>>>> Thrift Server. The cluster has 64 cores, 250GB RAM, and operates in
>>>>> standalone mode using HDFS for storage.
>>>>>
>>>>> The query is as follows:
>>>>>
>>>>> SELECT ME.*, MB.BenefitID
>>>>> FROM MemberEnrollment ME
>>>>> JOIN MemberBenefits MB
>>>>>   ON ME.ID = MB.EnrollmentID
>>>>> WHERE MB.BenefitID = 5
>>>>> LIMIT 10
>>>>>
>>>>> The tables are defined as follows:
>>>>>
>>>>> -- Contains about 3M rows
>>>>> CREATE TABLE MemberEnrollment
>>>>> (
>>>>>     ID INT
>>>>>     , MemberID VARCHAR(50)
>>>>>     , StartDate DATE
>>>>>     , EndDate DATE
>>>>>     -- Other columns, but these are the most important
>>>>> ) STORED AS ORC;
>>>>>
>>>>> -- Contains about 25M rows
>>>>> CREATE TABLE MemberBenefits
>>>>> (
>>>>>     EnrollmentID INT
>>>>>     , BenefitID INT
>>>>> ) STORED AS ORC;
>>>>>
>>>>> When I execute the query, it runs a single broadcast exchange stage,
>>>>> which completes after a few seconds. Then everything just hangs. The
>>>>> JDBC/ODBC tab in the UI shows the query state as COMPILED, but no
>>>>> stages or tasks are executing or pending:
>>>>>
>>>>> [image: image.png]
>>>>>
>>>>> I've let the query run for as long as 30 minutes with no additional
>>>>> stages, progress, or errors. I'm not sure where to start
>>>>> troubleshooting.
>>>>>
>>>>> Thanks for your help,
>>>>>
>>>>> Patrick
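P.S. If it would help, I can also capture the plan the Thrift server compiles for this query. A rough sketch, reusing the same beeline connection as above (EXPLAIN only returns the plan, so it shouldn't kick off the join itself):

# Ask the Thrift server for the formatted physical plan without executing the query.
~/spark/bin/beeline -u jdbc:hive2://10.0.50.1:10000 -n hadoop \
  -e "EXPLAIN FORMATTED SELECT ME.*, MB.BenefitID FROM MemberEnrollment ME JOIN MemberBenefits MB ON ME.ID = MB.EnrollmentID WHERE MB.BenefitID = 5 LIMIT 10;"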