The thread dump table in the Spark UI can provide some clues for tracking down
thread lock issues, such as:

  Thread ID | Thread Name                  | Thread State | Thread Locks
  13        | NonBlockingInputStreamThread | WAITING      | Blocked by Thread Some(48) Lock(jline.internal.NonBlockingInputStream@103008951})
  48        | Thread-16                    | RUNNABLE     | Monitor(jline.internal.NonBlockingInputStream@103008951})

Each thread row shows its call stack when clicked, so you can check the root
cause of a held lock like this (Thread 48 above):

  org.fusesource.jansi.internal.Kernel32.ReadConsoleInputW(Native Method)
  org.fusesource.jansi.internal.Kernel32.readConsoleInputHelper(Kernel32.java:811)
  org.fusesource.jansi.internal.Kernel32.readConsoleKeyInput(Kernel32.java:842)
  org.fusesource.jansi.internal.WindowsSupport.readConsoleInput(WindowsSupport.java:97)
  jline.WindowsTerminal.readConsoleInput(WindowsTerminal.java:222)
  <snip...>
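
If you would rather get the same "who is blocked by whom" information from
code than from the UI, the standard java.lang.management API exposes it. Below
is a minimal sketch, assuming it runs inside the JVM you want to inspect (for
Spark that would be the driver or an executor process itself). The class name
BlockedThreadReport and the two demo threads are made up for illustration and
are not part of Spark.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper, not part of Spark: lists threads waiting on a lock
// that another thread currently owns, plus where the owner is executing.
final class BlockedThreadReport {

    /** Returns one line per thread that waits on a lock owned by another thread. */
    static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Ask for monitor / synchronizer details only where the JVM supports them.
        ThreadInfo[] dump = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(), mx.isSynchronizerUsageSupported());
        Map<Long, ThreadInfo> byId = new HashMap<>();
        for (ThreadInfo info : dump) {
            byId.put(info.getThreadId(), info);
        }
        List<String> report = new ArrayList<>();
        for (ThreadInfo info : dump) {
            long ownerId = info.getLockOwnerId();
            if (ownerId == -1) {
                continue; // not waiting on a lock that another thread owns
            }
            ThreadInfo owner = byId.get(ownerId);
            StackTraceElement[] stack =
                    owner == null ? new StackTraceElement[0] : owner.getStackTrace();
            String top = stack.length == 0 ? "<no stack>" : stack[0].toString();
            report.add(String.format(
                    "Thread %d (%s) %s, waiting on %s held by thread %d (%s); holder is at %s",
                    info.getThreadId(), info.getThreadName(), info.getThreadState(),
                    info.getLockName(), ownerId, info.getLockOwnerName(), top));
        }
        return report;
    }

    /** Demo only: one thread holds a monitor while a second one waits for it. */
    static List<String> demo() {
        Object lock = new Object();
        AtomicBoolean held = new AtomicBoolean(false);
        AtomicBoolean release = new AtomicBoolean(false);
        Thread holder = new Thread(() -> {
            synchronized (lock) {
                held.set(true);
                while (!release.get()) {
                    Thread.yield();
                }
            }
        }, "demo-holder");
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                // entered only after the holder releases the monitor
            }
        }, "demo-waiter");
        holder.setDaemon(true);
        waiter.setDaemon(true);
        holder.start();
        while (!held.get()) {
            Thread.yield(); // wait until the holder really owns the monitor
        }
        waiter.start();
        while (waiter.getState() != Thread.State.BLOCKED) {
            Thread.yield(); // wait until the waiter is stuck on the monitor
        }
        List<String> report = blockedThreads();
        release.set(true); // let both demo threads finish
        return report;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Calling blockedThreads() at a fixed interval (say once per second) and
comparing the results gives the periodic view suggested further down this
thread.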

Hope this helps.

-- 
Cheers,
-z

On Thu, 16 Apr 2020 16:36:42 +0900
Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:

> Take thread dumps continuously at a fixed interval (like 1s) and watch how
> the stack / lock of each thread changes. (This is not easy to do in the UI,
> so doing it manually may be the only option. I'm not sure whether the Spark
> UI provides the same; I haven't used it at all.)
> 
> It will tell you which thread is being blocked (even if it's shown as
> running) and which point to look at.
> 
> On Thu, Apr 16, 2020 at 4:29 PM Ruijing Li <liruijin...@gmail.com> wrote:
> 
> > Once I do a thread dump, what should I be looking for to tell where it is
> > hanging? I'm seeing a lot of TIMED_WAITING and WAITING threads on the
> > driver. The driver is also being blocked by the Spark UI. If there are no
> > tasks, is there any point in taking a thread dump of the executors?
> >
> > On Tue, Apr 14, 2020 at 4:49 AM Gabor Somogyi <gabor.g.somo...@gmail.com>
> > wrote:
> >
> >> The simplest way is to take a thread dump, which doesn't require any
> >> fancy tool (it's available in the Spark UI).
> >> Without a thread dump it's hard to say anything...
> >>
> >>
> >> On Tue, Apr 14, 2020 at 11:32 AM jane thorpe <janethor...@aol.com.invalid>
> >> wrote:
> >>
> >>> Here is another tool I use, Logic Analyser (7:55):
> >>> https://youtu.be/LnzuMJLZRdU
> >>>
> >>> You could take some suggestions for improving query performance:
> >>> https://dzone.com/articles/why-you-should-not-use-select-in-sql-query-1
> >>>
> >>>
> >>> Jane thorpe
> >>> janethor...@aol.com
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: jane thorpe <janethor...@aol.com.INVALID>
> >>> To: janethorpe1 <janethor...@aol.com>; mich.talebzadeh <
> >>> mich.talebza...@gmail.com>; liruijing09 <liruijin...@gmail.com>; user <
> >>> user@spark.apache.org>
> >>> Sent: Mon, 13 Apr 2020 8:32
> >>> Subject: Re: Spark hangs while reading from jdbc - does nothing Removing
> >>> Guess work from trouble shooting
> >>>
> >>>
> >>>
> >>> This tool may be useful for troubleshooting your problems.
> >>>
> >>>
> >>> https://www.javacodegeeks.com/2020/04/simplifying-apm-remove-the-guesswork-from-troubleshooting.html
> >>>
> >>>
> >>> "APM tools typically use a waterfall-type view to show the blocking
> >>> time of different components cascading through the control flow within an
> >>> application.
> >>> These types of visualizations are useful, and AppOptics has them, but
> >>> they can be difficult to understand for those of us without a PhD."
> >>>
> >>> Especially helpful if you want to understand through visualisation and
> >>> you do not have a PhD.
> >>>
> >>>
> >>> Jane thorpe
> >>> janethor...@aol.com
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: jane thorpe <janethor...@aol.com.INVALID>
> >>> To: mich.talebzadeh <mich.talebza...@gmail.com>; liruijing09 <
> >>> liruijin...@gmail.com>; user <user@spark.apache.org>
> >>> CC: user <user@spark.apache.org>
> >>> Sent: Sun, 12 Apr 2020 4:35
> >>> Subject: Re: Spark hangs while reading from jdbc - does nothing
> >>>
> >>> You seem to be implying the error is intermittent.
> >>> You also seem to be implying data is being ingested via JDBC, so the
> >>> connection has proven itself to be working unless no data is arriving
> >>> from the JDBC channel at all. If no data is arriving, then the problem
> >>> could be the JDBC connection.
> >>> If the error is intermittent, then it is likely that a resource involved
> >>> in processing is filling to capacity.
> >>> Try reducing the data ingestion volume and see if that completes, then
> >>> increase the data ingested incrementally.
> >>> I assume you have run the job on a small amount of data, so you have
> >>> completed your prototype stage successfully.
> >>>
> >>> ------------------------------
> >>> On Saturday, 11 April 2020 Mich Talebzadeh <mich.talebza...@gmail.com>
> >>> wrote:
> >>> Hi,
> >>>
> >>> Have you checked your JDBC connections from Spark to Oracle? What is
> >>> Oracle saying? Is it doing anything or hanging?
> >>>
> >>> set pagesize 9999
> >>> set linesize 140
> >>> set heading off
> >>> select SUBSTR(name,1,8) || ' sessions as on '||TO_CHAR(CURRENT_DATE, 'MON DD YYYY HH:MI AM') from v$database;
> >>> set heading on
> >>> column spid heading "OS PID" format a6
> >>> column process format a13 heading "Client ProcID"
> >>> column username  format a15
> >>> column sid       format 999
> >>> column serial#   format 99999
> >>> column STATUS    format a3 HEADING 'ACT'
> >>> column last      format 9,999.99
> >>> column TotGets   format 999,999,999,999 HEADING 'Logical I/O'
> >>> column phyRds    format 999,999,999 HEADING 'Physical I/O'
> >>> column total_memory format 999,999,999 HEADING 'MEM/KB'
> >>> --
> >>> SELECT
> >>>           substr(a.username,1,15) "LOGIN"
> >>>         , substr(a.sid,1,5) || ','||substr(a.serial#,1,5) AS "SID/serial#"
> >>>         , TO_CHAR(a.logon_time, 'DD/MM HH:MI') "LOGGED IN SINCE"
> >>>         , substr(a.machine,1,10) HOST
> >>>         , substr(p.username,1,8)||'/'||substr(p.spid,1,5) "OS PID"
> >>>         , substr(a.osuser,1,8)||'/'||substr(a.process,1,5) "Client PID"
> >>>         , substr(a.program,1,15) PROGRAM
> >>>         --,ROUND((CURRENT_DATE-a.logon_time)*24) AS "Logged/Hours"
> >>>         , (
> >>>                 select round(sum(ss.value)/1024) from v$sesstat ss, v$statname sn
> >>>                 where ss.sid = a.sid and
> >>>                         sn.statistic# = ss.statistic# and
> >>>                         -- sn.name in ('session pga memory')
> >>>                         sn.name in ('session pga memory','session uga memory')
> >>>           ) AS total_memory
> >>>         , (b.block_gets + b.consistent_gets) TotGets
> >>>         , b.physical_reads phyRds
> >>>         , decode(a.status, 'ACTIVE', 'Y','INACTIVE', 'N') STATUS
> >>>         , CASE WHEN a.sid in (select sid from v$mystat where rownum = 1) THEN '<-- YOU' ELSE ' ' END "INFO"
> >>> FROM
> >>>          v$process p
> >>>         ,v$session a
> >>>         ,v$sess_io b
> >>> WHERE
> >>> a.paddr = p.addr
> >>> AND p.background IS NULL
> >>> --AND  a.sid NOT IN (select sid from v$mystat where rownum = 1)
> >>> AND a.sid = b.sid
> >>> AND a.username is not null
> >>> --AND (a.last_call_et < 3600 or a.status = 'ACTIVE')
> >>> --AND CURRENT_DATE - logon_time > 0
> >>> --AND a.sid NOT IN ( select sid from v$mystat where rownum=1)  -- exclude me
> >>> --AND (b.block_gets + b.consistent_gets) > 0
> >>> ORDER BY a.username;
> >>> exit
> >>>
> >>> HTH
> >>>
> >>> Dr Mich Talebzadeh
> >>>
> >>> LinkedIn:
> >>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> >>>
> >>> http://talebzadehmich.wordpress.com
> >>>
> >>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
> >>> any loss, damage or destruction of data or any other property which may
> >>> arise from relying on this email's technical content is explicitly
> >>> disclaimed. The author will in no case be liable for any monetary damages
> >>> arising from such loss, damage or destruction.
> >>>
> >>>
> >>>
> >>> On Fri, 10 Apr 2020 at 17:37, Ruijing Li <liruijin...@gmail.com> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> I am on Spark 2.4.4 with Scala 2.11.12, running in cluster mode on
> >>> Mesos. I am ingesting from an Oracle database using spark.read.jdbc. I am
> >>> seeing a strange issue where Spark just hangs and does nothing, not
> >>> starting any new tasks. Normally this job finishes in 30 stages, but
> >>> sometimes it stops at 29 completed stages and doesn't start the last
> >>> stage. The Spark job is idling and there is no pending or active task.
> >>> What could be the problem? Thanks.
> >>> --
> >>> Cheers,
> >>> Ruijing Li
> >>>
> >>> --
> > Cheers,
> > Ruijing Li
> >
