Looks like it is indeed stuck on downloading the archive.

I searched a bit in the Hadoop JIRA and found several similar instances:
https://issues.apache.org/jira/browse/HDFS-6999
https://issues.apache.org/jira/browse/HDFS-7005
https://issues.apache.org/jira/browse/HDFS-7145

It is supposed to be fixed in 2.6.0 though :/

If Hadoop is available via HADOOP_CLASSPATH and flink-shaded-hadoop is also in /lib, then you basically don't know which Hadoop version is actually being used,
which could lead to incompatibilities and dependency clashes.
If flink-shaded-hadoop 2.4/2.5 is on the classpath, maybe that is being used and runs into HDFS-7005.
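
One rough way to check for this situation is sketched below. It assumes a standard Flink layout where the jars sit in a lib/ directory; the function names find_shaded_hadoop and check_hadoop_clash are made up for illustration and are not part of Flink or Hadoop.

```shell
#!/bin/sh
# Sketch only: detect whether Hadoop classes could come from two places.
# Assumption: the Flink lib directory is passed in as the first argument.

# Print any flink-shaded-hadoop jars found directly in the given lib directory.
find_shaded_hadoop() {
    lib_dir="$1"
    find "$lib_dir" -maxdepth 1 -type f -name 'flink-shaded-hadoop*.jar' 2>/dev/null | sort
}

# Succeed (exit status 0) when at most one Hadoop source is present.
# Fail (exit status 1) when a shaded jar AND a non-empty HADOOP_CLASSPATH coexist.
check_hadoop_clash() {
    shaded="$(find_shaded_hadoop "$1")"
    if [ -n "$shaded" ] && [ -n "${HADOOP_CLASSPATH:-}" ]; then
        echo "WARNING: HADOOP_CLASSPATH is set and a shaded Hadoop jar is in lib:"
        echo "$shaded"
        return 1
    fi
    echo "OK: no duplicate Hadoop source detected"
    return 0
}
```

It could be called as `check_hadoop_clash "$FLINK_HOME/lib"`, assuming FLINK_HOME points at the Flink installation. A warning does not prove that the shaded jar is the one being loaded; it only shows that both sources are visible.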

On 28/05/2020 16:27, Hailu, Andreas wrote:

Just created a dump, here’s what I see:

"Flink-HistoryServer-ArchiveFetcher-thread-1" #19 daemon prio=5 os_prio=0 tid=0x00007f93a5a2c000 nid=0x5692 runnable [0x00007f934a0d3000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
        - locked <0x00000005df986960> (a sun.nio.ch.Util$2)
        - locked <0x00000005df986948> (a java.util.Collections$UnmodifiableSet)
        - locked <0x00000005df928390> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
        at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.readChannelFully(PacketReceiver.java:258)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:209)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:171)
        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
        at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:201)
        at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:152)
        - locked <0x00000005ceade5e0> (a org.apache.hadoop.hdfs.RemoteBlockReader2)
        at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:781)
        at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:837)
        - eliminated <0x00000005cead3688> (a org.apache.hadoop.hdfs.DFSInputStream)
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:897)
        - locked <0x00000005cead3688> (a org.apache.hadoop.hdfs.DFSInputStream)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:945)
        - locked <0x00000005cead3688> (a org.apache.hadoop.hdfs.DFSInputStream)
        at java.io.DataInputStream.read(DataInputStream.java:149)
        at org.apache.flink.runtime.fs.hdfs.HadoopDataInputStream.read(HadoopDataInputStream.java:94)
        at java.io.InputStream.read(InputStream.java:101)
        at org.apache.flink.util.IOUtils.copyBytes(IOUtils.java:69)
        at org.apache.flink.util.IOUtils.copyBytes(IOUtils.java:91)
        at org.apache.flink.runtime.history.FsJobArchivist.getArchivedJsons(FsJobArchivist.java:110)
        at org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher$JobArchiveFetcherTask.run(HistoryServerArchiveFetcher.java:169)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

What problems could the flink-shaded-hadoop jar being included introduce?

// ah

*From:*Chesnay Schepler <ches...@apache.org>
*Sent:* Thursday, May 28, 2020 9:26 AM
*To:* Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; user@flink.apache.org
*Subject:* Re: History Server Not Showing Any Jobs - File Not Found?

If it were a class-loading issue, I would expect to see an exception of some kind. Maybe double-check that flink-shaded-hadoop is not in the lib directory. (Usually I would ask for the full classpath that the HS is started with, but as it turns out this isn't getting logged :( (FLINK-18008))

The fact that overview.json and jobs/overview.json are missing indicates that something goes wrong directly on startup. What is supposed to happen is that the HS starts, fetches all currently available archives, and then creates these files.
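
To see which state the generated files are in, something like the following could be run against the directory configured as historyserver.web.tmpdir. It is only a sketch: the flink-web-history-* directory naming is taken from the listing further down in this thread, and check_overview_files is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: report whether the generated overview files exist and have content.
# Assumption: the History Server keeps its cache in flink-web-history-* directories
# directly under the web tmpdir passed as the first argument.
check_overview_files() {
    tmpdir="$1"
    for dir in "$tmpdir"/flink-web-history-*; do
        # Skip the unexpanded glob when no cache directory exists.
        [ -d "$dir" ] || continue
        for rel in overview.json jobs/overview.json; do
            file="$dir/$rel"
            if [ ! -e "$file" ]; then
                echo "missing $file"
            elif [ ! -s "$file" ]; then
                echo "empty $file"
            else
                echo "ok $file"
            fi
        done
    done
}
```

A "missing" or "empty" result for both files would be consistent with the fetcher never finishing its first pass.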

So it seems like the download gets stuck for some reason.

Can you use jstack to create a thread dump, and see what the Flink-HistoryServer-ArchiveFetcher is doing?
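
A minimal sketch of how that could look, assuming a JDK with jps and jstack on the PATH and that the History Server's main class name contains "HistoryServer". extract_thread is a hypothetical helper, not a Flink or JDK tool.

```shell
#!/bin/sh
# Sketch only: print every thread block in a saved jstack dump whose header
# starts with the given thread-name prefix. Thread blocks in a dump are
# separated by blank lines, so awk's paragraph mode (RS="") treats each block
# as one record.
extract_thread() {
    dump_file="$1"
    thread_name="$2"
    awk -v name="$thread_name" '
        BEGIN { RS = ""; ORS = "\n\n" }
        index($0, "\"" name) == 1 { print }
    ' "$dump_file"
}

# Typical use (not executed here; the PID lookup is an assumption):
#   pid="$(jps -l | grep HistoryServer | cut -d" " -f1)"
#   jstack "$pid" > hs-dump.txt
#   extract_thread hs-dump.txt "Flink-HistoryServer-ArchiveFetcher"
```

Taking two or three dumps a few seconds apart and comparing the extracted block shows whether the thread is sitting in the same read call each time.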

I will also file a JIRA for adding more logging statements, like when fetching starts/stops.

On 27/05/2020 20:57, Hailu, Andreas wrote:

    Hi Chesnay, apologies for not getting back to you sooner here. So
    I did what you suggested - I downloaded a few files from my
    jobmanager.archive.fs.dir HDFS directory to a locally available
    directory named
    /local/scratch/hailua_p2epdlsuat/historyserver/archived/. I then
    changed my historyserver.archive.fs.dir to
    file:///local/scratch/hailua_p2epdlsuat/historyserver/archived/
    and that seemed to work. I’m able to see the history of the
    applications I downloaded. So this points to a problem with
    sourcing the history from HDFS.

    Do you think this could be classpath related? This is what we use
    for our HADOOP_CLASSPATH var:

    
    /gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop/lib/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-hdfs/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-hdfs/lib/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-mapreduce/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-mapreduce/lib/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-yarn/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-yarn/lib/*:/gns/software/ep/da/dataproc/dataproc-prod/lakeRmProxy.jar:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop/bin::/gns/mw/dbclient/postgres/jdbc/pg-jdbc-9.3.v01/postgresql-9.3-1100-jdbc4.jar

    You can see we have references to Hadoop mapred/yarn/hdfs libs in
    there.

    // ah

    *From:* Chesnay Schepler <ches...@apache.org>
    *Sent:* Sunday, May 3, 2020 6:00 PM
    *To:* Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; user@flink.apache.org
    *Subject:* Re: History Server Not Showing Any Jobs - File Not Found?

    yes, exactly; I want to rule out that (somehow) HDFS is the problem.

    I couldn't reproduce the issue locally myself so far.

    On 01/05/2020 22:31, Hailu, Andreas wrote:

        Hi Chesnay, yes – they were created using Flink 1.9.1 as we’ve
        only just started to archive them in the past couple weeks.
        Could you clarify on how you want to try local filesystem
        archives? As in changing jobmanager.archive.fs.dir and
        historyserver.web.tmpdir to the same local directory?

        // ah

        *From:* Chesnay Schepler <ches...@apache.org>
        *Sent:* Wednesday, April 29, 2020 8:26 AM
        *To:* Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; user@flink.apache.org
        *Subject:* Re: History Server Not Showing Any Jobs - File Not Found?

        hmm...let's see if I can reproduce the issue locally.

        Are the archives from the same version the history server runs
        on? (Which I suppose would be 1.9.1?)

        Just for the sake of narrowing things down, it would also be
        interesting to check if it works with the archives residing in
        the local filesystem.

        On 27/04/2020 18:35, Hailu, Andreas wrote:

            bash-4.1$ ls -l /local/scratch/flink_historyserver_tmpdir/
            total 8
            drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:43 flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9
            drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:22 flink-web-history-95b3f928-c60f-4351-9926-766c6ad3ee76

            There are just two directories in here. I don’t see cache
            directories from my attempts today, which is interesting.
            Looking a little deeper into them:

            bash-4.1$ ls -lr /local/scratch/flink_historyserver_tmpdir/flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9
            total 1756
            drwxrwxr-x 2 p2epdlsuat p2epdlsuat 1789952 Apr 21 10:44 jobs

            bash-4.1$ ls -lr /local/scratch/flink_historyserver_tmpdir/flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9/jobs
            total 0
            -rw-rw-r-- 1 p2epdlsuat p2epdlsuat 0 Apr 21 10:43 overview.json

            There are indeed archives already in HDFS – I’ve included
            some in my initial mail, but here they are again just for
            reference:

            -bash-4.1$ hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs
            Found 44282 items
            -rw-r----- 3 delp datalake_admin_dev      50569 2020-03-21 23:17 /user/p2epda/lake/delp_qa/flink_hs/000144dba9dc0f235768a46b2f26e936
            -rw-r----- 3 delp datalake_admin_dev      49578 2020-03-03 08:45 /user/p2epda/lake/delp_qa/flink_hs/000347625d8128ee3fd0b672018e38a5
            -rw-r----- 3 delp datalake_admin_dev      50842 2020-03-24 15:19 /user/p2epda/lake/delp_qa/flink_hs/0004be6ce01ba9677d1eb619ad0fa757

            ...

            // ah

            *From:* Chesnay Schepler <ches...@apache.org>
            *Sent:* Monday, April 27, 2020 10:28 AM
            *To:* Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; user@flink.apache.org
            *Subject:* Re: History Server Not Showing Any Jobs - File Not Found?

            If historyserver.web.tmpdir is not set then java.io.tmpdir
            is used, so that should be fine.

            What are the contents of
            /local/scratch/flink_historyserver_tmpdir?

            I assume there are already archives in HDFS?

            On 27/04/2020 16:02, Hailu, Andreas wrote:

                My machine’s /tmp directory is not large enough to
                support the archived files, so I changed my
                java.io.tmpdir to be in some other location which is
                significantly larger. I hadn’t set anything for
                historyserver.web.tmpdir, so I suspect it was still
                pointing at /tmp. I just tried setting
                historyserver.web.tmpdir to the same location as my
                java.io.tmpdir location, but I’m afraid I’m still
                seeing the following issue:

                2020-04-27 09:37:42,904 [nioEventLoopGroup-3-4] DEBUG HistoryServerStaticFileServerHandler - Unable to load requested file /overview.json from classloader
                2020-04-27 09:37:42,906 [nioEventLoopGroup-3-6] DEBUG HistoryServerStaticFileServerHandler - Unable to load requested file /jobs/overview.json from classloader

                flink-conf.yaml for reference:

                jobmanager.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/
                historyserver.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/
                historyserver.web.tmpdir: /local/scratch/flink_historyserver_tmpdir/

                Did you have anything else in mind when you said
                pointing somewhere funny?

                // ah

                *From:* Chesnay Schepler <ches...@apache.org>
                *Sent:* Monday, April 27, 2020 5:56 AM
                *To:* Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; user@flink.apache.org
                *Subject:* Re: History Server Not Showing Any Jobs - File Not Found?

                overview.json is a generated file that is placed in
                the local directory controlled by
                historyserver.web.tmpdir.

                Have you configured this option to point to some
                non-local filesystem? (Or if not, is the
                java.io.tmpdir property pointing somewhere funny?)

                On 24/04/2020 18:24, Hailu, Andreas wrote:

                    I’m having a further look at the code in
                    HistoryServerStaticFileServerHandler - is there an
                    assumption about where overview.json is supposed
                    to be located?

                    // ah

                    *From:* Hailu, Andreas [Engineering]
                    *Sent:* Wednesday, April 22, 2020 1:32 PM
                    *To:* 'Chesnay Schepler' <ches...@apache.org>; Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; user@flink.apache.org
                    *Subject:* RE: History Server Not Showing Any Jobs - File Not Found?

                    Hi Chesnay, thanks for responding. We’re using
                    Flink 1.9.1. I enabled DEBUG level logging and
                    this is something relevant I see:

                    2020-04-22 13:25:52,566 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG DFSInputStream - Connecting to datanode 10.79.252.101:1019
                    2020-04-22 13:25:52,567 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG SaslDataTransferClient - SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
                    2020-04-22 13:25:52,567 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG SaslDataTransferClient - SASL client skipping handshake in secured configuration with privileged port for addr = /10.79.252.101, datanodeId = DatanodeInfoWithStorage[10.79.252.101:1019,DS-7f4ec55d-7c5f-4a0e-b817-d9e635480b21,DISK]
                    *2020-04-22 13:25:52,571 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG DFSInputStream - DFSInputStream has been closed already*
                    *2020-04-22 13:25:52,573 [nioEventLoopGroup-3-6] DEBUG HistoryServerStaticFileServerHandler - Unable to load requested file /jobs/overview.json from classloader*
                    2020-04-22 13:25:52,576 [IPC Parameter Sending Thread #0] DEBUG Client$Connection$3 - IPC Client (1578587450) connection to d279536-002.dc.gs.com/10.59.61.87:8020 from d...@gs.com sending #1391

                    Aside from that, it looks like a lot of logging
                    around datanodes and block location metadata. Did
                    I miss something in my classpath, perhaps? If so,
                    do you have a suggestion on what I could try?

                    // ah

                    *From:* Chesnay Schepler <ches...@apache.org>
                    *Sent:* Wednesday, April 22, 2020 2:16 AM
                    *To:* Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; user@flink.apache.org
                    *Subject:* Re: History Server Not Showing Any Jobs - File Not Found?

                    Which Flink version are you using?

                    Have you checked the history server logs after
                    enabling debug logging?

                    On 21/04/2020 17:16, Hailu, Andreas [Engineering]
                    wrote:

                        Hi,

                        I’m trying to set up the History Server, but
                        none of my applications are showing up in the
                        Web UI. Looking at the console, I see that all
                        of the calls to /overview return the following
                        404 response: {"errors":["File not found."]}.

                        I’ve set up my configuration as follows:

                        JobManager Archive directory:

                        *jobmanager.archive.fs.dir*: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

                        -bash-4.1$ hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs
                        Found 44282 items
                        -rw-r----- 3 delp datalake_admin_dev      50569 2020-03-21 23:17 /user/p2epda/lake/delp_qa/flink_hs/000144dba9dc0f235768a46b2f26e936
                        -rw-r----- 3 delp datalake_admin_dev      49578 2020-03-03 08:45 /user/p2epda/lake/delp_qa/flink_hs/000347625d8128ee3fd0b672018e38a5
                        -rw-r----- 3 delp datalake_admin_dev      50842 2020-03-24 15:19 /user/p2epda/lake/delp_qa/flink_hs/0004be6ce01ba9677d1eb619ad0fa757

                        ...

                        ...

                        History Server will fetch the archived jobs
                        from the same location:

                        *historyserver.archive.fs.dir*: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

                        So I’m able to confirm that there are indeed
                        archived applications that I should be able to
                        view in the histserver. I’m not able to find
                        out what file the overview service is looking
                        for from the repository – any suggestions as
                        to what I could look into next?

                        Best,

                        Andreas

                        
------------------------------------------------------------------------


                        Your Personal Data: We may collect and process
                        information about you that may be subject to
                        data protection laws. For more information
                        about how we use and disclose your personal
                        data, how we protect your information, our
                        legal basis to use your information, your
                        rights and who you can contact, please refer
                        to: www.gs.com/privacy-notices
                        <http://www.gs.com/privacy-notices>

                    