Hi Petr,

Is there anything else you need from me to diagnose this further?
The last thing I shared was the log and pcap for case 1, i.e. pxe-service
entries with tag:proxy together with dhcp-boot.

Regards,
Shrenik

On Thu, 30 Sep, 2021, 16:17 Shrenik Bhura, <shrenik.bh...@gmail.com> wrote:

> 1. seems to have wrong pcap file or it does not use configuration
> attached in linked archive. It seems it offers menu items from 2. archive
> with custom pxe-services.
>
> Apologies, there was definitely some mistake.
>
> We have applied the patch and tried with and without dhcp-no-override but
> it still fails to boot. Herein are the pcap and the logs for this case.
>
> https://drive.google.com/file/d/1-GvsId99FC8f8B2I0YaTVuje5385u4LC/view?usp=sharing
>
> Additionally, also included is the qemu pcap wherein it does boot
> successfully.
>
> On Wed, 29 Sept 2021 at 20:29, Petr Menšík <pemen...@redhat.com> wrote:
>
>> It is somehow hard to guess described results for each configuration (1.
>> 2. 3.). It is unclear to me, what you saw for each variant printed by the
>> computer.
>>
>> 1. seems to have wrong pcap file or it does not use configuration
>> attached in linked archive. It seems it offers menu items from 2. archive
>> with custom pxe-services.
>>
>> Option 43 Suboption: (9) PXE boot menu
>>     Length: 41
>>     boot menu:
>> 8000155058454c494e555820285838362d36345f4546492980010e5058454c494e555820…
>>         Type: Unknown (32768)
>>         Length: 21
>>         Description: PXELINUX (X86-64_EFI)
>>         Type: Unknown (32769)
>>         Length: 14
>>         Description: PXELINUX (EFI)
>>
>> Above is not present in config file presented for it, but in 2. Are you
>> sure you have killed dnsmasq and started it again?
>>
>> I think it might be difference between pxe-service served file chosen via
>> menuboot. I have noticed there are two ways to specify file to boot in DHCP
>> for IPv4. One is in fixed header and first try chosen from menu is in that.
>> pxe-service options makes it to request direct query to DHCP server, marked
>> proxyDHCP in wireshark. This proxy ACK is followed by TFTP.
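
For reference, the suboption 9 payload in the Wireshark dump quoted above can be decoded by hand. Below is a minimal Python sketch, assuming the layout Wireshark shows for each entry (a 2-byte big-endian type, a 1-byte length, then the description). The name `parse_boot_menu` is made up for illustration, and the hex string stops where the dump is cut off ("…"), so only the first entry is complete.

```python
def parse_boot_menu(payload: bytes):
    """Split a PXE boot menu (option 43, suboption 9) into (type, description).

    Assumed layout per entry: 2-byte big-endian type, 1-byte description
    length, then the description. Stops at the first incomplete entry.
    """
    entries = []
    pos = 0
    while pos + 3 <= len(payload):
        boot_type = int.from_bytes(payload[pos:pos + 2], "big")
        length = payload[pos + 2]
        desc = payload[pos + 3:pos + 3 + length]
        if len(desc) < length:
            break  # truncated entry: do not guess the missing bytes
        entries.append((boot_type, desc.decode("ascii")))
        pos += 3 + length
    return entries


# Hex copied from the dump above, up to the point where it is truncated.
dump = "8000155058454c494e555820285838362d36345f4546492980010e5058454c494e555820"
menu = parse_boot_menu(bytes.fromhex(dump))
for boot_type, desc in menu:
    print(f"type=0x{boot_type:04x} description={desc!r}")
```

The first entry decodes to type 0x8000 (32768) with the description "PXELINUX (X86-64_EFI)", which agrees with the Wireshark tree; the second entry is skipped because its bytes are cut off in the dump.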
>>
>> I used filter in wireshark: "dhcp or (!tftp.destination_file && tftp)"
>>
>> However following DHCP offers boot file path ONLY in option 67 value.
>> Fixed header boot file is all zeroed. It seems to me this is the part the
>> snponly.efi firmware does not understand. It does not try to use path in
>> option, but may insist only on file. Since option #52 overload is not in
>> packet, I guess dnsmasq should have used mess->file for path and not option
>> 67. But rules of rfc2131.c:2476 are simple. If client have requested option
>> 67, it should handle it as option 67. I guess it is bug in snponly.efi.
>> Either it should not include option 67 between requested options or it
>> should actually handle the option. Dnsmasq would offer boot path in both
>> cases.
>>
>> Interesting enough, dnsmasq is inconsistent with itself. It behaves a bit
>> different way in PXE proxy mode, where file header part is always used. In
>> normal mode unless --dhcp-no-override is used, option is used if requested.
>>
>> Can you please try if dhcp-no-override option would fix your issues? I
>> think it should behave the same way in both situations.
>>
>> I attached patch, which would set boot file on pxe-service the same way
>> as dhcp-boot. It may require dhcp-no-override where it did not before.
>> Could you please try it?
>>
>> On 9/28/21 11:54, Shrenik Bhura wrote:
>>
>> Hi Petr,
>>
>> As per your guidance, we have enabled logging (LOG_ALL in
>> config/console.h) and recompiled the ipxe binaries. Below are the latest
>> observations.
>>
>> Taking down the scenarios from the previous post for ease of reference -
>> 1. Default dnsmasq config with default ltsp's pxe-service entries -
>> https://drive.google.com/file/d/1-BGnZw4RMAuIbJudVA2D4a1vasNeAd1j/view?usp=sharing
>> 2. Custom pxe-service entries (just to prove that pxe-service and
>> dhcp-boot do seem to successfully co-exist) -
>> https://drive.google.com/file/d/1-CjHXxlKmYw-9aOTD7xK8m5uAdj4qyAB/view?usp=sharing
>> 3.
>>    Without pxe-service entries -
>> https://drive.google.com/file/d/1-6Q_1Fg6zVVNruzQTJjxvmKRRkRnCBmh/view?usp=sharing
>>
>> I'll try to summarise the understanding and prevailing ambiguities thus
>> far to help allot responsibility of multiple things that may be going wrong
>> here :
>>
>> Between scenario (1) and (2), we see that ltsp.ipxe is being served in
>> (2) which doesn't happen in (1).
>> In (1), the primary issue is that EFI clients do not receive snponly.efi,
>> thus they do not advertise option 175 and hence are not sent the ltsp.ipxe.
>> Since it has not got to the iPXE stage as yet, there are no logs available
>> from ipxe. All that is visible momentarily on the client side is these two
>> lines -
>>
>> *Station IP address is 192.168.67.134*
>> *PXE-E21: Remote boot cancelled.*
>>
>> Quoting from an explanation herein [1] for "Remote boot cancelled" -
>> *" This message is also displayed when a DHCP/proxyDHCP server sends a
>> menu that auto-selects Local Boot and when a bootserver sends a bootstrap
>> program that returns control to the PXE LoadFile protocol. "*
>>
>> In scenario (2), PXE boot menu is displayed as defined in the pxe-service
>> lines, option 175 is received back from the client, ltsp.ipxe is sent but
>> is not "downloaded" by the client. There is nothing reported in the ipxe
>> logs. On the client, the last line says -
>> No more network devices.
>>
>> But, above all, if we simply comment out all the pxe-service lines, as in
>> scenario (3), including the one with tag:rpi, the EFI clients boot up
>> perfectly. iPXE log has -
>> ipxe: Downloaded "ltsp.ipxe"
>> ipxe: Executing "ltsp.ipxe"
>> ipxe: Downloaded "vmlinuz"
>> ipxe: Downloaded "ltsp.img"
>> ipxe: Downloaded "initrd.img"
>> ipxe: Executing "vmlinuz"
>>
>> The question thus arises that why does dnsmasq not ignore the pxe-service
>> lines which have an unmatched "tag:proxy" or "tag:rpi" when dnsmasq is
>> operating in non-proxy mode?
>> Or does it ignore and yet there is a problem outside dnsmasq? With
>> respect to scenario (1), there could be a problem in the UEFI
>> implementation, with respect to (2), there could be an issue with iPXE
>> but what we can immediately control within dnsmasq is to ignore lines
>> of pxe-service with tags that have not been set.
>>
>> Your thoughts?
>>
>> [1]
>> https://techpubs.jurassic.nl/manuals/hdwr/enduser/SG750_UG/sgi_html/ch04.html
>>
>> On Mon, 27 Sept 2021 at 22:56, Petr Menšík <pemen...@redhat.com> wrote:
>>
>>> Hello,
>>>
>>> I made a mistake when reading the code. You are right. The part I
>>> mentioned is only affected on vendor-class information option 43, only in
>>> DHCPREQUEST or DHCPINFORM. Which is not in request in pcap you have sent.
>>>
>>> It seems to me problem is somewhere on IPXE side in decoding reply
>>> dnsmasq sent to it. I took a look at the second offer of both without-pxe
>>> and default-ltsp. It seems the only difference is in vendorclass
>>> information containing PXE menu. Without pxe continues to TFTP, where
>>> default is stuck. The answer is on its decoding side. Assignment got the
>>> same boot file successfully in both configurations. I am afraid it would be
>>> problem at PXE decoding client, which may not understand menu dnsmasq tried
>>> to send.
>>>
>>> According to option 43 decoding in wireshark, pxe suboptions look well.
>>> Except suboption 9 boot menu. Type unknown 0x8000 does seem weird, but
>>> should be just Vendor use according to IBM docs [1]. Why it did not do
>>> anything else should be answered by ipxe people. It should continue after 2
>>> seconds even without any action. Did it display at least boot menu on that
>>> station? Did it show anything? Are those machines with normal VGA output?
>>> Perhaps LOG_LEVEL in PXE [2] might reveal true reason.
>>>
>>> Cheers,
>>> Petr
>>>
>>> 1.
>>> https://www.ibm.com/docs/en/aix/7.2?topic=daemon-pxe-vendor-container-suboptions
>>> 2.
>>> https://ipxe.org/buildcfg/log_level
>>>
>>> On 9/27/21 16:04, Shrenik Bhura wrote:
>>>
>>> Hello Petr,
>>>
>>> Thanks for your guidance.
>>>
>>> It does seem that dhcp-boot is being reached even when pxe-service is
>>> successfully executed. Taking a hint from this discussion on UEFI and PXE
>>> (https://bbs.archlinux.org/viewtopic.php?id=237655), we tried this
>>> custom configuration -
>>>
>>> pxe-prompt="Press any key for boot menu",2
>>> pxe-service=X86-64_EFI,"PXELINUX (X86-64_EFI)",ltsp/snponly.efi
>>> pxe-service=7,"PXELINUX (EFI)",ltsp/snponly.efi
>>> dhcp-boot=tag:!iPXE,tag:X86PC,ltsp/undionly.kpxe
>>> dhcp-boot=tag:!iPXE,tag:X86-64_EFI,ltsp/snponly.efi
>>> dhcp-boot=tag:iPXE,ltsp/ltsp.ipxe
>>>
>>> (full file attached below)
>>>
>>> Server does proceed to offering ltsp.ipxe to the client via dhcp but is
>>> eventually not being transferred via tftp.
>>>
>>> Have attached logs, pcap and dnsmasq configuration of three scenarios -
>>> 1. Default dnsmasq config with default ltsp's pxe-service entries
>>> 2. Custom pxe-service entries
>>> 3. Without pxe-service entries
>>>
>>> We have tested these with two systems - Intel NUC and Dell Optiplex 3040
>>> with their updated firmware and have found the same results.
>>>
>>> I hope this helps to zoom further into the problem area.
>>>
>>> Best regards,
>>> Shrenik
>>>
>>> On Mon, 27 Sept 2021 at 17:00, Petr Menšík <pemen...@redhat.com> wrote:
>>>
>>>> Hi Alkis,
>>>>
>>>> It would be helpful, if you could record pcap with those lines commented
>>>> out and enabled. It seems suspicious dhcp-boot option is present at the
>>>> same time with pxe-service. From what I understood, pxe-service should
>>>> offer boot options only to PXEClient vendor string. I think it saves you
>>>> the need to dhcp-match=set:X86PC,option:client-arch,0
>>>>
>>>> then matched in
>>>> dhcp-boot=tag:!iPXE,tag:X86PC,ltsp/undionly.kpxe
>>>> dhcp-boot=tag:iPXE,tag:X86PC,ltsp/ltsp.ipxe
>>>>
>>>> I just checked my Raspberry 3.
>>>> I guess architecture of RPi in DHCP request is clearly wrong.
>>>> Unfortunately it reports it wrong also in vendorclass ARCH:0000.
>>>>
>>>> Anyway, it might not handle tags correctly. Around src/rfc2131.c:891, it
>>>> searches for pxe service without using tags. It is not used to find
>>>> correct service, just to find correct context.
>>>>
>>>> Also it seems if any pxe-service is defined and incoming DHCP packet
>>>> contains PXEClient in VendorClass option, it MUST be handled by
>>>> pxe-service. If no correct service & context is found, reply is not
>>>> handled for it. It cannot fall back to normal DHCP reply in that case,
>>>> which can be fixed. But current situation seems to me clear. If any
>>>> pxe-service is present, all PXEClient packets has to be handled by it.
>>>> It seems to me you define tags per arch anyway, so I guess you can avoid
>>>> pxe-service just fine.
>>>>
>>>> I made an attempt to respond to PXE request only when correct service
>>>> matches. But I have no setup prepared for it, I tested just it compiles.
>>>> Could you try it would help?
>>>>
>>>> Cheers,
>>>> Petr
>>>>
>>>> On 3/19/21 10:05, Alkis Georgopoulos wrote:
>>>> > Hi all,
>>>> >
>>>> > I'm one of the LTSP developers; I asked Shrenik to contact the dnsmasq
>>>> > mailing list because I feel this might be a dnsmasq issue.
>>>> >
>>>> > Specifically, success or failure depends on whether these five lines
>>>> > are commented out or not:
>>>> >
>>>> > #pxe-service=tag:proxy,tag:!iPXE,X86PC,"undionly.kpxe",ltsp/undionly.kpxe
>>>> > #pxe-service=tag:proxy,tag:!iPXE,X86-64_EFI,"snponly.efi",ltsp/snponly.efi
>>>> > #pxe-service=tag:proxy,tag:iPXE,X86PC,"ltsp.ipxe",ltsp/ltsp.ipxe
>>>> > #pxe-service=tag:proxy,tag:iPXE,X86-64_EFI,"ltsp.ipxe",ltsp/ltsp.ipxe
>>>> > #pxe-service=tag:rpi,X86PC,"Raspberry Pi Boot ",unused
>>>> >
>>>> > You may find the full configuration files and logs at:
>>>> > https://github.com/ltsp/ltsp/pull/417
>>>> >
>>>> > The reason I feel it might be a dnsmasq issue, is that these tags are
>>>> > NOT matched in Shrenik's use case. He's not using proxy mode and he's
>>>> > not booting a Raspberry Pi.
>>>> >
>>>> > So, "pxe-service" lines that are NOT matched, cause the problem,
>>>> > yet if they're commented out, the problem is gone...
>>>> >
>>>> > Would that be an issue with dnsmasq, or with the UEFI PXE stack?
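
To make the expectation behind these lines concrete: under the tag rules described in the dnsmasq man page, a line carrying `tag:proxy` should apply only when the `proxy` tag is set, and `tag:!iPXE` only when `iPXE` is not set. The following is a minimal Python sketch of that expected rule, not of what dnsmasq's code actually does. The helper `line_applies` and the client tag set are made up for illustration.

```python
def line_applies(required_tags, set_tags):
    """Return True when every tag condition on a config line is satisfied.

    "name" requires the tag to be set; "!name" requires it NOT to be set.
    """
    for tag in required_tags:
        if tag.startswith("!"):
            if tag[1:] in set_tags:
                return False
        elif tag not in set_tags:
            return False
    return True


# Tag conditions taken from the five pxe-service lines quoted above.
pxe_service_lines = [
    ("ltsp/undionly.kpxe", ["proxy", "!iPXE"]),
    ("ltsp/snponly.efi", ["proxy", "!iPXE"]),
    ("ltsp/ltsp.ipxe (X86PC)", ["proxy", "iPXE"]),
    ("ltsp/ltsp.ipxe (X86-64_EFI)", ["proxy", "iPXE"]),
    ("unused (Raspberry Pi)", ["rpi"]),
]

# Hypothetical client: an EFI machine, no proxy mode, not a Raspberry Pi.
client_tags = {"X86-64_EFI"}

matching = [name for name, tags in pxe_service_lines
            if line_applies(tags, client_tags)]
print(matching)
```

Under that rule none of the five lines applies to a non-proxy EFI client, which is why their having any effect in this case is surprising.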
>>>> >
>>>> > Thanks,
>>>> > Alkis Georgopoulos
>>>> >
>>>> > _______________________________________________
>>>> > Dnsmasq-discuss mailing list
>>>> > Dnsmasq-discuss@lists.thekelleys.org.uk
>>>> > https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss
>>>> >
>>>> --
>>>> Petr Menšík
>>>> Software Engineer
>>>> Red Hat, http://www.redhat.com/
>>>> email: pemen...@redhat.com
>>>> PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB
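
On the point raised earlier in the thread, that the boot path was sent only in option 67 while the fixed-header `file` field was zeroed: the two locations can be checked directly on a raw DHCP payload. This is a minimal Python sketch using the standard BOOTP layout (128-byte `file` field at offset 108, options after the magic cookie at offset 240, option 52 for overload). The function names are made up for illustration, and the packet is synthetic, not taken from the shared pcaps.

```python
BOOTP_FILE_OFFSET = 108   # sname ends at byte 108; file field follows
BOOTP_FILE_LEN = 128
OPTIONS_OFFSET = 240      # 236-byte fixed header + 4-byte magic cookie
MAGIC_COOKIE = bytes([99, 130, 83, 99])


def parse_options(data: bytes) -> dict:
    """Parse DHCP options (code, length, value) until the END option."""
    options = {}
    pos = 0
    while pos < len(data):
        code = data[pos]
        if code == 255:              # END
            break
        if code == 0:                # PAD
            pos += 1
            continue
        if pos + 1 >= len(data):     # truncated option, stop parsing
            break
        length = data[pos + 1]
        options[code] = data[pos + 2:pos + 2 + length]
        pos += 2 + length
    return options


def locate_boot_file(packet: bytes) -> dict:
    """Report the boot file name found in the header and in option 67."""
    if packet[236:240] != MAGIC_COOKIE:
        raise ValueError("not a DHCP packet (magic cookie missing)")
    options = parse_options(packet[OPTIONS_OFFSET:])
    overload = (options.get(52) or b"\x00")[0]
    field = packet[BOOTP_FILE_OFFSET:BOOTP_FILE_OFFSET + BOOTP_FILE_LEN]
    # With overload 1 or 3 the file field carries options, not a name.
    if overload in (1, 3):
        header_file = ""
    else:
        header_file = field.split(b"\x00", 1)[0].decode("ascii")
    option_file = options.get(67, b"").rstrip(b"\x00").decode("ascii")
    return {"header_file": header_file, "option_67": option_file}


# Synthetic reply: zeroed file field, path carried only in option 67.
path = b"ltsp/snponly.efi"
packet = (bytes(236) + MAGIC_COOKIE
          + bytes([67, len(path)]) + path + bytes([255]))
print(locate_boot_file(packet))
```

A reply like the synthetic one reports an empty `header_file` and the path under `option_67`, which is the shape described for the failing case; a client that reads only the fixed header would see no boot file at all.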