• update to 1.26.6
  • systemd-resolved: use DefaultRoute instead of wildcard search domain (rh #1884352)

1.26.6 NEWS:

  • Change the behavior of nm-initrd-generator so that the 'ip=off|none' kernel cmdline argument actually generates a connection which disables both IPv4 and IPv6. Previously the generated connection would disable IPv4, but IPv6 would be set to the 'auto' method.

  • Fix the systemd-resolved DNS plugin to configure the DefaultRoute option, and to configure the wildcard DNS search domain only with exclusive DNS priority.

  • Various minor fixes.
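The switch from a wildcard search domain to DefaultRoute matters most for names under .local. The following is a simplified Python model of how systemd-resolved decides which links receive a unicast DNS query, following the routing rules described in the systemd-resolved.service man page:

1. Links whose search or routing domains match the name win, and the most specific match wins. The wildcard "~." matches every name.
2. If no domain matches, the query goes to links with DefaultRoute enabled.
3. The exception is .local names, which are left to MulticastDNS.

This is a sketch under those assumptions, not resolved's implementation. `Link`, `in_domain` and `dns_links_for` are hypothetical names, and mDNS/LLMNR scopes, single-label names and reverse zones are ignored.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Link:
    """A network link as systemd-resolved sees it (simplified, hypothetical)."""
    name: str
    dns_servers: List[str]
    # Search/routing domains as resolvectl shows them; "~." is the wildcard.
    domains: List[str] = field(default_factory=list)
    default_route: bool = False


def labels(name: str) -> List[str]:
    """Split a DNS name into lower-cased labels; the root "." has none."""
    return [part for part in name.lower().strip(".").split(".") if part]


def in_domain(name: str, domain: str) -> bool:
    """True if `name` equals `domain` or lies below it ("~" prefix ignored)."""
    dom = labels(domain.lstrip("~"))
    if not dom:  # the root domain "~." matches every name
        return True
    return labels(name)[-len(dom):] == dom


def dns_links_for(query: str, links: List[Link]) -> List[str]:
    """Return the names of links whose DNS servers would get `query`."""
    usable = [link for link in links if link.dns_servers]
    # 1. Links with a matching search/routing domain win; the longest
    #    (most specific) match beats shorter ones, including "~.".
    best, winners = -1, []
    for link in usable:
        for dom in link.domains:
            if not in_domain(query, dom):
                continue
            depth = len(labels(dom.lstrip("~")))
            if depth > best:
                best, winners = depth, [link.name]
            elif depth == best and link.name not in winners:
                winners.append(link.name)
    if winners:
        return winners
    # 2. No domain matched: use DefaultRoute links, except that ".local"
    #    names are left to MulticastDNS and not sent to unicast DNS.
    if in_domain(query, "local"):
        return []
    return [link.name for link in usable if link.default_route]


if __name__ == "__main__":
    server = ["172.16.2.100"]
    old_style = Link("ens4", server, domains=["~."])       # wildcard domain
    new_style = Link("ens4", server, default_route=True)  # DefaultRoute only
    print(dns_links_for("ipa001.domain.local", [old_style]))   # ['ens4']
    print(dns_links_for("ipa001.domain.local", [new_style]))   # []
    print(dns_links_for("dl.fedoraproject.org", [new_style]))  # ['ens4']
```

Under this model, a .local name reaches the link's DNS server when the wildcard "~." domain is set, but not when only DefaultRoute is set. An explicit "~local" routing domain on the link would restore it.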

How to install

sudo dnf upgrade --advisory=FEDORA-2020-7851982ff6

This update has been submitted for testing by thaller.

4 months ago

This update's test gating status has been changed to 'ignored'.

4 months ago

This update's test gating status has been changed to 'waiting'.

4 months ago

This update's test gating status has been changed to 'ignored'.

4 months ago
adamwill commented & provided feedback 4 months ago
karma

In openQA testing, this seems to break FreeIPA client enrolment. I'm technically off for the month but still checking update test results, so I haven't looked into this in detail; I'll give it a quick look if time allows. Tagging some FreeIPA folks: @abbra, @cipherboy, @rcritten

Leaving negative feedback for now; if this turns out to be an artifact of how we run the tests or something, I'll change it.

adamwill commented & provided feedback 4 months ago

So briefly the situation in openQA testing is: we have a test that deploys as a FreeIPA server. It sets itself up for static networking with IP 172.16.2.100 and hostname ipa001.domain.local using the commands you can see here. It then sets up some repo stuff for testing the update, reboots, and runs the commands from this test code to do the actual deployment. It's all basically just running commands at a console: anywhere it says assert_script_run, it's running a command and expecting it to succeed. All those commands apparently succeed, and we reach the comment # we're ready for children to enrol, now, where we wait on several client tests to enrol against us. All of those fail. The one I previously linked is the simplest one. It sets its DNS server to be 172.16.2.100 (the FreeIPA server's IP, remember), then runs realm join --user=admin ipa001.domain.local, as you can see here, and that fails with "realm: No such realm found".

In tests of other F33 updates this same test is passing.

thaller commented & provided feedback 4 months ago

@adamwill it is probably related to the changes to how NetworkManager configures systemd-resolved. Is it possible to add some printf-debugging statements before the failure? In particular: resolvectl; nmcli device; ip addr; ip route.
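Here is a minimal sketch of collecting the requested diagnostics in one go. It assumes Python 3.7+ is available on the client. Each command runs separately so that a missing tool does not stop the others, and each output goes under its own header so the combined text stays easy to split apart. `collect_debug_output` and `DEFAULT_COMMANDS` are hypothetical names, not part of openQA or NetworkManager.

```python
import subprocess
import sys
from typing import List, Sequence

# The four commands requested above (hypothetical constant name).
DEFAULT_COMMANDS: List[List[str]] = [
    ["resolvectl"],
    ["nmcli", "device"],
    ["ip", "addr"],
    ["ip", "route"],
]


def collect_debug_output(commands: Sequence[Sequence[str]] = DEFAULT_COMMANDS) -> str:
    """Run each command and return its output under a '== command ==' header."""
    sections = []
    for cmd in commands:
        header = "== " + " ".join(cmd) + " =="
        try:
            result = subprocess.run(list(cmd), capture_output=True, text=True, timeout=10)
            body = result.stdout + result.stderr
            if result.returncode != 0:
                body += f"(exit status {result.returncode})\n"
        except FileNotFoundError:
            body = "(command not found)\n"
        except subprocess.TimeoutExpired:
            body = "(timed out)\n"
        sections.append(header + "\n" + body)
    return "\n".join(sections)


if __name__ == "__main__":
    sys.stdout.write(collect_debug_output())
```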

Btw, yesterday we also released 1.28.0 which is now in rawhide (package "1.28.0-1"). That might have exactly the same problem. Does it?

Thank you for having extensive CI tests! That is awesome!!

abbra commented & provided feedback 4 months ago

FreeIPA server, when set up with integrated DNS server, configures both NM and systemd-resolved to hand over all zones to itself, as it is authoritative for the zone hosted by FreeIPA: https://pagure.io/freeipa/blob/master/f/ipaplatform/redhat/tasks.py#_616

The configuration file snippet for NM looks like this: https://pagure.io/freeipa/blob/master/f/ipaplatform/redhat/tasks.py#_66, with explicit zone override.

Corresponding systemd-resolved setup is here: https://pagure.io/freeipa/blob/master/f/ipaplatform/base/tasks.py#_325 and configuration snippet template is here: https://pagure.io/freeipa/blob/master/f/ipaplatform/base/tasks.py#_41

abbra commented & provided feedback 4 months ago

Note that during client enrollment none of the logic above applies, because it is only for the integrated DNS server setup on the IPA master. On the client we rely on the general networking setup administrators provide. Since the openQA environment already sets the upstream DNS server to the IPA server, failing to follow this is an obvious mistake on the NM or systemd-resolved side.

This update has been pushed to testing.

4 months ago

Bodhi is disabling automatic push to stable due to negative karma. The maintainer may push manually if they determine that the issue is not severe.

4 months ago
bojan commented & provided feedback 4 months ago
karma

Works here.

pizzadude commented & provided feedback 4 months ago
karma

Breaks the Mullvad VPN client; requires a workaround.

adamwill commented & provided feedback 4 months ago

@thaller will do that. Sorry I'm a bit slow, I'm technically off work till January. :)

adamwill commented & provided feedback 4 months ago

Oh, for Rawhide, I can't tell if it has this problem because Rawhide is running into a bug in FreeIPA server deployment, which means we never reach client enrolment.

jfoechsler commented & provided feedback 4 months ago
karma

WiFi and OpenVPN work

adamwill commented & provided feedback 4 months ago

@thaller here's all the output from those commands (smooshed into one text file but you should be able to tell what's from what):

Global
       LLMNR setting: resolve             
MulticastDNS setting: no                  
  DNSOverTLS setting: no                  
      DNSSEC setting: no                  
    DNSSEC supported: no                  
Fallback DNS Servers: 1.1.1.1             
                      8.8.8.8             
                      1.0.0.1             
                      8.8.4.4             
                      2606:4700:4700::1111
                      2001:4860:4860::8888
                      2606:4700:4700::1001
                      2001:4860:4860::8844

Link 2 (ens4)
      Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
DefaultRoute setting: yes                      
       LLMNR setting: yes                      
MulticastDNS setting: no                       
  DNSOverTLS setting: no                       
      DNSSEC setting: no                       
    DNSSEC supported: no                       
  Current DNS Server: 172.16.2.100             
         DNS Servers: 172.16.2.100             
DEVICE  TYPE      STATE      CONNECTION         
ens4    ethernet  connected  Wired connection 1 
lo      loopback  unmanaged  --                 
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:12:00:ff brd ff:ff:ff:ff:ff:ff
    altname enp0s4
    inet 172.16.2.103/24 brd 172.16.2.255 scope global noprefixroute ens4
       valid_lft forever preferred_lft forever
    inet6 fe80::3545:4731:61f3:61e9/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
default via 172.16.2.2 dev ens4 proto static metric 100 
172.16.2.0/24 dev ens4 proto kernel scope link src 172.16.2.103 metric 100

adamwill commented & provided feedback 4 months ago

@abbra of course it's possible - in fact I'd say likely - that the actual bug is on the server end. The client may be (probably is) doing the right thing and sending the DNS query to the FreeIPA server, but not getting the expected response. It's harder for me in openQA to get logs out of the server when it's the client that fails, though, because of how openQA behaves (when the client test fails it just kills the server test, without sending it through the post-fail hook stage where we usually upload logs and stuff). I'll try and hack it up, though.

adamwill commented & provided feedback 4 months ago

Logs from the server end of a failed test can be found here now - there's a tarball of /var/log and also named.run, which turned out to be useful in previous debugging so I still have the tests set to upload it.

Yell if there's anything else needed from the server end to figure this out, and I can probably make it happen.

adamwill commented & provided feedback 4 months ago

Those logs were captured right after the clients tried and failed to enrol, BTW.

abbra commented & provided feedback 4 months ago

I checked both client and server logs: the server has no requests from the client at all. Sadly, we have no named logs (and no query logs enabled), so we cannot see whether the client ever tried to connect.

adamwill commented & provided feedback 4 months ago

If you tell me what to do to get the relevant logs, I can do it.

geraldosimiao commented & provided feedback 4 months ago
karma

works here

Test Case DNS-over-SSL
Test Case NM Ethernet
Test Case NM Wireless

adamwill commented & provided feedback 4 months ago

https://openqa.stg.fedoraproject.org/tests/983692/file/role_deploy_domain_controller-varnamed.tar.gz has the contents of /var/named from the server after a test where we ran 'rndc querylog on' after deploying the server (on instructions from @abbra).

adamwill commented & provided feedback 4 months ago

Hum, so it looks like the problem here actually is on the client end. If I hack up the test so the server uses the updated NetworkManager but the client uses the current stable one (1.26.4-1.fc33), it works.

I also checked the client logs from the previous run against the server logs with the named logging enabled. This is where the client fails:

Dec 10 22:25:34 client003.domain.local realmd[905]: Using 'r552.902' operation for method 'Discover' invocation on 'org.freedesktop.realmd.Provider' interface
Dec 10 22:25:34 client003.domain.local realmd[905]: Registered cancellable for operation 'r552.902'
Dec 10 22:25:34 client003.domain.local realmd[905]:  * Resolving: _ldap._tcp.ipa001.domain.local
Dec 10 22:25:34 client003.domain.local realmd[905]:  * Resolving: _ldap._tcp.ipa001.domain.local
Dec 10 22:25:34 client003.domain.local realmd[905]: Resolving ipa001.domain.local failed: Temporarily unable to resolve “_kerberos._udp.ipa001.domain.local”
Dec 10 22:25:34 client003.domain.local realmd[905]: Temporarily unable to resolve “_ldap._tcp.ipa001.domain.local”
Dec 10 22:25:34 client003.domain.local realmd[905]:  * Resolving: ipa001.domain.local
Dec 10 22:25:34 client003.domain.local realmd[905]:  * Resolving: ipa001.domain.local
Dec 10 22:25:34 client003.domain.local realmd[905]: Resolving ipa001.domain.local failed: Temporarily unable to resolve “_kerberos._tcp.ipa001.domain.local”
Dec 10 22:25:34 client003.domain.local realmd[905]: Error resolving “ipa001.domain.local”: Name or service not known
Dec 10 22:25:34 client003.domain.local realmd[905]:  * No results: ipa001.domain.local
Dec 10 22:25:34 client003.domain.local realmd[905]:  * No results: ipa001.domain.local

but there is nothing at all in the server named logs at the corresponding time, they go straight from :25:15 to :26:05 (the difference in hours is just local time vs. UTC):

10-Dec-2020 17:25:15.930 client @0x7f5e14008cc0 172.16.2.102#59589 (dl.fedoraproject.org): endrequest
10-Dec-2020 17:26:05.462 client @0x7f5e14060c10 172.16.2.102#39716: UDP request

so that matches up. It seems like, with the updated NM, this request somehow never makes it off the client box and hits the server; it's failing entirely on the client end somehow.
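The timestamp comparison above can also be checked mechanically. This Python sketch assumes the client journal is in UTC and the server's named log is in UTC-5, which matches the five-hour difference between the two excerpts. It converts both sides to timezone-aware datetimes and confirms two things:

- The client failure falls inside the gap between the two server log lines.
- No server entry lies within ten seconds of the failure.

The helper names and the 10-second window are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import List

# Assumption: client journal in UTC, server named log in UTC-5.
CLIENT_TZ = timezone.utc
SERVER_TZ = timezone(timedelta(hours=-5))

# The client-side failure ("Temporarily unable to resolve ...").
client_failure = datetime(2020, 12, 10, 22, 25, 34, tzinfo=CLIENT_TZ)

# The two server query-log lines that bracket the failure.
server_lines = [
    "10-Dec-2020 17:25:15.930 client @0x7f5e14008cc0 172.16.2.102#59589 "
    "(dl.fedoraproject.org): endrequest",
    "10-Dec-2020 17:26:05.462 client @0x7f5e14060c10 172.16.2.102#39716: "
    "UDP request",
]


def parse_named_time(line: str) -> datetime:
    """Parse the leading 'DD-Mon-YYYY HH:MM:SS.fff' stamp of a named log line."""
    stamp = " ".join(line.split()[:2])
    naive = datetime.strptime(stamp, "%d-%b-%Y %H:%M:%S.%f")
    return naive.replace(tzinfo=SERVER_TZ)


def entries_near(event: datetime, lines: List[str],
                 window: timedelta = timedelta(seconds=10)) -> List[str]:
    """Return server log lines stamped within `window` of `event`."""
    return [line for line in lines
            if abs(parse_named_time(line) - event) <= window]


if __name__ == "__main__":
    before, after = (parse_named_time(line) for line in server_lines)
    print("failure falls in the gap:", before < client_failure < after)  # True
    print("server entries within 10 s:", entries_near(client_failure, server_lines))  # []
```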

renault commented & provided feedback 4 months ago
karma

Works fine, no regressions found

This update can be pushed to stable now if the maintainer wishes

4 months ago
abbra commented & provided feedback 4 months ago
karma

As said by @adamwill, the update is broken for IPA clients, so we should not push it to stable.

BZ#1884352 wg-quick doesn't support systemd-resolved resulting in DNS leaks: NetworkManager should better handle default routing domains with systemd-resolved
adamwill commented & provided feedback 4 months ago

@thaller can you say what additional logs/info, if any, would help debug this on the NM end?

thaller commented & provided feedback 4 months ago

That is still under investigation, sorry. But clearly, this update should be retired...

This update has been unpushed.

adamwill commented & provided feedback 4 months ago

For the record, I think we do have this issue in Rawhide also. I can't tell for the simple "deploy directly on Rawhide" tests, as server deployment fails in that case (so the client tests never reach the point where this bug would happen), but I think we're seeing it on the upgrade tests. In the upgrade tests, we deploy server + client on F32 or F33, then upgrade the server to Rawhide, then upgrade the clients to Rawhide and run the client tests. In that scenario, the server appears to deploy and upgrade successfully, and from the logs it is working OK after the upgrade... but the client tests, after upgrading to Rawhide, cannot resolve ipa001.domain.local (they fail when trying to browse to it in Firefox to access the FreeIPA web UI). That looks a lot like the same bug.

adamwill commented & provided feedback 4 months ago

Aha - so, significantly, this seems to be specific to the domain name I used for this test. I just tweaked the staging instance to use test.openqa.fedoraproject.org instead of domain.local as the domain, and the tests pass with that change. So it's likely that the issue here is specific to using .local.

It still seems like an incorrect behaviour change somewhere, but less of a big deal.

This update has been submitted for testing by thaller.

2 months ago
thaller commented & provided feedback 2 months ago

Adam apparently changed the used domain (from domain.local), which avoids the test failure. While I don't understand what the problem is, I'd like to re-open this update and hopefully get it through.

This update has been pushed to testing.

2 months ago
ersen commented & provided feedback 2 months ago
karma

Works fine for regular home use with Wi-Fi and Ethernet.

Test Case NM Ethernet
Test Case NM Wireless
Test Case NM nmcli

sector commented & provided feedback 2 months ago
karma

The cause of the test failure may be connected to systemd-resolved, which dropped the "bad practice" of resolving ".local" names through regular DNS.

'This means that on networks where the ".local" domain is defined in a site-specific DNS server, explicit search or routing domains need to be configured to make lookups work within this DNS domain. Note that these days, it's generally recommended to avoid defining ".local" in a DNS server, as RFC6762 reserves this domain for exclusive MulticastDNS use.' https://www.freedesktop.org/software/systemd/man/systemd-resolved.service.html

sector commented & provided feedback 2 months ago

Adding "Domains=local" to your systemd-resolved config should do the trick.

fabregas commented & provided feedback 2 months ago
karma

Works for me. No more DNS leaks.

BZ#1884352 wg-quick doesn't support systemd-resolved resulting in DNS leaks: NetworkManager should better handle default routing domains with systemd-resolved
lhirlimann commented & provided feedback 2 months ago
karma

.

lhirlimann commented & provided feedback 2 months ago
karma

works

This update has been submitted for stable by thaller.

2 months ago

This update has been pushed to stable.

2 months ago


Metadata
Type: unspecified
Karma: 5
Signed
Content Type: RPM
Test Gating

Settings
Unstable by Karma: -3
Stable by Karma: disabled
Stable by Time: disabled

Dates
submitted: 4 months ago
in testing: 2 months ago
in stable: 2 months ago

BZ#1884352 wg-quick doesn't support systemd-resolved resulting in DNS leaks: NetworkManager should better handle default routing domains with systemd-resolved
-1
1

Automated Test Results

Test Cases

0 0 Test Case QA/TestCases/NM Mobile Broadband
0 0 Test Case QA/TestCases/NM VPN OpenVPN
0 0 Test Case QA/TestCases/NM VPN vpnc
0 1 Test Case QA/TestCases/NM WEP
0 2 Test Case QA/TestCases/NM WPA
0 2 Test Case QA/TestCases/NM Wifi
0 1 Test Case DNS-over-SSL
0 0 Test Case DNSSEC-trigger
0 0 Test Case NM Bonding
0 2 Test Case NM Ethernet
0 0 Test Case NM Gnome Hotspot
0 0 Test Case NM KDE Hotspot
0 2 Test Case NM Wireless
0 1 Test Case NM nmcli
0 0 Test Case NetworkManager assume
0 0 Test Case NetworkManager bt pan
0 0 Test Case NetworkManager celldata
0 0 Test Case NetworkManager ipv6