This update has been submitted for testing by fweimer.
This update's test gating status has been changed to 'waiting'.
This update's test gating status has been changed to 'failed'.
The openQA failure here seems to be reproducible - it has failed the same way four times (twice each on prod and stg). The test that's failing is deployment of a FreeIPA replica (deployment of the first server instance works fine). The logs show:
Looks like we didn't get a coredump because of resource limits, from the system journal :(
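For anyone hitting the same "no coredump" situation: a minimal sketch for checking the core dump limit from inside a process, and raising the soft limit to the hard limit. It assumes the limit in question is RLIMIT_CORE; the thread does not say which limit openQA actually applies, and the function names are made up for illustration.

```python
import resource


def core_dump_limits():
    """Return the (soft, hard) RLIMIT_CORE values of the current process."""
    return resource.getrlimit(resource.RLIMIT_CORE)


def core_dumps_disabled(soft_limit):
    """A soft limit of 0 disables core dumps for the process."""
    return soft_limit == 0


def raise_core_limit_to_hard():
    """Raise the soft limit to the hard limit (allowed without privileges)."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    soft, hard = core_dump_limits()
    print("core dumps disabled:", core_dumps_disabled(soft))
    print("limits after raising:", raise_core_limit_to_hard())
```

If the hard limit itself is 0, this cannot help; the limit then has to be lifted by whatever starts the process (service manager, test harness, shell).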
@fweimer
This update has been pushed to testing.
Bodhi is disabling automatic push to stable due to negative karma. The maintainer may push manually if they determine that the issue is not severe.
@adamwill How can we reproduce this in an environment where we can run the failing process under a debugger, or with certain environment variables configured? Thanks.
Judging by the errors, it is the code in https://github.com/krb5/krb5/blob/krb5-1.21.1-final/src/lib/krb5/keytab/kt_file.c#L523-L552 which is a file-based keytab backend. The locking mutex is local to this code.
ipa-getkeytab is a single-process program that uses the krb5_kt_add_entry() function to store an entry into a keytab.
Works.
It works for me as well! Thank you!
Can confirm it resolves the issue with cisco secure client/anyconnect.
Crashes confirmed with cockpit tests: they now segfault ipa-getkeytab and iscsid, at least.
The journal for the ipa-getkeytab crash shows the crash in krb5int_key_delete(), and the test output confirms the assertion:
The journal for the iscsi crash doesn't even get that far; it crashes right at the beginning:
Works without issues till now
This backtrace is more interesting:
It has _dl_fini in it, so it's very likely it's caused by the changes in this update.
@adamwill @martinpitt How can I create a VM (or set of VMs) that reproduces this issue? Thanks.
@fweimer this seems like unloading the GSSAPI mechglue plugin after ipa-getkeytab successfully completed.
Judging by cockpit logs:
this is, again, a very basic 'ipa-getkeytab' operation that attempts to store/delete a key in the keytab: a single-process, single-threaded operation, nothing fancy. I think this is the code corresponding to krb5int_key_delete (there are macro definitions that map it from the k5_key_delete name): https://github.com/krb5/krb5/blob/master/src/util/support/threads.c#L362-L397
As this needs a client machine which is talking to a configured FreeIPA, and thus at least two VMs which talk to each other, this is unfortunately quite involved. If you have a FreeIPA setup, you could just run the ipa-getkeytab command. From scratch, here is how you can reproduce the cockpit test. Note that this is safe: the tests don't run as root, don't create/change permanent files on the host (only temp dirs/files), and don't change the qemu/libvirt config (all transient VMs with socket networking).
You can do this in our development toolbox, so that you don't have to install all the nodejs, libvirt and QEMU packages (if you already have them installed, you can skip this):
Then check out cockpit and build a test image:
This is without updates-testing still. Now run a FreeIPA test:
This ought to succeed. If not, and it's not obvious why (like, missing libvirt packages or so), please ping me here or on Slack, I'm happy to assist.
Now install the glibc update into the VM:
Now run the test again, but this time with the -s option, which will make it "sit" on a test failure without cleaning up the VMs:
This should fail with this assertion error, and give you some information on how to log in via SSH:
Move that terminal to the side -- as soon as you press enter, it'll continue, i.e. clean up all the test VMs.
Inside the test VM, you can now reproduce the crash:
This is a Fedora 38 cloud image with some extra packages installed, so you can install debug symbols, run gdb, etc.
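For the "just run the ipa-getkeytab command" route on an existing FreeIPA client, a rough sketch of scripting the call, optionally under gdb or with extra environment variables. The server, principal and keytab values are placeholders, not the ones the cockpit test uses, and it assumes you already hold a valid Kerberos ticket; only the -s, -p and -k options of ipa-getkeytab and gdb's --args are relied on.

```python
import os
import subprocess


def build_getkeytab_command(server, principal, keytab, under_gdb=False):
    """Build the argv for ipa-getkeytab, optionally wrapped in gdb."""
    cmd = ["ipa-getkeytab", "-s", server, "-p", principal, "-k", keytab]
    if under_gdb:
        # gdb --args passes everything after it to the debugged program
        cmd = ["gdb", "--args"] + cmd
    return cmd


def run_getkeytab(cmd, extra_env=None):
    """Run the command with the current environment plus extra variables."""
    env = dict(os.environ)
    env.update(extra_env or {})
    return subprocess.run(cmd, env=env, check=False).returncode


if __name__ == "__main__":
    # Placeholder values; substitute your own realm's names.
    argv = build_getkeytab_command(
        "ipa.example.test", "host/client.example.test", "/tmp/test.keytab"
    )
    print("exit status:", run_getkeytab(argv))
```

A negative exit status from subprocess.run means the process died from a signal, which is how the assertion failure (SIGABRT) or a segfault would show up here.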
Works great on my Fedora 38 workstation, I was down until this was applied.
Thank you. I got to this point and could reproduce the assert, but the VM with ipa-getkeytab does not have a default route. Any idea how to fix that? DHCP assigns 172.27.0.2 for the eth0 interface, but no default route.
Sorry @fweimer, indeed these VMs are offline by default, to make sure none of our tests depends on something outside. Please apply this local hack:
But I think I see what's wrong with the current ELF destructor ordering approach. I'll experiment with something else.
With the new approach:
I'll do another build, so that the AnyConnect users can test it as well.
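To make the destructor ordering problem a bit more concrete, here is a toy model. It is not glibc's algorithm and the thread does not describe the actual change; the object names are stand-ins. It only shows the general rule that an object's destructor should run before the destructors of the objects it depends on, which holds when destructors run in the exact reverse of constructor order.

```python
def constructor_order(deps):
    """Dependencies are constructed first (post-order depth-first walk)."""
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in deps.get(name, ()):
            visit(dep)
        order.append(name)

    for name in deps:
        visit(name)
    return order


def run_destructors(order, deps):
    """Return objects whose destructor ran after a dependency was finalized."""
    finalized, broken = set(), []
    for name in order:
        if any(dep in finalized for dep in deps.get(name, ())):
            broken.append(name)  # would touch already torn-down state
        finalized.add(name)
    return broken


# Hypothetical dependency graph, loosely modelled on the crash discussed here.
DEPS = {
    "ipa-getkeytab": ["libkrb5", "gss-plugin"],
    "gss-plugin": ["libkrb5support"],
    "libkrb5": ["libkrb5support"],
    "libkrb5support": [],
}

if __name__ == "__main__":
    good = list(reversed(constructor_order(DEPS)))
    bad = ["libkrb5support", "gss-plugin", "libkrb5", "ipa-getkeytab"]
    print("reverse constructor order:", good, run_destructors(good, DEPS))
    print("support library first:", bad, run_destructors(bad, DEPS))
```

In the "bad" order the plugin's destructor runs after the support library has already been finalized, which is the kind of situation where a call like k5_key_delete could hit an assertion.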
fweimer edited this update.
New build(s):
Removed build(s):
Karma has been reset.
This update has been submitted for testing by fweimer.
This update's test gating status has been changed to 'waiting'.
I'd love to give you some useful feedback on the new build, but something else, seemingly unrelated to this, has broken Cisco's VPN agent in the past week. I can confirm my system is otherwise functional though.
This update's test gating status has been changed to 'passed'.
This update's test gating status has been changed to 'failed'.
This update has been pushed to testing.
This update's test gating status has been changed to 'waiting'.
Many thanks Florian for puzzling this out! Our nightly run still failed, but dnf only "saw" 2.37-6.fc38 on the mirrors still. I'll let you know tomorrow morning. But I'm sure it'll be good, as you tested it on the very thing.
This update's test gating status has been changed to 'passed'.
Works great! LGTM! =)
This update can be pushed to stable now if the maintainer wishes
OpenQA tests for FreeIPA succeeded. The upgrade test showed known SELinux AVCs which are being taken care of already.
Should this work on FC37? I find it doesn't install/change anything on my system:
No security updates needed, but 35 updates available
Dependencies resolved.
Nothing to do.
Complete!
and the problem with the VPN remains
Applied this to another fc38 system and can confirm the vpn issue remains.
I'm still facing "Termination reason code 59: Connection attempt failed due to certificate problems" with Cisco AnyConnect after upgrading glibc to glibc-2.37-7.fc38.x86_64
no regressions noted
Cockpit's tests are happy again with glibc 2.37-7.fc38, thank you!
Thank you for the above-mentioned link
"sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-7f0a294b1a"
which worked FINE on Sep 19 for glibc 2.37-5.fc38 --> glibc 2.37-6.fc38
systemd-resolve --status #looked like:
Global
Protocols: LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: uplink
Current DNS Server: [ IP local-dns1]
DNS Servers: [ IP local-dns2] [ IP local-dns3] [ IP local-dns1] #IP local-dns2+3 are given by vpn-connection and Certificate
DNS Domain: ~.
After update (and to glibc 2.37-7.fc38) it DOESN'T work again!
and like jostra I'm facing: "in 2 parts overlayed as before:
The certificate on the secure gateway is invalid. A VPN connection will not be established.
Anyconnect was not able to establish a connection to the specified secure gateway. Please try connecting again.
--> Cisco AC SMC Ver 4.10.07073"
systemd-resolve --status #looks as before
Global
Protocols: LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: uplink
Link 2 (enp0s31f6)
Current Scopes: none
Protocols: -DefaultRoute LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported
Link 3 (wlp0s20f3)
Current Scopes: DNS LLMNR/IPv4
Protocols: +DefaultRoute LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: [IP local-dns1]
DNS Servers: [ IP local-dns1]
Link 4 (virbr0)
Current Scopes: none
Protocols: -DefaultRoute LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported
and vpnagentd.service gives errors like:
acvpnagent[108857]: Function: determinePublicAddrCandidateFromDefRoute File: ../../vpn/AgentUtilities/HostConfigMgr.cpp Line: 3057 Invoked Function: CHos>
acvpnagent[108857]: Function: updatePotentialPublicAddresses File: ../../vpn/AgentUtilities/HostConfigMgr.cpp Line: 3190 Invoked Function: CHostConfigMgr>
acvpnagent[108857]: Function: GetSettings File: ../../vpn/Agent/ServicePluginMgr.cpp Line: 289 m_pIServicePlugin is NULL
acvpnagent[108857]: Function: GetSettings File: ../../vpn/Agent/ServicePluginMgr.cpp Line: 289 m_pIServicePlugin is NULL
acvpnagent[108857]: Function: GetSettings File: ../../vpn/Agent/ServicePluginMgr.cpp Line: 289 m_pIServicePlugin is NULL
acvpnagent[108857]: Function: GetSettings File: ../../vpn/Agent/ServicePluginMgr.cpp Line: 289 m_pIServicePlugin is NULL
acvpnagent[108857]: Function: GetSettings File: ../../vpn/Agent/ServicePluginMgr.cpp Line: 289 m_pIServicePlugin is NULL
acvpnagent[108857]: Function: OnIpcMessageReceived File: ../../vpn/Common/IPC/IPCDepot.cpp Line: 1240 Invoked Function: CIpcTransport::OnSocketReadComplete Return Code: -33292279 (0xFE040009) Description: IPCTRANSPORT_ERROR_UNEXPECTED remote peer: gui
I kindly ask you to provide a final solution!
Thank you in advance!
thanks for working out the crash, @fweimer , sorry I didn't respond - I'm bad at checking bodhi email. in future feel free to ping me on matrix. I can fiddle with openqa to disable the limits and get a backtrace in this kinda situation if you need it.
This update has been obsoleted by glibc-2.37-10.fc38.