obsolete

glibc-2.37-7.fc38

FEDORA-2023-7f0a294b1a created by fweimer 8 months ago for Fedora 38

This update contains changes to ELF destructor ordering, improving compatibility with a certain VPN software product.
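
For readers unfamiliar with the term, the bug attached to this update (BZ#2239304) talks about running ELF destructors in "reverse constructor order". The following is a toy model of that phrase only, not glibc's implementation: the object names and the `deps` graph are hypothetical, and the depth-first walk simply assumes that an object's dependencies are initialized before the object itself.

```python
# Toy model (not glibc code): constructors run dependencies-first,
# destructors run in the exact reverse of that order.
# All object names below are hypothetical.

def constructor_order(deps, root):
    """Return objects in the order their constructors would run (post-order DFS)."""
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in deps.get(name, []):
            visit(dep)
        order.append(name)  # every dependency of `name` is already in `order`

    visit(root)
    return order


def destructor_order(ctor_order):
    """Reverse constructor order: the last object initialized is torn down first."""
    return list(reversed(ctor_order))


deps = {
    "app": ["libplugin", "libsupport"],
    "libplugin": ["libsupport"],
}
ctors = constructor_order(deps, "app")
dtors = destructor_order(ctors)
print("constructors:", ctors)
print("destructors: ", dtors)
```

In this model a destructor in `libplugin` can still safely call into `libsupport`, because `libsupport` is torn down last.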

This update has been submitted for testing by fweimer.

8 months ago

This update's test gating status has been changed to 'waiting'.

8 months ago

This update's test gating status has been changed to 'failed'.

8 months ago
adamwill commented & provided feedback 8 months ago

The openQA failure here seems to be reproducible - it has failed the same way four times (twice each on prod and stg). The test that's failing is deployment of a FreeIPA replica (deployment of the first server instance works fine). The logs show:

2023-09-19T16:11:40Z DEBUG args=['/usr/sbin/ipa-getkeytab', '-k', '/etc/dirsrv/ds.keytab', '-p', 'ldap/ipa003.test.openqa.fedoraproject.org@TEST.OPENQA.FEDORAPROJECT.ORG', '-H', 'ldaps://ipa002.test.openqa.fedoraproject.org']
2023-09-19T16:11:41Z DEBUG Process finished, return code=-6
2023-09-19T16:11:41Z DEBUG stdout=
2023-09-19T16:11:41Z DEBUG stderr=Keytab successfully retrieved and stored in: /etc/dirsrv/ds.keytab
k5_mutex_lock: Received error 22 (Invalid argument)
ipa-getkeytab: ../../include/k5-thread.h:376: k5_mutex_lock: Assertion `r == 0' failed.
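
The two numbers in this log can be decoded with the Python standard library. This is a small sketch, assuming a Linux host and assuming the logged `return code` follows the `subprocess` convention in which a negative value means the child was terminated by that signal number; the helper name `describe_returncode` is made up for illustration.

```python
import errno
import os
import signal


def describe_returncode(code):
    """Return the terminating signal for a negative return code, else None."""
    if code < 0:
        return signal.Signals(-code)
    return None


sig = describe_returncode(-6)
print("terminated by:", sig.name if sig else "no signal")

# Error 22 as reported by k5_mutex_lock in the log above.
print("errno 22 is:", errno.errorcode[22], "/", os.strerror(22))
```

Signal 6 is `SIGABRT`, which is what a failed `assert` raises via `abort()`, and 22 is `EINVAL`, the "Invalid argument" text shown in the log.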
adamwill commented & provided feedback 8 months ago

Judging from the system journal, it looks like we didn't get a coredump because of resource limits :(

adamwill commented & provided feedback 8 months ago

This update has been pushed to testing.

8 months ago

Bodhi is disabling automatic push to stable due to negative karma. The maintainer may push manually if they determine that the issue is not severe.

8 months ago
fweimer commented & provided feedback 8 months ago

@adamwill How can we reproduce this in an environment where we can run the failing process under a debugger, or with certain environment variables configured? Thanks.

abbra commented & provided feedback 8 months ago

Judging by the errors, it is the code in https://github.com/krb5/krb5/blob/krb5-1.21.1-final/src/lib/krb5/keytab/kt_file.c#L523-L552, which is a file-based keytab backend. The locking mutex is local to this code. ipa-getkeytab is a single-process program that uses the krb5_kt_add_entry() function to store an entry into a keytab.

bojan commented & provided feedback 8 months ago

Works.

reanimator commented & provided feedback 8 months ago

It works for me as well! Thank you!

BZ#2239304 glibc: Revert change to run ELF destructor in reverse constructor order
snehring commented & provided feedback 8 months ago

Can confirm it resolves the issue with Cisco Secure Client/AnyConnect.

BZ#2239304 glibc: Revert change to run ELF destructor in reverse constructor order
martinpitt commented & provided feedback 8 months ago

Crashes confirmed with cockpit tests: ipa-getkeytab and iscsid, at least, now abort.

The journal for ipa-getkeytab crash shows the crash in krb5int_key_delete(), and the test output confirms the assertion:

ipa-getkeytab: ../../include/k5-thread.h:376: k5_mutex_lock: Assertion `r == 0' failed.

The journal for the iscsid crash doesn't even get that far; it crashes right at the beginning:

systemd-coredump[34279]: Process 34270 (iscsid) of user 0 dumped core.    

Module libpcre2-8.so.0 from rpm pcre2-10.42-1.fc38.1.x86_64    
Module liblz4.so.1 from rpm lz4-1.9.4-2.fc38.x86_64    
Module libcap.so.2 from rpm libcap-2.48-6.fc38.x86_64    
Module liblzma.so.5 from rpm xz-5.4.1-1.fc38.x86_64    
Module libzstd.so.1 from rpm zstd-1.5.5-1.fc38.x86_64    
Module libselinux.so.1 from rpm libselinux-3.5-1.fc38.x86_64    
Module libblkid.so.1 from rpm util-linux-2.38.1-4.fc38.x86_64    
Module libz.so.1 from rpm zlib-1.2.13-3.fc38.x86_64    
Module libopeniscsiusr.so.0.2.0 from rpm iscsi-initiator-utils-6.2.1.4-10.git2a8f9d8.fc38.x86_64    
Module libsystemd.so.0 from rpm systemd-253.10-1.fc38.x86_64    
Module libkmod.so.2 from rpm kmod-30-4.fc38.x86_64    
Module libmount.so.1 from rpm util-linux-2.38.1-4.fc38.x86_64    
Module libcrypto.so.3 from rpm openssl-3.0.9-2.fc38.x86_64    
Module libisns.so.0 from rpm isns-utils-0.101-6.fc38.x86_64                             
Module iscsid from rpm iscsi-initiator-utils-6.2.1.4-10.git2a8f9d8.fc38.x86_64                             
Stack trace of thread 34270:                             
#0  0x00007fc7ef527324 __poll (libc.so.6 + 0x105324)                             
#1  0x0000558cec9fcbc5 main (iscsid + 0x8bc5)                             
#2  0x00007fc7ef449b8a __libc_start_call_main (libc.so.6 + 0x27b8a)                             
#3  0x00007fc7ef449c4b __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x27c4b)                             
#4  0x0000558cec9fd805 _start (iscsid + 0x9805)                             
ELF object binary architecture: AMD x86-64                             
systemd[1]: iscsid.service: Main process exited, code=dumped, status=6/ABRT                    
abhis3k commented & provided feedback 8 months ago

Works without issues till now

fweimer commented & provided feedback 8 months ago

This backtrace is more interesting:

Stack trace of thread 1959:
#0  0x00007f81c8ab0884 __pthread_kill_implementation (libc.so.6 + 0x8e884)
#1  0x00007f81c8a5fafe raise (libc.so.6 + 0x3dafe)
#2  0x00007f81c8a4887f abort (libc.so.6 + 0x2687f)
#3  0x00007f81c8a4879b __assert_fail_base.cold (libc.so.6 + 0x2679b)
#4  0x00007f81c8a58187 __assert_fail (libc.so.6 + 0x36187)
#5  0x00007f81c9030323 krb5int_key_delete (libkrb5support.so.0 + 0x6323)
#6  0x00007f81c86f0e8b gssint_mechglue_fini (libgssapi_krb5.so.2 + 0xee8b)
#7  0x00007f81c91f50f2 _dl_call_fini (ld-linux-x86-64.so.2 + 0x10f2)
#8  0x00007f81c91f8e5e _dl_fini (ld-linux-x86-64.so.2 + 0x4e5e)
#9  0x00007f81c8a621e6 __run_exit_handlers (libc.so.6 + 0x401e6)
#10 0x00007f81c8a6232e exit (libc.so.6 + 0x4032e)
#11 0x00005622a14583bc main (ipa-getkeytab + 0x63bc)
#12 0x00007f81c8a49b8a __libc_start_call_main (libc.so.6 + 0x27b8a)
#13 0x00007f81c8a49c4b __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x27c4b)
#14 0x00005622a1459bb5 _start (ipa-getkeytab + 0x7bb5)

It has _dl_fini in it, so it's very likely it's caused by the changes in this update.
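
As a rough aid for triaging similar reports, the check for `_dl_fini` can be automated. The sketch below assumes frames in the `systemd-coredump` layout shown above (`#N 0xADDR symbol (module + 0xOFFSET)`); the function names `parse_frames` and `crashed_in_elf_destructor` are hypothetical.

```python
import re

# One frame per line: "#7  0x00007f81c91f50f2 _dl_call_fini (ld-linux-x86-64.so.2 + 0x10f2)"
FRAME_RE = re.compile(
    r"^#(?P<index>\d+)\s+0x(?P<addr>[0-9a-f]+)\s+(?P<symbol>\S+)\s+"
    r"\((?P<module>\S+) \+ 0x(?P<offset>[0-9a-f]+)\)$"
)


def parse_frames(text):
    """Return (index, symbol, module) tuples for every recognizable frame line."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((int(match["index"]), match["symbol"], match["module"]))
    return frames


def crashed_in_elf_destructor(frames):
    """True if the dynamic loader's _dl_fini appears anywhere in the stack."""
    return any(symbol == "_dl_fini" for _, symbol, _ in frames)


sample = """\
#5  0x00007f81c9030323 krb5int_key_delete (libkrb5support.so.0 + 0x6323)
#6  0x00007f81c86f0e8b gssint_mechglue_fini (libgssapi_krb5.so.2 + 0xee8b)
#7  0x00007f81c91f50f2 _dl_call_fini (ld-linux-x86-64.so.2 + 0x10f2)
#8  0x00007f81c91f8e5e _dl_fini (ld-linux-x86-64.so.2 + 0x4e5e)
"""
frames = parse_frames(sample)
print(crashed_in_elf_destructor(frames))
```

A positive result only says the crash happened while destructors were running at exit; it does not by itself identify which object was torn down too early.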

@adamwill @martinpitt How can I create a VM (or set of VMs) that reproduces this issue? Thanks.

abbra commented & provided feedback 8 months ago

@fweimer this seems to happen while unloading the GSSAPI mechglue plugin, after ipa-getkeytab has successfully completed.

Judging by cockpit logs:

/usr/libexec/cockpit-certificate-helper: line 86:  2147 Aborted                 (core dumped) ipa-getkeytab -p "HTTP/${HOST}" -k "${KEYTAB}"

this is, again, a very basic 'ipa-getkeytab' operation that attempts to store/delete a key in the keytab: a single-process, single-thread operation, nothing fancy. I think this is the code corresponding to krb5int_key_delete (macro definitions map it from the k5_key_delete name): https://github.com/krb5/krb5/blob/master/src/util/support/threads.c#L362-L397

martinpitt commented & provided feedback 8 months ago

As this needs a client machine which is talking to a configured FreeIPA, and thus at least two VMs which talk to each other, this is unfortunately quite involved. If you have a FreeIPA setup, you could just run the ipa-getkeytab command. From scratch, here is how you can reproduce the cockpit test. Note that this is safe: the tests don't run as root, don't create or change permanent files on the host (only temp dirs/files), and don't change the qemu/libvirt config (all VMs are transient, with socket networking).

You can do this in our development toolbox, so that you don't have to install all the nodejs, libvirt and QEMU packages (if you already have them installed, you can skip this):

toolbox create --image quay.io/cockpit/tasks -c cockpit
toolbox enter cockpit

Then check out cockpit and build a test image:

git clone https://github.com/cockpit-project/cockpit
cd cockpit
test/image-prepare -q fedora-38

This is without updates-testing still. Now run a FreeIPA test:

test/verify/check-system-realms TestKerberos.testNegotiate

This ought to succeed. If not, and it's not obvious why (like, missing libvirt packages or so), please ping me here or on Slack, I'm happy to assist.

Now install the glibc update into the VM:

bots/image-customize -v -r 'dnf upgrade --enablerepo=updates-testing -y --refresh --advisory=FEDORA-2023-7f0a294b1a >&2' fedora-38

Now run the test again, but this time with the -s option, which will make it "sit" on a test failure without cleaning up the VMs:

test/verify/check-system-realms TestKerberos.testNegotiate -s

This should fail with this assertion error, and give you some information on how to log in via SSH:

ssh -p 2201 -i bots/machine/identity root@127.0.0.2

Move that terminal to the side -- as soon as you press enter, it'll continue, i.e. clean up all the test VMs.

Inside the test VM, you can now reproduce the crash:

# ipa-getkeytab -p HTTP/x0.cockpit.lan -k /etc/cockpit/krb5.keytab
Keytab successfully retrieved and stored in: /etc/cockpit/krb5.keytab
k5_mutex_lock: Received error 22 (Invalid argument)
ipa-getkeytab: ../../include/k5-thread.h:376: k5_mutex_lock: Assertion `r == 0' failed.
Aborted (core dumped)
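
When scripting this reproduction, the abort can be recognized from the captured output rather than by eye. The sketch below assumes the message layout seen in the transcript above (`program: file:line: function: Assertion ... failed.`); `find_assertion` is a hypothetical helper, not part of cockpit or FreeIPA.

```python
import re

# Matches e.g.:
# ipa-getkeytab: ../../include/k5-thread.h:376: k5_mutex_lock: Assertion `r == 0' failed.
ASSERT_RE = re.compile(
    r"^(?P<program>[^:]+): (?P<file>.+):(?P<line>\d+): "
    r"(?P<function>[^:]+): Assertion `(?P<expr>.+)' failed\.$"
)


def find_assertion(output):
    """Return details of the first assertion-failure line in `output`, or None."""
    for line in output.splitlines():
        match = ASSERT_RE.match(line.strip())
        if match:
            info = match.groupdict()
            info["line"] = int(info["line"])
            return info
    return None


captured = """\
Keytab successfully retrieved and stored in: /etc/cockpit/krb5.keytab
k5_mutex_lock: Received error 22 (Invalid argument)
ipa-getkeytab: ../../include/k5-thread.h:376: k5_mutex_lock: Assertion `r == 0' failed.
Aborted (core dumped)
"""
result = find_assertion(captured)
print(result)
```

Note that the keytab is reported as stored before the assertion fires, so a script should check for the assertion separately from the success message.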

This is a Fedora 38 cloud image with some extra packages installed, so you can install debug symbols, run gdb, etc.

jnyuhas commented & provided feedback 8 months ago

Works great on my Fedora 38 workstation, I was down until this was applied.

BZ#2239304 glibc: Revert change to run ELF destructor in reverse constructor order
fweimer commented & provided feedback 8 months ago

> This is a Fedora 38 cloud image with some extra packages installed, so you can install debug symbols, run gdb, etc.

Thank you. I got to this point and could reproduce the assert, but the VM with ipa-getkeytab does not have a default route. Any idea how to fix that? DHCP assigns 172.27.0.2 for the eth0 interface, but no default route.
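
One way to confirm the missing default route from inside the VM is to inspect the kernel routing table. The sketch below assumes the Linux `/proc/net/route` layout, where the default route has an all-zero `Destination` and `Mask`; the sample rows, including the gateway address, are invented for illustration and are not taken from the test VM.

```python
def has_default_route(route_table_text):
    """Return True if a /proc/net/route style table contains a default route."""
    lines = route_table_text.strip().splitlines()
    if not lines:
        return False
    header = lines[0].split()
    dest_i = header.index("Destination")
    mask_i = header.index("Mask")
    for line in lines[1:]:
        fields = line.split()
        if len(fields) <= max(dest_i, mask_i):
            continue  # skip malformed rows
        if fields[dest_i] == "00000000" and fields[mask_i] == "00000000":
            return True
    return False


HEADER = "Iface Destination Gateway Flags RefCnt Use Metric Mask MTU Window IRTT\n"

# Hypothetical tables: an on-link network route only, then the same plus a default route.
without_default = HEADER + "eth0 00001BAC 00000000 0001 0 0 100 0000FFFF 0 0 0\n"
with_default = without_default + "eth0 00000000 01001BAC 0003 0 0 100 00000000 0 0 0\n"

print(has_default_route(without_default))
print(has_default_route(with_default))
```

On a live system the same function could be fed the contents of `/proc/net/route`, for example `has_default_route(open("/proc/net/route").read())`.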

martinpitt commented & provided feedback 8 months ago

Sorry @fweimer, indeed these VMs are offline by default, to make sure none of our tests depends on something outside. Please apply this local hack:

--- test/common/testlib.py
+++ test/common/testlib.py
@@ -1290,7 +1290,7 @@ class MachineCase(unittest.TestCase):
                 if cleanup:
                     self.addCleanup(network.kill)
                 self.network = network
-            networking = self.network.host(restrict=restrict, forward=forward or {})
+            networking = self.network.host(restrict=False, forward=forward or {})
             machine = machine_class(verbose=opts.trace, networking=networking, image=image, **kwargs)
             if opts.fetch and not os.path.exists(machine.image_file):
                 machine.pull(machine.image_file)
fweimer commented & provided feedback 8 months ago

But I think I see what's wrong with the current ELF destructor ordering approach. I'll experiment with something else.

fweimer commented & provided feedback 8 months ago

With the new approach:

# ipa-getkeytab -p HTTP/x0.cockpit.lan -k /etc/cockpit/krb5.keytab 
Keytab successfully retrieved and stored in: /etc/cockpit/krb5.keytab

I'll do another build, so that the AnyConnect users can test it as well.

fweimer edited this update.

New build(s):

  • glibc-2.37-7.fc38

Removed build(s):

  • glibc-2.37-6.fc38

Karma has been reset.

8 months ago

This update has been submitted for testing by fweimer.

8 months ago

This update's test gating status has been changed to 'waiting'.

8 months ago
snehring commented & provided feedback 8 months ago

I'd love to give you some useful feedback on the new build, but something else, seemingly unrelated to this, has broken Cisco's VPN agent in the past week. I can confirm my system is otherwise functional though.

This update's test gating status has been changed to 'passed'.

8 months ago

This update's test gating status has been changed to 'failed'.

8 months ago

This update has been pushed to testing.

8 months ago

This update's test gating status has been changed to 'waiting'.

8 months ago
martinpitt commented & provided feedback 8 months ago

Many thanks Florian for puzzling this out! Our nightly run still failed, but dnf only "saw" 2.37-6.fc38 on the mirrors still. I'll let you know tomorrow morning. But I'm sure it'll be good, as you tested it on the very thing.

This update's test gating status has been changed to 'passed'.

8 months ago
besser82 commented & provided feedback 8 months ago

Works great! LGTM! =)

This update can be pushed to stable now if the maintainer wishes

8 months ago
abbra commented & provided feedback 8 months ago

OpenQA tests for FreeIPA succeeded. The upgrade test showed known SELinux AVCs which are being taken care of already.

BZ#2239304 glibc: Revert change to run ELF destructor in reverse constructor order
chernoff commented & provided feedback 8 months ago

Should this work on FC37? I find it doesn't install/change anything on my system:

No security updates needed, but 35 updates available
Dependencies resolved.
Nothing to do.
Complete!

and the problem with the VPN remains

snehring commented & provided feedback 8 months ago

Applied this to another fc38 system and can confirm the vpn issue remains.

BZ#2239304 glibc: Revert change to run ELF destructor in reverse constructor order
jostra commented & provided feedback 8 months ago

I'm still facing "Termination reason code 59: Connection attempt failed due to certificate problems" with Cisco AnyConnect after upgrading glibc to glibc-2.37-7.fc38.x86_64.

filiperosset commented & provided feedback 8 months ago

no regressions noted

martinpitt commented & provided feedback 8 months ago

Cockpit's tests are happy again with glibc 2.37-7.fc38, thank you!

mpfeiler commented & provided feedback 8 months ago

Thank you for the above-mentioned command "sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-7f0a294b1a", which worked fine on Sep 19 for glibc 2.37-5.fc38 --> glibc 2.37-6.fc38.

systemd-resolve --status #looked like:

Global
Protocols: LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: uplink
Current DNS Server: [ IP local-dns1]
DNS Servers: [ IP local-dns2] [ IP local-dns3] [ IP local-dns1]
#IP local-dns2+3 are given by vpn-connection and Certificate
DNS Domain: ~.

After the update to glibc 2.37-7.fc38 it DOESN'T work again! Like jostra, I'm facing the following error (in 2 overlaid parts, as before): "The certificate on the secure gateway is invalid. A VPN connection will not be established. AnyConnect was not able to establish a connection to the specified secure gateway. Please try connecting again." --> Cisco AC SMC Ver 4.10.07073

systemd-resolve --status #looks as before

Global
Protocols: LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: uplink

Link 2 (enp0s31f6)
Current Scopes: none
Protocols: -DefaultRoute LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 3 (wlp0s20f3)
Current Scopes: DNS LLMNR/IPv4
Protocols: +DefaultRoute LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: [IP local-dns1]
DNS Servers: [ IP local-dns1]

Link 4 (virbr0)
Current Scopes: none
Protocols: -DefaultRoute LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported

and

vpnagentd.service gives errors like:

acvpnagent[108857]: Function: determinePublicAddrCandidateFromDefRoute File: ../../vpn/AgentUtilities/HostConfigMgr.cpp Line: 3057 Invoked Function: CHos>
acvpnagent[108857]: Function: updatePotentialPublicAddresses File: ../../vpn/AgentUtilities/HostConfigMgr.cpp Line: 3190 Invoked Function: CHostConfigMgr>
acvpnagent[108857]: Function: GetSettings File: ../../vpn/Agent/ServicePluginMgr.cpp Line: 289 m_pIServicePlugin is NULL
acvpnagent[108857]: Function: GetSettings File: ../../vpn/Agent/ServicePluginMgr.cpp Line: 289 m_pIServicePlugin is NULL
acvpnagent[108857]: Function: GetSettings File: ../../vpn/Agent/ServicePluginMgr.cpp Line: 289 m_pIServicePlugin is NULL
acvpnagent[108857]: Function: GetSettings File: ../../vpn/Agent/ServicePluginMgr.cpp Line: 289 m_pIServicePlugin is NULL
acvpnagent[108857]: Function: GetSettings File: ../../vpn/Agent/ServicePluginMgr.cpp Line: 289 m_pIServicePlugin is NULL
acvpnagent[108857]: Function: OnIpcMessageReceived File: ../../vpn/Common/IPC/IPCDepot.cpp Line: 1240 Invoked Function: CIpcTransport::OnSocketReadComplete Return Code: -33292279 (0xFE040009) Description: IPCTRANSPORT_ERROR_UNEXPECTED remote peer: gui

I kindly ask you to provide a final solution!

Thank you in advance!

adamwill commented & provided feedback 8 months ago

Thanks for working out the crash, @fweimer. Sorry I didn't respond; I'm bad at checking Bodhi email. In future, feel free to ping me on Matrix. I can fiddle with openQA to disable the limits and get a backtrace in this kind of situation if you need it.

This update has been obsoleted by glibc-2.37-10.fc38.

7 months ago

Metadata
Type: unspecified
Karma: 5
Signed
Content Type: RPM
Test Gating

Settings
Unstable by Karma: -3
Stable by Karma: disabled
Stable by Time: disabled

Dates
submitted: 8 months ago
in testing: 8 months ago
modified: 8 months ago
approved: 8 months ago

BZ#2239304 glibc: Revert change to run ELF destructor in reverse constructor order
-1
1
