This needs an update to the lorax templates. This change will have to be applied to f40: https://gitlab.com/redhat/centos-stream/rpms/lorax-templates-rhel/-/merge_requests/66
@markec It's a leftover from the previous version; it does not come from the “Running post-uninstall scriptlet: glibc-gconv-extra-0:2.40-3.fc41.x86_64” part.
That is what I meant by "incompatible with libglvnd". Thanks for these amazing tests!
I've got a new build that hopefully fixes this: glibc-2.40.9000-13.fc42
Upstream discussion will happen here: [PATCH 1/2] Revert "elf: Run constructors on cyclic recursive dlopen (bug 31986)"
(I thought I had given negative karma before, but maybe I can't do that for my own updates?)
Update is incompatible with libglvnd.
Not built against -31.
The kernel.s390x scratch build failure appears to be an infrastructure issue:
$ git clone -n https://src.fedoraproject.org/rpms/kernel.git /var/lib/mock/f40-build-side-86585-49916704-5958020/root/chroot_tmpdir/scmroot/kernel
Cloning into '/var/lib/mock/f40-build-side-86585-49916704-5958020/root/chroot_tmpdir/scmroot/kernel'...
fetch-pack: unexpected disconnect while reading sideband packet
fatal: early EOF
fatal: fetch-pack: invalid index-pack output
https://kojipkgs.fedoraproject.org//work/tasks/4215/115504215/checkout.log
The fedora-ci.koji-build.tier0.functional failure cannot be diagnosed because the URL is broken.
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=2259333 in my testing.
Should not be pushed.
Hmph, I see it. I misinterpreted the nature of the ns-slapd bug, and the upstream workaround I pushed does not actually work around it. Hmmph. I guess I'll need another fix.
Before the workaround, glibc had this loop:
  while (cmp (run_ptr, tmp_ptr, arg) < 0)
    tmp_ptr -= size;
The loop is known to terminate if the comparison function is correct because eventually, run_ptr == tmp_ptr, and cmp must return zero. If that never happens, we eventually run into non-allocated memory regions. The only access to that memory is from the cmp function here, not from the qsort implementation, so that crash will happen in the comparison callback.
Thanks. The comparison function can never return zero: https://github.com/389ds/389-ds-base/blob/main/ldap/servers/plugins/cos/cos_cache.c#L2933
This is clearly a 389-ds-base bug. The old qsort implementation in glibc did not tickle it because it rarely called the comparison function with equal pointer arguments. We already worked around similar application problems in other places in the new implementation; we can probably do the same in the insertion sort phase.
With the new approach:
# ipa-getkeytab -p HTTP/x0.cockpit.lan -k /etc/cockpit/krb5.keytab
Keytab successfully retrieved and stored in: /etc/cockpit/krb5.keytab
I'll do another build, so that the AnyConnect users can test it as well.
But I think I see what's wrong with the current ELF destructor ordering approach. I'll experiment with something else.
This is a Fedora 38 cloud image with some extra packages installed, so you can install debug symbols, run gdb, etc.
Thank you. I got to this point and could reproduce the assert, but the VM with ipa-getkeytab does not have a default route. Any idea how to fix that? DHCP assigns 172.27.0.2 for the eth0 interface, but no default route.
This backtrace is more interesting:
Stack trace of thread 1959:
#0 0x00007f81c8ab0884 __pthread_kill_implementation (libc.so.6 + 0x8e884)
#1 0x00007f81c8a5fafe raise (libc.so.6 + 0x3dafe)
#2 0x00007f81c8a4887f abort (libc.so.6 + 0x2687f)
#3 0x00007f81c8a4879b __assert_fail_base.cold (libc.so.6 + 0x2679b)
#4 0x00007f81c8a58187 __assert_fail (libc.so.6 + 0x36187)
#5 0x00007f81c9030323 krb5int_key_delete (libkrb5support.so.0 + 0x6323)
#6 0x00007f81c86f0e8b gssint_mechglue_fini (libgssapi_krb5.so.2 + 0xee8b)
#7 0x00007f81c91f50f2 _dl_call_fini (ld-linux-x86-64.so.2 + 0x10f2)
#8 0x00007f81c91f8e5e _dl_fini (ld-linux-x86-64.so.2 + 0x4e5e)
#9 0x00007f81c8a621e6 __run_exit_handlers (libc.so.6 + 0x401e6)
#10 0x00007f81c8a6232e exit (libc.so.6 + 0x4032e)
#11 0x00005622a14583bc main (ipa-getkeytab + 0x63bc)
#12 0x00007f81c8a49b8a __libc_start_call_main (libc.so.6 + 0x27b8a)
#13 0x00007f81c8a49c4b __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x27c4b)
#14 0x00005622a1459bb5 _start (ipa-getkeytab + 0x7bb5)
It has _dl_fini in it, so it's very likely it's caused by the changes in this update.
@adamwill @martinpitt How can I create a VM (or set of VMs) that reproduces this issue? Thanks.
@adamwill How can we reproduce this in an environment where we can run the failing process under a debugger, or with certain environment variables configured? Thanks.
The bz699724 test was added recently and is apparently still under development, so I'm not particularly worried about it. It still needs porting to Python 3.
@adamwill Which specific failure worries you? I have trouble finding it in the results.
Looks like one of the tests runs against rawhide:
I'm not sure this is expected to work, because rawhide has DNF 5.