A bad (non ECC) DIMM and segfault error 4 in libc.so.6
February 10, 2025
The failure
2025-02-09T17:59:35.308837+01:00 u32 kernel: [15847.132809] d[1339]: segfault at c0 ip 00007f9881ab043e sp 00007ffcc510b990 error 4 in libc.so.6[7f9881a7a000+155000] likely on CPU 6 (core 2, socket 0)
2025-02-09T17:59:35.308847+01:00 u32 kernel: [15847.132832] Code: 05 00 00 48 89 34 24 89 4c 24 3c 64 48 8b 04 25 28 00 00 00 48 89 84 24 f8 04 00 00 48 8b 05 a9 59 17 00 64 8b 00 89 44 24 54 <8b> 87 c0 00 00 00 85 c0 0f 85 b4 02 00 00 c7 87 c0 00 00 00 ff ff
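The kernel line above packs several facts into one string. A minimal sketch of how it could be decoded is shown here, assuming the line has exactly the format seen in this log. The function name decode_segfault and the dictionary keys are illustrative, not part of any tool. The bit meanings are the x86 page-fault error code bits used by the Linux kernel.

```python
import re

# x86 page-fault error code bits (mask, meaning if set, meaning if clear).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]

# Matches the message format seen in the log above; other kernel
# versions may print additional fields (assumption).
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Extract fault address, object, offset and error flags from a log line."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a segfault line in the expected format")
    err = int(m["err"], 16)  # the kernel prints the error code in hex
    flags = []
    for mask, if_set, if_clear in PF_BITS:
        text = if_set if err & mask else if_clear
        if text:
            flags.append(text)
    return {
        "fault_address": int(m["addr"], 16),
        "object": m["obj"],
        # Offset of the faulting instruction inside the mapped object.
        "offset": int(m["ip"], 16) - int(m["base"], 16),
        "flags": flags,
    }


if __name__ == "__main__":
    sample = (
        "d[1339]: segfault at c0 ip 00007f9881ab043e sp 00007ffcc510b990 "
        "error 4 in libc.so.6[7f9881a7a000+155000]"
    )
    info = decode_segfault(sample)
    print(hex(info["offset"]), info["flags"])
```

For the line in this post, error 4 decodes to a user-mode read of a page that is not present, and the faulting instruction sits at offset 0x3643e inside libc.so.6. The fault address c0 matches the displacement in the marked instruction bytes (8b 87 c0 00 00 00), which is consistent with a read through a null or corrupted pointer.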
After a few rounds of swapping the SODIMM modules, the culprit was identified. The failed module is of this type: https://www.kingston.com/datasheets/HX424S14IBK2_32.pdf
Non-ECC memory (dmidecode)
Non-ECC memory can neither detect nor correct a bit flip: it silently returns a value that differs from the one that was stored. The consequences are completely unpredictable. In this case we were lucky that the failure showed up as a segfault, alongside other strange behaviour such as stalls during packet processing.
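To make the difference concrete, here is a toy sketch of single-bit error correction using a Hamming(7,4) code. This is an illustration only: real ECC DIMMs typically protect 64 data bits with 8 extra check bits and use a stronger code, and all function names below are made up for this example.

```python
DATA_POSITIONS = (3, 5, 6, 7)  # 1-based codeword positions holding data bits
PARITY_POSITIONS = (1, 2, 4)   # 1-based codeword positions holding parity bits


def encode(nibble):
    """Encode a 4-bit value as a 7-bit Hamming codeword (list of 0/1)."""
    code = [0] * 8  # index 0 is unused so indices match 1-based positions
    for i, pos in enumerate(DATA_POSITIONS):
        code[pos] = (nibble >> i) & 1
    for p in PARITY_POSITIONS:
        # Parity bit p covers every position whose index has bit p set.
        for k in range(1, 8):
            if k != p and k & p:
                code[p] ^= code[k]
    return code[1:]


def syndrome(code):
    """Return 0 for a valid codeword, else the position of a single flipped bit."""
    s = 0
    for pos, bit in enumerate(code, start=1):
        if bit:
            s ^= pos
    return s


def flip(code, pos):
    """Simulate a memory bit flip at 1-based position pos."""
    damaged = list(code)
    damaged[pos - 1] ^= 1
    return damaged


def correct(code):
    """Repair a single flipped bit, as ECC logic would on a read."""
    fixed = list(code)
    s = syndrome(fixed)
    if s:
        fixed[s - 1] ^= 1
    return fixed


def decode(code):
    """Extract the 4 data bits without any checking (the non-ECC view)."""
    return sum(code[pos - 1] << i for i, pos in enumerate(DATA_POSITIONS))


if __name__ == "__main__":
    stored = encode(0b1011)
    damaged = flip(stored, 6)
    print("without correction:", decode(damaged))
    print("with correction:   ", decode(correct(damaged)))
```

Reading the damaged word without the check bits returns a wrong value with no warning, which is the non-ECC situation described above. With the check bits, the nonzero syndrome both reports the error and points at the bit to repair.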
root@u32:~# dmidecode -t memory
# dmidecode 3.4
Getting SMBIOS data from sysfs.
SMBIOS 3.0.0 present.
Handle 0x003D, DMI type 16, 23 bytes
Physical Memory Array
Location: System Board Or Motherboard
Use: System Memory
Error Correction Type: None
Maximum Capacity: 64 GB
Error Information Handle: Not Provided
Number Of Devices: 4
Handle 0x003E, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x003D
Error Information Handle: Not Provided
Total Width: 64 bits
Data Width: 64 bits
Size: 16 GB
Form Factor: SODIMM
Set: None
Locator: ChannelA-DIMM0
Bank Locator: BANK 0
Type: DDR4
Type Detail: Synchronous Unbuffered (Unregistered)
Speed: 2400 MT/s
Manufacturer: Kingston
Serial Number: C79EF94A
Asset Tag: 9876543210
Part Number: KHX2400C14S4/16G
Rank: 2
Configured Memory Speed: 2400 MT/s
Minimum Voltage: 1.2 V
Maximum Voltage: 1.2 V
Configured Voltage: 1.2 V
[truncated]
ECC memory (dmidecode)
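On a system with ECC memory installed (here a Synology NAS), dmidecode -t memory can show any detected memory ECC errors. In the output below, the Error Information Handle fields read No Error, where the non-ECC system reported Not Provided. Note that this particular output still reports Error Correction Type: None.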
ash-4.4# dmidecode -t memory
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 3.0.1 present.
Handle 0x0020, DMI type 16, 23 bytes
Physical Memory Array
Location: System Board Or Motherboard
Use: System Memory
Error Correction Type: None
Maximum Capacity: 8 GB
Error Information Handle: No Error
Number Of Devices: 2
Handle 0x0021, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x0020
Error Information Handle: No Error
Total Width: 64 bits
Data Width: 64 bits
Size: 4096 MB
Form Factor: SODIMM
Set: None
Locator: ChannelA-DIMM0
Bank Locator: BANK 0
Type: DDR4
Type Detail: Synchronous Unbuffered (Unregistered)
Speed: 2400 MT/s
Manufacturer: 08F7
Serial Number: B039151C
Asset Tag:
Part Number: D4SS12161SH26A-C
Rank: 1
Configured Memory Speed: 2400 MT/s
Minimum Voltage: 1.2 V
Maximum Voltage: 1.2 V
Configured Voltage: 1.2 V
[truncated]
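The fields that differ between the two outputs can be pulled out with a small parser. The following is a rough sketch, assuming the text layout shown in this post. The helper names parse_dmidecode, memory_summary and read_dmidecode are hypothetical and not part of dmidecode. As far as the SMBIOS field is concerned, Error Information Handle is printed as Not Provided, No Error, or the handle number of an error record.

```python
import subprocess


def parse_dmidecode(text):
    """Split dmidecode text output into one dict per DMI handle."""
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Handle "):
            current = {"_header": line}
            records.append(current)
        elif current is None or not line:
            continue  # preamble before the first handle, or a blank line
        elif ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
        elif "_name" not in current:
            current["_name"] = line  # e.g. "Memory Device"
    return records


def memory_summary(text):
    """Return the array's ECC type and each device's error information handle."""
    summary = {"ecc_type": None, "devices": []}
    for rec in parse_dmidecode(text):
        if rec.get("_name") == "Physical Memory Array":
            summary["ecc_type"] = rec.get("Error Correction Type")
        elif rec.get("_name") == "Memory Device":
            summary["devices"].append(
                (rec.get("Locator"), rec.get("Error Information Handle"))
            )
    return summary


def read_dmidecode():
    """Run `dmidecode -t memory` (needs root) and return its text output."""
    result = subprocess.run(
        ["dmidecode", "-t", "memory"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling memory_summary(read_dmidecode()) as root gives a compact view of one machine. A device whose handle is anything other than Not Provided or No Error would be worth a closer look.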