A bad (non ECC) DIMM and segfault error 4 in libc.so.6
 
 


February 10, 2025

The failure


2025-02-09T17:59:35.308837+01:00 u32 kernel: [15847.132809] d[1339]: segfault at c0 ip 00007f9881ab043e sp 00007ffcc510b990 error 4 in libc.so.6[7f9881a7a000+155000] likely on CPU 6 (core 2, socket 0)
2025-02-09T17:59:35.308847+01:00 u32 kernel: [15847.132832] Code: 05 00 00 48 89 34 24 89 4c 24 3c 64 48 8b 04 25 28 00 00 00 48 89 84 24 f8 04 00 00 48 8b 05 a9 59 17 00 64 8b 00 89 44 24 54 <8b> 87 c0 00 00 00 85 c0 0f 85 b4 02 00 00 c7 87 c0 00 00 00 ff ff

After a few iterations of swapping the SODIMM modules, the culprit was identified. The failed module is of this type: https://www.kingston.com/datasheets/HX424S14IBK2_32.pdf
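The kernel line above packs several useful facts into one string. The following is a minimal Python sketch for pulling them apart, assuming the usual x86 page-fault error-code bit layout; the function name `decode_segfault` and the `FLAGS` table are made up for this illustration. For `error 4` it reports a user-mode read of a page that is not mapped. The marked instruction bytes `8b 87 c0 00 00 00` appear to be a load from `[rdi+0xc0]`, so a fault address of `c0` suggests the base pointer was NULL.

```python
import re

# Copied from the kernel log above (timestamp and CPU hint omitted).
LOG = ("d[1339]: segfault at c0 ip 00007f9881ab043e sp 00007ffcc510b990 "
       "error 4 in libc.so.6[7f9881a7a000+155000]")

# x86 page-fault error code bits: (mask, meaning if set, meaning if clear).
FLAGS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]

PATTERN = re.compile(
    r"segfault at ([0-9a-f]+) ip ([0-9a-f]+) sp ([0-9a-f]+) "
    r"error ([0-9a-f]+) in (\S+)\[([0-9a-f]+)\+([0-9a-f]+)\]"
)


def decode_segfault(line):
    """Parse a kernel segfault line and decode its error code."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    addr, ip, _sp, err, obj, base, _size = m.groups()
    err = int(err, 16)  # the kernel prints the error code in hex
    flags = []
    for mask, if_set, if_clear in FLAGS:
        text = if_set if err & mask else if_clear
        if text:
            flags.append(text)
    return {
        "fault_addr": int(addr, 16),
        "object": obj,
        # Offset of the faulting instruction inside the mapping shown
        # in brackets (not necessarily the offset inside the file).
        "offset": int(ip, 16) - int(base, 16),
        "flags": flags,
    }


result = decode_segfault(LOG)
print(f"{result['object']}+{result['offset']:#x}: {', '.join(result['flags'])}")
```

The offset can be helpful when looking up the faulting function with a disassembler, but with a bad DIMM the crash location tends to vary from one failure to the next.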

Non-ECC memory (dmidecode)

Non-ECC memory has no way to detect or correct a bit flip: it simply returns a value that differs from what was stored. The consequences are unpredictable. In this case we were lucky that the failure showed up as a segfault, alongside other strange behaviour such as stalls during packet processing.


root@u32:~# dmidecode -t memory
# dmidecode 3.4
Getting SMBIOS data from sysfs.
SMBIOS 3.0.0 present.

Handle 0x003D, DMI type 16, 23 bytes
Physical Memory Array
	Location: System Board Or Motherboard
	Use: System Memory
	Error Correction Type: None
	Maximum Capacity: 64 GB
	Error Information Handle: Not Provided
	Number Of Devices: 4

Handle 0x003E, DMI type 17, 40 bytes
Memory Device
	Array Handle: 0x003D
	Error Information Handle: Not Provided
	Total Width: 64 bits
	Data Width: 64 bits
	Size: 16 GB
	Form Factor: SODIMM
	Set: None
	Locator: ChannelA-DIMM0
	Bank Locator: BANK 0
	Type: DDR4
	Type Detail: Synchronous Unbuffered (Unregistered)
	Speed: 2400 MT/s
	Manufacturer: Kingston
	Serial Number: C79EF94A
	Asset Tag: 9876543210
	Part Number: KHX2400C14S4/16G    
	Rank: 2
	Configured Memory Speed: 2400 MT/s
	Minimum Voltage: 1.2 V
	Maximum Voltage: 1.2 V
	Configured Voltage: 1.2 V

[truncated]
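To illustrate why a single flipped bit on non-ECC memory can end in a segfault, here is a small Python sketch. The helper `flip_bit` is hypothetical, and the stored value is simply the libc mapping address from the kernel log, used as a plausible user-space pointer. Real failures are not this tidy; the sketch only shows how far one bit can move a value.

```python
def flip_bit(value, bit):
    """Return value with one bit inverted, as a faulty memory cell might."""
    return value ^ (1 << bit)


# A plausible user-space pointer (start of the libc mapping in the log).
stored = 0x00007F9881A7A000

for bit in (3, 21, 47):
    read_back = flip_bit(stored, bit)
    # Without ECC nothing flags the difference; the CPU just uses read_back.
    print(f"bit {bit:2d}: stored {stored:#018x}, read back {read_back:#018x}")
```

A flip in a low bit moves a pointer by a few bytes, while a flip in a high bit sends it to a completely different region, which is typically unmapped. If the same flip hits payload data instead of a pointer, nothing crashes and the corruption goes unnoticed.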

ECC memory (dmidecode)

On a system with ECC memory installed (here a Synology NAS), dmidecode -t memory can report detected ECC memory errors through the Error Information Handle fields.


ash-4.4# dmidecode -t memory
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 3.0.1 present.

Handle 0x0020, DMI type 16, 23 bytes
Physical Memory Array
	Location: System Board Or Motherboard
	Use: System Memory
	Error Correction Type: None
	Maximum Capacity: 8 GB
	Error Information Handle: No Error
	Number Of Devices: 2

Handle 0x0021, DMI type 17, 40 bytes
Memory Device
	Array Handle: 0x0020
	Error Information Handle: No Error
	Total Width: 64 bits
	Data Width: 64 bits
	Size: 4096 MB
	Form Factor: SODIMM
	Set: None
	Locator: ChannelA-DIMM0
	Bank Locator: BANK 0
	Type: DDR4
	Type Detail: Synchronous Unbuffered (Unregistered)
	Speed: 2400 MT/s
	Manufacturer: 08F7
	Serial Number: B039151C
	Asset Tag:
	Part Number: D4SS12161SH26A-C
	Rank: 1
	Configured Memory Speed: 2400 MT/s
	Minimum Voltage: 1.2 V
	Maximum Voltage: 1.2 V
	Configured Voltage: 1.2 V

[truncated]
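The fields that differ between the two outputs are Error Correction Type and Error Information Handle. The following Python sketch extracts just those from `dmidecode -t memory` text. The function names and the abbreviated `SAMPLE` string are assumptions for illustration; on a real system the text would come from running the command as root.

```python
# Abbreviated from the Synology output above.
SAMPLE = """\
Handle 0x0020, DMI type 16, 23 bytes
Physical Memory Array
    Error Correction Type: None
    Error Information Handle: No Error
    Number Of Devices: 2

Handle 0x0021, DMI type 17, 40 bytes
Memory Device
    Array Handle: 0x0020
    Error Information Handle: No Error
    Locator: ChannelA-DIMM0
    Asset Tag:
"""


def parse_dmidecode(text):
    """Split dmidecode output into a list of (title, fields) records."""
    records = []
    title_pending = False
    for line in text.splitlines():
        if line.startswith("Handle "):
            title_pending = True  # the next non-empty line is the title
        elif title_pending and line.strip():
            records.append((line.strip(), {}))
            title_pending = False
        elif records and ":" in line and line[:1].isspace():
            key, _, value = line.strip().partition(":")
            records[-1][1][key.strip()] = value.strip()
    return records


def ecc_summary(text):
    """Collect the ECC-related fields of the array and of each device."""
    summary = {"correction_type": None, "devices": []}
    for title, fields in parse_dmidecode(text):
        if title == "Physical Memory Array":
            summary["correction_type"] = fields.get("Error Correction Type")
        elif title == "Memory Device":
            summary["devices"].append(
                (fields.get("Locator"), fields.get("Error Information Handle"))
            )
    return summary


print(ecc_summary(SAMPLE))
```

On the non-ECC machine the per-device value would read "Not Provided" instead of "No Error", so a periodic check could compare against an expected value and alert on anything else.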