Linus Torvalds blames Intel for killing ECC RAM in consumer systems


Linus Torvalds is unhappy with how Intel has handled ECC (Error Correcting Code) support, and he blames the silicon giant for effectively killing the technology outside of servers. ECC memory detects and corrects single-bit errors in memory. It cannot fix multi-bit errors, but repairing single-bit errors alone can make a significant difference to system stability.
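To make the single-bit versus multi-bit distinction concrete, here is a minimal sketch of a SECDED (single error correct, double error detect) code built from Hamming(7,4) plus one overall parity bit. It is a toy that protects 4 data bits, not the scheme any particular memory controller uses; ECC DIMMs typically protect 64 data bits with 8 check bits. The function names `encode` and `decode` are illustrative only.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit SECDED codeword (a list of bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    code = [0] * 8  # index 0: overall parity; indices 1..7: Hamming(7,4) positions
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]  # parity over positions with bit 0 set
    code[2] = code[3] ^ code[6] ^ code[7]  # parity over positions with bit 1 set
    code[4] = code[5] ^ code[6] ^ code[7]  # parity over positions with bit 2 set
    code[0] = sum(code[1:]) % 2            # makes the parity of the whole word even
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                # XOR of the positions of all set bits
    odd_parity = sum(code) % 2             # 1 if an odd number of bits flipped
    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity:
        code[syndrome] ^= 1                # single flip: the syndrome is its position
        status = "corrected"
    else:
        return "uncorrectable", None       # two flips: detected, cannot be located
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


word = encode(0b1011)
one_flip = list(word)
one_flip[5] ^= 1
two_flips = list(word)
two_flips[5] ^= 1
two_flips[6] ^= 1
print(decode(word))       # ('ok', 11)
print(decode(one_flip))   # ('corrected', 11)
print(decode(two_flips))  # ('uncorrectable', None)
```

A single flipped bit is located and repaired, while two flipped bits can only be reported as an error.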

There was a time when you could buy ECC support on ordinary desktop systems, but Intel phased out the capability on non-Xeon platforms a number of years ago. The 975X was perhaps the last Intel consumer platform to support it, and that family was introduced 15 years ago. The Xeon 3450 chipset was cross-compatible with certain high-end processors in the Nehalem family, but it’s still a Xeon chipset, not a mainstream part.

As a result, support for ECC in consumer products, and the availability of ECC RAM for consumer products, both fell off a cliff. Linus lays out his case in a rather long post, arguing that the persistence of Rowhammer and the fact that single-bit errors never went away show Intel’s ECC policy to be bad and wrong. He actually takes on the entire DRAM industry, writing:

The memory manufacturers claim it’s because of economics and lower power. And they are lying bastards - let me once again point to row-hammer about how those problems have existed for several generations already, but these f*ckers happily sold broken hardware to consumers and claimed it was an “attack”, when it always was “we’re cutting corners”.

Torvalds also refers to numerous incidents of kernel “oopses” which, according to him, might be better explained by hardware errors. While it’s difficult to obtain objective data on this kind of thing, a 2009 Google report on memory errors provides evidence that he’s right, although a 2009 paper obviously may have limited applicability to DDR4 RAM in 2020.

Image by Wikimedia Commons, by Kjerish. CC BY-SA 4.0

Google’s conclusion from 2009 was simple: ‘We found the incidence of memory errors and the range of error rates across different DIMMs (dual in-line memory modules) to be much higher than previously reported … Memory errors are not rare events.’ The team observed error rates that it described as “orders of magnitude higher than previously reported.”

They conclude: ‘error-correcting codes are crucial for reducing the large number of memory errors to a manageable number of uncorrectable errors.’

AMD’s Current Support Is of Limited Value

On paper, AMD’s Ryzen family unofficially supports ECC (Threadripper has official ECC support). As Ian Cutress points out later in the thread, just because a motherboard claims ECC support does not mean that the support is actually enabled. We don’t run into this situation very often, but CPUs and motherboards report their various feature sets via registers, which applications such as CPUID then read to determine and report which features a chip supports. An application that claims to check whether a given feature is supported (SSE, AVX, ECC, etc.) can only report what the CPU or motherboard claims about its own operation via register flags. It cannot actually verify that support exists unless the application includes a functional test, such as a small benchmark that literally cannot run unless AVX support is functional.
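As a rough illustration of what "reading the flags" amounts to, the sketch below parses the `flags` line that x86 Linux exposes in `/proc/cpuinfo`. It assumes a Linux system, and the helper names `reported_flags` and `claims_support` are made up for this example. The point is that the result only repeats what the hardware reports about itself; nothing here exercises the feature.

```python
def reported_flags(cpuinfo_text):
    """Return the set of feature flags listed for the first CPU entry."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found (for example, not x86 Linux)


def claims_support(cpuinfo_text, feature):
    """True if the CPU *reports* the feature; this is not a functional test."""
    return feature.lower() in reported_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # file unavailable on this platform
    for feature in ("sse2", "avx", "avx2"):
        state = "reported" if claims_support(text, feature) else "not reported"
        print(f"{feature}: {state}")
```

A real functional test would have to execute code that depends on the feature and check the result, which is beyond what a flag reader like this can do.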

Because AMD’s support is unofficial, no one is cracking the whip on OEMs to make sure they implement the feature properly, and OEMs do not test to make sure the feature works. Because it is possible to set a bit for ‘ECC supported’ in a motherboard register without actually implementing functional ECC, there are motherboards that claim to support the standard, and appear to do so if you scan them with a tool, but do not implement ECC at all. The only way to ensure that ECC works on an AMD Ryzen motherboard is to use a tool that forces an ECC error.
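On Linux, one way to observe whether a forced error was actually caught is to watch the corrected and uncorrected error counters that the kernel's EDAC subsystem exposes in sysfs. The sketch below reads them. It assumes an EDAC driver is loaded for the platform's memory controller, and the function name `read_edac_counts` is illustrative. It does not inject errors itself; that requires a separate tool or platform-specific mechanism.

```python
from pathlib import Path

# Standard location of the Linux EDAC memory-controller entries in sysfs.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root=EDAC_ROOT):
    """Return {controller: (corrected, uncorrected)} for each EDAC controller."""
    counts = {}
    root = Path(root)
    if not root.is_dir():
        return counts  # no EDAC driver loaded, or not a Linux system
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # skip controllers whose counters cannot be read
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    counts = read_edac_counts()
    if not counts:
        print("No EDAC memory controllers reported by the kernel.")
    for name, (ce, ue) in counts.items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```

If the corrected count rises after a deliberately induced single-bit error, that is evidence ECC is working. Counters that stay at zero, or the mere presence of a controller entry, prove nothing either way.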

Whether this will change, and whether the feature will return to Intel desktops or officially launch for Ryzen, is unclear. It would require buy-in from memory manufacturers, and it is not clear that many people in the PC market would pay for it. Most people shop on price, and because you never know about the PC crashes you don’t have, it is difficult to sell people on the benefits. Then again, we’ll see the x86 CPU manufacturers face much tougher ARM challenges over the next 2-5 years than we’ve ever seen before. It would not be surprising to see Intel and/or AMD ‘rediscover’ some features, especially if those features enable them to claim increased stability compared to previous products.

Feature image shows registered DDR4-2133 DIMMs. Registered DIMMs also support ECC, but it is also possible to find unbuffered ECC RAM.
