Linux: When massive load on the disk makes the system freeze

The bad news is that Linux has been behaving pretty badly when the disk is under heavy load. The good news is that all you need is a kernel upgrade.
First, let’s run a simple experiment. The following command writes 20 GiB of zeros to a file, so make sure the disk has room for it:
$ dd if=/dev/zero of=junkfile bs=1k count=20M
Don’t expect anything dramatic to happen right away. It takes half a minute or so. In the meantime, at another command prompt, let’s keep track of the memory info:
$ watch cat /proc/meminfo
Soon, MemFree goes low and Cached goes high. This is normal behavior: as the dd command floods the disk queue with blocks to be written, unused memory is allocated as temporary buffers for data that is about to be written or has just been written. That is a sensible thing to do: the memory is unused anyhow, so why not remember what we just wrote, in case someone asks to read it?
So far so good. The trouble begins when MemFree gets really low, say around 50000 kB, and the heavy disk load continues. For reasons beyond me, this makes the system stall, and then recover when the disk load stops. For example, our watch program stops updating. Have a look at the timestamp at the upper right, and see that it freezes for several seconds.
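Since only two lines of /proc/meminfo matter for this experiment, it may be easier to filter out the rest. The following is a minimal sketch; the function name meminfo_summary is made up for illustration, and the optional argument exists only so that the filter can be tried on a saved copy of the file.

```shell
# Print only the MemFree and Cached lines of a meminfo-style file.
# Defaults to /proc/meminfo when no file is given.
# The leading ^ keeps SwapCached from matching.
meminfo_summary() {
    grep -E '^(MemFree|Cached):' "${1:-/proc/meminfo}"
}
```

For a live view, the same filter can be handed to watch:
$ watch -n 1 "grep -E '^(MemFree|Cached):' /proc/meminfo"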
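Staring at the timestamp works, but the stalls can also be logged. The sketch below wakes up once a second and reports every wake-up that arrived late. The function names and the default threshold of 2 seconds are arbitrary choices for illustration, not anything taken from the kernel or from watch. Note that date +%s has a one-second resolution, so this only catches freezes of a few seconds, which is what the experiment above produces.

```shell
# Print a message if the gap between two timestamps (in seconds)
# exceeds the given threshold.
report_gap() {
    prev=$1; now=$2; threshold=$3
    gap=$((now - prev))
    if [ "$gap" -gt "$threshold" ]; then
        echo "stall: ${gap}s"
    fi
}

# Sleep one second at a time, forever, and report late wake-ups.
# Stop it with Ctrl-C.
watch_for_stalls() {
    threshold=${1:-2}
    prev=$(date +%s)
    while sleep 1; do
        now=$(date +%s)
        report_gap "$prev" "$now" "$threshold"
        prev=$now
    done
}
```

To use it, load these functions into a second shell, run watch_for_stalls there while dd is working, and look for "stall" lines once MemFree has dropped.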
If the memory goes low on your system, but the system doesn’t get stuck, never mind. If it does, you may need the patch for vmscan.c, or just upgrade to the current stable kernel (2.6.36 when these lines were written).
A simple way to check whether your kernel has this fixed is to look for the should_reclaim_stall() function in mm/vmscan.c in your kernel source. If the function is there, you’re fine. If not, and you’re running some 2.6.32 kernel or something like that, it’s very likely that you need the fix.
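The check for the function can be done with grep. This is a sketch that assumes the source of the running kernel is unpacked somewhere on the system; /usr/src/linux is only a guess at the location, and both the KSRC variable and the has_stall_fix name are made up here.

```shell
# Succeeds if the kernel source tree at $1 mentions should_reclaim_stall
# in mm/vmscan.c.
has_stall_fix() {
    grep -q 'should_reclaim_stall' "$1/mm/vmscan.c"
}

# Assumed location of the kernel source; override by setting KSRC.
KSRC=${KSRC:-/usr/src/linux}

if has_stall_fix "$KSRC" 2>/dev/null; then
    echo "should_reclaim_stall() found in $KSRC/mm/vmscan.c"
else
    echo "should_reclaim_stall() not found in $KSRC/mm/vmscan.c"
fi
```

Make sure that the tree being checked really matches the kernel that is running (compare with the output of uname -r), or the answer means nothing.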


