Wednesday, January 26, 2011

OOM and Android

Some of you were wondering what the last two lines in my sysctl.conf file are.  They control how the kernel behaves when it detects an "OOM" (out-of-memory) event.

# reboot when OOM happens
vm.panic_on_oom = 2
# wait 5 sec before rebooting when OOM
kernel.panic = 5
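
For reference, here is a minimal sketch of what each vm.panic_on_oom value means, going by the kernel's sysctl documentation. The helper name describe_panic_on_oom is made up for illustration, and the live read at the end assumes /proc/sys is readable on the device, which may not hold everywhere.

```shell
# Describe a vm.panic_on_oom value (meanings per the kernel sysctl docs).
# describe_panic_on_oom is a hypothetical helper, not a standard tool.
describe_panic_on_oom() {
    case "$1" in
        0) echo "0: no panic; the OOM killer picks a process to kill" ;;
        1) echo "1: panic, unless the OOM is confined by a cpuset or memory policy" ;;
        2) echo "2: always panic on OOM" ;;
        *) echo "$1: unrecognized value" ;;
    esac
}

# Report the live setting if the file is readable on this system.
if [ -r /proc/sys/vm/panic_on_oom ]; then
    describe_panic_on_oom "$(cat /proc/sys/vm/panic_on_oom)"
fi
```

kernel.panic is simpler: it is the number of seconds the kernel waits after a panic before rebooting, and 0 means it does not reboot on its own.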
The reason I want my phone to just go ahead and reboot is mostly that I have had some bad experiences with the OOM killer on other Linux systems.  I would rather have the phone kernel panic and reboot than deal with a potentially crippled phone after the OOM killer has finished with the running OS processes and applications.

So I decided to put my sysctl OOM settings to the test.

First, I needed to force the phone to run out of memory completely.  Step one, create a 600MB file:
# dd if=/dev/zero of=/mnt/sdcard/largefile bs=1 count=600000000
Step two, clear the dmesg buffer:
# dmesg -c
Step three, create a process that attempts to use 600MB of memory:
# vi /sdcard/largefile
Step four, observe.

Step five, analyze.

So based on my test, it turns out the Android kernel's OOM killer is *not* invoked.  There's a watchdog process that detects that the phone is about to run out of memory and kills my vi process along with a bunch of other processes.  Evidence from dmesg (just a snippet):
<4>[79814.464782] select 11855 (.android.tasker), adj 2, size 4694, to kill
<4>[79814.465087] select 11863 (d.process.acore), adj 4, size 7712, to kill
<4>[79814.465667] send sigkill to 11863 (d.process.acore), adj 4, size 7712
<4>[79814.559112] select 11855 (.android.tasker), adj 2, size 4694, to kill
<4>[79814.559204] select 11871 (e.process.gapps), adj 2, size 5449, to kill
<4>[79814.559356] send sigkill to 11871 (e.process.gapps), adj 2, size 5449
And what is process id 4?
root      4     2     0      0     c009a4e8 00000000 S watchdog/0
How many processes did watchdog kill?
# grep -c "to kill" /sdcard/dmesg.log
Yikes.  Anyway, I ran the experiment several times (OOM killer on, off, etc.), and every run produced the same results: I still could not invoke the kernel's OOM killer or cause the kernel to panic.  I'm guessing the Android developers decided that having the watchdog manage OOM situations is superior to leaving it to the kernel.  Probably a good thing too - the OOM killer sucks.
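
One caveat about the count above: in the dmesg snippet, the "select ... to kill" lines appear to mark candidates, while the "send sigkill to" lines mark the processes that were actually killed, so counting "to kill" likely overstates the number of kills. The sketch below counts the two separately and lists each kill. The function names are made up for illustration, and it assumes the log was saved to a file first, a step not shown above.

```shell
# Assumption: the log was saved earlier with something like
#   dmesg > /sdcard/dmesg.log

# Count candidate selections and actual kills separately.
count_kill_events() {
    selected=$(grep -c 'to kill' "$1")
    killed=$(grep -c 'send sigkill to' "$1")
    echo "selected=$selected killed=$killed"
}

# List each kill as "pid name adj size".
list_kills() {
    sed -n 's/.*send sigkill to \([0-9][0-9]*\) (\(.*\)), adj \(-\{0,1\}[0-9][0-9]*\), size \([0-9][0-9]*\).*/\1 \2 \3 \4/p' "$1"
}

# Usage:
#   count_kill_events /sdcard/dmesg.log
#   list_kills /sdcard/dmesg.log
```

Run against just the six-line snippet shown earlier, count_kill_events prints selected=4 killed=2, and list_kills shows that only d.process.acore and e.process.gapps received a SIGKILL there.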

So what does this mean?  It means all the OOM-related settings in sysctl are probably useless.  (It also means that, for all intents and purposes, the watchdog process is the new OOM killer, but outside of the kernel.)  Because the Android kernel is derived from a standard Linux kernel, there are going to be a number of kernel parameters that are not used, and the OOM ones are possibly among them.  Therefore, my latest sysctl.conf file reflects my latest experiment:
# cat /etc/sysctl.conf

# try to keep at least 4MB of memory free
vm.min_free_kbytes = 4096

# favor block cache
vm.dirty_ratio = 90
vm.dirty_background_ratio = 55

# extremely favor file cache
vm.vfs_cache_pressure = 1
# ignore below for now - not used by android
vm.swappiness = 0

vm.panic_on_oom = 2
kernel.panic = 5
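
A quick way to confirm that a file like this actually took effect is to compare each entry with the live value under /proc/sys. This is only a sketch: check_sysctl_conf and the PROC_SYS override are names made up for illustration, and it only handles single-value settings like the ones above.

```shell
# Compare each "key = value" line in a sysctl.conf-style file with the live
# value under /proc/sys. PROC_SYS is a hypothetical override that lets the
# function be tried off-device; it handles single-value settings only.
check_sysctl_conf() {
    conf=$1
    root=${PROC_SYS:-/proc/sys}
    # Drop comment lines, keep lines with "=", then split each on "=".
    grep -v '^[[:space:]]*#' "$conf" | grep '=' | while IFS='=' read -r key want; do
        key=$(echo "$key" | tr -d '[:space:]')
        want=$(echo "$want" | tr -d '[:space:]')
        file="$root/$(echo "$key" | tr '.' '/')"
        if [ ! -r "$file" ]; then
            echo "$key: not available"
        elif [ "$(tr -d '[:space:]' < "$file")" = "$want" ]; then
            echo "$key: ok ($want)"
        else
            echo "$key: wanted $want, found $(cat "$file")"
        fi
    done
}

# Usage:
#   check_sysctl_conf /etc/sysctl.conf
```

A "not available" line would be one hint that the kernel on the device does not expose that parameter at all.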
As far as the watchdog process is concerned, I looked around a bit.  I didn't find /etc/watchdog.conf, and didn't see an obvious place where it starts up.  It must be part of the Android base OS somehow.