Saturday, February 23, 2008

A Linux Tip for Offshore System Administrators

Once a daemon on a Linux box hangs and its entry under /proc blocks, you will not be able to reboot the box in the normal manner. The 'ps' commands will hang, and an 'init 6' will not work because the running processes on the server cannot be killed. Without physical access or remote power control, an SA may use the "Magic System Request" (SysRq) method to force kernel operations such as sync, remounting all filesystems read-only, and reboot. Solaris administrators may be familiar with the 'uadmin' command, which does the same kind of thing.

Here is the short form:

If you're on the console, you must first enable the subsystem with a command:
#echo 1 > /proc/sys/kernel/sysrq
Alternatively, you may use the equivalent sysctl command as follows:
#sysctl -w kernel.sysrq="1"
kernel.sysrq = 1
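As a rough sketch, the enable step can be wrapped in a small function that first checks the current setting, so an existing value is not overwritten blindly. The function name enable_sysrq and the SYSRQ_CTL override variable are my own inventions for illustration (the override only exists so the function can be tried against a scratch file); the /proc path is the one used above. It must run as root to write the real file.

```shell
#!/bin/sh
# SYSRQ_CTL is a hypothetical override for dry runs; it defaults to the real control file.
SYSRQ_CTL="${SYSRQ_CTL:-/proc/sys/kernel/sysrq}"

# Enable SysRq only when it is currently disabled (value 0).
# Any other value is left alone: 1 means fully enabled, and larger
# values are a bitmask that enables a subset of functions.
enable_sysrq() {
    current=$(cat "$SYSRQ_CTL") || return 1
    if [ "$current" = "0" ]; then
        echo 1 > "$SYSRQ_CTL" || return 1
        echo "sysrq was disabled; now set to 1"
    else
        echo "sysrq is nonzero (value $current); leaving it unchanged"
    fi
}
```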
Then you can press Alt+SysRq followed by one of the following commands (and there are many more commands than these):
s   Sync     Forces a sync, and prints 'OK' to the console when complete
u   Umount   Tries to unmount all filesystems and remount them read-only
b   Boot     Reboots the system without killing any processes
It is best to use Alt+SysRq-s and Alt+SysRq-u first to avoid data loss.

Similarly, you can trigger the same commands without a keyboard, for example from a remote shell, by writing the command key to /proc/sysrq-trigger:
#echo 'key' > /proc/sysrq-trigger

Below are some examples:
#echo s > /proc/sysrq-trigger (like Alt+SysRq-s)
#echo u > /proc/sysrq-trigger (like Alt+SysRq-u)
#echo b > /proc/sysrq-trigger (like Alt+SysRq-b)
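The three commands above can be strung together in one function, with a pause between steps so the sync and the read-only remount have time to finish before the reboot. This is only a sketch: the function name emergency_reboot, the SYSRQ_TRIGGER override, and the 5-second SYSRQ_PAUSE default are assumptions of mine, not anything the kernel documentation prescribes. Pick a pause that suits your disks.

```shell
#!/bin/sh
# Hypothetical overrides for dry runs; the default is the real trigger file.
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"
SYSRQ_PAUSE="${SYSRQ_PAUSE:-5}"   # seconds to wait after 's' and 'u' (assumed value)

# Send sync, remount read-only, then reboot, in that order.
emergency_reboot() {
    for key in s u b; do
        echo "sending $key"
        echo "$key" > "$SYSRQ_TRIGGER" || return 1
        # No point sleeping after 'b': on a real system the box is already rebooting.
        [ "$key" = "b" ] || sleep "$SYSRQ_PAUSE"
    done
}
```

To rehearse it safely, point SYSRQ_TRIGGER at a scratch file before calling emergency_reboot; with the default path and root privileges it really will reboot the box.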

Here is a list of the available keys:
'r' - Turns off keyboard raw mode and sets it to XLATE.
'k' - Secure Access Key (SAK). Kills all programs on the current virtual console.
NOTE: See the important comments in the SAK section of sysrq.txt.
'b' - Will immediately reboot the system without syncing or unmounting your disks.
'c' - Intentionally crash the system without syncing or unmounting your disks.
'o' - Will shut your system off (if configured and supported).
's' - Will attempt to sync all mounted filesystems.
'u' - Will attempt to remount all mounted filesystems read-only.
'p' - Will dump the current registers and flags to your console.
't' - Will dump a list of current tasks and their information to your console.
'm' - Will dump current memory info to your console.
'0'-'9' - Sets the console log level, controlling which kernel messages will be printed to your console. ('0', for example, would make it so that only emergency messages like PANICs or OOPSes would make it to your console.)
'e' - Send a SIGTERM to all processes, except for init.
'i' - Send a SIGKILL to all processes, except for init.
'l' - Send a SIGKILL to all processes, INCLUDING init. (Your system will be non-functional after this.)
'h' - Will display help (actually, any key other than those listed above will display help).
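When working on a remote box it is easy to send the wrong key, so a small guard that classifies a key before it is written to /proc/sysrq-trigger may be worth having. The sketch below is one possible shape. The function name sysrq_risk and the split into "disruptive" and "low-risk" are my own grouping of the keys listed above, not a classification from sysrq.txt.

```shell
#!/bin/sh
# Classify a SysRq key from the list above (the grouping is an assumption).
# Prints "disruptive", "low-risk", or "unknown".
sysrq_risk() {
    case "$1" in
        # Reboot, crash, power off, signal processes, SAK, remount read-only
        b|c|o|e|i|l|k|u) echo "disruptive" ;;
        # Sync, keyboard mode, register/task/memory dumps, help, log levels
        s|r|p|t|m|h|[0-9]) echo "low-risk" ;;
        # Anything else would only make the kernel print help
        *) echo "unknown" ;;
    esac
}
```

A wrapper script could, for instance, ask for confirmation whenever sysrq_risk prints "disruptive".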

See sysrq.txt somewhere underneath /usr/src/linux-XXX/Documentation for more information.
