Recovering from a freeze in Linux

Ever had your Linux system freeze on you?  The mouse either doesn’t move, doesn’t register clicks, or both?  The keyboard seems completely unresponsive?  Can’t think of any way to regain control of your system except to press the power button?  Worried you might corrupt the filesystem if you do?  Fear not.

The Linux kernel is a very robust piece of software, and it’s extremely rare that a user can actually cause it to break in a way that is completely unrecoverable.

The Graphical User Interface (GUI), however, is not built to the same standard.  Nor are extensions to it.  Nor are applications.  Every so often one of these may get a bit confused and ‘lock up’ or ‘freeze’ your system.

If your system ‘freezes’ then the chances are that the kernel is running happily in the background, and that one (or more) of the other components is responsible.  You can spend a lot of time trying to work out which processes are to blame and surgically remove them from the equation, or you can just bypass all of them, establish a direct line of communication with the kernel, then instruct it to gracefully reboot your machine in a way that minimises the possibility of filesystem corruption.

I, personally, never have large amounts of unsaved work at any point in time, so a quick and safe reboot makes far, far more sense to me.  The rest of this post assumes you think the same way.

Step 1

The Linux kernel’s keyboard driver is the software responsible for turning physical keystrokes on your keyboard into events that the kernel and other pieces of software can act upon.  The driver can be in one of four modes:

  • Scancode (aka RAW)
  • Keycode (aka MEDIUMRAW)
  • ASCII (aka XLATE)
  • UTF-8 (aka Unicode)

Software is usually designed to only understand one of the keyboard modes.  If, for whatever reason, the keyboard mode has been switched to something else, then the software will no longer be able to understand what is coming out of the keyboard.  This is one of the ways that keyboards become “unresponsive” — the mode has changed and output is in a different ‘language’ that the software in front of you no longer understands.
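If you are curious which mode a console keyboard is in, the kernel will tell you.  The sketch below is my own illustration, not part of the recovery procedure: it assumes the KDGKBMODE request number and the mode values from the kernel’s linux/kd.h header, and that you are allowed to open a virtual console such as /dev/tty0 (usually that means root).  The function names are made up for this example.

```python
import fcntl
import os
import struct

# Values as defined in <linux/kd.h> (assumed here, check your kernel headers).
KDGKBMODE = 0x4B44  # ioctl request: "get keyboard mode"

KEYBOARD_MODES = {
    0: "RAW (scancode)",
    1: "XLATE (ASCII)",
    2: "MEDIUMRAW (keycode)",
    3: "UNICODE (UTF-8)",
    4: "OFF",  # an extra value in newer kernels, beyond the four discussed here
}


def mode_name(value):
    """Translate a numeric keyboard mode into a readable name."""
    return KEYBOARD_MODES.get(value, f"unknown ({value})")


def current_keyboard_mode(tty_path="/dev/tty0"):
    """Ask the kernel which mode a virtual console's keyboard is in.

    Only works on a Linux virtual console, and normally only as root.
    """
    fd = os.open(tty_path, os.O_RDONLY | os.O_NOCTTY)
    try:
        # The kernel writes a C int back into the buffer we pass in.
        raw = fcntl.ioctl(fd, KDGKBMODE, struct.pack("i", 0))
        return struct.unpack("i", raw)[0]
    finally:
        os.close(fd)
```

Calling mode_name(current_keyboard_mode()) as root on a text console would normally report one of the translated modes, while a graphical session typically keeps its console in a raw or switched-off mode.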

To proceed with the recovery process, the very first thing we want to do is set the keyboard mode to something that the Linux kernel can understand.

Most keyboards have a combined Print Screen/SysRq key.  Pressed by itself, it usually takes a screenshot.  If you hold down alt and press it, then the next key you press (while still holding alt) is sent as a ‘System Request’ straight to the kernel.

alt-sysrq-r sends the keystroke ‘r’ as a system request to the kernel.  The kernel, upon receiving the keystroke ‘r’ as a system request, will reset the keyboard driver to a specific (XLATE) mode.

XLATE just means ‘translate’ — meaning that the keyboard driver will translate the raw key/scan codes into ASCII codes.

From this point on, we can be sure that the kernel will understand future keystrokes and that they should have the intended effect.

If it makes it easier, think of alt-sysrq-r as simply meaning “reset the keyboard”.

Step 2

One (or more) processes have compromised your system and caused it to freeze.  You don’t know how many there are.  You don’t know where they are.  You don’t know when they’ll strike next.  The best response?

Blow them all away.

alt-sysrq-e instructs the kernel to send the SIGTERM signal to all processes (with the exception of init).

SIGTERM is a termination signal that can be caught by processes.  Properly written programs will react to the signal by gracefully shutting down.  This may involve flushing buffers, closing any open filehandles, ending persistent HTTP connections, and so-on.

If you happen to have a program open at the time (e.g. a text editor) that program may (or may not) save any recent changes before shutting down.  There is no enforced standard, however, so one cannot generalise as to how individual programs will behave.

Think of alt-sysrq-e as simply meaning “encourage all processes to shut down”.
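To make “properly written programs” a little more concrete, here is a minimal sketch of how a program might react to SIGTERM.  It is an illustration only: ShutdownFlag, install and work_loop are names I have invented, and real applications each do this in their own way.

```python
import signal


class ShutdownFlag:
    """Remembers that somebody has asked us to stop."""

    def __init__(self):
        self.requested = False

    def handle(self, signum, frame):
        # Keep the handler tiny: note the request and let the
        # main loop do the actual tidying up.
        self.requested = True


def install(flag):
    """Register the handler for SIGTERM (must be called from the main thread)."""
    signal.signal(signal.SIGTERM, flag.handle)


def work_loop(flag, do_one_unit_of_work, save_state):
    """Keep working until a stop has been requested, then save and return."""
    while not flag.requested:
        do_one_unit_of_work()
    save_state()  # e.g. flush buffers, close files, write unsaved changes
```

A program would call install(flag) once at start-up and then sit in work_loop.  When alt-sysrq-e (or a plain kill) delivers SIGTERM, the loop finishes its current unit of work, saves, and exits on its own terms.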

Step 3

Not all processes are well-behaved.  Some ignore SIGTERMs completely.  Some are just too slow to react.  The process(es) responsible for freezing your system may actually be in such a state that they are incapable of catching a SIGTERM.  The next step is to obliterate those remaining processes.

alt-sysrq-i instructs the kernel to send the SIGKILL signal to all remaining processes — resulting in their immediate termination.

Whilst SIGTERM is a catchable signal that gives processes a chance to shut down gracefully on their own terms, SIGKILL cannot be caught, blocked, or ignored.  The kernel itself terminates the process, so no handler inside the program ever runs.  Processes terminated in such a way do not exit gracefully: they cannot flush their own buffers or tidy up after themselves, although the kernel still closes their open file descriptors and reclaims their memory.

Think of alt-sysrq-i as simply meaning “insist that all processes shut down”.
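The difference between “encourage” and “insist” is easy to see on a small scale.  The sketch below is an illustration under my own assumptions (the child program and the function name are invented): it starts a child process that deliberately ignores SIGTERM, asks it politely to stop, and then falls back to SIGKILL.

```python
import signal
import subprocess
import sys

# A deliberately badly behaved child: it ignores SIGTERM and just sits there.
STUBBORN_CHILD = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


def terminate_then_kill(grace_seconds=1.0):
    """Return (survived_sigterm, exit_code) for the stubborn child."""
    child = subprocess.Popen(
        [sys.executable, "-c", STUBBORN_CHILD],
        stdout=subprocess.PIPE,
        text=True,
    )
    child.stdout.readline()  # wait until the child is ignoring SIGTERM
    child.terminate()        # polite request: SIGTERM
    try:
        child.wait(timeout=grace_seconds)
        survived = False
    except subprocess.TimeoutExpired:
        survived = True
        child.kill()         # no more asking: SIGKILL
        child.wait()
    child.stdout.close()
    return survived, child.returncode
```

On Linux the exit code comes back as the negated signal number, so a child that had to be killed reports -9.  That is the same two-stage approach as steps 2 and 3, applied to one process instead of all of them.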

At this stage nothing but the kernel (and the init process) is left.

Step 4

Since no random processes are running any more, the system is now completely under the kernel’s control.  No new (unexpected) reads/writes to filesystems can occur.

Flushing out buffers/caches gives the best chance of incomplete filesystem operations actually being completed.  This is called ‘syncing’ memory and storage.

alt-sysrq-s instructs the kernel to sync all mounted filesystems.

At this point we’ve done about as much as we can to maintain data and filesystem integrity.
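Ordinary programs can ask for the same kind of flush without any help from the keyboard.  As a rough sketch (the function names are mine), Python exposes both a per-file flush and a system-wide one:

```python
import os


def write_durably(path, data):
    """Write bytes to a file and push them all the way to storage."""
    with open(path, "wb") as handle:
        handle.write(data)
        handle.flush()             # Python's buffer -> kernel page cache
        os.fsync(handle.fileno())  # kernel page cache -> storage device


def flush_everything():
    """Ask the kernel to write out all pending filesystem buffers."""
    os.sync()
```

flush_everything() is the closest user-space relative of alt-sysrq-s.  The difference is that the system request still works when nothing in user space is able to run.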

Step 5

Filesystems are normally mounted in RW (Read/Write) mode.  Since there is nothing left that should be writing to a filesystem, we want to switch all of them to RO (Read-Only) mode.

alt-sysrq-u instructs the kernel to attempt to remount all mounted filesystems in RO mode.

A filesystem that is read-only when the machine goes down cannot be caught in the middle of a write.  That matters because we have one last command to run, and it does no tidying up of its own.
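On a healthy system you can see which mode each filesystem is in by looking at /proc/mounts, where the fourth field of every line starts with rw or ro.  A small sketch of reading that format (mount_modes is a name invented for this example):

```python
def mount_modes(proc_mounts_text):
    """Map each mount point to 'ro' or 'rw' from /proc/mounts-style text."""
    modes = {}
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        modes[mount_point] = "ro" if "ro" in options else "rw"
    return modes
```

Feeding it the contents of /proc/mounts, for example mount_modes(open("/proc/mounts").read()), gives a quick overview of what alt-sysrq-u would have to change.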

Step 6

All processes are gone, all filesystems are synced and now in a protected RO mode.  It’s time to bounce the box.

alt-sysrq-b instructs the kernel to reboot the system.

Note that ‘b’ does nothing but immediately reboot the system.  It doesn’t sync or unmount filesystems.  That’s why we needed to perform those operations in previous steps.  Skipping previous steps — jumping straight to alt-sysrq-b — is a Bad Idea™.

Step 7

There is no Step 7.  You’re done.  Your system should reboot and you should be back at your desktop in no time.

All computer systems glitch occasionally — they are very complex beasts.  The vast majority of the time the reason is transient and will not recur.  If you use Linux as a desktop OS, and mess with it in the same way that most folks mess with it, then you can probably expect a freeze once or twice a year.  Nothing to worry about — stay calm, reboot, and carry on.

If you are experiencing more frequent freezes, however, then you should really troubleshoot the problem and sort it out.  Lots (and lots) of websites already exist to help troubleshoot specific problems — make use of those.

The purpose of this post was simply to give you a virtually sure-fire way of recovering your system without having to crash it.

  1. alt-sysrq-r
  2. alt-sysrq-e
  3. alt-sysrq-i
  4. alt-sysrq-s
  5. alt-sysrq-u
  6. alt-sysrq-b

Or, more succinctly:

alt – sysrq – r – e – i – s – u – b

Wait a few seconds (say three or four) between typing each of the letters to give a chance for the commands to be completed, and you should be fine.
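If you would like a cheat sheet to print out and stick next to your monitor, the sequence and its pauses can be written down as a small timetable.  This sketch only produces text and sends nothing to the kernel.  The three-second default is simply the pause suggested above, and the names are my own.

```python
REISUB = [
    ("r", "reset the keyboard"),
    ("e", "encourage all processes to shut down (SIGTERM)"),
    ("i", "insist that all processes shut down (SIGKILL)"),
    ("s", "sync all mounted filesystems"),
    ("u", "remount all filesystems read-only"),
    ("b", "reboot"),
]


def timetable(steps=REISUB, delay_seconds=3):
    """Return (seconds_from_start, key, description) for each step."""
    return [
        (index * delay_seconds, key, description)
        for index, (key, description) in enumerate(steps)
    ]


def cheat_sheet(steps=REISUB, delay_seconds=3):
    """Format the timetable as printable lines of text."""
    return "\n".join(
        f"+{offset:>2}s  alt-sysrq-{key}  {description}"
        for offset, key, description in timetable(steps, delay_seconds)
    )
```

print(cheat_sheet()) gives one line per keystroke, in order, with the time at which to press it.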

Take it easy.

PS:  Some distributions block certain system request commands because they can (potentially) be exploited by someone with physical access to your keyboard.  If you issue the command sysctl kernel.sysrq in a terminal and get any numeric answer except for “1” then this is the case on your system: “0” disables system requests entirely, and anything greater than “1” is a bitmask of the groups of commands that are allowed.  Ubuntu 18.04, for example, returns “176” (128 + 32 + 16), which allows only sync, read-only remount and reboot.  That means ‘s’, ‘u’ and ‘b’ work, while ‘r’, ‘e’ and ‘i’ are blocked.  If you want to restore full access to system request commands, use sudo to edit the file /etc/sysctl.conf and uncomment the line that reads #kernel.sysrq=1.  It will take effect the next time you boot; to apply it straight away, run sudo sysctl -w kernel.sysrq=1.
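Values other than 0 and 1 are a bitmask, so a number like 176 can be decoded mechanically.  The sketch below uses the bit meanings as I understand them from the kernel’s SysRq documentation, so treat the table as an assumption and check it against your own kernel’s documentation.  The function names are invented for this example.

```python
# Bit meanings as listed in the kernel's SysRq documentation (assumed here).
SYSRQ_BITS = {
    2: "control of console logging level",
    4: "control of keyboard (SAK, unraw)",
    8: "debugging dumps of processes etc.",
    16: "sync command",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}

# Which bit each key used in this post depends on.
KEY_BITS = {"r": 4, "e": 64, "i": 64, "s": 16, "u": 32, "b": 128}


def allowed_groups(value):
    """List the groups of commands that a kernel.sysrq value permits."""
    if value == 0:
        return []
    if value == 1:
        return list(SYSRQ_BITS.values())
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]


def key_allowed(value, key):
    """Say whether a given key from the sequence would be accepted."""
    if value == 1:
        return True
    return bool(value & KEY_BITS[key])


def current_value(path="/proc/sys/kernel/sysrq"):
    """Read the setting that 'sysctl kernel.sysrq' reports."""
    with open(path) as handle:
        return int(handle.read().strip())
```

For example, allowed_groups(176) lists sync, remount read-only and reboot/poweroff, and key_allowed(176, "e") is False.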
