Tuesday, June 13, 2006

System Administrators Toolkit: Monitoring a slow system

The console is more likely to allow you to log in, because there will already be a login process (which will be replaced with your shell) running. If, once you log in, you are unable to run any processes through your shell, it indicates that your system has run out of process space; a reboot will probably be the only way of returning your system to normal.

Connecting to the console via the serial port is very important when dealing with servers. There have been occasions when a system wasn't responding remotely and I had to log in via the serial port to kill a process. Usually it's a runaway process taking up too much memory and preventing other processes from spawning (hence, you can't ssh into the server). It's quite easy to redirect console output to a serial port in Linux. However, console redirection alone doesn't let you power down a server or reboot one that doesn't respond.
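Once you are in on the console, the first job is finding the process that is eating the memory. Here is a rough sketch of how that could be scripted in Python, assuming the box still has enough room to start an interpreter and that `ps` accepts the standard `-eo pid,rss,comm` options. The function names `parse_ps` and `top_memory` are made up for illustration.

```python
import subprocess


def parse_ps(output):
    """Parse `ps -eo pid,rss,comm` output into (pid, rss_kib, command) tuples."""
    rows = []
    # The first line is the column header, so skip it.
    for line in output.strip().splitlines()[1:]:
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # ignore lines that don't have all three columns
        pid, rss, comm = parts
        rows.append((int(pid), int(rss), comm.strip()))
    return rows


def top_memory(rows, n=5):
    """Return the n rows with the largest resident set size, biggest first."""
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]


if __name__ == "__main__":
    # Assumption: a procps-style ps that reports RSS in kilobytes.
    result = subprocess.run(
        ["ps", "-eo", "pid,rss,comm"],
        capture_output=True, text=True, check=True,
    )
    for pid, rss, comm in top_memory(parse_ps(result.stdout)):
        print(f"{pid:>7} {rss:>10} KiB  {comm}")
```

With the PID in hand, the offender can be stopped with `kill`. If the system is too far gone to start Python at all, running `ps` and `kill` by hand from the console shell is the fallback.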

Some Sun servers come with Lights Out Management (LOM), which allows you to power on or restart a server from the console over a serial port. Very helpful for administration purposes.

