Sunday, February 24, 2013

Ubuntu Performance – Troubleshooting

http://pinehead.tv/linux/ubuntu-performance-troubleshooting


So now that our Linux distribution is installed, things just don’t seem “right”. Today we will be talking about ways to troubleshoot performance on our Linux installation. Although today’s article will have a decidedly Ubuntu slant, almost everything we will discuss applies equally to most other distributions. If there are distribution-specific notes for any of the commands, I will make an effort to point them out (or feel free to leave anything you notice in the comments and I will include it as appropriate).

Have I Forgotten Anything?
You dropped a bit of money on your configuration, springing for that additional 8 GB of memory, so why is your installation running so slowly? The quickest way to be sure that your system sees all the memory you have installed is the ‘free’ command. By default, this command will list all the memory (physical or swap) it “sees” on your system (keep in mind that this may not be all that is installed, just what the system sees). Here is the cleanest way to see what you are after:
free -h --si
Which will show something like this:
The parameters ‘-h --si’ after our command tell it to display the output in ‘human readable’ format (automatically picking the most appropriate unit: megabytes, gigabytes or terabytes) and to use powers of 1000 rather than 1024 when defining a mega/giga/terabyte. If what you see here does not match what you know is physically installed on your computer, you have a few things you can try. First, be sure you installed the 64-bit version of your distribution if you have more than 4 GB of memory installed; otherwise you will see at most 4 GB, and often somewhat less. If you must have the 32-bit version installed AND need more than 4 GB of memory, you can install a special kernel called ‘PAE’, which stands for ‘Physical Address Extension’. It allows a 32-bit OS to see memory above 4 GB.
If your system still does not see the full amount of memory after you update your kernel to PAE (32-bit) or reinstall your distribution (64-bit), then you will want to shut off your system and reseat your memory. You can also pull every memory module except one and, through the process of elimination, determine whether one of the modules has a hardware fault.
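The checks above can also be scripted. The following is only a minimal sketch, assuming a Linux system that exposes ‘/proc/meminfo’ and ‘/proc/cpuinfo’; the function names ‘mem_total_gib’ and ‘cpu_has_pae’ and their optional file arguments are made up for this example. Expect the reported total to be a little below what is physically installed, since the kernel reserves some memory for itself.

```shell
#!/bin/sh
# Print the total memory the kernel sees, in GiB (1024-based).
# $1 is an optional path to a meminfo-style file (hypothetical convenience
# argument; defaults to /proc/meminfo).
mem_total_gib() {
    awk '/^MemTotal:/ { printf "%.1f\n", $2 / 1048576 }' "${1:-/proc/meminfo}"
}

# Succeed if the CPU advertises the PAE flag.
# $1 is an optional path to a cpuinfo-style file (defaults to /proc/cpuinfo).
cpu_has_pae() {
    grep -m1 '^flags' "${1:-/proc/cpuinfo}" | grep -qw 'pae'
}

# Only report on the live system when /proc is available.
if [ -r /proc/meminfo ] && [ -r /proc/cpuinfo ]; then
    echo "Kernel architecture: $(uname -m)"
    echo "Memory seen by kernel: $(mem_total_gib) GiB"
    if cpu_has_pae; then
        echo "CPU reports PAE support"
    else
        echo "CPU does not report PAE support"
    fi
fi
```

If ‘uname -m’ prints something like ‘i686’ and the memory figure is stuck below 4 GiB, that points at the 32-bit limit rather than at a bad module.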
What Is Going On Anyway?
A more powerful tool for determining what is happening on your new installation is ‘atsar’ (note: most other distributions have the same kind of tool, but it is called ‘sar’ like the original Unix tool). This application can give you statistics on memory, CPU, load, network, threads, sockets, errors, swapping, etc. The quickest way to get a “full” picture is as follows:
atsar -A
This will give you a scattershot readout of pretty much everything on your system like this (the screen shot below is only partial, the full readout is MUCH longer):
The ‘-A’ parameter means ‘show me everything you possibly can’ and can be a good way to get a full system view at a glance to see if anything sticks out (e.g. Are you seeing a lot of swapping? Why is MySQL using all that CPU? Why are there so many threads for Apache? My system load is what?). This can help you zero in on an area where you need a little more information.
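If the overview hints at swapping, one way to confirm it is with ‘vmstat’ from the procps package. This is a rough sketch rather than a replacement for the full report: the function name ‘swap_activity’ is made up for this example, and it assumes the usual ‘vmstat’ layout with a header row naming the ‘si’ (swap-in) and ‘so’ (swap-out) columns.

```shell
#!/bin/sh
# Read vmstat output on stdin and print the total swap-in and swap-out
# over the sampled interval. Columns are located by their header names
# ("si" and "so") rather than by fixed position.
swap_activity() {
    awk '
        $1 == "r" {                       # header row with column names
            for (i = 1; i <= NF; i++) {
                if ($i == "si") si = i
                if ($i == "so") so = i
            }
            next
        }
        si && so && $1 ~ /^[0-9]+$/ {     # numeric sample rows
            if (++rows == 1) next         # first row is the since-boot average
            in_sum += $si
            out_sum += $so
        }
        END { printf "swap-in=%d swap-out=%d\n", in_sum, out_sum }
    '
}

# Example: three samples one second apart on the live system.
if command -v vmstat >/dev/null 2>&1; then
    vmstat 1 3 | swap_activity
fi
```

Values that stay at zero suggest the system is not actively swapping; steadily non-zero values mean memory pressure is worth a closer look.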
Is It Inside or Outside?
We have an indication now that something is going on from our ‘atsar’ report above. The system is spending a lot of time in IO wait, so where is that coming from? Well, IO can be disk related (read/write) or network related (send/receive, for example a slow NFS mount). We can drill down into the disk side of the statistics using ‘iostat’ as follows:
iostat -h -p ALL
To show the following long output:
This will show you (in human readable format, the ‘-h’ parameter again) CPU utilization followed by every block device and partition the system knows about, with their transactions, reads and writes per second. Note that ‘iostat’ covers storage only; it does not report on network interfaces, and whether network filesystems such as NFS appear depends on your sysstat version. If you want to see a constant stream of this information (or, more usefully, output it to a file) over a period of time, add a whole number to the end. For example, ‘iostat -h -t -p ALL 5 > results.txt’ would generate this report once every five seconds, each section with a start time, and save it to a file. Note that this will continue until you ‘CTRL-C’ the process, or kill it if you run it in the background; adding a second number after the interval (for example ‘5 12’) stops it after that many reports.
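To cross-check the IO wait figure without any extra packages, the kernel’s own counters in ‘/proc/stat’ can be sampled twice and compared. This is a minimal sketch under that assumption; the function name ‘iowait_percent’ is made up for this example, and it only looks at the aggregate ‘cpu’ line.

```shell
#!/bin/sh
# Print the share of CPU time spent in iowait between two samples of the
# aggregate "cpu" line from /proc/stat.
# $1 is the earlier line, $2 the later one. The counters after the "cpu"
# label are: user nice system idle iowait irq softirq steal ...
iowait_percent() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total[NR] = 0
            # Sum user..steal (fields 2-9); guest time is already part of user.
            for (i = 2; i <= 9 && i <= NF; i++) total[NR] += $i
            iow[NR] = $6                  # iowait is the 5th counter
        }
        END {
            dt = total[2] - total[1]
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (iow[2] - iow[1]) / dt
        }
    '
}

# Example: sample the live system two seconds apart.
if [ -r /proc/stat ]; then
    before=$(grep '^cpu ' /proc/stat)
    sleep 2
    after=$(grep '^cpu ' /proc/stat)
    echo "iowait over the last 2s: $(iowait_percent "$before" "$after")%"
fi
```

A figure that stays high across several samples is a reasonable cue to go back to the per-device numbers and see which disk is busy.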
At this point (memory, CPU, load, network, disk, IO), you should have some idea as to what is happening on your system. Don’t forget to use our good old friend ‘top’ to see exactly what processes are running. This will help you correlate what is running with the performance metrics you have observed during our exercises.
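When a one-off snapshot is more convenient than the interactive ‘top’ screen (for example, to paste into a note), ‘ps’ can produce a similar list. This is a sketch that assumes the procps version of ‘ps’ found on Ubuntu, since ‘--sort’ is not available everywhere; the function names ‘top_cpu’ and ‘io_blocked’ are made up for this example.

```shell
#!/bin/sh
# Print the header plus the N busiest processes by CPU (default 5).
top_cpu() {
    ps -eo pid,pcpu,pmem,stat,comm --sort=-pcpu | head -n "$(( ${1:-5} + 1 ))"
}

# Print processes in uninterruptible sleep (state D). These are usually
# blocked on IO, so they tend to line up with high iowait readings.
io_blocked() {
    ps -eo pid,stat,comm | awk 'NR == 1 || $2 ~ /^D/'
}

top_cpu 5
io_blocked
```

An empty list from ‘io_blocked’ (just the header) is normal on a healthy system; the same few names showing up repeatedly are the ones to investigate.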
Locked Up Plain and Simple
Sometimes, especially when troubleshooting, you will find that you have done something to make the situation worse (you killed off the wrong process, which locked up X, etc). You don’t seem to be able to do anything at all. So, you can always do the ‘CTRL-ALT-F1’ drill and see if you can get a plain text shell. If you can, you can simply reboot and try again (‘sudo reboot’). Sometimes, even that won’t work (and it seems to be broken more often than not in Ubuntu 12.04/12.10, particularly when running Unity for some reason).
Here is a little-known trick that will save you from having to power off your system and hope that filesystem journaling in EXT3/4 saves your bacon from file corruption: REISUB. This is the safest alternative to a cold boot and almost always works no matter how ‘locked’ your system is. You perform this feat of magic as follows:
While holding the ALT and SysRq (Print Screen) keys, type R E I S U B
Now, a few considerations. First, your keyboard has to have that key; some more modern or compact keyboards have eliminated it. If yours doesn’t have it, this won’t work for obvious reasons. Second, don’t just type those letters as fast as possible. Since they each perform an action, allow five seconds or so between each one so they can complete their work. Third, the kernel’s magic SysRq feature has to be enabled for these keys (the ‘kernel.sysrq’ setting), and some distributions restrict it by default. Specifically, the letters stand for:

R = Switch the keyboard from raw mode to XLATE mode
E = Send the terminate signal (SIGTERM) to all running processes except init
I = Send the kill signal (SIGKILL) to all processes except init (for those that don't respond to terminate)
S = Sync all filesystems
U = Remount all filesystems read-only
B = Reboot the system

This little trick has been forgotten, almost lost to history. I find the easiest way to remember the sequence is Reboot Even If System Utterly Broken. I have heard others refer to it as ‘BUSIER’ backwards, but that just seems too easy for me.
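Whether each of those keys is honoured depends on the kernel’s ‘kernel.sysrq’ setting, which is worth checking before you need it. The following is a sketch based on the bit values given in the kernel’s SysRq documentation; the function name ‘sysrq_allows’ is made up for this example. Some Ubuntu releases are reported to ship a default of 176, which would permit only S, U and B.

```shell
#!/bin/sh
# Succeed if SysRq bitmask $1 permits the REISUB key $2 (lowercase letter).
# Bit values, per the kernel's sysrq documentation:
#   1 = all functions, 4 = keyboard control (r), 64 = signal processes (e, i),
#   16 = sync (s), 32 = remount read-only (u), 128 = reboot (b).
sysrq_allows() {
    mask=$1
    case $2 in
        r)   bit=4 ;;
        e|i) bit=64 ;;
        s)   bit=16 ;;
        u)   bit=32 ;;
        b)   bit=128 ;;
        *)   return 2 ;;                  # not a REISUB key
    esac
    [ "$mask" -eq 1 ] || [ $(( mask & bit )) -ne 0 ]
}

# Report which REISUB keys the running kernel will honour.
if [ -r /proc/sys/kernel/sysrq ]; then
    current=$(cat /proc/sys/kernel/sysrq)
    echo "kernel.sysrq = $current"
    for key in r e i s u b; do
        if sysrq_allows "$current" "$key"; then
            echo "$key: enabled"
        else
            echo "$key: disabled"
        fi
    done
fi
```

If some keys show as disabled, ‘sudo sysctl kernel.sysrq=1’ enables all of them until the next reboot.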
Final Thoughts
Like many things in the internet age, the ability to troubleshoot problems is becoming a lost art. Just like this article, there are hundreds of places to look up the answer you need. However, just knowing some of the basic commands and the sequence of effective troubleshooting can save you time and may just get you that next position. Hit us up in the comments below with your experiences and tips and maybe we can follow this up with another article on the topic.
