Sunday, April 4, 2010

Using smartctl to get SMART status information on your hard drives

Computer hard drives today come with SMART (Self-Monitoring, Analysis, and Reporting Technology) built-in, which allows you to see the status or overall “health” of a hard drive.

This information is invaluable in providing early warning signs of problems with a hard drive.

All Linux distributions provide the smartmontools package, which contain the smartctl program used to display SMART information from attached drives.

This package also provides the smartd daemon which periodically polls the drives to obtain SMART information.

Using smartd is essential as it can let you know immediately when a SMART attribute fails.

To begin, edit /etc/smartd.conf and add entries for your drives:
/dev/sda -d ata -H -m root
/dev/sdb -d ata -H -m root
...

The above tells smartd to perform a very silent check and to email the root user if the overall SMART health status fails.

It also tells smartd that these are ATA devices. There are a number of other options that can be added as well; the smartd.conf file has examples of these.

When smartd is configured, make sure to enable the monitoring daemon if it is not already started. On a Red Hat Enterprise Linux system, use:
# chkconfig smartd on
# service smartd start

The smartctl program also allows for you to view and test SMART attributes of a drive. You can quickly check the overall health of a drive by using:
# smartctl -H /dev/sda
smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

Obtaining information on the drive is useful as well. With the -i option, you can view the type of drive, its serial number, and so forth.

In a system with a lot of drives, having this information recorded can assist in knowing which drive device (i.e., /dev/sda) corresponds with which physical drive. For instance:
# smartctl -i /dev/sda
smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.10 family
Device Model:     ST3320620AS
Serial Number:    9QF26NGD
Firmware Version: 3.AAJ
User Capacity:    320,072,933,376 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sun Mar  7 14:20:18 2010 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Next, the -a option shows the specifics of the SMART attributes and test history.

This shows various SMART status information, such as the drive temperature, how many hours it has been powered on, and so forth.

It also indicates when tests have been performed and what the results of those tests were.

Finally, smartctl can be used to initiate long and short tests for the drive. These should be run periodically to do quick, or full, self-tests of the drive:
# smartctl --test=short /dev/sda
# smartctl --test=long /dev/sda
# smartctl -a /dev/sda

The above will first perform a short test of the /dev/sda device. This usually takes about a minute to perform, and the smartctl output will tell you when you can check the results.

Next, the long test: this one can take quite a bit longer (about two hours here on a 320GB SATA drive).

Finally, use the -a option to view the results, which may look like this:
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     17877         -
# 2  Extended offline    Completed without error       00%      8449         -
# 3  Short offline       Completed without error       00%      8446         -
# 4  Short offline       Completed without error       00%      1307         -
# 5  Short offline       Completed without error       00%         2         -
# 6  Extended offline    Self-test routine in progress 90%     17877         -

In the above example, tests have been run over the lifetime of the drive and the short offline test was recently completed without error, while there is still 90% of the extended test remaining.

Being pro-active with the health of your hard drives can pay off huge by being aware of most problems before they can lead to catastrophic failures.

While SMART monitoring is not an exact thing (it won’t always report failures before they occur), the chances of catching a problem and being able to retrieve data before replacing a drive are more likely to not result in data loss or other problems than if you did not use it.

No comments:

Post a Comment