A hard disk drive (HDD) can suffer several types of failures: logical failures, mechanical failures or firmware failures. Since we cannot avoid mechanical or firmware failures, we will try to avoid (or minimize) logical failures.
It’s good practice to check the health of your hard disk drives from time to time and repair them if necessary. It can save you a lot of data loss and headaches.
The process can take anywhere from a few minutes to a few hours, but it’s worth it. Also, unless it is your main HDD, you can continue working while the disk is being checked and fixed.
How can we check and fix our HDD in GNU/Linux?
Unmounting the disk
First of all, we need to know the device assigned to the disk we want to check. You can find it using fdisk -l or lsblk. Let’s say our disk is /dev/sdb.
fsck needs the filesystem to be unmounted, so we first unmount every mounted partition of the disk, for example /dev/sdb1:
$ sudo umount /dev/sdb1
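Before unmounting, it can help to see which partitions of the disk are actually mounted. Here is a minimal sketch, assuming a Linux system where /proc/mounts lists the mounted devices. The function name mounted_parts is made up for this example, and the match is a simple prefix comparison on the device name.

```shell
# List the mounted partitions that belong to a disk.
# Usage: mounted_parts DISK [MOUNTS_FILE]
# MOUNTS_FILE defaults to /proc/mounts (assumption: Linux only).
mounted_parts() {
    disk=$1
    mounts=${2:-/proc/mounts}
    # Column 1 is the device, column 2 is the mount point.
    # Keep the lines whose device name starts with the disk name.
    awk -v disk="$disk" 'index($1, disk) == 1 { print $1, $2 }' "$mounts"
}
```

Running mounted_parts /dev/sdb prints one line per mounted partition, and each of them can then be passed to umount.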
Check hard drive health using smartctl
smartctl is a utility included in the smartmontools package. It reads the S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology) attributes of the HDD, and it is the utility that we’ll use to run some tests and check the overall status of our HDD.
Check if SMART is enabled
$ sudo smartctl -i /dev/sdb | grep support
Our output should be:
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
But if SMART is available on our disk and disabled, we should enable it with:
$ sudo smartctl -s on /dev/sdb
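The check above can also be scripted. The sketch below reads the output of smartctl -i on standard input and reports the SMART state. The function name smart_state is hypothetical, and the matching relies only on the "SMART support is:" lines that smartctl prints for ATA disks.

```shell
# Report the SMART state from `smartctl -i` output read on stdin.
# Prints "enabled", "disabled" or "unavailable".
smart_state() {
    info=$(cat)
    case $info in
        *"SMART support is: Enabled"*)  echo enabled ;;
        *"SMART support is: Disabled"*) echo disabled ;;
        *)                              echo unavailable ;;
    esac
}
```

For example, sudo smartctl -i /dev/sdb | smart_state prints a single word that a script can test before deciding whether to run smartctl -s on.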
Check the disk status
$ sudo smartctl -a /dev/sdb
There’s a lot of info displayed, but we should pay special attention to the following fields:
- Reallocated_Sector_Ct (Reallocated Sectors Count): The raw value represents a count of the bad sectors that have been found and remapped.
- Power_On_Hours: Count of hours in power-on state. It does not indicate errors, but it gives an idea of how much the disk has been used.
- Reported_Uncorrect: Reported Uncorrectable Errors. The count of errors that could not be recovered using hardware.
- Current_Pending_Sector: Count of “unstable” sectors (waiting to be remapped, because of unrecoverable read errors).
If the RAW_VALUE is greater than 0 for any of the error counters (every field above except Power_On_Hours, which is always greater than 0), we should back up our files (if necessary) and fix the disk later.
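This rule is easy to automate. The following sketch filters the attribute table printed by smartctl -A (or -a) and prints only the watched counters whose raw value is greater than 0. It assumes the usual 10-column ATA attribute layout, with the attribute name in column 2 and RAW_VALUE in column 10; the function name smart_warnings is made up for illustration.

```shell
# Print watched SMART attributes whose RAW_VALUE is greater than 0.
# Reads the attribute table from `smartctl -A` on stdin.
# Assumption: 10-column ATA layout (name in column 2, RAW_VALUE in column 10).
smart_warnings() {
    awk '
        $2 == "Reallocated_Sector_Ct" ||
        $2 == "Reported_Uncorrect"    ||
        $2 == "Current_Pending_Sector" {
            # Adding 0 forces a numeric comparison of the raw value.
            if ($10 + 0 > 0) print $2 "=" $10
        }
    '
}
```

A typical call would be sudo smartctl -A /dev/sdb | smart_warnings. No output means none of the three counters has a raw value above 0.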
Estimate the test time
The smartctl utility can perform a variety of tests:
- offline: Runs the SMART Immediate Offline Test, which updates the SMART attribute values. Any errors found show up in the SMART error log.
- short: Runs SMART Short Self Test (usually under ten minutes).
- long: Runs the SMART Extended Self Test, a more thorough version of the “short” test. It can take a few hours.
- conveyance: Checks for damage that may have occurred while the device was being transported (ATA disks only). It should take a few minutes.
We can find out the estimated duration of the tests by running:
$ sudo smartctl -c /dev/sdb
Test the disk
I prefer to run the long test, as it gives us a better picture of the overall disk health.
$ sudo smartctl -t long /dev/sdb
Once the test has started, smartctl will tell us how long it needs to complete:
Please wait 303 minutes for test to complete.
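A figure like 303 minutes is easier to read as hours and minutes. The small helper below extracts the number from the "Please wait N minutes" message shown above and converts it. The function name wait_time is hypothetical, and the sketch only handles messages that are expressed in minutes.

```shell
# Read smartctl output on stdin, find the "Please wait N minutes" line
# and print N as hours and minutes. Returns 1 if no such line is found.
wait_time() {
    minutes=$(sed -n 's/^Please wait \([0-9][0-9]*\) minutes.*/\1/p' | head -n 1)
    [ -n "$minutes" ] || return 1
    echo "$((minutes / 60))h $((minutes % 60))m"
}
```

With the message above, echo "Please wait 303 minutes for test to complete." | wait_time prints 5h 3m.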
smartctl also reminds us that we can cancel the test at any time:
Use smartctl -X to abort test.
After the time specified by smartctl, we can check the test results with:
$ sudo smartctl -a /dev/sdb
or
$ sudo smartctl -l selftest /dev/sdb
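If we want a script to decide whether the last test passed, we can look at the newest entry of the self-test log. This is only a sketch: it assumes the ATA-style log format, where entry "# 1" is the most recent test and a successful run is reported as "Completed without error". The function name last_selftest_ok is made up for this example.

```shell
# Succeed (exit status 0) if the most recent self-test finished without error.
# Reads `smartctl -l selftest` output on stdin.
# Assumption: ATA-style log lines, where entry "# 1" is the newest test.
last_selftest_ok() {
    grep '^# *1 ' | head -n 1 | grep -q 'Completed without error'
}
```

It can be used as sudo smartctl -l selftest /dev/sdb | last_selftest_ok && echo "last test OK".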
Fix the filesystem using fsck
The partition must be unmounted before we run fsck on it.
fsck (File System Consistency Check) comes by default on GNU/Linux distributions. It is used to check and, optionally, repair one or more Linux filesystems.
Check the partitions
Let’s say we want to repair our /dev/sdb1 partition.
Sometimes the disk is marked as clean, but we know for sure that the disk has some damage, because we had errors using it. So, we can force a check on the partition:
$ sudo fsck -f /dev/sdb1
Don’t worry, this check is usually fast ;)
Fix the filesystem automatically
The most comfortable way to repair the filesystem is to do it in “autopilot mode”, that is, automatically. We can do this in two ways:
- Automatic repair (no questions):
$ sudo fsck -p /dev/sdb1
- Assume “yes” to all questions:
$ sudo fsck -y /dev/sdb1
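When fsck runs without asking questions, its exit status is the easiest way to know what happened. According to fsck(8), the status is a sum of bit flags. The sketch below decodes it into readable messages; the function name fsck_status is hypothetical.

```shell
# Decode an fsck exit status, which is a sum of the bit flags
# documented in fsck(8). Prints one message per flag that is set.
fsck_status() {
    code=$1
    [ "$code" -eq 0 ] && { echo "no errors"; return 0; }
    [ $((code & 1)) -ne 0 ]   && echo "errors corrected"
    [ $((code & 2)) -ne 0 ]   && echo "system should be rebooted"
    [ $((code & 4)) -ne 0 ]   && echo "errors left uncorrected"
    [ $((code & 8)) -ne 0 ]   && echo "operational error"
    [ $((code & 16)) -ne 0 ]  && echo "usage or syntax error"
    [ $((code & 32)) -ne 0 ]  && echo "canceled by user request"
    [ $((code & 128)) -ne 0 ] && echo "shared-library error"
    return 0
}
```

For example, we could run sudo fsck -p /dev/sdb1 followed by fsck_status $? to see whether errors were corrected or left uncorrected.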
Enjoy! ;)