Maintenance of HDD in GNU/Linux

A hard disk drive (HDD) can suffer several types of failures: logical failures, mechanical failures or firmware failures. Since we cannot avoid mechanical or firmware failures, we will try to avoid (or minimize) logical failures.

It’s good practice to check the health of your hard disk drives from time to time and repair them if necessary. Doing so can save you a lot of data loss and headaches.

The process can take anywhere from a few minutes to a few hours, but it’s worth it. Also, unless it is your main HDD, you can continue working while the disk is being checked and fixed.

How can we check and fix our HDD in GNU/Linux?

Unmounting the disk

First of all, we need to know the device assigned to the disk we want to check. We can find it using fdisk -l or lsblk. Let’s say our disk is /dev/sdb and its partition is /dev/sdb1.

Because the filesystem must be unmounted before fsck can run, we unmount it now. Note that umount works on filesystems, so we pass the partition (or its mount point) rather than the whole disk:

$ sudo umount /dev/sdb1
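Before unmounting, it can help to see which partitions of the disk are actually mounted. The following is only a minimal sketch: the helper name mounted_parts is made up for illustration, and it assumes it receives the output of lsblk -nrpo NAME,MOUNTPOINT, where each line holds a device path followed by its mount point (empty when nothing is mounted).

```shell
# Hypothetical helper: print the devices that currently have a mount point.
# Expects "NAME MOUNTPOINT" lines on stdin, as printed by:
#   lsblk -nrpo NAME,MOUNTPOINT /dev/sdb
mounted_parts() {
  # A line with two or more fields has a non-empty mount point.
  awk 'NF >= 2 { print $1 }'
}

# Example usage (commented out because it needs the real disk):
#   lsblk -nrpo NAME,MOUNTPOINT /dev/sdb | mounted_parts
```

Swap partitions show up with [SWAP] as their mount point; those need swapoff rather than umount.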

Check hard drive health using smartctl

smartctl is a utility included in the smartmontools package. It reads the S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology) attributes of the HDD, and it is the tool we’ll use to run some tests and check the overall status of our disk.

Check if SMART is enabled

$ sudo smartctl -i /dev/sdb | grep support

Our output should be:

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

However, if SMART is available but disabled on our disk, we should enable it with:

$ sudo smartctl -s on /dev/sdb
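If we want to script this check, a small filter over the smartctl -i output is enough. This is a sketch under assumptions: smart_state is a hypothetical helper name, and the "Disabled" wording is assumed to mirror the "Enabled" line shown above.

```shell
# Hypothetical helper: summarize the SMART state from `smartctl -i` output
# read on stdin. Prints "enabled", "disabled" or "unknown".
smart_state() {
  awk '
    /^SMART support is:/ {
      if ($4 == "Enabled")  state = "enabled"
      if ($4 == "Disabled") state = "disabled"   # assumed wording
    }
    END { print (state == "" ? "unknown" : state) }
  '
}

# Example usage (commented out because it needs root and a real disk):
#   sudo smartctl -i /dev/sdb | smart_state
```

The "Available" line is ignored on purpose, since only the second line tells us whether SMART is switched on.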

Check the disk status

$ sudo smartctl -a /dev/sdb

A lot of information is displayed, but we should pay special attention to the following attributes:

  • Reallocated_Sector_Ct (Reallocated Sectors Count): The raw value represents a count of the bad sectors that have been found and remapped.
  • Power_On_Hours: Count of hours in power-on state. It’s not useful to check for errors, but it’s useful to get an idea of the hours of life that the disk has left.
  • Reported_Uncorrect: Reported Uncorrectable Errors. The count of errors that could not be recovered using hardware.
  • Current_Pending_Sector: Count of “unstable” sectors (waiting to be remapped, because of unrecoverable read errors).

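If the RAW_VALUE of Reallocated_Sector_Ct, Reported_Uncorrect or Current_Pending_Sector is greater than 0, we should back up our files (if necessary) and fix the disk later. Power_On_Hours does not count here, as it is naturally greater than 0 on any disk that has been used.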
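The check on the error counters can be sketched as a small filter. The helper name failing_attrs is hypothetical, and the sketch assumes the usual ATA attribute table printed by smartctl -A, with the attribute name in column 2 and RAW_VALUE in column 10.

```shell
# Hypothetical helper: list the watched attributes whose RAW_VALUE is above 0.
# Expects the attribute table from `smartctl -A` on stdin (assumed layout:
# attribute name in column 2, RAW_VALUE in column 10).
failing_attrs() {
  awk '
    $2 == "Reallocated_Sector_Ct" ||
    $2 == "Reported_Uncorrect" ||
    $2 == "Current_Pending_Sector" {
      if ($10 + 0 > 0) print $2 "=" $10
    }
  '
}

# Example usage (commented out because it needs root and a real disk):
#   sudo smartctl -A /dev/sdb | failing_attrs
```

An empty result means none of the three counters has moved. Not every disk reports all of these attributes, so a missing line is not an error in itself.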

Estimate the test time

The smartctl utility can perform a variety of tests:

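  • offline: Runs the SMART Immediate Offline Test, which updates the SMART attribute values and records any errors it finds in the SMART error log.
  • short: Runs the SMART Short Self Test (usually under ten minutes).
  • long: Runs the SMART Extended Self Test, a longer and more thorough version of the “short” test. It can take a few hours.
  • conveyance: Runs the SMART Conveyance Self Test (ATA disks only), which looks for damage incurred while the device was being transported. It should take a few minutes.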

We can find out the estimated duration of each test by running:

$ sudo smartctl -c /dev/sdb
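The capabilities report is long, so a filter that keeps only the durations can be handy. This is a rough sketch: test_durations is a hypothetical helper, and it assumes each "Short/Extended/Conveyance self-test routine" line is followed by a "recommended polling time: ( N) minutes." line, which may not hold for every disk or smartctl version. "Extended" is the routine started by the long test.

```shell
# Hypothetical helper: print "<Routine> <minutes>" for each self-test routine.
# Expects `smartctl -c` output on stdin (assumed layout described above).
test_durations() {
  awk '
    /^(Short|Extended|Conveyance) self-test routine/ { name = $1; next }
    /recommended polling time:/ && name != "" {
      gsub(/[^0-9]/, "")   # keep only the digits of the minutes value
      print name, $0
      name = ""
    }
  '
}

# Example usage (commented out because it needs root and a real disk):
#   sudo smartctl -c /dev/sdb | test_durations
```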

Test the disk

I prefer to run the long test, as it gives us a better picture of the overall disk health.

$ sudo smartctl -t long /dev/sdb

Once started, smartctl tells us how long the test will take to complete:

Please wait 303 minutes for test to complete.

It also reminds us that we can cancel the test at any time:

Use smartctl -X to abort test.
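A wait given in minutes is easier to plan around in hours. As a small sketch (wait_time is a hypothetical helper, and it relies on the exact "Please wait N minutes" wording shown above):

```shell
# Hypothetical helper: turn the "Please wait N minutes" line into "Hh Mm".
# Expects the output of `smartctl -t long` on stdin.
wait_time() {
  awk '/^Please wait [0-9]+ minutes/ {
    printf "%dh %dm\n", int($3 / 60), $3 % 60
  }'
}

# Example usage (commented out because it starts a real test):
#   sudo smartctl -t long /dev/sdb | wait_time
```

For the 303 minutes in the sample line, this prints 5h 3m.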

Once the time given by smartctl has passed, we can check the test results with:

$ sudo smartctl -a /dev/sdb

or

$ sudo smartctl -l selftest /dev/sdb
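To use the result in a script, we can look at the newest entry of the self-test log. This sketch makes two assumptions that apply to ATA disks and may not hold elsewhere: the newest entry is the line numbered "# 1", and a clean run is reported as "Completed without error". The helper name last_selftest_ok is made up for illustration.

```shell
# Hypothetical helper: succeed only if the most recent self-test finished
# without error. Expects `smartctl -l selftest` output on stdin.
last_selftest_ok() {
  # Keep the line numbered "# 1", then look for the success status in it.
  grep -E '^# *1 ' | grep -q 'Completed without error'
}

# Example usage (commented out because it needs root and a real disk):
#   sudo smartctl -l selftest /dev/sdb | last_selftest_ok && echo "last test OK"
```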

Fix the filesystem using fsck

The filesystem must be unmounted before we run fsck.

fsck (File System Consistency Check) comes by default on GNU/Linux distributions. It is used to check and, optionally, repair one or more Linux filesystems.

Check the partitions

Let’s say we want to repair our /dev/sdb1 partition.

Sometimes the filesystem is marked as clean, but we know for sure that it has some damage, because we had errors using it. In that case, we can force a check on the partition (on ext2/ext3/ext4 filesystems, fsck passes the -f option on to e2fsck):

$ sudo fsck -f /dev/sdb1

Don’t worry, this test is fast ;)

Fix the filesystem automatically

The most comfortable way to repair the filesystem is to do it in “autopilot mode”, that is, automatically. We can do this in two ways:

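  • Automatic repair (no questions). On ext2/ext3/ext4 filesystems this fixes only the problems that can be safely corrected without human intervention:
    
    $ sudo fsck -p /dev/sdb1
    
  • Assume “yes” to all questions:
    
    $ sudo fsck -y /dev/sdb1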
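When fsck runs unattended, its exit status tells us what happened. According to the fsck man page, the status is a sum of flags, which the following sketch decodes. The helper name explain_fsck_status is hypothetical and the messages are paraphrased.

```shell
# Hypothetical helper: describe an fsck exit status, which is a sum of flags.
explain_fsck_status() {
  status=$1
  if [ "$status" -eq 0 ]; then
    echo "no errors"
    return 0
  fi
  # Print one line for every flag that is set in the status.
  while read -r bit text; do
    if [ $((status & bit)) -ne 0 ]; then
      echo "$text"
    fi
  done <<'EOF'
1 filesystem errors corrected
2 system should be rebooted
4 filesystem errors left uncorrected
8 operational error
16 usage or syntax error
32 checking canceled by user request
128 shared-library error
EOF
}

# Example usage (commented out because it needs root and a real partition):
#   sudo fsck -p /dev/sdb1; explain_fsck_status $?
```

A status that includes 4 means some errors were left uncorrected, which is a good moment to make sure our backup is up to date.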
    

Enjoy! ;)

This post is licensed under CC BY 4.0 by the author.