Rescue Crashed Synology Volume

One of the 1.5TB WD15EARS (Western Digital Green) drives on the Synology DS410 running firmware DSM 4.2-3211 failed earlier this week. The RAID5 volume recovery process ended up being a little tricky thanks to ext3 journal corruption on the actual filesystem “/volume1” block device /dev/md2.


The standard steps for replacing the failed drive are simple:

  1. Turn off the beeping sound
  2. Remove the failed drive
  3. Replace the failed drive with one of the same or greater storage capacity
  4. Run an “extended” SMART test on the replacement drive. The test takes about a day to complete, followed by another day for the RAID sync
  5. Repair the volume
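
While the repair runs, progress can also be watched from an SSH session instead of the DSM web UI. The following is only a rough sketch: it assumes the usual Linux software RAID layout of /proc/mdstat, and `md_report` is a made-up helper name, not a Synology tool.

```shell
#!/bin/sh
# Report md arrays that are degraded or rebuilding, by parsing mdstat text.
# Usage: md_report [FILE]   (FILE defaults to /proc/mdstat)
md_report() {
    awk '
        # Array header, e.g. "md2 : active raid5 sda3[0] sdb3[1] ..."
        /^md[0-9]+ :/ { dev = $1 }

        # Member map, e.g. "[4/3] [UUU_]"; an underscore marks a missing disk
        dev != "" && /\[[U_]+\]/ {
            state = ($0 ~ /\[[U_]*_[U_]*\]/) ? "degraded" : "ok"
            print dev, state
        }

        # Progress line, e.g. "recovery = 12.6% (...) finish=127.5min"
        dev != "" && /(recovery|resync) *=/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) print dev, "rebuilding", $i
        }
    ' "${1:-/proc/mdstat}"
}
```

Calling `md_report` with no argument reads the live /proc/mdstat; passing a file name makes it easy to try against a saved copy first.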

After replacing the failed drive and repairing the RAID volume, “/volume1” ended up with ext3 journal inconsistencies. An “e2fsck” had to be manually performed to check the filesystem for inconsistencies and recover the journal from the command line.

  1. Shut down all running processes
  2. Unmount /volume1 (on /dev/md2)
  3. Unmount /opt (also on /dev/md2). A running sshd prevented me from unmounting /opt. The fix was to run an alternate sshd from /usr/sbin/ on an alternate port and kill the original sshd running from /opt/
  4. Run e2fsck -v -y -f /dev/md2. The fsck of the 4TB volume took a full day
  5. Reboot
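
The steps above can be sketched as a small script. This is an illustration, not a tested recovery tool: the port 2222 and the helper names `run` and `rescue_volume` are assumptions, and stopping services and killing the original sshd are left as manual steps because they depend on what is running. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the manual fsck sequence. Prints each command by default;
# set DRY_RUN=0 to execute for real (as root, ideally with a backup in hand).
DEV=/dev/md2        # block device behind /volume1, as in the post
ALT_PORT=2222       # assumption: any free port for the temporary sshd

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rescue_volume() {
    # Start a second sshd from the root filesystem so a session survives
    # once the sshd living under /opt is killed.
    run /usr/sbin/sshd -p "$ALT_PORT"
    # Manual step: reconnect on $ALT_PORT, stop services, kill the old sshd.

    # Stop immediately if an unmount fails; never fsck a mounted filesystem.
    run umount /volume1 || return 1
    run umount /opt || return 1

    # -f forces a check, -y answers yes to every prompt, -v is verbose
    run e2fsck -v -y -f "$DEV"
    run reboot
}
```

Running `rescue_volume` with the default settings lists the five commands in order, which is a cheap way to review them before committing to a day-long fsck.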

The volume certainly looks healthy after the fsck and no more beeping sounds. More importantly, my data appears to be intact.


Learnings from the disk failure:

  1. Ensure lsof is installed on the NAS. It greatly speeds up identifying the processes that are holding a volume open and preventing a clean unmount. Package installation won't be possible once /opt goes read-only thanks to journal corruption
  2. Force a full fsck regularly to ensure file systems are healthy
  3. Rather than have one large 4TB RAID5 volume spanning 4 disks, have 2 smaller volumes to speed up fsck times and spread the risk.
  4. Have a 3TB external drive to back up critical data before attempting a repair
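
With lsof installed, `lsof /volume1` lists the open files on that filesystem. If lsof is missing and can no longer be installed, a crude fallback is to walk /proc directly. The sketch below assumes a Linux /proc and a shell with `readlink`; `holders` is a hypothetical helper name, and it needs root to see other users' processes.

```shell
#!/bin/sh
# List PIDs whose working directory or open files live under a directory.
# Usage: holders /volume1
holders() {
    target=${1%/}                           # strip one trailing slash
    for proc in /proc/[0-9]*; do
        for link in "$proc"/cwd "$proc"/fd/*; do
            # Skip links that vanished or that we are not allowed to read
            path=$(readlink "$link" 2>/dev/null) || continue
            case $path in
                "$target"|"$target"/*)
                    echo "${proc#/proc/}"   # print the PID once
                    break ;;                # move on to the next process
            esac
        done
    done
}
```

Each PID printed is a candidate to stop before retrying the unmount. Unlike lsof, this does not show memory-mapped files or which file is open, so treat it as a last resort.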

Update (20130927): A 2nd WD hard disk failed earlier this week causing another round of journal corruption and re-syncing. At this point, it made more sense to perform a full backup and restore over a fresh install of DSM with SHR and an extended SMART test. I am replacing the failed 1.5TB drives with 3TB WD Caviar Greens and SHR can make use of the additional space automatically.


Shanker Balan

Shanker Balan is a devops and infrastructure freelancer with over 14 years of industry experience in large scale Internet systems. He is available for both short term and long term projects on contract. Please use the Contact Form for any enquiry.


14 thoughts on “Rescue Crashed Synology Volume”

  1. awesome shanker balan is awesome! I tried your steps on friday and let it go over the weekend. now Im back in office and restarted the NAS (lost the PUTTY session from CH to USA) and its up and running again. All the data’s back again. Man I’m so thankful. You’re awesome! Thanks so much

    By the way: On friday I made a Ticket to Synology. Nothing heard until now 😉

  2. Hello Shanker, my Synology 1512 volume crashed. 1 disk is damaged and one is not initialised anymore. I cannot access my data anymore. 3 drives are healthy. I had set up a RAID 5 to be able to replace 2 drives.

    Can you help me? I already bought a new disk to replace the damaged one, but I am not sure if I can touch the system.

    Best wishes and thanks for reading

  3. Hi Theo,

    Thanks for your comments. Without knowing what exactly is wrong I am afraid any advice I give might be harmful. I have had good success by just replacing the disks one at a time.

    Hth.

  4. Shankar Balan, Volume Creation always fails in a fresh installation of DSM with error “connection failed”. Synology has given up on me. Do you have any tips? SMART Extended tests returns normal for all drives.

  5. On my “Crashed” Volume (due to a failed disk with bad sectors) , repair is not possible when I replaced the failed disk with a new one. I am stuck in step 5. Any ideas ? (I have contacted Synology and the current info is that I have to clone the failed disk on a new one and the proceed. I wonder why this is mandatory…)

  6. Everything was working perfectly until I did one of Synology’s (frequent) updates. I just updated to DSM 5.2-5592 Update 1. Now it says Volume 1 has crashed and there is no option to repair or recover. My data was still accessible yesterday but now I don’t see any of it anymore. All I have is a 3TB drive in my DS214play… I take it I must purchase a(n expensive) second hard drive to have any hope of recovering my data?

    I’m starting to get a little suspicious of Synology’s reliability. I am on my 3rd hard drive that has had issues (2 different brands) and my 2nd synology disk station that has had volumes crash for no apparent reason. I even had one of the drives in my computer working for years with no data loss yet had issues once I put it into a synology NAS. This drive is less than a year old, has always passed SMART tests (still says Healthy), and has been kept well ventilated at cool temperatures (25-30c).
