One of the 1.5TB WD15EARS (Western Digital Green) drives in my Synology DS410, running firmware DSM 4.2-3211, failed earlier this week. Recovering the RAID5 volume ended up being a little tricky thanks to ext3 journal corruption on /dev/md2, the block device backing the “/volume1” filesystem.
The standard steps for replacing the failed drive are simple:
- Turn off the beeping sound
- Remove the failed drive
- Replace it with a drive of the same or greater storage capacity
- Run “extended” SMART tests on the replacement drive. The test takes about a day to complete, followed by another day for the RAID sync
- Repair the volume
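While the repair is running, the sync can be watched from an SSH session, since Linux software RAID reports array state in /proc/mdstat. Here is a minimal sketch of reading it, assuming a Python interpreter is available on the NAS (that was not part of my original procedure). The function name `parse_mdstat` and the dictionary layout are made up for illustration.

```python
import re

# md prints a member summary such as "[4/3] [UUU_]" for each array:
# 4 members expected, 3 active, and "_" marks the missing slot.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")
# During a rebuild a progress line such as "recovery = 12.6%" is shown.
PROGRESS_RE = re.compile(r"(recovery|resync|reshape|check)\s*=\s*([\d.]+)%")
HEADER_RE = re.compile(r"^(md\d+)\s*:")


def parse_mdstat(text):
    """Summarise each md array found in the contents of /proc/mdstat."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = header.group(1)
            arrays[current] = {
                "expected": None,
                "active": None,
                "degraded": None,
                "progress": None,
            }
            continue
        if current is None:
            continue  # still in the "Personalities" preamble
        status = STATUS_RE.search(line)
        if status:
            expected = int(status.group(1))
            active = int(status.group(2))
            arrays[current].update(
                expected=expected, active=active, degraded=active < expected
            )
        progress = PROGRESS_RE.search(line)
        if progress:
            arrays[current]["progress"] = float(progress.group(2))
    return arrays
```

On the NAS it could be called as `parse_mdstat(open("/proc/mdstat").read())`. An entry with `degraded` set to `True` and a `progress` value means the array is still rebuilding.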
After replacing the failed drive and repairing the RAID volume, “/volume1” ended up with ext3 journal inconsistencies. I had to run “e2fsck” manually from the command line to check the filesystem and recover the journal:
- Shut down all running processes
- Unmount /volume1 (on /dev/md2)
- Unmount /opt (also on /dev/md2). A running sshd prevented me from unmounting /opt. The fix was to start an alternate sshd from /usr/sbin/ on an alternate port and kill the original sshd running from /opt/
- Run e2fsck -v -y -f /dev/md2. It took a full day to fsck the 4TB volume
- Reboot
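Since the fsck runs for a day, it helps to record how it ended rather than rely on scrollback. e2fsck reports its result as an exit status that is a sum of bit flags, as documented in its man page. Below is a minimal sketch of running the same command and decoding that status, assuming Python is available on the NAS. The helper names `decode_e2fsck_status` and `run_fsck` are hypothetical.

```python
import subprocess

# Exit status bits as listed in the e2fsck man page.
E2FSCK_EXIT_BITS = {
    1: "file system errors corrected",
    2: "file system errors corrected, system should be rebooted",
    4: "file system errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "e2fsck canceled by user request",
    128: "shared library error",
}


def decode_e2fsck_status(code):
    """Translate an e2fsck exit status into a list of messages."""
    if code == 0:
        return ["no errors"]
    return [msg for bit, msg in sorted(E2FSCK_EXIT_BITS.items()) if code & bit]


def run_fsck(device):
    """Run the forced, auto-answering check from the steps above.

    The device must already be unmounted. Returns the decoded status.
    """
    result = subprocess.run(["e2fsck", "-v", "-y", "-f", device])
    return decode_e2fsck_status(result.returncode)
```

For example, `run_fsck("/dev/md2")` would run the check from the list above. Any result containing “errors left uncorrected” means the volume should not be put back into service yet.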
The volume certainly looks healthy after the fsck, and there are no more beeping sounds. More importantly, my data appears to be intact.
Learnings from the disk failure:
- Ensure lsof is installed on the NAS. It greatly speeds up identifying processes that hold a volume open and prevent it from being unmounted cleanly. Package installation won’t be possible once /opt goes read-only thanks to journal corruption
- Force a full fsck regularly to ensure file systems are healthy
- Rather than have one large 4TB RAID5 volume spanning 4 disks, have 2 smaller volumes to speed up fsck times and spread the risk.
- Have a 3TB external drive to back up critical data before attempting a repair
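If lsof is not installed, the processes holding a mount point can still be found by walking /proc, which is roughly the information lsof reads on Linux. This is a rough sketch rather than a replacement for lsof: it assumes a Python interpreter that lives outside the volume being unmounted, it only looks at each process's working directory, executable and open file descriptors (memory-mapped files are not checked), and the function name `pids_holding` is made up for illustration.

```python
import os


def pids_holding(mount_point, proc_root="/proc"):
    """Return {pid: [paths]} for processes using files under mount_point."""
    root = os.path.abspath(mount_point)
    prefix = os.path.join(root, "")  # same path with a trailing separator
    holders = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        base = os.path.join(proc_root, entry)
        links = [os.path.join(base, "cwd"), os.path.join(base, "exe")]
        fd_dir = os.path.join(base, "fd")
        try:
            links += [os.path.join(fd_dir, fd) for fd in os.listdir(fd_dir)]
        except OSError:
            pass  # process exited, or we lack permission to list its fds
        for link in links:
            try:
                target = os.readlink(link)
            except OSError:
                continue  # link vanished, is missing, or is unreadable
            if target == root or target.startswith(prefix):
                holders.setdefault(int(entry), []).append(target)
    return holders
```

Run as root, `pids_holding("/opt")` would list the PIDs to stop before unmounting /opt. In my case it would have pointed at the sshd started from /opt/.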
Update (20130927): A 2nd WD hard disk failed earlier this week causing another round of journal corruption and re-syncing. At this point, it made more sense to perform a full backup and restore over a fresh install of DSM with SHR and an extended SMART test. I am replacing the failed 1.5TB drives with 3TB WD Caviar Greens and SHR can make use of the additional space automatically.
14 replies on “Rescue Crashed Synology Volume”
keeps failing on me…going back to cloud storage?
Do check with synology support. I am very happy with mine.
I’m trying it now. It has been running for 24 hours. But it’s a DS2411+ system with a 12TB RAID 5 volume 😀
awesome shanker balan is awesome! I tried your steps on friday and let it go over the weekend. now Im back in office and restarted the NAS (lost the PUTTY session from CH to USA) and its up and running again. All the data’s back again. Man I’m so thankful. You’re awesome! Thanks so much
By the way: On friday I made a Ticket to Synology. Nothing heard until now 😉
You’re welcome. 🙂
How to determine which disk is the problem? I’m having this problem (volume crashed), but both disks show as Status ‘normal’.
Hello Shanker, my Synology 1512 volume crashed. 1 disk is damaged and one is not initialised anymore. I cannot access my data anymore. 3 drives are healthy. I had set up a RAID 5 to be able to replace 2 drives.
Can you help me? I already bought a new disk to replace the damaged one, but I am not sure if I can touch the system.
Best wishes and thanks for reading
Hi Theo,
Thanks for your comments. Without knowing what exactly is wrong I am afraid any advice I give might be harmful. I have had good success by just replacing the disks one at a time.
Hth.
I usually SSH into the device and check for disk errors in the “dmesg” output. Using the UI, you can check the SMART status as well.
Shankar Balan, Volume Creation always fails in a fresh installation of DSM with error “connection failed”. Synology has given up on me. Do you have any tips? SMART Extended tests returns normal for all drives.
I don’t know offhand what could be the issue. Sorry.
On my “Crashed” Volume (due to a failed disk with bad sectors) , repair is not possible when I replaced the failed disk with a new one. I am stuck in step 5. Any ideas ? (I have contacted Synology and the current info is that I have to clone the failed disk on a new one and the proceed. I wonder why this is mandatory…)
Hi Pantelist,
Sorry, I don’t know how to fix your issue.
Everything was working perfectly until I did one of Synology’s (frequent) updates. I just updated to DSM 5.2-5592 Update 1. Now it says Volume 1 has crashed and there is no option to repair or recover. My data was still accessible yesterday but now I don’t see any of it anymore. All I have is a 3TB drive in my DS214play… I take it I must purchase a(n expensive) second hard drive to have any hope of recovering my data?
I’m starting to get a little suspicious of Synology’s reliability. I am on my 3rd hard drive that has had issues (2 different brands) and my 2nd synology disk station that has had volumes crash for no apparent reason. I even had one of the drives in my computer working for years with no data loss yet had issues once I put it into a synology NAS. This drive is less than a year old, has always passed SMART tests (still says Healthy), and has been kept well ventilated at cool temperatures (25-30c).