So, with my unRAID server now past its 10-year anniversary (see my original build notes here), it has gotten another big upgrade. Unlike its last major upgrade 7 years ago, this one wasn’t about drive/storage upgrades so much as performance and functionality. This all started with me making upgrades to my main workstation at the house. My intention was to simply take the core components that had been in my workstation and hand them down to the unRAID server, since they were still relatively recent components and still packed a solid amount of performance. However, things did not go smoothly. I’m gonna try to note my fun during the process, both for the amusement of other tech nerds, and for my own future reference. This is being written up after all of it happened, and I’m surely forgetting bits or conflating them.
As I said, the first step was the upgrading of my main workstation, which freed up some parts to hand down as upgrades to my unRAID server. It has been chugging along for 7+ years on an old AMD A6-5400 setup, which 7 years ago was plenty for unRAID. But, times have changed. unRAID has gained plenty more capabilities, such as running docker apps and VM images. While I don’t run VMs (well, not yet, anyway), I do run a number of docker apps. Also, the quality and complexity of digital media has come a ways since then, too (4K HDR, etc), making things like Plex transcoding choke. So, having freed up a perfectly good first-gen AMD Ryzen 7 1800x CPU, motherboard and 32gb RAM to go with it, those would work well as a major power boost to my unRAID server.
One of the first hurdles I ran into was that the motherboard I was taking out of my main workstation had 6 SATA ports on it, whereas the existing motherboard in my unRAID server had 8 SATA ports onboard. This was an issue because I have 24 drive bays in my Norco 4224 case. 16 of those were handled by two PCIe x16 SAS controller cards that handled 8 drives each, leaving the motherboard to handle the other 8. While considering that, however, I realized that I had a small PCIe x1 card with 2 SATA ports on it to handle my 250gb Samsung 2.5″ SSD cache drive. I figured that since the board I had removed from my workstation had an onboard NVMe slot, I would just get a 500gb NVMe drive, and use this opportunity to also upgrade and double the size of my cache drive, and give it a performance boost at the same time. That would free up the PCIe x1 controller’s 2 SATA ports in addition to the motherboard’s 6 ports and get me the total of 8 I needed.
And thus began phase 1 of my idiotic journey of this upgrade process. I pulled out the old motherboard and put in the new one. The first challenge I ran into was when I went to plug in the drive cables to that PCIe x1 card’s 2 ports, only to realize that the cables were an inch short of reaching. That was my first moment of cursing the universe. I then figured out if I took those 2 cables out of the hole they fed through from the front of the case to the back and went over that divider, they would BARELY reach. I could get away with that temporarily as I ordered and waited for a longer cable to arrive. I wouldn’t be able to put the top on the case with those cables over the top, but it would work temporarily. So, I got it all hooked up and fired the server up. It powered on for 10 seconds, then did a hardware reset and tried again, and did that over and over. Couldn’t figure out what it was unhappy about. I wasn’t getting anything on the monitor. I finally unhooked the monitor and plugged it into my main workstation, only to find out the ancient VGA monitor (one of the first flat screen monitor models I’d ever used – an old Dell 17″) had died. I rarely used the monitor locally on the server, so I wasn’t terribly surprised the old thing had died and I hadn’t noticed. So, I was down at MicroCenter later that day, anyway, so I snagged a cheap open box monitor.
I hooked it up when I got back and still got no video. “What the heck?” Finally, I shut it back down. After 20 minutes of digging around in the case to figure out what I’d hooked up wrong during the upgrades, I suddenly realized that I was an idiot. See, my old A6-5400 CPU was an APU. If you don’t know what that is, it’s a CPU with an onboard video controller (usually fairly basic). So, if you have a motherboard with onboard video ports, which most have, you need an APU type processor to use those ports, which in my old, existing setup was true. However, that was NOT the case with the Ryzen CPU I was now putting into it. I had simply hooked up the monitors to the motherboard’s onboard video ports, which, without an APU type processor, were useless. So, therefore, it needed a video card to be added (for those wondering, I did check to see if this particular motherboard could run headless without a GPU, but it couldn’t). That’s all fine and good, because EVENTUALLY, one of my goals was to add a dedicated video card so that Plex could do hardware accelerated transcoding. HOWEVER, that was not part of my initial plan, because of a current limitation. See, I had 2 of the SAS controller cards to handle 8 drives each, and both needed an x16 sized slot (though only at least x4 bandwidth). The motherboard I was removing from my main workstation to put in the unRAID server only had two x16 slots (pretty standard), and both would be needed for those controller cards. There would be no slot available for a video card. This was a showstopper problem with the hardware I had in hand.
So, I did some research into a new motherboard, and thus began phase 2 of this idiotic process. I found a couple good deals on new X570 chipset motherboards and whittled it down to which one I liked best. That’s when I looked at the supported memory & CPU list, just to make sure what I had would be properly supported. AMD has done an admirable job at maintaining CPU & motherboard compatibility via BIOS updates through the run of the Ryzen CPUs. However, I came to learn that the new 500 series chipset motherboards were the very first ones to drop support of a Ryzen chip series, as they dropped support of the original first-gen Ryzen CPUs, which my 1800x CPU was. Sigh. I then started researching previous generation X470 motherboards, most of which were still pretty expensive. I finally found one with 3 PCIe x16 slots that didn’t cost a fortune, an Asus one for $99. The problem was that it, like most boards, had 6 SATA ports on the motherboard, so I would still need to use the x1 SATA controller to add the additional 2 ports, so I ordered a longer cable to reach that.
A few days later, I had the new motherboard and longer cable and I then installed that into the server. I had some trouble getting that motherboard to boot from the USB drive that unRAID runs from, but finally got it to work. I brought it up, only to have a few random drives showing as offline. I shut it down, reconnected all the cables, then brought it up again, and had a couple DIFFERENT drives showing as offline. “Well, this is definitely not good”. I swapped controller cards between slots, cables between controller cards, and a variety of things, but always had at least one drive showing as offline. I finally let out a sigh of defeat, figuring that there was something between the controllers and this new motherboard that wasn’t happy. My server had been offline for almost a week by this point, so I gave up and just put all the original hardware back, exactly as it had been before starting any of it.
And then… I fired it up and a drive was showing offline. “Uh oh.” I shut it down and reseated that drive and cable and the controller it was plugged into. I held my breath and turned it on again and was delighted to see every drive showing up properly, finally. I started the array and watched some stuff from the array for the evening. And then… about 6 hours later a different drive suddenly started reporting CRC errors and unRAID then dropped it offline and began emulating it in the array. “Doggone it…” I shut it all down and started digging through all the screenshots I had taken during all the missing drive moments and steps along the way. After a while of looking at them all, I realized that the problem was following one of the 2 controller cards. The newer of the two, an LSI chip one, seemed to be what every drive that had a problem was hooked up to when it had the issue. That was a definite a-ha moment. The LSI controllers are better liked by the unRAID community over the Marvell chip based Dell PERC controller that was my other, older one, so it was surprising that the LSI was the one causing the problems.
So, apparently, while doing the replacements and pulling cards in and out, I must have physically damaged that card or something, though there was no visual indicator of it. It wasn’t limited to one backplane, cable or port on the card. The drive problems happened across both ports on the card and cables & backplanes it was hooked up to. So I went back to the drawing board. After taking a step back, I realized there was something I could do that would kill a couple birds with one stone. See, I had 2 controller cards that each had 2 SAS ports (which each break out to control 4 drives) because I had added on the cards one at a time as my server grew in size. HOWEVER, there is such a thing as a controller card that has 4 of the SAS ports and can control 16 drives total from the single card. Getting one of those as a replacement for the failing card would mean I could hook up all 16 drives that had previously been on both controller cards onto just the single card. Doing that would then free up an x16 slot where I could put in a dedicated video card, and open me up to the option of any X470 motherboard, whether it had 2 or 3 x16 slots. That wider set of boards to choose from meant I could probably find one with 8 SATA ports on the motherboard and ditch the extra x1 card.
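The deduction that cracked it, tallying up which controller each misbehaving drive was attached to at the time, can be sketched as a few lines of Python. The drive names and the failure log here are made up for illustration; the point is just that every failure traced back to the same card:

```python
# Toy sketch of the "problem follows the card" deduction.
# Each entry: (drive that went offline, controller it was attached to
# at the time of the error). These values are illustrative, not my real log.
from collections import Counter

failure_log = [
    ("sdb", "LSI"),
    ("sdf", "LSI"),
    ("sdh", "LSI"),
]

failures_per_controller = Counter(card for _, card in failure_log)
suspect = failures_per_controller.most_common(1)[0][0]
print(suspect)  # prints "LSI" - every failure involved the same controller
```

It wasn't any single cable or backplane that was common across the failures, only the controller, which is what pointed the finger at the LSI card.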
So I set about researching what 4-port SAS controller to get. The LSI cards were THE cards that all the unRAID system builders preferred. So, I went looking around eBay for one of the cards. However, 90% of the listings for the $125-$150 cards were from China and the write-ups were kinda dubious. I did some more looking on the unRAID forums and found multiple stories of people getting this model card only to get a cheap Chinese knock-off card that had problems. It was apparently quite common for that model card. Even the couple vendors from inside the USA had poor English in their listings. However, I found one seller who was clearly a major storage server enthusiast, who had a couple dozen different controller cards listed that he had fully tested and had great, detailed write-ups on all his listings, and had very reasonable pricing on them all. But of course, the only model card he didn’t seem to have listed was the exact one I wanted. So I used the contact seller option to see if maybe he had one he hadn’t listed yet, explaining that I was hesitant to order from the other listings, given the stories I’d read of the cheap knock-off cards. I got a quick and detailed response back from him. He said he was well aware of the bad knock-off problem of that model card, and had come across a number of them himself. He said he unfortunately didn’t have any currently, but he gave me a tip. He told me that the next model up controller card from LSI, the 9305-16i, didn’t suffer from the knock-off problem. It was a newer chipset and had the fancier, newer SAS SFF8643 connectors. It was also a newer PCIe 3.0 spec card and would be higher performance. The problem was that those cards cost at least twice as much. He said if I found a deal on one of those, I should be golden.
So, I started hunting high and low for one, but most listings were $400 or more. Then, after doing a bunch of digging around, I stumbled upon an open box deal on one from CDW, which they listed as being unused, but just as open box. And it was only $175. I jumped at the deal, only to find out that my CDW account had been dormant for so long (more than 10 years), it had been deactivated. I had to call them, at the end of the day on a Friday, to get them to reactivate it. The kicker was that the site said that I could get it by Monday if I ordered in the next 14 minutes. So, I sat on hold and got a salesperson to reactivate the account, which he said would take a few minutes. I thanked him and hung up and kept refreshing the page, waiting for my account to activate so I could submit the order, as the “within the next x minutes” counter kept counting down. Finally, with like 2 minutes on the counter, the account went active. I quickly went through the checkout, and had this set of shipping options: $10 to get it Monday, or $25 to also get it Monday, or $45 to also get it Monday, or $70 to get it next day. I picked the $10 option and submitted the order just in time. I also needed 4 new cables, because this card used the newer SFF8643 ports. I found 2-packs of those on Amazon for $20, so I ordered 2 of those 2-packs, which were set to arrive on that Sunday. The other thing I needed to get was another motherboard that had the 8 onboard SATA ports. What I ended up getting was kind of ironic. Turns out, there are very few boards with 8 SATA ports. I ended up finding another open box deal, this time at Newegg, for a fancy, full-featured ASRock X470 Taichi board. Their Taichi boards are expensive ones, and the older X470s were still going for $300+, but I snagged an open box deal on one for $140. I ordered that, which was due to arrive on that Monday.
What makes that choice kind of ironic is that it ALSO had 3 PCIe x16 slots, so I went from having 2 slots and nowhere to put a video card to 3 slots and not actually needing the 3rd. Then amusingly, at some point the next morning on Saturday, the card from CDW arrived. Glad I didn’t pay more for faster shipping (CDW often uses a local shipper around Chicago called “Veterans Distribution” for orders fulfilled from their Vernon Hills warehouse, who usually deliver the next day regardless, which probably explains why it arrived so fast anyway). So, it ended up being the 4 cables and the new ASRock motherboard that I ended up waiting on, which arrived right on time over the next two days.
Now, something else I had to consider was the drive that had dropped offline due to the CRC errors. Given all the problems I had been having, I kinda figured the CRC errors weren’t really a problem with the drive. Regardless, with unRAID having dropped the drive from the array, it meant I would have to run a rebuild on it no matter what. I pulled that drive and hooked it up to my main workstation and started a full extended test on it, which takes more than a day, and it passed with flying colors. But, since it was an older 4TB drive, I decided to hunt around for a 10TB drive (the largest I can currently put into my unRAID server for parity reasons), since I was gonna have to go through a rebuild anyway. Seemed like a good time to do a bit of an upgrade in the process. I could hook that old 4TB drive up to my main workstation as a scratch disk or whatever. So, I found a deal at Staples for a 10TB Seagate Backup Plus drive. As it turns out, the Plus models of it actually have a full-on IronWolf Pro drive inside, which is their premium NAS storage server model drive. I hooked it up in its USB enclosure and ran a full extended test on it, which like I said takes more than a day, and it passed with no problems. So I shucked the drive from the enclosure. For those that don’t know, shucking a drive is the process of removing it from the external enclosure to use it as a standard internal drive. The reason this is a popular practice for home storage server enthusiasts is that, for some idiotic reason, external USB versions of drives often cost a LOT less than internal drive models, despite just being an internal drive that is inside a full enclosure. Yeah, it makes no sense, but that’s how it’s been for years. At least half the drives in my storage server were ones I shucked from external USB enclosures.
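For anyone wondering about the "parity reasons" aside: unRAID's rule is that no data drive can be larger than the parity drive, and my parity drive is 10tb, so that caps any new data drive too. A quick sketch of that constraint (sizes in TB; the 10 matches my parity drive, the other values are just examples):

```python
# unRAID sizing rule: a data drive fits only if it's no larger than parity.
def fits_in_array(candidate_tb: float, parity_tb: float) -> bool:
    """True if a data drive of this size can join an array with this parity drive."""
    return candidate_tb <= parity_tb

parity = 10  # my current parity drive, in TB
print(fits_in_array(10, parity))  # True  - the new 10tb shucked drive is fine
print(fits_in_array(12, parity))  # False - would require upgrading parity first
```

That's why 10tb was the ceiling for this purchase: anything bigger would have meant replacing the parity drive first.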
I had an older Radeon video card at the house I could put in the server to get things going, but it was too old to help me with any Plex hardware acceleration. I had the GTX 1050 video card in my main workstation that I intended to eventually hand down to the unRAID server for the hardware acceleration, but I’ve been unable to upgrade the video card in the main workstation due to, well… the entire video card market right now is beyond insane. However, I realized that my old home theater PC that I was no longer using (I switched to an nVidia Shield to run Kodi and home theater stuff last summer) might have a better video card than the old Radeon, but that PC was at the office, because my laptop died and I’m on a long waiting list for the Asus ROG that will replace it. So, I brought the old Radeon with me to work the next day and took a look to see what was in the old home theater PC. It had an nVidia 1030 card in it. I did a bit of quick research and was pleasantly surprised to find out that it was the first model card to fully do hardware accelerated decoding of full 4K HEVC HDR sources. So I took that 1030 out of that machine and put the old Radeon in it and took the 1030 back home. That night, I did yet ANOTHER full motherboard swap, this time adding in the 1030 video card. I triple checked everything and fired it up. Much to my delight, every drive was showing up fine, including the 10tb shucked drive that was showing as ready to be rebuilt from the old 4tb drive that was now offline.
I held my breath and began the rebuild process, which takes a long time, particularly for a 10tb drive. I kept checking on the progress every couple hours as it rebuilt the drive for a little more than a day. While I technically could have used the server as it did the rebuild, I didn’t wanna do anything else on it during the process, given the hardware luck I had been having. Finally, it made it to the end and finished successfully. I held my breath again and started bringing Plex and all the other dockers online and started doing a variety of stuff with it, and had zero problems.
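For the curious, the reason a rebuild like that is possible at all is that unRAID's single parity is an XOR across all the data drives, so a missing drive can be reconstructed byte-by-byte from the parity drive plus all the surviving drives. A toy sketch with made-up byte values (real drives obviously hold terabytes, not two bytes):

```python
# Toy demonstration of XOR parity, the scheme behind unRAID's single-parity rebuild.
from functools import reduce

def xor_bytes(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# Three tiny "data drives" and the parity computed across them.
d1, d2, d3 = b"\x01\x02", b"\xf0\x0f", b"\xaa\x55"
parity = xor_bytes(d1, d2, d3)

# "Lose" d2, then rebuild it from parity plus the surviving drives.
rebuilt = xor_bytes(parity, d1, d3)
print(rebuilt == d2)  # True - the missing drive's contents come back exactly
```

It also explains why the rebuild takes so long: every byte of every surviving drive has to be read to regenerate the missing one, which for a 10tb target is a day-plus of sustained reads.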
As I add this final statement to the post to publish it, it’s been a couple weeks now since the end of the process, and it’s been working perfectly, thank goodness. I ended up doing a return of that original Asus motherboard and the longer SATA cable back to Amazon. The process was very convoluted, and I’m sure I left out plenty of hiccups and other details, but hopefully at least SOME of all of the above made sense. :)