The idea behind this infrastructure is to run on commodity hardware. There is no need to buy big racks of expensive servers like those in data centers; simple homemade computers will do the job. At work, I have access to cheap hard drives that were used in servers and are either out of warranty or not suitable for enterprise workloads. They generally cost half their market price. I mix brand-new and reused drives to reduce the risk of two disks failing at the same time in the same host.
There are three components in the infrastructure: storage servers, a monitoring server, and a VPS that runs the VPN.
Every storage server is designed to be hosted at a different location. Each one can be unplugged at one location, plugged in somewhere else, and work exactly as before. The only requirement is Internet access, so that the server can contact the VPS and join the VPN.
The technology that holds the data is ZFS. I have the chance to use it at work for production workloads, and it makes life much easier. I am used to managing GNU/Linux servers (Debian), and I knew that FreeBSD has built-in ZFS support, so I wanted to give it a try. I didn't choose FreeNAS because I wanted to do everything myself, both to learn and to use only the features I needed.
The best balance I found between available disk space and data safety is three disks in a RAID-Z (single parity). Each storage server can lose one disk at a time without breaking the service, while most of the cumulative space remains usable: two thirds of it, since one disk's worth of capacity goes to parity. Datasets use lz4 compression because it saves disk space without putting much load on the CPU.
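To make the capacity trade-off concrete, here is a small Python sketch of the arithmetic, assuming a single RAID-Z vdev in which every disk counts only as large as the smallest member. The function name `raidz_usable_capacity` is hypothetical, and the result is an upper bound: ZFS metadata, padding and reserved space make the real usable space somewhat lower.

```python
def raidz_usable_capacity(disk_sizes_tb: list[float], parity: int = 1) -> float:
    """Approximate usable capacity (TB) of one RAID-Z vdev.

    Hypothetical helper. A vdev treats every member as the size of its
    smallest disk, and RAID-Z with `parity` parity devices (RAID-Z1 = 1)
    keeps (n - parity) disks' worth of data. Metadata, padding and
    reserved space are ignored, so real numbers are lower.
    """
    n = len(disk_sizes_tb)
    if n <= parity:
        raise ValueError("RAID-Z needs more disks than parity devices")
    smallest = min(disk_sizes_tb)
    # Data capacity: every non-parity disk contributes the smallest size.
    return (n - parity) * smallest


if __name__ == "__main__":
    print(raidz_usable_capacity([4.0, 4.0, 4.0]))  # three identical disks
    print(raidz_usable_capacity([4.0, 4.0, 6.0]))  # one larger, newer disk
```

With three 4 TB disks the function returns 8.0, that is two thirds of the 12 TB raw capacity. Mixing in a larger new drive, as in the second call, also gives 8.0: the extra space stays unused until every member of the vdev is at least that large.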
Like any system administrator, I want to be alerted when something goes wrong on the infrastructure, and I want to browse the history in graphs to spot trends. A Raspberry Pi was sitting unused in a drawer. It is now connected to the Wi-Fi network somewhere in the house, perfectly hidden, doing this job in the background.
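The monitoring tools running on the Raspberry Pi are not named here. As an illustration only, the kind of check that could trigger a storage alert can be sketched in Python around `zpool status -x`, which prints `all pools are healthy` when nothing is wrong and prints details about the affected pools otherwise. The function names are hypothetical. `check_pools` would have to run on a storage server, where the pools live, with its result sent to the monitoring host by whatever mechanism is in place.

```python
import subprocess

# Exact message printed by `zpool status -x` when no pool has a problem.
HEALTHY_MESSAGE = "all pools are healthy"


def pools_are_healthy(status_output: str) -> bool:
    """Hypothetical helper: interpret the text printed by `zpool status -x`."""
    return status_output.strip() == HEALTHY_MESSAGE


def check_pools() -> tuple[bool, str]:
    """Run `zpool status -x` locally; return (healthy, raw output).

    Meant to run on a storage server. Error output is kept so that an
    alert can include it if the command itself fails.
    """
    result = subprocess.run(
        ["zpool", "status", "-x"], capture_output=True, text=True, check=False
    )
    output = result.stdout or result.stderr
    return pools_are_healthy(output), output
```

Parsing is kept separate from running the command so that the decision logic can be tested without a ZFS host; how the alert is delivered (mail, chat, a metrics exporter) is left out.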
I am not a network engineer. Actually, this is not my job and I don't want it to be. There are numerous experts in the field who do this very well, and I am thankful to them. But a computer without network connectivity is not very useful. When self-hosting, you have to deal with your ISP's modem settings, and as far as I know there is no standard. My connection has no fixed public IPv4 address. I tried writing scripts that automatically updated a subdomain with the current public IP address, then tried to reach it from the outside. Name resolution worked, but connections always failed.
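The naming half of that approach, the part that worked, boils down to comparing the address the subdomain resolves to with the current public address. Here is a minimal Python sketch under that assumption. The function name is hypothetical, how the public IP is obtained is left to the caller, the actual record update depends on the DNS provider's API and is not shown, and nothing here addresses why inbound connections failed.

```python
import ipaddress
import socket
from typing import Callable


def needs_dns_update(
    hostname: str,
    public_ip: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> bool:
    """Return True when `hostname` does not resolve to `public_ip` (IPv4).

    Hypothetical helper. `public_ip` is the address currently assigned by
    the ISP; `resolve` is injectable so the logic can be tested offline.
    """
    current = ipaddress.IPv4Address(public_ip)  # ValueError on bad input
    try:
        published = ipaddress.IPv4Address(resolve(hostname))
    except (socket.gaierror, ValueError):
        # Name missing or not an IPv4 address: the record must be (re)published.
        return True
    return published != current
```

A cron job could call this periodically and push the new address only when it returns True, which avoids hitting the DNS provider on every run.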
To solve this problem, I rent a VPS hosted close to the storage locations and configured an OpenVPN server on it. This is a single point of failure and a potential bottleneck, because all traffic between servers goes through it. In practice, home Internet bandwidth is the real bottleneck, so the VPS should not be a problem. It also acts as the entry point from the outside world for the metrics and monitoring websites.
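The bottleneck argument can be put into numbers. In this sketch, traffic from one storage server to another crosses the sender's home uplink, the VPS, and the receiver's home downlink, so the best case is the slowest of the three hops. The function names and the bandwidth figures below are hypothetical, the VPS link is assumed symmetric and dedicated to this transfer, and VPN overhead is ignored.

```python
def relayed_throughput_mbps(
    sender_upload: float, receiver_download: float, vps_bandwidth: float
) -> float:
    """Best-case Mbit/s between two sites when the VPS relays the traffic.

    Hypothetical helper: data goes sender uplink -> VPS -> receiver
    downlink, so the slowest hop sets the rate. VPN overhead is ignored.
    """
    return min(sender_upload, vps_bandwidth, receiver_download)


def transfer_hours(size_gb: float, throughput_mbps: float) -> float:
    """Hours needed to move `size_gb` gigabytes (10**9 bytes) at a given rate."""
    megabits = size_gb * 8_000  # 1 GB = 8,000 Mbit
    return megabits / throughput_mbps / 3_600


if __name__ == "__main__":
    rate = relayed_throughput_mbps(50, 500, 1000)
    print(rate, round(transfer_hours(100, rate), 1))
```

With a 50 Mbit/s home upload, a 500 Mbit/s receiving download and a 1000 Mbit/s VPS, the rate is 50 Mbit/s and copying 100 GB takes about 4.4 hours: the home uplink, not the VPS, sets the pace.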