I set out to write a post about choosing the right storage. So many of the topics I wanted to discuss required some terminology explanation that I decided to make this post just about that. Stay with me for the next post, when I’ll get into choosing actual storage.
Host – The server or application which will access the storage.
File System – This is the part of the storage system responsible for managing storage resources and brokering read and write transactions. It handles such things as block and file locking to ensure that no two systems try to write to the same parts of the disk at the same time. It also stores metadata about files: their names, sizes, which disk(s) they are located on, and where on those disks.
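To make the metadata point concrete, here is a minimal Python sketch that asks the local file system what it has recorded about a file. The helper name describe_file and the file name example.txt are made up for illustration, and a local disk stands in for a storage array; the fields shown are only a small subset of what a file system tracks.

```python
import os
import tempfile


def describe_file(path):
    """Return a few of the metadata fields the file system keeps for a file."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,   # file size as recorded by the file system
        "device_id": info.st_dev,     # identifies the device the file lives on
        "inode": info.st_ino,         # the file system's internal record number
    }


# Write a small file, then ask the file system what it recorded about it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    with open(path, "wb") as handle:
        handle.write(b"hello storage")
    metadata = describe_file(path)
    print(metadata)
```

The application never sees where on the disk the bytes landed; it just asks by name, and the file system does the bookkeeping.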
SAN vs. NAS – These are the two main ways to access shared storage systems. It is easiest to understand them in relation to each other. The key difference is where the file system resides.
– In a NAS, the file system is on the storage system itself. This allows multiple hosts to naively access data without worrying about all the technicalities mentioned above. The host can read and write to files blindly, content in the knowledge that the file system will not allow it to make a mistake. The downside of NAS is that it does not offer the application the ability to control the disks directly, and therefore the application cannot tune disk access to its own ends. Also, some of the helpful things that file systems do, like file locking, may pose a problem for certain types of applications.
– In a SAN, the file system resides on the hosts or applications that connect to the storage. The hosts must carefully manage disk access among themselves in order to prevent data corruption. This is a fairly high development burden, so usually only applications that can truly benefit from highly customized disk access support it. In other cases it may be an explicit requirement that multiple hosts be able to access the same shared files. Common examples are databases and various sorts of HA clusters, such as VMware.
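To give a feel for the coordination burden that SAN hosts take on, here is a toy Python sketch of a lock table that refuses to let two hosts hold overlapping block ranges. Everything in it (the BlockLockTable class, the host names, the block numbers) is hypothetical, and it runs inside a single process; real clustered file systems have to coordinate across machines, which this does not attempt.

```python
import threading


class BlockLockTable:
    """Toy lock table: at most one host may hold a given block range at a time."""

    def __init__(self):
        self._guard = threading.Lock()
        self._held = {}  # (start, end) -> name of the host holding that range

    def try_lock(self, host, start, end):
        """Grant blocks [start, end) to host unless another host holds an overlapping range."""
        with self._guard:
            for (held_start, held_end), owner in self._held.items():
                overlaps = start < held_end and held_start < end
                if owner != host and overlaps:
                    return False
            self._held[(start, end)] = host
            return True

    def unlock(self, host, start, end):
        """Release a range, but only if this host is the one holding it."""
        with self._guard:
            if self._held.get((start, end)) == host:
                del self._held[(start, end)]


table = BlockLockTable()
granted_a = table.try_lock("host-a", 0, 100)    # first request, nothing held yet
granted_b = table.try_lock("host-b", 50, 150)   # overlaps host-a's range
granted_c = table.try_lock("host-b", 100, 200)  # adjacent range, no overlap
table.unlock("host-a", 0, 100)
granted_d = table.try_lock("host-b", 50, 150)   # host-a has released its range
print(granted_a, granted_b, granted_c, granted_d)
```

In a NAS, this kind of bookkeeping happens on the storage system, so the hosts never have to think about it.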
Controller – This is the device that contains all the brainpower for a storage array. It’s really a souped-up, custom-built server, specifically designed for managing storage. It contains CPUs, RAM, and various storage-related add-on cards. This is where the file system will run, and all the disks connect to it. Controllers are often paired up in a single chassis called a head unit to form an HA storage cluster. In large array clusters there may be multiple tiers of head units.
Disk Shelf – The disk shelves are the devices that hold the disks. They have little to no brainpower, and simply serve to power the disks and pass through all the read/write operations.