Tuesday, December 28, 2010

Partition alignment for storage performance


Partition alignment has shown a 10-60% gain in storage performance - the higher gains are found when storage is under heavy sequential-read load on a striped RAID system (RAID 0, 5, 6, 10), with smaller gains for random writes. Small gains also show up on single-disk or mirrored systems of conventional drives (512-byte sectors).

The good news: Windows 7 and Windows Server 2008 automatically align partitions to the drive structure. The bad news: everyone else has to do it by hand, and the new hard drive designs coming out are throwing wrenches into this.

------------------------------------------------Sector BS (bullshit):
Hard drives are made of sectors, and until recently all drives had 512 bytes per sector - meaning 2 sectors make 1024 bytes of storage (1 kilobyte). The biggest issue: for the past 30 years drive makers have been throwing in DOS compatibility and emulating the old drive structure of CHS (cylinders, heads, sectors, and tracks) - this will tell you all about CHS and tracks:

The old system was CHS; the newer one is called LBA (Logical Block Addressing), and even that is due for an overhaul.

The head is what reads the platter in the drive (like a record player needle). Most drives have somewhere around 4-8 heads, usually 2 per platter, and 1-4 platters. If you look at your drive with fdisk you will see a big inconsistency:

Disk /dev/sda: 499.5 GB, 499558383616 bytes
255 heads, 63 sectors/track, 60734 cylinders

This is because fdisk is trying to stay compatible with DOS: to express the huge size of ~500GB in CHS terms it has to report 255 heads - no real drive has that many, 255 is simply the maximum the CHS scheme allows. If you have ever manually partitioned a drive over ~8.5GB you will have noticed the warning that "the partition exceeds a limit usable by some operating systems" (it means DOS).
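If you want to see what the drive really reports instead of the fake CHS geometry, most Linux tools can be told to work in sectors. A minimal sketch, assuming a Linux box and a hypothetical /dev/sda:

# list the partition table in sectors (LBA) instead of cylinders
fdisk -l -u /dev/sda

# total drive size in 512-byte sectors, straight from the kernel
cat /sys/block/sda/size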

------------------------------------------------New sector sizes:
New solid state drives and the large drives coming out in the last few months break this rule. Many new SSDs have what's called a 128kb erase block size - meaning the drive handles data in 128KB chunks. Many new drives 2TB and up have a 4kb sector size (8 times larger than 512b) - some will try to emulate both 4kb and 512b, but best performance comes at the native sector size. More on the SSD junk here.
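A quick way to tell whether a drive is one of these 4kb-sector (or 512b-emulating) models - a minimal sketch, assuming a reasonably recent Linux kernel and a hypothetical /dev/sda:

# what the drive presents to software (often still 512)
cat /sys/block/sda/queue/logical_block_size

# what the platters/flash actually use underneath (4096 on the new 4kb drives)
cat /sys/block/sda/queue/physical_block_size

If the two numbers differ, the drive is emulating 512b sectors and alignment matters even more.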

------------------------------------------------Other layers:
If you are using a RAID controller, it will also arrange data in chunks or stripes - 32 or 64kb has been the norm; larger chunk sizes work better for large files (but are slower for small files). Most decent RAID controllers will let you set this. If you are using a SAN, yet another arrangement is probably being made between the file system and the hardware.
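If you know the chunk size you can tell the file system about it so its allocator lines up with the stripes. A minimal sketch, assuming ext4, 4kb file system blocks, and a hypothetical 3-disk RAID5 array at /dev/md0 with a 64kb chunk:

# stride = chunk / block = 64kb / 4kb = 16 blocks
# stripe-width = stride * data disks = 16 * 2 = 32 blocks (3 disks minus 1 parity)
mkfs.ext4 -b 4096 -E stride=16,stripe-width=32 /dev/md0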

Virtual file systems - if you are doing virtualization, your virtual file system has some sort of alignment that maps it onto the underlying file system of the host OS. Here storage performance is EVEN more important, as it is usually the bottleneck.
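The same trick works inside a guest. A minimal sketch, assuming qemu-img and parted are available and guest.img is a hypothetical raw image:

# raw image, so guest blocks map 1:1 onto the host file system
qemu-img create -f raw guest.img 8G
# give the guest's first partition an aligned start as well
parted -s guest.img mklabel msdos
parted -s guest.img mkpart primary 1MiB 100%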

-------------------------------------------------- The Point:
The main point through all of this: you want to keep every file system within the boundaries of each underlying layer to get the best performance. One mismatch could be bad; several mismatches down the layers of your storage design could cause severe lag.

Example: a 1kb read could force 4 sectors to be read instead of the 2 that actually hold that much data (at 512b/sector), or even more if the request travels down several misaligned layers of your storage system - it could be as bad as writing to 2 sectors when 1 would have done, or at worst, writing to 2 disks when it could have been one.
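The same thing happens with the new 4kb sectors. A minimal sketch of the arithmetic, assuming a partition that starts at the classic sector 63 (byte 32256) on a 4kb-sector drive:

# one 4096-byte file system block, read from a partition starting at byte 32256
offset=32256
len=4096
first=$(( offset / 4096 ))
last=$(( (offset + len - 1) / 4096 ))
echo "physical sectors touched: $(( last - first + 1 ))"    # prints 2 instead of 1

Every block straddles two physical sectors, so the drive does double the work.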

The golden rule: best performance is gained when every partition starts at an offset evenly divisible by the block/chunk size of all the underlying subsystems.

e.g. - 1. Your partition 2 starts at byte 102400, which is evenly divisible by 512 = good.
2. On another system partition 1 ended at byte 102400, causing partition 2 to start at 102401 = bad. A quick check is sketched below.
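A minimal sketch of that check on a running Linux system, assuming a hypothetical /dev/sda1 and that you care about a 4096-byte boundary:

# partition start, reported by the kernel in 512-byte sectors
start=$(cat /sys/block/sda/sda1/start)
align=4096    # boundary to test: physical sector, RAID chunk, or erase block size in bytes
if [ $(( start * 512 % align )) -eq 0 ]; then
  echo "partition starts on a ${align}-byte boundary"
else
  echo "misaligned by $(( start * 512 % align )) bytes"
fi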

Partitioning tools will let you set the start/stop points by hand, and most file systems let you pick their block size. If your RAID card uses a 32kb chunk or stripe size on 512b-sector disks, you could create your NTFS partition with 32kb clusters and start it at an offset that is a multiple of 32kb (1MiB into the drive, for example). Many OS installers will automatically bump the first partition up to sector 63 - which can cause a misalignment depending on your bytes/sector or upstream storage layout.
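On Linux a newer parted makes this fairly painless. A minimal sketch, assuming a parted recent enough to have align-check and a hypothetical blank /dev/sdb:

parted -s /dev/sdb mklabel msdos
# a 1MiB start is divisible by 512b, 4kb, 32kb and 128kb, so it satisfies every layer above
parted -s /dev/sdb mkpart primary 1MiB 100%
# ask parted to verify partition 1 against the alignment the drive reports
parted /dev/sdb align-check optimal 1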

Some will say "this only applies to SSDs and large drives" - NO, even most conventional drive setups will not end up properly aligned; the master boot record or file system metadata alone can push data to an improper offset.

This article gives a good basis for this:

Here are a few other guides:

This is an excellent article, written by the man who made the ext4 filesystem:

To sum up - I don't have a definitive "this is how you do it" guide. This is still a sort of black magic; those that know it use weird chants and special terms to perform it, and it changes depending on your overall platform design. If I get it down I will provide some samples.

Oh, and thank Google Blogger for the screwed-up formatting of this post.