UNMAP Improvements – Details, details…

In Element OS 7 (Nitrogen) and previous releases, any UNMAP commands were broken up into smaller 16MB requests before being executed. This required Element OS to zero the metadata in the range to be unmapped. Since the metadata is structured as a tree, you can imagine it as pruning a very large branch by starting at the leaves and working your way back to the main trunk.

In Oxygen, improvements allow large areas of the metadata tree to be pruned at once, provided that Element knows ahead of time about the entire range to prune. This lets Element prune entire branches of the metadata tree in one go; going back to the pruning analogy, it is in effect lopping off the branch at the trunk. This can allow for pretty hefty performance improvements. In the case of an UNMAP command issued against a 4TB datastore, we see at least a 3x improvement in performance over Nitrogen, as the following graph shows.

It's also important to note that this improvement can only be leveraged if Element knows that there is a large LBA range to unmap all at once. For example, if a customer runs the following (default) unmap command from the ESXi command line, they will see slower offload performance:

esxcli storage vmfs unmap -l <datastore name>

This is because the default unmap command from ESXi only unmaps 200 1MB blocks at a time, over and over, until it has unmapped the entire LBA range for the datastore. Since the unmap ranges are very small, there is little room for optimization: each LBA range to unmap is discrete as far as Element OS is concerned. ESXi does give you the ability to specify a larger LBA range for unmap operations, as shown here:

esxcli storage vmfs unmap -l <datastore name> -n 40000

The "-n 40000" option tells ESXi to unmap 40,000 blocks at once. Since this gets pushed down to Element as one large operation, Element is able to prune much more data at once.
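To get a feel for the difference, here is a back-of-the-envelope sketch. The 4TB figure matches the datastore size above; the pass counts are illustrative arithmetic only, not measured values:

```shell
# Illustrative arithmetic: number of unmap passes ESXi issues for a
# 4TB datastore at the default 200 blocks (1MB each) per pass,
# versus -n 40000 blocks per pass.
TOTAL_BLOCKS=$((4 * 1024 * 1024))   # 4TB expressed as 1MB VMFS blocks
echo $((TOTAL_BLOCKS / 200))        # default: ~20971 passes
echo $((TOTAL_BLOCKS / 40000))      # -n 40000: ~104 passes
```

Fewer, larger passes are exactly what lets Element prune whole branches of the metadata tree instead of handling millions of tiny discrete ranges.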
If we could give a range as large as the datastore, it could unmap almost instantly, but as of ESXi 5.5 P3 and later, ESXi doesn't allow the number of blocks to unmap to exceed 1% of the datastore's free space. For more information on that, see the excellent article by Cody Hosterman (Pure) here.
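Because of that cap, a script issuing unmap with a large -n value needs to work out the ceiling first. A minimal sketch, assuming a hypothetical free-space figure (in practice you would read the real free space from the datastore, and esxcli itself must run on the ESXi host):

```shell
# Sketch: pick the largest -n value ESXi 5.5 P3+ will accept,
# i.e. 1% of the datastore's free space in 1MB blocks.
# FREE_MB is a made-up example value, not from the source.
FREE_MB=2097152                 # e.g. 2TB free, in MB (1MB = 1 block)
MAX_BLOCKS=$((FREE_MB / 100))   # the 1% cap
echo "$MAX_BLOCKS"
# Then, on the ESXi host:
#   esxcli storage vmfs unmap -l <datastore name> -n $MAX_BLOCKS
```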
So What About WRITESAME?

There are also improvements that extend the metadata optimizations to WRITE SAME (zeroing) operations. In Windows, a quick format of a drive will now complete nearly instantly, even on 4TB volumes. However, be aware of the following:
- During eager-zeroed thick (EZT) disk creation, the entire VMDK is zeroed out, which leverages WRITE SAME. However, ESXi breaks WRITE SAME operations into 1MB chunks on VMFS 5. Since a large LBA range is never communicated to SolidFire, we can't optimize the operation.
- Format drives in Windows with the quick format option selected. Deselecting quick format causes Windows to verify the entire LBA space of the volume, which is not an offloaded task and will take quite some time to complete.
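To see why the 1MB WRITE SAME chunking on VMFS 5 hurts, consider how many discrete zeroing operations an EZT disk generates. The arithmetic below is illustrative only (a 4TB VMDK is assumed to match the volume sizes discussed above):

```shell
# Illustrative: a 4TB EZT VMDK zeroed in 1MB WRITE SAME chunks means
# millions of small, independent operations that Element cannot
# collapse into a single large metadata prune.
VMDK_MB=$((4 * 1024 * 1024))   # 4TB in MB; one WRITE SAME per 1MB chunk
echo "$VMDK_MB"                # number of discrete zeroing operations
```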