When configuring an array for optimal performance, one of the most fundamental things to get right is the cache configuration. Every I/O in the VNX system flows through the DRAM cache, so misconfiguring it can hurt the performance of any workload running on top of it.
The DRAM cache on the VNX can be divided into a read cache and a write cache. The read cache is configured independently for each storage processor (SP) and is not shared between them. The write cache is configured for both SPs together and is mirrored between the two SPs, as shown in the graphic below.
Configuring the read cache
The minimum recommended read cache size is 100 MB for the block-only VNX5100; for the unified systems the recommended read cache size ranges from 400 MB to 1024 MB, as shown in the table below.
[Table: Recommended Initial Read Cache Size]
The recommended read cache sizes are starting points and can be adjusted up or down depending on the type of workload the array will service. The minimum recommended read cache size for any unified VNX system is 256 MB per storage processor. [Note: this minimum is being changed to 200 MB in the upcoming best practices guide from EMC, which is not yet published.]
The read cache is most effective when the majority of front-end I/Os are reads and are sequential in nature. If the read:write mix is 50:50 and there are sequential read streams in the dataset, the initial recommendation may work well. In virtualized environments with many hosts doing I/O to the same file system, the workload tends to be much more random, and the read cache becomes less effective and therefore less relevant.
To see how effective the read cache is for a workload, look at the “SP Cache Read Hit Ratio” for any traditional LUN in Analyzer. If the counter is consistently above 80%, the read cache is very effective, and there may be gains from a small increase in its size so that more read requests are satisfied from cache.
However, if the number is very small, the workload for that LUN may not be read cache friendly. To help confirm this, look at the “Used Prefetches %” counter for the same LUN. If this is also low, the array is not getting read requests for the data it proactively prefetches from the LUN into the read cache when it believes it has detected a sequential read stream. In that case the read cache is not helping much, and the prefetching activity only results in wasted read activity on the LUN.
Note: If you don’t see the “Used Prefetches %” or “SP Cache Read Hit Ratio” counters in Analyzer, go to the “Customize Charts” option in Analyzer and check the “Advanced” checkbox on the General tab. The counters should then show up for LUNs created from RAID Groups. LUNs created from pools do not have these counters because their stats are compiled against private RAID groups, which are not visible through Analyzer.
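The two-counter check described above can be sketched as a small decision helper. This is purely illustrative: the function, its inputs, and the 20% “low prefetch” threshold are assumptions for the sketch; only the counter names and the 80% hit-ratio figure come from the guidance above.

```python
# Hypothetical helper: interpret two Analyzer counters for a traditional LUN.
# The 80% hit-ratio threshold is from the article; the 20% prefetch
# threshold is an assumed stand-in for "low" and is not an EMC figure.

def assess_read_cache(read_hit_ratio_pct, used_prefetches_pct):
    """Classify read cache effectiveness for one LUN.

    read_hit_ratio_pct  -- "SP Cache Read Hit Ratio" from Analyzer
    used_prefetches_pct -- "Used Prefetches %" from Analyzer
    """
    if read_hit_ratio_pct > 80:
        # Read cache is working well; a small size increase may help.
        return "effective"
    if used_prefetches_pct < 20:  # assumed threshold for "low"
        # Prefetched data is not being requested: wasted back-end reads.
        return "ineffective"
    # Mixed signals; watch both counters over a longer interval.
    return "inconclusive"
```

In practice you would eyeball these counters in Analyzer over a representative window rather than trust a single sample.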
If the majority of the LUNs in the system show similar behavior and the read cache is comparatively large, it may make sense to reallocate some memory to the write cache. Even though it’s named a write cache, reads can be serviced out of it, so if the workload tends to write data and then read it back again quickly, most reads are probably being serviced from the write cache, not the read cache!
Configuring write cache settings
The write cache has more bearing on array-wide performance than any other single feature, and its configuration and continued care and feeding are essential to maximizing the performance potential of the system. In general, the larger the write cache, the better the potential performance of the array. There are exceptions to this, but exploring them is beyond the scope of this article.
The size of the write cache increases with the “size” of the array, so a VNX7500 would have a larger write cache than a VNX5300. However, you may run into cases where a smaller array has a larger write cache than a bigger one. “How can this be?” you ask.
The table below shows how this can be true in some circumstances. When certain features are enabled on the array, the amount of DRAM reserved for the storage processors increases. As the storage processor reserved DRAM pool gets larger, the rest of the system cache, of which write cache is a component, must get smaller. Looking at the table below, you can see how a fully featured VNX7500 could have a smaller write cache than a VNX5700 with no advanced features enabled.
The features that require additional DRAM to operate are FAST, FAST Cache, thin provisioning, and compression. Installing any of them lowers the available write cache in the array, so they should only be installed if there are definitive plans to use them. SnapView and MirrorView also require some system memory, but EMC is moving away from both of those technologies in favor of Advanced Snaps (in release 32, Inyo) and RecoverPoint.
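The trade-off can be reduced to simple arithmetic. Note that every number below is a made-up placeholder for illustration; the real DRAM budgets and per-feature reservations vary by model and release and are published in EMC’s best practices tables, not here.

```python
# Illustrative arithmetic only: all sizes below are hypothetical
# placeholders, NOT actual VNX memory budgets.

SP_DRAM_MB = 8192            # hypothetical total DRAM per SP
OS_RESERVED_MB = 2048        # hypothetical base SP reservation
FEATURE_RESERVED_MB = {      # hypothetical per-feature reservations
    "FAST": 512,
    "FAST Cache": 1024,
    "thin provisioning": 512,
    "compression": 512,
}

def max_write_cache_mb(read_cache_mb, installed_features):
    """Write cache is whatever remains after the OS reservation,
    installed-feature reservations, and the read cache allocation."""
    reserved = OS_RESERVED_MB + sum(
        FEATURE_RESERVED_MB[f] for f in installed_features)
    return SP_DRAM_MB - reserved - read_cache_mb
```

The shape of the calculation is the point: each installed feature comes straight out of the memory that could otherwise be write cache.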
The net takeaway is that the write cache should be set to as large a value as possible after allocating DRAM for the read cache and any installed features. The other settings that require attention are the watermark settings.
Write cache watermarks
The cache watermark settings help the system manage write cache flushing. The goal is to minimize forced flushing and maximize write cache hits by controlling how aggressively and how often the array flushes data from the write cache. Aggressive flushing lets the array respond well to very bursty I/O patterns, at the cost of cache re-hits, which can make the cache somewhat less effective. Lazy flushing allows more cache re-hits, which can make the cache more effective, but leaves less headroom for bursty I/O and can lead to forced flushing.
So what exactly do the watermarks control?
When the percentage of dirty pages is below the low water mark (LWM), the array is not flushing data out of the write cache. You will typically not see prolonged periods where dirty pages < LWM; usually the cache is still filling at this stage, which happens very quickly under production workloads but may take a while in limited benchmark activity.
When a LUN is idle (no I/O for two seconds) the system will commit dirty pages in the write cache to the LUN. This is referred to as idle flushing and runs as a normal background activity on the array.
During steady-state load the percentage of dirty pages should float between the high water mark (HWM) and the LWM. When the cache fills to the HWM, the array starts flushing pages to disk until it reaches the LWM. This is a more aggressive flush than idle flushing.
The margin above the HWM exists to absorb bursts of I/O to the array; setting the HWM lower in effect gives the array more “reserve” memory for these bursts. Once the percentage of dirty pages exceeds the HWM, the array aggressively flushes dirty pages down to the back-end disks. During this period SP performance is minimally affected.
When a write request is received and the cache is already full, a forced flush is triggered: dirty pages are written to disk to free pages in the write cache to receive the incoming I/O. Even well-designed systems can see some forced flushes from time to time, but sustained forced flushing will hamper system performance. Any LUN showing a sustained rate of more than 30 forced flushes per second should be analyzed.
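The flushing regimes described above (idle flush, watermark flush with hysteresis between the HWM and LWM, and forced flush at cache full) can be sketched as a toy classifier. The thresholds are the release-31 defaults discussed later in this article; the function itself and its `hwm_flush_active` flag are a simplification I've assumed to model the "flush until LWM" hysteresis, not how the firmware is actually structured.

```python
# Toy model of the flushing policy described above. LWM/HWM are the
# VNX OE 31 defaults; the rest is an assumed simplification.

LWM, HWM, FULL = 60, 80, 100

def flush_mode(dirty_pct, lun_idle=False, hwm_flush_active=False):
    """Return which flushing regime the SP would be in.

    dirty_pct        -- "SP Cache Dirty Pages %" counter
    lun_idle         -- no I/O to the LUN for ~2 seconds
    hwm_flush_active -- a watermark flush already in progress
    """
    if dirty_pct >= FULL:
        # Incoming writes must wait for pages to be freed.
        return "forced flush"
    if dirty_pct > HWM or (hwm_flush_active and dirty_pct > LWM):
        # Flush aggressively until dirty pages drop back to the LWM.
        return "watermark flush"
    if lun_idle:
        # Background commit of dirty pages for idle LUNs.
        return "idle flush"
    return "no flush"
```

The `hwm_flush_active` parameter captures the hysteresis: once a watermark flush starts at the HWM, it continues down to the LWM rather than stopping as soon as dirty pages dip below the HWM.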
Setting the watermarks
The default watermark values in VNX OE 31 and later are 60 for the LWM and 80 for the HWM. Prior to release 31 the defaults for unified systems were 70/90 (LWM/HWM). To break this down further, look at the graphic below, which shows the “SP Cache Dirty Pages %” counter per SP. I’ve added three lines to the graph to show the LWM (green line – 70%), the HWM (orange line – 90%), and cache full (red line – 100%).
In a “well behaved” environment where the I/O is more or less constant the objective is to keep the LWM and HWM set relatively high. This slows the rate at which the array flushes dirty pages to disk. By allowing data to live in the cache longer the array is able to maximize the chances that it can coalesce smaller front end I/Os into larger back end I/Os, thus making the backend more efficient. This is particularly important for parity RAID types like RAID 5 and RAID 6. Full stripe writes FTW! … another time and another article.
For the vast majority of the time slices in the above graph, the SP dirty pages hover between the LWM and HWM, but there are excursions above the HWM to about 95% dirty pages. This usually indicates either a bursty workload or a situation where the back-end disks are being driven a little too aggressively. In this particular example, it would be wise to figure out which is the case before adding more load to this array, and lowering the watermarks to 60/80 would be advisable.
In general, a safe convention is to leave the watermarks at the default of 60/80 unless the array is servicing a very bursty workload, in which case they can be lowered to 50/70, or 40/60 in some extreme cases.
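The rule of thumb above amounts to a three-way mapping. The code below is only a restatement of it: the qualitative burstiness labels are my own assumption (the array reports dirty-page behavior, not a "burstiness" value), while the watermark pairs are the ones named in the article.

```python
# Sketch of the watermark rule of thumb above. The burstiness labels
# are assumed categories; the (LWM, HWM) pairs are from the article.

def recommend_watermarks(burstiness):
    """Map a qualitative burstiness assessment to (LWM, HWM)."""
    return {
        "steady": (60, 80),   # the release-31 default
        "bursty": (50, 70),   # more headroom above the HWM for bursts
        "extreme": (40, 60),  # extreme cases only
    }[burstiness]
```

Judging "bursty" vs. "steady" is done by watching the dirty pages counter, as in the graph discussed earlier: frequent excursions above the HWM argue for lower watermarks.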