Backup and Restore
Good backups, together with tested restore processes, are an essential element of any mission-critical systems deployment. Commerce Server and SharePoint, like other Microsoft server products built under the Common Engineering Criteria specification launched in 2005-2006, are designed to work with backup and restore solutions that leverage the Volume Shadow Copy Service (VSS) through a VSS Writer. (VSS here is not to be confused with the old Visual SourceSafe product.)
Both software solutions, such as Microsoft’s Data Protection Manager, and hardware solutions, such as those offered on high-end storage devices from manufacturers such as EMC, support VSS. This is the officially supported and least intrusive way to back up and restore Commerce Server configurations. In general, hardware solutions offer superior performance, with less degradation during the backup/restore operation, than software-based tools.
For the largest enterprises, it may make sense to have multiple sites. These can be in a strict failover configuration or actually geographically load-balanced.
Geographical Failover Configurations
Consider that much of Web hosting in North America is located in Silicon Valley, as it is the center-point of several large Internet access points such as MAE West or PAIX. Given this, it makes perfect sense to host data centers for large production Web sites there. But what happens if an earthquake takes out Silicon Valley? It has happened before: the 1989 Loma Prieta quake had its epicenter in the nearby Santa Cruz Mountains.
If business continuity is a high priority, a second site in another location such as the Washington DC/Virginia area (located near MAE East, the center-point of Internet traffic for the Eastern United States) may make perfect sense. Failover can occur at the DNS level, which is straightforward enough to configure.
The challenge becomes getting a mirror of data in Virginia from the master in Silicon Valley. From a Commerce Server perspective, one must look at where each kind of data originates and what actually needs to be replicated:
· Catalog & Marketing data is read-only, with the master living in the business environment at corporate headquarters
· Orders and Profiles contain actual customer data, with the master living in the live site’s data center
· Inventory data typically lives at corporate headquarters in the order/warehouse management system
Given this, the following failover strategy probably makes the most sense:
· Push Catalog and Marketing data from corporate headquarters to the backup site at the same time as it is pushed to the live site, utilizing Commerce Server Staging
· For Inventory:
o Push it to the backup site from corporate headquarters using the BizTalk adapters (assuming an up-to-date master at headquarters)
o Update simultaneously at corporate headquarters and at the backup site from the live site – also utilizing the BizTalk adapters
· Replicate Order and Profile data from the live site to the backup site utilizing either:
o BizTalk Adapters
o VSS Writer with scheduled backup/restore in near real-time – possibly even implementable in hardware utilizing long-distance fiber-optic links between SAN devices
Geographical Load Balancing Configurations
It may be desirable to take the geographical failover configuration one step further and go to a multi-master approach with multiple sites active in different geographies and processing different sets of customers.
Practically speaking, this configuration can be made to work as follows:
· Catalog and Marketing data is pushed from corporate headquarters to all geographical environments – most likely utilizing Commerce Server Staging
· Inventory data is updated at time of order, and synchronized with a corporate master – most likely via BizTalk
· Orders and Profiles are synchronized amongst environments utilizing either:
o BizTalk Adapters
o SQL Server Transactional Replication (with multi-master replication configured)
Note: Both of these methods require that Order and Profile keys be globally unique (for example, by utilizing a Globally Unique Identifier (GUID) data type) so that no database key conflicts can occur when data from different sites is merged.
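The key-uniqueness requirement can be illustrated with a minimal Python sketch. It is not Commerce Server code; the site names and the helper functions are hypothetical. It simply contrasts per-site sequential integer keys, which clash as soon as two sites are merged, with randomly generated GUIDs, which do not clash in practice.

```python
import itertools
import uuid


def make_sequential_key_source():
    """Return a function that hands out 1, 2, 3, ... for one site (hypothetical helper)."""
    counter = itertools.count(1)
    return lambda: next(counter)


def conflicting_keys(keys_a, keys_b):
    """Return the keys that both sites generated independently."""
    return set(keys_a) & set(keys_b)


# Two active sites, each numbering its own orders from 1.
west_next, east_next = make_sequential_key_source(), make_sequential_key_source()
west_int_keys = [west_next() for _ in range(5)]
east_int_keys = [east_next() for _ in range(5)]

# The same two sites, each generating random (version 4) GUID keys.
west_guid_keys = [str(uuid.uuid4()) for _ in range(5)]
east_guid_keys = [str(uuid.uuid4()) for _ in range(5)]

# Every sequential key clashes once the sites are synchronized.
int_conflicts = conflicting_keys(west_int_keys, east_int_keys)
# GUID keys generated independently do not clash in practice.
guid_conflicts = conflicting_keys(west_guid_keys, east_guid_keys)

print("sequential key conflicts:", sorted(int_conflicts))
print("GUID key conflicts:", sorted(guid_conflicts))
```

With sequential keys, every order number issued at one site also exists at the other, so a merge would have to renumber rows or reject them. With GUID keys, each site can create records without coordinating with the others.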
In the end state, geographical load balancing devices would segregate traffic amongst regional data centers. If a regional data center were to fail, traffic could be rebalanced amongst other regional data centers with ongoing business continuity.
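As a rough illustration of the rebalancing idea, the following Python sketch redistributes traffic shares proportionally across the data centers that remain healthy. The data center names, the weights, and the `rebalance` function are assumptions made for the example; a real geographical load balancer would apply its own configured policy.

```python
def rebalance(weights, healthy):
    """Return traffic shares (summing to 1.0) for the healthy data centers only.

    weights: mapping of data center name to its normal relative weight.
    healthy: set of data center names currently passing health checks.
    """
    live = {name: weight for name, weight in weights.items() if name in healthy}
    total = sum(live.values())
    if total == 0:
        raise RuntimeError("no healthy data center available")
    # Each surviving site keeps its weight relative to the other survivors.
    return {name: weight / total for name, weight in live.items()}


# Hypothetical normal distribution of traffic across three regions.
normal_weights = {"us-west": 50, "us-east": 30, "europe": 20}

all_up = rebalance(normal_weights, healthy={"us-west", "us-east", "europe"})
west_down = rebalance(normal_weights, healthy={"us-east", "europe"})

print(all_up)
print(west_down)
```

Proportional redistribution is only one possible choice. A real deployment might instead send a failed region's traffic to the geographically nearest survivor, and would need spare capacity at each site to absorb the extra load.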
It is imperative that all key Event Log messages and Performance Monitor counters be continuously monitored to ensure quality of service. The Operations Manager management pack that ships with Commerce Server provides an established means of doing this. If a non-Microsoft systems management tool is to be utilized, such as Hewlett-Packard’s OpenView, the management pack provided with Commerce Server can be used as a baseline to configure similar monitoring rules.
Separately, it is a best practice to configure real-world transactions that simulate users browsing a site and checking out, and to program these into the health monitoring tool. This way, if the actual user experience starts to degrade, it can be caught in real time and corrected. Keynote Systems (http://www.keynote.com/) and several others also offer global user experience monitoring services that measure performance from various points around the Internet; if absolute quality of user experience matters, a service such as this is a must, as there is no other way to tell what an end user is actually seeing. For example, it is not uncommon to see great results on a local Operations Manager or OpenView deployment but terrible results across the country from a user’s desktop due to a telecommunications bottleneck.
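A simulated transaction of this kind can be sketched in a few lines of Python. This is a minimal illustration, not a feature of Operations Manager or OpenView: the step names, the threshold, and both functions are hypothetical, and the steps are stubs standing in for real HTTP requests against the site.

```python
import time


def run_synthetic_transaction(steps, threshold_seconds, clock=time.perf_counter):
    """Run each (name, action) step in order, timing it and recording failures.

    Stops at the first failed step, since later steps (such as checkout)
    depend on earlier ones (such as adding an item to the basket).
    """
    results = []
    for name, action in steps:
        start = clock()
        try:
            action()
            ok = True
        except Exception:
            ok = False
        elapsed = clock() - start
        results.append(
            {"step": name, "ok": ok, "seconds": elapsed, "slow": elapsed > threshold_seconds}
        )
        if not ok:
            break
    return results


def needs_alert(results):
    """Alert when any step failed or exceeded the response-time threshold."""
    return any((not r["ok"]) or r["slow"] for r in results)


# Stub steps; in real use each would issue requests against the live site.
def browse_catalog():
    pass


def add_to_basket():
    pass


def check_out():
    raise TimeoutError("simulated checkout timeout")


healthy_run = run_synthetic_transaction(
    [("browse", browse_catalog), ("basket", add_to_basket)], threshold_seconds=2.0
)
failing_run = run_synthetic_transaction(
    [("browse", browse_catalog), ("basket", add_to_basket), ("checkout", check_out)],
    threshold_seconds=2.0,
)

print(needs_alert(healthy_run), needs_alert(failing_run))
```

Running the same script from several geographic locations would approximate, on a small scale, what an external monitoring service measures.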
Commerce Server generates quite a bit of data. Performance can degrade over time if this data is not periodically archived and/or purged. Some things that should be periodically examined include:
· Anonymous User Profiles – delete any that have not been used in a reasonable amount of time (such as 4-12 months)
· Anonymous Baskets – delete any that have not been used in a reasonable amount of time (such as 4-12 months)
· Order History – delete anything that is more than 1-2 years old if appropriate
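The retention rules above can be expressed as a small selection routine. The Python sketch below assumes a simplified, hypothetical record shape (`id`, `kind`, `last_used`) rather than the actual Commerce Server schema. The retention periods (roughly 6 months for anonymous data, 2 years for order history) are example values picked from within the ranges listed above.

```python
from datetime import datetime, timedelta

# Example retention periods; adjust to business and legal requirements.
RETENTION = {
    "anonymous_profile": timedelta(days=180),
    "anonymous_basket": timedelta(days=180),
    "order_history": timedelta(days=730),
}


def select_for_purge(records, now):
    """Return the ids of records whose last use is older than their retention period.

    Record kinds without a retention rule (for example, registered profiles)
    are never selected.
    """
    stale = []
    for record in records:
        limit = RETENTION.get(record["kind"])
        if limit is not None and now - record["last_used"] > limit:
            stale.append(record["id"])
    return stale


now = datetime(2010, 1, 1)
sample_records = [
    {"id": "p1", "kind": "anonymous_profile", "last_used": datetime(2009, 12, 1)},
    {"id": "p2", "kind": "anonymous_profile", "last_used": datetime(2009, 1, 1)},
    {"id": "b1", "kind": "anonymous_basket", "last_used": datetime(2009, 6, 1)},
    {"id": "o1", "kind": "order_history", "last_used": datetime(2008, 6, 1)},
    {"id": "o2", "kind": "order_history", "last_used": datetime(2007, 6, 1)},
    {"id": "r1", "kind": "registered_profile", "last_used": datetime(2005, 1, 1)},
]

to_purge = select_for_purge(sample_records, now)
print(to_purge)
```

In practice the selected records would be archived before deletion where appropriate, and the purge itself would run as a scheduled job during off-peak hours.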