CS2007 Software Architecture Series Part 7: System Usage Considerations

Using the various Commerce Server systems correctly is, in most cases, a matter of proper schema design coupled with an understanding of each component's breaking points, so as not to exceed them. This, of course, rests on a solid understanding of the end-state business requirements, as already discussed.

The other considerations and typical “gotchas” of system architecture with Commerce Server shall be enumerated throughout the balance of this section. From a logical perspective, it probably makes the most sense to attack system design in the following order:

·         Profiles

·         Catalog & Inventory

·         Orders

·         Marketing

·         Analytics

By starting with the customer, then the goods being sold to the customer, then the orders to process the goods, then the discounts overlaid on top of the orders, and finally the reports against the entire experience – one can minimize changes as a result of dependencies between systems unearthed in the architecture process.

Profiles

The major consideration with the Profile system is where the data lives. Given the system's flexibility, it can live in a variety of different places. Some considerations:

·         If the number of users is the principal concern, SQL is the best place to store the data – successful production deployments have gone to 60M+ on very moderate hardware.

o    The Partitioning feature adds a lot of complexity and should be carefully weighed as to the trouble factor versus simply running on a single SQL database; keep in mind that no customer has yet come close to needing to go beyond a single database deployment given the scalability of SQL Server on hardware available today.

·         Conversely, Active Directory can only store a small subset of users compared to SQL – ~10M is the practical limit.

·         OLE/DB and ODBC sources, although supported, were not expressly tested during the development cycle of Commerce Server 2007, given the wide variety of possible combinations and sources; relying on these capabilities can put one into unexplored territory.

·         Having disparate sources – although convenient – is a major performance drain. The Profile system will have to query multiple data sources and create a union of the results versus being able to execute a single query.

Because there is no “canned” schema – it is up to the user to completely define the system in whatever the systems of choice are for storing data. All standard best practices of designing relational database schema (or Active Directory, as the case may be) apply. Some particular considerations relevant to Commerce Server profiles include:

·         Because this is a from-scratch defined schema, if utilizing SQL – make sure it has indexes (and in particular clustered indexes) that are relevant to the usage patterns of the data being stored.

·         Design the table structure with respect to how it will be queried and updated – pulling single tables will always be cheaper than pulling multiple tables with join operations.

·         Be sure to remember where/how attributes utilized in personalization for the Marketing system will be stored as these are often the most queried in a production system.
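The indexing and single-table points above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the table, column, and index names (user_profile, segment, idx_profile_segment) are hypothetical, not part of any Commerce Server schema. It only shows the general idea of indexing the attribute that personalization queries filter on and answering the query from one table.

```python
import sqlite3

# Hypothetical single-table profile schema; SQLite stands in for SQL Server.
def build_profile_store():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_profile ("
        " user_id TEXT PRIMARY KEY,"
        " email TEXT NOT NULL,"
        " segment TEXT NOT NULL)"  # attribute assumed to drive personalization
    )
    # Index the attribute that targeting queries filter on most often.
    conn.execute("CREATE INDEX idx_profile_segment ON user_profile (segment)")
    conn.executemany(
        "INSERT INTO user_profile VALUES (?, ?, ?)",
        [
            ("u1", "a@example.com", "gold"),
            ("u2", "b@example.com", "silver"),
            ("u3", "c@example.com", "gold"),
        ],
    )
    conn.commit()
    return conn

def users_in_segment(conn, segment):
    # One table, one indexed predicate, no joins.
    rows = conn.execute(
        "SELECT user_id FROM user_profile WHERE segment = ? ORDER BY user_id",
        (segment,),
    )
    return [row[0] for row in rows]

def query_plan(conn, segment):
    # Ask the engine how it intends to run the lookup.
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT user_id FROM user_profile WHERE segment = ?",
        (segment,),
    )
    return " ".join(str(row[-1]) for row in rows)
```

Inspecting the plan returned by query_plan is a cheap way to confirm that a frequently used predicate is actually served by an index; on SQL Server the equivalent check would be done against the execution plan.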

There are no specific limitations with respect to data sizing; however the product was not tested above 60M profiles (mix of anonymous and registered). That being said – going above this should not be a major concern point.

The other major aspect of Profile system design is compliance with Payment Card Industry (PCI) standards. Storing any credit card data has major implications on policies from the credit card merchants themselves. If at all possible it is best not to store credit cards – as that alleviates many aspects of PCI compliance and the resultant independent audits. If it is required, data encryption must be utilized, which is supported by the Profile system. This will, however, affect performance for queries/updates accordingly.

Catalog

The Catalog system is designed such that one rarely has to touch the underlying database directly. Therefore, the usage considerations are far narrower in scope, but oftentimes more subtle than straightforward SQL design.

Some of the particular schema design considerations include:

·         Commerce Server stores a lot of metadata, and SQL Server has a row size limitation of 8 KB. It is very easy to exceed this limitation when utilizing character-based data types. Using text-based data types (as opposed to character-based data types) will eliminate this problem, but will make querying and accessing data slightly more complicated. It is better to go with text from the beginning, however, as changes later could be expensive and complex to effect against an existing system.

·         Any field that needs to be free-text searchable will need to have its indexes continuously updated by SQL Server; the potential for stale data or a significant amount of processing overhead for index rebuilding is very high. In general, it is better to be judicious and find other ways to search data (if possible) than leveraging free-text searching given these factors.

·         The Catalog is not truly multi-currency; plan on storing separate fields for each currency being supported or plan on leveraging an exchange-rate translation table. There is no in-between.

·         Adding language support is easy from a schema perspective – so this can be easily added at any time.

·         Obviously, simpler is better – keeping the structure as flat as possible will make for faster querying.

·         Storing binaries (e.g. – images or other multimedia files) was not a design consideration of the Catalog system; hence this should probably be avoided and links to the file-system or other content management systems utilized.
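To make the row size point concrete, the following is a rough back-of-the-envelope sketch. It assumes SQL Server's documented in-row maximum of 8,060 bytes and counts one byte per character for char/varchar and two for nchar/nvarchar. It ignores per-row overhead and any off-row storage the engine may use, and the property names and lengths are purely hypothetical.

```python
# SQL Server's documented maximum in-row size is 8,060 bytes.
MAX_ROW_BYTES = 8060

# Bytes per character for each character-based type.
BYTES_PER_CHAR = {"char": 1, "varchar": 1, "nchar": 2, "nvarchar": 2}

def declared_row_bytes(columns):
    """Worst-case width of a row built from (name, sql_type, length) tuples."""
    return sum(BYTES_PER_CHAR[sql_type] * length for _, sql_type, length in columns)

def exceeds_row_limit(columns):
    """True when the declared character columns alone could overflow one row."""
    return declared_row_bytes(columns) > MAX_ROW_BYTES

# Hypothetical product properties; names and lengths are illustrative only.
product_columns = [
    ("DisplayName", "nvarchar", 255),
    ("ShortDescription", "nvarchar", 1000),
    ("LongDescription", "nvarchar", 3000),
]
```

With these three Unicode properties the worst case is 510 + 2,000 + 6,000 = 8,510 bytes, which is already past the limit before any other catalog metadata is counted.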

There is no practical limit to the number of base catalogs. The product has been tested to the levels published in the performance guide, which is available at http://www.microsoft.com/downloads/details.aspx?FamilyID=E79691F0-BE0F-40A6-940C-5D3A679C5526&displaylang=en.

Going beyond this should generally not present a problem but should be tested accordingly. The principal issue that usually arises is the re-indexing time for free text searchable properties – as noted above. Therefore the fewer of those present in the schema, the better.

The Catalog Sets feature has no specific limitations on the number of catalog sets. However, fewer is better, as the list of catalogs and catalog sets will need to be enumerated on every single request involving catalog sets to match users to particular catalogs (and there is no way to cache this data). Keeping this to a small number will result in far better performance. When creating target expressions for Catalog Sets, it is best to structure the expressions such that a single table within the Profile system can be queried and join operations can be avoided, to ensure the fastest processing. (This may, in turn, affect Profile schema design as well.)

Virtual Catalogs provide an immense degree of flexibility, and an immense potential for complications given that flexibility. The same data sizing limitations as for base catalogs apply. The tested limit on the number of Virtual Catalogs is 10,000; going above this is possible but requires careful testing. The other big consideration when utilizing Virtual Catalogs is materialization: this will result in the best runtime performance, but it requires time to rebuild from the base catalog (which adds overhead), and stale data could be presented while the rebuild is occurring.

Orders

The principal design considerations of the Order system are the schema and the Pipeline design. With respect to the schema:

·         Baskets are stored in binary format.

·         Orders are stored in a mix of binary format as well as normal SQL database tables; what goes where is determined by the Order Mapping XML.

·         Storing binaries in baskets and orders themselves should be avoided; this was never tested and will degrade performance.

·         ANY field that will be utilized after an order is captured should be stored in SQL and not in the binary field.

·         The data schema for SQL storage should be as simple and flat as possible, to minimize the effort required to persist an order to disk.

·         As the SQL schema is custom-designed by the developer, indexes must be implemented appropriately:

o    Do not create clustered indexes based on fields that will slow down order storage; in fact, clustered indexes may not be appropriate at all.

o    Implement indexes based on the fields that will be utilized for query and analysis post-capture.
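The split between SQL columns and the binary field can be pictured with a small sketch. This is not the Order Mapping XML format and not the Order system's serialization: the set of mapped fields is a hypothetical stand-in for whatever the mapping declares, and JSON bytes stand in for the binary blob. The point is only that anything needed after capture lands in queryable columns.

```python
import json

# Hypothetical set of fields needed after capture. In Commerce Server the real
# split is declared in the Order Mapping XML, which this sketch does not reproduce.
MAPPED_FIELDS = {"order_id", "customer_id", "status", "total"}

def split_order(order, mapped_fields=MAPPED_FIELDS):
    """Return (columns, blob): queryable fields and an opaque remainder."""
    columns = {k: v for k, v in order.items() if k in mapped_fields}
    remainder = {k: v for k, v in order.items() if k not in mapped_fields}
    # JSON bytes stand in for the binary field; it is opaque to SQL queries.
    blob = json.dumps(remainder, sort_keys=True).encode("utf-8")
    return columns, blob

# Illustrative order; every field name here is an assumption.
sample_order = {
    "order_id": "1001",
    "customer_id": "u1",
    "status": "NewOrder",
    "total": 42.5,
    "gift_note": "Happy birthday",
}
```

Under this split, a post-capture report on status or total touches only the flat columns, while gift_note would require deserializing every blob to be found.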

With respect to Pipeline design, there are several principles that need to be considered:

·         Try to keep the Basket pipeline as light as possible; it is typically run many more times than the Order calculation and capture pipelines, so having less work to do will greatly improve site throughput.

·         The Order pipelines are transactional – however there may be steps in there that cannot enlist in a DTC transaction (such as Web service calls for merchant services authorization); in this case one must think about how to handle failure scenarios and accommodate manually through code.

·         Pipelines are still COM components; therefore one must use COM+ transactions appropriately and ensure that the threading model supports free or neutral threading (apartment threading will not work) for everything to work properly (and allow pipeline pooling, which will greatly reduce instantiation time on the actual site).

·         Long-running operations may best be handled outside of the pipeline (e.g. – if calls to credit card authorization providers are egregiously expensive) – however this then requires separate handling outside of the order capture process and may also impact Payment Card Industry (PCI) compliance.
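One common way to handle a step that cannot enlist in the transaction is a compensating action: if a later step fails, the code explicitly undoes the external call. The sketch below assumes a hypothetical payment gateway with authorize and void operations and a caller-supplied save_order function. None of these names come from Commerce Server, and a real pipeline component would be written against the pipeline interfaces rather than as plain functions.

```python
class FakePaymentGateway:
    """Hypothetical stand-in for a remote merchant service.

    Its calls happen outside any database transaction, so they cannot be
    rolled back automatically.
    """

    def __init__(self):
        self.active = set()  # authorizations currently held
        self._next = 0

    def authorize(self, amount):
        self._next += 1
        code = f"AUTH-{self._next}"
        self.active.add(code)
        return code

    def void(self, code):
        self.active.discard(code)


def capture_order(gateway, save_order, order):
    """Authorize payment, then persist; void the authorization if persisting fails."""
    auth_code = gateway.authorize(order["total"])
    try:
        # save_order is assumed to run inside its own database transaction.
        save_order({**order, "auth_code": auth_code})
    except Exception:
        # Compensate manually: the remote call was not part of the transaction.
        gateway.void(auth_code)
        raise
    return auth_code
```

The void call can itself fail, so a production design would typically also log the orphaned authorization for later reconciliation.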

And with the last comment – there exists a great segue to the topic of Order capture and PCI compliance. The considerations for this can be best summarized as:

·         One is best off by capturing an order and NOT storing any credit card data – just the authorization number. For returns, one would have to re-input the same or a different credit card (which conceptually works just as it does in most physical store environments).

·         If credit card data must be stored (such as in the scenario if it is processed offline), it must be encrypted. Unfortunately, the Order system does not support encryption as an intrinsic capability. This will require a custom pipeline component to be written to encrypt orders – which will result in a fair bit of work for initial development and possible performance degradation during the capture process (plus associated post-capture maintenance as well). This tradeoff should be weighed carefully.

Marketing

The Marketing system is a relative black box compared to the rest of the Commerce Server systems, as it is more a matter of configuration than design. That being said, each aspect of the system has its own set of unique considerations to ensure successful use from an architecture perspective – especially with regards to dependencies in other systems and upon runtime performance.

From an overall perspective, performance is the most notable consideration. The system was designed to work with 1,000 to 2,000 items active at one time; beyond that, performance will degrade. Scenarios such as one discount per item in the Catalog will prove impractical in production; consider customized pricing and Virtual Catalogs instead, as an example.

The other major performance consideration is with respect to the usage of Target Expressions. More Target Expressions (especially if compounded together) will equal worse performance. Likewise, Target Expressions that can query against a single table in the Profile system will perform best; those that require join operations internally will perform considerably worse.

It is important to keep in mind that there is no caching of the Marketing system. Expressions are evaluated upon each and every request. Hence, they should be called judiciously. In general, the discounting functionality will not represent a problem – as this is typically only called in basket and checkout operations. Although by utilizing the Runtime Discount Filtering capability it is possible to use it on the end site – this will significantly degrade performance. With respect to Advertisements, generic advertisements that utilize only impression tracking capabilities will be more practical (for cross-selling and up-selling) than using Target Expressions to target content to individual users given the overhead of evaluating the expressions. Direct Mail, because it works offline, generally will not impact end site runtime performance.

The intrinsic e-mail handling capabilities (or lack thereof) of the Direct Mail system are the last major consideration; its capabilities are very much aimed at providing basic send functionality with minimal tracking and error handling. This has proven to be a challenge point for many customers. Instead, a better alternative might be to use the List Manager feature of Direct Mail (which utilizes the Target Expressions of the Marketing system to generate the recipient list) to create lists of intended recipients and then export them for use in another mailing engine.
