The BizTalk Ops Team – Maintaining a Healthy, Responsive and Available BizTalk Environment

Originally posted by Nick Heppleston at: http://www.modhul.com/2008/12/22/the-biztalk-ops-team-maintaining-a-healthy-responsive-and-available-biztalk-environment/

One of the things that surprises me about BizTalk installations is, in my experience, the limited support they receive once a project has gone live. BizTalk is a large enterprise product, and a dedicated team of BizTalk operational specialists and SQL Server DBAs should be created to maintain the operational and test environments.

In this blog-post, I’ll run over some of the responsibilities that I believe a BizTalk operational support team need to focus on to maintain a healthy, responsive and available BizTalk environment.

BizTalk Application Maintenance

BizTalk application maintenance relates to all aspects of the environment above SQL Server. Areas of focus for the Operations Team include:

  • Responding to and actioning monitoring software (e.g. MOM/SCOM) alerts, including errors, warnings and performance issues, in a timely manner.
  • Managing suspended instances to ensure that these do not grow out of hand and cause performance problems. Where suspended instances are caused by development bugs, triage and liaise with development to roll-out patches as necessary; where they are the result of misconfiguration, address any problems.
  • Identifying and applying BizTalk Hotfixes to all environments as necessary. A good place to start is the Microsoft RSS feed for BizTalk 2006 KB articles. Note: this RSS feed appears to be time-based and may not always have entries (thanks to Nikolai for pointing this out).
  • Understanding BizTalk throttling and tweaking parameters as necessary based on historical performance statistics and knowledge of the product domain (e.g. does the application need to handle larger volumes during certain times of the year).
  • Ensuring that the TDDS Tracking Service is running and that tracked messages are being moved to the Tracking Database.
  • Maintaining BizTalk Hosts and Host Instances, provisioning and decommissioning as necessary.
  • Maintaining Adapters, installing and uninstalling as necessary.
  • Understanding options for scaling-up and scaling-out of the application tier; performing scaling as required, before performance becomes an issue.
  • Understanding some of the underlying developer-orientated concepts, including subscriptions, pipelines, maps etc.; a good understanding of the Orchestration debugger is also crucial.
  • Becoming one with the MsgBoxViewer tool to identify potential performance issues before they happen.
  • Running the BizTalk 2006 Best Practices Analyser at regular intervals to identify any non ‘best-practice’ issues.
  • Managing third-party adapter tools that interface directly to BizTalk, such as the Covast EDI Accelerator.
  • Maintaining operational documentation, including known issues, fixes and resolutions – a Wiki is an excellent resource to manage this knowledge.
  • Scripting as much as possible, particularly known, recurring situations, e.g. WMI scripts to clear-down any ‘harmless’ known suspended instances, such as zombies. The more that is scripted, the less chance of manual error. Scripting can be done in PowerShell, VBScript or C#.
  • Maintaining all scripts, bindings and configuration settings in source control to ensure proper versioning. Ensure all environments are updated with the same version of the tools.
  • Performing deployments (and having sufficient knowledge of BizTalk, SQL Server and the product domain to make decisions on deployment issues without having to go back to the development team).
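
To illustrate the scripting point in the list above, here is a minimal Python sketch of the selection logic a clear-down script might use. It assumes the suspended instances have already been read into plain dictionaries (for example from the MSBTS_ServiceInstance WMI class); the dictionary keys, the sample data and the contents of HARMLESS_ERROR_IDS are illustrative assumptions rather than anything prescribed in this post. The actual terminate call is deliberately left out.

```python
# ServiceStatus values that mean "suspended" on MSBTS_ServiceInstance
# (assumed: 4 = resumable, 32 = not resumable; verify for your version).
SUSPENDED_STATUSES = {4, 32}

# Hypothetical allow-list of error IDs the team has agreed are harmless.
# The value below is only an example often quoted for zombies; take the
# real list from your own known-issues documentation.
HARMLESS_ERROR_IDS = {"0xC0C01B4C"}


def split_suspended(instances, harmless_ids=HARMLESS_ERROR_IDS):
    """Split suspended instances into (safe_to_clear, needs_triage)."""
    harmless = {error_id.lower() for error_id in harmless_ids}
    safe_to_clear, needs_triage = [], []
    for instance in instances:
        if instance["ServiceStatus"] not in SUSPENDED_STATUSES:
            continue  # running, dehydrated, etc.: leave alone
        if instance["ErrorId"].lower() in harmless:
            safe_to_clear.append(instance)
        else:
            needs_triage.append(instance)
    return safe_to_clear, needs_triage


if __name__ == "__main__":
    # Made-up sample data; "0x12345678" is a placeholder error ID.
    sample = [
        {"InstanceID": "a1", "ServiceStatus": 32, "ErrorId": "0xC0C01B4C"},
        {"InstanceID": "b2", "ServiceStatus": 4, "ErrorId": "0x12345678"},
        {"InstanceID": "c3", "ServiceStatus": 2, "ErrorId": ""},
    ]
    safe, triage = split_suspended(sample)
    print("clear down:", [i["InstanceID"] for i in safe])
    print("triage:", [i["InstanceID"] for i in triage])
```

Keeping the allow-list explicit (and in source control with the rest of the scripts) means anything not yet agreed as harmless falls through to triage rather than being silently terminated.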

Database Maintenance

This goes without saying, but unless your team maintains the health of the underlying SQL Server database, the BizTalk environment will not perform as expected. To maintain optimum health, the team needs to:

  • Ensure that the BizTalk SQL Agent jobs are running successfully and are not running for an excessive length of time.
  • Ensure that tracking data is cleared down using the Purge and Archive jobs and that historical archive data is made available in an offline mode (i.e. on a different SQL Server) for analysis and reporting.
  • Ensure that backups are taken, using the BizTalk Backup job, and that the resulting backup data and log files are verified.
  • Monitor the performance of the SQL Server environment through a monitoring tool to ensure that the server(s) are not exceeding CPU, memory or IO load; scale-up or -out as necessary.
  • Monitor replication performance and/or automagically restore backups to a DR environment, to ensure continuity of service in the event of downtime; respond to any incidents that arise in the restore.
  • Understand what can and more importantly what can’t be done on a SQL Server that is hosting BizTalk.
  • Understand options for scaling out the database tier and in particular, the Message Box; perform scaling as required, before performance becomes an issue.
  • Identify and apply SQL Server Hotfixes to all environments as necessary.
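
As a rough illustration of the first point in the list above, the following Python sketch flags SQL Agent job runs that failed or ran too long. It assumes the rows have already been fetched from the msdb job history, where run_status 1 means success and run_duration is an integer encoded as HHMMSS. The JobRun type, the function names and the time limits are hypothetical and would need tuning against your own historical run times.

```python
from dataclasses import dataclass


def run_duration_to_seconds(run_duration: int) -> int:
    """Convert an HHMMSS-encoded run_duration integer to seconds."""
    hours, rest = divmod(run_duration, 10000)
    minutes, seconds = divmod(rest, 100)
    return hours * 3600 + minutes * 60 + seconds


@dataclass
class JobRun:
    job_name: str
    run_status: int    # 1 = succeeded; anything else is treated as a problem
    run_duration: int  # HHMMSS-encoded integer


def find_problem_runs(runs, limits, default_limit=600):
    """Return (job_name, "failed" | "overran") for each problematic run.

    limits maps job names to a maximum run time in seconds; jobs without
    an entry fall back to default_limit.
    """
    problems = []
    for run in runs:
        if run.run_status != 1:
            problems.append((run.job_name, "failed"))
        elif run_duration_to_seconds(run.run_duration) > limits.get(
            run.job_name, default_limit
        ):
            problems.append((run.job_name, "overran"))
    return problems


if __name__ == "__main__":
    # Made-up history rows and limits, for illustration only.
    history = [
        JobRun("Backup BizTalk Server (BizTalkMgmtDb)", 1, 215),
        JobRun("DTA Purge and Archive (BizTalkDTADb)", 1, 14500),
    ]
    limits = {"DTA Purge and Archive (BizTalkDTADb)": 1800}
    for name, reason in find_problem_runs(history, limits):
        print(f"{name}: {reason}")
```

Feeding the result into whatever alerting the team already uses (MOM/SCOM, email) is left open here, since that depends on the environment.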

I would recommend that DBAs also read the excellent Microsoft KB article “How to maintain and troubleshoot BizTalk Server databases”.

Disaster Recovery

Disaster recovery is unfortunately often overlooked until it is too late. The Operations team should perform regular reviews and tests of their DR plan to ensure it is up to date and effective. Areas of focus for the team include:

  • Switching the live environment over to disaster recovery at regular intervals (every quarter / every six months) to prove the disaster recovery plan and to give confidence to the business. The switch to DR should be for a short period – 1 to 2 days – during a period of known ‘slack’. Switching to DR should be straightforward and (almost) entirely automated to ensure manual error is minimised.
  • Where there are problems with the plan, refine as necessary. Keep the master recovery document on a Wiki for example, but ensure an up-to-date hardcopy is kept off-site.
  • Ensuring that all members of the team have confidence in the plan and are prepared to invoke it as necessary.

Infrastructure and General Maintenance

There are a number of day-to-day infrastructure and general maintenance tasks that the team will need to complete during the lifetime of an environment, including:

  • Application of Windows Updates as necessary during scheduled down-time.
  • Running the BizTalk 2006 Best Practices Analyser after creating new environments to check for any non ‘best-practice’ issues.
  • Liaising with the infrastructure team to ensure environments are correctly built before operation commences, including correct SAN RAID configuration, clustering etc. Work with DBAs to ensure that the layout of data and log files is correct based on the role of the databases (BizTalkMsgBoxDb vs. BizTalkMgmtDb for example). Ensure elements of the environment (e.g. a BizTalk Server or a SQL Server node) are cleanly removed before downtime commences to action failed hardware.
  • Liaising with the networking team to ensure necessary ports are open on firewalls etc. for traversal of traffic for both the underlying SQL Server infrastructure and external access.
  • Liaising with the security team to ensure correct Active Directory Domain users and groups are created and maintained to ensure a well-running system.
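
For the firewall point in the list above, a small connectivity check can confirm that the agreed ports really are reachable from a BizTalk server before an environment goes into operation. The Python sketch below is one possible shape for such a check; the host name and the port list are placeholders (1433 is the SQL Server default instance port, 135 the RPC endpoint mapper), and the real list should come from the networking team.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_endpoints(endpoints, timeout: float = 3.0):
    """Map each (host, port) pair to True (reachable) or False."""
    return {
        (host, port): port_is_open(host, port, timeout)
        for host, port in endpoints
    }


# Hypothetical endpoints; replace with the hosts and ports agreed for
# your own environment.
REQUIRED_ENDPOINTS = [
    ("sql-node-1.example.local", 1433),
    ("sql-node-1.example.local", 135),
]
```

Calling check_endpoints(REQUIRED_ENDPOINTS) from each BizTalk server returns a dictionary that can be logged or compared against the expected state. A successful TCP connect only shows that the port is reachable, not that the service behind it is healthy.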

For those of you who are members of a BizTalk operational support team (or work with one as a consultant), are there other recommendations you’d like to share?


14 thoughts on “The BizTalk Ops Team – Maintaining a Healthy, Responsive and Available BizTalk Environment”

  1. RE “A good place to start is the Microsoft RSS feed for BizTalk 2006 KB articles.” that link seems to provide 0 articles, is it time based?

  2. Nick,

    Great post, only saw it now for the first time. I represent a company that firmly believes in the concepts and principles you outline here. Unfortunately we still find that many organizations do not put the required sort of importance on the operational side of BizTalk Server – I actually blogged about this on my blog the other day!

    We have created a product to make monitoring of BizTalk Server Enterprise solutions even simpler and more effective. The product is called Minotaur, a BizTalk monitoring product that supports proactive and reactive monitoring of BizTalk Enterprise Solutions.

    The product has a powerful notification engine to ensure support engineers are notified when a failure occurs or when a monitored threshold has been breached. Minotaur also has a cool dashboard that displays the status of the monitored BizTalk environment in real-time.

    WebSite: http://www.ragingbulltech.com/
    ScreenShots: http://www.ragingbulltech.com/downloads/MinotaurDocumentation.zip
    Raging Bull Tech Blog: http://www.ragingbulltech.com/index.php?option=com_lyftenbloggie&view=lyftenbloggie&category=0&Itemid=56
    My Blog: http://geekswithblogs.net/BizTalkmonitoring/Default.aspx

    Since your blog post struck such a chord with us, we would value you taking the time to look at our product – I definitely think you will find it in line with your approach to the operational side of BizTalk Server.

    Riaan

  3. Thanks for the feedback Riaan, I’ve seen Minotaur mentioned on a few websites but I’ve yet to give it a try; I’ll download it over the weekend and see what it’s like.

    Cheers, Nick.

  4. Hi Nick,

    It’s a really interesting discussion point to understand how organisations manage and maintain BizTalk. I think sometimes it’s actually organisational challenges which are as much of a factor as anything else.

    To give an example, what you describe above is the most common pattern I tend to see or hear about; however, I had a few discussions with Kent Weare at the MVP summit about how his company do this and I must admit I really liked it. Essentially development, support and operations fall under the responsibility of the same area for BizTalk/Integration. This means that the equivalent of the BizTalk Operator is part of the same team as the BizTalk Administrator and BizTalk developers (the actual people doing these roles may also do other things).

    This means that by working closely together they have a much better understanding of the effect their decisions and actions have in these other areas. Essentially this means developers have an increased accountability for their solutions and need to produce good code, otherwise their team feels the pain.

    One of the problems with the more common pattern is that the handover from development to production is so often, despite best efforts, a “throw it over a fence” thing.

    It’s good that you have outlined the minimum expectations for operations and support teams, as too often they have limited guidance and there isn’t really too much available in the way of training. My personal view is that if all operations teams can be encouraged to use a proper monitoring tool and then actually use it to drive the support they do, then they are heading in the right direction.

    All the best
    Mike

  5. Nick,
    This is a great list. It looks very comprehensive and complete to me.
    Thanks much.
    Ben Bagheri
    BizTalk Consultant,
    Dallas, Texas, USA

  6. Ben, pleased it was of use – I keep finding it extremely useful for clients who need guidance on their IT Ops roles and responsibilities. If you have any amendments, additions etc, please forward them over.

  7. Do you guys have any documentation or walkthrough for moving existing BizTalk applications from BTS 2006 R2 to BTS 2009? Especially for BAM components.

  8. Ben,

    I don’t believe there are any components or tools to do this ‘out of the box’, but any BAM components developed with BizTalk 2006 R2 work fine in 2009 based on my experience. You should also be able to copy across any of the existing BAM databases.

    Nick.
