Monitoring-Friendly Applications
Author: Calin Groza

February 25, 2011

The idea for this article came after reading an article in CACM about what SysAdmins would want from IT vendors (http://cacm.acm.org/magazines/2011/2/104373-a-plea-from-sysadmins-to-software-vendors/fulltext). It made me think: what would the environment support team want? This question has many aspects, one being support for automated monitoring, and more specifically:

  • Application monitoring at the business services level – not hardware, OS, networking, not even application container (JMX)
  • Not one application but an environment made of many (hundreds of) components
  • Multiple (tens of) environments – most of them for functional and performance testing, not production
  • Most components are implemented on a Web-Services/J2EE platform
  • Priority is on the availability of the environment – not performance or root-cause analysis.

Here are some guidelines:

Separate the environment-specific configuration from the rest of the application configuration. For example, have a file called test01-env-spec.properties that contains the configuration specific to environment TEST01, with information like:

    • App1.Host=camis02.acme.com
    • App1.Port=12000

  and refer to these properties in the web-service definitions (web-services-spec.properties), which are common to all environments:

    • App1.Service1.URL=http://${App1.Host}:${App1.Port}/services/Service1
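The split between the two files can be illustrated with a minimal sketch of how a monitoring tool might resolve the `${...}` references. It assumes a simplified properties syntax (one key=value per line, `#` comments); the function names `parse_properties` and `resolve` are hypothetical and not part of any particular container or library.

```python
import re

# Matches ${Some.Property.Name} references inside a property value.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")

def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def resolve(common, env):
    """Substitute ${name} references in the common values with env values."""
    def lookup(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"undefined environment property: {name}")
        return env[name]
    return {key: PLACEHOLDER.sub(lookup, value) for key, value in common.items()}

# Contents of test01-env-spec.properties (environment TEST01 only).
env = parse_properties("App1.Host=camis02.acme.com\nApp1.Port=12000")
# Contents of web-services-spec.properties (shared by all environments).
common = parse_properties(
    "App1.Service1.URL=http://${App1.Host}:${App1.Port}/services/Service1"
)

resolved = resolve(common, env)
print(resolved["App1.Service1.URL"])
# prints http://camis02.acme.com:12000/services/Service1
```

With this layout, pointing the monitoring tool at another environment only means loading a different environment file; the service definitions stay untouched.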

Map each web service to the same path in all environments. For example, don’t expose a service as http://<hostA>:<portA>/a/b/c/Service in one environment and as http://<hostB>:<portB>/x/y/z/Service in another.

Use the same JMS queue and JMS connection factory names in all environments.

Do not block clients that establish TCP/IP connections and then drop them, which is what a simple port-availability check does. Some application containers block such clients to protect themselves against DoS attacks. This should be a configurable feature, turned off for non-production back-office containers.
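The kind of check that such DoS protection tends to penalize looks roughly like the sketch below: open a TCP connection, send nothing, and close it. The function name `port_is_open` and the default timeout are illustrative assumptions.

```python
import socket

def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established.

    The connection is closed immediately without sending any data,
    which is exactly the connect-and-drop pattern described above.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

A monitoring tool would call something like `port_is_open("camis02.acme.com", 12000)` every few minutes, so a container that blacklists the caller after a handful of such connections makes the check useless.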

Expose simple web-services that cover the interactions between the major components of the application, for example UpdateCustomer or UpdateUser. These services/operations should be idempotent, i.e. the monitoring tool should be able to invoke them many times without raising a business error. AddUser, by contrast, is a poor candidate: it may fail if we add a user with the same name twice.
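The difference between the two kinds of operation can be shown with a small in-memory model. This is only a sketch: `UserDirectory`, `DuplicateUserError` and the user name `monitor-probe` are hypothetical stand-ins for the real application and its test data.

```python
class DuplicateUserError(Exception):
    """Business error: the user name is already taken."""

class UserDirectory:
    """In-memory stand-in for the application's user store (hypothetical)."""

    def __init__(self):
        self.emails = {}

    def add_user(self, name, email):
        # Not idempotent: a second call with the same name is a business error.
        if name in self.emails:
            raise DuplicateUserError(name)
        self.emails[name] = email

    def update_user(self, name, email):
        # Idempotent: repeating the same call leaves the same state behind.
        if name not in self.emails:
            raise KeyError(name)
        self.emails[name] = email

directory = UserDirectory()
directory.add_user("monitor-probe", "probe@example.com")  # seeded once as test data

# The monitoring tool can repeat this call on every polling cycle.
for _ in range(3):
    directory.update_user("monitor-probe", "probe@example.com")

# Repeating AddUser instead would raise a business error on the second call.
try:
    directory.add_user("monitor-probe", "probe@example.com")
    second_add_failed = False
except DuplicateUserError:
    second_add_failed = True
```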

HTTP status codes should correctly reflect the success or error returned by the web-service:

  • 200 – operation was successful. Do not return status 200 if there was a system error such as Class Not Found or Database Connection failure
  • 500 – system error. Do not return 500 if this was a business error such as Customer not found. Also, do not return 500 for a “Null Pointer Exception” – this is an application error, not a system error
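If services follow this convention, the monitoring tool can decide on availability from the status code alone, without parsing response bodies. The sketch below shows one possible mapping. The `Health` enum, the function names, and the decision to treat every code other than 200 and 500 as "unknown" are assumptions made for illustration; the article does not prescribe a code for business errors.

```python
import urllib.error
import urllib.request
from enum import Enum

class Health(Enum):
    UP = "up"
    DOWN = "down"
    UNKNOWN = "unknown"

def probe(url, soap_body, timeout=5.0):
    """POST a SOAP request; return the HTTP status code, or None if unreachable."""
    request = urllib.request.Request(
        url,
        data=soap_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Raised for 4xx/5xx responses; the status code is still available.
        return err.code
    except OSError:
        # Connection refused, DNS failure, timeout (URLError is an OSError).
        return None

def classify(status_code):
    """Map a probe result to an availability verdict (assumed mapping)."""
    if status_code is None:
        return Health.DOWN      # nothing answered at all
    if status_code == 200:
        return Health.UP        # operation succeeded
    if status_code == 500:
        return Health.DOWN      # system error reported by the service
    return Health.UNKNOWN       # anything else needs a closer look
```

This only works if 200 and 500 are reserved as described: a service that returns 200 around a database failure would be reported as up, and one that returns 500 for "Customer not found" would raise false alarms.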

To reduce sensitivity to the test data, provide services that are not based on internal IDs. For example, for UpdateUser, provide a variant that lets the caller specify the user name rather than the internal ID generated when the user was created.

To monitor request-response JMS-based services, allow the caller to specify the return queue. The monitoring application will use this to separate its traffic from that of the regular clients.

To monitor send-and-forget JMS-based services, provide an implementation that can send back an acknowledgement if the operation succeeded. In a sense, this means that the service also provides a request-response variant. The previous remark applies here as well: allow the caller to pass the acknowledgement queue name in the request.
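The reply-queue idea from the last two guidelines can be modeled without a JMS provider. The sketch below uses in-memory queues in place of JMS destinations, and the `reply_to` and `correlation_id` fields play the role of the standard JMSReplyTo and JMSCorrelationID headers. All queue names and function names are hypothetical.

```python
import queue
import uuid

queues = {}  # queue name -> in-memory queue standing in for a JMS destination

def get_queue(name):
    if name not in queues:
        queues[name] = queue.Queue()
    return queues[name]

def send_request(request_queue, reply_queue, body):
    """Client side: send a request that names the queue to reply on."""
    correlation_id = str(uuid.uuid4())
    get_queue(request_queue).put(
        {"reply_to": reply_queue, "correlation_id": correlation_id, "body": body}
    )
    return correlation_id

def serve_pending(request_queue):
    """Service side: acknowledge each request on the queue the caller named."""
    pending = get_queue(request_queue)
    while not pending.empty():
        message = pending.get_nowait()
        get_queue(message["reply_to"]).put(
            {"correlation_id": message["correlation_id"],
             "body": "ACK " + message["body"]}
        )

def receive_reply(reply_queue, correlation_id, timeout=1.0):
    """Client side: read one reply and check it answers our request."""
    reply = get_queue(reply_queue).get(timeout=timeout)
    if reply["correlation_id"] != correlation_id:
        raise RuntimeError("unexpected reply on " + reply_queue)
    return reply["body"]

# A regular client and the monitoring tool share the request queue
# but name different reply queues.
client_id = send_request("APP1.REQUEST", "APP1.REPLY", "UpdateCustomer from client")
probe_id = send_request("APP1.REQUEST", "MONITOR.REPLY", "UpdateCustomer from monitor")
serve_pending("APP1.REQUEST")

probe_reply = receive_reply("MONITOR.REPLY", probe_id)
```

Because the monitor reads only from its own queue, it never consumes a reply meant for a regular client, and regular clients never see monitoring traffic.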

For JMS-based services, provide the same operation as a web-service.

Provide sample SOAP-XMLs for the service invocations that can be used for monitoring. This is useful for web-services, but especially for JMS-based services, where there is no programmatic way to get the WSDL or XSD.
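As an illustration of what such a sample might contain, the sketch below builds a SOAP 1.1 request for an UpdateUser variant keyed on the user name. The application namespace, the element names and the values are hypothetical; a real sample has to come from the team that owns the service.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.com/app1"                     # hypothetical service namespace

def build_update_user_request(user_name, email):
    """Return a SOAP 1.1 UpdateUser request as an XML string."""
    ET.register_namespace("soapenv", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    operation = ET.SubElement(body, f"{{{APP_NS}}}UpdateUser")
    # The user is identified by name, not by an internal generated ID.
    ET.SubElement(operation, f"{{{APP_NS}}}UserName").text = user_name
    ET.SubElement(operation, f"{{{APP_NS}}}Email").text = email
    return ET.tostring(envelope, encoding="unicode")

sample = build_update_user_request("monitor-probe", "probe@example.com")
print(sample)
```

The same text can be posted to the web-service endpoint or placed on the JMS request queue, so one sample per operation serves both transports.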

Define a security role “monitoring” that gives the monitoring tool permission to invoke services. This matters most for IIOP and JMS invocations, which require credentials in most cases.
