Generalization of Problem Diagnosis and Categorization

Motivation

In the last decade, application performance management (APM) solutions have been developed to support enterprises with monitoring capabilities and early detection of performance problems. However, leading APM solutions mostly support only alerting and the visualization of performance-relevant measures. The configuration of the software instrumentation, the diagnosis of performance problems, and the isolation of the concrete root cause(s) often remain error-prone and frustrating manual tasks. To this day, these tasks are performed by scarce and costly performance experts. To improve this situation, NovaTec Consulting GmbH and the University of Stuttgart (Reliable Software Systems Group) launched the collaborative research project diagnoseIT on "Expert-guided Automatic Diagnosis of Performance Problems in Enterprise Applications". The core idea is to formalize APM expert knowledge in order to automatically execute recurring APM tasks, such as configuring a meaningful software instrumentation and diagnosing performance problems to isolate their root cause. By delegating these tasks to diagnoseIT, experts no longer have to deal with similar problems over and over again and can instead focus on more challenging (and interesting) tasks.

Problem

The automated diagnosis analyzes traces for performance problems and works as follows: possible symptoms of performance problems are provided as formalized expert knowledge in the form of an extensible set of rules. When a symptom is detected in a trace, the root-cause diagnosis starts without human interaction. Rules that localize the problem are applied first, followed by technology- and/or domain-specific rules, which enrich the isolated root cause with semantic information.
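The rule pipeline described above (symptom detection, then localization rules, then technology- or domain-specific rules) can be sketched as follows. This is a minimal illustration under stated assumptions, not diagnoseIT's implementation: the trace model (`Span`), the 1000 ms response-time threshold, the "largest exclusive time" localization heuristic, and the name-prefix semantic rules are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Span:
    """One call in a trace (hypothetical minimal trace model)."""
    name: str
    duration_ms: float
    children: List["Span"] = field(default_factory=list)

    def self_time_ms(self) -> float:
        # Exclusive time: own duration minus time spent in direct children.
        return self.duration_ms - sum(c.duration_ms for c in self.children)

    def walk(self):
        # Depth-first iteration over this span and all of its descendants.
        yield self
        for child in self.children:
            yield from child.walk()


@dataclass
class Diagnosis:
    root_cause: Span
    labels: List[str] = field(default_factory=list)


# Symptom rule (hypothetical): root span exceeds a response-time threshold.
def slow_response(trace: Span, threshold_ms: float = 1000.0) -> bool:
    return trace.duration_ms > threshold_ms


# Localization rule (hypothetical heuristic): span with the largest exclusive time.
def localize(trace: Span) -> Span:
    return max(trace.walk(), key=lambda s: s.self_time_ms())


# Technology/domain-specific rules: map a localized span to a label, or None.
SemanticRule = Callable[[Span], Optional[str]]


def database_rule(span: Span) -> Optional[str]:
    return "database call" if span.name.startswith("SQL") else None


def remote_call_rule(span: Span) -> Optional[str]:
    return "remote service call" if span.name.startswith("HTTP") else None


# Extensible rule set: new rules are added by appending to this list.
SEMANTIC_RULES: List[SemanticRule] = [database_rule, remote_call_rule]


def diagnose(trace: Span) -> Optional[Diagnosis]:
    # Diagnosis starts only when a symptom is detected; no human interaction.
    if not slow_response(trace):
        return None
    # 1) Localization rules are applied first ...
    culprit = localize(trace)
    # 2) ... then semantic rules enrich the isolated root cause.
    results = [rule(culprit) for rule in SEMANTIC_RULES]
    return Diagnosis(root_cause=culprit, labels=[r for r in results if r is not None])


if __name__ == "__main__":
    trace = Span("GET /orders", 1500.0, [
        Span("OrderService.load", 1400.0, [Span("SQL SELECT orders", 1300.0)]),
    ])
    result = diagnose(trace)
    print(result.root_cause.name, result.labels)  # SQL SELECT orders ['database call']
```

Keeping localization and semantic rules as separate, ordered stages mirrors the two-phase design in the text: the localization stage only needs generic trace structure, while the semantic stage can grow with new technology- or domain-specific rules without changing the localization logic.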

The goal of this thesis is to investigate how the diagnosis concept can be generalized to availability, reliability and functional problems in software applications.

Tasks

  • Development of a concept to diagnose availability, reliability and functional problems based on traces
  • Prototyping of the concept
  • Evaluation of the concept in an industrial case study

Challenges

  • Detection and diagnosis of availability, reliability and functional problems in traces
  • Categorization of identified problems

Locations

  • Stuttgart
  • Frankfurt (remote supervision)
  • Munich (remote supervision)
  • Berlin (remote supervision)

Contact