Foundational knowledge building block in the self-healing infrastructure.
TDI's customers measure success by MTTR (mean time to remediation). To remediate a problem, you must understand its real cause, not its result. You must also understand which industry "best practice" or vendor knowledge applies to the problem, as well as how it was handled the last time or times it occurred in your business.
TDI customers cannot have the infrastructure down. As a result, they use the "Outside In" management technology to gather event data streams not only from traditional sources (SYSLOG, log files, SNMP) but also from the infrastructure's lowest management levels, so that hardware failures, security breaches and operating system failures can be understood as well.
Gathering data from different sources is just that - gathering data. Knowledge and understanding of the data makes the difference:
1. What does it mean?
2. Is it important?
3. How does it relate to other events?
4. What should I do about it?
5. Why did it happen?
6. Who or what is responsible?
To create understanding out of the mass chaos of different event sources, from different vendors using different formats, TDI uses each vendor's or manufacturer's error messages and recovery procedures to create Intelligent Event Modules (IEMs). This gives TDI and its customers visibility into, and understanding of, the potential events or messages that should be monitored.
Thus, when messages from a device, application or operating system arrive, each is interrogated in real time against well-known vendor event definitions to determine the value of the message and what action is required. This contrasts with traditional systems, which simply write the message to a log file for mining or interrogation in an after-the-fact post-mortem.
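The real-time interrogation described above can be illustrated with a minimal Python sketch. It is not TDI's implementation: the `EventDefinition` fields, the two sample patterns and the `match_message` function are hypothetical names chosen for illustration, and real IEM definitions come from vendor documentation.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Pattern


@dataclass(frozen=True)
class EventDefinition:
    """One vendor-defined event (hypothetical structure for illustration)."""
    name: str
    pattern: Pattern[str]
    severity: str
    action: str


# Illustrative definitions only; these are not taken from any real IEM.
DEFINITIONS: List[EventDefinition] = [
    EventDefinition(
        name="disk_failure",
        pattern=re.compile(r"disk \S+ (failed|offline)", re.IGNORECASE),
        severity="critical",
        action="notify storage team",
    ),
    EventDefinition(
        name="link_down",
        pattern=re.compile(r"link down on port \d+", re.IGNORECASE),
        severity="warning",
        action="open helpdesk ticket",
    ),
]


def match_message(message: str,
                  definitions: List[EventDefinition] = DEFINITIONS
                  ) -> Optional[EventDefinition]:
    """Return the first definition whose pattern matches, or None."""
    for definition in definitions:
        if definition.pattern.search(message):
            return definition
    return None
```

Each arriving message is checked as it arrives rather than being written to a log for later mining; a message that matches no definition returns `None` and could simply be logged.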
Once detected or matched, the event begins a complete life cycle that tracks when it was detected, where it came from, its cause, its vendor-defined best-practice solution, its user-defined best-practice solution, who solves it, what they do to solve it and how long it takes to solve it. More importantly, the life cycle provides a well-defined, secure, always-available path for remediation that is accessible remotely, allowing instant access to fix or correct the problem in a manner that meets or exceeds regulatory requirements for auditing, logging and reporting.
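As a rough illustration of the life-cycle data listed above, the following Python sketch records when an event was detected and resolved and derives the time to remediation from those timestamps. The `EventRecord` class and its field names are assumptions made for this example, not part of any TDI product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class EventRecord:
    """Hypothetical life-cycle record for one detected event."""
    source: str
    cause: str
    detected_at: datetime
    resolved_by: Optional[str] = None
    resolved_at: Optional[datetime] = None

    def resolve(self, who: str, when: datetime) -> None:
        """Record who fixed the problem and when."""
        self.resolved_by = who
        self.resolved_at = when

    def time_to_remediate(self) -> Optional[timedelta]:
        """Elapsed time from detection to resolution, if resolved."""
        if self.resolved_at is None:
            return None
        return self.resolved_at - self.detected_at


def mean_time_to_remediate(records: List[EventRecord]) -> Optional[timedelta]:
    """Average remediation time over resolved records only."""
    durations = [r.resolved_at - r.detected_at
                 for r in records if r.resolved_at is not None]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)
```

Unresolved events are excluded from the average, so the figure reflects only problems that were actually remediated.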
Actions taken to remediate the problem or well-known event are captured in digitally signed, time-stamped log files. These files can also be used by other data mining facilities or logging solutions that cannot themselves capture the low-level event data, which most often contains the root cause of a problem rather than its result.
TDI provides IEMs for all major hardware, operating system, storage, firewall, fabric and database vendors. With the advent of virtualization, it is important to understand not only the physical environment, as TDI does today, but also the virtualized environment, regardless of vendor (VMware, XEN, Oracle, HP, etc.).
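The idea of a tamper-evident, time-stamped action log can be sketched in Python as follows. The text does not say how the signing is done, so this example makes an assumption: it uses an HMAC-SHA256 keyed hash from the standard library as a stand-in for a true digital signature. The function names and entry fields are hypothetical.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone
from typing import Optional


def sign_entry(key: bytes, actor: str, action: str,
               timestamp: Optional[str] = None) -> dict:
    """Build a time-stamped log entry and attach an HMAC over its fields."""
    entry = {
        "timestamp": timestamp or datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "action": action,
    }
    # Serialize with sorted keys so signing and verifying see the same bytes.
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    entry["signature"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return entry


def verify_entry(key: bytes, entry: dict) -> bool:
    """Recompute the HMAC and compare it with the stored signature."""
    body = {k: v for k, v in entry.items() if k != "signature"}
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, entry.get("signature", ""))
```

Any change to the actor, action or timestamp after signing causes verification to fail, which is the property an audit trail needs. A production system would more likely use asymmetric signatures so that verifiers do not hold the signing key.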
Full event management for Virtual Center, ESX Servers and XEN
TDI has developed patented IEM technology for Virtual Center and XEN (XEN and HP XEN) environments. TDI is expanding its technology to include Oracle and other virtualized environments.
The virtualized IEMs define hypervisor, virtual machine, guest operating system, and ILO/DRAC/management card messages for each of the specific environments. Because TDI is an "outside in" management system, users can identify the root causes of guest operating system failures, guest machine failures on the supporting hypervisor, and problems on the physical machine where the hypervisor is running, which may affect one or more virtual machines. With its low-level, persistent connection and communication, problems may be remediated even if the OS or hypervisor is not functioning.
TDI's Virtualization Infrastructure Manager not only cares for the Virtual Server environment, but also watches the SAN (or NAS) infrastructure for critical messages relevant to the Fibre Channel fabric, disk farm, and controller that would render the virtual environment ineffective or impact its performance.
IEMs for the Business Critical Infrastructure
Each IEM is designed around a particular operating system, application, hypervisor or device. It contains a thorough, vendor-defined database of well-known potential error messages and warning messages that could occur. Associated with each error message are vendor remediation instructions specific to that error message, which may also be customized. These remediation instructions may range from notifying specific personnel or automatically generating helpdesk tickets to power cycling a device or executing code on the device in question.
For example, if one were monitoring an Exchange 2003 server with ConsoleWorks Engine and the SMTP service stopped responding, the Exchange server IEM could automatically restart the service while also notifying the Exchange specialist. As a result, the downtime of the service is reduced to seconds instead of minutes or possibly hours.
In more complex situations, IEMs also contain a library of vendor-approved "best practices". This instant knowledge base, which one can modify, provides an invaluable resource to aid in reducing mean time to remediation.
There is virtually no limit to the capabilities of Intelligent Event Modules. Through either TDI Professional Services or in-house staff, IEMs can be modified or created to adapt to specific existing IT processes.
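The Exchange example above can be sketched in Python as a table that maps an event name to an ordered list of remediation steps. This is only an illustration under stated assumptions: the event name, the `SMTPSVC` service label, the `exchange-specialist` recipient and both action functions are hypothetical, and the actions merely record what would be done instead of touching a real server.

```python
from typing import Callable, Dict, List, Tuple

ActionLog = List[str]
Action = Callable[[str, ActionLog], None]


def restart_service(target: str, log: ActionLog) -> None:
    """Stub: record that a service restart would be issued."""
    log.append(f"restart:{target}")


def notify(target: str, log: ActionLog) -> None:
    """Stub: record that a person or group would be notified."""
    log.append(f"notify:{target}")


# Hypothetical IEM fragment: event name -> ordered remediation steps.
EXCHANGE_IEM: Dict[str, List[Tuple[Action, str]]] = {
    "smtp_not_responding": [
        (restart_service, "SMTPSVC"),
        (notify, "exchange-specialist"),
    ],
}


def remediate(event_name: str,
              iem: Dict[str, List[Tuple[Action, str]]] = EXCHANGE_IEM
              ) -> ActionLog:
    """Run every step registered for the event and return what was done."""
    log: ActionLog = []
    for action, target in iem.get(event_name, []):
        action(target, log)
    return log
```

Keeping the steps as data makes them easy to customize per site, in the spirit of the customizable remediation instructions described earlier: a site could replace the notification target or add a helpdesk ticket step without changing the dispatch code.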