Leveraging 13c Cloud Control’s Auto-heal Capabilities in Your Environment – Part 2

In part one of this two-part blog, we provided an overview of the corrective action feature of Oracle Enterprise Manager (OEM). In this post, we look at how corrective actions can be used to automate some of the repetitive (and more mundane) daily tasks of a DBA, and how much time that automation can save.

The Test

We picked one of our medium-sized Managed Services customers as a pilot and implemented these corrective action features on one of its non-production environments. This allowed us to see the features in action and to determine how much cost and labor they can save us over a set time period.

Types of Tickets & Time Spent

For proof of concept testing, we picked the five most common alert notifications we receive for this customer, namely:

1) Blocking session alerts
2) Tablespace alerts
3) File system getting full
4) Archive area usage
5) Inactive forms sessions

We then checked our ticketing system to see how many tickets were created for these notifications. For each of those tickets, the DBA on support was engaged to work on a resolution.

The following is a table showing the frequency and type of alert we received in the last month:

Alert                        Occurrences
---------------------------  -----------
Blocking session alerts      16
Tablespace alerts            3
File system getting full     5
Archive area usage           4
Inactive forms session       4

Measured from the time support engages a DBA until the final resolution of the ticket, the total time spent on these five alert types in a month was around 16 hours. That is the time and labor corrective actions could save.
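
As a rough check on that figure, the following sketch totals the monthly ticket counts from the table above and divides the 16 hours by that total. The per-ticket average it prints is inferred from those two numbers; it is not a figure reported by our ticketing system, and the variable names are purely illustrative.

```python
# Ticket counts per alert type for the month, copied from the table above.
monthly_tickets = {
    "Blocking session alerts": 16,
    "Tablespace alerts": 3,
    "File system getting full": 5,
    "Archive area usage": 4,
    "Inactive forms session": 4,
}

# Approximate total DBA time for the month, as stated in the post.
MONTHLY_HOURS = 16

total_tickets = sum(monthly_tickets.values())

# Inferred average handling time per ticket (an assumption, not a measured value).
avg_minutes_per_ticket = MONTHLY_HOURS * 60 / total_tickets

print(f"Tickets in the month: {total_tickets}")
print(f"Implied average per ticket: {avg_minutes_per_ticket:.0f} minutes")
```

Under these numbers the average works out to about half an hour per ticket, which is a plausible order of magnitude for investigating and clearing a routine alert.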

When all is said and done, this is just one example of how much time a DBA spent resolving tickets for one specific customer. Still, we can easily extrapolate this number and apply it more broadly if we implement these corrective actions for the alerts generated across all of our Managed Services customers.

Frequency of Alerts

To see the effects of these corrective actions from a different angle, we could also look at the total frequency of these alerts – not just where a ticket was created.

Some of these alerts (read: locks) clear on their own, which is a common occurrence in a relational database management system. Locks keep occurring throughout normal processing and, more often than not, they clear automatically without requiring any DBA intervention.

However, in a complex implementation, such as a distributed transaction system, a Data Guard setup or a heterogeneous system, these alerts require more DBA intervention than they would in a normal system. That being the case, we wanted to look at the actual frequency of the alerts and use these numbers to make an educated guess at how much cost and labor could be saved by utilizing corrective actions.

Here is how many alerts occurred in a quarter:

Alert                        Occurrences
---------------------------  -----------
Blocking session alerts      221
Tablespace alerts            5
File system getting full     58
Archive area usage           21
Inactive forms session       72

In the worst-case scenario, where every one of these alerts results in a ticket and a DBA is engaged, the time saved by using corrective actions comes to about 188 hours.
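
The 188-hour figure can be reproduced with a short calculation. The sketch below assumes every alert in the quarter costs a DBA roughly half an hour, a per-alert rate inferred from the monthly numbers earlier in this post (about 16 hours across 32 tickets) rather than one measured directly. The names used are illustrative only.

```python
# Alert counts per type for the quarter, copied from the table above.
quarterly_alerts = {
    "Blocking session alerts": 221,
    "Tablespace alerts": 5,
    "File system getting full": 58,
    "Archive area usage": 21,
    "Inactive forms session": 72,
}

# Assumed DBA time per alert in hours, inferred from the monthly data
# (about 16 hours / 32 tickets); this is an assumption, not a measured value.
HOURS_PER_ALERT = 0.5

total_alerts = sum(quarterly_alerts.values())

# Worst case: every alert becomes a ticket that a DBA has to handle.
worst_case_hours = total_alerts * HOURS_PER_ALERT

print(f"Alerts in the quarter: {total_alerts}")
print(f"Worst-case DBA hours: {worst_case_hours}")
```

Because many blocking session alerts clear on their own, the real saving is likely to sit somewhere below this upper bound.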

If we were to extrapolate this number across all customers, the amount of time, cost and labor saved would be considerable.

Looking Forward

As was mentioned in part one of this series, the utilization of these corrective actions looks exceptionally promising across the board.

We are very excited about the potential value-add to our company and customers. Implementing this change will allow us to spend the majority of our time on a project delivering quality work, enhancements and innovations. Meanwhile, the mundane and repetitive tasks that are crucial in keeping the business running will be taken care of by automation, allowing us to meet our SLAs and decrease the margin for human error.

Legal

The following thoughts, intentions, strategies and/or solutions are those of the blog authors and do not represent the position of anyone other than the authors.