
TalkBMC

Steve Carl » Adventures in Linux

BMC UserWorld, Vancouver 2007

Day one at BMC UserWorld, and how BSM applies to things other than a production shop.

I started “Adventures” a couple of years ago to talk mostly about Linux and Open Source as they were used inside my organization, R&D Support, and in particular where they drove innovation and had a positive ROI. I have since branched out in many other directions; for example, there has been a great deal here about Linux as an Enterprise desktop / laptop OS.

A large number of the folks who read this blog do many of the same things I do. That only makes sense. They are often IT people in unusual IT shops: universities, heavy-duty cutting-edge R&D in both hardware and software, and the like. They are unafraid of change, often drive it, and are usually the bleeding-edge early adopters at their home companies.

If you work in such a shop, you may wonder how ITIL, BSM, and the like relate to your daily job. If you are in non-traditional IT, how does this huge thing called BSM really relate to you? Is BSM just about traditional IT: the financials, sales and marketing, email, and so forth? Or does it extend into R&D IT, Customer Support IT, Training / Education IT, and the other more esoteric areas where we use computers? We may have data centers and stackers and networks and whatnot, just like the traditional IT folks, but we mostly use them in very non-production, non-traditional ways.

The answer, of course, is “It Depends” (P): the patented response of consultants the world over.

Measure the result that you want

On a small scale, say fifty computers, there is probably not much ROI in working through an entire BSM rollout. The company / school / organization is small enough that the person signing the checks for the gear most likely knows the person requesting it, and knows what they plan to use it for. They will already have discussed whether the benefits the new gear will deliver are worth its cost (even if, because this is a personal relationship, they do not think of it in exactly those terms). The person writing the checks won't sign them unless they know that the new thingy is something they need to succeed. Also, in a small organization, the person making the request probably has a pretty good idea of what the fiscal impact is going to be.

As companies grow, so do the layers between the check writer and the person actually using the gear for a technical function. At some point, the check writer will look at a purchase request and ask “What in the heck is this for?” because they cannot know everything that is going on anymore.

That is when implementing certain parts of BSM starts to have value, and the critical thing to remember at all times is to measure for the results that you want.

Here is my real-world example: virtualization. I am attending all sorts of sessions here at BMC UserWorld because, while I have used our computer modeling tools in the past (back when they were called Best/1, in fact), I have not yet installed and used the tools that model VMware.

Our VMware farm started over a year ago, and it has about 20 medium to large computers in it right now: Sun X4600s, Dell 6950s and the like on the large side, but also many smaller systems such as Dell 2950s and Sun X2100s. These platforms differ in CPU architecture and in their ratios of CPU (as measured in SPECint), RAM, and I/O capacity. We have retired over 500 old computers with these VMware hosts, so at that simple ROI level we have already come out ahead. We got back scads of data center floor space, decommissioned four small labs, avoided power consumption we would otherwise have had, and so on. I did not actually need any BMC Software products to measure any of that (although we did use Remedy AM to track both the real and virtual assets, and that will be of huge benefit as we set up, among other things, SRM).
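Comparing a farm of mixed hardware starts with putting every host into common units. The following Python is a minimal sketch of that idea, not a BMC tool: the `Host` class, its field names, the host names, and all capacity figures are hypothetical, and SPECint-style units are used only because the post names SPECint as the CPU yardstick.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """A physical VMware host. Every figure used with this class is hypothetical."""
    name: str
    specint: float  # CPU capacity in SPECint-style units (assumed benchmark figure)
    ram_gb: float
    io_mbps: float  # sustained I/O capacity, however your shop chooses to measure it


def ram_per_cpu_unit(host: Host) -> float:
    """GB of RAM per unit of CPU capacity: the 'shape' of the box."""
    return host.ram_gb / host.specint


def farm_totals(hosts: list[Host]) -> dict[str, float]:
    """Sum capacity in common units so mixed hardware can be planned as one pool."""
    return {
        "specint": sum(h.specint for h in hosts),
        "ram_gb": sum(h.ram_gb for h in hosts),
        "io_mbps": sum(h.io_mbps for h in hosts),
    }


def consolidation_ratio(servers_retired: int, vm_hosts: int) -> float:
    """Old physical servers retired per virtualization host."""
    return servers_retired / vm_hosts


if __name__ == "__main__":
    # Hypothetical hosts: one large box and one small box with different shapes.
    farm = [
        Host("large-01", specint=200.0, ram_gb=128, io_mbps=800.0),
        Host("small-01", specint=40.0, ram_gb=8, io_mbps=200.0),
    ]
    for h in farm:
        print(h.name, round(ram_per_cpu_unit(h), 2), "GB RAM per CPU unit")
    print(farm_totals(farm))
    # The post's approximate figures: over 500 servers retired onto about 20 hosts.
    print(consolidation_ratio(500, 20))
```

With the post's approximate figures, `consolidation_ratio(500, 20)` works out to 25 old servers per host; the real ratio depends on the exact counts.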

The farm is now big enough, and worse, diverse enough, that I need to start figuring out how it will behave in the future. The diversity is not just in the underlying hardware platform. We support R&D on all the platforms and product lines it supports, so there is a huge mix of Linux, Solaris X86, and various versions of MS Windows. BMC has over 600 products, so the applications running are even more varied. These VMs are going to behave differently from each other, to be sure. Some will be RAM intensive. Others will be CPU hogs. Others will be massive I/O engines. Some will be various combinations of the above. Some will be only mildly active, but still doing useful things. Others will be stone dead most of the time.
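One way to start sorting a diverse VM population into the profiles listed above is to bucket each VM by which resource dominates its average utilization. This is a hypothetical sketch: the `VmSample` fields, the 60% and 5% thresholds, and the VM names are all assumptions, and real data would come from whatever collects your VMware metrics.

```python
from dataclasses import dataclass

# Hypothetical thresholds; tune them to your own farm's data.
BUSY_PCT = 60.0  # a resource at or above this is considered dominant
IDLE_PCT = 5.0   # every resource below this means the VM is effectively idle


@dataclass(frozen=True)
class VmSample:
    """Average utilization of one VM over a sampling window, as percentages (0-100)."""
    name: str
    cpu_pct: float
    ram_pct: float
    io_pct: float


def classify(vm: VmSample) -> str:
    """Label a VM as 'cpu', 'ram', 'io', a combination such as 'ram+io', 'light', or 'idle'."""
    readings = (("cpu", vm.cpu_pct), ("ram", vm.ram_pct), ("io", vm.io_pct))
    heavy = [label for label, value in readings if value >= BUSY_PCT]
    if heavy:
        return "+".join(heavy)
    if all(value < IDLE_PCT for _, value in readings):
        return "idle"  # stone dead most of the time
    return "light"  # mildly active, but still doing useful work


if __name__ == "__main__":
    # Hypothetical VMs illustrating three of the profiles.
    samples = [
        VmSample("build-box", cpu_pct=85, ram_pct=30, io_pct=20),
        VmSample("db-test", cpu_pct=20, ram_pct=75, io_pct=90),
        VmSample("parked-demo", cpu_pct=1, ram_pct=3, io_pct=0),
    ]
    for vm in samples:
        print(vm.name, classify(vm))
```

Returning combined labels such as `ram+io` keeps the mixed-workload VMs visible instead of forcing them into a single bucket.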

With one VMware server there is no need to measure anything with BMC PA. HA, DRS, V/Motion and so forth are not options, and the built-in system utilization tools are all you need.

In the bigger scenario, it is far more complicated. HA only works between systems with like CPUs: you cannot have a Sun X4600 (AMD) fail over to a Dell 6950 (Intel), for example. There are restrictions on what V/Motion can hot-migrate and what requires a cold move. And you have to have a SAN, and enough HBAs in the servers, to make it all work. You want a new server? You have to be able to measure what you have so you can tell the check writer why you need more of them, and you also need to be sure you are buying the right thing. For example, it does no good to add more virtual server disk images to an overloaded SAN device.
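The like-CPU restriction means a farm is really several smaller failover pools. Below is a hypothetical Python sketch that groups hosts by CPU family, following the post's rule that an AMD host cannot fail over to an Intel one, and checks whether each pool could absorb the loss of its largest host. The `Host` fields, capacity units, and numbers are made up for illustration; a real check would also need RAM, SAN, and HBA headroom.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    cpu_family: str  # e.g. "AMD" or "Intel"; failover partners must match
    capacity: float  # normalized CPU capacity (hypothetical units)
    load: float      # capacity currently consumed by the VMs running on it


def failover_domains(hosts: list[Host]) -> dict[str, list[Host]]:
    """Group hosts that can fail over to one another (same CPU family)."""
    domains: dict[str, list[Host]] = defaultdict(list)
    for h in hosts:
        domains[h.cpu_family].append(h)
    return dict(domains)


def survives_one_failure(domain: list[Host]) -> bool:
    """True if the rest of the domain can absorb all load after its largest host fails."""
    if len(domain) < 2:
        return False  # no like-CPU partner to fail over to
    total_load = sum(h.load for h in domain)
    largest = max(h.capacity for h in domain)
    return total_load <= sum(h.capacity for h in domain) - largest


if __name__ == "__main__":
    # Hypothetical farm: two AMD hosts and a lone Intel host.
    farm = [
        Host("amd-01", "AMD", capacity=100, load=40),
        Host("amd-02", "AMD", capacity=100, load=50),
        Host("intel-01", "Intel", capacity=60, load=30),
    ]
    for family, members in failover_domains(farm).items():
        print(family, "survives one host failure:", survives_one_failure(members))
```

Losing the largest host is the worst case because the total load stays the same while the remaining capacity is smallest. A lone host of one CPU family has no failover partner at all, which is exactly the kind of gap that justifies the next purchase request.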

One of the sessions I attended today (actually two of them; it was a double session) addressed exactly this topic, and I learned something I did not know about the way VMware servers are measured. I did mainframe VM for the first 15 or so years of my career, so I know about things like Virtual to Real ratios, wall clock elongation, and so forth. The mainframe folks solved that years ago by making the guests aware they were running in a virtual world, so that the performance data could be dealt with accordingly. It is not trivial to do, but it can be done.

I assumed that the VMware world was going to go through this exact same thing, and one thing I was not sure of was how guest OSs from many different vendors were going to handle hypervisor awareness. The answer did not surprise me much. <linux content> Linux is getting there first </linux content>. MS Windows is going to take some heavy lifting. As I write this I realize I did not ask about Solaris X86, but I assume that, being open source, it will get there pretty quickly if it has not already.

The good news is that you only need in-guest visibility for fairly detailed application modeling; you can do a credible capacity plan at the VM level using the Performance Assurance tools as they exist today.

That circles back to BSM and measuring the right thing: if what you need to justify is new server investment, then showing how heavily the servers are consumed at the server level is probably sufficient.
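A server-level capacity plan can be as simple as trending host utilization and estimating when it crosses a ceiling. The sketch below fits a straight line by ordinary least squares; it is a deliberately crude stand-in and not how BMC Performance Assurance models anything. The 75% ceiling and the monthly figures are hypothetical.

```python
from typing import Optional


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    if len(xs) < 2 or len(xs) != len(ys):
        raise ValueError("need at least two paired samples")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def months_until(threshold_pct: float, monthly_util_pct: list[float]) -> Optional[float]:
    """Months from the latest sample until the trend line reaches threshold_pct.

    Returns None if utilization is flat or falling (no crossing ahead).
    """
    xs = [float(i) for i in range(len(monthly_util_pct))]
    slope, intercept = fit_line(xs, monthly_util_pct)
    if slope <= 0:
        return None
    crossing = (threshold_pct - intercept) / slope
    return max(0.0, crossing - xs[-1])  # 0.0 means the ceiling is already reached


if __name__ == "__main__":
    # Hypothetical farm-level CPU utilization (%) for the last four months.
    print(months_until(75.0, [40, 45, 50, 55]))
```

With the made-up series `[40, 45, 50, 55]` and a 75% ceiling, `months_until` returns 4.0: the trend rises 5 points a month from 55. A number like that is the kind of server-level evidence a check writer can act on.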

Measure People Correctly Too

ITIL does not really present itself as a way to manage people, but the same concepts, or perhaps philosophies, apply. I am fairly far afield from a normal “Adventures” post already, so what the heck. If you are in a non-traditional IT shop, and are part of or managing a team of people in that shop, then these same things apply to the team as they do to the computers: you have to measure to gain the results you want. While I do not personally think that ticket closure rate alone is a great measure even in a traditional IT shop, it pretty much flies out the window in a non-traditional one. This may not seem intuitively obvious. I usually tell new members of R&D Support who came from production environments that it takes about six months to make all the mental shifts required to get one simple concept: in R&D, flexibility rules. Availability is not number two, or even number three, on the list.

This changes everything, from your SLA on down. I am sure the same is true of other environments besides R&D. Take a school with a computer lab, for example: it will be very good at re-imaging the hardware back to a known state every semester, or even more often, because of how thoroughly the students will dissect the computers. You could not measure the person responsible for that lab on computer platform uptime. The end result would be students afraid to touch the computers for fear of incurring the wrath of the lab admin.

If you measure and reward (or I suppose punish) people primarily on closed ticket counts, then ALL you will get is closed tickets. Lots of them. The customer will not be happy: I guarantee it. You can stand in front of a group of customers all day long with an OpenOffice Impress slide showing ticket closure rates mapped to SLA, and you will never convince them that you are doing the job for them. Really: your customer, no matter who they are, does not care one little bit about your closure rate. If you want satisfied customers, you have to measure a thing called “Customer Satisfaction”.

Nothing makes a customer madder than being asked whether a ticket can be closed before the problem is solved. Well, except maybe having it closed without being asked. Measure and compensate people based on ticket closure rates, and you get the following:

  1. Some people will jump on the easy tickets first, and leave the hard stuff till someone forces them to take it. Actually, this one is interesting, because it also can work against them: I watch who does this. I watch who takes the hard cases. Guess which one I reward?

  2. Some people will close tickets without asking if they solved the problem. This is against everything ITIL and BSM talk about, but it really happens when people think that the way they are being measured for their compensation and promotion is largely ticket count.

  3. Some people will ask to close a ticket that is about to pop the SLA timer, whether or not the problem is solved. They may offer to open a new one, but still, the customer does not want to have that conversation. They want their problem solved.

Actions matter here. An organization cannot say that ticket counts are less important than other things and then, when bonus time, raise time, or public recognition time arrives, reward the people with the highest ticket closure rates. In R&D Support, I measure my team based on customer satisfaction. Remedy sends a survey to the customer with each closure. Obviously not every survey is answered, but I only need a statistically significant response rate.
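The post does not say how a “statistically significant response rate” is determined. One common approach, shown here as an assumption rather than the author's method, is Cochran's sample-size formula with a finite population correction: it estimates how many survey responses are needed to pin a satisfaction percentage to within a chosen margin. The 95% confidence (z = 1.96), the ±5% margin, and the 4-or-better satisfaction cutoff on a 1-5 scale are all hypothetical defaults.

```python
import math


def required_responses(closed_tickets: int, margin: float = 0.05,
                       z: float = 1.96, p: float = 0.5) -> int:
    """Survey responses needed to estimate a satisfaction proportion.

    Cochran's sample-size formula with a finite population correction.
    z=1.96 is roughly 95% confidence; p=0.5 is the most conservative guess.
    """
    n0 = z * z * p * (1 - p) / (margin * margin)
    n = n0 / (1 + (n0 - 1) / closed_tickets)
    return math.ceil(n)


def csat_pct(scores: list[int], satisfied_at: int = 4) -> float:
    """Share of answered surveys scoring satisfied_at or better (1-5 scale assumed)."""
    if not scores:
        raise ValueError("no survey responses yet")
    satisfied = sum(1 for s in scores if s >= satisfied_at)
    return 100.0 * satisfied / len(scores)


if __name__ == "__main__":
    print(required_responses(1000))      # responses needed for 1,000 closed tickets
    print(csat_pct([5, 4, 3, 2, 5]))     # hypothetical survey scores
```

For example, with 1,000 closed tickets in a period, `required_responses(1000)` asks for 278 responses, about a 28% response rate; the requirement grows toward 385 as the ticket count gets very large.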

I do look at tickets. At review time I read through all the tickets a person worked over the year. I look for good ITIL/BSM practices, like whether a ticket is associated with an asset. I read the work log to see whether the customer was kept informed along the way on a long-running ticket. I check that the closure is not something like “Done” but something useful to the Service Desk in the future, in case of similar problems. I look at the complexity of the ticket. And of course I look at the customer satisfaction rating.

I will give a higher rating to someone with 10 tickets than to someone with 100, if the 10 were difficult and done right and the 100 were system reboots.
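The post describes a human judgment, not a formula, but a toy score makes the arithmetic behind “10 hard tickets beat 100 reboots” concrete. In this hypothetical Python sketch, each ticket is weighted by its difficulty squared and scaled by a small checklist drawn from the review points above (asset association, a closure note longer than “Done”, customer satisfaction). The difficulty scale, the squaring, and the checklist thresholds are all assumptions, and the work-log check is left out because it would need timestamps.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ticket:
    difficulty: int              # 1 (a routine reboot) .. 5 (a hard problem); hypothetical scale
    satisfaction: Optional[int]  # survey score 1-5, or None if the customer did not answer
    has_asset: bool              # is the ticket associated with an asset record?
    closure_note: str            # the resolution text left for the Service Desk


def review_quality(t: Ticket) -> float:
    """Fraction of a (hypothetical) review checklist that the ticket passes."""
    checks = [
        t.has_asset,
        len(t.closure_note.split()) > 3,                # more than a bare "Done"
        t.satisfaction is None or t.satisfaction >= 4,  # unanswered surveys do not count against anyone
    ]
    return sum(checks) / len(checks)


def weighted_score(tickets: list[Ticket]) -> float:
    """Weight each ticket by difficulty squared, scaled by how well it was handled."""
    return sum(t.difficulty ** 2 * review_quality(t) for t in tickets)


if __name__ == "__main__":
    # Hypothetical workloads: 10 hard, well-handled tickets versus 100 reboots.
    hard = [Ticket(5, 5, True, "Replaced the failed HBA and rezoned the SAN")] * 10
    easy = [Ticket(1, 5, True, "Rebooted the server as requested")] * 100
    print(len(hard), weighted_score(hard))
    print(len(easy), weighted_score(easy))
```

Here ten well-handled difficulty-5 tickets score 250 while a hundred difficulty-1 reboots score 100, even though the raw counts run the other way.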

The point of that side trip is only this: measure the right thing, whether you are talking about people or computers. ITIL at its core is a set of ideas / philosophies / learnings that can be generalized to all sorts of situations.

End of Day One.....


Thursday, November 01, 2007