Real-World-Virtualization
Studies and consultants and FUD, oh my! You do not have to be in the computer industry for long before you realize that at least 90% of the job is separating fact from fiction. When Microsoft funds a study, and the results come out in Microsoft's favor, no one is surprised. Calling it "Get the Facts" lends a sort of perverse humor to it.
The bad thing about this is, the result might even be real, or have a grain of truth in it, but there just is no way to tell because of the way it was funded.
Some survey results are the same way (it is actually very hard to design a valid survey): my recent favorite was a combo burrito survey / sales call:
“Sir, we are surveying professional IT persons such as yourself today to see what the state of the art is in IT Thingies. Are you interested in more efficiency and saving money on your IT Thingies?”
“Yes: I am always interested in that.”
“Have you ever heard of product X, or product Y, or Product Z?”
“Yes: I know about those products. I have tested them. We use product Z here.”
“Taking the first one, product X, do you have any of it installed right now?”
“No: not installed now. We tested it. We did not like it.”
“Well, then are you interested in product X's ability to save you time and money?”
“No” At this point I have given up. Clearly not listening to my answers. Pause on phone.
“Were you aware that product X will reduce your costs and have positive ROI in the first year?”
“I am aware of the product, yes.”
“So you are interested in product X.” Obvious relief from the poor cold-calling salesperson.
“No.”
“You do know that product X saves you all this time and money.”
“No”
“Oh, so you want to find out more about this? As part of this survey we are able to send you information about the products.”
“No.” Pause. Not sure if they are confused, or just waiting for me to say more. I wait too.
“So... you don't want to save money on IT thingies?” Incredulous: Tone clearly indicates I have no business being in IT.
“No: I do want to save money on IT thingies”
“But.... you don't want to know how to do it with product X...”
“Yes: I don't want to do it with product X. We have product Z, as I mentioned earlier. We looked at X, we went with Z.”
“You already know about product X?”
“Yes”
“Then... you know it will save you all this money...”
“No.”
On and on. Product Y and Z are never again mentioned.
Is Virtualization Real?
What does that intro have to do with today's topic? There has been enough hype and hot air over the benefits of Virtualization to float the Titanic, and then some. I am a big fan of virtualization: I started my career working on the world's best hardware virtualization platform, VM (now called z/VM). I have been working with virtualization at one level or another for nearly thirty years. I say all this so you understand my bias. As much as I like the technology, virtualization is not for everyone, everything, or every situation.
Three things roll into today's post:
- I talked a bit about this issue when I wrote an entry a while back called "This Week in Virtualization". One of the main points of that post was that not everything under the sun is a candidate for the virtual world. Quick example: benchmarking is very hard to do in a virtual world. Not impossible. Just hard.
- In my second to last post, "NAS Redeaux: Wrapup", I mentioned that one of our favorite VMware servers was the Sun X4600.
- I have mentioned here from time to time just how many different computers we have here for R&D Support (over 2600), and how some of the computers are older than some of the people reading this entry.
Pulling together those threads here, I can talk about a real world example of where virtualization saves us (BMC) serious time and money. Better, it saves resources, and improves our ability to support our customers.
Oldie Moldie
First, about that old hardware: we have hardware stretching back to the 1980's, and we have piles of hardware in order to support all the permutations of having over 600 software products. The possible computer environment permutations run to the millions, but we of course don't have that many computers. We do rapid provisioning a great deal, re-purposing some gear daily. Sometimes, though, a computer's time has come. You can tell it by the doomed look in their eyes, the stoop to their shoulders. The way their 1X CD-ROM won't quite close anymore. It's over, and they know it. When we are done with a computer, we have to pay to have it hauled away.
One set of computers that we have fully gotten our money out of is from the early and mid 1990's. Originally these computers were desktop systems for people in R&D. Then, as new systems were bought, these were migrated into the data center to be test systems. Time passed. Some failed, and were mined for parts for others. System configs were maxed out wherever possible, so that these 100 - 233 MHz Pentium chipped units were crammed to the brim with 512 MB of RAM. Four 128 MB sticks of RAM. Remember when desktops had four or more RAM slots? Remember RAM stackers? Hey, in 1993, we would have killed for a computer that big. Now take into account that BMC has roughly 7000 employees, do various bits of arcane math, and you'll see we have a lot of this class of computer.
Each of these old computers has an old style, non-switching, inefficient power supply. The 10/100 Ethernet card is not integrated into the motherboard and is full height. The processors require five volts to run, and use a fair number of watts. The hard drives are the old “half” height which is still over an inch tall, six or eight gigabyte capacities, have many platters, and older style motors that never spin down to save energy. They have been running so long that if they get powered off they sometimes can't power back on, or literally require being whacked on the side to free the heads from the lubricant that has built up over time in the landing zone.
These old things were / are useful to us for synthetic workload generation. Each of them can pretend to be 10 virtual people, and in concert with rack after rack of similar computers we can build up a workload of hundreds or even thousands of virtual people hitting servers to test our code on. It takes ten of them to fill a Gig-E pipe, but we have literally hundreds. We also have some really old OS's running on them, including one that had Red Hat 5.2 until very recently. They are cheap to build test clusters with, and this was one place we had a practice cluster for Solaris X86.
We can place about 25 of the deskside tower form factor PC's on data center technical furniture, stacked in three vertical rows. There is one KVM on the 2nd tier to allow console access to all the computers. Each of the Ergotron 3000 desk style technical furniture fixtures is six feet long, and three feet deep. 18 square feet for every 25 of these old-style PC's. By today's standards they use a lot of power per CPU, but because the form factor is so large, we literally could not put them close enough together to create a hot spot in our data center. The data center has 18 inch raised floor, 250 PSI rated. It was built in 1993, and the cooling designed to dissipate 55 watts a square foot. Sigh. That seemed like a lot back then.
Sun X4600 as VMware server
Now let's look at the Sun X4600. We get the 64 GB units. An X4600 at the time of this writing can get up to 256 GB, but only Solaris X86 or Linux can address that much memory on this platform. 64 GB is the max the VMware hypervisor currently handles. Ours have 16 processors. With quad core, this can go to 32 processors. Our real world experience in tuning is that without the ability to add more memory we cannot really effectively use more than 16 processors with our workloads: we aren't calculating Pi or finding the next biggest prime or anything.
VMware 3.0.1 is currently limited to about 128 Virtual Processors, and this has mapped back to needing about 16 real ones so far. It really depends on your workload, and how CPU intensive it is, though. As they say: Your Mileage May Vary (YMMV). 16 processors works for us right now.
Power supplies on the X4600 are 4 in number, and either 850 watt at 83% efficiency, or 950 watt at 89% efficiency. Ours are the 850's. We have two of these machines, in a VMware HA config. Each X4600 server runs 60-75 guests. We use P2V (Physical to Virtual) tools to migrate the workloads on machines that are being retired. The real machine is gone; the work it was doing, or at the very least might need to do, lives on. This is a big win for some of the really old versions of Linux or MS Windows that we might need to have on the hook, just in case. A VM that is not up is only using a small amount of disk space. But if a customer calls in with a problem on that platform, that disk space just became priceless.
Here is the first cool part: each VM guest has more memory than what it had on the old real PC's hardware. This is limited only by the capabilities of the guest OS itself now. We can make the virtual memory whatever we need it to be to solve the problem at hand.
We could run more guests on any given X4600 system. The limitation is really the number of virtual processors in play. With 128 being the current recommended total, that would be 32 quad CPU guests, 64 dual CPU guests, 128 single CPU guests, or some mix thereof. We can't actually run it at this max either, though. We have two X4600's in an HA setup. The HA bit means that, should an X4600 fail, the surviving X4600 system has to be able to run all of the workload. While we don't exactly halve it, we do hold it to about 60 machines up at any given time on any given X4600.
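The guest-count arithmetic above can be sketched in a few lines of Python. This is only an illustration: the 128 virtual-processor budget comes from the text, while the function names and the simplifying assumption that every guest has the same number of virtual CPUs are mine.

```python
VCPU_BUDGET = 128  # recommended total of virtual processors per host, per the text


def max_guests(vcpus_per_guest: int, budget: int = VCPU_BUDGET) -> int:
    """Number of identical guests that fit inside the virtual-processor budget."""
    return budget // vcpus_per_guest


def survives_failover(guests_per_host, vcpus_per_guest: int = 1,
                      budget: int = VCPU_BUDGET) -> bool:
    """True if a single surviving host could carry every guest in the cluster.

    Assumes (hypothetically) that all guests have the same vCPU count.
    """
    total_vcpus = sum(guests_per_host) * vcpus_per_guest
    return total_vcpus <= budget


for vcpus in (4, 2, 1):
    print(f"{vcpus}-CPU guests per host: {max_guests(vcpus)}")

print(survives_failover([60, 60]))  # two hosts at about 60 single-CPU guests each
print(survives_failover([75, 75]))  # two hosts at 75 single-CPU guests each
```

Under those assumptions, two hosts at about 60 single-CPU guests each still fit on one survivor, which is roughly why holding each host to about 60 running machines is a comfortable ceiling.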
We could place 10 X4600's in a single 42U rack, each rack covering a 2 foot by 3 foot bit of flooring, but we don't. We also have to put shared storage into the rack to enable things like the HA features. We currently have two X4600's, plus a shared SAN disk array.
Data center shootout
Pass one: P2V'ing mid 1990's PC's to the X4600's:
| Environment | Quantity | Wattage | Price per kilowatt-hour | Cost for 1 year (8760 hours) |
|---|---|---|---|---|
| X4600, HA, 60 VM's each, 120 VM's total | 2 | 4*850 = 3,400 W = 3.4 kW each; *2 = 6.8 kW | 10 US cents (1 USD per 10 kWh) | 8760*3.4 = 29,784 kWh / 10 = 2,978.40 USD a year per X4600; 5,956 USD a year for two X4600's |
| PC | 120 | 120*375 = 45,000 W = 45 kW | 10 US cents (1 USD per 10 kWh) | 8760*45 = 394,200 kWh / 10 = 39,420 USD per year |
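For anyone who wants to rerun the table with their own electricity rate, here is a minimal Python sketch of the same arithmetic. The wattages and the 10 cents per kilowatt-hour rate are the ones in the table; the function and constant names are just illustrative.

```python
HOURS_PER_YEAR = 8760
USD_PER_KWH = 0.10  # 10 US cents per kilowatt-hour, as in the table


def annual_power_cost(watts: float, usd_per_kwh: float = USD_PER_KWH) -> float:
    """Cost in USD of drawing `watts` continuously for one year."""
    kilowatts = watts / 1000
    return kilowatts * HOURS_PER_YEAR * usd_per_kwh


x4600_each = annual_power_cost(4 * 850)      # four 850 W supplies, max rating
x4600_pair = annual_power_cost(2 * 4 * 850)  # the two-node HA pair
old_pcs = annual_power_cost(120 * 375)       # 120 PC's at 375 W each

print(f"One X4600: {x4600_each:,.2f} USD/year")   # 2,978.40
print(f"Two X4600: {x4600_pair:,.2f} USD/year")   # 5,956.80
print(f"120 PC's:  {old_pcs:,.2f} USD/year")      # 39,420.00
```

Passing a different `usd_per_kwh` is the quickest way to see how sensitive the comparison is to the local price of power.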
Making the Numbers a Bit More Real
Is this real? No. I used the max ratings of the power supplies on the X4600, and I understated the rating on the PC's: I did not put an amp clamp on the power cord to see the average amperage draw. I did not take into account power supply efficiencies, or the fact that the X4600 has external disks on the SAN, while the PC's all have internal disks. Plus, in the real world we look at three year ROI rather than one year. Further, in our world, we use stuff longer than three years most of the time.
What is real above without any adjustment is that 120 computers would require 5 Ergotron 3000 desks to sit on, covering 90 square feet of the data center floor, versus one rack covering six square feet with space left in it for more computers.
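The floor-space comparison is simple enough to check in code. This sketch uses the figures given in the post (25 PC's per six foot by three foot desk, a rack on a two foot by three foot footprint); the names are illustrative only.

```python
import math

PCS_PER_DESK = 25
DESK_SQFT = 6 * 3  # each desk fixture is six feet long and three feet deep
RACK_SQFT = 2 * 3  # one rack footprint


def desks_needed(pc_count: int) -> int:
    """Desks required, rounding up because a partly filled desk still takes floor."""
    return math.ceil(pc_count / PCS_PER_DESK)


def pc_floor_space(pc_count: int) -> int:
    """Square feet of data center floor taken by the desks."""
    return desks_needed(pc_count) * DESK_SQFT


print(desks_needed(120))    # 5
print(pc_floor_space(120))  # 90
print(RACK_SQFT)            # 6
```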
I also did not take into account Air Conditioning (A/C, but not the Alternating Current kind of A/C...), and the power to run the A/C, or the costs of buying a UPS or generator to sit behind all the electrons being fed to the computers. Also, we traded in 120 ports worth of 100 Mb Ethernet for eight ports of Gig speed copper. By being close together, we have substantially shortened the Cat 5 copper runs too. Less copper equals less money these days...
Staying cool
To get the numbers a little tighter, I'll next factor in the power it takes to run the A/C to cool the data center.
Looking at the SAN disks the X4600's use, we have 2000 watts of power supply in total there. If I halve the power usage number for the PC's (i.e., once the PC is booted, it settles down to only use half of the power that the power supply is rated for), and leave the X4600's where they are to account for the SAN disks, and round up a bit, we'll be at 6,000 USD a year versus 20,000 USD a year. That intuitively feels about right as well. Three years is 18,000 USD for the X4600's versus 60,000 USD for the PC's.
A/C costs cannot be ignored. Keeping this in watt-hour denominations (since A/C capacity is rated in BTU of heat removed per hour), it takes 3.4 BTU of A/C to deal with every watt-hour of power that goes into the computers. For the PC's, after the halving and rounding down I did above to account for not actually using all the power in the power supply at all times, this is 22.5 kW (22,500 watts) * 3.4 = 76,500 BTU of A/C.
Two X4600's plus the SAN came in, rounding up, at 7,000 watts: 7,000 * 3.4 = 23,800 BTU of A/C required, but in a much smaller space.
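As a quick check on the heat-load numbers, here is a small Python sketch of the watts-to-BTU conversion. It uses the post's rounded factor of 3.4 (the more precise figure is about 3.412 BTU per watt-hour); the function name is illustrative.

```python
BTU_PER_WATT_HOUR = 3.4  # the post's rounded factor; more precisely about 3.412


def cooling_btu_per_hour(it_watts: float) -> float:
    """BTU of heat per hour that the A/C must remove for a given electrical load."""
    return it_watts * BTU_PER_WATT_HOUR


pc_load = cooling_btu_per_hour(22_500)    # 120 PC's at half their rated draw
x4600_load = cooling_btu_per_hour(7_000)  # two X4600's plus SAN, rounded up

print(f"PC's:   {pc_load:,.0f} BTU per hour")     # 76,500
print(f"X4600s: {x4600_load:,.0f} BTU per hour")  # 23,800
```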
I wondered how much power it requires to drive A/C. I looked at the box on a high efficiency A/C unit for some clues. At 12.63 EER, it takes 950 watts to deliver 12,000 BTU of cooling. Our data center uses chill water towers on the roof, massive pumps in the basement, and huge air handlers on the actual data center floor. These are 1993 vintage A/C units, so I assume their EER is not better than this. Even if it is, this is probably good for a ballpark number.
For easy math, I'll assume 1000 watts of A/C to deliver every 12,000 BTU. For the PC's that is 76,500 / 12,000. Rounding down again to favor the PC's, that is 6,000 watts of power for cooling. 60 cents an hour to run. Rounding down again, 5,000 USD a year, or 15,000 USD over three years.
The X4600's 23,800 BTU requires, rounding up, 2000 watts of A/C power to cool. 20 cents an hour. 5000 USD over three years. Total electricity is going to cost 23,000 USD for three years on the X4600's (3 * 6,000 USD a year for the computers, plus 5,000 USD for cooling), with everything rounded up. Total cost of power for the PC's for three years is 75,000 USD, rounding down at several turns along the way.
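Pulling the computer power and the cooling power together, the three year comparison can be sketched as below. The inputs are the ones used in the text (22,500 watts for the PC's, 7,000 watts for the X4600's plus SAN, 3.4 BTU per watt-hour, 1000 watts of A/C per 12,000 BTU, 10 cents per kilowatt-hour). The function name is mine, and the code skips the hand rounding, so its totals land near the rounded figures rather than exactly on them.

```python
HOURS_PER_YEAR = 8760
USD_PER_KWH = 0.10
BTU_PER_WATT_HOUR = 3.4           # the post's rounded conversion factor
AC_WATTS_PER_BTU = 1000 / 12_000  # "1000 watts of A/C to deliver every 12,000 BTU"


def power_cost_with_cooling(it_watts: float, years: int = 3) -> float:
    """USD for running the computers plus the A/C that cools them."""
    cooling_watts = it_watts * BTU_PER_WATT_HOUR * AC_WATTS_PER_BTU
    total_kilowatts = (it_watts + cooling_watts) / 1000
    return total_kilowatts * HOURS_PER_YEAR * USD_PER_KWH * years


pcs_three_years = power_cost_with_cooling(22_500)
x4600_three_years = power_cost_with_cooling(7_000)
saving_per_year = (pcs_three_years - x4600_three_years) / 3

print(f"PC's, three years:   {pcs_three_years:,.0f} USD")
print(f"X4600s, three years: {x4600_three_years:,.0f} USD")
print(f"Difference per year: {saving_per_year:,.0f} USD")
```

Changing `USD_PER_KWH` or the `years` argument shows how the gap grows with pricier power or a longer service life.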
Service Life and Other Intangibles
We have had these PC's in service for four times that long. I have not counted the cost of data center space, or network connections, or staff to maintain the hardware. Clearly the X4600's are going to save us serious money. In the most conservative fiscal sense, they have a three year ROI, and then go on saving us well over 15,000 USD a year in electricity alone for every year they are in service after three.
Finally: Power is not getting less expensive as time goes by. These numbers will vary depending on what one currently pays for electricity. In some googling around to try and validate these numbers, it appeared to me that 10 cents per kilowatt-hour was a pretty good median price to use. I saw rates in some places where there was inexpensive hydroelectric nearby that were half this, but I also saw some that were higher.
Carbon Power
Every kilowatt-hour saved is that much less carbon dioxide in the air. Depending on how the power is generated, this is anywhere from almost nothing (Hydro, Geothermal, Wave / Tidal, Wind) to over two pounds of CO2 per kilowatt-hour (Coal). Texas uses a great deal of natural gas, which is one pound of CO2 per kilowatt-hour. Over the life of these computers that is a pretty serious reduction in the impact we are having on this planet. My personal goal, as data center manager for R&D Support, is to reduce the number of real computers we use by just over 1000 over the next 12 months, increasing the number of OS images by that same number.
1000 computers leave, 2000 OS images remain. This will deliver to R&D, QA, and Customer Support a better, faster environment from which they can support our customers. And a better environment in general.
I think that is a pretty good use for virtualization.
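To put a rough number on the carbon side, here is a small Python sketch. The emission factors are the approximate ones quoted above (hydro treated as zero, natural gas at one pound per kilowatt-hour, coal at two), and the wattages are the computer-only loads from the shootout, with cooling left out. Treat all of it as a ballpark under those assumptions, not a measurement.

```python
HOURS_PER_YEAR = 8760

# Approximate pounds of CO2 per kilowatt-hour, taken from the figures in the text.
LB_CO2_PER_KWH = {"hydro": 0.0, "natural_gas": 1.0, "coal": 2.0}


def annual_co2_saved_lb(watts_saved: float, source: str) -> float:
    """Pounds of CO2 avoided per year by shedding `watts_saved` of continuous load."""
    kwh_saved = watts_saved / 1000 * HOURS_PER_YEAR
    return kwh_saved * LB_CO2_PER_KWH[source]


watts_saved = 22_500 - 7_000  # PC's at half rating, minus X4600's plus SAN

for source in LB_CO2_PER_KWH:
    pounds = annual_co2_saved_lb(watts_saved, source)
    print(f"{source}: {pounds:,.0f} lb of CO2 per year")
```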
Err... Linux?
Oh... Linux tie in. This is “Adventures in Linux” after all. Many of the VM's are Linux. VMware started on Linux with their Hypervisor. ESX still uses Linux for the service console. You can do exactly the same thing with Xen / Linux. Take your pick.
The Rest of the Story
UPS's are funny beasts. In one of our remote locations, we have used VMware to reduce our server count from over 300 to about 185. At the same time, an undersized UPS in that location went from a runtime of less than 15 minutes to 30 minutes. More than double the runtime with 2/3's the physical systems, but more than 300 OS images in service.
It works out this way in part because of the funny nature of the chemistry of the UPS's Lead / Acid batteries. Reducing load increases runtime non-linearly: the batteries give up more total energy when they are drained more slowly.
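One common way to approximate this battery behavior is Peukert's law, under which runtime scales with the inverse of the load raised to an exponent somewhat above 1. The sketch below is my illustration, not anything measured on that UPS: the exponent of 1.3 is an assumed value in the range often quoted for lead-acid batteries, and using the server count as a stand-in for electrical load is also an assumption.

```python
def runtime_multiplier(load_fraction: float, peukert_exponent: float = 1.3) -> float:
    """Factor by which runtime grows when load drops to `load_fraction` of the original.

    Follows from Peukert's law: runtime is proportional to (1 / load) ** k.
    The default exponent is an assumed, typical-looking lead-acid value.
    """
    if not 0 < load_fraction <= 1:
        raise ValueError("load_fraction must be in the range (0, 1]")
    return (1 / load_fraction) ** peukert_exponent


load_fraction = 185 / 300  # server count used as a rough proxy for electrical load
old_runtime_minutes = 15
new_runtime_minutes = old_runtime_minutes * runtime_multiplier(load_fraction)

print(f"Load fraction:     {load_fraction:.2f}")
print(f"Estimated runtime: {new_runtime_minutes:.0f} minutes")
```

With an exponent of exactly 1 the runtime would only grow in proportion to the load reduction; anything above 1 gives the better-than-proportional gain described above.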
One final disclaimer: I mentioned the Sun X4600's here, and the Sun X2200's in the NAS Redeaux series. We like that hardware a lot. Sun did not pay me to say anything nice about them, or give me a special deal on the hardware that we did not already get. In fact, I waited till after the hardware was bought to even bring all this up. I was trying to keep this as real as possible by mentioning the actual hardware we are using for this. The SAN array I mentioned above for the X4600 VMware setup is an HP MSA 1000.
(I hope I got all that math right: the hardest thing about this is making sure everything is expressed in the same units!)