SpyderByte.com Technical Portals
 The #1 Site for News & Information Related to Linux High Performance Technical Computing, Linux High Availability and Linux Parallel Clustering

Cost Recovery by Design
Posted by Philip Carinhas, Thursday April 20 2006 @ 12:43PM EDT

by Dominique Heger & Philip Carinhas, Fortuitous Technologies


Although often seen as an extra expense, performance and capacity planning generally saves a project money in the long run. Its costs are usually recovered by the end of the initial implementation phase, if not sooner. Moreover, projects that are properly planned achieve their design goals and allow future scalability at a significantly lower total cost.

Performance Planning Issues

In today's parallel, heterogeneous, and interconnected IT wilderness, predicting and controlling the cost factors surrounding systems performance and capacity planning is overwhelming at best. In larger IT projects, it is not uncommon for performance tuning and capacity problems to be the largest and least controlled expenses. To illustrate, a sudden slowdown of an enterprise-wide application may trigger user complaints, delayed projects, an IT support backlog, and ultimately a financial loss to the organization. By the time the performance problem is located, analyzed, worked around, tested, and verified, an organization may have spent tens of thousands of dollars in time, IT resources, and hardware, only to fall back into the same vicious cycle the very next year.

The Crux

When performance is designed into the final solution, costs can be contained and reduced while ensuring the required performance and leaving room to scale. This approach shifts the emphasis away from the installation and setup phase to the planning and design stages. It is paramount that IT not only understand the expected workload behavior, but also act responsibly by conducting feasibility and design studies before spending many thousands of dollars on a solution that, in the best case, may not be optimal and, in the worst case, fails completely.

Hidden Costs Associated with Bad Planning

  • Unneeded Hardware

    Application performance issues have an immediate impact on customer satisfaction and an organization's bottom line. When a performance issue surfaces, it is not uncommon for organizations to start adding more (often expensive) hardware to the operation mix without fully understanding where the problem truly lies or how the extra hardware will affect overall system performance. Working on the symptoms rather than the underlying cause may provide some relief in the short run, but it intensifies the issues in the long run, as even more hardware has to be troubleshot and analyzed. In addition, redundant hardware carries these costs:

    • Electricity

    • Extra Cooling (several times the electricity costs)

    • Extra IT Overhead (See Below)

    • Hardware Replacement Costs (drives, fans, PSUs, etc.)

  • IT Overhead
    In addition to hardware costs, the IT personnel costs associated with unplanned performance tuning exercises can be excruciating. IT managers may be forced to commit hundreds of man-hours to solve even simple performance problems. Because the actual source of a problem is not always easy to identify, IT personnel may spend hours or days analyzing and tuning the wrong subsystem. To make matters worse, some performance tuning exercises may require crossing over into the domains of security, reliability, or availability. Proper design and planning can reduce these costs.
  • Security and HA
    Without proper initial planning, fire-fighting scenarios such as these may also result in additional work for an organization's security or high-availability (HA) personnel. Proper design and planning can significantly reduce these costs too.
  • Lost Revenue
    Without proper planning, projects run the risk of partial or total failure, which can drive away the associated revenue. There is no excuse for a project to fail from a lack of adequate planning and design. Even if the system is not designed to generate revenue directly, its failure can cause losses for internal customers and related systems.
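
The recurring costs listed above can be put into rough numbers. The sketch below is only an illustration: the function name, the wattage, the electricity price, the cooling factor, and the administration hours are all hypothetical assumptions rather than figures from any real deployment, and hardware replacement costs are left out.

```python
def annual_overhead_cost(nodes, watts_per_node, price_per_kwh,
                         cooling_factor, admin_hours_per_node, hourly_rate):
    """Estimate the yearly running cost of a set of always-on nodes.

    All inputs are assumptions supplied by the caller; cooling is modeled
    as a simple multiple of the electricity cost.
    """
    hours_per_year = 24 * 365
    energy_kwh = nodes * watts_per_node / 1000 * hours_per_year
    electricity = energy_kwh * price_per_kwh
    cooling = electricity * cooling_factor
    admin = nodes * admin_hours_per_node * hourly_rate
    return {
        "electricity": electricity,
        "cooling": cooling,
        "admin": admin,
        "total": electricity + cooling + admin,
    }


# Hypothetical example: 5 unneeded nodes drawing 400 W each.
costs = annual_overhead_cost(
    nodes=5,
    watts_per_node=400,       # assumed average draw per node
    price_per_kwh=0.10,       # assumed electricity price in dollars
    cooling_factor=2.0,       # assumed: cooling costs twice the electricity
    admin_hours_per_node=20,  # assumed yearly IT hours per node
    hourly_rate=75,           # assumed loaded hourly rate in dollars
)
for item, dollars in costs.items():
    print(f"{item:12s} ${dollars:,.2f}")
```

Even with modest assumptions, the yearly overhead of hardware that adds no performance is a recurring cost, not a one-time purchase.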

An Illustration

As an example of the shortcomings of overzealous hardware purchases, let's consider CompanyX, whose 10-node cluster did not perform well under stress. The managers authorized IT to buy 5 more servers to increase performance, which resulted in no noticeable performance gain. When the system was finally examined, a simple model immediately showed that the memory and I/O subsystems were the bottleneck, and that the optimal number of compute nodes was about 10.
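
The article does not describe the model that was used for CompanyX. As a minimal sketch of the idea, assume throughput grows linearly with node count until a shared subsystem (memory or I/O) saturates. The function names and all rates below are hypothetical and chosen only so that saturation occurs at 10 nodes.

```python
import math


def cluster_throughput(nodes, per_node_rate, shared_capacity):
    """Throughput is capped by the slower of compute and the shared subsystem."""
    return min(nodes * per_node_rate, shared_capacity)


def saturation_nodes(per_node_rate, shared_capacity):
    """Smallest node count at which the shared subsystem becomes the limit."""
    return math.ceil(shared_capacity / per_node_rate)


# Hypothetical figures, in jobs per hour.
PER_NODE_RATE = 100.0     # assumed work one node can push through
SHARED_CAPACITY = 1000.0  # assumed ceiling of the shared memory/I/O path

for n in (5, 10, 15):
    print(n, "nodes ->", cluster_throughput(n, PER_NODE_RATE, SHARED_CAPACITY))

print("saturates at", saturation_nodes(PER_NODE_RATE, SHARED_CAPACITY), "nodes")
```

Under these assumptions, going from 10 to 15 nodes leaves throughput unchanged, which mirrors what CompanyX observed. A real study would measure the per-node rate and the shared capacity rather than assume them.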


In short, the proper approach to managing systems performance is to design performance into the solution. If the system is already in production, the recommendation is to conduct a performance study that covers the application, operating system, and hardware subsystems. It is paramount to understand not only the actual workload behavior, but also the interaction between the application, the OS, and the hardware. Treating performance-related issues early in an IT project avoids hidden-cost scenarios and is far cheaper than unplanned tuning after deployment.

About the Authors

Dominique Heger has over 18 years of IT experience, focusing on systems performance, capacity planning, cluster technology, performance modeling, algorithms and data structures, and I/O scalability. Philip Carinhas is the President and CEO of Fortuitous, and has over 15 years of experience in Linux and enterprise computing. They can be found at http://fortuitous.com

     Copyright © 2001-2007 LinuxHPC.org
Linux is a trademark of Linus Torvalds
All other trademarks are those of their owners.