Plastic brain learning/un-learning

Archive for the ‘Data center’ Category

Complexity

In Virtualized Data center on February 16, 2009 at 10:59 pm

 

I recently came across Benjamin Black’s blog post on complexity in the context of AWS. He says:

i now see complexity moving up the stack as merely an effect of complexity budgets. like anything worth knowing, complexity budgets are simple: complexity has a cost, like any other resource, and we can’t expect an infinite budget. 

spending our complexity budget wisely means investing it in the areas where it brings the most benefit (the most leverage, if you must), sometimes immediately, sometimes only once a system grows, and not spending it on things unessential to our goals.

What drives design complexity in the Cloud computing infrastructure space?

Finding the right mix of functional “differentiation” versus “integration” at all levels or tiers of the design (whether hardware or software), along with technology and business constraints, drives the “complexity budget”.

“Differentiation” at the functional level is pretty well understood, but still evolving: routers, switches, compute and storage nodes are the familiar building blocks, although the very basis of how these functions are realized is changing (e.g. Cisco is reportedly gearing up to sell servers).

“Integration” of the various infrastructure functions in the data center is always a non-trivial system integration expense.

Some examples of technology constraints:

  • From the chip level up to the system level, energy-efficiency improvements come much more slowly than hardware density improvements. Consequently, not being able to consume power in proportion to utilization levels results in sub-optimal cost/pricing structures (see the sketch after this list).
  • The context/environment for virtualization, for Cloud providers, is really about defining what it means to have a “virtualized data center”; Cisco’s Unified Computing is one example.
  • Workload characterization & technology mapping
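
To make that first constraint concrete, here’s a minimal sketch of the “energy proportionality” gap. The linear power model and the idle/peak wattages are illustrative assumptions, not measurements of any particular server:

```python
# Illustrative sketch of the "energy proportionality" gap. The linear power
# model and the idle/peak wattages are assumptions for illustration only.

def server_power_watts(utilization, idle_watts=200.0, peak_watts=350.0):
    """Power drawn (W) at a given utilization level between 0.0 and 1.0."""
    return idle_watts + (peak_watts - idle_watts) * utilization

def watts_per_unit_of_work(utilization):
    """Relative energy cost per unit of useful work done."""
    return server_power_watts(utilization) / max(utilization, 1e-6)

if __name__ == "__main__":
    for u in (0.1, 0.3, 0.5, 1.0):
        print(f"utilization {u:>4.0%}: draws {server_power_watts(u):5.0f} W, "
              f"{watts_per_unit_of_work(u):7.0f} W per unit of work")
    # Because idle power dominates, a server running at 10% utilization burns
    # several times more energy per unit of work than one running flat out --
    # the gap an "energy proportional" design would close.
```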

Business constraints: typically, learning curve/market adoption, cost and time-to-market.

The need to produce manageable systems, and the interdependencies that run from the chip level through the board, system and software levels up to the data center level, require holistic, iterative thinking and design: we have to consider function and interaction at every level in the context of the containing environment, i.e. what it means to have a “fully virtualized” data center.

Ironically, virtualization gets more attention than the “reconfigurability” needed (in response to virtualization, workload and utilization variations) at all levels: compute, storage, interconnect, power/cooling, perhaps even topology within the data center.

After all, Cloud providers would want:

  • “proportional power consumption” at the data center level
  • reduced system integration costs with fully integrated (virtualization aware) compute, storage & interconnect stack
  • optimal cloud computing operations, even for non-virtualized environments (let’s face it, there are tons of scenarios that don’t necessarily need virtualization; by the same token, there are virtualization solutions that enable better utilization but don’t require hypervisors)
  • complete automation of managed IT environments

This is all about moving complexity away from IT customers (who adopt the cloud computing model) and into the data centers/cloud providers.

So, you want to migrate to the Cloud…

In Cloud Computing, Data center on January 1, 2009 at 2:15 am

As we head into 2009, I can’t help thinking about how Cloud computing will affect IT Service Management (ITSM) as more companies think about leveraging the variable cost structure and dynamic infrastructure model it enables. While IT capabilities provide or enable competitive advantages (and hence require “agility” in terms of features and capabilities, plus systemic qualities such as Scalability), other business considerations such as IT service criticality (is it Mission Critical?) also influence the migration strategy. Here’s one way to look at this map:

 

[Figure: High-level mapping strategy for Enterprise app Cloud migration]

You can imagine mapping your application portfolio along these dimensions.

Many of these applications could be considered Mission Critical, depending on the business you’re in. You may even want some of these applications to be “Agile”, in the sense that you want quick/constant changes or additional features and capabilities to stay ahead of your competition. The Evaluation grid above is a framework to help you quickly identify Cloud migration candidates and prioritize them, based on both business/economic viability and technical feasibility.

Step 1:

First, let’s talk about the types of applications we deal with in the enterprise. From the perspective of the enterprise business owner, applications could fall under:

  • Business infrastructure applications (generic): Email, Calendar, Instant Messaging, Productivity apps (Spreadsheets etc.), Wikis. These are applications that every company needs to be in business, and this category might not necessarily provide any competitive advantage…you just expect them to work (i.e. available, and supporting a frictionless, information-bonded organization).
  • Business platform applications: ERP, CRM, Content and Knowledge management etc. These are key IT capabilities on which core business processes are built. These capabilities need to evolve with the business process, as enterprises identify and target new market opportunities.
  • Business “vertical” applications, supporting specific business processes for your industry type. E.g. Online stores, Catalogs, Back-end services, Product life-cycle support services etc.

Once you identify applications falling into the above categories, you can map them onto the Evaluation grid above, based on business needs (“Agility”) versus current levels of support (Mission Critical, Business Critical, Business Operational etc.) as well as architectural considerations (mainly Scalability).
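
As a thought experiment, here’s a tiny sketch of that triage in code. The quadrant numbering is my own assumption, chosen only to stay consistent with the tiering described next (Quadrants 1 & 5 already scalable, 2 & 6 needing preparation, the rest Tier 2); your actual grid may be laid out differently:

```python
# Hypothetical sketch of the evaluation-grid triage. The 1..8 quadrant
# numbering is an assumption chosen to match the tiering described below;
# the real grid may be laid out differently.

from dataclasses import dataclass

@dataclass
class App:
    name: str
    agile: bool             # does the business need rapid feature change?
    scalable: bool          # is the architecture already scalable?
    mission_critical: bool  # current level of support / criticality

def quadrant(app: App) -> int:
    """Map an application onto an assumed numbering of the grid."""
    plane = 0 if app.agile else 4          # assume Quadrants 1-4 = high agility
    if app.scalable:
        cell = 1 if not app.mission_critical else 2
    else:
        cell = 3 if app.mission_critical else 4
    return plane + cell

TIER_1 = {1, 5}        # already scalable: due diligence on RoI, data privacy
NEEDS_PREP = {2, 6}    # extensive preparation required first; the rest are Tier 2

portfolio = [
    App("internal wiki", agile=False, scalable=True, mission_critical=False),
    App("online store", agile=True, scalable=False, mission_critical=True),
    App("CRM", agile=True, scalable=True, mission_critical=True),
]
for app in portfolio:
    q = quadrant(app)
    tier = ("Tier 1" if q in TIER_1 else
            "needs preparation" if q in NEEDS_PREP else "Tier 2")
    print(f"{app.name:14s} -> Quadrant {q} ({tier})")
```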

Generally, applications falling under Quadrants 1 and 5 are your Tier 1 candidates for migration, requiring further due diligence (RoI; since they’re already scalable, you have fewer engineering challenges than business challenges such as data privacy and IT control issues).

Applications falling in Quadrants 2, 6 require extensive preparation.

Applications falling in Quadrants 3, 4, 7 and 8 could be your “Tier 2” candidates, with those in Quadrants 3 & 4 taking higher precedence.

Whether it is internal business process support or external service delivery, the primary drivers for enterprises to consider Cloud computing are (among other things, of course):

  • Cost
  • Time to market considerations
  • Business Agility

Step 2:

Once you map your application portfolio, you can revisit your “Agile IT” strategy by reconsidering current practices with respect to:

  1. ITSM: While virtualization technologies introduce a layer of complexity, deploying to a Cloud computing platform on top requires revisiting our current Data Center/IT practices:
    • Change & Configuration Management – you can’t necessarily look at this in an application-specific way any more.
    • Incident Management – we need automated response based on policies
    • Service Provisioning – requires proactive automation (this is where “elastic” computing hits the road): with dynamic provisioning, you need business rules that enable automation in this area, in addition to billing and metering.
    • Network Management – needs to be “distributed” computing aware
    • Disaster Recovery, Backups – how would this work in a Cloud environment?
    • Security, System Management – do you understand how your current model would work in a dynamic infrastructure environment?
    • Capacity planning, Procurement, Commissioning and de-commissioning – Since it is easier to provision more instances of your IT “service” onto VMs, do you have solid business rules/SLAs driving automated provisioning policies? Does your engineering team understand how to validate their architecture (via performance qualification for the target Cloud environment)? Is there a seamless hand-off to the Operations team, so they can tune capacity plans based on performance qualification inputs from Engineering?
       
  2. Revisiting Software engineering practices
    • Programming model – Make sure the Engineering organization is ready across the board, in terms of both skills and attitude, to make the transition. Adopt common REST API standards & patterns, leverage good practices (remember Yahoo Pipes?), even a “cloud simulator” if possible…
    • Application packaging & Installation – Make sure the Engineering and Operations teams agree on, and understand, common standards for application-level packaging and installation, regardless of the Cloud platform you adopt.
    • Deployment model – Ensure you have a standard “operating environment” down to the last detail (OS/patch levels, HW configurations for each of your architecture tiers, middleware versions, your internal infrastructure platform software versions etc.)
    • Performance qualification, Infrastructure standards and templates – Focus on measuring throughput (requests processed per unit time) and latency on the “deployment model” defined above (see the sketch after this list). You need this process to ensure adequate service levels on the Cloud platform.
    • Performance/SLA maintenance & Application Monitoring – Ensure you have manageability (e.g. a JMX-to-SNMP bridge) and observability (e.g. a JMX-to-DTrace bridge, or just plain JMX interfaces providing application-specific observability) built into all the layers of your application stack at each architecture tier.
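
On the performance-qualification point, here’s a minimal sketch of the kind of throughput/latency measurement meant above, using only the Python standard library; the endpoint URL, request count and concurrency are placeholders, not values from any real deployment:

```python
# Minimal performance-qualification sketch: drive one endpoint of the standard
# "deployment model" and report throughput and latency. The URL, request count
# and concurrency level are placeholders.

import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

TARGET_URL = "http://staging.example.com/healthcheck"   # placeholder endpoint
REQUESTS = 200
CONCURRENCY = 10

def timed_request(_):
    """Issue one request and return its latency in seconds."""
    start = time.perf_counter()
    with urlopen(TARGET_URL, timeout=10) as resp:
        resp.read()
    return time.perf_counter() - start

def qualify():
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        latencies = sorted(pool.map(timed_request, range(REQUESTS)))
    wall = time.perf_counter() - wall_start
    print(f"throughput : {REQUESTS / wall:.1f} requests/sec")
    print(f"median lat : {statistics.median(latencies) * 1000:.1f} ms")
    print(f"p95 lat    : {latencies[int(0.95 * len(latencies)) - 1] * 1000:.1f} ms")

if __name__ == "__main__":
    qualify()
```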
Happy migration! Let me know what your experiences are….

Path to greener Data Centers, Clouds and beyond….

In Cloud Computing, Data center, Energy efficient on December 7, 2008 at 7:49 am

Data center power consumption (with respect to IT equipment like servers and storage) is a hot topic these days. The cumulative install base of servers around the globe is estimated to be in the range of 50 million units, which in the grand scheme of things might consume roughly 1% of worldwide energy consumption (according to estimates from Google; let me know if you need a reference). If you’re wondering: yes, the worldwide install base of PCs might consume several multiples of the server power consumed in data centers…why? Because the worldwide PC install base is upwards of 600 million units!
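
A quick back-of-envelope check makes the order of magnitude plausible. Every figure below, apart from the install-base numbers above, is a rough assumption, and the result excludes cooling and power-distribution overhead (which pushes the real share higher):

```python
# Back-of-envelope sanity check on the "~1%" claim. Every number below, apart
# from the install-base figures quoted in the post, is a rough assumption.

SERVERS = 50e6                  # installed base quoted above
AVG_SERVER_WATTS = 250          # assumed average draw per server, 24x7
WORLD_ELECTRICITY_TWH = 18000   # assumed annual worldwide electricity use

HOURS_PER_YEAR = 8760
server_twh = SERVERS * AVG_SERVER_WATTS * HOURS_PER_YEAR / 1e12  # Wh -> TWh
share = 100 * server_twh / WORLD_ELECTRICITY_TWH
print(f"servers: ~{server_twh:.0f} TWh/yr (~{share:.1f}% of electricity, "
      f"before cooling and power-distribution overhead)")

# The same arithmetic for the PC fleet (assumed ~100 W average, ignoring
# duty cycle) shows why PCs in aggregate can dwarf server consumption.
PCS, AVG_PC_WATTS = 600e6, 100
pc_twh = PCS * AVG_PC_WATTS * HOURS_PER_YEAR / 1e12
print(f"PCs    : up to ~{pc_twh:.0f} TWh/yr")
```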

Data center power consumption obviously gets more attention…because of the concentration of that consumption. 50 megawatts consumed at one data center location is more conspicuous than the same amount of power consumed by PCs in millions of households and businesses.

Understanding the trends towards more eco-friendly computing requires understanding developments at many levels: starting from VLSI design innovations at the chip or processor level, through the board level and the software level (firmware, OS/virtualization, middleware, applications), and finally up to the data center level.

Chip or Processor level: Processor chips are already designed to work at lower frequencies based on load, in addition to providing support for virtualization. In multi-core chips, cores can be off-lined depending on need (or problems). Chips are designed with multiple power domains…so CPUs can draw less power based on utilization. The issue is with other parts of the computer system such as memory, disks etc. Can you ask the memory chips to offline pages and draw less power? Can you distribute data across Flash or disks optimally to allow similar proportional power consumption based on utilization levels? These are certainly some of the dominant design issues that need to be addressed, keeping in mind constraints such as low latency and little or no “wake-up” penalty.

Board level: Today, server virtualization falls short of end-to-end virtualization. When machine resources are carved up, guest VMs don’t necessarily carve up the hardware resources in proportion. Network-level virtualization is just beginning to evolve, e.g. Crossbow in OpenSolaris. Another example is Intel’s VT technology, which enables allocation of specific I/O resources (a graphics card or a network interface card) to guest VM instances. If chips and board-level hardware elements are power (and virtualization) savvy, you can ensure power consumption that is (almost) proportional to the utilization levels dictated by the workload.

Firmware level: Hypervisors, whether they rely on emulation or para-virtualization, present a single interface to the hardware and can exploit all the chip-level or board-level support for “proportional energy use” against a given workload.

OS level: Over a sufficiently long time interval (months), server utilization is predominantly characterized by low-utilization intervals; average utilization of servers in data centers is usually less than 50%. That means there is plenty of opportunity for servers to go into “low-power” mode. How can you design the OS to cooperate here?
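
As a toy illustration of what that “cooperation” could look like from user space, here’s a Linux-only sketch (run as root) that watches aggregate CPU utilization and nudges the cpufreq governor toward a low-power policy when the machine is mostly idle. The thresholds and governor names are assumptions, and the real policies (ondemand/schedutil governors, core offlining, tickless idle) live inside the kernel itself:

```python
# Toy, Linux-only illustration of OS-level "cooperation": watch aggregate CPU
# utilization via /proc/stat and switch the cpufreq governor accordingly.
# Thresholds and governor names are assumptions for illustration.

import glob
import time

def cpu_busy_fraction(interval=5.0):
    """Fraction of time the CPUs were busy over the sampling interval."""
    def snapshot():
        with open("/proc/stat") as f:
            fields = [int(x) for x in f.readline().split()[1:]]
        idle = fields[3] + fields[4]      # idle + iowait
        return idle, sum(fields)
    idle1, total1 = snapshot()
    time.sleep(interval)
    idle2, total2 = snapshot()
    return 1.0 - (idle2 - idle1) / max(total2 - total1, 1)

def set_governor(name):
    for path in glob.glob("/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor"):
        with open(path, "w") as f:
            f.write(name)

if __name__ == "__main__":
    while True:
        busy = cpu_busy_fraction()
        if busy < 0.20:                   # assumed "mostly idle" threshold
            set_governor("powersave")
        elif busy > 0.60:                 # assumed "getting busy" threshold
            set_governor("ondemand")
```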

System level: Manageability (e.g. responding to workload changes, migrating workloads seamlessly), observability (e.g. DTrace), and APIs to manage the middleware or application stack in response to low-power modes of operation (again, power usage proportional to workload) are going to be paramount considerations.

Cloud level: Shouldn’t Clouds look like operating systems (seamless storage, networking, backups, replication, migration of apps/data, dependencies similar to package dependencies)? 3Tera and RightScale solve only some of these problems…but many areas still need to be addressed: dynamic, workload-based performance qualification; mapping application criticality to Cloud deployment models; leveraging virtualization technologies seamlessly…

Data center level: Several innovations outside of IT (HVAC systems, again enabled by IT/sensor technologies), as well as innovations at all of the levels discussed above, will help drive PUE (Power Usage Effectiveness) at the data center level closer to the holy grail of PUE = 1, i.e. all the energy supplied to the data center goes to the IT equipment doing useful compute work. Microsoft’s Generation 4 effort represents a leap in this domain, as more and more companies realize that this is a big paradigm shift, as the computing business truly goes into utility scale/mode.
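
For reference, PUE is just a ratio; the kilowatt figures in this little sketch are made-up examples, not numbers from any real facility:

```python
# PUE = total facility energy / IT equipment energy, so PUE = 1.0 means every
# watt entering the building reaches the IT gear. The figures below are
# made-up examples, not data from any real facility.

def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    return total_facility_kw / it_equipment_kw

print(pue(2000, 1000))   # 2.0  -- half the power feeds cooling, UPS losses, lighting
print(pue(1150, 1000))   # 1.15 -- approaching the holy grail
```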

So, there are plenty of problems to be solved in the IT space….pick yours at any of these levels 🙂