
Network Automation – “The Last Mile”

With so many organisations looking to react faster to business change while continually doing more with ever-shrinking resources, automation is the only way to solve the challenge. Nutanix has pioneered the simplification of traditional datacentre infrastructure across compute, storage and virtualisation, but many customers I speak to ask about the network.


Network automation appears to be the “last mile” in their journey to a fully automated datacentre, and with the SDN marketplace rather fragmented it’s tough for organisations to pick a solution which completes the loop.

Many organisations are also embracing a DevOps methodology to improve the processes around development and release management of new and existing applications, ultimately driving their innovation goals – and with that comes the requirement to provision infrastructure rapidly.

The public cloud has provided a great benchmark for what can be achieved through automation. Let’s face it: before AWS came along, how long did it take to deploy a virtual machine on a new network within a new datacentre? A long time. You’d spend a huge amount of time just ensuring you had compatible kit, let alone the process of deploying hypervisors, their supporting management infrastructure, and provisioning and connecting storage environments. With public cloud all of that is abstracted away, which enables businesses to move faster.

Nutanix aims to solve the rapid deployment challenge and on-going scaling requirements whilst ensuring that “day 2” operations are also streamlined, just like in the public cloud where the infrastructure building blocks are invisible. To aid in this journey, Nutanix has partnered with Mellanox to automate and simplify common “day 2” network tasks and complete the loop.

Mellanox are a leading supplier of end-to-end Ethernet and InfiniBand intelligent interconnect solutions and services for servers, storage, and hyper-converged infrastructure.

Mellanox’s NEO management platform exposes a REST-based API which enables tasks such as VLAN provisioning and trunking on the switch ports used by the Nutanix nodes. This lets consumers of the Nutanix Enterprise Cloud Platform forget about VLAN provisioning requests: VLANs are automatically set up and migrated as VMs move within the Nutanix infrastructure, ensuring that applications always have access to the appropriate networks. Developers and operations teams can concentrate on delivering real business value and get on with developing the next business-defining application!
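To make that flow concrete, here’s a minimal sketch (in Python) of what a VLAN-provisioning call against a NEO-style REST API could look like. The base URL, endpoint path and payload fields are my own illustrative assumptions rather than Mellanox’s documented API, so treat it as the shape of the integration, not a copy-paste recipe:

    import requests

    # Hypothetical NEO-style endpoint and payload: the real NEO API paths and
    # field names will differ. This only illustrates the general flow.
    NEO_BASE = "https://neo.example.local/api"
    session = requests.Session()
    session.auth = ("admin", "secret")  # placeholder credentials
    session.verify = False              # lab sketch only; use proper certificates in production

    def provision_vlan(switch_ip, port, vlan_id):
        """Tag vlan_id on the switch port facing a Nutanix node."""
        payload = {
            "switch": switch_ip,
            "port": port,      # e.g. the uplink port of the node the VM landed on
            "vlan": vlan_id,
            "mode": "trunk",   # trunk only the VLANs actually in use on that port
        }
        resp = session.post(f"{NEO_BASE}/vlans", json=payload, timeout=10)
        resp.raise_for_status()

    # Called when a VM is created on, or migrates to, a node:
    provision_vlan("10.0.0.10", "Eth1/1", 42)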

Here are a couple of videos walking through the integration. In the first example a VM is migrated from Node A to Node B; because we automate the configuration of the VLAN on the Mellanox switches, VLANs are only configured as required – in real time – rather than trunking all existing VLANs on all ports.

In the second example we create a new VM within the Nutanix Prism console. Just like the previous example, the combination of Prism and NEO takes care of the VLAN provisioning task, ensuring that the consumer of the Enterprise Cloud Platform can get on with doing just that – consuming it, just like in the public cloud.

If you would like to know more about Nutanix and how we deliver an Enterprise Cloud Platform, check out our website: https://www.nutanix.com/what-we-do/

If you would like to find out more about Mellanox and their intelligent interconnect solutions, take a look at their website: http://www.mellanox.com/page/company_overview

Thanks for reading

Stuart

XenDesktop on Nutanix CE

Want to see how to install, deploy and update 99 desktops on an Intel NUC running CE while playing a game and getting an education on how long a proper cup of tea takes to brew? Of course!

One of the best things I was involved in while at Citrix was seeing the evolution of the flagship product. From MetaFrame 1.8 when I started to XenDesktop 7 when I left, we always drove towards simplicity for the end user and, eventually, the admin. It’s this simple idea of taking away complexity and either replacing it with something easy and intuitive or making once-manual tasks automated and invisible.

With XenDesktop 7 Citrix made great strides with Machine Creation Services, and right from the start I’ve been a vocal supporter because it fit the beliefs of Citrix so well. Nutanix takes this simplicity to another level by ensuring that MCS is not only easy to deploy but also predictable and scalable – something it has struggled with, much in the same way linked clones did with Horizon View.

Over the last week or so I’ve been playing with my new Intel NUC and seeing what our free Community Edition can do. I’ve completed some simple provisioning tests because I was naturally curious as to how quickly a little home lab system can spin up desktops. The speed, as you’ll see below, is rather impressive, but in this post I’m going to show you how easy it is to integrate XenDesktop with any Nutanix deployment running our own hypervisor, AHV. The steps you see below are identical to how a full production Nutanix cluster would work, so let me take you from zero to hero in 16 minutes. You’ll see what components need installing on the broker and how to set up a connection to, in this example, a single Nutanix Community Edition node.

In case you’re interested my NUC is a Skull Canyon model with two SSDs and 32GB RAM and was a lovely present from Intel for Nutanix being so bloody awesome.

If you like what you see and would like to try Nutanix CE out yourself then go to this link and register with your work email address: https://www.nutanix.com/products/register

There’s no music or voiceover so pick your favourite SAN killing tune, open a bottle of beer and enjoy one of the most poorly put together videos on the internet not involving cats.

Thanks for watching.

 

David

Sock stuffing


For a while now the metric most infrastructures, including Nutanix, are benchmarked against is IOPS – effectively how quickly the storage layer can take a write or read request from an application or VM and reply. Dating back to the (re)birth of SANs, when they began running virtual machines and T1 applications, this has been the standard for filling out the shit-vs-excellent spreadsheet that dictates where to spend all your money.

Recently, thanks to some education and a bit of online pressure from peers in the industry, synthetic testing with tools like Iometer has generally been displaced in favour of real-world testing platforms and methodology. Even smarter tools such as Jetstress don’t give real-world results because they focus on storage and not the entire solution. Recording and replaying operations to generate genuine load and behaviour is far better: seeing the impact on the application and platform means our plucky hero admin can produce a recommendation based on fact rather than fantasy.

Synthetic testing is basically like stuffing a pair of socks down your pants; it gets a lot of attention from superficial types but it’s only a precursor to disappointment later down the line when things get serious.

In this entry I want to drop into your conscious mind the idea that very soon performance stats will be irrelevant to everyone in the infrastructure business.  Everyone.  You, me, them, him, her, all of us will look like foolish dinosaurs if we sell our solutions based on thousands of IOps, bandwidth capacity or low latency figures.

“My God tell me more,” I hear (one of) you (mumble with a shrug). Well, consider what’s happened in hardware in the last 5-ish years just in storage. We’ve gone from caring about how fast disks spin, to what the caching tier runs on, to tiering hot data in SSD, and now the wonders of all-flash. All in 5 or so years. Spot a trend? Bit of Moore’s Law happening? You bet, and it’s only going to get quicker, bigger and cheaper. Up next, new storage technologies like NVMe and Intel’s 3D XPoint will move the raw performance game on even further, well beyond what 99% of VMs will need. Nutanix’s resident performance secret agent Michael Webster (NPX007) wrote a wonderful blog about the upcoming performance impacts this new hardware will have on networking, so I’d encourage you to read it. The grammar is infinitely better for starters.

So when we get to a point, sooner than you think, when a single node can rip through >100,000 IOPS with existing generations of Intel CPUs and RAM, where does that leave us when evaluating platforms? Not with synthetic statistics, that’s for sure.
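To put a number like that in context, here’s a quick back-of-the-envelope sum in Python (the 8 KiB block size is just an assumed typical value):

    # Rough throughput implied by an IOPS figure: throughput = IOPS x block size
    iops = 100_000
    block_size_kib = 8  # assumed typical block size
    throughput_mib_s = iops * block_size_kib / 1024
    print(f"{iops:,} IOPS @ {block_size_kib} KiB = {throughput_mib_s:,.0f} MiB/s")
    # -> 100,000 IOPS @ 8 KiB = 781 MiB/s, more than most single VMs will ever ask for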


By taking away the uncertainty of application performance almost overnight, we can start to reframe the entire conversation around a handful of areas:

  • Simplicity
  • Scalability
  • Predictability
  • Insightfulness
  • Openness
  • Delight

Over the next few weeks (maybe longer as I’m on annual leave soon) I’m going to try to tackle each one of these in turn because for me the way systems are evaluated is changing and it will only benefit the consumer and the end customer when the industry players take note.

Without outlandish numbers those vendors who prefer their Speedos with extra padding will quickly be exposed.

See you for part 1 in a while.

A dip into Prism

A few weeks ago I was given the lovely task of attending a meeting at the last minute, with no preparation time and a 3-hour drive, just after I got back from annual leave. The meeting was only an hour, so I decided to record a short 10-minute video in the morning to take them through what they’d actually be doing on a Nutanix cluster from day to day. Knowing the type of customer, I was sure there would be no internet connection, let alone a 4G signal.

I could have just given a normal PowerPoint pitch and sent them back to sleep on a beach (which is where I still wanted to be), but I wanted to keep them awake and also elevate the conversation away from dull stuff like hardware and storage. Usability, simplicity and time to value were the intention here, so click below and leave a comment if it made sense to you. No voiceover as I’m too cheap to buy a program for my Mac that’ll do it 🙂

Do the Evolution

*** Updated for v4.6 ***

Over the last 18 months I’ve seen some amazing innovations come into the Nutanix platform, but I’ve only personally seen half of the story. Before I joined we made some staggering strides and I’d like to take you through those today.

Below are some abbreviated entries from all of the release notes dating back to NOS 2.6 in January 2013. I’ve highlighted the ones I consider to be important milestones in bold, but these are open for discussion and I’m probably wrong anyway 🙂

In this short plagiarised post I wanted to illustrate what can be achieved when approaching a problem with a software-first mentality and riding the wave of Moore’s Law. While we’ve brought on new hardware models, ditched Fusion-io cards for SSDs and partnered with Intel to make it all sing a sweet tune, the biggest strides have been made in our famous non-disruptive rolling software upgrades. Whether you bought a node this year or two years ago, all of these features should be available to you.

The next time your SAN vendor (or any vendor) claims they’re constantly adding value for their customers, get them to put together something like this post, because it’s only when you look back that you appreciate how much you’ve already accomplished.

 

 

NOS 2.6 (January 2013)

  • Genesis, a new management framework that replaces scripts run from the vMA, which is no longer required.
  • Support for 2nd-generation Fusion-io cards.

NOS 2.6.3

  • Nutanix Complete Cluster 2.6.3 supports vSphere 5.1.

NOS 2.6.4

  • Support for Intel PCIe-SSD cards is available as a factory-installed option.

NOS 3.0 (September 2013)

  • VM-centric backup and replication
  • Local and remote backup of VMs.
  • Bidirectional replication.
  • Planned and emergency failover from one site to another.
  • Consistency groups of multiple VMs that have snapshots made at the same time.
  • Scheduling, retention, and expiration policies for snapshots.
  • Compression
  • Inline compression of containers.
  • Post-process compression of containers with a configurable delay.
  • Support for NX-3000
  • Dual 10 GbE network interfaces.
  • Higher maximum memory configuration.
  • Intel Sandy Bridge CPUs.
  • Improved hardware replacement procedures.
  • CentOS for Controller VM.
  • Adherence to requirements specified in the U.S. Defense Information Systems Agency (DISA) Security Technical Implementation Guides (STIGs).

NOS 3.1 (January 2014)

  • New entry NX-1000 series platform
  • New deep storage NX-6000 series platform
  • New higher performance model in the NX-3050 series.
  • ESX 5.1 support
  • Mixed nodes in a cluster

NOS 3.5 (December 2014)

  • New HTML5 based administration interface
  • Active Directory/LDAP authentication
  • Introduction of RESTful API
  • User-configurable policies for frequency and alert-generating events
  • Expanded alert messages
  • Support for user-provided SSL certificates
  • User-manageable SSH keys and Controller VM lock down
  • SNMPv3 support and Nutanix MIB
  • Faster display of real-time data
  • More intuitive nCLI command syntax and enhanced output
  • Deduplication of guest VM data on the hot tiers (RAM/Flash)
  • Optimization of linked clones
  • Container and vDisk space reservations
  • Compression of remote replication network traffic
  • Automatically add new disks to single storage pool clusters
  • Storage Replication Adapter (SRA) for VMware Site Recovery Manager
  • General availability of KVM hypervisor
  • Technology preview of Hyper-V
  • Automated metadata drive replacement
  • Improved resiliency in cases of node or metadata drive failure
  • Field installation of replacement nodes

NOS 3.5.1

  • Support for the new NX-7000 (GPU platform), NX-6020, NX-6060, NX-6080, NX-3060, and NX-3061 models
  • Support for vSphere 5.5
  • Analysis dashboard expanded list of monitored metrics
  • DR dashboard expanded protection domain details
  • Storage dashboard deduplication summary
  • Application consistent snapshots

NOS 3.5.2

  • Support for Windows Server 2012 R2 Hyper-V
  • Support for application consistent snapshots in a protection domain
  • Virtual IP address, a single IP address for external access to a cluster
  • Certificate-based client authentication
  • Customised banner message in Prism
  • Enhancements to the Nutanix Prism web console
  • Expanded alert messages

NOS 3.5.3

  • Role-based access control using LDAP and Active Directory
  • Support for hypervisor lock down
  • Automatic reattachment to the Cassandra ring for replaced nodes
  • Improvements to the Stargate health monitor to minimize I/O timeouts during rolling upgrades, balance the load among nodes during failover, and facilitate high availability enhancements
  • Removal of the Avahi software dependency
  • The Nutanix SRA for VMware SRM supports vSphere 5.1 and 5.5 and SRM 5.1 and 5.5
  • NCC release 0.4.1

NOS 3.5.4

  • New entry NX-1020 platform
  • Volume Shadow Copy Service (VSS) support for Hyper-V hosts

NOS 4.0 (April 2014)

  • Feature based licensing introduced (Starter, Pro, Ultimate)
  • Disaster recovery support for Windows Server 2012 R2 Hyper-V
  • Prism Central introduced to manage and monitor multiple global clusters from one GUI
  • Automated rolling NOS upgrades
  • Automatic block awareness introduced for further data protection
  • Scheduled and remote archiving of snapshots via Protection Domains
  • Deduplication for the capacity tier
  • Amazon Web Services integration for storing remote snapshots
  • PowerShell cmdlets for cluster management and automation
  • Redundancy Factor 3 (RF3) introduced allowing two nodes to fail simultaneously without data risk

NOS 4.1 (September 2014)

  • Metro availability introduced enabling synchronous replication
  • Prism UI configuration of Cloud Connect for replicating VMs to Amazon Web Services
  • Improved health monitoring of data protection
  • Data at rest encryption with self-encrypting drives
  • Require setting password on first access to Controller VM and hypervisor host
  • STIG compliance for Controller VM
  • Audit trail for all user activity
  • Hypervisor upgrade from Prism UI
  • One-click upgrade for NCC utility
  • Tech preview of converged management for managing VMs on KVM
  • Support for Prism Central on Hyper-V
  • Nutanix SCOM management pack introduced
  • Nutanix plugin for XenDesktop

NOS 4.1.1

  • Network Time Protocol vulnerability fixed
  • Simplified certificate replacement in the Prism web console

NOS 4.1.2

  • Vormetric key management server support
  • Improved performance and reliability for virtual machine boot-up and storage I/O on KVM (now called Acropolis)
  • Significantly reduced the NOS image size (from 2 GB to under 1 GB)
  • Connection-level redirection for the Acropolis hypervisor implemented

NOS 4.1.3 (June 2015)

  • Nutanix Cluster Check (NCC) version 2.0
  • Additional key management vendors for use with self-encrypted drives (SEDs)
  • Support for VMware ESXi 6.0
  • Image Service for Acropolis for importing non-Acropolis VMs
  • Synchronous Replication (SyncRep) for Hyper-V Clusters
  • Time Synchronisation Script to control time drift
  • Data at Rest Encryption for NOS clusters with Acropolis and Hyper-V hosts
  • Security timeout setting for Prism
  • Network Switch Configuration for Traffic Statistics Collection
  • Hypervisor Support for NX-6035C storage-only/compute lite node
  • Acropolis, Hyper-V, and ESXi mixed cluster support
  • Erasure Coding tech preview
  • Acropolis High Availability tech preview
  • Acropolis Volume Management tech preview

NOS 4.1.4

  • Nutanix Cluster Check (NCC) version 2.0.1.
  • Mix Asynchronous DR with Metro Availability (Synchronous DR) on the same VM.
  • Data Protection Enhancement: 1-Hour RPO performance and latency
  • Acropolis Enhancement: Restore VM Locality
  • Hypervisor Support for NX-6035C
  • Disk firmware upgrade process improvements

NOS 4.1.5.1

  • Nutanix Cluster Check (NCC) version 2.0.2.
  • SyncRep for Hyper-V Clusters Update/VSS Support

NOS 4.5.0.2 (October 2015)

  • Bandwidth Limit on Schedule for remote cluster replication


  • Cloud Connect for Azure
  • Common Access Card Authentication
  • Default Container and Storage Pool upon cluster creation
  • Erasure Coding
  • Hyper-V configuration through Prism Web Console
  • Image Service Now Available in the Prism Web Console
  • MPIO Access to iSCSI Disks (Windows Guest VMs)
  • Network Mapping for VMs started on a remote site
  • Nutanix Cluster Check (NCC) 2.1, which includes many new checks and functionality.
  • NX-6035C Clusters Usable as a Target for Replication
  • Prism Central support for Acropolis Hypervisor (AHV)
  • Prism Central Scalability improvements to 100 clusters and 10,000 VMs or 20,000 vDisks
  • Foundation 3.0 imaging capabilities
  • The Nutanix SNMP MIB database improvements
  • SNMP service logs are now written to the following log file: /home/nutanix/data/logs/snmp_manager.out
  • Rolling upgrades for ESXi hosts with minor release versions
  • VM High Availability in Acropolis
  • Windows Guest VM Failover Clustering
  • Self-Service File Restore tech preview

Acropolis Base Software 4.6 (formerly NOS) (Feb 2016)

  • 1-click upgrades to BIOS, BMC Firmware & Foundation via Prism
  • Windows and Linux guest customisation for Sysprep and cloud-init
  • Acropolis Drivers for OpenStack
  • Volume/Disk groups added to Prism and RESTful API
  • Convert Cluster Redundancy Factor from RF-2 to RF-3
  • Cross Hypervisor Disaster Recovery (prod ESXi to DR AHV for example)
  • Erasure coding improvements for more usable capacity savings
  • Snapshot and DR for volume groups
  • CommVault integration for AHV
  • New release of Nutanix Cluster Check software
  • Nutanix Guest tools
    • Nutanix Guest Agent (NGA) service
    • File Level Restore (FLR) CLI
    • Nutanix VM Mobility Drivers (multi-hypervisor DR)
    • VSS requestor and hardware provider for Windows VMs
    • Application-consistent snapshot for Linux VMs
  • Performance increases on ALL nodes we’ve ever sold (IO, BW, etc.) – some 400% (no BS!)
  • Self-Service Restore for end users
  • Non-disruptive VM Migration for Metro Availability Planned Failover
  • *In-place hypervisor conversion (1-click from ESXi to AHV, for example)
  • *Acropolis File Services (NAS file system all from Prism)

*Tech Preview Feature.

 

The consumable infrastructure (that’s idiot proof…)

Just give the customer what they need!

Over the last couple of months I’ve had my first experiences with Acropolis in the field. Both were quite different, but they highlighted two important design goals in the product: simplicity of management and machine migration.

Before I begin I want to take you back a few months to talk about Acropolis itself.  If you know all about that you can do two things:

  1. Skip this section and move on
  2. Go to YouTube and watch some classic Sesame Street and carry on reading with a warm glow only childhood muppets can bring.

I knew you couldn’t resist a bit of Grover but now you’re back I’ll continue.

Over the summer Acropolis gained a lot of happy customers, both new and old. In fact some huge customers had already been using it since January thanks to a cunning soft release, and that continues into our Community Edition too.

The main purpose of Acropolis was to remove the complexity and unnecessary management modern hypervisors have developed and to let customers take a step back and simply ask “what am I trying to achieve?”

It’s an interesting question and one that is often posed when too deeply lost down the rabbit hole. For someone like me, who used to spend far too long looking at problems with a proverbial microscope, there’s a blossoming clarity in the way we approached these six words. The journey inside Nutanix to Acropolis was achieved by asking our own question:

“For hypervisors, if you had to start again, what would you do better and what would you address first?”

Our goal was to make deploying an entire virtual environment, regardless of your background and skill set, intuitive and consumable. Our underlying goal for everything we do is simplicity, and while we achieved this with storage many years ago (what we call our ‘distributed storage fabric’), the hypervisor was the next logical area to improve.

Developing our own management layer and beginning its work on top of our own hypervisor was a logical step, and that’s what brought us to where we are today with the Acropolis Hypervisor. You can see a great walkthrough of the experience of setting up VMs and virtual networks in this video.

 

Anyway on to my first customer story.

Back in summer I spent time working with a manufacturing company on their first virtualisation project. They were an entirely physical setup using some reasonably modern servers and storage, but for many reasons they’d put off moving to a virtual platform for years. One of the most glaring reasons was one I hear a lot here, as well as in my previous role at Citrix: “it worked just fine yesterday, so why change?” While this is true, I could still be walking two miles to the local river to beat my clothes against rocks to clean them. Instead I choose to throw them in a basket and (probably by magic) they get cleaned. If my girlfriend is reading this, it could be my last blog…

Part of the resistance is related to human apathy, but their main concern was having to relearn new skills – which takes focus and resources away from their business – and it simply being too time-consuming. I completely agreed. They wanted simplicity. They needed Acropolis.

Now, I could have done what many would – a presentation, a demo and a closing Q&A – but I chose to handle our meeting slightly differently. To allay their fears I let them work out for themselves how to create a network and a new VM. As we went I took them through the concepts of what a vCPU was and how it related to what they wanted to achieve for the business. If someone with no virtualisation experience can use Acropolis without any training, there can’t be any better sign-off on its simplicity. We were in somewhat of a competitive situation as well, where ‘the others’ were pushing vCenter for all the management. The comparison between the two was quite clear, and while I’ll freely admit that, feature for feature, vSphere has many more strings to its bow, that wasn’t what the customer needed and isn’t the approach we are taking with the development of Acropolis. We had no wish to just make a better horse and cart, and the customer was extremely grateful for that.

One happy customer done, one to go…

Our second customer, dear reader (because there is only one of you), was already a virtualisation veteran and had been using ESXi for a few years before deciding to renew their rather old hardware and hopefully do something different with their infrastructure. Their existing partner, who’d been implementing traditional three-tier platforms prior to this, chose to put Nutanix in front of them to see if we could ease their burden of management overhead, performance and operating expenditure.

While the simplicity of Acropolis was a great win for them and made up most of their decision, it was how we migrated their ESXi VMs onto Acropolis that really struck me most, and that’s what I’m going to summarise now.

This was my first V2V migration, so I needed something simple as much as the customer and partner did – and wow, did we deliver. Here is everything we needed to do to migrate:

  1. Setup the Nutanix cluster and first container
  2. Whitelist the vSphere hosts in Prism
  3. Mount the Nutanix container on the existing vSphere hosts
  4. Copy the VM to the Nutanix container
  5. Create a new VM in Prism and select Clone from NDFS, then pick the disk copied in step 4
  6. Start the VM and connect to the console
  7. Strip out the VMware tools
  8. Install the VirtIO drivers
  9. Repeat from step 4 until all other VMs are done

Now of course a V2V also has a few extra parts, such as ensuring any interdependent services are migrated as a group, but really that’s all you need to do.

The clever bit is the Image Service. This is a rather smart set of tools that converts disks – like the vmdk in this example – to the format used by Acropolis. There’s no requirement for any other steps or management to get a VM across, and the customer had their entire estate migrated in an afternoon. To me, that’s pretty damn impressive.
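For the curious, steps 4 to 9 could be scripted along these lines. This is a hypothetical sketch only – the endpoints and field names are invented to show the shape of the loop, not the actual Prism REST API – and the in-guest steps 7 and 8 (removing VMware Tools, installing the VirtIO drivers) still happen in the console:

    import requests

    # Invented endpoints and fields, purely for illustration; NOT the
    # documented Prism API.
    PRISM = "https://prism.example.local:9440/api/hypothetical"
    s = requests.Session()
    s.auth = ("admin", "secret")  # placeholder credentials
    s.verify = False              # lab sketch only

    def migrate(vm_name, vmdk_path):
        # Step 5: create a VM whose disk is cloned from the copied vmdk; the
        # Image Service converts it to an AHV-native disk behind the scenes.
        r = s.post(f"{PRISM}/vms", json={
            "name": vm_name,
            "clone_from_ndfs_file": vmdk_path,  # invented field name
            "vcpus": 2,
            "memory_mb": 4096,
        }, timeout=30)
        r.raise_for_status()
        # Step 6: power it on, then move to the console for steps 7 and 8.
        s.post(f"{PRISM}/vms/{vm_name}/power_on", timeout=30).raise_for_status()

    # Step 9: loop until all VMs are done.
    for vm in ["app01", "app02", "db01"]:
        migrate(vm, f"/migration-ctr/{vm}/{vm}.vmdk")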

I’m really pleased with what engineering have done in such a short period of time and to think where this can go is quite amazing.

 

And now we come to the point explaining why I said this stuff was “idiot proof.”  I can only describe what happened as an organic fault in the system also known as a cock-up on my part.  I hold my hands up and say I was a dumb-dumb.  As HR don’t read this, and to be honest it’s just you and I anyway, I should be ok.

While we were preparing the cluster for the VM migrations I decided to upgrade the Nutanix software to the latest version and while this was progressing smoothly node by node I somehow managed to…erm…hmm…well……I sort of sent a ctrl+alt+del to the IPMI console.  Call it brain fade.  This obviously rebooted the very host it was upgrading right in the middle of the operation.  After a lot of muttering and baritone swearing I waited for the node to come back up to see what mess I had created…

Here’s where engineering and all our architects need a huge pat on the back. All I had to do was restart genesis on the node and the upgrade continued. What makes this even more amazing is that while I was mashing the keyboard to self-destruction, the partner was already migrating VMs – during my screw-up the migration was already in progress! If I’d done this to any other non-Nutanix system on the planet it would have been nothing short of catastrophic. However, in this case there was no disruption or downtime, and if I hadn’t let off a few choice words at myself nobody would have known. That is frankly amazing to me and shows just how well we’ve designed our architecture.

So how can I summarise Acropolis?  It (and Nutanix) isn’t just a consumer-grade infrastructure, it’s also idiot proof and I for one am very grateful for it 🙂
