The Nine Tiers of Desktop Hell

I started playing Minecraft again recently and I frankly love the complexity of building a new home with multiple levels, different types of foundations, wall types, flooring from different woods, torches and all manner of other decorations. There’s a tonne of work involved building something big and elegant but it only takes one of those bloody creeper monster things to walk over, blow it up and I’m off building it again. Very annoying. You know what I’ve found out? I had too many floors. I faff around putting together a house with so many levels only to realise I just need a small and simple base with some chests to keep my stuff in and a bed. And a sword or two in case those monsters come knocking again.

Tiers suck. Complexity sucks.

I’ve banged on about the problems I’ve encountered with 3-tier architectures on here and at nutanixnoob.com but allow me a moment to wind you back a few years to where my career started because today dear reader – one day there may be two of you – we’re going to talk about End User Computing, Virtual Workspaces, Desktop Virtualisation, Application Mobility or as you may also know it “VEE to the DEE to the EYE.”

Whatever you call the practice of centralising apps, desktops and data the job is the same; broadcast them to any device over any network to any location and get my apps and data to end users quickly and without friction. This will increase employee mobility, security and in turn make businesses and people more productive. That’s generally page one of the requirements and outcomes report but a beautiful vision can quickly become a mirage as the technology selection comes into focus.

Back at Citrix I have to admit that I didn’t take too much notice of what our software sat on. If it was HP servers on top of a NetApp Filer with some Atlantis in the middle, well, that was just “other stuff” – out of sight and, for better or for worse, out of my mind. I was viewing the projects from a very narrow angle and it was only when customers uttered the words “my Citrix is slow” that I started proving my software wasn’t at fault – and it never was, by the way.

Now, while working from underneath Citrix at the Nutanix layer, I have a more rounded appreciation for the complexity that sits behind the end user’s screen and let me tell you it’s a world of problems navigated in part by best judgement and in many ways masked by luck.

The Traditional Stack

This picture represents the layers of a virtual desktop and application stack. If you’re not familiar with these let’s break them down into what they do and give a few examples of the vendors you’ll see in each layer.

Delivery Protocol

A delivery protocol is what connects the virtual desktop to an end user device like a thin client or laptop over a network. Smart protocols will also allow things like local printers, USB passthrough and make the desktop look, sound and perform nice and snappy – just like a local desktop. Citrix ICA, Microsoft RDP and Teradici PCoIP are popular examples of protocols in use today.

Secure Remote Access

Secure Remote Access is how users access their desktops and apps from outside of the company network. Typically, this is a virtual or physical network appliance from someone like Juniper, F5, Cisco or Citrix although disruptors like Zscaler have come up in recent years. It’s a network security device that will grant or deny access but also enforce various policies along the way so end users access what they need with an extra blanket of security around them. It’s critical this works in a ‘sense and respond’ manner with the next layer otherwise end users might have to choose which VPN to launch or application to use based on where they are. Those are choices they simply shouldn’t have to care about.

Desktop & App Delivery

Desktop & App delivery represents connection brokering and desktop image management. These decide what desktops and apps a user can launch or in some cases what they can see upon logon. It also deals with the management of desktop creation, updates and enforcing or working with policies dictated by the Secure Remote Access solution and the Delivery Protocol. Citrix, VMware, Microsoft, Workspot and a host of smaller players flood this market. Nutanix recently bought Frame which is a Desktops as a Service product. Again this part has to work with all the above but also relies heavily on integration with the next layer so let’s move on.

Profile Management

Next up we get into the world of personalising the experience for the end user with Profile Management. This is a centrally managed service that gives all those desktops and applications user-specific settings depending on who logs in. It could be simple things like a desktop background or more complex application settings such as dictionaries in Office. Essentially this is all the personal settings a user will change (and expect to be there!) when they start using their desktop and applications. Pre-VDI this was all saved locally on a laptop or desktop but now we need to ensure a user’s ‘personality’ gets applied to any desktop or app they log on to. Out in the wild you might have come across tools from Ivanti (Appsense and RES in their pre-acquisition lives) to do such a task. We also need to think about what storage platform the user profiles sit on. NetApp tends to be the one I see the most but all storage vendors have a separate NAS platform to offer. It’s no good putting it on a cheap array of disks because user profiles are key to logon times, application performance and application stability.

Virtualisation

Virtualisation means the hypervisor. The broker mentioned above has to talk to a hypervisor to instruct it to create and update the virtual machines. This is the engine that runs the desktops and apps so it’s important that it’s able to drive the required performance from the hardware and complete tasks such as evenly balancing load across all servers, moving VMs between hosts using live migration, restoring VMs to service, making efficient use of compute resources and generally keeping the valuable services running and available to the end users. This market is split between VMware, Microsoft, Citrix, Nutanix and several open source hypervisors like KVM and Xen.

Servers

Servers provide the hardware compute components. The CPUs and memory are allocated into neat virtual chunks by the hypervisor. This is considered a commodity by most, you can strongly argue the hypervisor is too, but it’s critical that it’s designed and validated with the hypervisor otherwise all sorts of problems will surface causing downtime, excessive patching and finger pointing between vendors. There are dozens of server manufacturers out there from HP, DellEMC, Cisco, Lenovo, Fujitsu etc. The internals are similar Intel CPUs but validating solutions for each one and testing components correctly is still a mammoth task and they all do it very well.

Storage

The Storage layer is the enterprise shared storage that gives the hypervisor and thus the virtual machines their ability to move between physical servers – called live migration – but also features such as snapshotting to save the state of VMs or as part of DR. There are probably hundreds of companies out there that do this but the main ones would be HP, DellEMC, NetApp and Pure. Sizing this accordingly for a VDI environment is critical as it’s generally the first part of the stack to show up bottlenecks. For 3-tier architectures it also has to be sized for the maximum number of desktops up-front to avoid scalability issues further down the line.

Data Protection

Data Protection refers to backup, rollback or disaster recovery of a VM. If a desktop becomes corrupted or gets deleted by mistake businesses need a quick way to restore it back to service. In architectures that require two or more data centres for availability, disaster recovery scenarios also fall under the Data Protection layer. Think of this as disaster recovery from VM to site level. Veeam and Commvault are common examples of tools that can provide this advanced functionality over and above any native snapshot tools in the storage layer.

Networking

Networking is a huge area. At this layer in the EUC stack we’re referring to visualising, segregating and securing network traffic between subnets or networks, including the internet, and the virtual desktop and application servers. Most organisations will do this using firewalls running from within the VMs like Windows Firewall or on the outside of the datacentre using purpose-built firewall appliances from the likes of Palo Alto Networks. The issue here is that a VM can be compromised from the OS, especially Windows, and a perimeter firewall will only operate as the doorman checking names at the door. It won’t keep checking what’s coming into or out of each VM. This is where tools such as VMware NSX have used microsegmentation to provide VM level firewalling.

There are obviously a load of other bits within the desktop image itself to consider but I’m going to put that to one side as that’s a science by itself and your attention span is already being challenged by my awful writing style. To be honest I’ve named so many different vendors up there I hope you can see where the cost and complexity can quickly come into play. Imagining all the combinations of layers and vendors and deciding which ones will play nicely with each other – while giving that expectant user the best desktop possible – is a very daunting task indeed.

Sadly it’s a common headache.

I’ve worked with many customers where all 9 layers are from different vendors each with their own management, support contract and in some unfortunate cases their own agendas. Getting the technologies to line up for the benefit of the end user was an everlasting experience many would prefer to avoid. Even when some tiers were consolidated on things like vBlocks the stars would still lose their alignment due to patches at one tier having to wait for validation at another. The goalposts were always moving and just when one part was completed another would need attention. It actually looks and behaves like a game of Jenga and is no less frustrating to play at times but then again I’m crap at Jenga.

When I saw Nutanix for the first time back in early 2014 the first thing that struck me was what a good fit it was for Citrix customers and over the last few years both companies have worked together to make end user computing an easier and more predictable workload to deploy.

The Nutanix+Citrix Stack

Let’s take a look at what those 9 tiers look like when only Nutanix and Citrix are involved.

Before we get into the guts it’s important to say that neither of us is removing the need for any of those layers, but now they are delivered together, by just two vendors, without complexity and friction.

In short the top bit is designed and engineered to work with the bottom bit which is a huge step forward vs the home-brew method you saw earlier.

No more guesswork and certainly no more finger pointing between vendors or, worse still, internal teams.

Citrix takes care of the delivery protocol through to the user profile management and Nutanix the virtualisation down to the VM network security. Each part shaking hands with the next.

Let’s look at the layers again and see what Citrix and Nutanix bring to this now unified stack.

Delivery Protocol

ICA is widely accepted to be the pinnacle of Delivery Protocols. End user experience is what separates Citrix from all others in their space. Over the years they’ve led innovations for low latency networks, virtualising highly graphical applications and offering this on a multitude of client devices. All the features and functionality of ICA were branded HDX but to the hardcore, like me, it’ll always be ICA. First reader to comment on what ICA stands for wins some stickers.

Secure Remote Access

Citrix ADC or, as the T-shirt I have from 2005 says, NetScaler was a massive and positive acquisition by Citrix to move into the networking market. This is a physical or virtual appliance that provides secure policy based access to networks, desktops and applications. It works so seamlessly with the rest of the Citrix suite it’s effectively a set and forget. Did you know Sunil Potti, Nutanix Chief Product & Development officer, used to run the NetScaler engineering teams for Citrix?

Desktop and App Delivery

Citrix Desktops and Citrix Apps used to be called XenDesktop and XenApp and before that Presentation Server, Metaframe (when I started), WinFrame, WinView, Multi-win, Citrix Multiuser… They’ve been doing end user computing since 1993 and have considerable pedigree to say the least. As with ICA, this is the best platform for managing and deploying a wide variety of desktops and applications to end users. The same design-first thinking the end users benefit from is felt by the administrators on the backend as the platform is easy to use even with minimal training.

Profile Management

Citrix integrates their own Profile Management tool as part of Citrix Apps and Desktops. This is cunningly called Citrix Profile Management. It has a utilitarian name because it’s frankly a great utilitarian product with little fuss to be seen. Profiles are managed and maintained centrally with a simple policy run from within the Citrix broker and applied to Desktops and Apps. These are also called upon via Citrix ADC to enforce or adapt settings depending on user location.

Another component of the Nutanix piece is the introduction of Nutanix Files. Don’t confuse this with the recently re-named Citrix Files (formerly ShareFile) although they can work together. This is a distributed virtual filer for user profiles and home directories to sit on the Nutanix cluster. No more separate NAS to support and manage. One benefit customers might see is faster logon times simply because the user profiles aren’t coming from a constrained NAS and over time the data will be localised, meaning files are read at bus speed rather than over the network.

Again, Citrix deals with the actual user profile management between desktops – critical for the success of any EUC project – and Nutanix provides the scalable and highly available distributed file system to host them.

Virtualisation

The Nutanix underpinnings for our EUC stack begin with the Virtualisation layer. AHV (Acropolis Hypervisor) is unique to Nutanix with its roots in open source KVM. Back in 2015 Nutanix took the engine and important guts of KVM and layered on a fully distributed control plane to simplify the management of a previously tricky, but highly stable and robust hypervisor. Joint support for all Citrix EUC applications means every Citrix component we’ve discussed is fully supported and backed up with best practice guides. Admins benefit from the simplicity of AHV while also using it as the single management portal for all Nutanix software. I like to think of AHV as being the way virtualisation would be done today if we had to start again. Modern, lightweight and purpose-built for all workloads.

Servers

Hardware platforms haven’t changed that much from the 3-tier example we looked at earlier with the exception that Nutanix personally certifies each platform and the components used. The big architectural difference is we use the locally attached disks to form the shared storage rather than just using them for compute. Nutanix ships their own hardware with various Intel and commodity components inside them. Customers can also choose servers from OEM partners DellEMC and Lenovo or certified platforms from HPE and Cisco. The insides of all hardware platforms are subject to our own stringent testing so the experience for the end customer is the same, as is the software, and choice here is only a positive.

Storage

Storage is the easiest conversation we’ll have in this blog because Nutanix storage management is as close to hands-off as possible. Nutanix brings all the benefits of enterprise shared storage into what’s called a hyperconverged appliance – virtualisation, compute and storage together. All the disks in each server form one large storage cluster which is then presented to the hypervisor as shared storage. All administrators need do is turn on efficiency policies for compression and dedupe. That’s it. For customers this means storage administration is reduced to mere minutes, if at all, and they retain everything they had before such as live migration, dedupe, compression, thin provisioning and snapshots. The Nutanix software takes care of all the typically manual or intensive tasks in the background so virtual desktop and apps are fast, efficient and always available. Needless to say this supports provisioning technologies such as MCS (and PVS if you’re so inclined).

All workloads benefit from predictable performance and linear scale because of the Nutanix architecture but this is especially useful in EUC because end users will detect the slightest change in performance long before you do. All worries should be put to bed and it’s thanks to the unique and patented beauty of Data Locality. This simple concept of keeping the hot data on the same node as the VM requesting it means that we can confidently state how many desktops per node, what their performance will be and then no matter how many desktops we add to the cluster over time the user experience will remain consistent. I can’t stress enough how unique and important this is. It makes it very easy to predict cost per user as well of course!

It’s pretty cool that Nutanix designed a platform with this feature at its core. Nearly 9 years later it’s still the biggest advantage we bring to our customers.
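To make the “desktops per node” and “cost per user” point a little more concrete, here is a rough back-of-the-envelope sketch in Python. Every number in it is made up for illustration (100 desktops per node, a node price of 40,000, one spare node for failover) and the function names are mine, not anything from a Nutanix sizing tool. The only idea it borrows from the text is that density per node stays constant as the cluster grows.

```python
import math


def nodes_needed(total_desktops: int, desktops_per_node: int, spare_nodes: int = 1) -> int:
    """Nodes required for a desktop count, plus spare capacity for a node failure.

    Assumes density per node is constant as the cluster grows (linear scale).
    """
    return math.ceil(total_desktops / desktops_per_node) + spare_nodes


def cost_per_user(total_desktops: int, desktops_per_node: int,
                  node_cost: float, spare_nodes: int = 1) -> float:
    """Hardware cost per desktop user for a cluster sized by nodes_needed()."""
    nodes = nodes_needed(total_desktops, desktops_per_node, spare_nodes)
    return nodes * node_cost / total_desktops


if __name__ == "__main__":
    # Hypothetical figures: 100 desktops per node, 40,000 per node.
    for users in (500, 1000, 2000):
        print(users, nodes_needed(users, 100), round(cost_per_user(users, 100, 40_000.0), 2))
```

Because the density figure doesn’t change with cluster size, the only thing that moves the cost per user in this sketch is how the spare node is amortised across more desktops.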

Data Protection

Data Protection on Nutanix covers both data availability within the local cluster to withstand hardware and software failures and also how the data is replicated between clusters for disaster recovery purposes. For persistent desktops these can be replicated between geographical clusters by adding them to protection domains. These are groups of VMs treated to a scheduled snapshot and replication policy. For non-persistent desktops the master image can be snapshot’d (is that a word yet?) and replicated to another site. This can be used to maintain a single gold image in an active active scenario. Nutanix can also replicate many to many so for customers with several datacenters a DR plan can be in place to match business requirements. To simplify DR further, Nutanix have demonstrated an up-coming runbook feature that will automate failover and DR between sites. Using Citrix ADC to front the remote connections users can get load balanced to the right location for reasons of availability or proximity. Smart, huh?
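If it helps to picture a protection domain, the sketch below models one in plain Python as a named group of VMs with a snapshot interval and a list of remote sites. This is only a mental model under my own assumptions: the class, field names and site names are hypothetical and it does not call any Nutanix API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class ProtectionDomain:
    """Illustrative model only: a group of VMs sharing one snapshot/replication policy."""
    name: str
    vms: set[str] = field(default_factory=set)
    interval: timedelta = timedelta(hours=1)               # time between scheduled snapshots
    remote_sites: list[str] = field(default_factory=list)  # clusters to replicate to

    def snapshot_times(self, start: datetime, count: int) -> list[datetime]:
        """The next `count` scheduled snapshot times, beginning at `start`."""
        return [start + i * self.interval for i in range(count)]

    def replication_jobs(self, when: datetime) -> list[tuple[str, str, datetime]]:
        """One (vm, site, time) job per VM per remote site, i.e. one-to-many replication."""
        return [(vm, site, when) for vm in sorted(self.vms) for site in self.remote_sites]


if __name__ == "__main__":
    # Hypothetical persistent desktops replicated to two hypothetical sites every 4 hours.
    pd = ProtectionDomain("persistent-desktops", {"vdi-01", "vdi-02"},
                          timedelta(hours=4), ["site-b", "site-c"])
    for job in pd.replication_jobs(datetime(2018, 1, 1)):
        print(job)
```

For non-persistent desktops the same shape applies, except the group would contain just the master image rather than every desktop.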

Networking

We can’t discuss any form of EUC without touching on security. One of the biggest reasons to centralise desktops and data is to give organisations more control over who can access certain resources. However simply moving desktops into your datacentre doesn’t mean there isn’t more to do.

Nutanix released a product called Flow earlier in 2018. This is a network micro-segmentation tool native to AHV and adds a layer of security that’s quickly becoming the norm. Remember that the vast majority of malware and attacks come from within the network and if a customer is planning to centralise thousands of desktops back into the datacentre it’s even more important to take a long hard look at what VMs can talk to what services. The last thing you want to centralise is a trojan horse. End users aren’t stupid but everyone can get tricked with a dodgy link or open a file they shouldn’t.

Flow is a simple, transparent VM based firewall that will gracefully lock down and secure communications to and from any VM running on the cluster. Set once via a policy and you’re done. Admins can also use this to view network communication down to the port level so if a VM does get infected or some other rogue element on the network tries to do unsavoury things it’s easy to spot. Note this all works on a whitelist so you only open the doors you want your users to walk through.
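The whitelist idea is easy to show in a few lines of Python. To be clear, this is not how Flow is implemented or configured; it’s a toy default-deny model with hypothetical group names and ports, just to show that anything not explicitly allowed gets dropped and is therefore easy to spot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    source: str       # label of the group the sending VM belongs to (hypothetical)
    destination: str  # label of the group the receiving VM belongs to (hypothetical)
    port: int


class WhitelistPolicy:
    """Default deny: traffic is allowed only if a matching rule exists."""

    def __init__(self, rules):
        self._rules = set(rules)

    def allows(self, source: str, destination: str, port: int) -> bool:
        return Rule(source, destination, port) in self._rules


def blocked(policy: WhitelistPolicy, flows):
    """Return the observed (source, destination, port) flows the policy would drop."""
    return [flow for flow in flows if not policy.allows(*flow)]


policy = WhitelistPolicy([
    Rule("vdi-desktops", "file-servers", 445),
    Rule("vdi-desktops", "web-proxy", 8080),
])

if __name__ == "__main__":
    observed = [
        ("vdi-desktops", "file-servers", 445),
        ("vdi-desktops", "vdi-desktops", 445),  # desktop-to-desktop traffic, not whitelisted
    ]
    print(blocked(policy, observed))
```

The desktop-to-desktop flow in the example is the sort of lateral movement a perimeter firewall would never see.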

But there’s more…!

Let’s take a closer look at some of the other integration on the Citrix side starting with Citrix Director. This is where the majority of Citrix troubleshooting and performance information is kept. How is ICA performing, what about the logon times and their breakdown? What processes are hurting that formerly perfect desktop deployment? Nutanix adds VM IOps, storage bandwidth and storage latency into Director so desktop admins have a detailed view of the stack from top to bottom. We don’t expect to be the source of the problem but it’s good to see where the problem is not, right?

In Citrix Studio, where desktop groups are created and assigned, you’ll find Nutanix as a new host connection where we become a new platform to connect to – this is to support AHV. Simply install the Nutanix plug-in onto all the Controllers and you’ll then be able to select Nutanix AHV and connect to the cluster VIP before starting to provision desktops and apps. As you can see in the pic below we’re using the Provisioning SDK that in turn talks to our REST API. Very simple and invisible to the customer.

One of the latest innovations from Citrix is being able to remove the need for any on-prem Citrix management components. As an option, customers can choose to use Citrix Cloud to host Studio, Director and StoreFront and all the other sub components like SQL. The pains of managing and maintaining that infrastructure are offloaded and delivered back as a service and we’re seeing a lot of take up for this on our side.

If customers choose Citrix Cloud rather than building the management servers themselves, deploying desktops to the on-prem Nutanix cluster is exclusively done using Machine Creation Services. If you believe the rumours MCS has all sorts of scalability issues but this simply isn’t accurate. It didn’t scale when using a SAN because, being storage based, it could only perform well while the SAN wasn’t under stress or heaven forbid serving more desktops than it was designed for – ya know, unpredictable scalability…

The really swish part comes when customers want to spin up more VMs for things like seasonal events such as Black Friday that require more desktops but maybe for a short period of time. I’ll be the first person to tell you that buying more Nutanix nodes for a temporary requirement is a waste of money so why not use a public cloud for those elastic workloads? Citrix Studio connects to Azure and deploys desktops and applications just as easily as it does to an on-prem Nutanix cluster. That hybrid cloud story you’ve been hearing about is already here.

Here are a couple of pictures to illustrate what I’m talking about:

So what are we really doing here?

I’ve spent a lot of time talking about two technology companies but EUC, or whatever we settle on calling it, is about people. Get a desktop experience wrong and your project will fail. End users will push back and ask for the big clunky physical desktop again. Users don’t want to be exposed to technology unnecessarily they simply want to embrace the outcome of a well managed and well presented end user experience.

To me, this is the most impressive achievement Citrix and Nutanix have built together. To ensure user experience is fast and consistent for users and to allow organisations to build smarter, work smarter and keep the wizard’s curtain closed.

Cheers all – and by all I mean just you.

David

 

“This blog was proof read and approved by K.Baggerman :D”

Network Automation – “The Last Mile”

With so many organisations looking to increase their ability to react to business change and continually do more with ever reducing resources, automation is the only way to solve the challenge. Nutanix has pioneered the simplification of traditional datacentre infrastructure from compute, to storage and virtualisation but many customers I speak to ask about the network.

The Dynamic Duo

Network automation appears to be the “last mile” in their journey to a fully automated datacentre and with the SDN market place rather fragmented it’s tough for organisations to pick a solution which completes the loop.

Many organisations are also embracing a DevOps methodology to improve the processes around development and release management of new and existing applications, ultimately driving their innovation goals – with that comes the requirement to provision infrastructure in rapid time.

The public cloud has provided a great benchmark for witnessing what can be achieved through automation. Let’s face it before AWS came along how long did it take to deploy a virtual machine, on a new network within a new datacentre…..a long time. You’d spend a huge amount of time just ensuring that you had compatible kit, let alone the process of deploying hypervisors, their supporting management infrastructure, provisioning and connecting storage environments etc.….with public cloud that is all abstracted away which enables businesses to move faster.

Nutanix aims to solve the rapid deployment challenge and on-going scaling requirements whilst ensuring that “day 2” operations are also streamlined, just like in the public cloud where the infrastructure building blocks are invisible. To aid in this journey Nutanix have partnered with Mellanox to provide the automation and simplification of “day 2” operations for common network tasks to complete the loop.

Mellanox are a leading supplier of end-to-end Ethernet and InfiniBand intelligent interconnect solutions and services for servers, storage, and hyper-converged infrastructure.

Mellanox switches have a REST-based API called NEO which enables tasks such as VLAN provisioning and trunking on the appropriate ports utilised by the Nutanix nodes. This enables consumers of the Nutanix Enterprise Cloud Platform to forget about VLAN provisioning requests, as these are automatically set up and migrated as VMs move within the Nutanix infrastructure, ultimately ensuring that applications get access to the appropriate networks to communicate. This enables developers and operations teams to concentrate on delivering real business value and get on with developing the next business defining application!

Here are a couple of videos walking through the integration. In the first example a VM will be migrated from Node A to Node B. As we automate the configuration of the VLAN on the Mellanox switches, VLANs are only configured as required – in real-time, rather than trunking all existing VLANs on all ports.

In the second example we create a new VM within the Nutanix Prism console. Just like the previous example, the combination of Prism and NEO takes care of the VLAN provisioning task, ensuring that the consumer of the Enterprise Cloud Platform can get on with doing just that – consuming it, just like in the public cloud.
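For anyone who prefers code to video, the behaviour in both examples can be sketched as a small in-memory simulation. None of this talks to Prism or NEO: the class, method names, port names and VLAN IDs are hypothetical, and the clean-up step (removing a VLAN from a port once no VM on that node needs it) is my assumption about what “only configured as required” implies rather than documented behaviour.

```python
from collections import defaultdict


class SwitchFabric:
    """In-memory stand-in for the switch API: tracks which VLANs are trunked on which port."""

    def __init__(self, node_ports):
        self.node_ports = dict(node_ports)   # node name -> switch port it is cabled to
        self.port_vlans = defaultdict(set)   # switch port -> VLAN IDs currently trunked
        self.vm_location = {}                # VM name -> (node, vlan)

    def _vlan_in_use(self, node, vlan):
        """True if any VM on `node` still needs `vlan`."""
        return any(n == node and v == vlan for n, v in self.vm_location.values())

    def place_vm(self, vm, node, vlan):
        """New VM (second example): trunk the VLAN on that node's port only."""
        self.vm_location[vm] = (node, vlan)
        self.port_vlans[self.node_ports[node]].add(vlan)

    def migrate_vm(self, vm, target_node):
        """Migration (first example): the VLAN follows the VM to the target node's port."""
        source_node, vlan = self.vm_location[vm]
        self.place_vm(vm, target_node, vlan)
        # Assumed clean-up: drop the VLAN from the old port if nothing there uses it.
        if source_node != target_node and not self._vlan_in_use(source_node, vlan):
            self.port_vlans[self.node_ports[source_node]].discard(vlan)


if __name__ == "__main__":
    fabric = SwitchFabric({"node-a": "eth1/1", "node-b": "eth1/2"})
    fabric.place_vm("vm1", "node-a", 100)
    fabric.migrate_vm("vm1", "node-b")
    print(dict(fabric.port_vlans))
```

The point of doing it this way, rather than trunking everything everywhere, is that each port only ever carries the VLANs its current VMs actually need.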

If you would like to know more about Nutanix and how we deliver an Enterprise Cloud Platform, check out our website: https://www.nutanix.com/what-we-do/

If you would like to find out more about Mellanox and their intelligent interconnect solutions, take a look at their website; http://www.mellanox.com/page/company_overview

Thanks for reading

Stuart

XenDesktop on Nutanix CE

Want to see how to install, deploy and update 99 desktops on an Intel NUC running CE while playing a game and getting education on how long a proper cup of tea takes to brew?  Of course!

One of the best things I was involved in while at Citrix was seeing the evolution of the flagship product.  From Metaframe 1.8 when I started to XenDesktop 7 when I left we always drove towards simplicity for the end user and eventually the admin.  It’s this simple idea of taking away complexity and either replacing it with something easy and intuitive or just making once manual tasks automated and invisible.

With XenDesktop 7 Citrix made great steps with Machine Creation Services and right from the start I’ve been a vocal supporter because it fit the beliefs of Citrix so well.  Nutanix brings this simplicity to another level by ensuring that not only is MCS easy to deploy but it’s also predictable and scalable – something that it has struggled with – much in the same way as linked clones did with Horizon View.

Over the last week or so I’ve been playing with my new Intel NUC and seeing what our free Community Edition can do.  I’ve completed some simple provisioning tests because I was naturally curious as to how quickly a little home lab system can spin up desktops.  The speed, as you’ll see below, is rather impressive but in this post I’m going to show you how easy it is to integrate XenDesktop and any Nutanix deployment running our own hypervisor AHV.  The steps you see below are identical to how a full production Nutanix cluster would work so let me take you from zero to hero in 16 minutes.  You’ll see what components need installing on the broker and how to set up a connection to, in this example, a single Nutanix Community Edition node.

In case you’re interested my NUC is a Skull Canyon model with two SSDs and 32GB RAM and was a lovely present from Intel for Nutanix being so bloody awesome.

If you like what you see and would like to try Nutanix CE out yourself then go to this link and register with your work email address: https://www.nutanix.com/products/register

There’s no music or voiceover so pick your favourite SAN killing tune, open a bottle of beer and enjoy one of the most poorly put together videos on the internet not involving cats.

Thanks for watching.

 

David

Sock stuffing

For a while now the metric most infrastructures, including Nutanix, are benchmarked against is IOps – effectively the speed at which the storage layer can take a write or read request from an application or VM and reply back.  Dating back to the (re)birth of SANs when they began running virtual machines and T1 applications this has been the standard for filling out the shit vs excellent spreadsheet that dictates where to spend all your money.

Recently, thanks to some education and a bit of online pressure from peers in the industry, synthetic testing with tools like IOmeter has generally been displaced in favour of real-world testing platforms and methodology.  Even smarter tools such as Jetstress don’t give real world results because they focus on storage and not the entire solution.  Recording and replaying operations to generate genuine load and behaviour is far better. Seeing the impact from the application and platform means our plucky hero admin can produce a recommendation based on fact rather than fantasy.

Synthetic testing is basically like stuffing a pair of socks down your pants; it gets a lot of attention from superficial types but it’s only a precursor to disappointment later down the line when things get serious.

In this entry I want to drop into your conscious mind the idea that very soon performance stats will be irrelevant to everyone in the infrastructure business.  Everyone.  You, me, them, him, her, all of us will look like foolish dinosaurs if we sell our solutions based on thousands of IOps, bandwidth capacity or low latency figures.

“My God tell me more,” I hear (one of) you (mumble with a shrug).  Well consider what’s happened in hardware in the last 5ish years just in storage.  We’ve gone from caring about how fast disks spin, to what the caching tier runs on, to tiering hot data in SSD and now the wonders of all-flash.  All in 5 or so years.  Spot a trend?  Bit of Moore’s Law happening?  You bet, and it’s only going to get quicker, bigger and cheaper.  Up next new storage mediums like NVMe and Intel’s 3D XPoint will move the raw performance game on even further, well beyond what 99% of VMs will need.  Nutanix’s resident performance secret agent Michael Webster (NPX007) wrote a wonderful blog about the upcoming performance impacts this new hardware will have on networking so I’d encourage you to read it.  The grammar is infinitely better for starters.

So when we get to a point, sooner than you think, where a single node can rip through >100,000 IOps with existing generations of Intel CPUs and RAM, where does that leave us when evaluating platforms?  Not with synthetic statistics, that’s for sure.

Oooow IO!

By taking away the uncertainty of application performance almost overnight we can start to reframe the entire conversation around a handful of areas:

Simplicity

Scalability

Predictability

Insightfulness

Openness

Delight

Over the next few weeks (maybe longer as I’m on annual leave soon) I’m going to try to tackle each one of these in turn because for me the way systems are evaluated is changing and it will only benefit the consumer and the end customer when the industry players take note.

Without outlandish numbers those vendors who prefer their Speedos with extra padding will quickly be exposed.

See you for part 1 in a while.

A dip into Prism

A few weeks ago I was given the lovely task of attending a meeting at the last minute with no preparation time and a 3 hour drive just after I got back from annual leave.  The meeting was only for an hour so I decided to record a short 10 minute video in the morning to take them through what they’d actually be doing on a Nutanix cluster from day to day.  Knowing the type of customer I knew there would be no internet connection let alone a 4G signal.

I could have just given a normal PowerPoint pitch and sent them back to sleep on a beach (which is where I still wanted to be) but I wanted to keep them awake and also elevate the conversation away from dull stuff like hardware and storage.  Usability, simplicity and time to value were the intention here so click below and leave a comment if it made sense to you.  No voice over as I’m too cheap to buy a program for my Mac that’ll do it 🙂

Do the Evolution

*** Updated for v4.6 ***

Over the last 18 months I’ve seen some amazing innovations come into the Nutanix platform but I’ve only personally seen half of the story.  Before I joined we made some staggering strides and I’d like to take you through those today.

Below are some abbreviated entries from all of the release notes dating back to NOS 2.6 back in January 2013.  I’ve highlighted some of the ones I consider to be important milestones in bold but these are open for discussion and I’m probably wrong anyway 🙂

In this short plagiarised post I wanted to illustrate what can be achieved when approaching a problem with a software first mentality and riding the wave of Moore’s Law.  While we’ve brought on new hardware models, ditched Fusion-IO cards for SSDs and partnered with Intel to make it all sing a sweet tune the biggest strides have been made in our famous non-disruptive rolling software upgrades.  Whether you bought a node this year or two years ago all of these features should be available to you.

The next time your SAN vendor (or any vendor) claims they’re constantly adding value for their customers, get them to put together something like this post because it’s only when you look back that you appreciate how much you’ve already accomplished.

 

 

NOS 2.6 (January 2013)

  • Genesis, a new management framework that replaces scripts run from the vMA, which is no longer required.
  • Support for 2nd-generation Fusion-io cards.

NOS 2.6.3

  • Nutanix Complete Cluster 2.6.3 supports vSphere 5.1.

NOS 2.6.4

  • Support for Intel PCIe-SSD cards is available as a factory-installed option.

NOS 3.0 (September 2013)

  • VM-centric backup and replication
    • Local and remote backup of VMs.
    • Bidirectional replication.
    • Planned and emergency failover from one site to another.
    • Consistency groups of multiple VMs that have snapshots made at the same time.
    • Scheduling, retention, and expiration policies for snapshots.
  • Compression
    • Inline compression of containers.
    • Post-process compression of containers with a configurable delay.
  • Support for NX-3000
    • Dual 10 GbE network interfaces.
    • Higher maximum memory configuration.
    • Intel Sandy Bridge CPUs.
  • Improved hardware replacement procedures.
  • CentOS for Controller VM.
  • Adherence to requirements specified in the U.S. Defense Information Systems Agency (DISA) Security Technical Implementation Guides (STIGs).

NOS 3.1 (January 2014)

  • New entry NX-1000 series platform
  • New deep storage NX-6000 series platform
  • New higher performance model in the NX-3050 series.
  • ESX 5.1 support
  • Mixed nodes in a cluster

NOS 3.5 (December 2014)

  • New HTML5 based administration interface
  • Active Directory/LDAP authentication
  • Introduction of RESTful API
  • User-configurable policies for frequency and alert-generating events
  • Expanded alert messages
  • Support for user-provided SSL certificates
  • User-manageable SSH keys and Controller VM lock down
  • SNMPv3 support and Nutanix MIB
  • Faster display of real-time data
  • More intuitive nCLI command syntax and enhanced output
  • Deduplication of guest VM data on the hot tiers (RAM/Flash)
  • Optimization of linked clones
  • Container and vDisk space reservations
  • Compression of remote replication network traffic
  • Automatically add new disks to single storage pool clusters
  • Storage Replication Adapter (SRA) for VMware Site Recovery Manager
  • General availability of KVM hypervisor
  • Technology preview of Hyper-V
  • Automated metadata drive replacement
  • Improved resiliency in cases of node or metadata drive failure
  • Field installation of replacement nodes

NOS 3.5.1

  • Support for the new NX-7000 (GPU platform), NX-6020, NX-6060, NX-6080, NX-3060, and NX-3061 models
  • Support for vSphere 5.5
  • Analysis dashboard expanded list of monitored metrics
  • DR dashboard expanded protection domain details
  • Storage dashboard deduplication summary
  • Application consistent snapshots

NOS 3.5.2

  • Support for Windows Server 2012 R2 Hyper-V
  • Support for application consistent snapshots in a protection domain
  • Virtual IP address, a single IP address for external access to a cluster
  • Certificate-based client authentication
  • Customised banner message in Prism
  • Enhancements to the Nutanix Prism web console
  • Expanded alert messages

NOS 3.5.3

  • Role-based access control using LDAP and Active Directory
  • Support for hypervisor lock down
  • Automatic reattachment to the Cassandra ring for replaced nodes
  • Improvements to the Stargate health monitor to minimize I/O timeouts during rolling upgrades, balance the load among nodes during failover, and facilitate high availability enhancements
  • Removal of the Avahi software dependency
  • The Nutanix SRA for VMware SRM supports vSphere 5.1 and 5.5 and SRM 5.1 and 5.5
  • NCC release 0.4.1

NOS 3.5.4

  • New entry NX-1020 platform
  • Volume Shadow Copy Service (VSS) support for Hyper-V hosts

NOS 4.0 (April 2014)

  • Feature based licensing introduced (Starter, Pro, Ultimate)
  • Disaster recovery support for Windows Server 2012 R2 Hyper-V
  • Prism Central introduced to manage and monitor multiple global clusters from one GUI
  • Automated rolling NOS upgrades
  • Automatic block awareness introduced for further data protection
  • Scheduled and remote archiving of snapshots via Protection Domains
  • Deduplication for the capacity tier
  • Amazon Web Services integration for storing remote snapshots
  • PowerShell cmdlets for cluster management and automation
  • Redundancy Factor 3 (RF3) introduced allowing two nodes to fail simultaneously without data risk

NOS 4.1 (September 2014)

  • Metro availability introduced enabling synchronous replication
  • Prism UI configuration of Cloud Connect for replicating VMs to Amazon Web Services
  • Improved health monitoring of data protection
  • Data at rest encryption with self-encrypting drives
  • Require setting password on first access to Controller VM and hypervisor host
  • STIG compliance for Controller VM
  • Audit trail for all user activity
  • Hypervisor upgrade from Prism UI
  • One-click upgrade for NCC utility
  • Tech preview of converged management for managing VMs on KVM
  • Support for Prism Central on Hyper-V
  • Nutanix SCOM management pack introduced
  • Nutanix plugin for XenDesktop

NOS 4.1.1

  • Network Time Protocol vulnerability fixed
  • Simplified certificate replacement in the Prism web console

NOS 4.1.2

  • Vormetric key management server support
  • Improved performance and reliability for virtual machine bootup and storage I/O on KVM (now called Acropolis)
  • Significantly reduced the NOS image size (from 2 GB to under 1 GB)
  • Connection-level redirection for the Acropolis hypervisor implemented

NOS 4.1.3 (June 2015)

  • Nutanix Cluster Check (NCC) version 2.0
  • Additional key management vendors for use with self-encrypted drives (SEDs)
  • Support for VMware ESXi 6.0
  • Image Service for Acropolis for importing non-Acropolis VMs
  • Synchronous Replication (SyncRep) for Hyper-V Clusters
  • Time Synchronisation Script to control time drift
  • Data at Rest Encryption for NOS clusters with Acropolis and Hyper-V hosts
  • Security timeout setting for Prism
  • Network Switch Configuration for Traffic Statistics Collection
  • Hypervisor Support for NX-6035C storage-only/compute lite node
  • Acropolis, Hyper-V, and ESXi mixed cluster support
  • Erasure Coding tech preview
  • Acropolis High Availability tech preview
  • Acropolis Volume Management tech preview

NOS 4.1.4

  • Nutanix Cluster Check (NCC) version 2.0.1.
  • Mix Asynchronous DR with Metro Availability (Synchronous DR) on the same VM.
  • Data Protection Enhancement: 1-Hour RPO performance and latency
  • Acropolis Enhancement: Restore VM Locality
  • Hypervisor Support for NX-6035C
  • Disk firmware upgrade process improvements

NOS 4.1.5.1

  • Nutanix Cluster Check (NCC) version 2.0.2.
  • SyncRep for Hyper-V Clusters Update/VSS Support

NOS 4.5.0.2 (October 2015)

  • Bandwidth Limit on Schedule for remote cluster replication

    It’s (Lancer) Evolution, Baby!

  • Cloud Connect for Azure
  • Common Access Card Authentication
  • Default Container and Storage Pool upon cluster creation
  • Erasure Coding
  • Hyper-V configuration through Prism Web Console
  • Image Service Now Available in the Prism Web Console
  • MPIO Access to iSCSI Disks (Windows Guest VMs)
  • Network Mapping for VMs started on a remote site
  • Nutanix Cluster Check (NCC) 2.1, which includes many new checks and functionality.
  • NX-6035C Clusters Usable as a Target for Replication
  • Prism Central support for Acropolis Hypervisor (AHV)
  • Prism Central Scalability improvements to 100 clusters and 10,000 VMs or 20,000 vDisks
  • Foundation 3.0 imaging capabilities
  • The Nutanix SNMP MIB database improvements
  • SNMP service logs are now written to the following log file: /home/nutanix/data/logs/snmp_manager.out
  • Rolling upgrades for ESXi hosts with minor release versions
  • VM High Availability in Acropolis
  • Windows Guest VM Failover Clustering
  • Self-Service File Restore tech preview

Acropolis Base Software 4.6 (formerly NOS) (Feb 2016)

  • 1-click upgrades to BIOS, BMC Firmware & Foundation via Prism
  • Windows and Linux guest customisation using Sysprep and cloud-init
  • Acropolis Drivers for OpenStack
  • Volume/Disk groups added to Prism and RESTful API
  • Convert Cluster Redundancy Factor from RF-2 to RF-3
  • Cross Hypervisor Disaster Recovery (prod ESXi to DR AHV for example)
  • Erasure coding improvements for more usable capacity savings
  • Snapshot and DR for volume groups
  • CommVault integration for AHV
  • New release of Nutanix Cluster Check software
  • Nutanix Guest tools
    • Nutanix Guest Agent (NGA) service
    • File Level Restore (FLR) CLI
    • Nutanix VM Mobility Drivers (multi-hypervisor DR)
    • VSS requestor and hardware provider for Windows VMs
    • Application-consistent snapshot for Linux VMs
  • Performance increases on ALL nodes we’ve ever sold (IO, BW etc) some 400% (no BS!)
  • Self-Service Restore for end users
  • Non-disruptive VM Migration for Metro Availability Planned Failover
  • *In-place hypervisor conversion (1-click from ESXi to AHV, for example)
  • *Acropolis File Services (NAS file system all from Prism)

*Tech Preview Feature.

 

The consumable infrastructure (that’s idiot proof…)

Just give the customer what they need

Over the last couple of months I’ve had my first experiences with Acropolis in the field. Both were quite different but they highlighted two important design goals in the product: simplicity of management and machine migration.

Before I begin I want to take you back a few months to talk about Acropolis itself.  If you know all about that you can do two things:

  1. Skip this section and move on
  2. Go to YouTube and watch some classic Sesame Street and carry on reading with a warm glow only childhood muppets can bring.

I knew you couldn’t resist a bit of Grover but now you’re back I’ll continue.

Over the summer Acropolis gained a lot of happy customers both new and old.  In fact some huge customers had already been using it since January thanks to a cunning soft release, and that continues into our Community Edition too.

The main purpose of Acropolis was to remove the complexity and unnecessary management modern hypervisors have developed and to let customers take a step back and simply ask “what am I trying to achieve?”

It’s an interesting question and one that is often posed when too deeply lost down the rabbit hole.  For someone like me who used to spend far too long looking at problems with a proverbial microscope there’s a blossoming clarity in the way we approached these six words.  The journey inside Nutanix to Acropolis was achieved by asking our own question:

“For hypervisors, if you had to start again, what would you do better and what would you address first?”

Our goal was to make deploying an entire virtual environment, regardless of your background and skill set, intuitive and consumable.  Our underlying goal for everything we do is simplicity and while we achieved this with storage many years ago (with what we call our ‘distributed storage fabric’) the hypervisor was the next logical area to improve.

Developing our own management layer and beginning its work on top of our own hypervisor was a logical step and that’s what brought us to where we are today with the Acropolis Hypervisor.  You can see a great walkthrough of setting up VMs and virtual networks in this video.

 

Anyway on to my first customer story.

Back in summer I spent time working with a manufacturing company on their first virtualisation project.  They were an entirely physical setup using some reasonably modern servers and storage but for a variety of reasons they’d put off moving to a virtual platform for years.  One of the most glaring reasons was one I hear a lot here as well as in my previous role at Citrix: “it worked yesterday just fine so why change?”  While this is true, I could still be walking two miles to the local river to beat my clothes against rocks to clean them.  But I chose to throw them in a basket and (probably by magic) they get cleaned.  If my girlfriend is reading this, it could be my last blog…

Part of the resistance is related to human apathy but their main concern was having to relearn new skills, which takes focus and resources away from their business, and it simply being too time consuming.  I completely agreed.  They wanted simplicity.  They needed Acropolis.

Now, I could have done what many would and given a presentation, a demo and a closing Q&A but I chose to handle our meeting slightly differently.  To allay their fears I let them work out how to create a network and create a new VM.  As we went I took them through the concepts of what a vCPU was and how it related to what they wanted to achieve for the business.  If someone with no virtualisation experience can use Acropolis without any training there can’t be any better sign off on its simplicity.  We were in somewhat of a competitive situation as well where ‘the others’ were pushing vCenter for all the management.  The comparison between the two was quite clear and while I’ll freely admit that feature for feature vSphere has many more strings to its bow, that wasn’t what the customer needed and isn’t the approach we are taking with the development of Acropolis.  We had no wish to just make a better horse and cart and the customer was extremely grateful for that.

One happy customer done, one to go…

Our second customer, dear reader (because there is only one of you), was already a virtualisation veteran and had been using ESXi for a few years before they decided to renew their rather old hardware and hopefully do something different with their infrastructure.  Their existing partner, who’d previously been implementing traditional three-tier platforms, chose to put Nutanix in front of them to see if we could ease their burden of management overhead, performance and operating expenditure.

While the simplicity of Acropolis was a great win for them and made up most of their decision, it was how we migrated their ESXi VMs on to Acropolis that struck me most and that’s what I’m going to summarise now.

This was my first V2V migration so I needed something simple as much as the customer and partner did and wow did we deliver.  Here is everything we needed to do to migrate:

  1. Set up the Nutanix cluster and first container
  2. Whitelist the vSphere hosts in Prism
  3. Mount the Nutanix container on the existing vSphere hosts
  4. Copy the VM to the Nutanix container
  5. Create a new VM in Prism and select Clone from NDFS then pick the cloned disk from step 4
  6. Start the VM and connect to the console
  7. Strip out the VMware tools
  8. Install the VirtIO drivers
  9. Go back to step 4 and repeat until all other VMs are done

Now of course doing a V2V also has a few extra parts such as ensuring any interdependent services are migrated as a group but really that’s all you need to do.
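For anyone who thinks better in code, the steps above can be sketched as a simple loop. This is only an illustration of the order of operations: every function, service and VM name here is hypothetical, nothing below calls a real Prism or vSphere API, and each entry just describes what a person would do by hand.

```python
from typing import Dict, List

# Per-VM steps from the list above (steps 4 to 8). Plain descriptions only;
# these are NOT real Prism or vSphere API calls.
PER_VM_STEPS = [
    "copy the VM to the Nutanix container",
    "create a new VM in Prism by cloning the copied disk from NDFS",
    "start the VM and connect to the console",
    "strip out the VMware tools",
    "install the VirtIO drivers",
]


def plan_migration(groups: Dict[str, List[str]]) -> List[str]:
    """Return an ordered runbook that keeps interdependent VMs together.

    `groups` maps a hypothetical service name to the VMs that must move as a unit.
    """
    # One-off preparation (steps 1 to 3).
    runbook = [
        "set up the Nutanix cluster and first container",
        "whitelist the vSphere hosts in Prism",
        "mount the Nutanix container on the existing vSphere hosts",
    ]
    # Step 9: repeat the per-VM steps, one service group at a time.
    for service, vms in groups.items():
        for vm in vms:
            for step in PER_VM_STEPS:
                runbook.append(f"[{service}] {vm}: {step}")
    return runbook


if __name__ == "__main__":
    # Hypothetical estate: a web app that depends on its database, plus a file server.
    estate = {"webshop": ["db01", "web01"], "files": ["fs01"]}
    for line in plan_migration(estate):
        print(line)
```

Grouping by service is just one way to respect interdependencies; the point is that the per-VM work is the same five steps every time.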

The clever bit is the Image Service.  This is a rather smart subset of tools that convert disks like the vmdk in this example to ones used by Acropolis.  There’s no requirement for any other steps or management to get a VM across and the customer had their entire estate completed in an afternoon.  To me, that’s pretty damn impressive.

I’m really pleased with what engineering have done in such a short period of time and to think where this can go is quite amazing.

 

And now we come to the point of explaining why I said this stuff was “idiot proof.”  I can only describe what happened as an organic fault in the system, also known as a cock-up on my part.  I hold my hands up and say I was a dumb-dumb.  As HR don’t read this, and to be honest it’s just you and me anyway, I should be ok.

While we were preparing the cluster for the VM migrations I decided to upgrade the Nutanix software to the latest version and while this was progressing smoothly node by node I somehow managed to…erm…hmm…well……I sort of sent a ctrl+alt+del to the IPMI console.  Call it brain fade.  This obviously rebooted the very host it was upgrading right in the middle of the operation.  After a lot of muttering and baritone swearing I waited for the node to come back up to see what mess I had created…

Here’s where engineering and all our architects need a huge pat on the back.  All I had to do was restart genesis on the node and the upgrade continued.  What makes this even more amazing is that while I was mashing the keyboard to self destruction the partner was already migrating VMs – during my screw up the migration was already in progress!  If I’d done this to any other non-Nutanix system on the planet it would have been nothing short of catastrophic.  However, in this case there was no disruption or downtime, and if I hadn’t let off a few choice words at myself nobody would have known.  That is frankly amazing to me and shows just how well we’ve designed our architecture.

So how can I summarise Acropolis?  It (and Nutanix) isn’t just a consumer-grade infrastructure, it’s also idiot proof and I for one am very grateful for it 🙂

© 2018 Nutanix Noob
