Category: Customer

Sock stuffing

For a while now the metric most infrastructures, including Nutanix, have been benchmarked against is IOps – effectively how quickly the storage layer can take a write or read request from an application or VM and reply.  Dating back to the (re)birth of SANs, when they began running virtual machines and T1 applications, this has been the standard for filling out the shit vs excellent spreadsheet that dictates where to spend all your money.

Recently, thanks to some education and a bit of online pressure from peers in the industry, synthetic testing with tools like IOmeter has generally been displaced in favour of real-world testing platforms and methodology.  Even smarter tools such as Jetstress don’t give real-world results because they focus on storage and not the entire solution.  Recording and replaying operations to generate genuine load and behaviour is far better.  Seeing the impact from the application and platform means our plucky hero admin can produce a recommendation based on fact rather than fantasy.

Synthetic testing is basically like stuffing a pair of socks down your pants; it gets a lot of attention from superficial types but it’s only a precursor to disappointment later down the line when things get serious.

In this entry I want to drop into your conscious mind the idea that very soon performance stats will be irrelevant to everyone in the infrastructure business.  Everyone.  You, me, them, him, her, all of us will look like foolish dinosaurs if we sell our solutions based on thousands of IOps, bandwidth capacity or low latency figures.

“My God tell me more,” I hear (one of) you (mumble with a shrug).  Well consider what’s happened in hardware in the last 5ish years just in storage.  We’ve gone from caring about how fast disks spin, to what the caching tier runs on, to tiering hot data in SSD and now the wonders of all-flash.  All in 5 or so years.  Spot a trend?  Bit of Moore’s Law happening?  You bet, and it’s only going to get quicker, bigger and cheaper.  Up next new storage mediums like NVMe and Intel’s 3D XPoint will move the raw performance game on even further, well beyond what 99% of VMs will need.  Nutanix’s resident performance secret agent Michael Webster (NPX007) wrote a wonderful blog about the upcoming performance impacts this new hardware will have on networking so I’d encourage you to read it.  The grammar is infinitely better for starters.

So when we get to a point, sooner than you think, when a single node could rip through >100,000 IOps with existing generations of Intel CPUs and RAM where does that leave us when evaluating platforms?  Not synthetic statistics that’s for sure.

Oooow IO!

By taking away the uncertainty of application performance almost overnight we can start to reframe the entire conversation around a handful of areas:

Simplicity

Scalability

Predictability

Insightfulness

Openness

Delight

Over the next few weeks (maybe longer as I’m on annual leave soon) I’m going to try to tackle each one of these in turn because for me the way systems are evaluated is changing and it will only benefit the consumer and the end customer when the industry players take note.

Without outlandish numbers those vendors who prefer their Speedos with extra padding will quickly be exposed.

See you for part 1 in a while.

The consumable infrastructure (that’s idiot proof…)

Just give the customer what they need!

Over the last couple of months I’ve had my first experiences with Acropolis in the field. Both quite different but they highlighted two important design goals in the product; simplicity of management and machine migration.

Before I begin I want to take you back a few months to talk about Acropolis itself.  If you know all about that you can do two things:

  1. Skip this section and move on
  2. Go to YouTube and watch some classic Sesame Street and carry on reading with a warm glow only childhood muppets can bring.

I knew you couldn’t resist a bit of Grover but now you’re back I’ll continue.

Over the summer Acropolis gained a lot of happy customers both new and old.  In fact some huge customers had been using it since January thanks to a cunning soft release, and that continues into our Community Edition too.

The main purpose of Acropolis was to remove the complexity and unnecessary management modern hypervisors have developed and to let customers take a step back and simply ask “what am I trying to achieve?”

It’s an interesting question and one that is often posed when too deeply lost down the rabbit hole.  For someone like me who used to spend far too long looking at problems with a proverbial microscope there’s a blossoming clarity in the way we approached these six words.  The journey inside Nutanix to Acropolis was achieved by asking our own question:

“For hypervisors, if you had to start again, what would you do better and what would you address first?”

Our goal was to make deploying an entire virtual environment, regardless of your background and skill set, intuitive and consumable.  Our underlying goal for everything we do is simplicity and while we achieved this with storage many years ago (which we call our ‘distributed storage fabric’) the hypervisor was the next logical area to improve.

Developing our own management layer and beginning its work on top of our own hypervisor was a logical step and that’s what brought us to where we are today with the Acropolis Hypervisor.  You can see a great video walk through of the experience of setting up VMs and virtual networks in this video.

 

Anyway on to my first customer story.

Back in summer I spent time working with a manufacturing company on their first virtualisation project.  They were an entirely physical setup using some reasonably modern servers and storage but for many reasons they’d put off moving to a virtual platform for years.  One of the most glaring reasons was one I hear a lot here as well as in my previous role at Citrix: “it worked yesterday just fine so why change?”  While this is true I could still be walking two miles to the local river to beat my clothes against rocks to clean them.  But I chose to throw them in a basket and (probably by magic) they get cleaned.  If my girlfriend is reading this, it could be my last blog…

Part of the resistance is related to human apathy but their main concern was having to relearn new skills, which takes focus and resources away from their business, and it simply being too time consuming.  I completely agreed.  They wanted simplicity.  They needed Acropolis.

Now, I could have done what many would and given a presentation, a demo and a closing Q&A but I chose to handle our meeting slightly differently.  To allay their fears I let them work out how to create a network and create a new VM.  As we went I took them through the concepts of what a vCPU was and how it related to what they wanted to achieve for the business.  If someone with no virtualisation experience can use Acropolis without any training there can’t be any better sign off on its simplicity.  We were in somewhat of a competitive situation as well where ‘the others’ were pushing vCenter for all the management.  The comparison between the two was quite clear and while I’ll freely admit that feature to feature vSphere has many more strings to its bow, that wasn’t what the customer needed and isn’t the approach we are taking with the development of Acropolis.  We had no wish to just make a better horse and cart and the customer was extremely grateful for that.

One happy customer done, one to go…

Our second customer, dear reader (because there is only one of you), was already a virtualisation veteran and had been using ESXi for a few years before they decided to renew their rather old hardware and hopefully do something different with their infrastructure.  Their existing partner, who’d been implementing traditional three-tier platforms previous to this, chose to put Nutanix in front of them and see if we could ease their burden on management overhead, performance and operating expenditure.

While the simplicity of Acropolis was a great win for them and made up most of their decision it was how we migrated their ESXi VMs on to Acropolis that really struck me most and that’s what I’m going to summarise now.

This was my first V2V migration so I needed something simple as much as the customer and partner did and wow did we deliver.  Here is everything we needed to do to migrate:

  1. Set up the Nutanix cluster and first container
  2. Whitelist the vSphere hosts in Prism
  3. Mount the Nutanix container on the existing vSphere hosts
  4. Copy the VM to the Nutanix container
  5. Create a new VM in Prism and select Clone from NDFS, then pick the cloned disk from step 4
  6. Start the VM and connect to the console
  7. Strip out the VMware tools
  8. Install the VirtIO drivers
  9. Go to 4 until all other VMs are done

Now of course doing a V2V also has a few extra parts such as ensuring any interdependent services are migrated as a group but really that’s all you need to do.
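To show how little there is to it, here is the list above written out as a checklist generator.  It is only a sketch in Python: it prints the steps in order and does not talk to Prism or vSphere at all, and the VM names are made up for the example.

```python
# Hypothetical sketch: these are the manual steps from the post expressed as data.
# Nothing here calls a real Nutanix or VMware API.
CLUSTER_STEPS = [
    "Set up the Nutanix cluster and first container",
    "Whitelist the vSphere hosts in Prism",
    "Mount the Nutanix container on the existing vSphere hosts",
]

PER_VM_STEPS = [
    "Copy the VM to the Nutanix container",
    "Create a new VM in Prism from the copied disk",
    "Start the VM and connect to the console",
    "Strip out the VMware tools",
    "Install the VirtIO drivers",
]


def migration_runbook(vm_names):
    """Return (target, step) pairs: one-off cluster steps, then the loop per VM."""
    runbook = [("cluster", step) for step in CLUSTER_STEPS]
    for vm in vm_names:
        runbook.extend((vm, step) for step in PER_VM_STEPS)
    return runbook


# Example VM names are invented for illustration.
for target, step in migration_runbook(["web01", "sql01"]):
    print(f"[{target}] {step}")
```

Three steps happen once, five repeat per VM, and that really is the whole runbook.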

The clever bit is the Image Service.  This is a rather smart subset of tools that convert disks like the vmdk in this example to ones used by Acropolis.  There’s no requirement for any other steps or management to get a VM across and the customer had their entire estate completed in an afternoon.  To me, that’s pretty damn impressive.

I’m really pleased with what engineering have done in such a short period of time and to think where this can go is quite amazing.

 

And now we come to the point explaining why I said this stuff was “idiot proof.”  I can only describe what happened as an organic fault in the system also known as a cock-up on my part.  I hold my hands up and say I was a dumb-dumb.  As HR don’t read this, and to be honest it’s just you and I anyway, I should be ok.

While we were preparing the cluster for the VM migrations I decided to upgrade the Nutanix software to the latest version and while this was progressing smoothly node by node I somehow managed to…erm…hmm…well……I sort of sent a ctrl+alt+del to the IPMI console.  Call it brain fade.  This obviously rebooted the very host it was upgrading right in the middle of the operation.  After a lot of muttering and baritone swearing I waited for the node to come back up to see what mess I had created…

Here’s where engineering and all our architects need a huge pat on the back.  All I had to do was restart genesis on the node and the upgrade continued.  What makes this even more amazing is that while I was mashing the keyboard to self destruction the partner was already migrating VMs – during my screw up the migration was already in progress!  If I’d done this to any other non-Nutanix system on the planet it would have been nothing short of catastrophic.  However, in this case there was no disruption or downtime, and if I hadn’t let off a few choice words at myself nobody would have known.  That is frankly amazing to me and shows just how well we’ve designed our architecture.

So how can I summarise Acropolis?  It (and Nutanix) isn’t just a consumer-grade infrastructure, it’s also idiot proof and I for one am very grateful for it 🙂

Size matters but it’s ok to be smaller

Here’s a short post without the usual rubbish I write.  Imagine that!  Well, my dinner is nearly ready, it’s a Friday, I’ve got 4 cold beers in the fridge and the daughter is trying to eat crayons.

I’ll be brief.

I’m part way through designing an infrastructure for a new customer who is looking to replace 39 IBM hosts and a whole mess of SANs and associated fabric.  In total they need five racks to put it all together while consuming 120TB of usable storage.  It’s an old environment and is ready for the future.

For legacy reasons they split their environment into three separate ESX clusters; one for the DMZ, one for SQL and one for the remaining production VMs.

Ignoring requirements for DR for a moment here’s what I need to put all that together and for your viewing pleasure I’m going to show you the output of our new sizing tool that we at Nutanix and our partners use to figure out what fits best.

Remember we have 5 racks filled with crap to take out.

To keep with the separation the customer wants I’ve done three designs but they can all be part of the same cluster and will all fit into a single rack too.

Below you can see the specs of each cluster and the amount of VMs each one needs to support.  I’ve included the rack size just for giggles 🙂

 

Main VMware Cluster

[Image: Main VMware cluster sizing]

[Image: Main VMware rack]

SQL VMware Cluster

[Image: SQL VMware cluster sizing]

[Image: SQL VMware rack]

DMZ VMware Cluster

[Image: DMZ VMware cluster sizing]

[Image: DMZ VMware rack]

 

So there you go.  5 racks down to 20U.
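For a rough sense of scale, here is the sum in Python.  The 42U rack height is my assumption (the post only says five racks), and the old racks may not have been full, so treat the result as an upper bound on the saving.

```python
# Assumption: standard 42U racks; the post does not state the rack height.
RACK_UNITS = 42

old_units = 5 * RACK_UNITS   # five racks of legacy kit, if every U were used
new_units = 20               # the Nutanix design from the sizing tool

reduction = 1 - new_units / old_units
print(f"{old_units}U -> {new_units}U, roughly {reduction:.0%} less rack space")
```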

I’ll add some more notes to this about the various models and you can probably tell that the first cluster is using our new compute-lite 6035-C KVM nodes to bump up the total amount of storage.  We’re doing this because they need far more storage than compute and to add more full nodes just wouldn’t make commercial sense.  But that’s the beauty of Nutanix, you just add what you need.

 

Anyway dinner and beer are calling.  Enjoy and stop buying SANs, for your own sake.

 

Quality assured, not assumed.

Wow, bet that runs just like Steve intended!

There are two trains of thought in the world of hyper convergence.  One is to own the platform and provide an appliance model with a variety of choices for the customer based on varying levels of compute and storage.  Each box goes through thousands of hours of testing both aligned to and independent of the software that it powers.  All components are beaten to a pulp in various scenarios, run to death and performance calibrated and improved at every step.  Apple has done this from its inception and has developed a vastly more reliable and innovative platform than any PC since.  Yes I’m a fanboy…

The other train is one that can be (and has been) quickly derailed.

You create a nice bit of software, one that you also spend thousands of hours building and testing, but when it comes to the platform you allow all manner of hardware as its base.  Processor, memory, manufacturer: all are just names at this stage.  vSAN started its HCL last year as a massive Excel spreadsheet filled with a huge variety of tin, most of which was guesswork, and it showed in how that spreadsheet was received by the community.  Atlantis USX also uses a similar approach.  A choice of thousands of flavours is great if you’re buying yogurt but not so good when your business relies on consistency and predictability – oh and a fast support mechanism from your vendor.  You can imagine the finger pointing when something goes wrong…

It’s the software that matters, of course, and while this statement is correct it’s only a half truth.

Unless you can accurately test and assure every possible server platform from every manufacturer your customers use then the supportability of the platform (that’s the hardware plus the software) is flawed.  If you can somehow do the majority you’re still in for a world of pain.  Controllers on the servers may differ.  Some SSDs may provide different performance in YOUR software regardless of their claimed speeds.  Suddenly the same software performs differently across hardware that is apparently the same.

At Nutanix we’ve provided cutting-edge hardware from small footprint nodes to all-flash but never once have we not known the performance and reliability of our platform before it leaves the door and is powered up by a customer.  You can read about all six hardware platforms here.  When we OEM’d our software to Dell we gave the same level of QA to the HC appliances too.

We know our hardware platform and ensure that it works with the hypervisors we support.  We then know our software works with those hypervisors.  We own and assure each step to provide 100% compatibility.  If you’re just the software on top, you have thousands of possible permutations to assure.  Sorry I mean assume.
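To put a number on “thousands”, here is a back-of-an-envelope count in Python.  Every figure in it is invented for illustration and none comes from a real compatibility list; the point is only how quickly a handful of options multiply.

```python
from math import prod

# Made-up counts for illustration only; not taken from any vendor's HCL.
choices = {
    "server models": 10,
    "storage controllers": 5,
    "SSD models": 8,
    "firmware levels": 4,
    "hypervisors": 3,
}

# Every option can, in principle, be paired with every other option.
combinations = prod(choices.values())
print(f"{combinations:,} hardware/software combinations to qualify")
```

Add one more SSD model to that table and the total rises by another 600 combinations.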

We own it all from top to bottom and the boxes, regardless of their origin or components, are 100% Nutanix.  This is how we can take and resolve support questions and innovate within the platform without external interference.  Customers love the simplicity of the product as you probably know but there is an elegance in also displaying a structured yet flexible hardware platform.  Ownership is everything.

I’ve lost count of the flak I’ve taken for “not being software only” as that’s “the only way to be truly software defined.”

What bollocks.

It is the software that matters but if as a company you cannot fully understand the impact your software has on the hardware it must run on then the only person you’re kidding is yourself and more worryingly the first person it hurts is your customer.

Let’s see who else follows the leader once again.

“Simplicity is the ultimate sophistication”

…so said Leonardo Da Vinci.  Why make things hard?  Why make your own life harder when with some effort everything can be simplified and made better?  This is our approach and it touches Nutanix employees as well as our customers.

Leon says relax

The one thing that still staggers customers about Nutanix is how flipping easy it is to get blocks installed into their environment.  To give you an idea of how I do it and the time it takes from getting a completely blank (or even previously configured proof of concept box) installed, take a look at this:

  1. Image the server.  We and our partners use a tool called Foundation which pushes a vanilla image of ESX, Hyper-V or KVM down to the nodes with Nutanix software configured.  This is automated and takes 50 mins because we do it over a 1Gb switch.
  2. Configure the new cluster.   Here we just add in IP addresses for the hosts, management ports and Nutanix CVM. This takes 2 mins via an intuitive webpage or you can do it via the NCLI if you want to appear a true geek.
  3. Create the storage pool.  Another 14 seconds.
  4. Create the first container, its policies and present to hosts.  This final stage took me 22 seconds.

That’s it.

Total time for me is under an hour.  Total time for the customer under 10 minutes (if you include racking time!)
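If you want to check my sums, here they are in Python using the timings quoted above.  Reading the “hands-on” figure as everything except the automated imaging is my own interpretation.

```python
from datetime import timedelta

# Durations as quoted in the four steps above.
steps = {
    "image nodes with Foundation": timedelta(minutes=50),
    "configure cluster": timedelta(minutes=2),
    "create storage pool": timedelta(seconds=14),
    "create first container": timedelta(seconds=22),
}

total = sum(steps.values(), timedelta())

# Assumption: imaging is unattended, so hands-on time is everything else.
hands_on = total - steps["image nodes with Foundation"]

print(f"total:    {total}")     # 0:52:36
print(f"hands-on: {hands_on}")  # 0:02:36
```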

Now we’re at the stage where the customer can start building VMs (aka doing the things that matter) while the infrastructure becomes invisible – just as it should be.

At no point do they need to revisit the storage other than to create new containers or change policies.

Go order something from VCE, NetApp, EMC or any SAN and try doing the same thing.  In fact try to do the same thing with any hyper converged competitor as well.

This is the power of simplicity and it’s only going to get easier for our customers.

“Simplify, then add lightness”

Colin Chapman

I’ve been working with an organisation for the past few months and I’m pleased to say they’re now a Nutanix customer and on the way to becoming a case study too.  They face challenges that more traditional customers would never experience but are critical hurdles to the way they run their business, so I thought you’d like to know a bit more about it and why Nutanix was such a great fit for them.

Colin Chapman, who founded Lotus (no they’re not the customer in question!) in 1948, had a brilliant way of describing his philosophy for motor cars:  “Simplify, then add lightness.”  It typified the approach Chapman took to achieve maximum performance without the anchors opposition cars were still proverbially dragging around behind them on road and track.

In many ways this reflects how Nutanix operates as well; we cut away the fat from administration, unnecessary complexity from the architecture and use commodity components (a bit like an Esprit then?) to ensure the best performance and reliability (so not an Esprit after all…)

Anyway, back to my customer.  They operate out of a site here in the UK but have ‘remote offices’ across the world.  The big difference to the shops you and I have worked in is that their IT systems are constantly traversing the globe.  Nothing stays still for long and while they’re online they have to be at maximum performance for a wide variety of workloads while also being in constant communication back to the UK.  It’s a very testing environment to say the least.  Before we put our arm around them they were a NetApp shop with all of the usual bits of hardware, administration and software costs that go alongside it.  They also used the SAN to host their virtual machines (I know, how 2003) which was just not giving them the performance they needed.

The first task was to show how we stacked up against their current architecture so I built a small three node cluster in around 10 minutes (including making two cups of tea) at their headquarters and within half an hour we had VMs migrated and running IOmeter and SQL benchmarking.

“It’s around ten times faster,” they said, after a week of testing.  Not bad for a single 2U block, huh?

By the way, setting up IOmeter in our distributed architecture is a bit different to the usual way of hammering a SAN so if you’re interested to know how just let me know and I’ll see if I can publish it.

So performance was a given and something that I had very little worry about, and through these tests and a few failure simulations (cable kicks!) we showed them that our software could be relied upon in a far more volatile environment than a cosy datacenter.  We began to talk about the remote sites and that’s where the fun began and, coming back to today’s theme, why I’m talking about lightness.

Exactly half of their Nutanix investment was to be in transit, and for the pleasure of doing so the customer is charged around $290 per kg by the shipping companies.  That’s a very expensive bag of sugar, don’t you think?  Now, imagine three racks filled with UPSs, switches, fabric, servers, NetApp controllers and disk shelves.  How many bags of sugar would you need to buy to balance all that?  A hell of a lot.

What if you could remove more than half of that weight instantly?

What if by using Nutanix you could save $150,000+ per year just in transportation costs?

That tastes rather sweet, don’t you agree?

The staggering fact is that by saving such a massive sum of money, in just three years the cost of the Nutanix solution is practically zero.
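Here is that arithmetic in Python, using only the two figures from this post: around $290 per kg and $150,000+ a year.  The freight figure it prints is an implied total of kilograms shipped per year (weight multiplied by however many trips), not the weight of the kit itself, and I’m treating the saving as a flat lower bound.

```python
RATE_PER_KG = 290        # USD per kg charged by the shippers (from the post)
ANNUAL_SAVING = 150_000  # USD per year, the "$150,000+" lower bound (from the post)
YEARS = 3

# Freight avoided each year, implied by the saving and the rate.
kg_per_year = ANNUAL_SAVING / RATE_PER_KG

# Cumulative saving over the three years mentioned above.
three_year_saving = ANNUAL_SAVING * YEARS

print(f"about {kg_per_year:.0f} kg less freight per year")
print(f"${three_year_saving:,} saved over {YEARS} years")
```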

Ten times the performance, zero single points of failure, no limitations of scale and a constantly evolving platform to meet the needs of today and the demands of tomorrow.

For nothing.

*** UPDATE ***

The customer in question is Williams Martini Racing and here’s a link to the case study.

© 2017 Nutanix Noob
