This is an exclusive interview with Dave Shanley, the Lead Engineer who was there at the very beginning of the EVO:RAIL project.
I'm very glad that Dave was kind enough to agree to answer a few questions, as I thought it would be cool to also get some information from behind the scenes – from the engineering perspective.
Before VMworld San Francisco I attended a presentation of EVO:RAIL where he and other engineers first presented the solution to bloggers and analysts. A detailed article about what EVO:RAIL does can be found in this post – VMware EVO:RAIL – New Hyper-converged Solution by VMware.
Without further ado, here is the interview with Dave Shanley:
Q1: Hi Dave, could you introduce yourself and tell us how the process of creating the UI for EVO:RAIL started, and which technologies were used during the building process?
Sure, well my name is Dave Shanley, I am the Lead Engineer and Software Architect of the EVO:RAIL platform. The UI is just one component of the system – EVO:RAIL is an entire engine that is integrated directly into VMware’s vSphere technology stack, specifically designed to simplify and speed up the complex process of configuring and managing the infrastructure. I started working on the initial prototypes in February 2013. It was just myself and Mornay Van Der Walt (VP, Emerging Solutions) on the project at the time. He brought this idea of ‘software that makes it as easy to deploy our technology in an appliance as it was to deploy a DVR or home router’. I spent about 3 months looking at our core tech, building prototypes, tinkering with architectures and experience until something clicked. It snowballed from there really, I was the only engineer on the project for a good 10 months. I pulled in the best engineers I could from the company and hired in some crack shot hackers to help me pull my ideas together. After many (many) late nights, long weekends and lots of swearing and starting again… we knew we had nailed it.
In terms of technology, EVO:RAIL is built in Java and Python: the core platform runs Java, while our zeroconf subsystems and automation frameworks all operate in Python. The core platform is based on Spring and uses a pure RESTful, JSON-based API for all UI control and WebSockets for all UI messaging.
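To make that architecture a little more concrete, here is a minimal sketch of my own (not taken from EVO:RAIL's source) of how a Spring Boot service can pair a JSON REST endpoint for UI control with a WebSocket/STOMP topic for pushing status messages back to the browser. All class names, endpoint paths and fields below are hypothetical illustrations.

```java
package example.evoui;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Configuration;
import org.springframework.messaging.simp.SimpMessagingTemplate;
import org.springframework.messaging.simp.config.MessageBrokerRegistry;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.socket.config.annotation.EnableWebSocketMessageBroker;
import org.springframework.web.socket.config.annotation.StompEndpointRegistry;
import org.springframework.web.socket.config.annotation.WebSocketMessageBrokerConfigurer;

@SpringBootApplication
@RestController
@RequestMapping("/api/config")
public class ConfigurationController {

    private final SimpMessagingTemplate messaging;

    public ConfigurationController(SimpMessagingTemplate messaging) {
        this.messaging = messaging;
    }

    // The UI posts the desired configuration as JSON; the server replies
    // with a JSON status object and pushes progress over a WebSocket topic
    // instead of forcing the browser to poll.
    @PostMapping
    public ConfigStatus apply(@RequestBody ConfigRequest request) {
        messaging.convertAndSend("/topic/progress",
                new ConfigStatus("VALIDATING", "Checking IP ranges"));
        return new ConfigStatus("ACCEPTED", "Configuration queued");
    }

    public static class ConfigRequest {
        public String managementIpPool;
        public String vmotionIpPool;
    }

    public static class ConfigStatus {
        public String state;
        public String message;
        public ConfigStatus(String state, String message) {
            this.state = state;
            this.message = message;
        }
    }

    public static void main(String[] args) {
        SpringApplication.run(ConfigurationController.class, args);
    }
}

// Enables the STOMP broker so the UI can subscribe to /topic/progress.
@Configuration
@EnableWebSocketMessageBroker
class WebSocketConfig implements WebSocketMessageBrokerConfigurer {
    @Override
    public void configureMessageBroker(MessageBrokerRegistry registry) {
        registry.enableSimpleBroker("/topic");
        registry.setApplicationDestinationPrefixes("/app");
    }

    @Override
    public void registerStompEndpoints(StompEndpointRegistry registry) {
        registry.addEndpoint("/ws");
    }
}
```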
Q2: The main reason to have this simplified interface for configuring the hyper-converged solution is simplicity. We could see in the presentation that the installation took less than 15 minutes. I really liked the error-handling operations, where the IPs are checked, etc.
Yeah, the idea is to STOP a user from breaking the system. I mean what is the point if you spent so much time crafting such a simplified experience and abstracting away endless complexity – if you’re just going to allow your end user to screw it all up? The goal of the EVO experience is to help you in every way possible, showing you why things won’t work and telling you how to fix them (and where to go to fix them) instead of dumping a cryptic error message in your lap. The error handling was one of my primary concerns when I initially designed the system. The errors had to be useful, they had to tell me in plain speak what was wrong and how it needed to be fixed.
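To give an idea of what "useful errors" can look like in practice, here is a small sketch of my own (again, not the actual EVO:RAIL code) of an IP-pool check that reports in plain language what is wrong and how to fix it. The class name, messages and field choices are purely illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class IpPoolValidator {

    // Returns human-readable problems; an empty list means the pool is usable.
    public static List<String> validate(String startIp, String endIp, int hostsNeeded) {
        List<String> problems = new ArrayList<>();
        long start = toLong(startIp);
        long end = toLong(endIp);

        if (start < 0 || end < 0) {
            problems.add("One of the addresses is not a valid IPv4 address. "
                    + "Check for typos such as a missing octet.");
            return problems;
        }
        if (end < start) {
            problems.add("The pool ends (" + endIp + ") before it starts ("
                    + startIp + "). Swap the start and end addresses.");
        }
        long available = end - start + 1;
        if (available < hostsNeeded) {
            problems.add("The pool only contains " + available + " addresses but "
                    + hostsNeeded + " hosts need one each. Widen the range on the "
                    + "networking screen.");
        }
        return problems;
    }

    // Converts dotted-quad notation to a comparable number; -1 means invalid.
    private static long toLong(String ip) {
        String[] parts = ip.split("\\.");
        if (parts.length != 4) return -1;
        long value = 0;
        for (String p : parts) {
            int octet;
            try { octet = Integer.parseInt(p); } catch (NumberFormatException e) { return -1; }
            if (octet < 0 || octet > 255) return -1;
            value = (value << 8) | octet;
        }
        return value;
    }

    public static void main(String[] args) {
        // A pool of 3 addresses for 4 hosts: prints an actionable error.
        validate("192.168.10.10", "192.168.10.12", 4).forEach(System.out::println);
    }
}
```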
Q3: During the presentation there was a question about Marvin. How did that name come about, and why the final change to EVO:RAIL?
MARVIN was the original code name for the product. The name was actually picked as a bit of fun (but had a strategic meaning) by a Director who has since moved on, but the name stuck. I built the robot mascot logo and iterated on it. You can still see the big ‘M’ in his body, however it’s now been evolved to show off our core brand ‘VM’. We faced a number of discussions and debates about the name – Corporate marketing wanted a family brand and after many iterations and discussions and proposals, Mornay and I finally decided that EVO was most suitable, Marketing agreed and the deal was done.
Q4: Which hardware were you using for testing? Many storage startups use Supermicro hardware – could you give us some details, and did you face any challenges during the building process?
We have been testing on all our OEM partner hardware (those that are different). The bulk of our testing has been performed on Intel hardware, as this is the foundation for a couple of our OEM partners (including EMC). But we have our own lab room here at HQ that’s been full of hardware. We’ve had to sit next to it for days at a time whilst we performed thousands of tests (it’s really noisy and hot). I personally spent weeks pulling out disks, network cables, power cables, simulating failures left and right.
Some of the challenges were getting our OEM build recipe correct; we would miss a small step here and there in the documentation, and that would mean that when we got a new appliance from the factory it wasn't quite ready to be tested – so getting that robust and fully complete took some time. Other challenges were working through actual hardware issues with some of our OEM partners: vibrations on drives, firmware misbehaving, etc. Lots of direct communication and cross-engineering discussions, but we got it all worked out.
Q5: While the simplification of the UI is something that was required, I feel it's a very good option for remote locations, where there are usually no qualified engineers on site.
That’s the idea, we don’t remove any of the power of vSphere, we don’t break any of it either. Once your appliance is configured, you can use it like you always have done. You are not required to use the EVO:RAIL management UI, you can simply use the Web Client and configure and install all of your third party appliances and plugins. The choice is really yours – it’s your hardware, it’s your software. You can use the simplified experience or the more traditional super user experience. You don’t need any training to use the simplified experience, so it’s a great alternative for smaller locations, or those with less virtual machine technical expertise.
Q6: Any story to share from the building process?
From February to April 2014 – we (as an engineering team) faced endless dead-ends. We kept on hitting brick walls every which way we turned. EVO:RAIL hasn't existed up until this point for one simple reason: this stuff is really *really* complicated. It's why professional services can take days, or weeks, to set up, configure and install these types of infrastructure. We tried a hundred different ways to glue together technologies – many times without success. I remember getting home very late (2am sometimes) after being in the lab banging my head against a wall, frustrated because we had hit another big brick wall. Now that being said, we managed to smash most of those walls down, some of them we had to go around though. There were some really great sprints of innovation, creativity and invention though, many times my whole team were online together on Saturday night at 4am running tests and fixing bugs. We all wanted it to be perfect and we put in everything we had to make it happen.
Update: I took a picture with Dave and his team during VMworld Barcelona 2014…
Q7: I haven’t found any information concerning brown field deployments. What happens if someone wants to expand an existing vSphere installation with a EVO: RAIL. What if I there is already vCenter residing on some shared storage. What are the options here.
Well the first release of EVO:RAIL is designed to get you up and running with a working, reliable, ready-to-go hyper-converged virtual infrastructure appliance. That being said – you could simply go in and consume the hosts with your existing VC. It would mean you're effectively rolling your own at this point. You can consume the appliance however you like.
There are plans to work in brownfield deployments from scratch; however, there is a lot more work to be done on this (licensing being the key part). It turns out there are a lot of corner cases with licensing, especially with ELA licensing. I'm just going to stick with what I do best – Engineering.