Second Life code issues and scalability

At the beginning of the year, I started to play Second Life again. Yes, the same one by Linden Lab, the virtual world that operates within our reality. This is the game where you can be anyone, do anything, and it even has its own commerce and stock exchange. The Grid is a bit like the Matrix: one world tied together across many, many interconnected servers.
But lately there have been some issues. Issues with code releases and testing.


Scalability:
Here at VTOReality, we've noticed that Linden Lab has been looking to raise the ceiling on the maximum number of players in the Grid, meaning simultaneous players. Currently the capacity sits at about 100,000 at a time. Yet on comparable hardware, Blizzard's World of Warcraft can probably handle quite a bit more. Is there a problem with scalability? I would think so.
They do make a good point though:

“We’re swapping engines out at 40,000 feet while still flying,” Miller says.

But doesn't this happen in every MMORPG-type game? They all upgrade on the fly. I used to play one called Silk Road; in beta, it had only six or seven grids handling thousands of players. Definitely something to think about.
Code release and process:
I hate to say it, but their internal process doesn't look very efficient. I say this without knowing anything about their actual process; the recent patches and backouts over the last few weeks speak for themselves.
From the Grid's perspective, the timeline has gone something like this:

  • New code released.
  • Strange errors occur. Troubleshooting begins.
  • Patch released.
  • Errors continue to occur.
  • Patch backed out.
  • Error found. Fix applied.

Now, from a code perspective: first, if errors show up on the live Grid that you never saw on a test grid, your test suite isn't robust enough. Second, if new code creates errors, it's better to roll back to an older version; that's what versioning is for. Stability for end users trumps any new functionality. That goes for any industry.
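To make that rollback point concrete, here's a minimal sketch of the decision I'm describing, in Python. None of this is Linden Lab's actual tooling; the version labels, the deploy() helper, and the error-rate check are hypothetical stand-ins for whatever they run internally.

```python
# A minimal sketch (hypothetical, not Linden Lab's tooling) of
# "roll back to the last known-good version instead of patching forward".

KNOWN_GOOD = "1.13.2"     # hypothetical version labels
CANDIDATE = "1.13.3"
ERROR_THRESHOLD = 0.01    # acceptable post-release error rate

def deploy(version: str) -> None:
    """Stand-in for whatever pushes a build out to the Grid."""
    print(f"deploying {version}")

def observed_error_rate(version: str) -> float:
    """Stand-in for post-release monitoring; hard-coded for illustration."""
    return 0.05 if version == CANDIDATE else 0.001

def release(candidate: str, known_good: str) -> str:
    """Deploy the candidate; if errors exceed the threshold, revert."""
    deploy(candidate)
    if observed_error_rate(candidate) > ERROR_THRESHOLD:
        # Errors the test grid never caught: restore stability first,
        # then debug offline rather than patching on the live Grid.
        deploy(known_good)
        return known_good
    return candidate

if __name__ == "__main__":
    running = release(CANDIDATE, KNOWN_GOOD)
    print(f"the Grid is now running version {running}")
```

The point of the sketch is the order of operations: stability is restored before anyone starts troubleshooting, which is the opposite of the patch-then-backout timeline above.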
Lastly, the testing-process errors I just described? That's freshman-year computer science, usually covered in your very first class, which is why it's hard to understand why there are issues with robust testing here. Of course, I've spoken with engineers at both Microsoft and MathWorks before, and you'd be surprised how some corporations cut corners on testing while others are very good at keeping their code clean.
As a user, I find multiple hard reboots without any gain in stability very annoying. As a fellow developer, I find the process amateurish. I understand you can't have it all, and different teams do things differently, but multiple hard reboots a week outside the usual maintenance window is unnerving. Perhaps the software's reliability and the process behind it will become more robust in the weeks to come. But if things stay on the current path, users will grow frustrated, and it won't be a friendly sight.
For now, I'll continue to play. But be forewarned: if I can foresee such things, there are others out there who feel the same. Fix your processes before a massive exodus begins toward another, bigger and better world.