I'm commenting on this because I believe their failure was due to an unnecessarily complex infrastructure. Of course, this requires a lot of conjecture on my part about an organization I know little about ... but I'm pretty comfortable making some guesses.
It's en vogue these days to build filesystems across a SAN and build an application layer on top of that SAN platform that deals with data as "objects" in a database, or something resembling a database. All kinds of advantages are then presented by this infrastructure, from survivability and fault tolerance to speed and latency. And cost. That is, when you look out to the great green future and the billions of transactions you handle every day from your millions of customers are all realized, the per unit cost is strikingly low.
It is my contention that, in the context of offsite storage, these models are too complex, and present risks that the end user is incapable of evaluating. I can say this with some certainty, since we have seen that the model presented risks that even the people running it were incapable of evaluating.
This is indeed an indictment of "cloud storage", which may seem odd coming from the proprietor of what seems to be "cloud storage". It makes sense, however, when you consider the very broad range of infrastructure that can be used to deliver "online backup". When you don't have stars in your eyes, and aren't preparing for your IPO filing and the "hockey sticking" of your business model, you can do sensible things like keep regular files on UFS2 filesystems on standalone FreeBSD systems.
This is, of course, laughable in the "real world". You couldn't possibly support thousands and thousands of customers around the globe, for nearly a decade, using such an infrastructure. Certainly not without regular interruption and failure.
Except when you can, I guess:
# uptime
12:48PM up 350 days, 21:34, 2 users, load averages: 0.14, 0.14, 0.16
(a live storage system, with about a thousand users, that I picked at random)
# uptime
2:02PM up 922 days, 18:38, 1 user, load averages: 0.00, 0.00, 0.00
(another system on the same network)
Further, the consensus might be that rsync.net is not providing "cloud storage". Good. We're not. I would be very happy if the colloquial definition of "cloud storage" did indeed imply very "sophisticated" and "modern" infrastructures, like Amazon S3 and Rackspace Cloud Files and so on. Complex, sprawling webs of fault tolerance and survivability that no team of ten people could possibly understand - and certainly not their customers.
I don't know what we'll end up being called. Offsite fileservers, maybe. Or maybe just "human beings that control computer systems" (instead of the other way around).
I have a lot to say about complexity in computer systems and the conclusions I have drawn about "fault tolerance" and things like the "five nines". I'd like to wait and present it more formally, perhaps even write a book about it. I am forced to address it today, however, when I see the worst aspect of the SwissDisk failure - their own response:
"(SwissDisk) has signed a contractual service agreement with a $40B company. For the first time this will enable SwissDisk to provide our customers with a 99.95 per cent service level agreement. SwissDisk users will enjoy the peace of mind that comes with access to a truely world class, internationally deployed, multi-million dollar, extremely resiliant storage infrastructure and internet network."
Their response to their inability to comprehend and master their own environment is to increase the complexity of that environment. The layers of risk that SwissDisk customers could not possibly evaluate will now be increased by at least an order of magnitude - probably more.
We wish them luck as they climb higher up into the stratosphere of "serious" infrastructure.
SANs scare me because byzantine failures in any system connected to it can corrupt any data on the system. [Byzantine failures are those where a system doesn't simply crash & reboot, but continues to operate incorrectly. Imagine a bit corrupted in memory causing a block to be written to the wrong sector on a disk, corrupting some important meta-data, which causes other clients to start writing things in the wrong place. Ugh.]
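The failure mode described above can be sketched in a few lines. This is a toy model of my own (not anything from rsync.net's or any SAN vendor's actual stack), showing how a single flipped bit in a sector address produces a write that "succeeds" in the wrong place:

```python
# Toy illustration of a byzantine write: the system doesn't crash,
# it keeps operating and silently puts data where it doesn't belong.

disk = {0: b"superblock", 7: b"user data"}  # sector number -> contents

def write_block(sector: int, data: bytes, flipped_bit: int = 0) -> None:
    # Memory corruption is modeled as XOR-ing one bit into the sector
    # number; the write completes without error, just at the wrong address.
    disk[sector ^ flipped_bit] = data

write_block(7, b"new user data")                  # healthy write to sector 7
write_block(7, b"new user data", flipped_bit=4)   # bit 2 flips: 7 -> 3
# Sector 3 now holds data meant for sector 7. If sector 3 held shared
# filesystem metadata, every client attached to the SAN starts reading
# (and then writing) garbage - the cascade described above.
```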
I like rsync.net because I feel like I understand how it works and the ideas behind it. It's hard to imagine a cascading failure that takes out my main disks, my in-house backups, and rsync.net on the same day.
Posted by: Trevor Blackwell | October 25, 2009 at 11:46 AM
Honestly, the SwissDisk failure didn't surprise me, because:
1) I had never heard of them before this blog post.
2) Even if I had, I would never have signed up with them just from looking at their website; it completely lacks personality and seems like a template.
3) On that website they make some mumbo-jumbo claims about cloud storage, but they never explain to the end customer how it actually works, which is another reason to be suspicious.
Then they (un)surprisingly lose all the user data and are not able to explain why, so their only solution is to declare all data lost, run away, and sell the whole client base to someone like EMC (a quick search in my finance software shows that EMC is the only company in the storage industry with a market cap near $40B, and with Mozy they offer a service comparable to SwissDisk, so it would be a match). Oh, great.
Posted by: Marco | October 25, 2009 at 01:15 PM
@Marco
I hadn't thought to investigate that "$40B" number ... thank you for that.
Posted by: John Kozubik | October 25, 2009 at 06:34 PM
As rsync.net lost some of my data a short while ago, I'm not surprised by the SwissDisk melodrama. However, my understanding of rsync.net's terms was that storage was offered on a best-effort basis. Because of that, I did not rely on them as primary backup and could restore the information they lost with modest effort. Guaranteed data integrity and confidentiality is nearly impossible to actually deliver:
1.) Consider the dollar value of my photos. They're my personal memories, and if lost by a hosting service, I would have a hard time showing monetary damages in a court of law. Even should I succeed in getting a judgment, I would still not have my photos back. Some data has value which cannot be precisely determined, and this is quite a problem because it prevents effective legal recourse should the hosting service renege on their end of the deal.
2.) Force majeure applies to most contracts with a business. If the local terrorists destroy the hosting servers, the hosting service is off the hook. However, I still have to work without the needed data.
3.) Byzantine problems like buggy firmware, etc... can silently corrupt data; worse, if you encrypt your files before storing, it's rather difficult to prove your files were corrupted, vs you simply lost the key.
4.) Unless one encrypts one's files with one's own crypto programs, confidentiality is out. Maybe your hosting service's custom app works correctly; maybe it doesn't leak keys; maybe it doesn't silently corrupt data; etc...
5.) If you don't encrypt, well, then, you can't really say with any certainty that no one else is looking at your data.
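Point 3 above is easier to live with if you fingerprint the ciphertext yourself before upload. A minimal sketch (the names and manifest format here are mine, purely illustrative): a hash mismatch proves the host returned different bytes than you stored, independent of whether you still hold the key.

```python
import hashlib

def fingerprint(blob: bytes) -> str:
    # SHA-256 of the ciphertext, recorded locally *before* upload.
    return hashlib.sha256(blob).hexdigest()

ciphertext = b"\x8f\x1c..."  # output of your own encryption step
manifest = {"backup-2009-10.tar.gpg": fingerprint(ciphertext)}

def verify(name: str, returned: bytes) -> bool:
    # True: the host returned exactly the bytes you uploaded, so any
    # decryption failure is on your side (lost key), not silent corruption.
    return manifest[name] == fingerprint(returned)
```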
Rsync.net doesn't promise the impossible or make outlandish claims about the integrity or confidentiality of the data you put there. Rather to the contrary, they seem forthright and honest about what you can realistically expect from them. Would it be better if they could guarantee data integrity? Of course. But a company that would lie to me about their abilities would likewise try to weasel their way out of being held to those commitments should they lose my data. In short, it's better to be informed than blissfully unaware and unprotected from the inevitable.
Posted by: Nathan | October 25, 2009 at 08:19 PM
Indeed. My company ( http://www.geniedb.com/ ) develops a replicated high-availability database system, and we're happier to be backing up our version control repositories to rsync.net than to some faceless outfit full of marketing speak and buzzwords.
Meanwhile, in my copious free time, I'm working on an sftp backend for a personal project of mine, an archival/backup system called Ugarit ( http://www.kitten-technologies.co.uk/project.php?project=ugarit ); and when I've done that, I'm going to write a backend multiplexer that puts replicas of the backups onto multiple physical backends (possibly distributing the data, too, rather than maintaining full replicas). Because no one backend will ever be 100% reliable, the best you can do is hedge your bets!
Posted by: Alaric Snell-Pym | October 26, 2009 at 04:35 AM
This is all well and good, but when a nearby gamma-ray burst kills all our backups hosted @ rsync.net, I will be a very unhappy customer for the 5-10 days of life I have left.
Posted by: haploid | October 26, 2009 at 09:52 AM
Hello Mr. Kozubik, all. SwissDisk would like to thank you for tracking our service. Despite the unfortunate events of 10/09, we are happy to say that since the system was restored on the new platform, SwissDisk has experienced 100% uptime. The new platform was designed by a team of leading engineers to be one of the most reliable ever built. Further, SwissDisk has created a secure/private backup solution. In the extremely unlikely event of any downtime, we can restore all accounts in a very short period of time. SwissDisk management and engineers are confident that anyone using SwissDisk will experience the ultimate in security and reliability. News:
SwissDisk has recently released our new Multi-User feature. This paid feature allows an account holder to create a folder and assign a unique password to that folder. The account holder can also create a folder named "Shared" that will allow all sub-account users to share files. For more information, please see the Multi-User tab at the top right of your My SwissDisk page. Also note, the SwissDisk site graphics, etc., will be updated very soon. Thank you for sharing your blog; we appreciate honest web reporting and views.
Posted by: mmmbuzz | June 11, 2010 at 11:12 PM