I'm commenting on this because I believe their failure was due to an unnecessarily complex infrastructure. Of course, this requires a lot of conjecture on my part about an organization I know little about ... but I'm pretty comfortable making some guesses.
It's in vogue these days to build filesystems across a SAN, and then layer an application on top of that SAN platform that deals with data as "objects" in a database, or something resembling a database. All kinds of advantages are then attributed to this infrastructure, from survivability and fault tolerance to speed and latency. And cost. That is, when you look out to the great green future, where the billions of transactions you handle every day from your millions of customers are all realized, the per-unit cost is strikingly low.
It is my contention that, in the context of offsite storage, these models are too complex, and present risks that the end user is incapable of evaluating. I can say this with some certainty, since we have seen that the model presented risks that even the people running it were incapable of evaluating.
This is indeed an indictment of "cloud storage", which may seem odd coming from the proprietor of what seems to be "cloud storage". It makes sense, however, when you consider the very broad range of infrastructure that can be used to deliver "online backup". When you don't have stars in your eyes, and aren't preparing for your IPO filing and the "hockey sticking" of your business model, you can do sensible things like keep regular files on UFS2 filesystems on standalone FreeBSD systems.
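By way of illustration (the hostname and paths below are hypothetical, not a real account), the entire customer-facing surface of such a system is nothing more exotic than file transfer over SSH:

$ # push a local directory to the offsite fileserver
$ rsync -a ~/documents/ user@fileserver.example.net:backups/documents/
$ # the data lands as plain files you can list, copy, and verify with standard tools
$ ssh user@fileserver.example.net ls -l backups/documents

No object layer, no metadata database - just files on a filesystem.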
This is, of course, laughable in the "real world". You couldn't possibly support thousands and thousands of customers around the globe, for nearly a decade, using such an infrastructure. Certainly not without regular interruption and failure.
Except when you can, I guess:
# uptime
12:48PM  up 350 days, 21:34, 2 users, load averages: 0.14, 0.14, 0.16
(a live storage system, with about a thousand users, that I picked at random)
# uptime
2:02PM  up 922 days, 18:38, 1 user, load averages: 0.00, 0.00, 0.00
(another system on the same network)
Further, the consensus might be that rsync.net is not providing "cloud storage". Good. We're not. I would be very happy if the colloquial definition of "cloud storage" did indeed imply very "sophisticated" and "modern" infrastructures, like Amazon S3 and Rackspace Cloud Files and so on. Complex, sprawling webs of fault tolerance and survivability that no team of ten people could possibly understand - and certainly not their customers.
I don't know what we'll end up being called. Offsite fileservers, maybe. Or maybe just "human beings that control computer systems" (instead of the other way around).
I have a lot to say about complexity in computer systems and the conclusions I have drawn about "fault tolerance" and things like the "five nines". I'd like to wait and present it more formally, perhaps even write a book about it. I am forced to address it today, however, when I see the worst aspect of the SwissDisk failure - their own response:
"(SwissDisk) has signed a contractual service agreement with a $40B company. For the first time this will enable SwissDisk to provide our customers with a 99.95 per cent service level agreement. SwissDisk users will enjoy the peace of mind that comes with access to a truely world class, internationally deployed, multi-million dollar, extremely resiliant storage infrastructure and internet network."
Their response to their inability to comprehend and master their own environment is to increase the complexity of that environment. Whatever layers of risk SwissDisk customers could not possibly evaluate before will now be increased by at least one order of magnitude - probably more.
We wish them luck as they climb higher up into the stratosphere of "serious" infrastructure.