This is the first in a series of articles that discuss the benefits of moving content out of SQL Server. This article discusses pairing SQL Server with a file system; the next discusses pairing it with an archive, followed by an entry about using a traditional ECM solution. The series wraps up with an overview of the pros and cons of each approach, including not putting the content there in the first place.
When Microsoft first announced EBS (External BLOB Storage) and RBS (Remote BLOB Storage), it looked like we should all just consider writing file system providers that would take the Binary Large OBjects (BLOBs) from SQL Server and dump them on a local file system. I’m sure some people did just that, but you run out of benefits pretty quickly. You can eke out more value by using virtualized file systems, but the challenges addressed are still pretty limited. The EBS/RBS approach does give us a 100% non-invasive solution, which our SharePoint end users love, but it doesn’t give us everything.
Let’s consider four different file system-based approaches…
Store the unstructured content on a local file system.
To be fair, this does deal with the SQL Server “bloat” issues that many customers seem to be concerned about; SQL backups will become more manageable and SQL will scale to support more centralized deployments. I’ve seen SQL instances shrink down to 20% of their original size when the BLOBs are externalized. It also can improve performance when handling large files because it relieves the IO bottlenecks to/from SQL Server. The problem is that you don’t get any other benefits from externalizing the content. It just moves from SQL Server to the local file system…oh, and your restores just became a little more complicated because the file system backup and the SQL Backup need some ‘synchronicity’.
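The ‘synchronicity’ point can be made concrete with a small check you might run after restoring both backups. This is only a minimal sketch: it assumes you can somehow list the BLOB identifiers that the restored database references and the identifiers actually present in the restored file store. Both lists and the function name `reconcile` are hypothetical and not part of any EBS or RBS API.

```python
def reconcile(referenced_ids, stored_ids):
    """Compare the BLOB ids the database points at with the ids on disk.

    Both arguments are plain iterables of strings; how you obtain them
    depends on your provider and is assumed here.
    """
    referenced = set(referenced_ids)
    stored = set(stored_ids)
    return {
        # Referenced by the database but not found on the file system:
        # the file system backup is older than the SQL backup.
        "missing": sorted(referenced - stored),
        # On the file system but not referenced by the database:
        # the SQL backup is older than the file system backup.
        "orphaned": sorted(stored - referenced),
    }


report = reconcile(["a", "b", "c"], ["b", "c", "d"])
print(report)
```

If both lists in the report are empty, the two restores line up; anything else tells you which side is behind.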
Store the unstructured content on a virtualized file system and perform hardware-level de-duplication.
Imagine that you used the same method as outlined above, but that the file system you wrote the BLOBs to was actually an intelligent storage device capable of de-duplicating those BLOBs. This means that if you have multiple copies of the same objects in SharePoint (common with Microsoft Office documents, especially with versioning switched on) you will reduce your storage requirements. It looks like a nice little bonus saving in disk space, and it is, but be warned that all of your BLOBs are still going to one tier of storage.
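To see why identical copies cost so little on a de-duplicating device, here is a toy illustration of the idea. It is a sketch under stated assumptions: it de-duplicates whole BLOBs by SHA-256 digest in memory, whereas real storage devices often work at the block level, and the class `DedupStore` and its method names are invented for this example.

```python
import hashlib


class DedupStore:
    """Toy in-memory store that keeps one physical copy per distinct BLOB."""

    def __init__(self):
        self.blobs = {}  # digest -> bytes (the single physical copy)
        self.refs = {}   # document id -> digest

    def put(self, doc_id, data):
        digest = hashlib.sha256(data).hexdigest()
        # Only store the bytes if this content has not been seen before.
        self.blobs.setdefault(digest, data)
        self.refs[doc_id] = digest
        return digest

    def get(self, doc_id):
        return self.blobs[self.refs[doc_id]]

    def logical_bytes(self):
        """Size as the application sees it: every document counted."""
        return sum(len(self.blobs[d]) for d in self.refs.values())

    def physical_bytes(self):
        """Size actually stored: each distinct BLOB counted once."""
        return sum(len(b) for b in self.blobs.values())


store = DedupStore()
store.put("report-v1", b"x" * 1000)
store.put("report-v2", b"x" * 1000)  # unchanged version, same bytes
store.put("memo", b"y" * 500)
print(store.logical_bytes(), store.physical_bytes())
```

The two identical versions are counted twice in the logical size but stored once physically, which is the saving described above.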
Store the unstructured content on a virtualized file system and perform software-level HSM.
OK, this time imagine that the file system that you wrote your BLOBs to was actually a file system emulator. It looks like an NFS or CIFS file system, but really it is a piece of software that can take any files and store them behind the scenes on different storage devices. Now you are starting to see some value: you are not just dealing with the SQL Server bloat issues, you are doing real live hierarchical storage management (HSM). The bottom line is that content can be moved from high speed, high availability and high cost storage to less expensive and less performant devices. If the physical storage device does de-duplication then there’s a bonus saving there too.
The downside is that simple file system attributes typically don’t give you enough information to do anything other than the crudest management. You are restricted to rules like “it is more than 6 months old” or “it is a PDF rendition and less than 20K”. That is not exactly 21st century policy enforcement, but the ROI can be attractive.
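To show just how crude such rules are, the two policies quoted above can be written down in a few lines. This is a minimal sketch, not how any particular HSM product works: the tier names, the `FileInfo` type and `choose_tier` are hypothetical, “6 months” is approximated as 180 days, and “20K” is taken to mean 20 × 1024 bytes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from pathlib import PurePath


@dataclass
class FileInfo:
    """The only facts a plain file system gives us about a BLOB."""
    path: str
    size_bytes: int
    modified: datetime


def choose_tier(f, now):
    """Pick a storage tier from file attributes alone (hypothetical tiers)."""
    # Rule 1: "it is more than 6 months old" (assumed to be 180 days).
    if now - f.modified > timedelta(days=180):
        return "archive"
    # Rule 2: "it is a PDF rendition and less than 20K".
    is_pdf = PurePath(f.path).suffix.lower() == ".pdf"
    if is_pdf and f.size_bytes < 20 * 1024:
        return "low-cost"
    return "primary"


now = datetime(2024, 1, 1)
print(choose_tier(FileInfo("contract.docx", 5_000, datetime(2023, 1, 1)), now))
print(choose_tier(FileInfo("preview.PDF", 10_000, datetime(2023, 12, 1)), now))
print(choose_tier(FileInfo("draft.docx", 10_000, datetime(2023, 12, 1)), now))
```

Notice what is missing: nothing here knows who owns the document, what site it came from, or whether it is a record, because the file system simply does not carry that information.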
Store the unstructured content on a virtualized file system and push out to on/off premise cloud storage.
So, 6 months ago I would have said that this idea was pie in the sky…but it turns out that it is actually water vapor in the sky. Imagine that in the scenario painted above some of the tiers of storage were cloud storage devices – on-premise, off-premise or a combination of both. So the content might be stored on high speed local storage for the first 3 months, then for the next 6 months it moves to your on-premise cloud and then it moves to off-premise cloud storage.
You have the same limitations as in the HSM approach above: you don’t have much metadata to work from. But you do have the word “cloud” both in the product description and in your resume. That’s why I’d do it anyway.
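The age-based schedule described for this approach (local storage for the first 3 months, on-premise cloud for the next 6, then off-premise cloud) boils down to a lookup on the age of the content. Here is a minimal sketch, assuming 30-day months; the tier labels and the function `tier_for` are illustrative names, not any vendor’s API.

```python
from datetime import date

# (upper age limit in days, tier name), checked in order.
# Assumes 30-day months: 3 months local, then 6 more months on-premise.
TIERS = [
    (90, "local"),
    (270, "on-premise cloud"),
]
FINAL_TIER = "off-premise cloud"


def tier_for(stored_on, today):
    """Return the tier a BLOB should live on, based only on its age."""
    age_days = (today - stored_on).days
    for limit, name in TIERS:
        if age_days < limit:
            return name
    return FINAL_TIER


stored = date(2024, 1, 1)
for today in (date(2024, 2, 1), date(2024, 6, 1), date(2025, 1, 1)):
    print(today, tier_for(stored, today))
```

Keeping the schedule in a small table rather than in nested conditions makes it easy to add or re-order tiers, which is about as much flexibility as age-only policies allow.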
Conclusion
Using real and/or virtualized file systems with EBS or RBS gives you a solution that is transparent to your SharePoint users and relatively inexpensive to implement, with a well-defined and demonstrable ROI. However, it is very limited: it really only supports HSM with very basic policy management. Companies try to milk an extra mile or so out of this technology by implementing solutions that use extended file system attributes, but there’s only so far you can take these solutions.
Note that writing BLOBs to the file system and replacing the objects with shortcuts in SharePoint is the worst of both worlds: it gives you only the limited benefits of using a file system, along with the downsides of an invasive solution.
What’s next?
In the next post I’ll look at pairing SharePoint with an archive solution. This brings a more holistic, intelligent and cost-effective approach to the issue.