This is part three of a thrilling series of entries related to the aggregation of SharePoint content. It relates back to reference architecture #7, RBS vs. EBS vs. Content Transfer vs. Shortcuts and An Overview of the Potential Solutions. In the previous entry I rambled on about the different options available to support the aggregation of data behind a SharePoint deployment; however, I left you hanging on for more about two recently added technologies. In this entry I’ll go into more detail about these sexy new options, why you might care and the pros and cons of each. I'll do my best to be technically accurate but bear in mind that I am a manager now so just stringing together non-monosyllabic words is difficult enough for me. My main concern is not the actual implementation details; I am more interested in whether either solution might deal with the inherent business problems.
RBS and EBS Review
The Problem...SharePoint stores everything related to an object in SQL Server. This includes the:
1. Content (PDFs, PPTs, Zip files, etc.),
2. Metadata (the object's title, project number, format, etc.),
3. Context (which site it came from, the folder location, security details, etc.).
#2 and #3 belong in SQL Server because they are represented by structured content. Storing #1 in a database is a travesty of the highest order. I've seen system architects tarred and feathered for doing this. Why? Databases excel at managing lots of ickle bits of data but they suck when it comes to managing large binary objects (called BLOBs - Binary Large OBjects). Go read more about these issues in the Eight Reference Architectures series. I have seen estimates that suggest that up to 96 petabytes of data will be archived from SharePoint instances over the next 5 years - that's 96 quadrillion bytes of data...not into a database, methinks!
The solution...Bottom line, you have to get the binary objects out of the database. Not necessarily out of SharePoint but out of SQL Server. In the previous entry I mentioned 5 ways of doing this but did not dig into options #4 and #5 - RBS and EBS.
RBS and EBS are both pretty new technologies. Given that they have very similar names it is not surprising that people get them confused so here's a primer:
RBS is implemented by SQL Server (only SQL Server 2008 and later); it is nothing to do with SharePoint directly. When you enable RBS, all BLOB streams that SQL Server would normally be compelled to store internally are spewed forth to the file system.
EBS is implemented by MOSS 2007 (available as a hot fix to MOSS 2007 SP1 and later). The EBS provider lives at the very bottom of the SharePoint stack, just above the interface into SQL Server. Just before the BLOB is passed to SQL Server the EBS provider gives your process the opportunity to optionally take ownership of the BLOB. You give SharePoint a token in exchange so it knows how to get the object back from you at a later date.
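To make the token exchange concrete, here is a minimal Python sketch of the idea. It is not the real EBS interface: the class `SketchBlobStore`, the functions `save_document` and `load_document`, and the dictionary standing in for the database are all hypothetical names invented for illustration, and the rule for declining a BLOB (an empty payload) is an arbitrary assumption.

```python
import hashlib
from typing import Dict, Optional


class SketchBlobStore:
    """Hypothetical external store sitting behind a provider."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def store(self, blob: bytes) -> Optional[str]:
        """Take ownership of the BLOB and return a token, or None to decline."""
        if not blob:
            return None  # decline: the caller keeps the BLOB in the database
        token = hashlib.sha256(blob).hexdigest()
        self._blobs[token] = blob
        return token

    def retrieve(self, token: str) -> bytes:
        """Hand the BLOB back when the caller presents its token."""
        return self._blobs[token]


def save_document(doc_id: str, blob: bytes, store: SketchBlobStore,
                  database: Dict[str, dict]) -> None:
    """Offer the BLOB to the store first; keep only the token if it is accepted."""
    token = store.store(blob)
    if token is None:
        database[doc_id] = {"token": None, "content": blob}
    else:
        database[doc_id] = {"token": token, "content": None}


def load_document(doc_id: str, store: SketchBlobStore,
                  database: Dict[str, dict]) -> bytes:
    """Read from the database row, or redeem the token with the store."""
    row = database[doc_id]
    if row["token"] is None:
        return row["content"]
    return store.retrieve(row["token"])
```

The point of the sketch is that the database row ends up holding either the content or the token, never both, and that the store alone knows how to turn a token back into bytes.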
RBS vs. EBS…
There are pros and cons to both approaches and the balance will change over time according to the SharePoint product plans that we know of. Let me spoil the ending for you…I’d recommend EBS today but RBS later as it matures. Here’s the rationale:
Remote BLOB Storage (RBS)
Pros:
- RBS is implemented in SQL Server and is application agnostic. That’s to say, if you turn RBS on then all BLOB objects from any SQL Server-based application will be externalized. If that’s what you want to happen then that’s great but if you need to be able to apply business logic to what is externalized and where it goes then you are severely restricted.
- It is simple – you turn RBS on and the content is simply stored on the local file system. If you have some kind of file system virtualization software in place then you can do some basic management tasks but only based on the file system attributes of the object.
- If you want access to the context and metadata of the object then you are going to have to dip into SQL Server and start hunting down SharePoint-based reference information; Microsoft do not recommend this - in fact they do not publish the DB schema for SharePoint so it would be potentially dangerous.
- The current thinking is that RBS might have more longevity than EBS. It is likely that EBS will fade out of the stack over time – obviously this is not 100% certain but likely.
Cons:
- Getting the content out of SQL Server only solves 5% of the real issues according to 9 of my 10 personalities. Seriously, getting the BLOBs out of SQL Server gives you scalability but it does not deliver any of the IT efficiencies, compliance overlays, or re-purpose/re-use benefits of managing the externalized content.
- Intelligent archiving is the key to getting this right. You need to have the BLOB, the metadata, the context and the ability to manage the object – no less than this. The RBS model only provides the BLOB – no context and no ability to manage the object.
- No business rule mapping…RBS is all or nothing – you get all BLOBs all of the time. EBS is not much better but does support certain rules. For example, in theory you could configure EBS to not externalize content from certain sites or content less than 50KB in size.
- Needs SQL Server 2008 – not a huge deal but a consideration.
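The kind of business rule mentioned above (skip certain sites, skip content under 50 KB) can be sketched as a simple predicate. This is an illustration in Python only: `ExternalizationRules`, its fields and the example site paths are hypothetical, and how a real provider would learn which site a BLOB belongs to is assumed away here.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ExternalizationRules:
    """Hypothetical rule set deciding which BLOBs leave the database."""

    min_size_bytes: int = 50 * 1024           # content smaller than this stays put
    excluded_sites: FrozenSet[str] = frozenset()

    def should_externalize(self, site_url: str, size_bytes: int) -> bool:
        """Return True only if the site is not excluded and the BLOB is big enough."""
        if site_url in self.excluded_sites:
            return False
        return size_bytes >= self.min_size_bytes


rules = ExternalizationRules(excluded_sites=frozenset({"/sites/hr"}))
large_engineering_doc = rules.should_externalize("/sites/eng", 2 * 1024 * 1024)
small_engineering_doc = rules.should_externalize("/sites/eng", 10 * 1024)
large_hr_doc = rules.should_externalize("/sites/hr", 2 * 1024 * 1024)
```

With an all-or-nothing model there is no place to hang a check like this; every BLOB takes the same path regardless of where it came from or how big it is.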
External BLOB Storage (EBS)
Pros:
- EBS is provided by the SharePoint team and although it is lacking in some areas it does understand the context of the BLOB that it exposes. In other words, we do know what the BLOB object is and we can track changes/deletes on the object.
- The architecture allows us to provide an intelligent process for capturing the BLOB and just as importantly for returning the BLOB on demand (i.e. when you want to view it from SharePoint).
- Because we are interacting directly with the SharePoint processes we can perform more intelligent operations. For example, if the BLOB was deleted (with good reason) from the store then we could cascade that delete back up to SharePoint. Same with changes to the object or its status.
- It does not require SQL Server 2008.
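The cascading delete described above can be pictured as a callback from the external store back to the application that owns the item. The Python below is a rough sketch under stated assumptions: `ManagedStore`, `SketchSharePoint` and `remove_item_for` are invented names, and nothing here reflects how SharePoint itself exposes such a hook.

```python
from typing import Callable, Dict


class SketchSharePoint:
    """Hypothetical stand-in for the application that owns the items."""

    def __init__(self) -> None:
        self.items: Dict[str, str] = {}  # token -> item title

    def remove_item_for(self, token: str) -> None:
        """Drop the item whose BLOB no longer exists in the store."""
        self.items.pop(token, None)


class ManagedStore:
    """Hypothetical external store that reports deletes to its owner."""

    def __init__(self, on_delete: Callable[[str], None]) -> None:
        self._blobs: Dict[str, bytes] = {}
        self._on_delete = on_delete

    def put(self, token: str, blob: bytes) -> None:
        self._blobs[token] = blob

    def delete(self, token: str) -> None:
        """Remove the BLOB (say, when a retention period ends) and cascade upward."""
        del self._blobs[token]
        self._on_delete(token)


sharepoint = SketchSharePoint()
store = ManagedStore(on_delete=sharepoint.remove_item_for)
store.put("t1", b"quarterly figures")
sharepoint.items["t1"] = "Report.pdf"
store.delete("t1")  # the item disappears from sharepoint.items as well
```

The same callback shape would serve for changes to the object or its status; only the notification payload differs.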
Cons:
- There are a lot of areas where I would improve EBS but for what we are doing at this point in time the only con is that EBS will probably not survive in the long term. For what it is worth, we have worked with Microsoft to ensure that a transition to RBS in the future would be seamless.
The Bottom Line
The fact that Microsoft have provided mechanisms to allow partners to hook into the underlying storage capabilities of SharePoint is testament to the fact that Microsoft recognize the value that other companies can add to SharePoint. I am often asked whether Microsoft might not just add all of the capabilities of a classic ECM solution to SharePoint - obviously they could but take it from me, they'd be better off focusing on usability, integrations, information worker productivity and nailing the Office integrations - that's their sweet spot. It took us 15 years to build up the suite of ECM functionality that you see today and it was painful!
So what's next?
Not surprisingly we have a set of products that leverage all of the pros of this new architecture and that have been designed to add all of the benefits of classic ECM without taking away anything from the SharePoint user experience. Contact me if you need more information under NDA.
RBS actually sits as a library attached to the application, not as part of the SQL stack. This is definitely not an all-or-nothing prospect; the application is the one that determines if a blob should be stored in SQL Server or via RBS. Also, the file system provider is a demonstration provider, not one that should be used for deployments. RBS is primarily targeted at large scale CAS systems and is in fact a direct replacement of EBS.
Posted by: mike w | 02/06/2009 at 05:20 PM
If you have a framework for implementing an EBS provider, or can suggest a source for one, I'd be very interested in learning more.
Posted by: Mark Gerow | 03/08/2009 at 06:16 PM
Mark,
I'd suggest visiting http://www.codeplex.com/. I know that they have code samples for RBS and they might have something for EBS. Be warned that EBS is not going to be around for too many more years and also Microsoft do not recommend trying to do too much with EBS unless you have a lot of expertise. It is poorly documented and is much more complex to implement than it seems at first.
Andrew
Posted by: Andrew Chapman | 03/09/2009 at 08:27 AM
If you do not know, please do not pretend you know it.
SharePoint 2007 cannot support RBS, so you have to use EBS on SharePoint 2007. What is the point of comparing EBS and RBS on SharePoint 2007? RBS does not work.
Posted by: Jerry Lu | 06/19/2009 at 09:40 PM
Jerry,
Thanks for your lovely comment…
I’m sure that someone will correct me if I am wrong but if you were to run MOSS 2007 on top of SQL Server 2008 and SQL had an RBS provider enabled then the content would be externalized. If that’s all you need then you might be happy however if you want to know what the externalized object is then the RBS provider would have to do a lot of work. It would have to call back over to SharePoint to get the object’s context or derive it from the SQL tables directly (not recommended).
I think that the real question should be whether SharePoint 2010 will be RBS ‘aware’, i.e. will SharePoint 2010 interface directly to RBS or not – just like it does for EBS. See my previous posting about the relationship between SharePoint and RBS… http://nevertalkwhenyoucannod.typepad.com/nevertalk/2009/02/the-truth-behind-rbsis-it-really-all-or-nothing.html
Comments? If I am missing something please jump in, that’s what this forum is for.
Posted by: Andrew Chapman | 06/22/2009 at 05:53 PM
Jerry, Andrew is correct in pointing out that you can make SharePoint 2007 work with SQL 2008 RBS. It's not pretty and you have to change the Content column on the AllDocStreams and AllDocVersions tables from an Image to a VarBinary(max), effectively making your implementation unsupported by Microsoft, but it will work. Both EBS and RBS are available options in SharePoint 2010. Neither will give you much OOB, so you will need to develop a "provider" or leverage a 3rd party solution like StoragePoint. We support EBS today and will support both EBS and RBS for 2010. We already have a working version and it will be available for Beta testing with the release of the public Beta of SharePoint 2010 later this year.
Posted by: JerseyBob | 07/16/2009 at 10:50 PM
The con of only having the file content in the external blob exists for either RBS or EBS.
Posted by: klamerus | 01/12/2010 at 04:03 PM
It would be very useful to have performance information on RBS vs. EBS.
Of course there's going to be different performance based on how the blob contents are stored externally, but when using the same approach, what is the performance difference between using EBS and RBS?
Posted by: klamerus | 01/12/2010 at 04:10 PM
Does EBS support a 'deep copy' with the Move-SPSite PowerShell cmdlet? We have implemented EBS in our product, and when we use the Move-SPSite cmdlet to move a site collection to another content database, the blob ids are successfully moved together with the other content. But the blob data is not moved. Does it work as expected?
The main question is whether the EBS binary data (blobs) should be moved or not when we use Move-SPSite.
Any help will be really appreciated. Thanks in advance.
Posted by: Rostyslav | 10/27/2011 at 06:54 AM