This is part two of a thrilling series of entries related to the aggregation of SharePoint content. It relates back to reference architecture #7 and part one in the series.
In the previous entry I rambled on about the “why” behind data aggregation. In this entry I’ll give you a high-level overview of the “how”.
What are my options?
I think it is fair to say that until recently (prior to MOSS 2007 SP1 and SQL Server 2008) you had only three options:
1. Move objects out of SharePoint into an aggregated storage location when appropriate.
2. Do #1 and leave behind a shortcut so the content is not orphaned in SharePoint.
3. Build your deployment infrastructure around the inherent limitations in SharePoint.
Post MOSS 2007 SP1 and SQL Server 2008 you can add two sexy new options to the list.
4. Use Remote Blob Storage (RBS).
5. Use External Blob Storage (EBS).
I am going to briefly describe all five options, and then in the next posting I’ll discuss in more detail the epic battle between the forces of good (EBS) and the forces of evil (RBS). (Only kidding, SQL team; I am sure that RBS’s mother loves it!)
1. Move objects out of SharePoint into an aggregated storage location when appropriate.
The premise here is that you move content from SharePoint into the aggregated store based on some business criterion, for example when an object becomes a formal record. The object goes from being owned by SharePoint to being owned by the aggregated storage device, and it is no longer accessible from SharePoint. In some cases this is fine (formal records management, for example), but it is invasive and typically I would not call it a scalable solution, i.e. it does not really suit all use cases. One other consideration is that you are not really dealing with the underlying challenges that SharePoint's silo-oriented architectural model creates; you are just bypassing them.
Perhaps most importantly, with this solution the technology is driving the deployment model rather than the business needs. Of course it almost goes without saying that the end users hate it; their precious documents just disappear from their SharePoint site, never to be seen again... hardly the answer they were looking for!
2. Do #1 and leave behind a shortcut so the content is not orphaned in SharePoint.
In this case we move content out of SharePoint just like in #1 above, but we leave behind a shortcut/proxy object/stub/pointer/reference object (that's one thing, not five different things; this concept has more aliases than a mafia boss in the witness protection program). The shortcut is a pointer from SharePoint back to the object that was moved into the aggregated store.
When you write it down, see a vendor-led demo, or read the marketing literature, this concept sounds "just peachy". Conceptually, it gets the content out of SharePoint but still lets users access that content from within SharePoint. Wow, what's better than that? Truthfully, the only thing better than that would be a solution that actually worked in reality, not just in marketing ga-ga land. (No offence to marketing people, some of my best friends "work" in marketing.)
Shortcuts sound great, but I have yet to see any vendor really implement a seamless, working shortcut in SharePoint. SharePoint inherently doesn't support this concept, so some actions kinda work but most just don't. SharePoint shortcuts are a bit like DOS/Windows shortcuts: on one level they look like a great idea, but on the other 99 levels they suck. If you want to test this, try routing a shortcut through a workflow and see what actually moves through the workflow (a clue: it is the shortcut object, not the document). Try managing links between documents when one of the target documents gets “shortcutted” (a clue: they break because the object name changes from wibble.doc to something like wibble.doc.aspx). Try updating the object’s metadata from within an Office application when it is represented by a shortcut, etc.
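If you want to see the rename problem in miniature, here is a toy sketch in Python. It is not SharePoint's object model: the `library` dict, the `Document` and `Shortcut` classes, and the archive URL are all made up for illustration, and the `.aspx` suffix simply follows the wibble.doc example above.

```python
from dataclasses import dataclass


@dataclass
class Document:
    name: str
    content: bytes


@dataclass
class Shortcut:
    name: str        # the stub's new name, e.g. "wibble.doc.aspx"
    target_url: str  # where the real document now lives (hypothetical URL)


def replace_with_shortcut(library, name, archive_url):
    """Remove the document from the library and leave a renamed stub behind."""
    library.pop(name)
    stub = Shortcut(name=name + ".aspx", target_url=f"{archive_url}/{name}")
    library[stub.name] = stub
    return stub


# A toy library with one document, plus another document that links to it by name.
library = {"wibble.doc": Document("wibble.doc", b"quarterly numbers")}
links = {"report.doc": "wibble.doc"}

replace_with_shortcut(library, "wibble.doc", "https://archive.example.com/records")

# The link still points at the old name, which no longer exists in the library.
broken = library.get(links["report.doc"]) is None

# Anything that picks the item up from the library (a workflow, say) gets the stub.
workflow_item = library["wibble.doc.aspx"]
```

In this toy model `broken` ends up true and `workflow_item` is the stub rather than the document, which is the same pair of symptoms described above.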
3. Build your deployment infrastructure around the inherent limitations in SharePoint.
More, you want more?
The Oliver Twist approach to SharePoint data management. Just don’t let the SharePoint repository get too full; when it starts to get a bit bloated just crank up a new deployment and start again. Bizarrely enough this is the most common approach that we see today…scary isn’t it?
What's wrong with this approach? Again this is technology driving the deployment model rather than the inherent business need. Also, you are actually making the problem worse because you are creating even more silos.
4. Use Remote Blob Storage (RBS)
The next installment in this series goes into detail on the pros and cons of RBS; I’ll just give you an overview here.
RBS is implemented by SQL Server (SQL Server 2008 and later only); it has nothing to do with SharePoint directly. When you enable RBS, all BLOB streams that SQL Server would normally be compelled to store internally are spewed forth to the file system. There is no selectivity: all BLOBs get up-chucked. Note that because this happens at such a low level, you do not have access to any of the SharePoint-related information about a BLOB. Without accessing the SQL tables directly, you do not know what it is, from whence it came, its security settings in SharePoint, its relationship to other versions, etc.
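To make the "no context" point concrete, here is a tiny Python sketch of the behaviour as described above. It is a model of the idea only, not the RBS API: the row layout, the `externalize_all` helper, and the dict standing in for the file system are all assumptions of mine.

```python
import uuid


def externalize_all(rows, blob_store):
    """Move every BLOB out of the table rows into blob_store, with no selectivity."""
    for row in rows:
        blob_id = uuid.uuid4().hex
        blob_store[blob_id] = row.pop("blob")
        row["blob_id"] = blob_id  # the table keeps only a reference


# Hypothetical table rows: the metadata and the BLOB start out side by side.
rows = [
    {"name": "wibble.doc", "author": "alice", "blob": b"draft contract"},
    {"name": "notes.txt", "author": "bob", "blob": b"hi"},
]
blob_store = {}  # stands in for the file system

externalize_all(rows, blob_store)

# Looking only at the store, all you can see is opaque IDs and bytes.
store_view = sorted(len(data) for data in blob_store.values())
```

Anything that looks only at `blob_store` sees random IDs and raw bytes; the names and authors are still sitting in `rows`, which is the "without accessing the SQL tables directly" problem.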
5. Use External Blob Storage (EBS)
The next installment in this series goes into detail on the pros and cons of EBS; I’ll just give you an overview here.
EBS is implemented by MOSS 2007 (available as a hotfix to MOSS 2007 SP1 and later). The EBS provider lives at the very bottom of the SharePoint stack, just above the interface into SQL Server. Just before the BLOB is passed to SQL Server you have the opportunity to optionally take ownership of the object. The metadata and context related to the object continue into SQL Server, but you get the actual binary object to manage.
Remember, if you elect to take the BLOB then you are also responsible for making sure that you can retrieve it when the EBS provider asks for it (when a user views the object, for example).
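The store-and-retrieve contract described above can be sketched roughly as follows. This is a Python illustration of the idea, not the actual EBS provider interface: the class names, the `min_size` rule for deciding whether to take a BLOB, and the dict standing in for the SQL Server tables are all hypothetical.

```python
import tempfile
import uuid
from pathlib import Path


class ExternalBlobProvider:
    """Hypothetical provider that may take ownership of a BLOB."""

    def __init__(self, root, min_size):
        self.root = Path(root)
        self.min_size = min_size  # made-up rule: only take BLOBs this big or bigger

    def store(self, blob):
        """Return a blob ID if we take the BLOB, or None to decline it."""
        if len(blob) < self.min_size:
            return None
        blob_id = uuid.uuid4().hex
        (self.root / blob_id).write_bytes(blob)
        return blob_id

    def retrieve(self, blob_id):
        """We took it, so we must be able to hand it back on request."""
        return (self.root / blob_id).read_bytes()


class ContentDatabase:
    """Stand-in for the database: always keeps metadata, sometimes the BLOB."""

    def __init__(self, provider):
        self.provider = provider
        self.rows = {}

    def save(self, name, author, blob):
        blob_id = self.provider.store(blob)
        inline = blob if blob_id is None else None
        self.rows[name] = {"author": author, "blob_id": blob_id, "inline": inline}

    def open(self, name):
        row = self.rows[name]
        if row["blob_id"] is not None:
            return self.provider.retrieve(row["blob_id"])
        return row["inline"]


with tempfile.TemporaryDirectory() as root:
    db = ContentDatabase(ExternalBlobProvider(root, min_size=1024))
    db.save("big.doc", "alice", b"x" * 4096)
    db.save("tiny.txt", "bob", b"hello")
    big_is_external = db.rows["big.doc"]["blob_id"] is not None
    tiny_is_external = db.rows["tiny.txt"]["blob_id"] is not None
    round_trip_ok = db.open("big.doc") == b"x" * 4096 and db.open("tiny.txt") == b"hello"
    files_on_disk = len(list(Path(root).iterdir()))
```

The point of the sketch is the division of labour: the metadata always lands in the database, the provider decides per object whether to take the binary, and once it has taken one it is on the hook for handing it back.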
RBS or EBS?
These two technologies have their pros and cons… I’ll go into more detail in the next posting… stay tuned.
BTW - If I have missed any options then post a response and I'll add them...