It is EMC World this week and, as usual, we are announcing a multitude of new and updated products. Two of them are relevant to my usual subject matter, and thinking about them raised an interesting observation about the commoditization of certain aspects of data externalization from SharePoint.
The two products are ‘EMC SourceOne for Microsoft SharePoint’ and the next release of ‘Documentum Repository Services for SharePoint’. The former is aimed at removing content from SharePoint and placing it in an archive, both for operational reasons and for long-term preservation of content that is no longer actively accessed from SharePoint. The latter I’ve mentioned before; it is basically a way of connecting SharePoint to Documentum that is invisible to the SharePoint end user.
I’m not known for gratuitously marketing my employer’s products, so you might wonder why I’m bringing this up in my blog. Almost exactly a year ago I wrote a set of articles that focused on externalizing unstructured content from SQL Server. I concluded that there were four primary ways to handle externalized content:
- Dump the unstructured content onto the local file system from SQL Server - this gets the content out of SharePoint and relieves any SQL Server bloat-related issues, but you do not get any additional value out of the practice.
- Externalize all content types and store them in an archive alongside your other archived content - as well as solving the SQL bloat issues this adds significant value with common policy management, long term archiving and retention/disposition/litigation support on the archived content.
- Store unstructured content in your ECM system alongside your existing content - fixes SQL bloat; provides policy enforcement, long term archiving and compliance management plus the ability to re-purpose and reuse the content. Also gives you a natural integration point into other enterprise information systems.
- Put the content in an ECM system and use SharePoint as a portal - an excellent solution for accessing existing ECM content and processes but doesn’t really leverage SharePoint’s native capabilities.
One year on, where are we? Consider a brief analysis of each of the four options today…
1. There are so many solutions for simple file system externalization that some vendors are giving these away for free. The premise is that rather than storing the unstructured BLOBs (Binary Large OBjects) in SQL Server, you use Microsoft-supported APIs (External BLOB Storage, EBS, or Remote BLOB Storage, RBS) to store the BLOBs on a local file server. This reduces DB bloat and helps smooth out the IO flow.
2. Storing your content in an archive works well for inactive content but is typically too invasive for most active content. The reason is that in this model either the content is moved to the archive and is no longer a ‘first class’ SharePoint citizen, or a copy is made in the archive for compliance purposes.
3. If you need the power of a traditional ECM system and want your SharePoint content to play nicely with all of the other corporate assets then this works well. If you just want the content out of SQL Server then options #1 & #2 are probably more cost effective.
4. As long as we have traditional ECM systems, this option will be around. Although it can give you access to the power of the traditional ECM system, you typically lose a lot of the rich functionality of native SharePoint, so it is not for everyone’s data.
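To make option 1 concrete, here is a minimal conceptual sketch of what BLOB externalization amounts to: the database row keeps only metadata and a reference, while the bytes live on the file system. This is not the EBS or RBS API; the class names (`FileSystemBlobStore`, `ContentDatabase`), the content-hash ID scheme and the in-memory dictionary standing in for SQL Server are all assumptions made purely for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


class FileSystemBlobStore:
    """Hypothetical store: keeps BLOB bytes on disk and hands back an opaque ID."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, data: bytes) -> str:
        # Assumption: use a content hash as the BLOB ID; real providers pick their own scheme.
        blob_id = hashlib.sha256(data).hexdigest()
        (self.root / blob_id).write_bytes(data)
        return blob_id

    def get(self, blob_id: str) -> bytes:
        return (self.root / blob_id).read_bytes()


class ContentDatabase:
    """Stand-in for the content database: rows hold metadata plus a BLOB reference."""

    def __init__(self, store: FileSystemBlobStore) -> None:
        self.store = store
        self.rows = {}

    def save_document(self, name: str, data: bytes) -> None:
        # Only the reference and the size are kept in the "database" row.
        self.rows[name] = {"size": len(data), "blob_id": self.store.put(data)}

    def load_document(self, name: str) -> bytes:
        # The caller never sees where the bytes actually live.
        return self.store.get(self.rows[name]["blob_id"])


def demo() -> bytes:
    with tempfile.TemporaryDirectory() as tmp:
        db = ContentDatabase(FileSystemBlobStore(Path(tmp) / "blobs"))
        db.save_document("report.docx", b"example payload")
        return db.load_document("report.docx")
```

The point of the sketch is that the caller of `load_document` cannot tell that the content has left the database, which is why this capability is so easy to fold into a larger product.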
One year ago you could find Microsoft partners touting solutions modeled around any of these options. However, what I noticed about the SourceOne archive product and the new version of Repository Services is that both include BLOB externalization to the file system as a basic piece of their architecture. In fact, I’d argue that in both cases this capability is being undersold because it is just a byproduct of the product’s actual capability.
Both products can externalize content to a file system and then optionally perform their core operations on the content. The SourceOne product externalizes content to the file system and then, based on rules, can archive the same content out to a common repository to be managed alongside archived email and archived file system data (the latter is to be released later this year, I believe). In the Repository Services case, the product has always externalized content to a staging area before it was ingested into Documentum, but in the new release you can write more granular rules to leave some content in that staging area and only have specific content moved into Documentum.
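The "externalize first, then apply rules" pattern described above can be sketched roughly as follows. This is only an illustration of the idea, not how either product expresses its rules: the `Item` fields, the `Rule` shape, the first-match-wins behavior and the target names (`"staging"`, `"archive"`, `"repository"`) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Item:
    name: str
    content_type: str
    days_since_access: int
    location: str = "staging"  # everything starts out externalized to the staging area


@dataclass
class Rule:
    description: str
    matches: Callable[[Item], bool]
    target: str  # where matching content should end up


def apply_rules(items: List[Item], rules: List[Rule]) -> List[Item]:
    """Move each item to the target of the first rule it matches; otherwise leave it staged."""
    for item in items:
        for rule in rules:
            if rule.matches(item):
                item.location = rule.target
                break
    return items


# Hypothetical rules: archive inactive content, send contracts to the ECM repository.
rules = [
    Rule("inactive for over a year", lambda i: i.days_since_access > 365, "archive"),
    Rule("contracts need full ECM management", lambda i: i.content_type == "contract", "repository"),
]

items = apply_rules(
    [
        Item("old-minutes.doc", "document", 400),
        Item("msa.pdf", "contract", 5),
        Item("draft.doc", "document", 3),
    ],
    rules,
)
locations = {item.name: item.location for item in items}
```

Content that matches no rule simply stays where externalization put it, which is the behavior that makes plain file system externalization look like a byproduct rather than a product.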
I wonder which one of the remaining three will become just a product feature in another year…or maybe a new option will emerge. Perhaps the real question will be ‘when will Microsoft fill this architectural gap?’ Contrary to the hype, RBS does not do this in the core product; RBS simply adds another area where the externalization can occur. In fact, for solutions that need to do ‘intelligent’ management of the underlying content, RBS is a much less elegant solution. More on that in a later post perhaps!