This is the third in a series of articles that discuss the benefits of moving content out of SQL Server. The first discusses pairing SQL Server with a file system and the second with an archive; this one discusses pairing SQL Server with a traditional ECM solution. The series wraps up with an overview of the pros and cons of each approach, including not putting the content there in the first place.
Overview
In the context of SharePoint, performing a low-level integration of SQL Server with an ECM system is not a trivial task, but the benefits can be significant. Let me start by saying that you typically would not want all of your SharePoint content stored in an ECM system, but for the specific types of content where it makes sense, this solution can reap significant rewards.
Let’s recap what a reward is in ECM-land: something that either makes you money, saves you money, or keeps you out of jail. In other words, we are looking for ways to re-purpose content created in SharePoint to make money, for efficiency improvements that save money, and for gains in security and compliance that keep us out of jail. I’ll use this model to analyze the pros of this pairing; later in the entry I’ll discuss the weaknesses of this approach and how it compares to pairing with an archive.
How does it work?
Technically, it works in much the same way as the archiving solution described earlier. In some cases you might store only the unstructured content; in others you might take the structured content as well (blog entries, calendar items, tasks, etc.). The rationale for taking only the unstructured content is that ECM repositories are often used purely as “Document Management” systems, so taking just the documents makes sense. Documents also account for the majority of the storage overhead and are often the most critical business assets.
SharePoint content is stored in the ECM system along with a copy of its metadata, so inside the ECM system you have both the document and its SharePoint context. Once the content is in the ECM system you can re-purpose it, or simply protect it, using the ECM system’s capabilities. Take care when re-purposing the content: SharePoint may ask for it back in the future, and if you have changed or deleted it, that request will fail.
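To make that capture-and-return contract concrete, here is a minimal Python sketch. It rests on a few assumptions. `HypotheticalEcmStore`, `capture`, `repurpose` and `fetch_for_sharepoint` are invented names for illustration. They are not any ECM vendor's API and not how SharePoint itself calls out to external storage. The sketch shows three things:

- The original bytes and the copied SharePoint metadata are kept together.
- Re-purposing works from the stored bytes without altering them.
- A later request from SharePoint fails if the original was modified or deleted.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class StoredItem:
    """A document captured from SharePoint plus a copy of its SharePoint metadata."""
    content: bytes
    metadata: dict
    checksum: str  # SHA-256 of the content at capture time


@dataclass
class HypotheticalEcmStore:
    """Illustrative in-memory stand-in for an ECM repository (not a real API)."""
    items: dict = field(default_factory=dict)

    def capture(self, item_id: str, content: bytes, metadata: dict) -> None:
        # Keep the original bytes and the SharePoint context side by side.
        checksum = hashlib.sha256(content).hexdigest()
        self.items[item_id] = StoredItem(content, dict(metadata), checksum)

    def repurpose(self, item_id: str) -> bytes:
        # bytes are immutable, so callers can build new renditions from this
        # value without altering the stored original that SharePoint relies on.
        return self.items[item_id].content

    def fetch_for_sharepoint(self, item_id: str) -> bytes:
        # SharePoint may ask for the content back at any time.
        item = self.items.get(item_id)
        if item is None:
            raise KeyError(f"{item_id} was deleted; SharePoint retrieval fails")
        # Detect an original that was replaced after capture.
        if hashlib.sha256(item.content).hexdigest() != item.checksum:
            raise ValueError(f"{item_id} was modified after capture")
        return item.content
```

The checksum stands in for whatever integrity or retention controls a real ECM system would apply. The design point is simply that the object SharePoint references must stay byte-for-byte retrievable.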
The following sections illustrate the kinds of things you might do with the content once it has been captured in the ECM system.
Making money
The bottom line is that your SharePoint sites contain a lot of corporate assets: documents that represent the intellectual property your clients pay for. If you can aggregate all of that valuable content into a single location and then index and categorize it, you can re-purpose and reuse it. If you don’t have to start from scratch each time you want to deliver data to clients, you can decrease the cost of developing collateral and increase profits.
As will become clear, the more compelling reasons to pair SharePoint with a traditional ECM system are operational efficiency and compliance, but I do talk to a lot of customers who view re-purposing and reuse as a compelling win.
Saving Money
Operational Efficiency
The file system pairing and archive pairing entries both focused on how to better manage BLOBs once they have been externalized from SQL Server. Depending on your ECM system of choice, all of these efficiencies should be available to you when you use an ECM system as your aggregated repository. These include tiered storage, de-duplication and SQL Server scalability.
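De-duplication is easiest to picture as single-instance, content-addressed storage. Each BLOB is keyed by a hash of its bytes, so identical documents uploaded to different sites are stored once. The following is a generic sketch of that technique under stated assumptions:

- `SingleInstanceStore` is a hypothetical class, not a feature of any particular ECM product.
- SHA-256 is an arbitrary choice of hash.

```python
import hashlib


class SingleInstanceStore:
    """Hypothetical sketch of content-addressed de-duplication:
    identical BLOBs are stored once, however many items reference them."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}  # content digest -> bytes
        self.refs: dict[str, str] = {}     # item id -> content digest

    def put(self, item_id: str, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        # Store the bytes only the first time this digest is seen.
        self.blobs.setdefault(digest, content)
        self.refs[item_id] = digest
        return digest

    def get(self, item_id: str) -> bytes:
        return self.blobs[self.refs[item_id]]

    def bytes_stored(self) -> int:
        # Physical storage consumed, after de-duplication.
        return sum(len(b) for b in self.blobs.values())
```

For example, the same 1,000-byte template uploaded to two sites plus one 10-byte unique file would consume 1,010 bytes rather than 2,010. A real system would also need reference counting before deleting a shared BLOB.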
Integration into existing systems
SharePoint systems tend to be somewhat isolated in corporate environments. It is not Microsoft’s strategy to integrate tightly with the non-Windows systems in your organization, but it is fair to assume that your organization has data and processes outside the SharePoint environment. If your SharePoint content is available from your ECM system, you can use it from within any system that already integrates with your ECM system.
In this model the ECM system becomes the center of your information infrastructure and SharePoint becomes both a contributor and consumer of information just like your other systems. You can start to manage the SharePoint unstructured content in the same way that you manage other content types – directly created ECM content, scanned images, formal records, physical records, web content, etc.
Distributed content
Most ECM systems can distribute content geographically and then serve it to users from the closest available location. This can be critical when bandwidth is limited or content sizes are large. If SharePoint content is stored in your ECM system and then pushed out to remote locations, it can be consumed from the local cache…in theory, anyway!
Long term archiving platform
Even SharePoint content deserves dignity in its old age. If you have specific content that needs to be retained you can keep it in SharePoint and keep the entire SharePoint stack running for the life of your retention period or you can make use of the existing long term archiving policies in your ECM system. In this case you retain the content in the ECM system and decommission the source SharePoint site.
Although you are probably not thinking about this yet, I believe it is one of the most significant wins, second only to tiered storage management in absolute cost savings.
Management efficiency
You are already managing storage, compliance policies, disposition, holds, workflows, transformations, web publishing, long term archives, etc. in your ECM system. Rather than duplicating this effort in SharePoint, you can consolidate it in the ECM system at little or no extra cost.
Staying out of Jail
Many companies view their traditional ECM system as the ‘system of record’, and for that reason alone they want to get critical SharePoint content into it. Once the ECM system has the content, you can leverage existing retention, formal records, data protection and disposition policies. You can also run common audit reporting on that content, which can greatly help in proving that it has been adequately protected.
I’m slightly more skeptical about eDiscovery support from the ECM system alone. In theory it makes sense: you have all of the critical SharePoint information in one place, so why not run discovery against it? In reality you will probably need to mine your content directly from SharePoint if it is still active, for two reasons. First, you may not have absolutely all of the unstructured content in the ECM system, and until you mine the content you don’t know what you don’t have. Second, you may not have all of the context of the object. You might know its metadata, but do you know how it related to the rest of the SharePoint content?
So, what’s the difference between this and case #2 (the archive)?
This is a very common question, and it can be hard to draw a completely clean line between the two. The truth is that there’s a lot of similarity between pairing SharePoint with an archive and pairing it with an ECM system. At first glance the ECM solution seems to fully encompass the benefits of the archive, but in reality the relationship is more of a Venn diagram.
Let’s start with what the ECM solution can do that the Archive might not…
- Re-use – typically an archive is not built to support the re-purpose and reuse use cases. The archive is treated more like a black box with only the archive admin processes, eDiscovery and SharePoint accessing the content.
- Center of existing processes – Many companies have invested heavily in building ILM processes around their ECM systems but not around their archives. They may have some retention and disposition policies in place in the archive, but it would be rare to see something like full-blown workflow.
- Better security and compliance – ECM systems will provide a more comprehensive set of data protection and compliance capabilities, including fully certified formal records management.
- Bridge systems – Archives tend to ingest content and then manage it in a closed environment, whereas ECM systems ingest content and then make it available to other systems. This lets you use the ECM system as a bridge between SharePoint and your other systems.
Now consider when an archive might be optimal…
- Typically ECM systems are not optimized for rapid ingestion of large numbers of objects or for long term archiving/disposition management. I’m not saying that they cannot do this, just that archives are typically designed to scale out further.
- ECM solutions typically store unstructured content very well but may impose an unacceptable overhead when storing structured content (calendar items, tasks, blogs, etc.). Archives should be able to store structured content more efficiently.
- ECM systems are usually extremely feature rich, but that also translates into cost and complexity. Archiving systems tend to be less expensive and easier to maintain, so if you don’t need all of the benefits that ECM brings, an archive pairing might make more sense.
- Both ECM and archiving show a tendency towards data bloat, but it tends to be higher in ECM systems. For example, the ECM system might store multiple renditions, not just multiple versions.
Conclusion
It is fair to be confused about the difference between pairing SharePoint with an archiving system and pairing it with an ECM system. I think it comes down to what you are going to do with the information once it is in your repository. I’d proffer that most archive systems do not expect you to work actively with archived content, whereas ECM systems are designed to hold your most important electronic assets. What an ECM system lets you do to your content includes business process support, transformations, and lifecycles, not just data protection, compliance, and disposition.
Very generally, if your unstructured content is high-value inactive content or active content then an ECM system is probably a better choice. If the unstructured content is fixed or is unlikely to be changed over time then an archive is probably better.
So, how would you classify your SharePoint content: active or fixed? Well, it depends on what you take and when you take it. You’d typically expect an archive solution to take content towards the end of its life. Perhaps the object gets versioned, is moved to a different folder, or has not been accessed for 6 months. It would certainly be appropriate to decommission entire site collections and dump them into an archive. More active content, or content that has a very high value to the organization, is what you might expect to see managed in an ECM system.
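The rule of thumb above can be written as a small routing function. This is a minimal sketch, and several parts of it are assumptions:

- The `ContentItem` fields and `choose_repository` are hypothetical names.
- "6 months" is approximated as 182 days.
- The order of precedence is a policy choice. Here decommissioned site collections go to the archive, high-value content stays in the ECM system even when inactive, and anything else at the end of its life goes to the archive.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: "6 months" approximated as 182 days.
INACTIVITY_THRESHOLD = timedelta(days=182)


@dataclass
class ContentItem:
    """Hypothetical summary of a SharePoint object's lifecycle signals."""
    last_accessed: datetime
    superseded_by_new_version: bool = False
    moved_to_different_folder: bool = False
    site_decommissioned: bool = False
    high_value: bool = False


def choose_repository(item: ContentItem, now: datetime) -> str:
    """Return "archive" or "ecm" using the end-of-life heuristics above."""
    # Whole site collections being decommissioned are archive candidates.
    if item.site_decommissioned:
        return "archive"
    # High-value content, active or inactive, is better served by ECM.
    if item.high_value:
        return "ecm"
    end_of_life = (
        item.superseded_by_new_version
        or item.moved_to_different_folder
        or now - item.last_accessed >= INACTIVITY_THRESHOLD
    )
    return "archive" if end_of_life else "ecm"
```

Run periodically, a rule like this is one way the lifecycle management described next could decide when to move content between back-end repositories.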
Can you use both in parallel? Absolutely. Using lifecycle management, you could move content between back-end repositories whenever it makes sense for financial, regulatory, or performance reasons.
What’s next?
In the next entry I’ll briefly discuss one more alternative approach: if you really have a problem with how SharePoint manages your content, then don’t put it there in the first place. Put it somewhere more appropriate, but use the SharePoint UI to access that content.