One of the great things about blogging is that you have to think hard about what you are writing: literally tens of thousands of people might read your blog. Granted, I am lucky if half of my immediate family read this drivel, but conceptually you get the point. Realizing that you cannot retract something once you've written it really does make you think twice about what you say.
Engaging my brain before I speak is not my natural state, in case you wondered, so this is quite a challenge for me.
Case in point - I'm sat on a flight back home typing the "SharePoint-ECM Reference Architecture 2: Loosely Coupled Solution" entry and I realize that I've forgotten to mention one key point in my SharePoint-related ramblings: I am focused primarily on the archiving and de-duplication of unstructured content (PDF files, Word documents, etc.). I am neglecting the oh-so-important structured data types that SharePoint manages - discussion threads, calendar entries, etc. There is no doubt that these structured data types - especially discussion threads - are critical components of a compliant archiving solution.
Given this, I'll do two things:
- Firstly, I'll be explicit when I am referring specifically to archiving, unification and aggregation of unstructured content vs. structured content.
- Secondly, I'll talk to the people from whom I steal all of my ideas and trick them into telling me how to deal with archiving structured data. I'll then post some entries related specifically to that topic.
I'm pretty sure that the core compliance issues are fundamentally the same no matter the data type, but as usual, the devil is in the details...
A challenge with continuously changing types of collaboration information, such as discussion threads or wikis, is determining which event should trigger the migration (or replication) into an ECM store. Wiki pages can undergo very tiny changes, creating a vast array of versions in the repository. Threaded discussions may be stored as separate "records" or as a single record that is updated by appending more comments at the end. All these mechanisms proliferate a lot of similar data that clogs up the store, not to mention the bandwidth consumed in feeding it. The typical wiki mechanism (presumably used in SP) of just saving diffs may have to be added when migrating this kind of data, ideally at the source (to reduce incoming bandwidth) but at the backend if that is the only convenient place for this function.
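To make the "just save diffs" idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: `DiffArchive`, `make_delta` and `apply_delta` are hypothetical names, not part of SharePoint or any ECM product, and a real migration would also need metadata, timestamps and a policy for which events trigger a submit.

```python
from difflib import SequenceMatcher


def make_delta(old: str, new: str) -> list:
    """Build a compact line-based delta that turns `old` into `new`.

    Unchanged runs are stored as ("copy", start, end) ranges into the old
    text; only new or changed lines are stored in full.
    """
    old_lines, new_lines = old.splitlines(), new.splitlines()
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            delta.append(("add", new_lines[j1:j2]))
        # "delete" needs no entry: the removed lines are simply not copied.
    return delta


def apply_delta(old: str, delta: list) -> str:
    """Rebuild the newer version from the older text plus its delta."""
    old_lines = old.splitlines()
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(old_lines[op[1]:op[2]])
        else:
            out.extend(op[1])
    return "\n".join(out)


class DiffArchive:
    """Hypothetical store: one full base version, then deltas only."""

    def __init__(self):
        self.base = None
        self.deltas = []
        self._latest = None

    def submit(self, text: str) -> bool:
        """Archive a version; return False if nothing changed."""
        if self.base is None:
            self.base = self._latest = text
            return True
        if text == self._latest:
            return False  # identical save: nothing is sent or stored
        self.deltas.append(make_delta(self._latest, text))
        self._latest = text
        return True

    def version(self, n: int) -> str:
        """Reconstruct version n (0 is the base) by replaying deltas."""
        text = self.base
        for delta in self.deltas[:n]:
            text = apply_delta(text, delta)
        return text


archive = DiffArchive()
v0 = "Intro\nLine A\nLine B"
v1 = "Intro\nLine A (edited)\nLine B"
archive.submit(v0)
archive.submit(v0)  # a no-op save is skipped
archive.submit(v1)  # only the changed line is stored in full
```

If the delta is computed at the source, only the changed lines cross the wire; computed at the backend, it saves storage but not bandwidth, which is the trade-off described above.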
Posted by: douq millar | 02/26/2008 at 03:00 AM