I just re-read a really interesting technical report from the Microsoft research team that relates to when it is ‘better’ to store BLOBs in SQL vs. externalizing them on to a file system. The report is very well written and is worth plowing through because there’s a lot of good information hidden in there. You can find the source article at http://research.microsoft.com/apps/pubs/default.aspx?id=64525 .
Here’s my brief assessment of the take-away points from my perspective.
- Storing BLOBs in SQL is easy…from an application perspective and from a back-end management perspective. For small files reading and writing to/from SQL is faster than opening a file on a file system.
- Storing BLOBs in SQL doesn’t scale to the levels that customers may require. Large file streaming is faster from the file system.
The report’s objective was to establish what ‘small’ and ‘large’ mean in these scenarios.
Bear in mind that when externalizing SharePoint content we need to balance a few specific factors…
- We want the maximum size of the SQL data store to stay below 200GB if possible.
- We want the data to be in the most efficient location for the conditions.
- We want to make the SQL instance practically manageable from an IT perspective.
The conclusions from the technical report are…
- Files less than 256KB are more efficiently stored in the database
- Files larger than 1MB are more efficiently stored in the file system
- Files between 256KB and 1MB could be stored in either – if in doubt store them on the file system.[i]
How it might affect your decisions when externalizing SharePoint content from SQL.
- If you have very few large (>256KB) files then you might not want to externalize any content from SQL – just leave it alone…assuming the total size doesn’t greatly exceed 200GB.
- If the majority of your BLOBs are large then externalize everything.
- If you have a variety of file sizes then ideally you should filter data based on both the individual file sizes and the need to keep the overall SQL data store total below 200GB.
The conclusion from the report is best summarized in this one paragraph…
“The study indicates that if objects are larger than one megabyte on average, NTFS has a clear advantage over SQL Server. If the objects are less than 256 kilobytes, the database has a clear advantage. Inside this range, it depends on how write intensive the workload is, and the storage age of a typical replica in the system.”
[i] The ambiguity arises because in a new system files between 256KB and 1MB are better stored in SQL but over time as fragmentation occurs the performance of file management in SQL degrades faster than the file system (SQL doesn’t have any de-frag capabilities).
Comments
You can follow this conversation by subscribing to the comment feed for this post.