Extracting Files From a Corrupt Site Collection Content Database

So here I am sipping away at my new found love of Jasmine Green Tea when the Queue Manager pops on Office Communicator. Kris, we have a Severity A. That usually means system down and someone is filling their pants with brown stuff.

Turns out it was a certain customer in Greece with a failed Microsoft Office SharePoint Server (MOSS) 2007 restore and they really, really, really wanted a file restored. They pretty much went around it the correct way except this particular site collection didn’t come up.

We browsed to the site and got the wonderful 404 HTTP error. Great start. We checked the Central Administration > Application Management > Site > Site Collection List. Sure enough it was there.

Next step to see if it exists at all, so we navigate to;

http://intranet/sites/sitename/_layouts/settings.aspx

That seems to work. It’s a start. If that works lets go ahead and extract the site using our new export and import commands in stsadm to see if we can at least extract the file. We ran;

stsadm -o export -url http://intranet/sites/sitename -filename sitename.bak -includeusersecurity -level 4 -nofilecompression

When you run the export command it creates a nice filename.export.log for you to check and see what files are extracted. So we go ahead and check this file and no deal. No files there, in fact the whole export is about 1MB in size and the site is blank. How odd.

It’s time to get down and dirty with SQL now. *sigh* This is where all the fun starts. Now SQL Server and the TSQL queries aren’t really my area of specialty but I know enough to get myself out of some pretty tight situations.

So lets go ahead an have a look see if the file exists in the Content Database. So we fire up Microsoft SQL Server Management Studio and start a New Query. We’re going to, deep breath, SELECT a Content Database, the SELECT the AllDocs table, look for any files that end .doc and hopefully sort by the Site then by the Document. Using this statement;

USE WSS_Content_0001
SELECT * FROM AllDocs
WHERE LeafName LIKE ‘%.doc’
ORDER BY DirName, LeafName

Searching the actual database we see that this document actually exists but not when we export the site using stsadm via the SharePoint Object Model. Strange.

My next thought was to check the database for any oprhaned files. To perform this with our native tools we run the following command;

stsadm -o databaserepair -url http://intranet/sites/sitename -databasename wss_content_0001

You can also add the deletecorruption switch to go ahead and fix any issues for you, we just want to see if there is indeed an issue so we’ll omit it. Have a look at;

Databaserepair: Stsadm operation (Office SharePoint Server)
http://technet.microsoft.com/en-us/library/cc263282(TechNet.10).aspx

So that command gets run and oddly enough the results show 0. Even stranger. I was getting curious so my next step was to have a wee look and see what our Webs table tells me about the site location. The Webs table keeps a record of ALL your sites, whether they be top level, subsites, Meeting Workspaces or any other site you collect. So I ran the following command;

USE WSS_Content_0001
SELECT Id, SiteId, FullUrl, ParentWebId FROM Webs
WHERE FullUrl LIKE ‘%sitename%’
ORDER BY
FullUrl

I was quite taken a back by what I saw. The FullUrl table return about 20 entries with /sites/sitename with only one of them listing the GUID of the ParentWebId! Ouch! This database is will have to go, but first we need to extract this one file for them to use.

There are methods to extracting the files from the AllDocStreams table. Your actual files are held within the Content column of the AllDocStreams table, so when you download a document you are actually triggering a stored procedure that utilises remnants from the AllDocs, AllDocStreams and AllDocVersions tables.

To extract your files from Content Database you will need a couple of things.

Firstly you’ll need to get a hold of a couple files from the the SQL Server 2000 Resource Kit. I was lucky enough to have a copy laying around in our Library down in Level 1. Once you have that you’ll need to copy two files from the CDRom;

textcopy.exe from \x86\binn directory
ntwdblib.dll
from \x86\system directory

Then using TSQL queries we can extract the file and GUID from the database using the following statement;

USE WSS_Content_0001
SELECT AllDocStreams.Id, AllDocStreams.[Content], AllDocStreams.Size, AllDocs.Version, AllDocs.TimeLastModified, AllDocs.CheckoutUserId,
AllDocs.CheckoutDate, AllDocs.IsCurrentVersion, AllDocs.DirName, AllDocs.LeafName, AllDocs.[Level]
FROM AllDocs INNER JOIN
AllDocStreams ON AllDocStreams.Id = AllDocs.Id AND AllDocs.[Level] = AllDocStreams.[Level]
WHERE (AllDocs.DirName = ‘Shared Documents’) AND (AllDocs.LeafName = ‘example.doc’)

When you enter this query you might return several rows. This will signify that you have several versions of a file. This can be validated by the column isCurrentVersion. If it has a 1 next to it, then this is the latest version. Depending on your versioning setup, you will also get differing entries in the Level column, take note of this for the next step.

So now we’re ready to go and extract the file from the database with the above details. Finally. The juice is running and it’s taken a while to get here, but the customer appreciates we are trying everything to get them out of a pickle. Back to the extraction. This is where textcopy.exe comes into play. A word of warning. We had to create a new user using SQL rights and associate temporary db_owner rights to the WSS_Content_0001 database, you might have to perform the same.

Some background to textcopy.exe to get started, remember, knowledge is king in this industry. The help details of textcopy.exe is;

Copies a single text or image value into or out of SQL Server. The value is a specified text or image ‘column’ of a single row (specified by the “where clause”) of the specified ‘table’.

If the direction is IN (/I) then the data from the specified ‘file’ is copied into SQL Server, replacing the existing text or image value. If the direction is OUT (/O) then the text or image value is copied from
SQL Server into the specified ‘file’, replacing any existing file.

TEXTCOPY [/S [sqlserver]] [/U [login]] [/P [password]] [/D [database]] [/T table] [/C column] [/W"where clause"] [/F file] [{/I | /O}] [/K chunksize] [/Z] [/?]

/S sqlserver The SQL Server to connect to. If ‘sqlserver’ is not specified, the local SQL Server is used.
/U login The login to connect with. If ‘login’ is not specified, a trusted connection will be used.
/P password The password for ‘login’. If ‘password’ is not specified, a NULL password will be used.
/D database The database that contains the table with the text or image data. If ‘database’ is not specified, the default database of ‘login’ is used.
/T table The table that contains the text or image value.
/C column The text or image column of ‘table’.
/W “where clause” A complete where clause (including the WHERE keyword) that specifies a single row of ‘table’.
/F file The file name.
/I Copy text or image value into SQL Server from ‘file’.
/O Copy text or image value out of SQL Server into ‘file’.
/K chunksize Size of the data transfer buffer in bytes. Minimum value is 1024 bytes, default value is 4096 bytes.
/Z Display debug information while running.
/? Display this usage information and exit.

So in our specific example we need to perform the following statement;

textcopy.exe /S SQLServer01 /D wss_content_0001 /T alldocstreams /C content /U tempadminuser /P P@55w0rd /F C:\Temp\example.doc /O /Z /W “where ID=’E0DEB98E-4E02-11DD-A410-E46C56D89593′ and Level=’255′”

This allowed us to extract the example.doc file from the Content Database associated with one of the Web Applications. The customer’s happy, I’m happy, everyone’s happy.

Case closed.

Kristof Kowalski | kristof@kowalski.ms