Hi J.D.,
> Recently our webserver had a problem with a full disk on the root
> partition, where BlueQuartz stores the codb.
Ouch. That's pretty bad. Whenever a Linux box runs into a 100% full / partition
"bad things" happen. For example: a program wants to rewrite a file. It opens
the file for writing, which erases the existing content, and then tries to
write the new data. Then it realizes: "Dang, not enough space!" and the net
result is that the file (or at least all data in it) got destroyed in the
process.
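To make that failure mode concrete, here is a minimal Python sketch. It does not fill up a real disk; the helper `simulate_failed_rewrite` is made up for illustration and simply raises the "no space" error right after the file has been opened for writing, which is the point where the old content is already gone.

```python
import errno
import os
import tempfile

def simulate_failed_rewrite(path, new_data, fail=True):
    """Rewrite `path` the naive way; optionally fail right after the truncate."""
    with open(path, "w") as fh:  # mode "w" truncates the file immediately
        if fail:
            # Stand-in for a full disk; on a real system the error would
            # surface on write/flush/close instead.
            raise OSError(errno.ENOSPC, "No space left on device")
        fh.write(new_data)

# Demo on a throwaway temp file, not on a real CODB file.
demo = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo, "w") as fh:
    fh.write("old but valid content\n")

try:
    simulate_failed_rewrite(demo, "new content\n")
except OSError:
    pass

size_after = os.path.getsize(demo)
print(size_after)  # the old content did not survive the failed rewrite
```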
So it can be assumed that a number of files that were written to while the
disk was 100% full may have been corrupted.
> We currently have around 3800 directories in /usr/sausalito/codb/objects,
> so there should be room for more, but whenever we try to create anything
> new, we get an error like: UNKNOWN ERROR DURING CREATE.
>
> In /var/log/messages we get errors like this as well:
>
> ww1 cced(smd)[28872]: client 7:[48:25081]: CREATE "DnsRecord"
> "mail_server_name" "=" "domain.ca" "type" "=" "MX" "domainname" "="
> "domain.ca" "mail_server_priority" "=" "low" "hostname" "=" ""
> Aug 24 15:08:34 ww1 cced(smd)[28872]: client 7:[48:25081]: CREATE
> DnsRecord failed (-7)
The error message "UNKNOWN ERROR DURING CREATE" and the "-7" status message in
/var/log/messages indicate a problem that I know pretty well.
It's a little complicated to explain, but I'll do my best:
CODB has "Classes" and "Objects".
A "Class" defines a database Object. Like what kind of storage fields (keys)
it has inside and which kind of data (values) they take. For example a
database field of type "ipaddr" will only take values that are valid IP
addresses and nothing else.
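As a rough illustration of such a typed field, a validity check could look like the Python sketch below. This is not CODB's own code: the function name `is_valid_ipaddr` is made up, and I'm assuming for the example that anything Python's `ipaddress` module accepts counts as valid (the real "ipaddr" type may be stricter, for instance IPv4 only).

```python
import ipaddress

def is_valid_ipaddr(value):
    """Return True if `value` parses as an IP address (illustrative check only)."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False

for candidate in ("192.168.1.10", "999.1.1.1", "domain.ca"):
    print(candidate, is_valid_ipaddr(candidate))
```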
When an Object is created, the create action specifies which Class the new
Object belongs to. Even if you write no data at all into the new Object, all the
default storage fields as defined for this Class are created in the new
Object. These can later be populated with whatever data you want to write into
the Object.
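In Python terms the idea is roughly the following. This is only a sketch of the concept, not how CCE implements it. The field names are the ones from your DnsRecord log line; the empty-string defaults and the names `SCHEMA` and `create_object` are assumptions for the example.

```python
# Hypothetical schema: Class name -> {field name: default value}.
SCHEMA = {
    "DnsRecord": {
        "type": "",
        "hostname": "",
        "domainname": "",
        "mail_server_name": "",
        "mail_server_priority": "",
    },
}

def create_object(class_name, values=None):
    """Create an Object with every field of its Class, then apply given values."""
    obj = dict(SCHEMA[class_name])  # all default fields exist from the start
    for key, value in (values or {}).items():
        if key not in obj:
            raise KeyError(f"{key!r} is not a field of Class {class_name}")
        obj[key] = value
    return obj

empty = create_object("DnsRecord")
mx = create_object("DnsRecord", {"type": "MX", "domainname": "domain.ca"})
print(sorted(empty))
print(mx["type"], mx["domainname"])
```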
Now on to your problem:
CODB also has an Index. That's basically a text file which keeps track of which
Object IDs (numbers) are already taken and which ones are free for use. Of
course every Object must have its own unique Object ID. No two Objects may have
the same ID.
Whenever a new Object is created, CCE consults the Index to find the lowest
Object ID that is still free. It then creates the new Object with that ID.
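The lookup can be pictured like this small Python sketch. The function name is made up, and I'm assuming Object IDs start at 1; CCE's real code will look different.

```python
def lowest_free_oid(used):
    """Return the smallest Object ID (starting at 1) that is not in `used`."""
    oid = 1
    while oid in used:
        oid += 1
    return oid

# A set of taken IDs: 1-554 and 570-595.
used = set(range(1, 555)) | set(range(570, 596))
next_oid = lowest_free_oid(used)
print(next_oid)
```

With 1-554 taken, the next new Object gets ID 555, and the gap up to 569 is filled before anything above 595 is handed out.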
You probably see the problem already:
When your / partition was 100% full, the file that contains the list of used
Object IDs (the Index) apparently got messed up. When your GUI then tried to
create new Objects, it found an empty Index (because it got destroyed) and
therefore started to re-use Object IDs which were already in use.
As a result, the Object directories got populated with database fields from
more than one Class. That in turn corrupted those Objects to the point where
CODB can no longer use them, and this causes the "UNKNOWN ERROR DURING CREATE".
The "-7" status message appears whenever CODB tries to update an existing
Object with information and suddenly finds database fields in it which,
according to the Schema for this Class, shouldn't be there.
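Conceptually the check that trips is something like the sketch below. The field lists are illustrative: the DnsRecord names come from your log, while "fullName" and "shell" are made-up stand-ins for leftovers from some other Class that ended up under the same Object ID.

```python
def unexpected_fields(schema_fields, object_fields):
    """Return the fields found in an Object that its Class does not define."""
    return sorted(set(object_fields) - set(schema_fields))

dns_schema = ["type", "hostname", "domainname",
              "mail_server_name", "mail_server_priority"]

# An Object whose ID was re-used: it carries fields from two Classes.
mixed_object = ["type", "hostname", "domainname",
                "mail_server_name", "mail_server_priority",
                "fullName", "shell"]  # hypothetical foreign fields

extras = unexpected_fields(dns_schema, mixed_object)
print(extras)
```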
I hope you could follow me so far.
Now how to fix it?
In short: This is a trainwreck and usually a non-recoverable situation.
If you have a backup copy of /usr/sausalito/codb/ which was taken like a day
before the crash, then you might want to try to use that one instead. But even
then you could run into major inconsistencies like missing users, missing
sites, changed settings and what not. This depends on how many changes were
made through the GUI. Not only by you, but also by siteAdmins and regular
users.
Even a CMUexport / CMUimport may not work, as CMU might be unable to export
the data correctly if CODB is this inconsistent and messed up. In that
case you might have to fall back to a CMUexport that was taken before the
accident.
Is a manual repair of CODB possible? Yes. But is it practical? Probably not.
You said yourself: You've got 3800 database Objects. One would need a good
familiarity with the different CODB Classes and would need to examine all 3800
Objects to make sure each only contains database fields which are expected to
be there according to the Schema fields for that Class. There are no automated
tools available for that and doing it manually could be a herculean task.
Additionally one would have to re-create the CODB Index file from scratch,
which (of all things) is the easiest part.
It's the file /usr/sausalito/codb/codb.oids which contains the Index and in an
example box of mine it looks like this:
1-554,570-595
Which means: Object IDs 1-554 and 570-595 are taken. All others are free.
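If you ever need to read or re-create a line in that format, a sketch in Python could look like this. It only covers the range syntax shown in my example; writing a single ID as a bare number is my assumption, so compare against a known-good codb.oids before trusting the output.

```python
def parse_oids(text):
    """Turn '1-554,570-595' into the set of taken Object IDs."""
    used = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        used.update(range(int(lo), int(hi or lo) + 1))
    return used

def format_oids(used):
    """Collapse a set of Object IDs back into comma-separated ranges."""
    ranges = []
    for oid in sorted(used):
        if ranges and oid == ranges[-1][1] + 1:
            ranges[-1][1] = oid          # extend the current range
        else:
            ranges.append([oid, oid])    # start a new range
    # Assumption: a lone ID is written as a bare number.
    return ",".join(str(lo) if lo == hi else f"{lo}-{hi}" for lo, hi in ranges)

taken = parse_oids("1-554,570-595")
print(len(taken))
print(format_oids(taken))
```

To rebuild the Index you would feed `format_oids` the IDs of the Objects that actually exist on disk, but as said above, that only helps once the Objects themselves are clean.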
Over the years I've run into this issue a few times (as recently as last
Saturday) and usually the quickest way to recover from it is to restore from
the backups.
I'm sorry to report this bad news, but a 100% full / partition can be pretty
destructive. :o(
--
With best regards,
Michael Stauber