|
Data QC Home
The jumping-off point for the BCD2HDMS Data QC procedures
You are advised to review all instructions on this page prior to beginning any QC processes.
These are the QCs required for BCD2HDMS data conversion (really, these are required for any conversion to HDMS, but this documentation is rather BCD specific).
These QCs are primarily the pure technical QCs--they find records that will fail conversion. Content-quality QC is deliberately not addressed here.
Please note that this document is a work in progress. Your input is welcome. Please send your comments to mailto:product_support@natureserve.org and precede the subject line with "BCD2HDMS: DataQc:" (if you are at NatureServe, or have access to the NatureServe VPN, please add your comment directly to this page at http://jaguar.abi.org/hdmsdoc/DataQc -- you have to log in to get edit access).
First run the Missing Name QCs to save paper
- Go to the MissingSname page and run the QC manually.
- Go to the MissingGname page and run the QC manually.
- Go to the MissingNname page and run the QC manually.
- If you edited (updated) the GNAME, NNAME, or SNAME fields in the ET file, execute REBUILD_SYN_FILE at TCL. See the RebuildTheSynFile page for more information.
You do not have to run these preliminary QCs manually again. Most people using an English Subnational BCD System have no data at all in the NNAME field, and in many cases no data in the GNAME field for records that have not yet been exchanged with NatureServe Home Office. This preliminary step is designed to reduce the inevitable waste of paper resulting from what was previously a "normal" situation.
Main QC procedure
- In the BCD, execute HDMS.QC.PLUS at TCL. While it's running, take a look at what this batch command is checking for you:
| HdmsQcPlusBatchProcess | Compendium of the non-elimination QC procedures (these are QC procedures that report issues to be reviewed). Note that these are not optional QC procedures, but have been separated from the main body of QC procedures because the error reports do not reduce to "zero records" as problems are resolved. |
- When the HDMS.QC.PLUS batch job is completed, "press any key" twice (it'll prompt you). The text file (saved as "hdms\output\hdmsplus.txt") will be opened in Notepad.
| If you want to print the reports from Word instead of using Notepad, see FormatQCReports for formatting directions. | |
For each resulting report, go to the appropriate section in the HdmsQcPlusBatchProcess document to find out how to understand the report, and what to do about the issue/records listed.
- In the BCD, execute HDMS.QC at TCL. While it's running, take a look at what this batch command is taking care of for you:
| HdmsQcBatchProcess | Compendium of the QCs executed by the batch process, including guidance on how to resolve issues. |
- When the batch job is completed, execute HDMS.QC.REPORTS at TCL to generate the resulting reports. When the HDMS.QC.REPORTS batch job is completed, "press any key" twice (it'll prompt you). The text file (saved as "hdms\output\hdmsqc.txt") will be opened in Notepad.
| If you want to print the reports from Word instead of using Notepad, see FormatQCReports for formatting directions. | |
For each report, go to the appropriate section in the HdmsQcBatchProcess document to find out how to understand the report, and what to do about the records listed.
- If you edited (updated) the GNAME, NNAME, or SNAME fields in the ET file, execute REBUILD_SYN_FILE at TCL. See the RebuildTheSynFile page for more information.
- Repeat HDMS.QC until the report is blank.
- Execute SELECT ET WITHOUT {LEN({STATE})=2} to determine whether there are extraneous or improper values in the STATE field. If records are selected, execute LIST ET STATE WITHOUT {LEN({STATE})=2} to see what the improper state values are. The program reads only the first two characters in this field and truncates the rest. Therefore, if there are spaces or extraneous characters before the state abbreviation, incorrect data will be entered or potentially major problems could occur. You need to correct this! All state fields need to be populated: if you have null or incorrect values, correct them by populating the field with the appropriate state abbreviation.
- Manually complete the remaining QC procedures listed below. These are procedures that either have not yet made it into the batch process or do not have a "TCL-callable" interface (that is, they cannot be executed from a batch file).
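The STATE check in the procedure above can also be reproduced outside the BCD, for example on an exported copy of the ET records. The following Python is only a minimal sketch under stated assumptions: the record layout (a mapping of record key to a dict with a `STATE` entry), the sample keys, and the function name `find_bad_states` are all hypothetical and not part of the BCD. It flags every value that is not exactly two characters long, mirroring the LEN({STATE})=2 test.

```python
def find_bad_states(records):
    """Return (key, state) pairs whose STATE value is not exactly two characters.

    Mirrors SELECT ET WITHOUT {LEN({STATE})=2}: null values, values with
    leading spaces, and values with extra characters are all flagged.
    `records` is a hypothetical mapping of record key -> dict of field values.
    """
    bad = []
    for key, fields in records.items():
        state = fields.get("STATE") or ""  # treat a missing/null field as empty
        if len(state) != 2:
            bad.append((key, state))
    return bad


# Hypothetical sample data, for illustration only.
sample = {
    "REC1": {"STATE": "VA"},
    "REC2": {"STATE": " VA"},    # leading space: only " V" would be read
    "REC3": {"STATE": ""},       # null value
    "REC4": {"STATE": "VA,MD"},  # extraneous characters get truncated
}

for key, state in find_bad_states(sample):
    print(f"{key}: {state!r}")
```

Like the TCL statement, this tests length only. A two-character value that is not a real state abbreviation would still pass and has to be caught by review.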
Remaining Manual QC and Configuration Procedures
This section lists the QC procedures that are not executed by the HDMS.QC batch process. You should complete and resolve issues in that batch process before you move on to these procedures (although the order is not technically important).
| | QC Name | Fix Notes |
| _____ | DuplicateGnames | (are they really duplicates--how to tell what will happen during conversion) |
| _____ | NonNumericEonum | Three-part QC/configuration: Failure to perform this process completely could cause serious conversion failure and rejection of affected EOR records. One part of this process used to be run by the HDMS.QC batch command, but was removed because this 3-part QC must be run in the correct order. |
| _____ | ConfigurePrincipalEo | Configuration: if you store EOCODE for principal EOs, here's your chance to have those data migrate to HDMS and link up your Principal and Sub-EOs |
| _____ | AssignOwnerName | Configuration based on data in OWNERCODE: If your ownercode values often refer to specific organizations, here's your chance to have these data pre-populated in HDMS for you--by default, the OWNERCODE value itself goes into the OWNER_NAME field. |
| _____ | UnrecognizedWatershedCodes | |
UNDOCUMENTED PROCESSES
These are listed here so they will not be lost. However, either no specific QC procedures have been developed (document will be updated as new information arrives), or the QC and configuration depend on what data exist in certain fields.
VALID_KEY QC?
There is a real need to select and review records that will not be exported from the BCD--nearly all (if not actually all) exports exclude records that do not have a valid key. Therefore, execute the following on every file: SELECT {bcdfile} WITHOUT VALID_KEY.
The most critical of these is the EOR (SELECT EOR WITHOUT VALID_KEY). Currently, a valid EOR key consists of the following:
10 character elcode
+ "*"
+ any-length-eonum
+ "*"
+ a valid state (a STATE in the STATES table whose NATION value matches the NATION value in the EOR record; this applies if the BCD in question is not an English Subnational BCD, which means it may affect people converting from a multi-state system).
Also make sure that the SOURCECODE field does not contain disallowed characters (SELECT SA WITHOUT VALID_KEY).
Example: U8?MEL31HQUS (question marks not allowed)
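The EOR key structure described above (10-character elcode, "*", eonum, "*", valid state) can be illustrated with a short check. This Python is a sketch, not the BCD's actual VALID_KEY logic. The function name `is_valid_eor_key`, the sample keys, and the `valid_states` set are hypothetical. It assumes the caller has already limited `valid_states` to STATE values whose NATION matches the record, and it assumes an eonum must be non-empty, which the description ("any-length") does not state explicitly.

```python
def is_valid_eor_key(key, valid_states):
    """Check an EOR key of the form ELCODE*EONUM*STATE.

    `valid_states` is a hypothetical set of STATE values already filtered
    to those whose NATION matches the EOR record's NATION.
    """
    parts = key.split("*")
    if len(parts) != 3:
        return False  # must be exactly three "*"-separated parts
    elcode, eonum, state = parts
    # Assumption: eonum may be any length but not empty.
    return len(elcode) == 10 and len(eonum) > 0 and state in valid_states


# Hypothetical keys, for illustration only.
states = {"VA", "MD"}
print(is_valid_eor_key("ABCDE12345*001*VA", states))  # well-formed key
print(is_valid_eor_key("ABCDE1234*001*VA", states))   # elcode too short
print(is_valid_eor_key("ABCDE12345*001*ZZ", states))  # state not in STATES
```

The SOURCECODE check is not sketched here because the full set of disallowed characters is not listed on this page.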
Remove "ghost" values
See FixGhostProblems page--not for the faint of heart!
DATA CLEAN-UP FOR FIELDS NOT CHECKED BY QC ROUTINES
Some possible conversion issues came to light after the batch and manual QC routines listed above were developed. Data in the following fields should be examined and cleaned up if necessary to ensure correct conversion.
BESTSOURCE
Carefully document how the BESTSOURCE and SOURCECODE fields in the EOR file have been used (e.g., if a citation is entered in BESTSOURCE, does it always correspond to the first listed sourcecode?). Based on this information, you and your point of contact can determine the best way to convert these data and what, if any, data clean-up you need to do before conversion to Biotics. See the document conversion.faq at http://whiteoak.abi.org/hdms/SupportDoc/conversion_faq.htm for more details. (From the Biotics Home Page, this link is available under "Learn More" - "About Preparing" - "Frequently Asked Questions about Biotics 4.")
EORANKDATE
In Biotics, EO Rank Date is a "true" date field. It will only accept data that is formatted as a date, with year, month, and day. If you have 'fuzzy' or incomplete dates or other information in EORANKDATE (e.g., 1998-SU, 1994 , NO DATE), they will not convert and the field will be null in Biotics. You must either modify the non-date data or put it in another field. Consult with your point of contact/installer for options on how to handle these data. {NOTE: The FIRSTOBSDATE and LASTOBSDATE fields will accept "fuzzy" dates. Whatever you have in those fields in BCD will display verbatim in Biotics.}
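To find EORANKDATE values that would be lost, you could screen an exported list of values for anything that is not a complete date. The Python below is a minimal sketch under assumptions: the function name `is_complete_date` is hypothetical, and the `YYYY-MM-DD` layout is only a guess based on the examples above. Confirm the actual date format with your point of contact/installer before relying on it.

```python
from datetime import datetime


def is_complete_date(value, fmt="%Y-%m-%d"):
    """Return True if `value` parses as a full year-month-day date.

    The default format is an assumption; pass a different `fmt` if your
    EORANKDATE values use another layout. Values are checked as-is, so
    stray spaces cause a failure.
    """
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


# The "fuzzy" examples from the text above, plus one complete date.
for value in ["1998-06-15", "1998-SU", "1994 ", "NO DATE"]:
    status = "ok" if is_complete_date(value) else "needs review"
    print(f"{value!r}: {status}")
```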
FIELDS THAT MAY NEED SPECIAL ACTIONS
Data you have in any of the following fields will NOT move into Biotics 4 unless you and your installer (a) make a change on the appropriate conversion spreadsheet to map it to a standard field, or (b) set up an extensible field to receive that data. Check to see if these fields contain data you need to keep, and let your point of contact know so the conversion can be configured correctly. You may or may not need to do any further QC on the data.
SREFNAME
SREFNAME is so often used in non-standard ways that it could not be handled in a standard way in data conversion. Determine whether your data matches any standard field in Biotics Tracker. If so, the field must be mapped on the element_tracking_ss.xls spreadsheet. If not, decide whether you want to migrate the data and discuss how with your point of contact.
NREFNAME--Canadian CDCs only
Confirm that data in NREFNAME is the French common name. If so, notify your point of contact to remove the “comment out” symbol (“-) from the lines on the element_tracking_ss.xls spreadsheet that map the symbolic CANADA_COMMON_NAME and NREFNAME to other_natl_common_name and d_language_id, respectively. Comment out the unmapped NREFNAME field. If it's not the French common name, decide whether you want to migrate the data and discuss how with your point of contact.
NREFNAME--other programs
If this field is used, decide if you want to migrate it and discuss how with your data conversion point of contact.
SRANGECOM in PCA or VCA
Data will be taken from ESR*SRANGECOM for the equivalent field in Biotics. If you have data in the PCA or VCA field and want to keep it, move it/reconcile it with the data in ESR. If you do not maintain ESR records, discuss how to proceed with your point of contact. (National programs: same comments apply to NRANGECOM and ENR.)
SMANAGECOM in PCA, ICA, or VCA
There is no comparable Characterization field in the Biotics data model, so the field is currently mapped to internal_notes in the Characterization record. You need to decide if the data should go elsewhere. If so, discuss how to proceed with your point of contact. (National programs: same comments apply to NMANAGECOM.)
SAGENCYSTAT in PCA
If there is data in this field, decide how to map it into the EL_SUBNATL_AGENCY_STATUS table (or elsewhere) and notify your data conversion point of contact. The species_charac_ss.xls spreadsheet will need to be edited. (National programs: same comments apply to NAGENCYSTAT.)
Missing Procedures
- A QC is needed to catch "dangling" or "orphaned" records (e.g., ESR without related ET).
Back to RunningTheDumpForAnNHP
|
|