HDMS Wiki Home

  • DuplicateGnames
  • Last edited by Maggie Woo on Mar 14, 2003 2:00 pm

    Tally to find duplicate GNAME values

    Duplicate GNAME values can cause serious data conversion problems.

    Technically, a duplicate GNAME is OK on its own. What's not OK is having the same GNAME/AUTHOR/NAMEREF combination in more than one ET record. You usually run into this when the GNAME has a value but the other columns do not (in other words, they get the default value), so the records end up with identical combinations.
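
    As a rough illustration of the rule above, here is a minimal Python sketch that groups records by their GNAME/AUTHOR/NAMEREF combination and reports any combination used more than once. The record layout, the sample names, and the empty-string DEFAULT are assumptions made for the example; the real default value and ET structure are not shown on this page.

```python
from collections import defaultdict

# Hypothetical stand-in for the default value an empty column receives.
DEFAULT = ""

def colliding_combinations(records):
    """Return {(gname, author, nameref): [ET keys]} for combinations
    that appear in more than one record."""
    groups = defaultdict(list)
    for key, rec in records.items():
        combo = (
            rec["GNAME"],
            rec.get("AUTHOR") or DEFAULT,
            rec.get("NAMEREF") or DEFAULT,
        )
        groups[combo].append(key)
    return {combo: keys for combo, keys in groups.items() if len(keys) > 1}

# Illustrative records only; not real ET data.
records = {
    "ET1": {"GNAME": "ACER RUBRUM", "AUTHOR": "L.", "NAMEREF": "K94"},
    # Same GNAME as ET1 but a different AUTHOR: allowed.
    "ET2": {"GNAME": "ACER RUBRUM", "AUTHOR": "Smith", "NAMEREF": "K94"},
    # Same GNAME with AUTHOR and NAMEREF both empty: these two collide.
    "ET3": {"GNAME": "QUERCUS ALBA"},
    "ET4": {"GNAME": "QUERCUS ALBA"},
}

print(colliding_combinations(records))
# → {('QUERCUS ALBA', '', ''): ['ET3', 'ET4']}
```

    ET1 and ET2 share a GNAME but pass, because their AUTHOR values differ. ET3 and ET4 fail only because both fall back to the same defaults.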

    QC Procedure:

    1. Execute a Tally on the GNAME field in the ET file. See further below for instructions using KEYFILE.
    2. Open each ET record that has a duplicate GNAME and make certain the records refer to different concepts (i.e., Author, Nameref, and/or classification status (derived from Namesource) is different).
    3. Make a list of the ET records found in the tally (this can sometimes be a real pain, but you can do this using KEYFILE--see further below)
    4. Review the list by running the following:
      • GETLIST ET_GNAME_DUPES_LIST
      • LIST ET CONCEPT_NAME_SYNCODE CONCEPT_REFERENCE BY CONCEPT_NAME_SYNCODE BY CONCEPT_REFERENCE
    5. If any two rows in the list above share the same pair of values (CONCEPT_NAME_SYNCODE and CONCEPT_REFERENCE), then the BCD2HDMS conversion on the Perl side will encounter commit errors (due to unique constraint violations). Resolve the duplicates (see Resolution below) before converting.
    6. Execute REBUILD_SYN_FILE at TCL (see the RebuildTheSynFile page for more information). Repeat the QC until there are no duplicates left.
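
    To show why a repeated CONCEPT_NAME_SYNCODE/CONCEPT_REFERENCE pair ends in a commit error, the sketch below sets up a unique constraint on those two columns and tries to insert the same pair twice. This is only an illustration: the table name, the sample values, and the use of SQLite are assumptions, since the actual HDMS schema and database are not described on this page.

```python
import sqlite3

# In-memory database with a hypothetical table; not the real HDMS schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE concept (
        concept_name_syncode TEXT NOT NULL,
        concept_reference    TEXT NOT NULL,
        UNIQUE (concept_name_syncode, concept_reference)
    )
    """
)

def try_insert(syncode, reference):
    """Insert one row; return False if the unique constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO concept VALUES (?, ?)", (syncode, reference)
            )
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert("S001", "REF-A"),  # first use of the pair: accepted
    try_insert("S001", "REF-B"),  # same syncode, different reference: accepted
    try_insert("S001", "REF-A"),  # repeated pair: rejected
]
print(results)
# → [True, True, False]
```

    Only the exact repeat of both values is rejected, which is why the LIST in step 4 sorts by both columns.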

    Resolution:

    Consolidate Elements that represent the same concept. If some ET records exist to manage "overflow" EOR records, see the NonNumericEonum QC for ways of handling these ET records so that the EO records will be consolidated under the correct Element in the HDMS.


    Using KEYFILE to run a tally (quick instructions)

    Step                                        Example
    Select the records you want to tally on.    SELECT ET
    Execute the keyfile to a temporary file.    KEYFILE ET GNAME ET_GNAME_TALLY (O)
    Select the duplicates for a KEYLIST.        SELECT ET_GNAME_TALLY WITH {COUNT({ET.KEYS}, @VM)} > 0
    Make the list of duplicates.                KEYLIST ET_GNAME_TALLY ET.KEYS ET_GNAME_DUPES_LIST
    View the records in the window.             Open the ET window, then retrieve the saved list ET_GNAME_DUPES_LIST
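
    For readers who don't know KEYFILE, the steps above can be pictured with the following Python sketch. It is not how KEYFILE is implemented, just the same idea under assumed data: build one tally entry per GNAME holding the ET keys that use it, keep the entries with more than one key (my reading of the COUNT of @VM > 0 test, since two keys are separated by one value mark), and flatten those keys into a list. The function names and sample records are hypothetical.

```python
from collections import defaultdict

def tally_keys(records, field):
    """Map each value of `field` to the list of record keys that carry it."""
    tally = defaultdict(list)
    for key, rec in records.items():
        tally[rec[field]].append(key)
    return dict(tally)

def dupes_list(tally):
    """Flatten the keys of every tally entry that holds more than one key."""
    return [key for keys in tally.values() if len(keys) > 1 for key in keys]

# Illustrative records only; not real ET data.
records = {
    "ET1": {"GNAME": "ACER RUBRUM"},
    "ET2": {"GNAME": "QUERCUS ALBA"},
    "ET3": {"GNAME": "ACER RUBRUM"},
}

et_gname_tally = tally_keys(records, "GNAME")
et_gname_dupes_list = dupes_list(et_gname_tally)
print(et_gname_dupes_list)
# → ['ET1', 'ET3']
```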
