Accession passport data basics

Matija Obreza

version 2.3, December 2015
Documentation commit 3f00370354dda8e82047b4f462dd565046351c3d

1. Introduction

This manual contains basic information on commonly used standards for accession documentation and formats for data exchange.

Genesys PGR (Plant Genetic Resources) is a free online global portal accessible at www.genesys-pgr.org that allows the exploration of the world’s crop diversity through a single website. The data published on Genesys follows the Multi-crop Passport Descriptors (MCPD) standard.

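The manual introduces accession documentation in genebanks, FAO WIEWS institute codes, the Multi-crop Passport Descriptors with the Genesys extensions to them, and other relevant standards.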

2. Accession documentation in genebanks

Collections of plant genetic resources in genebanks document at least the following information for every accession:

2.1. Accession number

The accession number is the unique identifier assigned to material as it enters the collection. This identifier is often made up of a prefix, a sequence number, and sometimes a suffix.

The prefix is commonly used to differentiate between different crop collections maintained by a genebank.

Some prefixes used by the IITA genebank
  • TMe Manihot esculenta (Cassava) collection

  • TVSu Vigna subterranea (Bambara groundnut) collection

  • TZm Zea mays (Maize) collection

The sequence number is assigned manually or by a computer system to ensure there are no duplicates. Some institutes prefer to zero-pad the number, as in 00000102.

A suffix allows for differentiating samples of the same original material. A suffix might be used after making a selection from the original accession (e.g. a single seed descent) to be maintained as a separate sample. The exact meaning of the suffix is different for every institute.

Table 1. Example accession numbers
Prefix  Sequence number  Suffix  Accession number
TMe     419              (none)  TMe-419
TVSu    13               (none)  TVSu-13
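
The composition of an accession number can be sketched in a few lines of Python. This is an illustration only, not part of Genesys: the function name format_accession_number is hypothetical, and the hyphen separator and optional zero-padding are assumptions taken from the examples above. Each institute applies its own conventions.

```python
def format_accession_number(prefix, sequence, suffix="", pad=0, sep="-"):
    """Join prefix, sequence number and optional suffix.

    Hypothetical helper: separator and padding conventions differ
    between institutes and are assumptions here.
    """
    number = str(sequence).zfill(pad)  # pad=0 leaves the number unchanged
    parts = [prefix, number]
    if suffix:
        parts.append(suffix)
    return sep.join(parts)


print(format_accession_number("TMe", 419))          # TMe-419
print(format_accession_number("TVSu", 13))          # TVSu-13
print(format_accession_number("TMe", 102, pad=8))   # TMe-00000102
```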

2.2. Other accession identifiers

Material enters a collection through collecting activities, breeding programs or acquisition from other institutes. In each case, the material will already have some kind of identifier assigned by the collector, breeder or other institute.

An accession name is the vernacular name of the material, and is commonly captured by the collector or assigned by the breeder.

2.2.1. Collected material

Genebank accessions obtained through collecting missions should maintain data about the site, date of collection and collector information.

2.2.2. Breeder’s material

Lines developed by an institute’s breeding programs may be included in its collection. Information provided by the breeders should include the pedigree or ancestral information (selection history) of the material, along with names and identifiers used by the breeding program and the codes and names of institutes that developed the material.

2.2.3. Acquisitions

Material coming from other institutes and genebanks must be accompanied by accession passport data as documented in the source genebank.

Country of origin is the country where the material was collected or bred, not the country of the source genebank.

Accession documentation should capture any identifiers provided by the source institute. This data allows for validation and curation of passport data between genebanks and allows researchers to obtain material from either collection.

2.3. Taxonomy

Accession genus, species, species author, subtaxon and subtaxon authority are usually known, but are subject to change after expert identification or change in the taxonomic system.

2.4. Storage and maintenance

Ex situ genebanks maintain plant genetic resources as seed, in the field, in vitro, in cryo or in DNA collections. A single accession may be maintained as several individual inventories or lots. Each inventory follows different management policies and is maintained in different conditions. For example, different inventories may be held in cryo and in vitro, or in base and active collections.

See Accession storage (section 4.1.5) under the MCPD standard for more on how to capture multiple types of storage.

3. FAO WIEWS

The United Nations Food and Agriculture Organization (FAO) maintains the World Information and Early Warning System (WIEWS) on Plant Genetic Resources for Food and Agriculture (PGRFA). WIEWS was established as a worldwide dynamic mechanism to foster information exchange among FAO Member Countries, and as an instrument for the periodic assessment of the state of the world’s PGRFA.

The FAO WIEWS database contains basic information about institutes working with PGRFA. The data includes full names, acronyms, website links and contact information.

Genesys regularly updates the list of institutes from the FAO WIEWS database and makes them accessible at https://www.genesys-pgr.org/wiews/active.

This data cannot be directly managed through Genesys. Changes must be applied to the WIEWS database itself.

3.1. WIEWS Institute Codes

An FAO WIEWS Institute Code consists of the three-letter ISO 3166-1 alpha-3 code of the country where the institute is located, followed by a number (e.g. COL001, USA1004).

The MCPD standard relies on WIEWS codes. The automated import of institute data through this code also allows Genesys to present individual pages for genebanks registered in the FAO WIEWS database.

Table 2. Direct access to genebank pages using a WIEWS code
WIEWS Institute Code  Genesys URL
COL001                https://www.genesys-pgr.org/wiews/COL001
NGA039                https://www.genesys-pgr.org/wiews/NGA039
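
The code format and the URL pattern shown in Table 2 can be combined in a small Python sketch. The function names are hypothetical, and the pattern accepts any number of digits after the country code, which is an assumption: the examples in this manual use three or four. The check covers the format only. It cannot tell whether a code is actually registered in WIEWS.

```python
import re

# Three uppercase letters (ISO 3166-1 alpha-3) followed by digits.
# Accepting one or more digits is an assumption, not a WIEWS rule.
WIEWS_CODE = re.compile(r"[A-Z]{3}[0-9]+")

GENESYS_WIEWS_URL = "https://www.genesys-pgr.org/wiews/"


def is_wiews_code(code):
    """Return True if the text has the shape of a WIEWS institute code."""
    return WIEWS_CODE.fullmatch(code) is not None


def genebank_url(code):
    """Build the Genesys genebank page URL for a WIEWS institute code."""
    if not is_wiews_code(code):
        raise ValueError(f"not a WIEWS institute code: {code!r}")
    return GENESYS_WIEWS_URL + code


print(genebank_url("COL001"))   # https://www.genesys-pgr.org/wiews/COL001
print(is_wiews_code("col001"))  # False
```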

3.2. Obtaining a WIEWS code

A new WIEWS code can be obtained by contacting your National Focal Point or wiews@fao.org.

3.3. Inactive WIEWS codes

The WIEWS code of an institute may change. In that case, the old record is marked as inactive and will refer to the newly assigned code. Genesys will render a message stating that the institute record is archived, and provide a link to the new code:

[Screenshot: notice of an archived institute record]
Figure 1. Notice of an archived institute record at https://www.genesys-pgr.org/wiews/ALB017

4. Multi-crop Passport Descriptors

The Multi-crop Passport Descriptors (MCPD) V.2.1 are an update to the MCPD V.2, which was released in 2012. The MCPD V.2, in turn, was a revision of the first FAO/IPGRI publication, released in 2001, expanded to accommodate emerging needs such as the broader use of GPS tools and the implementation of the International Treaty on Plant Genetic Resources for Food and Agriculture Multilateral System for access and benefit sharing.

The 2001 list, developed jointly by Bioversity International (formerly IPGRI) and FAO, has been widely used as the international standard to facilitate germplasm passport information exchange. These descriptors aim to be compatible with Bioversity’s crop descriptor lists, with the descriptors used for FAO WIEWS and with the Genesys PGR global portal.

For each descriptor, a brief explanation of content, the coding scheme and, in parentheses, a suggested field name are provided to assist in the computerized exchange of this type of data.

The authors of the MCPD recognize that networks or groups of users may further expand the descriptor list to meet their specific needs. As long as these additions allow for easy conversion to the format proposed in MCPD V.2 and V.2.1, basic passport data can be exchanged worldwide in a consistent manner.

4.1. MCPD descriptors

Table 3. MCPD descriptors
Field name Description

PUID

Any persistent unique identifier assigned to the accession so it can be unambiguously referenced at the global level and the information associated with it harvested through automated means. Report one PUID for each accession.

INSTCODE

FAO WIEWS code of the institute where the accession is maintained.

ACCENUMB

Unique identifier of the accession within a genebank.

COLLNUMB

Original identifier assigned by the collector(s) of the sample, normally composed of the name or initials of the collector(s) followed by a number (e.g. FM9909). This identifier is essential for identifying duplicates held in different collections.

COLLCODE

FAO WIEWS code of the institute that collected the sample.

COLLNAME

Name of the institute that collected the sample. This descriptor should only be used if COLLCODE cannot be filled because the FAO WIEWS code for this institute is not available.

COLLINSTADDRESS

Address of the institute that collected the sample. This descriptor should only be used if COLLCODE cannot be filled because the FAO WIEWS code for this institute is not available.

COLLMISSID

Identifier of the collecting mission as used by the collecting institute (e.g. CIATFOR-052, CN426).

GENUS

Genus name for taxon. An initial uppercase letter is required.

SPECIES

Specific epithet portion of the scientific name in lowercase letters.

The abbreviation sp. or spp. is allowed when the exact species name is unknown.

SPAUTHOR

The authority for the species name.

SUBTAXA

A subtaxon can be used to store any additional taxonomic identifier. The following abbreviations are allowed: subsp. (for subspecies); convar. (for convariety); var. (for variety); f. (for form); Group (for cultivar group).

SUBTAUTHOR

The subtaxon authority at the most detailed taxonomic level.

CROPNAME

Common name of the crop (e.g. malting barley, macadamia, maize).

ACCENAME

Either a registered or other designation given to the material received, other than the donor’s accession number (DONORNUMB) or collecting number (COLLNUMB). An initial uppercase letter is required.

ACQDATE

The date on which the accession entered the collection, in the format YYYYMMDD. Missing data (MM or DD) may be indicated with two hyphens or two zeros.

ORIGCTY

3-letter ISO 3166-1 code of the country in which the sample was originally collected (for a landrace, crop wild relative or farmers' variety), bred or selected (for breeding lines, GMOs, segregating populations, hybrids, modern cultivars, etc.).

COLLSITE

Location information below the country level that describes where the accession was collected, preferably in English. This might include the distance in kilometers and direction from the nearest town, village or map grid reference point (e.g. 7km south of Curitiba in the state of Parana).

DECLATITUDE

Latitude expressed in decimal degrees. Positive values are north of the Equator; negative values are south of the Equator (e.g. -44.6975).

DECLONGITUDE

Longitude expressed in decimal degrees. Positive values are east of the Greenwich Meridian; negative values are west of the Greenwich Meridian (e.g. +120.9123).

COORDUNCERT

Uncertainty associated with the coordinates in meters. Leave the value empty if the uncertainty is unknown.

COORDDATUM

The geodetic datum or spatial reference system upon which the coordinates given in decimal latitude and longitude are based (e.g. WGS84, ETRS89, NAD83). The GPS uses the WGS84 datum.

GEOREFMETH

The georeferencing method used (GPS, determined from map, gazetteer or estimated using software). Leave the value empty if georeferencing method is not known.

ELEVATION

Elevation of collecting site expressed in meters above sea level. Negative values are allowed.

COLLDATE

Collecting date of the sample, in the format YYYYMMDD. Missing data (MM or DD) may be indicated with two hyphens or two zeros.

BREDCODE

FAO WIEWS code of the institute that has bred the material. If the holding institute has bred the material, the breeding institute code (BREDCODE) should be the same as the holding institute code (INSTCODE).

BREDNAME

Name of the institute (or person) that bred the material. This descriptor should only be used if BREDCODE cannot be filled because a FAO WIEWS code is not available or applicable.

SAMPSTAT

Biological status of the accession.

ANCEST

Information about pedigree (e.g. Hanna/7*Atlas//Turk/8*Atlas) or other description of ancestral information (e.g. mutation found in Hanna, selection from Irene, cross involving amongst others Hanna and Irene).

COLLSRC

Collecting/acquisition source.

DONORCODE

FAO WIEWS code of the donor institute.

DONORNAME

Name of the donor institute (or person). This descriptor should be used only if DONORCODE cannot be filled because a FAO WIEWS code is not available or applicable.

DONORNUMB

Identifier assigned to an accession by the donor. Follows the ACCENUMB standard.

OTHERNUMB

Any other identifiers known to exist in other collections for this accession. Use the following format: INSTCODE:ACCENUMB;INSTCODE:identifier;… INSTCODE and identifier are separated by a colon : without a space. Pairs of INSTCODE and identifier are separated by a semicolon ; without a space. When the institute is not known, the identifier should be preceded by a colon.

DUPLSITE

FAO WIEWS code of the institute(s) where a safety duplicate of the accession is maintained.

The WIEWS institute code for the Svalbard Global Seed Vault is NOR051.

DUPLINSTNAME

Name of the institute(s) where a safety duplicate of the accession is maintained. This descriptor should be used only if DUPLSITE cannot be filled because a FAO WIEWS code is not available.

STORAGE

Type of germplasm storage. If germplasm is maintained under different types of storage, multiple choices are allowed, separated by a semicolon (e.g. 20;30).

MLSSTAT

The status of an accession with regard to the Multilateral System (MLS) of the International Treaty on Plant Genetic Resources for Food and Agriculture. Leave the value empty if the status is not known.

REMARKS

The remarks field is used to add notes or to elaborate on descriptors with value 99 or 999 (= Other). Prefix remarks with the field name they refer to and a colon (:) without a space (e.g. COLLSRC:riverside). Distinct remarks referring to different fields are separated by a semicolon without a space.
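
Two of the formats in Table 3 lend themselves to a short illustration: the YYYYMMDD dates used by ACQDATE and COLLDATE, and the INSTCODE:identifier pairs of OTHERNUMB. The Python sketch below is one possible reading of those rules, not code used by Genesys. The function names and the sample values are hypothetical, and the date parser does not check that month and day fall within valid ranges.

```python
import re

# Year is required; month and day may be digits or two hyphens.
MCPD_DATE = re.compile(r"([0-9]{4})([0-9]{2}|--)([0-9]{2}|--)")


def parse_mcpd_date(value):
    """Split a YYYYMMDD value into (year, month, day).

    Month and day are None when given as '--' or '00'.
    """
    match = MCPD_DATE.fullmatch(value)
    if match is None:
        raise ValueError(f"expected YYYYMMDD, got {value!r}")

    def known(part):
        return None if part in ("--", "00") else int(part)

    year, month, day = match.groups()
    return int(year), known(month), known(day)


def parse_othernumb(value):
    """Split OTHERNUMB into (INSTCODE, identifier) pairs.

    INSTCODE is None when the identifier is preceded only by a colon.
    """
    pairs = []
    for item in value.split(";"):
        if not item:
            continue
        instcode, colon, identifier = item.partition(":")
        if not colon:
            raise ValueError(f"missing colon in {item!r}")
        pairs.append((instcode or None, identifier))
    return pairs


print(parse_mcpd_date("20151200"))             # (2015, 12, None)
print(parse_mcpd_date("1998----"))             # (1998, None, None)
print(parse_othernumb("NGA039:TVSu-13;:X-1"))  # made-up sample value
```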

4.1.1. Persistent unique identifier

A persistent unique identifier (PUID) is assigned to an accession so it can be unambiguously referenced at the global level and the information associated with it harvested through automated means. One PUID should be reported for each accession.

There are various standards for PUIDs, including DOI, UUID and LSID. The Secretariat of the International Treaty on Plant Genetic Resources for Food and Agriculture is facilitating the assignment of DOI to genetic resources at the accession level (http://www.planttreaty.org/doi).

UUID (Universally unique identifier) is an identifier standard used in software. A UUID is simply a 128-bit value (16 bytes). For human-readable display, many systems use a canonical format of hexadecimal text with inserted hyphen characters. For example:

de305d54-75b4-431b-adb2-eb6b9e546014

The intent of UUIDs is to enable distributed systems to uniquely identify information without significant central coordination. In this context the word unique should be taken to mean "practically unique" rather than "guaranteed unique".

Different variants and versions of UUID exist. Version 4 (Random UUID) is the most commonly used in software.
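
As an illustration, the Python standard library can generate and parse UUIDs. This is a minimal sketch: the variable names are arbitrary, and nothing here implies that Genesys generates its identifiers this way.

```python
import uuid

accession_uuid = uuid.uuid4()     # random (version 4) UUID

print(accession_uuid)             # canonical form: 8-4-4-4-12 hexadecimal digits
print(accession_uuid.version)     # 4
print(len(accession_uuid.bytes))  # 16 bytes, i.e. 128 bits

# The example value shown earlier can be parsed back into a UUID object.
parsed = uuid.UUID("de305d54-75b4-431b-adb2-eb6b9e546014")
print(parsed.version)             # 4
```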

Genebanks not applying a true PUID to their accessions should use, and request recipients to use, the concatenation of INSTCODE, ACCENUMB and GENUS as a globally unique identifier, similar in most respects to a PUID, whenever they exchange information on accessions with third parties (e.g. NOR017:NGB17773:ALLIUM).
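
The concatenation can be sketched as follows. The function name is hypothetical, and both the colon separator and the upper-casing of the genus are assumptions inferred from the example NOR017:NGB17773:ALLIUM.

```python
def fallback_identifier(instcode, accenumb, genus):
    """Concatenate INSTCODE, ACCENUMB and GENUS, separated by colons.

    Upper-casing the genus and using a colon follow the example in the
    text; both are assumptions rather than documented rules.
    """
    return ":".join([instcode, accenumb, genus.upper()])


print(fallback_identifier("NOR017", "NGB17773", "Allium"))
# NOR017:NGB17773:ALLIUM
```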

4.1.2. Crop name

Genesys will read the CROPNAME as provided and attempt to link the name with an existing crop record in Genesys. Genesys currently supports the following crop names:

apple, banana, barley, beans, breadfruit, cassava, chickpea, coconut, cowpea, eggplant, fababean, fingermillet, grasspea, lentil, lettuce, maize, pearlmillet, pigeonpea, potato, rice, sorghum, sunflower, sweetpotato, taro, tomato, wheat, yam

The up-to-date list of crops and their coded names is available at https://www.genesys-pgr.org/c/.

As more data is uploaded to Genesys we will add aliases to crops, making sure that future uploads properly link the accession with the specified crop.

You are encouraged to use the crop names listed above, but more importantly, let helpdesk@genesys-pgr.org know if your crop is not yet listed.

4.1.3. Institute codes in MCPD

Values for INSTCODE, COLLCODE, BREDCODE, DONORCODE and DUPLSITE must be provided as FAO WIEWS codes of institutes.

4.1.4. Biological status of accession

The coding scheme for biological status can be used at two different levels of detail: either as a general code (e.g. 100, 200) or a more specific code (e.g. 110, 120).

Allowed values for SAMPSTAT field
  • 100 Wild

    • 110 Natural

    • 120 Semi-natural/wild

    • 130 Semi-natural/sown

  • 200 Weedy

  • 300 Traditional cultivar/landrace

  • 400 Breeding/research material

    • 410 Breeder’s line

      • 411 Synthetic population

      • 412 Hybrid

      • 413 Founder stock/base population

      • 414 Inbred line (parent of hybrid cultivar)

      • 415 Segregating population

      • 416 Clonal selection

    • 420 Genetic stock

      • 421 Mutant (e.g. induced/insertion mutant, tilling population)

      • 422 Cytogenetic stock (e.g. chromosome addition/substitution, aneuploid, amphiploid)

      • 423 Other genetic stock (e.g. mapping population)

  • 500 Advanced or improved cultivar (conventional breeding methods)

  • 600 GMO (by genetic engineering)

  • 999 Other (elaborate in REMARKS field)
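
The two levels of detail can be related in code: a specific code maps to its general code by rounding down to the nearest hundred. The Python sketch below is an illustration under that assumption. The names SAMPSTAT and general_sampstat are hypothetical, and treating 999 as its own general code is a choice made here, not something the standard states.

```python
SAMPSTAT = {
    100: "Wild", 110: "Natural", 120: "Semi-natural/wild",
    130: "Semi-natural/sown",
    200: "Weedy",
    300: "Traditional cultivar/landrace",
    400: "Breeding/research material", 410: "Breeder's line",
    411: "Synthetic population", 412: "Hybrid",
    413: "Founder stock/base population",
    414: "Inbred line (parent of hybrid cultivar)",
    415: "Segregating population", 416: "Clonal selection",
    420: "Genetic stock", 421: "Mutant", 422: "Cytogenetic stock",
    423: "Other genetic stock",
    500: "Advanced or improved cultivar",
    600: "GMO",
    999: "Other",
}


def general_sampstat(code):
    """Return the general (hundreds-level) code for a SAMPSTAT value."""
    if code not in SAMPSTAT:
        raise ValueError(f"unknown SAMPSTAT code: {code}")
    if code == 999:          # "Other" has no general parent
        return 999
    return code // 100 * 100


print(general_sampstat(412), SAMPSTAT[general_sampstat(412)])
# 400 Breeding/research material
```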

4.1.5. Accession storage

If germplasm is maintained under different types of storage, multiple values are allowed, separated by a semicolon. For example, when seed of an accession is maintained in a medium-term active collection and a long-term base collection, STORAGE corresponds to both 12 and 13 and can be encoded as 12;13.

Allowed values for STORAGE field
  • 10 Seed collection

    • 11 Short term

    • 12 Medium term

    • 13 Long term

  • 20 Field collection

  • 30 In vitro collection

  • 40 Cryopreserved collection

  • 50 DNA collection

  • 99 Other (elaborate in REMARKS field)
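
A semicolon-separated STORAGE value can be split and checked against the list above. The following Python sketch is illustrative only. The names STORAGE and parse_storage are hypothetical.

```python
STORAGE = {
    10: "Seed collection",
    11: "Short term",
    12: "Medium term",
    13: "Long term",
    20: "Field collection",
    30: "In vitro collection",
    40: "Cryopreserved collection",
    50: "DNA collection",
    99: "Other",
}


def parse_storage(value):
    """Split a STORAGE value such as '20;30' into a list of codes."""
    codes = [int(item) for item in value.split(";") if item]
    unknown = [code for code in codes if code not in STORAGE]
    if unknown:
        raise ValueError(f"unknown STORAGE codes: {unknown}")
    return codes


codes = parse_storage("20;30")
print(codes)                              # [20, 30]
print([STORAGE[code] for code in codes])  # ['Field collection', 'In vitro collection']
print(";".join(str(code) for code in codes))  # 20;30
```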

4.2. Genesys extensions to MCPD

Table 4. MCPD extensions
Field name Description

ACCEURL

Accession URL.

AVAILABLE

Indicates current availability of accession for distribution.

HISTORIC

Indicates whether the record represents an accession no longer actively maintained by the genebank.

UUID

Universally unique identifier of the accession record.

4.2.1. Accession URL

ECPGR originally extended the MCPD list with the Accession URL field ACCEURL. The field should contain a direct link to the provider’s online portal where additional data about the accession may be available.

Passport data of IITA’s TDr-3616 yam accession
ACCEURL: http://my.iita.org/accession2/accession/TDr-3616

4.2.2. Accession availability

Genesys allows end-users to request material from holding institutes. Accession records marked as not available in Genesys will be excluded from users’ requests.

In addition to setting the availability flag, genebanks must opt in to allow end-users to request material through Genesys.

4.2.3. Historic records

Accessions are occasionally removed from a collection. This is especially true for pre-bred material and genetic stocks that are maintained by the genebank for a limited period of time. The records about such material must not be deleted from databases, as they can potentially be traced to other collections where the material is still actively maintained.

The holding genebank may want to mark such records by setting the value of the HISTORIC field to true.

Values null (not specified) and false indicate that the record represents an actively managed accession.

Historic accessions cannot be requested through Genesys.
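
The rules for AVAILABLE and HISTORIC described in sections 4.2.2 and 4.2.3 can be combined into one check. The Python sketch below is an interpretation, not the Genesys implementation. The function names are hypothetical, and treating an unspecified AVAILABLE value (None) as available is an assumption: the text only states that records marked as not available are excluded.

```python
def is_active(historic):
    """None (not specified) and False both mean actively managed."""
    return historic is not True


def can_request(available, historic, institute_opted_in):
    """Return True if an accession could be requested through Genesys.

    Assumption: only an explicit False in AVAILABLE excludes a record.
    """
    return bool(
        institute_opted_in
        and is_active(historic)
        and available is not False
    )


print(can_request(True, None, True))    # True
print(can_request(True, True, True))    # False: historic record
print(can_request(False, False, True))  # False: marked as not available
print(can_request(True, False, False))  # False: genebank has not opted in
```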

5. Other relevant standards

5.1. ISO-3166 country codes

The ISO-3166 standard defines codes for the representation of names of countries and their subdivisions. ISO-3166-1 alpha-3 codes are three-letter country codes. Genesys uses http://download.geonames.org/export/dump/countryInfo.txt as the source of ISO-3166 country codes.
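
A file of this kind can be read with a few lines of Python. The sketch below assumes the layout of countryInfo.txt as tab-separated text in which comment lines start with # and the first five columns are ISO, ISO3, ISO-Numeric, fips and Country. To stay self-contained it parses a two-row sample abbreviated to those columns instead of downloading the file. The function name is hypothetical, and this is not how Genesys itself imports the data.

```python
# Abbreviated sample: the real file has more columns and more rows.
SAMPLE = (
    "#ISO\tISO3\tISO-Numeric\tfips\tCountry\n"
    "CO\tCOL\t170\tCO\tColombia\n"
    "NG\tNGA\t566\tNI\tNigeria\n"
)


def read_alpha3_codes(text):
    """Map ISO 3166-1 alpha-3 codes to country names."""
    countries = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue                      # skip blank and comment lines
        columns = line.split("\t")
        countries[columns[1]] = columns[4]
    return countries


print(read_alpha3_codes(SAMPLE))
# {'COL': 'Colombia', 'NGA': 'Nigeria'}
```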

6. Acknowledgements

Special thanks go to Michael Mackay, Angela Marcela Hernandez and Edwin Rojas for their input, feedback and support.

You can contact the author, Matija Obreza, at matija.obreza@croptrust.org.