Accession passport data basics

Matija Obreza

version 2.0, December 2015
Documentation commit 0585666da5cfcf3210edd9d6975c25cad04620c5

1. Introduction

This manual contains basic information on commonly used standards for accession documentation and formats for data exchange.

Genesys PGR (Plant Genetic Resources) is a free online global portal accessible at www.genesys-pgr.org that allows the exploration of the world’s crop diversity through a single website. The data published on Genesys follows the Multi-crop Passport Descriptors standard.

The manual introduces

2. Acknowledgements

Special thanks go to Michael Mackay, Angela Marcela Hernandez and Edwin Rojas for their input, feedback and support.

You can contact Matija Obreza at matija.obreza@croptrust.org.

3. Accession documentation in genebanks

Collections of PGRFA material in genebanks document at least the following for each accession

A single accession is usually maintained as several individual inventories or lots, where each inventory follows different management policies and is maintained in different conditions (e.g. cryo and in vitro, or base and active collection).

Inventory management is a topic of genebank collection management and is not further described here.

3.1. Accession number

Accession number is the unique identifier assigned to the material as it enters the collection. This identifier generally has three components:

Prefix + Sequence number + Suffix

The prefix is commonly used to differentiate between different crop collections maintained by the genebank.

Some prefixes used by IITA genebank
  • TMe Cassava Manihot esculenta collection

  • TVSu Bambara groundnut Vigna subterranea collection

  • TZm Maize Zea mays collection

The sequence number is assigned manually or by a computer system to ensure there are no duplicates. Some institutes prefer to zero-pad the number (e.g. 00000102).

The suffix allows differentiating samples of the same original material. A suffix might be used after making a selection from the original accession (e.g. a single seed descent) to be maintained as a separate sample. The exact meaning of the suffix is different for every institute.

Table 1. Example accession numbers
Prefix   Sequence number   Suffix   Accession number
TMe      419                        TMe-419
TVSu     13                         TVSu-13
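
The composition of an accession number can be sketched in a few lines of Python. The function name, the hyphen separator (taken from the examples in Table 1) and the optional zero-padding width are assumptions made for illustration; each institute defines its own conventions.

```python
def compose_accession_number(prefix: str, sequence: int, suffix: str = "",
                             separator: str = "-", pad_width: int = 0) -> str:
    """Join prefix, sequence number and optional suffix into one identifier.

    The separator and the padding width are illustrative assumptions;
    genebanks follow their own formatting rules.
    """
    number = str(sequence).zfill(pad_width)  # a width of 0 leaves the number unchanged
    parts = [prefix, number]
    if suffix:
        parts.append(suffix)
    return separator.join(parts)


print(compose_accession_number("TMe", 419))               # TMe-419
print(compose_accession_number("TVSu", 13))               # TVSu-13
print(compose_accession_number("TMe", 102, pad_width=8))  # TMe-00000102
```

Keeping the three components as separate values in the documentation system, and composing the displayed number from them, makes it easier to sort accessions by sequence number.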

3.2. Other accession identifiers

Material enters the collection through collecting missions, from breeding programs, or by acquisition from other institutes. In each case, the material will already have some identifier assigned by the collector, breeder or other institute.

Accession name is the vernacular name of the material and is commonly captured by the collector or assigned by the breeder.

3.2.1. Collected material

Genebank accessions obtained through collecting missions should retain data about the collecting site, the collecting dates and the collector.

3.2.2. Breeders material

Lines developed by breeding programs of the institute may be included in the collection. Information provided by the breeders should include the pedigree or ancestral information (selection history) of the material, along with names and identifiers used by the breeding program and the codes and names of institutes that developed the material.

3.2.3. Acquisitions

Material coming from other institutes and genebanks must be accompanied by accession passport data as documented in the source genebank.

Country of origin is the country where the material was collected or bred, not the country of the source genebank.

Accession documentation should capture any identifiers provided by the source institute. This data allows for validation and curation of passport data between the genebanks and allows researchers to obtain material from either collection.

3.3. Taxonomy

Accession genus, species, species author, subtaxon and subtaxon authority are usually known, but are subject to change after expert identification or change in taxonomic system.

3.4. Storage and maintenance

Ex situ genebanks maintain PGR material as seed, in the field, in vitro, cryo or in DNA collections. Inventories (lots) of one accession may be managed by different methods (e.g. seed and cryo). See Storage in MCPD standard on how to capture multiple types of storage.

4. FAO WIEWS

The World Information and Early Warning System (WIEWS) on Plant Genetic Resources for Food and Agriculture (PGRFA) was established by FAO as a worldwide dynamic mechanism to foster information exchange among Member Countries and as an instrument for the periodic assessment of the State of the World’s PGRFA.

The FAO WIEWS database contains basic information about institutes working with PGRFA. The data includes full names, acronyms, website links and contact information.

Genesys regularly updates the list of institutes from the FAO WIEWS database and makes them accessible at https://www.genesys-pgr.org/wiews/active.

This data cannot be managed directly through Genesys; changes must be applied to the WIEWS database.

4.1. WIEWS Institute Codes

The FAO WIEWS code of an institute consists of the 3-letter ISO 3166-1 alpha-3 code of the country where the institute is located, followed by a number (e.g. COL001, USA1004).

The Multi-Crop Passport Descriptors standard relies on WIEWS codes.

The automated import of institute data allows Genesys to present individual pages for genebanks registered in the FAO WIEWS database.

Table 2. Direct access to genebank pages using WIEWS code
WIEWS Code   Genesys URL
COL001       https://www.genesys-pgr.org/wiews/COL001
NGA039       https://www.genesys-pgr.org/wiews/NGA039
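
The code format from section 4.1 and the URL pattern from Table 2 can be combined in a short Python sketch. The function names are hypothetical, and the check only looks at the shape of the code (three upper-case letters plus digits). It does not verify that the letters are a valid country code or that the institute is registered in WIEWS.

```python
import re

# Three upper-case letters (ISO 3166-1 alpha-3 position) followed by digits.
WIEWS_CODE = re.compile(r"[A-Z]{3}[0-9]+")

# URL prefix as shown in Table 2.
GENESYS_WIEWS_BASE = "https://www.genesys-pgr.org/wiews/"


def is_wiews_code(code: str) -> bool:
    """Check the shape of a code only; this does not prove the institute exists."""
    return WIEWS_CODE.fullmatch(code) is not None


def genesys_institute_url(code: str) -> str:
    """Build the Genesys page address for a WIEWS-style institute code."""
    if not is_wiews_code(code):
        raise ValueError(f"not a WIEWS-style code: {code!r}")
    return GENESYS_WIEWS_BASE + code


print(is_wiews_code("USA1004"))          # True
print(is_wiews_code("col001"))           # False
print(genesys_institute_url("COL001"))   # https://www.genesys-pgr.org/wiews/COL001
```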

4.2. Obtaining a WIEWS code

A new WIEWS INSTCODE can be requested by contacting your country’s National Focal Point or wiews@fao.org.

4.3. Inactive WIEWS codes

The WIEWS code of an institute may change. In that case, the old record is marked as inactive and refers to the newly assigned code. Genesys will display a message that the institute record is archived and provide a link to the new code.

5. Multi-Crop Passport Descriptors

The Multi-crop Passport Descriptors (MCPD V.2.1) is an update to MCPD V.2 which was released in 2012. The MCPD V.2 was a revision of the first FAO/IPGRI publication released in 2001, expanded to accommodate emerging needs, such as the broader use of GPS tools, or the implementation of the International Treaty on Plant Genetic Resources for Food and Agriculture Multilateral System for access and benefit sharing.

This MCPD V.2.1 list is an expansion of the first version of the MCPD; the descriptors and allowed values of the first version form a subset of those in this revision. The 2001 list, developed jointly by Bioversity International (formerly IPGRI) and FAO, has been widely used and is considered the international standard to facilitate germplasm passport information exchange. These descriptors aim to be compatible with Bioversity’s crop descriptor lists, with the descriptors used for the FAO World Information and Early Warning System (WIEWS) on plant genetic resources (PGR), and with the Genesys PGR global portal.

For each multi-crop passport descriptor, a brief explanation of content, coding scheme and, in parentheses, suggested fieldname are provided to assist in the computerized exchange of this type of data.

The authors of the MCPD recognize that networks or groups of users may further expand the MCPD list to meet their specific needs. As long as these additions allow for an easy conversion to the format proposed in MCPD V.2, basic passport data can be exchanged worldwide in a consistent manner.

5.1. MCPD Descriptors

Table 3. MCPD descriptors
Field name Description

PUID

Any persistent, unique identifier assigned to the accession so it can be unambiguously referenced at the global level and the information associated with it harvested through automated means. Report one PUID for each accession.

INSTCODE

FAO WIEWS code of the institute where the accession is maintained.

ACCENUMB

Unique identifier of the accession within a genebank.

COLLNUMB

Original identifier assigned by the collector(s) of the sample, normally composed of the name or initials of the collector(s) followed by a number (e.g. FM9909). This identifier is essential for identifying duplicates held in different collections.

COLLCODE

FAO WIEWS code of the institute collecting the sample.

COLLNAME

Name of the institute collecting the sample. This descriptor should only be used if COLLCODE cannot be filled because the FAO WIEWS code for this institute is not available.

COLLINSTADDRESS

Address of the institute collecting the sample. This descriptor should only be used if COLLCODE cannot be filled because the FAO WIEWS code for this institute is not available.

COLLMISSID

Identifier of the collecting mission used by the Collecting Institute (e.g. CIATFOR-052, CN426).

GENUS

Genus name for taxon. Initial upper case letter required.

SPECIES

Specific epithet portion of the scientific name in lower case letters.

The abbreviation sp. or spp. is allowed when exact species name is unknown.

SPAUTHOR

Provide the authority for the species name.

SUBTAXA

Subtaxon can be used to store any additional taxonomic identifier. The following abbreviations are allowed: subsp. (for subspecies); convar. (for convariety); var. (for variety); f. (for form); Group (for cultivar group).

SUBTAUTHOR

Provide the subtaxon authority at the most detailed taxonomic level.

CROPNAME

Common name of the crop. Example: malting barley, macadamia, maize.

ACCENAME

Either a registered or other designation given to the material received, other than the donor’s accession number (DONORNUMB) or collecting number (COLLNUMB). First letter upper case.

ACQDATE

Date on which the accession entered the collection, in YYYYMMDD format, where YYYY is the year, MM is the month and DD is the day. Missing data (MM or DD) should be indicated with hyphens or 00 [double zero].

ORIGCTY

3-letter ISO 3166-1 code of the country in which the sample was originally collected (e.g. landrace, crop wild relative, farmers' variety), bred or selected (breeding lines, GMOs, segregating populations, hybrids, modern cultivars, etc.).

COLLSITE

Location information below the country level that describes where the accession was collected, preferably in English. This might include the distance in kilometers and direction from the nearest town, village or map grid reference point (e.g. 7 km south of Curitiba in the state of Parana).

DECLATITUDE

Latitude expressed in decimal degrees. Positive values are North of the Equator; negative values are South of the Equator (e.g. -44.6975).

DECLONGITUDE

Longitude expressed in decimal degrees. Positive values are East of the Greenwich Meridian; negative values are West of the Greenwich Meridian (e.g. +120.9123).

COORDUNCERT

Uncertainty associated with the coordinates in meters. Leave the value empty if the uncertainty is unknown.

COORDDATUM

The geodetic datum or spatial reference system upon which the coordinates given in decimal latitude and longitude are based (e.g. WGS84, ETRS89, NAD83). The GPS uses the WGS84 datum.

GEOREFMETH

The georeferencing method used (GPS, determined from map, gazetteer, or estimated using software). Leave the value empty if georeferencing method is not known.

ELEVATION

Elevation of collecting site expressed in meters above sea level. Negative values are not allowed.

COLLDATE

Collecting date of the sample, in YYYYMMDD format, where YYYY is the year, MM is the month and DD is the day. Missing data (MM or DD) should be indicated with hyphens or 00 [double zero].

BREDCODE

FAO WIEWS code of the institute that has bred the material. If the holding institute has bred the material, the breeding institute code (BREDCODE) should be the same as the holding institute code (INSTCODE). Follows INSTCODE standard.

BREDNAME

Name of the institute (or person) that bred the material. This descriptor should only be used if BREDCODE cannot be filled because the FAO WIEWS code for this institute is not available.

SAMPSTAT

Biological status of the accession.

ANCEST

Information about either pedigree or other description of ancestral information (e.g. parent variety in case of mutant or selection). For example, a pedigree "Hanna/7*Atlas//Turk/8*Atlas" or a description such as "mutation found in Hanna", "selection from Irene" or "cross involving amongst others Hanna and Irene".

COLLSRC

Collecting/acquisition source

DONORCODE

FAO WIEWS code of the donor institute. Follows INSTCODE standard.

DONORNAME

Name of the donor institute (or person). This descriptor should be used only if DONORCODE cannot be filled because FAO WIEWS code for this institute is not available.

DONORNUMB

Identifier assigned to an accession by the donor. Follows ACCENUMB standard.

OTHERNUMB

Any other identifiers known to exist in other collections for this accession. Use the following format: INSTCODE:ACCENUMB;INSTCODE:identifier;… INSTCODE and identifier are separated by a colon (:) without space. Pairs of INSTCODE and identifier are separated by a semicolon (;) without space. When the institute is not known, the identifier should be preceded by a colon.

DUPLSITE

FAO WIEWS code of the institute(s) where a safety duplicate of the accession is maintained.

The WIEWS institute code for Svalbard Global Seed Vault is NOR051.

DUPLINSTNAME

Name of the institute where a safety duplicate of the accession is maintained.

STORAGE

Type of germplasm storage. If germplasm is maintained under different types of storage, multiple choices are allowed, separated by a semicolon (e.g. 20;30).

MLSSTAT

The status of an accession with regards to the Multilateral System (MLS) of the International Treaty on Plant Genetic Resources for Food and Agriculture. Leave the value empty if the status is not known.

REMARKS

The remarks field is used to add notes or to elaborate on descriptors with value 99 or 999 (= Other). Prefix remarks with the field name they refer to and a colon (:) without space (e.g. COLLSRC:riverside). Distinct remarks referring to different fields are separated by semicolon without space.
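
Two of the encodings in Table 3 lend themselves to a short Python sketch: the dates with missing parts (ACQDATE, COLLDATE) and the colon/semicolon pairs used by OTHERNUMB and REMARKS. The function names are hypothetical, dates are assumed to be written as eight characters (YYYYMMDD), and the sample values other than COLLSRC:riverside are made up for illustration.

```python
from typing import List, Optional, Tuple


def parse_mcpd_date(value: str) -> Tuple[int, Optional[int], Optional[int]]:
    """Split a YYYYMMDD string into (year, month, day); missing parts become None."""
    if len(value) != 8 or not value[:4].isdigit():
        raise ValueError(f"expected YYYYMMDD, got {value!r}")

    def part(text: str, upper: int) -> Optional[int]:
        if text in ("--", "00"):          # both markers mean "not known"
            return None
        if not text.isdigit() or not 1 <= int(text) <= upper:
            raise ValueError(f"invalid date part {text!r} in {value!r}")
        return int(text)

    return int(value[:4]), part(value[4:6], 12), part(value[6:8], 31)


def parse_pairs(value: str) -> List[Tuple[Optional[str], str]]:
    """Split 'KEY:text;KEY:text' into (key, text) tuples; a missing key becomes None."""
    pairs = []
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue
        key, colon, text = item.partition(":")  # split on the first colon only
        if colon:
            pairs.append((key or None, text))
        else:
            pairs.append((None, item))          # assumption: no colon means no key
    return pairs


print(parse_mcpd_date("201503--"))           # (2015, 3, None)
print(parse_pairs("NGA039:TVSu-13;:X-77"))   # [('NGA039', 'TVSu-13'), (None, 'X-77')]
print(parse_pairs("COLLSRC:riverside"))      # [('COLLSRC', 'riverside')]
```

The date check only looks at each part on its own and does not test whether the day exists in the given month.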

5.1.1. Persistent unique identifier

Any persistent, unique identifier assigned to the accession so it can be unambiguously referenced at the global level and the information associated with it harvested through automated means. Report one PUID for each accession.

There are various "types" of PUIDs: DOI, UUID, LSID, etc.

The Secretariat of the ITPGRFA is facilitating the assignment of DOI to PGRFA at the accession level (http://www.planttreaty.org/doi).

UUID (Universally unique identifier) is an identifier standard used in software. A UUID is simply a 128-bit value (16 bytes).

For human-readable display, many systems use a canonical format using hexadecimal text with inserted hyphen characters. For example:

de305d54-75b4-431b-adb2-eb6b9e546014

The intent of UUIDs is to enable distributed systems to uniquely identify information without significant central coordination. In this context the word unique should be taken to mean "practically unique" rather than "guaranteed unique".

Different variants and versions of UUID exist. Version 4 (Random UUID) is most commonly used in software.
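
As an illustration, the Python standard library module uuid can generate a random (version 4) UUID and read the example shown above. This is only a sketch of how a UUID might be produced; the manual does not prescribe any particular tool.

```python
import uuid

new_id = uuid.uuid4()        # random (version 4) UUID
print(new_id)                # canonical text form, 36 characters including hyphens
print(len(new_id.bytes))     # 16 bytes, i.e. 128 bits

# The example value from the text parses as a version 4 UUID.
example = uuid.UUID("de305d54-75b4-431b-adb2-eb6b9e546014")
print(example.version)       # 4
```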

Genebanks not applying a true PUID to their accessions should use, and request recipients to use, the concatenation of INSTCODE, ACCENUMB, and GENUS as a globally unique identifier similar in most respects to the PUID whenever they exchange information on accessions with third parties (e.g. NOR017:NGB17773:ALLIUM).
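
The concatenation can be sketched as follows. The function name is hypothetical, and writing the genus in upper case is an assumption based on the example above.

```python
def fallback_identifier(instcode: str, accenumb: str, genus: str) -> str:
    """Concatenate INSTCODE, ACCENUMB and GENUS with colons.

    Upper-casing the genus mirrors the example NOR017:NGB17773:ALLIUM
    and is an assumption, not a documented rule.
    """
    return ":".join([instcode, accenumb, genus.upper()])


print(fallback_identifier("NOR017", "NGB17773", "Allium"))  # NOR017:NGB17773:ALLIUM
```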

5.1.2. Crop name

Genesys will read the CROPNAME as provided and attempt to link the name with an existing crop record in Genesys. Genesys currently supports the following crop names:

apple, banana, barley, beans, breadfruit, cassava, chickpea, coconut, cowpea, eggplant, fababean,
fingermillet, grasspea, lentil, lettuce, maize, pearlmillet, pigeonpea, potato, rice, sorghum,
sunflower, sweetpotato, taro, tomato, wheat, yam

The up-to-date list of crops and their coded names is available at https://www.genesys-pgr.org/c/

As more data is uploaded to Genesys we will add aliases to crops, making sure that future uploads properly link the accession with the specified crop.

You are encouraged to use the crop names listed above, but more importantly, let helpdesk@genesys-pgr.org know if your crop is not yet listed.

5.1.3. Institute codes in MCPD

Values for INSTCODE, COLLCODE, BREDCODE, DONORCODE and DUPLSITE must be provided as FAO WIEWS codes of institutes.

5.1.4. Biological status of accession

The coding scheme proposed can be used at 2 different levels of detail: either by using the general codes such as 100, 200, 300, 400, or by using the more specific codes such as 110, 120, etc.

Allowed values for SAMPSTAT field
  • 100 Wild

    • 110 Natural

    • 120 Semi-natural/wild

    • 130 Semi-natural/sown

  • 200 Weedy

  • 300 Traditional cultivar/landrace

  • 400 Breeding/research material

    • 410 Breeder’s line

    • 411 Synthetic population

    • 412 Hybrid

    • 413 Founder stock/base population

    • 414 Inbred line (parent of hybrid cultivar)

    • 415 Segregating population

    • 416 Clonal selection

    • 420 Genetic stock

    • 421 Mutant (e.g. induced/insertion mutants, tilling populations)

    • 422 Cytogenetic stocks (e.g. chromosome addition/substitution, aneuploids, amphiploids)

    • 423 Other genetic stocks (e.g. mapping populations)

  • 500 Advanced or improved cultivar (conventional breeding methods)

  • 600 GMO (by genetic engineering)

  • 999 Other (Elaborate in REMARKS field)
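
The two levels of detail can be illustrated with a small Python sketch that maps a specific code to its general code. The names are hypothetical; the set of codes is copied from the list above, and 999 (Other) is kept as it is because it has no general level.

```python
# Allowed SAMPSTAT values, copied from the list above.
SAMPSTAT_CODES = {
    100, 110, 120, 130,
    200,
    300,
    400, 410, 411, 412, 413, 414, 415, 416, 420, 421, 422, 423,
    500,
    600,
    999,
}


def general_sampstat(code: int) -> int:
    """Return the general (hundreds) level of a SAMPSTAT code."""
    if code not in SAMPSTAT_CODES:
        raise ValueError(f"not an allowed SAMPSTAT value: {code}")
    if code == 999:                  # "Other" has no general level
        return code
    return code // 100 * 100


print(general_sampstat(412))  # 400
print(general_sampstat(110))  # 100
print(general_sampstat(300))  # 300
```

Reducing codes to the general level is useful when comparing data sets that were documented at different levels of detail.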

5.1.5. Accession storage

If germplasm is maintained under different types of storage, multiple values are allowed. When an accession is maintained in active and base collections, STORAGE corresponds to 11 and 13 and can be encoded as 11;13.

Allowed values for STORAGE field
  • 10 Seed collection

    • 11 Short term

    • 12 Medium term

    • 13 Long term

  • 20 Field collection

  • 30 In vitro collection

  • 40 Cryopreserved collection

  • 50 DNA collection

  • 99 Other (elaborate in REMARKS field)
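
Reading a STORAGE value with multiple codes can be sketched in Python as shown below. The function name is hypothetical; the codes and labels are copied from the list above.

```python
from typing import List

# Allowed STORAGE values, copied from the list above.
STORAGE_CODES = {
    10: "Seed collection",
    11: "Short term",
    12: "Medium term",
    13: "Long term",
    20: "Field collection",
    30: "In vitro collection",
    40: "Cryopreserved collection",
    50: "DNA collection",
    99: "Other",
}


def parse_storage(value: str) -> List[int]:
    """Split a semicolon-separated STORAGE value into a list of allowed codes."""
    codes = []
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue
        code = int(item)             # raises ValueError for non-numeric text
        if code not in STORAGE_CODES:
            raise ValueError(f"not an allowed STORAGE value: {code}")
        codes.append(code)
    return codes


print(parse_storage("11;13"))                              # [11, 13]
print([STORAGE_CODES[c] for c in parse_storage("20;30")])  # ['Field collection', 'In vitro collection']
```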

5.2. Genesys extensions to MCPD

Table 4. MCPD extensions
Field name Description

ACCEURL

Accession URL

AVAILABLE

Indicates current availability of accession for distribution

HISTORIC

Indicates whether the record represents an accession no longer actively maintained by the genebank

UUID

Universally unique identifier of the accession record

5.2.1. Accession URL

ECPGR originally extended the MCPD list with the Accession URL field (ACCEURL). The field should contain the direct link to the provider’s on-line portal where additional data about the accession may be available.

Passport data of IITA’s TDr-3616 yam accession
ACCEURL: http://my.iita.org/accession2/accession/TDr-3616

5.2.2. Accession availability

Genesys allows end-users to request material from holding institutes. Accession records marked as not available in Genesys will be excluded from the user’s request.

In addition to the availability flag, genebanks must opt in to allow end-users to request material through Genesys.

5.2.3. Historic records

Accessions are on occasion removed from the collection. This is especially true for pre-bred material and genetic stocks that are maintained by the genebank for a limited period of time. The records about such material must not be deleted from the databases as they can potentially be tracked to other collections where the material is still actively maintained.

The holding genebank may want to mark these records by setting the value of HISTORIC field to true.

Values null (not specified) and false indicate that the record represents an actively managed accession.

Historic accessions cannot be requested through Genesys.
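
The rules from sections 5.2.2 and 5.2.3 can be combined into a single check, sketched here in Python. The function and parameter names are hypothetical and this is not the Genesys implementation. Treating an unspecified AVAILABLE value as "not excluded" is an assumption, since the text only mentions records marked as not available.

```python
from typing import Optional


def is_historic(historic: Optional[bool]) -> bool:
    """Only an explicit true marks a historic record; null and false mean active."""
    return historic is True


def can_be_requested(available: Optional[bool],
                     historic: Optional[bool],
                     institute_opted_in: bool) -> bool:
    """Apply the three conditions described in the text.

    Assumption: available=None (not specified) does not exclude the record.
    """
    return (institute_opted_in
            and available is not False
            and not is_historic(historic))


print(can_be_requested(True, None, True))    # True
print(can_be_requested(True, True, True))    # False: historic record
print(can_be_requested(False, False, True))  # False: marked as not available
print(can_be_requested(True, False, False))  # False: genebank has not opted in
```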

6. Other relevant standards

6.1. ISO-3166 Country codes

The ISO-3166 standard defines codes for the representation of names of countries and their subdivisions. ISO-3166-1 alpha-3 codes are three-letter country codes. The Wikipedia page on ISO 3166-1 alpha-3 lists the valid country codes. Genesys uses http://download.geonames.org/export/dump/countryInfo.txt as the source of ISO-3166 country codes.
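
A list of alpha-3 codes could be read from such a file roughly as follows. This Python sketch works on a small inline sample instead of downloading the file. The layout (tab-separated columns, comment lines starting with #, alpha-3 code in the second column and country name in the fifth) is an assumption about countryInfo.txt that should be checked against the file header before use. The function name is hypothetical.

```python
from typing import Dict

# Inline sample in the ASSUMED layout of countryInfo.txt:
# ISO, ISO3, ISO-Numeric, fips, Country, ... separated by tabs.
SAMPLE = (
    "#ISO\tISO3\tISO-Numeric\tfips\tCountry\n"
    "CO\tCOL\t170\tCO\tColombia\n"
    "NG\tNGA\t566\tNI\tNigeria\n"
)


def read_alpha3_codes(text: str) -> Dict[str, str]:
    """Map ISO 3166-1 alpha-3 codes to country names, skipping comment lines."""
    codes = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) < 5:         # ignore rows that do not match the layout
            continue
        codes[columns[1]] = columns[4]
    return codes


countries = read_alpha3_codes(SAMPLE)
print(countries["COL"])      # Colombia
print("NGA" in countries)    # True
```

Such a lookup can be used to check ORIGCTY values and the country part of WIEWS institute codes.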

6.2. UN M.49

UN defines standard country or area codes and geographical regions for statistical use: