Genesys now unpacks raw characterization and evaluation data from genebanks.
Characterization and evaluation (C&E) data stimulates the use of germplasm collections. Since 2018, Genesys has provided access to C&E datasets from various genebanks for download and analysis by users. But querying trait data directly in Genesys required some database magic. The latest update of Genesys provides just that: yes, it is now possible to search the C&E datasets for specific traits!
Searchable datasets display the following notice, and you can access them by clicking the banner.
Searchable datasets follow the familiar Genesys layout: the filtering options are listed on the left, and the main display area is used to show trait data that match the specified filters. The data are initially not filtered and the grid allows you to browse through the entire dataset.
The Observations tab displays trait data in a grid, one row per accession.
The first column (My List) indicates whether the accession is already in your list. Tick the checkbox to add it, and untick the checkbox to remove it.
The Accession column displays the accession number with a link to its passport data.
The remaining columns are for the traits included in the dataset. Multiple observations may be listed for individual traits, and when searching for traits of interest, a match is made if any of the observations of the trait fit the criteria.
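The "any observation" rule described above can be sketched in a few lines of Python. This is only an illustration of the matching logic, not Genesys code: the function name `trait_matches` and the sample plant height values are hypothetical.

```python
from typing import Callable, Iterable


def trait_matches(observations: Iterable[float],
                  criterion: Callable[[float], bool]) -> bool:
    """Return True if at least one observation of the trait meets the criterion."""
    return any(criterion(value) for value in observations)


# Hypothetical accession with three plant height observations (cm).
plant_height_cm = [82.0, 95.5, 101.0]

# Only one of the three observations is 100 cm or more,
# but that is enough for the accession to match.
print(trait_matches(plant_height_cm, lambda v: v >= 100))
```

An accession with no observations for the trait never matches, because there is nothing that could fit the criterion.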
The initial order of traits in the grid is determined by the genebank when it registers the dataset in Genesys. As with all other grids in Genesys, you can rearrange the columns by dragging them to the desired position.
The gear icon ⚙️ on the right edge of the grid header allows you to toggle the visibility of columns and adjust their display: wrap text, apply heavier font weight, and adjust text alignment. You can close the grid settings by clicking or tapping on the Save icon 💾.
Grid settings are saved in your browser and will be used every time you visit this dataset.
The action button in the bottom right corner of the grid allows you to download the data in Excel format. It downloads only the rows currently loaded in the grid and will not include data that is not yet loaded in your browser.
The left panel lists the individual traits: their names, descriptions, filtering options, and a link to the definition of the descriptor that was used. The order of filters follows the order of traits when the dataset was registered.
Crop descriptors document how the observations were recorded. The filtering options depend on the type of descriptor.
The most common type of descriptor in characterization data is coded. These are used to capture categorical data and specify what each code represents.
Filtering options for coded descriptors are displayed as a list of checkboxes, indicating that multiple options can be selected.
Quantitative data from exact measurements (counts, lengths, widths, etc.) or their mean values use numerical descriptors. In addition to the methodology used to make the observations, these must specify the unit of measurement.
Numerical descriptors allow you to enter a value range, where At least means the observed value must be greater than or equal to this value (i.e. the minimum value) and At most requires that the trait value be less than or equal to the specified maximum.
Either value can be left blank: with no minimum, the range has no lower limit, and with no maximum, it has no upper limit.
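As a rough sketch, the At least and At most semantics amount to an inclusive range check in which either bound is optional. The function `in_range` and its parameter names are made up for this example and do not come from Genesys.

```python
from typing import Optional


def in_range(value: float,
             at_least: Optional[float] = None,
             at_most: Optional[float] = None) -> bool:
    """Inclusive range check; a bound left as None (blank) is not applied."""
    if at_least is not None and value < at_least:
        return False
    if at_most is not None and value > at_most:
        return False
    return True


print(in_range(10, at_least=5, at_most=10))  # the maximum itself is included
print(in_range(250, at_least=5))             # no maximum: no upper limit
print(in_range(4.9, at_least=5, at_most=10)) # below the minimum
```

Both bounds are inclusive, so a value equal to the minimum or the maximum still matches.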
Numerical observations are sometimes normalized and expressed on a scale, for example from short to tall, or from susceptible to resistant. Such ordinal scale descriptors are often found in characterization and evaluation data. In the example below, the damaged area of leaves is not expressed as the exact percentage of leaf area, but as a scale where leaf damage goes from low damage (1) to extensive damage (5). You can adjust the range of interest by dragging the two circular buttons.
Text descriptors are used for unstructured data such as notes and comments. Entering a value in a text filter will match accessions where the note contains the phrase specified in the filter (e.g. “2006” will match both “2006DS” and “DS2006”).
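The phrase matching described above behaves like a "contains" test. Here is a minimal sketch, assuming a plain case-sensitive substring check; how Genesys treats upper and lower case is not stated here, and the function name `note_matches` is hypothetical.

```python
def note_matches(note: str, phrase: str) -> bool:
    """Return True if the note contains the phrase anywhere in its text.

    Assumption: plain, case-sensitive substring matching.
    """
    return phrase in note


# The phrase may appear at the start or at the end of the note.
for note in ["2006DS", "DS2006", "2007DS"]:
    print(note, note_matches(note, "2006"))
```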
Genesys does not automatically refresh the data in the grid when you set the trait filters. You must click or tap Apply filters for the filters to work.
Your combination of filters may result in zero matching records. If that happens, try relaxing the criteria by specifying a wider option or by removing one or more filters.
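To see why a combination of filters can come back empty, here is a small sketch that assumes an accession must satisfy every active filter, with each filter matching if any observation of that trait fits. The accession numbers, trait names, values and the `apply_filters` function are all invented for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical records: accession number -> trait name -> observations.
records: Dict[str, Dict[str, List[float]]] = {
    "ACC 001": {"plant_height_cm": [82.0, 95.5], "leaf_damage": [2]},
    "ACC 002": {"plant_height_cm": [120.0], "leaf_damage": [4, 5]},
    "ACC 003": {"plant_height_cm": [101.0], "leaf_damage": [1, 3]},
}

Criterion = Callable[[float], bool]


def apply_filters(data: Dict[str, Dict[str, List[float]]],
                  filters: Dict[str, Criterion]) -> List[str]:
    """Keep accessions where every filtered trait has at least one matching observation."""
    return [
        accession
        for accession, traits in data.items()
        if all(
            any(criterion(value) for value in traits.get(trait, []))
            for trait, criterion in filters.items()
        )
    ]


# Strict criteria: no accession is both very tall and lightly damaged.
strict = {"plant_height_cm": lambda v: v >= 110, "leaf_damage": lambda v: v <= 2}
print(apply_filters(records, strict))

# Relaxing the height criterion, or dropping a filter, brings records back.
relaxed = {"plant_height_cm": lambda v: v >= 100, "leaf_damage": lambda v: v <= 2}
print(apply_filters(records, relaxed))
print(apply_filters(records, {"plant_height_cm": lambda v: v >= 110}))
```

Each added filter can only narrow the result, which is why widening a range or removing a filter is the way to recover matches.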
After Observations, the next tabs are: Overview; Analyze (this is a work in progress, and we’ll reserve it for later); and Description.
The Overview tab provides various visualizations of the data. Numerical traits are shown as histograms, while coded and scale descriptors use bar charts.
These visualizations reflect the selected data range (records that match your filters) and may help you in determining which filters to apply next!
The Description tab displays the metadata for the dataset. It also has links to download the source files as uploaded by the genebank, including trait observations and the passport data of accessions included in the dataset.
In 2018, we completed the “Genesys Catalog of Phenotypic Datasets Linked to Genebank Accessions” project, which aimed to make accession-level data on the plant genetic resources that have been characterized or evaluated by genebanks available to researchers. The work was supported by the Federal Republic of Germany (project reference GenR 2016-1).
We learned that requiring data to be provided in a specific format results in reluctance to share data, primarily because genebanks were not able to reliably convert the data from its original shape to the required standard format. The project therefore focused on providing access to data files as produced by genebanks, without any interpretation that might cause loss of information.
Genebank C&E datasets come in all shapes and sizes, making the unpacking and interpretation of data quite a complex task. Genesys enables genebanks to validate and document trait data as part of the publishing process. This validation and documentation provide assurance that data is correctly decoded and linked to a descriptor that fully documents the trait and its encoding.
If the Catalog project was the first big step in making trait data more easily available in Genesys, then making them searchable marks the second step. The visualizations and searching should make it more attractive for genebanks to publish datasets in Genesys. And, of course, they should make it more attractive for users to explore the data and, hopefully, easier for them to find the diversity they want.
As genebanks often use a fairly consistent set of descriptors for their accessions of a particular crop, it may be possible to make traits searchable across multiple datasets of one crop in a given genebank. Making data searchable across genebanks is a bigger challenge, as genebanks unfortunately do not all use the same descriptors. When traits “Grain color” or “Susceptibility to rust” are coded differently, the descriptors first need to be mapped and their data made compatible for searching. This will require development of methods to make different descriptors somehow compatible.
We’re on it.