Advanced data filters

By christelle.rabil@croptrust.org
5 June 2023

Filtering data by taxonomy and provenance is cool, but do you know about our advanced filtering options?

There are many ways to find the accessions you are looking for in Genesys. On the Genesys homepage, you can begin your search for accessions, subsets, datasets, or descriptor lists by entering the keywords of interest.

The text search is a useful way to get started with Genesys, but it is not the silver bullet you may be hoping for. The most reliable method to query Genesys is by using the data filters described in this post.

Layout

Regardless of how you access the accession passport data page, the filters pane will be on the left-hand side, with the filters organized into groups (which are collapsed by default). The main section of the layout presents accession data in a few different ways using the five tabs at the top.

The Overview tab summarizes the number of accessions matching your filters by different categories. This tab provides an overall picture of accession data. The second tab, Accessions, shows passport data row by row. This is also where you can download the passport data of your selected accessions. Map displays the localities where the material was collected (matching any filters you have applied). Images provides access to a gallery of pictures associated with accessions (again matching your filters). The final tab, Subsetting Tool, lets you create subsets – more on that in an upcoming post!

The content shown in this main section of the layout always corresponds to the currently applied filters. A good understanding of which filters to apply, and how, will enable you to find accessions of interest faster and more reliably.

Filter groups

Historical Records

A historical record is the passport data of an accession that no longer exists in the record-keeping genebank collection. By default, historical records are excluded from the search.

Text Search

Whether you type in a species name, country, or simply a number, the full text search will scan all the passport data in Genesys and yield accessions with the associated text.

You can build a search query using double quotes to limit the search to an exact phrase ("rice"); a pipe symbol as an OR operator (rice | leaf); an asterisk to match a prefix (rice*); and parentheses for grouping.

You can even search accessions for words that may be spelled slightly differently using a fuzzy search. Apply a fuzzy search by adding a tilde at the end of a word (solanum~). 

💡 Tip: A full text search query will not necessarily result in a 100% match. Other filters below may offer greater precision.
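
The operators above can be combined into a single query string. The helpers below are a minimal sketch for composing such strings before pasting them into the search box; the function names are made up for this illustration and are not part of Genesys.

```python
def exact(phrase: str) -> str:
    """Wrap a phrase in double quotes to ask for an exact match."""
    return f'"{phrase}"'


def prefix(word: str) -> str:
    """Append an asterisk to match any word starting with `word`."""
    return f"{word}*"


def fuzzy(word: str) -> str:
    """Append a tilde to also match slightly different spellings."""
    return f"{word}~"


def any_of(*terms: str) -> str:
    """Join terms with the pipe symbol, which acts as OR."""
    return " | ".join(terms)


def group(expression: str) -> str:
    """Wrap an expression in parentheses to group it."""
    return f"({expression})"


query = group(any_of(exact("rice"), prefix("oryz"), fuzzy("solanum")))
print(query)  # ("rice" | oryz* | solanum~)
```

The resulting string is plain text, so it can be typed or pasted into the Text Search filter as is.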

Holding Institute

A holding institute is the genebank that conserves an accession. International, regional, and national genebanks are the primary contributors of data to Genesys. They are all listed in our data providers directory, either as an independent partner or under a data provider, since some of our partners aggregate and provide data to Genesys on behalf of several genebanks.

Institute code

You can narrow your search to only accessions in a specific genebank, using the genebank’s WIEWS institute code. Type the first few letters of the genebank’s full name or acronym and watch Genesys auto-complete your typing. Note that only the first 10 matches are displayed, sorted by the number of that institute’s accessions in Genesys. You may need to enter more characters to find the institute you’re looking for.

If you have not entered anything in the field, Genesys displays the top 10 institutes by number of accessions. You can apply one of these suggested options by clicking or tapping on it. The list of top 10 suggestions is automatically updated as you add other filters to your search query. If you filter for Zea mays, for example, the top 10 suggestions will include only the institutes with Zea mays accessions.
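
To picture how these suggestions behave, here is a rough sketch, not the actual Genesys code. It assumes each institute carries a count of accessions matching the current filters, keeps only institutes whose name, acronym, or code starts with what you typed, and returns the 10 largest. The `Institute` class, the `suggest` function, and the sample institutes are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Institute:
    code: str        # WIEWS institute code
    name: str        # full name of the genebank
    acronym: str
    accessions: int  # accessions matching the currently applied filters


def suggest(institutes: List[Institute], typed: str = "", limit: int = 10) -> List[Institute]:
    """Return up to `limit` institutes, largest first, matching the typed prefix."""
    needle = typed.strip().lower()
    matches = [
        inst for inst in institutes
        if inst.accessions > 0  # institutes with no matching accessions are not suggested
        and (
            not needle  # nothing typed yet: every institute is a candidate
            or inst.name.lower().startswith(needle)
            or inst.acronym.lower().startswith(needle)
            or inst.code.lower().startswith(needle)
        )
    ]
    matches.sort(key=lambda inst: inst.accessions, reverse=True)
    return matches[:limit]


# Made-up institutes, for illustration only.
sample = [
    Institute("AAA001", "Alpha Genebank", "AG", 500),
    Institute("BBB002", "Beta Seed Bank", "BSB", 1500),
    Institute("CCC003", "Alpine Collection", "AC", 0),
]
print([inst.code for inst in suggest(sample)])        # ['BBB002', 'AAA001']
print([inst.code for inst in suggest(sample, "al")])  # ['AAA001']
```

Because the list is cut off at 10, an institute with few accessions only appears once you have typed enough characters to rule out the larger ones.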

Country of holding institute

Instead of focusing on one or more institutes, you may want to focus your search on the country where the genebank is located, i.e. the country of the holding institute. For example, to filter for accessions that are conserved in genebanks in Italy, type in the country name to bring up the auto-completer, or enter the country’s ISO code (ITA).

Accession Number

The accession number is the unique identifier of the material in the collection. This identifier is often composed of a prefix, a number, and a suffix (e.g. “TMe-419”, “AGG 5 WHEA”).

This filter allows you to search for accessions by entering accession numbers manually, by copy-pasting a list of accession numbers from a spreadsheet, or by specifying the minimum and maximum sequential number.

Press the Enter key to add more accession numbers to the query.

💡 Tip: You can copy a list of accession numbers from Excel and paste it into the Accession number filter.
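
If your list needs tidying before you paste it, a few lines of Python can help. This is a sketch under the assumption that a spreadsheet copy puts one cell per line, with tabs between cells of the same row; how Genesys itself splits the pasted text is not described here. Note that it never splits on spaces, because accession numbers such as "AGG 5 WHEA" contain them.

```python
from typing import List


def parse_pasted_accession_numbers(pasted: str) -> List[str]:
    """Split spreadsheet-style text into unique accession numbers, keeping order."""
    seen = set()
    result = []
    for line in pasted.splitlines():      # one spreadsheet row per line
        for cell in line.split("\t"):     # tabs separate cells within a row
            value = cell.strip()
            if value and value not in seen:  # skip empty cells and duplicates
                seen.add(value)
                result.append(value)
    return result


clipboard = "TMe-419\nAGG 5 WHEA\n\nTMe-419\n"
print(parse_pasted_accession_numbers(clipboard))  # ['TMe-419', 'AGG 5 WHEA']
```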

Sequential number

Genesys automatically extracts the numerical part of the accession number so that you can query for a numeric range of accession numbers.
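
The exact extraction rule is not spelled out in this post. As a rough illustration only, the sketch below assumes the sequential number is the first run of digits in the accession number; the function name is hypothetical.

```python
import re
from typing import Optional


def sequential_number(accession_number: str) -> Optional[int]:
    """Return the first run of digits as an integer, or None if there is none."""
    match = re.search(r"\d+", accession_number)
    return int(match.group()) if match else None


for number in ("TMe-419", "AGG 5 WHEA", "UNKNOWN"):
    print(number, "->", sequential_number(number))
```

With a number extracted this way, "TMe-419" falls inside a range of 400 to 500 regardless of its prefix.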

Date Search

There are many dates associated with accessions, such as when they were acquired by the genebank or when they were collected from the field. The date search filters are concerned specifically with metadata about the accession information. There are three filters:

Created on

This filters by the time period when the accession information was created in Genesys for the first time.

Last modified

This filters by the time period when the accession information was most recently updated or modified in Genesys.

Taxonomy last modified

This filters more specifically by the time period when the taxonomy of an accession was most recently updated or modified.

The date search filters always work in a range, with the left-hand date meaning “created/last modified starting...” and the right-hand date meaning “created/last modified before...”. Leaving either date unspecified means “whenever”. If you need to filter for one specific day, select this date in both the left and right fields.

Crop

Genesys aggregates accession data based on the crop names in English as provided by genebanks, and enables you to search these aggregations by crop.

💡 Tip: Crop aggregations are very limited and may be incomplete due to differences in spelling or local names of crops. For a more accurate search and overview of the accessions in Genesys, search by taxonomic names instead.

Taxonomy

You can filter accessions by their scientific names in different ways:

  • Genus name (make sure to capitalize the first letter, for example: Zea)

  • Species, meaning both the genus name and species name (for example: Zea mays)

  • Specific epithet, meaning only the species name (for example: mays)

  • Subtaxon, any level of taxonomy underneath the species name

GRIN Taxon ID

The four fields above allow you to filter for accessions by their taxonomy as provided by the genebank. Alternatively, you may filter for accessions by their taxonomy as grouped by Genesys according to the GRIN Taxonomy.

Genesys publishes the passport data that genebanks submit, including the taxonomy. The Genesys algorithm then tries to match the genebank’s taxonomy with GRIN Taxonomy, and allows users to filter by GRIN Taxon ID.

💡 Tip: You can copy a list of cells from Excel and paste it into this filter.

Matching to GRIN Taxonomy is also what allows Genesys to include synonyms in the search results using taxonomy. Look for the button “Include synonyms in search results” under the Taxonomy filter group.

Origin of Material

Provenance of material

This option filters passport data according to the country from which the material comes:

  • If an accession was collected in the field during the course of a collecting expedition, then its provenance is the country of the collecting site.

  • If the material is a breeding line, then its provenance is the country where the material was developed.

Alternatively, you may search for accessions by the elevation, longitude, and latitude (in decimal format) of their origin. These filters work using ranges of numerical values. To filter for accessions where the elevation is at least 1,000 meters, enter 1000 in the left field, At least (min). To filter for accessions where the elevation is at most 2,000 meters, enter 2000 in the right field, At most (max). Using both of these will filter for accessions where the elevation is between 1,000 and 2,000 meters.

💡 Tip: If you want to look for a specific number instead of a range, you need to insert the value in both the minimum and maximum fields.
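
The range logic can be summed up in a few lines. The sketch below is an illustration, not the Genesys implementation: it assumes both bounds are inclusive, that an empty field means no limit on that side, and that accessions without a value never match. `NumericRange` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NumericRange:
    minimum: Optional[float] = None  # "At least (min)"; None means no lower limit
    maximum: Optional[float] = None  # "At most (max)"; None means no upper limit

    def matches(self, value: Optional[float]) -> bool:
        if value is None:
            return False  # assumption: a record without a value never matches
        if self.minimum is not None and value < self.minimum:
            return False
        if self.maximum is not None and value > self.maximum:
            return False
        return True


between = NumericRange(minimum=1000, maximum=2000)  # 1,000 to 2,000 meters
exactly = NumericRange(minimum=1500, maximum=1500)  # one specific value

print(between.matches(1000), between.matches(2001))  # True False
print(exactly.matches(1500), exactly.matches(1499))  # True False
```

Setting the minimum and maximum to the same number is what turns a range filter into a search for one specific value.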

Collecting Data

The Collecting Data filter group enables you to filter by information related to the accession collecting mission. This includes the following filters:

Collecting date

This is the collecting date of the accession in YYYYMMDD format, where YYYY is the year, MM is the month, and DD is the day. If you do not wish to specify a month or day, you can replace the MM or DD with two hyphens or zeroes (-- or 00).
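
If you work with downloaded passport data, collecting dates in this format are easy to take apart. The sketch below is a minimal example with a hypothetical function name; it treats "--" and "00" as "not specified" and does not check that the month or day is a valid calendar value.

```python
from typing import Optional, Tuple


def parse_collecting_date(value: str) -> Tuple[int, Optional[int], Optional[int]]:
    """Split a YYYYMMDD string into (year, month, day); missing parts become None."""
    if len(value) != 8:
        raise ValueError(f"expected 8 characters (YYYYMMDD), got {value!r}")

    def part(text: str) -> Optional[int]:
        # Two hyphens or two zeroes mean the month or day was not specified.
        return None if text in ("--", "00") else int(text)

    return int(value[:4]), part(value[4:6]), part(value[6:8])


print(parse_collecting_date("20230605"))  # (2023, 6, 5)
print(parse_collecting_date("202306--"))  # (2023, 6, None)
print(parse_collecting_date("20230000"))  # (2023, None, None)
```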

Collecting number

This is an original identifier assigned by the collector(s) of the sample, normally composed of the name or initials of the collector(s) followed by a number (e.g. “FM9909”).

Collecting mission

This is the identifier of the collecting mission used by the collecting institute (e.g. “CIATFOR-052”, “CN426”).

Location of collecting site

This is location information below the country level that describes where the accession was collected. This might include the distance in kilometers and direction from the nearest town, village or map grid reference point (e.g. “7 km south of Curitiba in the state of Parana”).

Biological Status of Accession

Here you can filter for accessions by their level of improvement: wild, landrace, breeding material, etc.

Type of Germplasm Storage

Ex situ genebanks maintain plant genetic resources as seed, in the field, in vitro, in cryo, or in DNA collections. A single accession may be maintained in different conditions; therefore, multiple options are possible.

Status

This filter category groups six variables about an accession related to its status in Genesys, in the genebank, and in relation to different international bodies.

Available for distribution

Through this status variable, the genebank indicates to Genesys users whether the accession has enough inventory to be distributed.

Geo-referenced

This is a simple filter to limit the search to accessions with (or without) geographic coordinate data (latitude and longitude).

Included in the MLS

Through this variable, the genebank indicates whether the accession is in the Multilateral System (MLS) of the International Treaty on Plant Genetic Resources for Food and Agriculture.

Backed up in SGSV

Through this variable, the genebank indicates whether the accession is backed up in the Svalbard Global Seed Vault (SGSV).

Accession with images

This will limit the search to accessions with (or without) images of the accession published in Genesys.

AEGIS accession

This filters by whether the accession is part of A European Genebank Integrated System (AEGIS), an initiative by the European Cooperative Programme for Plant Genetic Resources aiming to efficiently conserve and provide access to unique germplasm in Europe through the establishment of the European Collection.

Referenced Accessions

Many accessions are included in the subsets or datasets that genebanks publish in Genesys. To filter for accessions that are part of a certain subset or dataset, copy the unique ID from the set’s page in the Data and resources section (see the highlighted part in the screenshot below).

image.png

Then paste this unique ID in the passport data filters sidebar. For one dataset, you can also click Filter Accessions on the dataset page directly – but copying and pasting the unique ID allows for filtering accessions from many sets simultaneously.

💡 Tip: Check out the recording of our webinar on subsets and trait data in Genesys for more information.

Climate at Origin

Genesys allows you to filter for accessions according to 19 temperature and precipitation variables from collecting sites. This filter only works for accessions with geographic coordinates provided by the genebanks.

A separate article details how to use this filter. 

Tips and tricks

Directory pages

In the Directory navigation menu on the top banner of Genesys, you will find pages that aggregate accessions by country of provenance, crop, and holding institute. Each of these pages shows accessions already filtered by the chosen category.

Getting an overview as you filter

The filters sidebar also acts as a mini-dashboard where you can get an overview of the distribution of accessions across the different variables, for example how many of the accessions matching your current filters are aggregated under the amaranth crop.

After you filter accessions using any option, not only the overview numbers but also the filter choices themselves are updated. For example, if you filter for accessions collected from Uruguay and none of the results are tagged as amaranth, then this option will be hidden under the Crops filter group. This is one of the many ways Genesys makes it convenient to sift through over 4 million records.
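
One way to picture this behavior is as a count of values over the filtered records, where a value with no matches simply never shows up. The sketch below is an illustration only; the records are invented and the field names (`crop`, `origin`) are assumptions, not the Genesys data model.

```python
from collections import Counter
from typing import Dict, List


def facet_counts(records: List[dict], field: str) -> Dict[str, int]:
    """Count how many records carry each value of `field`."""
    counts = Counter(record[field] for record in records if record.get(field))
    return dict(counts)  # values with zero matches are simply absent


# Invented records, for illustration only.
records = [
    {"crop": "amaranth", "origin": "PER"},
    {"crop": "rice", "origin": "URY"},
    {"crop": "rice", "origin": "PHL"},
]

print(facet_counts(records, "crop"))  # {'amaranth': 1, 'rice': 2}

from_uruguay = [record for record in records if record["origin"] == "URY"]
print(facet_counts(from_uruguay, "crop"))  # {'rice': 1}
```

In the second call, amaranth is missing from the result, which mirrors the option disappearing from the Crops filter group.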

Excluding matches

By default, Genesys searches for the values you provide for a filter, and each value is prefixed by a plus symbol. For example, filtering for rice accessions shows that many are maintained by the IRRI genebank (PHL001), but you may want to exclude IRRI accessions from the search. Just filter by this Institute code, and click or tap the “+” to toggle it to “-”. You are now instructing Genesys that the results must exclude the specified value.

The excluded PHL001 is displayed at the top of the page when this change is applied.

image.png

💡 Tip: The number of accessions that you see at the top of the page after filtering sometimes has an “about” before it, as here. This means that the number is an approximation; Genesys does this to increase the performance of the software. You can get the exact number by downloading the MCPD file of the passport data in your results.
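
The plus/minus toggle described in this section can be thought of as two sets of values per filter: one to include and one to exclude. The sketch below is a simplified model with hypothetical names, not the actual Genesys code. It assumes that an excluded value always loses, and that an empty include set means "anything not excluded".

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ValueFilter:
    include: Set[str] = field(default_factory=set)  # values shown with "+"
    exclude: Set[str] = field(default_factory=set)  # values shown with "-"

    def toggle(self, value: str) -> None:
        """Flip a value between "+" (include) and "-" (exclude)."""
        if value in self.include:
            self.include.remove(value)
            self.exclude.add(value)
        elif value in self.exclude:
            self.exclude.remove(value)
            self.include.add(value)

    def matches(self, value: str) -> bool:
        if value in self.exclude:
            return False
        return not self.include or value in self.include


institute_code = ValueFilter(include={"PHL001"})
institute_code.toggle("PHL001")          # "+" becomes "-"
print(institute_code.matches("PHL001"))  # False
print(institute_code.matches("XXX001"))  # True ("XXX001" is a made-up code)
```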

Removing individual filters

The top of the page is where you can remove one of the filters you applied without resetting them all. In the screenshot above, if you changed your mind about filtering by the Oryza genus, you could simply click on the “x” button of that filter.

Further examples and help

In case you missed it, here is the recording of the webinar we organized on how to use Genesys that includes examples of how to use these filters, and more! If you have any questions, send them over to helpdesk@genesys-pgr.org.
