User guide: How to Upload Trait Data

By christelle.rabil@croptrust.org
1 July 2024

Welcome to the comprehensive guide designed for genebank data managers. This manual aims to equip you with the knowledge and skills necessary to upload trait data to Genesys efficiently and in a searchable format.

This tutorial covers uploading trait data to Genesys, based on the Genesys webinar series. The process includes preparing your data, ensuring accuracy, and using the web-based tool for uploading.

Prerequisites

  1. Account Access: Ensure you have login details and permissions to upload from your account in either Genesys or its Sandbox environment.

  2. Excel File Preparation: Before utilizing the web-based upload tool in Genesys, confirm that your Excel file contains the following minimum passport data columns:

  • Institute code (INSTCODE)

  • Genus name (GENUS)

  • Accession number (ACCENUMB)

  • DOI numbers of the accessions (DOI) when applicable

Then verify that the accessions listed in your Excel document are paired with existing passport data in Genesys, a crucial step to ensure the seamless upload of your trait data:

  • Paste Accessions: Begin by selecting all relevant accession numbers from your Excel file and pasting them into the Genesys Accession Number filter to identify any accessions not currently recorded in Genesys (figure 1).

  • Genus Name Verification: Ensure genus names are accurately listed to prevent errors during the upload process. Incorrectly listed names may necessitate a re-upload of the file.

Figure 1: Checking accession passport data
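Before opening the uploader, the column check described above can also be run on your own computer. The following is a minimal sketch, not part of Genesys: it assumes the sheet has already been read into a list of dictionaries (one per row, keyed by column title) and that the columns already use the MCPD names. The function name and the sample values are hypothetical.

```python
# Hypothetical pre-upload check; Genesys itself does not provide this function.
REQUIRED_COLUMNS = ("INSTCODE", "GENUS", "ACCENUMB")  # DOI is only needed when applicable


def check_passport_columns(rows):
    """Return a list of human-readable problems found in the passport columns."""
    if not rows:
        return ["The sheet has no data rows."]
    missing = [col for col in REQUIRED_COLUMNS if col not in rows[0]]
    if missing:
        return [f"Missing column(s): {', '.join(missing)}"]
    problems = []
    for number, row in enumerate(rows, start=2):  # row 1 is the header row in Excel
        for col in REQUIRED_COLUMNS:
            value = row.get(col)
            if value is None or not str(value).strip():
                problems.append(f"Row {number}: empty {col}")
    return problems


# Placeholder values for illustration only.
rows = [
    {"INSTCODE": "XXX001", "GENUS": "Musa", "ACCENUMB": "ACC 1", "height": 120},
    {"INSTCODE": "XXX001", "GENUS": "", "ACCENUMB": "ACC 2", "height": 95},
]
print(check_passport_columns(rows))
```

An empty list means the minimum passport columns are present and filled in. It does not confirm that the accessions exist in Genesys; that still requires the filter check shown in figure 1.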

Accessing the dataset uploader

1. Log In: Use the provided credentials to log into the Genesys or Sandbox platform (figure 2).

Figure 2: Logging in to Genesys

2. Dashboard: Navigate to the dashboard, open the "Trait Data" menu, and select "Datasets" (figure 3).

Figure 3: Accessing the trait data uploader

Step 1: Create a Dataset in Genesys

  1. Select Data Provider: Indicate the source or institution providing the dataset.

  2. Dataset Title: Name your dataset in a clear, descriptive manner.

  3. Dataset Version: Specify the current version of your dataset. This is crucial for tracking updates and revisions.

  4. Dataset Description: Provide a comprehensive description of your dataset. Basic markdown is supported here, enabling you to format your text for better clarity and readability.

  5. Date of creation of the dataset: Provide the date when the dataset was created in Genesys in the YYYYMMDD format. If the month or day are unknown, then please replace them with a double zero (e.g. 20240600).

  6. Dataset Rights: Selecting the appropriate dataset rights is essential to define how others can use and cite your data. Genesys PGR supports a range of Creative Commons licenses, allowing you to choose the level of copyright protection and sharing freedom that best suits your dataset:

    • CC0 Public Domain Dedication: Maximizes data usability by placing your dataset in the public domain.

    • CC BY 4.0 Attribution 4.0 International: Allows others to distribute, remix, adapt, and build upon your data, even commercially, as long as they credit you for the original creation.

    • CC BY-SA 4.0 Attribution-ShareAlike 4.0 International: Lets others remix, adapt, and build upon your work even for commercial purposes, as long as they credit you and license their new creations under the identical terms.

    • CC BY-NC 4.0 Attribution-NonCommercial 4.0 International: Enables others to remix, tweak, and build upon your work non-commercially. Their new works must also acknowledge you and be non-commercial, but they don’t have to license their derivative works on the same terms.

    • CC BY-NC-ND 4.0 Attribution-NonCommercial-NoDerivatives 4.0 International: Allows others to download your works and share them with others as long as they credit you, but they can’t change them in any way or use them commercially.

  7. Language: Specify the language(s) of your dataset to ensure accessibility and usability on a global scale.

  8. Source: Provide a link to the publication detailing the origin of the dataset or the methodology through which it was compiled.

  9. Crops: Select the specific crops featured in your dataset, facilitating targeted research and exploration. If you cannot find the crop listed among the options, you may skip this field.
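The date convention in item 5 (YYYYMMDD, with a double zero for an unknown month or day) can be illustrated with a short helper. This is only a sketch for preparing the value before typing it into the form; the function name is hypothetical and nothing here calls Genesys.

```python
def genesys_date(year, month=None, day=None):
    """Format a date as YYYYMMDD, writing 00 for an unknown month or day."""
    if not 1 <= year <= 9999:
        raise ValueError("year must be between 1 and 9999")
    if month is None and day is not None:
        raise ValueError("a day cannot be given without a month")
    if month is not None and not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    if day is not None and not 1 <= day <= 31:
        raise ValueError("day must be between 1 and 31")
    month_part = month if month is not None else 0  # unknown month -> 00
    day_part = day if day is not None else 0        # unknown day -> 00
    return f"{year:04d}{month_part:02d}{day_part:02d}"


print(genesys_date(2024, 7, 1))  # full date
print(genesys_date(2024, 6))     # day unknown
```

The second call produces 20240600, the same value used as the example in item 5.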

Step 2: Upload Your Excel File

  1. Upload: Click "+ Add Dataset File" and upload your prepared Excel file.

  2. Ingest Data: After uploading, click "Load Trait Data to Genesys" to start the data ingestion process (figure 4).

Figure 4: Adding source file

Step 2.1: Link Accessions to passport data in Genesys

The most important step in this process is to make sure that the accessions in your Excel file are linked to their passport data in Genesys. If this step is not successful, the work that follows will be done in vain.

You will know that the accessions are not yet linked when the first column in the trait data uploader, titled “Accessions”, is empty. Genesys adds this column automatically; it is not part of your Excel file. When the accessions are successfully linked, the “Accessions” column will be populated with links to passport data.

1. Properties: Set default properties such as crop name, genebank, and genus (figure 5).

Figure 5: Set properties

2. Map Required Columns: Click on required column titles (Institute Code, Genus, Accession Number and DOI if available) and rename them to corresponding MCPD descriptors in Genesys (INSTCODE, GENUS, ACCENUMB and DOI) only in the Column name field, then click “Enter” on your keyboard (figure 6).

Figure 6: Map minimum passport data columns

3. Link accessions: Click "Link Accessions" to link your data columns with the existing passport data in Genesys. This should populate the first column in the dataset titled “Accession” with links to the accession passport data in Genesys (figure 7).

Figure 7: Link accessions to passport data in Genesys

4. Verification: Ensure all accessions are successfully linked by checking that every row in the first column, “Accession”, is populated. To do so, untick the second filter in the filter pane on the left-hand side, titled “Accession exists for now”, keep only the first one, titled “Row not mapped to accession”, and then click on the “Apply filters” button.

If all accessions were successfully linked, then you will see the following message “No records match the selected criteria”.

You can remove those filters and visualize the entire unfiltered trait dataset by clicking on the Reset button (figure 8).

Figure 8: Check for unlinked accessions

Step 2.2: Map Trait Data to existing descriptors

1. Link to existing Descriptors: Link your data columns to existing descriptors in Genesys if available. For this purpose, click on the trait column title. This will open a dialog box prompting you to search for existing descriptors in Genesys on the left hand side. If you have found the descriptor that applies to your trait, click on it and it will automatically link to your column (figure 9).

Figure 9: Map column to an existing descriptor in the Genesys database

2. Handling error messages: When mapping your data to an existing descriptor, you might encounter errors if some values in your file do not match the allowed values for that descriptor. When you attempt to link, an error message will appear indicating that some values do not match the descriptor's criteria, and the Genesys interface highlights the mismatched values in your dataset. If the value suggested by the system isn’t accurate, manually map each incorrect value to the correct one, using the tool to change the mismatched values to the nearest acceptable value defined in the descriptor (figure 10).

Figure 10: Fix irregularities in the data and register observations

3. Register observations: Once all mismatched values are corrected, register the observations to finalize the linkage. This step is important: if you do not click the “Register observations” button, the trait data in that column will not be included in the published dataset. You may first use the “Dry run” button to check that there are no further mismatches and that all data values align with the descriptor’s criteria. After clicking on “Dry run”, you still need to click on “Register observations”.

After applying these steps above, preview the dataset to see how it will appear by clicking on “View Dataset” (figure 11). Ensure that all corrected values are properly linked and displayed.

Figure 11: Preview dataset
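Mismatched values can often be spotted in your file before you link the column to a descriptor. The sketch below is an illustration under stated assumptions: the list of allowed values is copied by hand from the descriptor page, the comparison is exact (case-sensitive) after trimming spaces, and empty cells are ignored. The function name and sample values are hypothetical; Genesys may apply different matching rules.

```python
def find_mismatches(values, allowed):
    """Count each distinct value that is not among the allowed codes."""
    allowed_codes = {str(code).strip() for code in allowed}
    counts = {}
    for value in values:
        text = str(value).strip()
        if text and text not in allowed_codes:  # empty cells are skipped
            counts[text] = counts.get(text, 0) + 1
    return counts


# Hypothetical trait column and allowed values for illustration.
flower_colour = ["white", "purple", "Purple", "yelow", "white", ""]
allowed = ["white", "purple", "yellow"]
print(find_mismatches(flower_colour, allowed))
```

Here the check reports "Purple" and "yelow" once each. Correcting them in the Excel file first means fewer values to remap by hand in the uploader.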

Step 2.3: Map Trait Data to new descriptors

If Genesys could not find an existing descriptor that matches a trait column, you can create a new descriptor directly as you are uploading the dataset.

  1. Enter Descriptor Details: Click on the column header of the trait for which you want to create a new descriptor. Then describe your data column by entering the following (figures 12 and 13):

    • Title: Enter a clear and descriptive title. Leave the “Column name” field as it is, and only change “Title”, as this is what the users will see when the dataset is published. Do not add units in the title, as there is a dedicated field for that.

    • Separator: If you have many values in the same cell (e.g. purple, white, yellow) then please specify the separator (it can be a space, dash, comma, semicolon, etc.). Only type the separator in this field.

    • Data Type: Select the appropriate data type* (e.g., coded, numeric, scale).

    • Allowed Values: Define the possible values for coded or scale data, including a description for each value.

    • Unit of measurement: For numeric descriptors, specify the unit of measurement. 

    • Minimum and maximum values: For numeric descriptors, you may specify the minimum and/or maximum value allowed for this descriptor. These are not the minimum and maximum values found in your specific dataset (e.g. the minimum allowed value for plant height is 0 because a height cannot be negative, not the 100 cm that happens to be the lowest value in your dataset). For scale descriptors, you need to specify the minimum and maximum values, as these define the range of your scale.

    • Type of numeric values: Specify whether the numeric descriptor is discrete, i.e. allows only values without decimal points (e.g. 10, 20, 30), or continuous, i.e. allows values both with and without decimal points (e.g. 10, 20.98, 10.25). The decimal symbol in your data should be a point, not a comma (e.g. 10.75 instead of 10,75).

    • Category: Specify the descriptor category (e.g. characterization, evaluation, etc.)

  2. Validate data

    • Check values: Ensure all the values in your data column match the allowed values and formats specified in the descriptor.

    • Correct irregularities: Use the Genesys interface to remap or correct any data entries that do not conform to the descriptor's validation criteria.

  3. Save Descriptor to Genesys

    • Add the version of this descriptor, for example version 1.0 or 2024.1 etc.

    • Include additional metadata such as the methodology for data collection, any relevant information, and descriptions.

    • Add the code of descriptor language for example en for English, fr for French, etc.

    • Click “Register a new descriptor in Genesys” to save this new descriptor.

  4. Register observations: Once all mismatched values are corrected, register the observations to finalize the linkage. This step is important: if you do not click the “Register observations” button, the trait data in that column will not be included in the published dataset. You may first use the “Dry run” button to check that there are no further mismatches and that all data values align with the descriptor’s criteria. After clicking on “Dry run”, you still need to click on “Register observations”.

* Coded data represents distinct categories or groups, typically using numbers or strings as codes. Scale data consists of coded data that follows a certain hierarchy, where the differences between values are meaningful. Numeric data includes both integers and floating-point numbers used for quantitative measurements. Text data represents sequences of characters used to store words and text, including letters, numbers, and symbols. The date data type is used to represent dates and times in various formats. The Boolean data type represents two possible values, e.g. true or false.

Figure 12: Describe a new descriptor

Figure 13: Update new descriptor and register observations
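The rules for numeric descriptors described above (decimal point instead of comma, allowed minimum and maximum, discrete versus continuous) can be checked on a column before upload. This is a minimal sketch with hypothetical function names and sample values. It assumes a comma in a cell is always a decimal symbol, so thousands separators would be misread and should be removed by hand first.

```python
def normalize_decimal(raw):
    """Return the cell as text with a decimal point instead of a decimal comma."""
    return str(raw).strip().replace(",", ".")


def check_numeric(values, minimum=None, maximum=None, discrete=False):
    """Return (position, original value, reason) for each value that breaks a rule."""
    problems = []
    for position, raw in enumerate(values, start=1):
        try:
            number = float(normalize_decimal(raw))
        except ValueError:
            problems.append((position, raw, "not a number"))
            continue
        if discrete and not number.is_integer():
            problems.append((position, raw, "decimal value in a discrete descriptor"))
        elif minimum is not None and number < minimum:
            problems.append((position, raw, "below minimum"))
        elif maximum is not None and number > maximum:
            problems.append((position, raw, "above maximum"))
    return problems


# Hypothetical plant height column in cm; the allowed minimum is 0.
plant_height_cm = ["100", "120,5", "-3", "tall", 98.25]
print([normalize_decimal(v) for v in plant_height_cm])
print(check_numeric(plant_height_cm, minimum=0))
```

In this example the value "120,5" is rewritten as "120.5", while "-3" and "tall" are reported as problems to correct in the file or remap in the uploader.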

Step 2.4: Preview Mapped Data

  1. View Dataset: Use the preview function by clicking on the “View Dataset” button to see how the mapped data will appear to users. Ensure all descriptors are correctly linked and data is displayed as expected.

  2. Test Filters: In the preview, use the filtering options to test the usability of the dataset. This helps ensure users can effectively query and interact with the data once it’s published.

This step is already demonstrated in figure 11 above. We advise you to continually preview the mapped data and make necessary edits to descriptors or data entries to ensure data integrity and accuracy.

Step 3: Review the list of accessions

After mapping your data and linking descriptors, the next step in the Genesys uploader tool is to review the list of accessions. This step ensures that all accessions in your dataset are correctly linked to their corresponding records in Genesys.

The system displays a list of all accessions included in your uploaded dataset. Each accession should have a link to its passport data in Genesys (indicated by blue hyperlinks). Unlinked accessions (shown in black) indicate a mismatch or missing record in Genesys. The detailed review process of this step is as follows (figure 14):

  1. Check for Links: Scroll through the list to ensure that each accession number has a corresponding hyperlink to its passport data.

    • Hyperlinks to accessions in blue indicate successful mapping to existing records in Genesys.

  2. Identify Issues: Look for any accessions that are not linked. These will be displayed without hyperlinks and in a black text color. Common issues include incorrect accession numbers, missing genus information, or nonexistent records in Genesys.

  3. Correct Errors: If errors are found, you have two main options:

    • Rematch Accessions: This button attempts to rematch accessions to correct minor issues. It is necessary when you upload the trait data and later update the passport data before publishing the trait dataset.

    • Clear List: If significant errors are present, you may choose to clear the list and re-upload the corrected file.

Example

Suppose you have 194 accessions in your Excel file, but only 191 are linked:

  1. Unlinked Accessions:

    • Three accessions are not linked. These might have incorrect genus names or are missing from Genesys.

  2. Action Steps:

    • Correct the errors in your passport data (e.g., fix genus names or add missing accessions to Genesys). A separate webinar covers this.

    • Come back to your trait dataset in step 3 and use the "Rematch Accessions" feature to refresh the links.

Figure 14: Review the list of accessions
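The example above (194 accessions in the file, 191 linked) amounts to comparing two lists. The following sketch assumes you have copied the linked accession numbers out of the review list by hand; the function name and the accession numbers are hypothetical, and the comparison is exact after trimming spaces.

```python
def unlinked_accessions(file_accessions, linked_accessions):
    """Return accession numbers from the file that are not linked, in file order."""
    linked = {str(a).strip() for a in linked_accessions}
    missing = []
    for accession in file_accessions:
        key = str(accession).strip()
        if key not in linked and key not in missing:
            missing.append(key)
    return missing


# Made-up accession numbers: 194 in the file, 3 of them not linked.
in_file = [f"ACC{n:03d}" for n in range(1, 195)]
in_genesys = [a for a in in_file if a not in {"ACC007", "ACC101", "ACC150"}]
missing = unlinked_accessions(in_file, in_genesys)
print(len(in_file), len(in_genesys), missing)
```

The three accession numbers returned are the ones to look up in your passport data before using "Rematch Accessions".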

Step 4: Add dataset creators

In this step, you will add and credit the individuals who contributed to the creation and management of the dataset. This includes roles such as data managers, collectors, digitizers, and curators. This step ensures proper attribution and acknowledges the contributions of various team members. If the dataset was a collaborative effort with multiple institutions, ensure all contributors are credited. Make sure to always obtain consent before sharing personal contact details. Note that roles can vary for different datasets. For example, Jennifer might be a Data Manager in one dataset and a Data Collector in another (figure 15).

1. Full Name: Enter the full name of the dataset creator.

2. Role: Select the appropriate role from the list:

  • Data Manager: Responsible for overseeing the data collection and management.

  • Data Collector: The person who collected the data in the field.

  • Data Digitizer: The individual who transferred the data from paper to digital format.

  • Data Curator: The person who organized, validated, and ensured the quality of the data.

3. Institutional Affiliation:

  • Institutional Name: Enter the full name of the institution (not the institute code).

  • Optional details: Include email address, phone number, fax, and address. Ensure you have consent to share personal information.

Add Multiple Creators: Repeat the process to add more creators if necessary, ensuring that each person’s role and affiliation are accurately recorded.

Delete creators: Click on the trash can icon to remove a dataset creator entry.

Figure 15: Add dataset creators

Step 5: Location and Timing

In this step, you specify the geographical locations and time frames where the trait data was collected. This metadata helps other users understand the conditions under which the data was collected and enhances the utility of the dataset (figure 16).

Adding Location and Timing

  1. Open Location and Timing Section:

    • Click "Add Location" to enter the details.

  2. Enter Location Details:

    • ISO Country Code: Enter the three-letter country code (e.g., KEN for Kenya), then select the matching option from the drop-down menu.

    • Country Name: Type the full name of the country, then select the matching option from the drop-down menu.

    • State/Province: Enter the state or province name if applicable.

    • Locality: Specify the locality or region.

    • Latitude and Longitude: Provide decimal degrees for precise location. This is optional but recommended for accuracy.

  3. Enter Timing Details:

    • Starting Date: Format as YYYY-MM-DD. If the exact day is unknown, use YYYY-MM-00.

    • Ending Date: Format as YYYY-MM-DD. If the exact day is unknown, use YYYY-MM-00.

    • Environment Description: Provide a general description of the environment and conditions during data collection (e.g., climate, soil type).

  4. Save and Continue:

    • Review all entries for accuracy.

    • Save and proceed to the next step.

Add multiple Locations and Times: Add more locations and time frames if the data was collected at different sites or during different periods.

Delete Locations and Times: Click on the trash can icon to remove a Location and Time entry.

Figure 16: Add locations and timings
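If you keep your location and timing details in a spreadsheet before typing them into the form, the formats described above can be checked with a few lines of code. This is a sketch only: the field names and sample values are hypothetical, the check looks at the shape of the values rather than whether the country code or calendar date really exists, and the check that the start does not come after the end is an added assumption.

```python
import re

# YYYY-MM-DD where the day may be 00 when unknown, as described above.
DATE_PATTERN = re.compile(r"\d{4}-(0[1-9]|1[0-2])-(0[0-9]|[12]\d|3[01])")


def check_location(entry):
    """Return a list of format problems found in one location and timing entry."""
    problems = []
    if not re.fullmatch(r"[A-Z]{3}", entry.get("country_code", "")):
        problems.append("country_code must be three capital letters, e.g. KEN")
    dates_ok = True
    for field in ("start_date", "end_date"):
        if not DATE_PATTERN.fullmatch(entry.get(field, "")):
            problems.append(f"{field} must look like YYYY-MM-DD or YYYY-MM-00")
            dates_ok = False
    # Zero-padded dates in this format sort correctly as plain text.
    if dates_ok and entry["start_date"] > entry["end_date"]:
        problems.append("start_date is after end_date")
    latitude, longitude = entry.get("latitude"), entry.get("longitude")
    if latitude is not None and not -90 <= latitude <= 90:
        problems.append("latitude must be between -90 and 90 decimal degrees")
    if longitude is not None and not -180 <= longitude <= 180:
        problems.append("longitude must be between -180 and 180 decimal degrees")
    return problems


good = {"country_code": "KEN", "start_date": "2023-03-00",
        "end_date": "2023-09-15", "latitude": -1.29, "longitude": 36.82}
bad = {"country_code": "Kenya", "start_date": "2023/03/01",
       "end_date": "2023-09-15", "latitude": 95.0}
print(check_location(good))
print(check_location(bad))
```

The first entry passes with an empty list. The second is flagged for the country code, the start date format, and the latitude.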

Step 6: Organize Descriptors

In this step, you arrange the trait descriptors for your dataset. Proper organization ensures that the data is presented in a logical and user-friendly manner. We recommend that you also use this step to check the spelling and capitalization of your descriptor titles: the first letter should be capitalized and the rest in lowercase (figure 17).

  1. Reorder Descriptors: Click and drag descriptors to rearrange their order. This is particularly useful if certain traits should be grouped or prioritized.

  2. Review Unmapped Descriptors: Click the "Check for Unmapped Descriptors" button to refresh this list. Review these descriptors to decide if any should be included. If so, return to step 2 to map them appropriately.

  3. Delete descriptors: Click on the trash can icon to remove a descriptor from the list. This means that the column and its associated data will not be included in the trait dataset. If you accidentally remove descriptors, return to step 2 to re-map them appropriately.

Tips for Organizing Descriptors

  • Group Similar Traits: Place related traits together (e.g., all morphological traits followed by all agronomic traits).

  • Prioritize Important Traits: List the most critical traits for your research or user needs at the top.

  • Logical Flow: Arrange traits in a sequence that reflects the growth or developmental stages of the plant, if applicable.

Figure 17: Organize trait descriptors
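The capitalization rule recommended in this step (first letter capitalized, the rest in lowercase) can be applied to a list of descriptor titles as a quick check. This is a sketch with a hypothetical function name and made-up titles. Note that it would also lowercase acronyms, so review its suggestions by hand rather than applying them blindly.

```python
def sentence_case(title):
    """Collapse extra spaces, capitalize the first letter and lowercase the rest."""
    cleaned = " ".join(title.split())
    return cleaned[:1].upper() + cleaned[1:].lower()


# Made-up descriptor titles for illustration.
titles = ["plant HEIGHT", "Flower colour", "  days to  flowering"]
for title in titles:
    fixed = sentence_case(title)
    if fixed != title:
        print(f"{title!r} -> {fixed!r}")
```

Only titles that need a change are printed; "Flower colour" already follows the rule and is left alone.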

Step 7: Review and Publish

In this step, you perform a final check of your dataset before submitting it for publication on Genesys. This includes reviewing the dataset’s details, verifying that all descriptors are accurately mapped and organized, and confirming the roles and affiliations of all dataset creators. Additionally, you ensure that all accessions are correctly linked to their corresponding passport data. Once satisfied with the dataset’s accuracy and completeness, you click "Send to Review" to submit it for publication. The Genesys team will then review the dataset before making it publicly available, ensuring data integrity and usability.

-----------------------

By following these steps, you can successfully upload and integrate trait data into Genesys PGR, making it accessible and searchable for the global research community. Experiment with the sandbox environment and refine your process before uploading to the live platform.

And as usual, if you have any questions or need assistance with the new form, please don’t hesitate to contact helpdesk@genesys-pgr.org.

 
