How to use RStudio to download passport data

By christelle.rabil@croptrust.org
3 June 2024

Say goodbye to endless Excel files from Genesys with the genesysr package!

Many thanks to Miguel Angel Acosta, Documentation Coordinator at the Alliance of Bioversity International and CIAT, for providing this article to Genesys.

The genesysr package lets you work with Genesys data directly in R. In this article, we show how to use genesysr in RStudio to download passport data.

You don't need to be proficient in R: basic knowledge and a willingness to learn are enough.

What is genesysr?

genesysr is an R package that interacts with the Genesys web API, allowing users to query passport data on plant genetic resources and retrieve it directly into their R environment. Developed by the Crop Trust, it streamlines access to genetic resource data for analysis and research.

Getting Started

Before using the genesysr package, make sure that R and RStudio are installed on your system. You also need an active internet connection to access the Genesys database through the API. The Information Management team of the Genetic Resources Program at the Alliance of Bioversity International and CIAT is happy to share the code used in this article to help you install, load and use the genesysr package.

The genesysr package is available at this link.

To begin a project in R, first set up the tools it needs. In RStudio, this means installing and loading the required R packages. Doing this at the start ensures that every function you need is available, keeps the workflow organized and makes the analysis easier to reproduce.
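A minimal sketch of this setup step. The package list is our assumption, based on the steps described in this article: genesysr for the Genesys API, tidyverse for data wrangling and charts, and leaflet for maps. The original script may load a different set.

```r
# Packages assumed for this walkthrough
pkgs <- c("genesysr", "tidyverse", "leaflet")

# Install only the packages that are not yet available in this R library
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
if (any(!is_installed)) {
  install.packages(pkgs[!is_installed])
}

# Attach each package to the current session
for (pkg in pkgs) {
  library(pkg, character.only = TRUE)
}
```

Because installation is skipped for packages that are already present, the block is safe to keep at the top of a script and rerun.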

Next, configure the working environment, either production or test, before logging in to Genesys. This setting determines which Genesys server your API requests are sent to, so make sure it is correct before running any queries.
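In genesysr, the environment is selected with a single function call. The sketch below assumes the package's `setup_production()` helper for the live server and `setup_sandbox()` for the test (sandbox) server.

```r
# Send all genesysr requests to the production Genesys server
genesysr::setup_production()

# To experiment against the test (sandbox) server instead, run this line
# in place of the one above:
# genesysr::setup_sandbox()
```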

At this stage, you need to log in. If you have a Gmail account, click the corresponding link to sign in with it. Otherwise, you can create a new account.
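The login is started from R. A minimal sketch, assuming `genesysr::user_login()` opens the Genesys sign-in page in your default browser as described here. The `interactive()` guard is our addition: it skips the browser login when the script runs non-interactively (for example with `Rscript`), where no one can complete the sign-in.

```r
# Select the server first (production here), then log in
genesysr::setup_production()

# A browser sign-in only makes sense in an interactive session,
# such as the RStudio console
if (interactive()) {
  genesysr::user_login()
}
```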

Once you are logged in, a message confirms that authentication was successful.

As the message indicates, you can now return to RStudio and continue.

To apply filters effectively, you need to understand the Multi-Crop Passport Descriptors (MCPD), the descriptors used for passport data in Genesys. In this example, we download data from the International Center for Tropical Agriculture (CIAT) (COL003), restricted to the genus Phaseolus and to accessions whose biological status (MCPD descriptor SAMPSTAT) is wild or weedy.
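A sketch of this query, assuming genesysr's `mcpd_filter()` helper accepts MCPD field names (`INSTCODE`, `GENUS`, `SAMPSTAT`) as arguments and that `get_accessions()` accepts the resulting filter. Check `?genesysr::mcpd_filter` for the exact arguments in your version. The SAMPSTAT codes come from the MCPD standard: 100 (wild), 120 (semi-natural/wild) and 200 (weedy), the three categories used in the summary step later in this article. The query assumes you have already selected the environment and logged in.

```r
# MCPD biological status (SAMPSTAT) codes for the wild and weedy categories
wild_weedy_codes <- c(
  Wild = 100,
  `Semi-natural/wild` = 120,
  Weedy = 200
)

# Query CIAT (COL003) Phaseolus accessions with those statuses.
# Requires a prior setup_production() and user_login(), so it only runs
# in an interactive session.
if (interactive()) {
  phaseolus_filter <- genesysr::mcpd_filter(
    INSTCODE = "COL003",
    GENUS = "Phaseolus",
    SAMPSTAT = unname(wild_weedy_codes)
  )
  filter_1 <- genesysr::get_accessions(phaseolus_filter)
}
```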

While the data are downloading, the console shows the progress of the operation, including the time taken to download each batch of 500 rows.

The tidyverse suite of packages simplifies data manipulation and is well suited to preparing reports. In this example, the `filter_1` data frame passes through a pipeline: `count()` first tallies the accessions by the `sampStat` variable, and `mutate()` then adds a new column, `sampStat_2`, that labels each code as 'Wild', 'Semi-natural/wild' or 'Weedy'. Chaining tidyverse functions in this way makes it easy to transform and annotate a data set for analysis and reporting.
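A minimal, self-contained version of this pipeline. The real `filter_1` comes from the API, so the sketch uses a small hypothetical stand-in containing only a `sampStat` column. Using `case_when()` to map codes to labels is our choice, with 100 = Wild, 120 = Semi-natural/wild and 200 = Weedy.

```r
library(dplyr)

# Hypothetical stand-in for the downloaded data (not real Genesys records)
filter_1 <- tibble(sampStat = c(100, 200, 100, 120, 100, 200))

# Count accessions per SAMPSTAT code, then add a readable label
sampstat_summary <- filter_1 %>%
  count(sampStat) %>%
  mutate(sampStat_2 = case_when(
    sampStat == 100 ~ "Wild",
    sampStat == 120 ~ "Semi-natural/wild",
    sampStat == 200 ~ "Weedy",
    TRUE ~ "Other"
  ))

print(sampstat_summary)
```

The catch-all `TRUE ~ "Other"` keeps any unexpected code visible in the report instead of turning it into `NA`.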

A bar chart of these counts makes the downloaded data easier to analyze and interpret.
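A sketch with ggplot2, part of the tidyverse. It assumes a summary table with one row per biological status label and a count column `n`, the shape produced by the `count()` and `mutate()` step. The numbers are hypothetical placeholders.

```r
library(ggplot2)

# Hypothetical summary with the columns produced by count() + mutate()
sampstat_summary <- data.frame(
  sampStat_2 = c("Wild", "Semi-natural/wild", "Weedy"),
  n = c(3L, 1L, 2L)
)

# One bar per biological status; geom_col() uses the n values as bar heights
p <- ggplot(sampstat_summary, aes(x = sampStat_2, y = n)) +
  geom_col(fill = "steelblue") +
  labs(
    x = "Biological status (SAMPSTAT)",
    y = "Number of accessions",
    title = "Accessions by biological status"
  ) +
  theme_minimal()

print(p)
```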

---

The next example uses data from the International Maize and Wheat Improvement Center (CIMMYT) (MEX002). Here we search for the genus Zea, specifically the species mays and nicaraguensis, restricted to accessions that are wild or inbred lines.
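A sketch of this query, assuming genesysr's `mcpd_filter()` accepts vectors of values for `SPECIES` and `SAMPSTAT`. We take "wild" as MCPD SAMPSTAT code 100 and "inbred line" as code 414. The object name `filter_2` is our own, and the query assumes you have already selected the environment and logged in.

```r
# MCPD SAMPSTAT codes: 100 = wild, 414 = inbred line
zea_statuses <- c(100, 414)
zea_species <- c("mays", "nicaraguensis")

# Requires setup_production() and user_login() first; interactive use only
if (interactive()) {
  zea_filter <- genesysr::mcpd_filter(
    INSTCODE = "MEX002",
    GENUS = "Zea",
    SPECIES = zea_species,
    SAMPSTAT = zea_statuses
  )
  filter_2 <- genesysr::get_accessions(zea_filter)
}
```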

Creating Dynamic Maps with Leaflet in R

The leaflet package lets us create dynamic, interactive maps that show the locations of accessions. Here, the `filter_3` data frame holds the coordinates of the Zea accessions downloaded from MEX003.
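A minimal leaflet sketch. The column names (`accessionNumber`, `latitude`, `longitude`) are assumptions, since the data frame returned by genesysr may name them differently, so adjust the formulas accordingly. The rows are hypothetical placeholders, not real accessions.

```r
library(leaflet)

# Hypothetical stand-in for filter_3 (placeholder rows, assumed column names)
filter_3 <- data.frame(
  accessionNumber = c("ACC-1", "ACC-2", "ACC-3"),
  latitude = c(19.5, 15.0, NA),
  longitude = c(-99.1, -91.0, NA)
)

# Accessions without coordinates cannot be placed on the map
with_coords <- filter_3[!is.na(filter_3$latitude) & !is.na(filter_3$longitude), ]

# Base map tiles plus one circle marker per accession, labelled on click
zea_map <- leaflet(with_coords) %>%
  addTiles() %>%
  addCircleMarkers(
    lng = ~longitude,
    lat = ~latitude,
    popup = ~accessionNumber,
    radius = 5
  )

# In RStudio, printing the widget shows it in the Viewer pane
if (interactive()) print(zea_map)
```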

---

A final example uses data from the International Potato Center (CIP) (PER001). We filter for the genus Solanum and the species acaule and stoloniferum, restricted to accessions with a wild biological status.
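The queries in this article differ only in their filter values, so they can be wrapped in a small helper. `fetch_passport()` is a hypothetical function of our own, not part of genesysr. It relies on the same assumption that `mcpd_filter()` takes MCPD field names as arguments. Here it is applied to the CIP query, with "wild" taken as MCPD SAMPSTAT code 100 and `filter_4` as our own object name.

```r
# Hypothetical helper (not part of genesysr): download passport data for one
# genebank and genus, optionally restricted by species and SAMPSTAT codes
fetch_passport <- function(instcode, genus, species = NULL, sampstat = NULL) {
  # Keep only the filter arguments that were actually supplied
  args <- list(INSTCODE = instcode, GENUS = genus,
               SPECIES = species, SAMPSTAT = sampstat)
  args <- args[!vapply(args, is.null, logical(1))]

  filters <- do.call(genesysr::mcpd_filter, args)
  genesysr::get_accessions(filters)
}

# CIP (PER001): wild Solanum acaule and S. stoloniferum.
# Requires setup_production() and user_login() first; interactive use only.
if (interactive()) {
  filter_4 <- fetch_passport(
    instcode = "PER001",
    genus = "Solanum",
    species = c("acaule", "stoloniferum"),
    sampstat = 100
  )
}
```

Dropping the `NULL` arguments before `do.call()` means a filter is only sent for the fields you actually specify.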

We hope these examples prove helpful.


Click here to see the full tutorial on GitLab!
