From seed to canopy: high-throughput phenotyping and machine learning in soybean breeding

Miranda, Melissa Cristina de Carvalho

doi:10.11606/T.11.2024.tde-02072024-112314

Doctoral Thesis

DOI

https://doi.org/10.11606/T.11.2024.tde-02072024-112314

Document

Doctoral Thesis

Author

Miranda, Melissa Cristina de Carvalho (Catálogo USP)

Full name

Melissa Cristina de Carvalho Miranda

Institute/School/College

Escola Superior de Agricultura Luiz de Queiroz

Knowledge Area

Genetics and Improvement of Plants

Date of Defense

2024-05-02

Published

Piracicaba, 2024

Supervisor

Pinheiro, Jose Baldin (Catálogo USP)

Committee

Pinheiro, Jose Baldin (President)
Bruzi, Adriano Teodoro
Silva, Felipe Lopes da

Title in English

From seed to canopy: high-throughput phenotyping and machine learning in soybean breeding

Keywords in English

Convolutional neural network
Phenomics
RGB images
Seed morphology
Transfer learning
Vegetation indices

Abstract in English

Soybean breeding faces the challenge of evaluating large and complex populations in different environments to obtain accurate genetic values that can be used as selection criteria. This study addresses that challenge by examining the potential of high-throughput phenotyping (HTP) and of machine learning (ML) models for predicting classic phenotypic traits in soybean breeding programs from seed images and aerial canopy images. The methodology consisted of the phenotypic characterization of 275 soybean genotypes in different environments and under different management practices, including management with and without fungicide application for the control of Asian rust. Predictions from regression algorithms (support vector machine (SVM), random forest (RF), multilayer perceptron neural network (MLP), and AdaBoost) were evaluated first. Transfer learning was then tested, using convolutional neural networks (CNNs) as image feature extractors (VGG16, VGG19, ResNet50, InceptionV3, and Inception-ResNetV2) combined with the same regression models for prediction. In the first chapter, RGB (red-green-blue) images of seeds from each plot were collected, with seeds arranged both sparsely and densely. A custom image processing pipeline was developed for seed segmentation, which allowed a detailed morphological evaluation. ML algorithms and different CNN architectures were compared in predicting hundred-seed weight. The image segmentation technique correctly identified over 98% of the seeds, and the morphological measurements achieved a predictive ability of 0.71, with a mean squared error (MSE) of 3.15. The same results were observed for the CNN features, which highlights the efficiency of the morphological measurements as extractors of image features. ResNet50 stood out as the most accurate CNN for feature extraction.
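The two figures reported for the seed models, predictive ability and MSE, can be illustrated with a minimal sketch. The abstract does not define predictive ability; the code assumes the definition common in plant breeding, the Pearson correlation between observed and predicted values. The function names and the hundred-seed weights below are hypothetical and are not taken from the thesis.

```python
import math


def mean_squared_error(observed, predicted):
    """Average squared difference between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def predictive_ability(observed, predicted):
    """Pearson correlation between observed and predicted values.

    Assumption: this is the usual meaning of 'predictive ability' in
    breeding studies; the abstract itself does not state the formula.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov / math.sqrt(var_o * var_p)


# Hypothetical hundred-seed weights in grams (illustrative values only).
observed = [12.0, 14.0, 16.0, 18.0]
predicted = [13.0, 13.0, 17.0, 17.0]

mse = mean_squared_error(observed, predicted)
ability = predictive_ability(observed, predicted)
```

The two metrics answer different questions: the correlation measures how well the model ranks genotypes, which is what matters for selection, while MSE measures how far predictions fall from the measured values in the trait's own units.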
In the second chapter, we investigated the heritability of vegetation indices obtained from aerial images and their correlation with traditional phenotypic traits. The RGBVI and GLI vegetation indices showed higher heritability (mean H² of 0.56) than the other RGB-based indices, which makes them promising for genetic evaluations. Advanced ML techniques, especially transfer learning with ResNet50, improved the prediction of traits such as days to R7 stage (DR7) and plant height measurement (PHM) from canopy images. The combination of ResNet50 with RF for DR7 prediction and with MLP for PHM prediction showed promising results, highlighting the potential of these approaches to optimize decision-making in soybean breeding. In summary, the research concludes that integrating image data with machine learning models offers a robust decision support system that predicts classic phenotypic traits of soybean from images and helps identify high-performance genotypes.
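As a rough illustration of the two indices named above, the sketch below computes GLI and RGBVI from RGB values. The abstract does not give formulas, so the code assumes the commonly cited definitions: GLI = (2G - R - B) / (2G + R + B) and RGBVI = (G² - R·B) / (G² + R·B). The helper `plot_mean_index`, the zero-denominator handling, and the pixel values are hypothetical; how the thesis aggregated pixels per plot is not stated.

```python
def gli(r, g, b):
    """Green Leaf Index: (2G - R - B) / (2G + R + B)."""
    denom = 2 * g + r + b
    # Assumption: return 0.0 for an all-black pixel instead of dividing by zero.
    return (2 * g - r - b) / denom if denom else 0.0


def rgbvi(r, g, b):
    """RGB Vegetation Index: (G^2 - R*B) / (G^2 + R*B)."""
    denom = g * g + r * b
    return (g * g - r * b) / denom if denom else 0.0


def plot_mean_index(pixels, index_fn):
    """Average an index over the (R, G, B) pixels of one plot (hypothetical helper)."""
    if not pixels:
        raise ValueError("plot has no pixels")
    values = [index_fn(r, g, b) for r, g, b in pixels]
    return sum(values) / len(values)


# Hypothetical plot: one green canopy pixel and one neutral gray pixel.
plot_pixels = [(50, 100, 50), (100, 100, 100)]
plot_gli = plot_mean_index(plot_pixels, gli)
plot_rgbvi = plot_mean_index(plot_pixels, rgbvi)
```

Both indices are normalized ratios, so they fall between -1 and 1 and are zero for neutral gray pixels, which makes per-plot means comparable across images with different overall brightness.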

Title in Portuguese

Da semente ao dossel: fenotipagem de alto rendimento e aprendizado de máquina no melhoramento da soja

Keywords in Portuguese

Aprendizagem por transferência
Fenômica
Imagens RGB
Índices de vegetação
Morfologia de sementes
Rede neural convolucional

Abstract in Portuguese

O melhoramento de soja enfrenta o desafio de avaliar populações grandes e complexas em diferentes ambientes para obter valores genéticos acurados que possam ser utilizados como critérios de seleção. Este estudo objetiva superar esse desafio, aprimorando o entendimento do potencial da fenotipagem de alto rendimento (HTP) e da aplicação de modelos de aprendizado de máquina (ML) na predição de características fenotípicas clássicas em programas de melhoramento de soja, por meio de imagens de sementes e aéreas do dossel das plantas. A metodologia consistiu na caracterização fenotípica de 275 genótipos de soja em diferentes ambientes e manejos, incluindo manejos com e sem aplicação de fungicidas para o controle da ferrugem asiática. De forma geral, primeiramente foram avaliadas predições de aprendizado de máquina baseadas em algoritmos de regressão (máquina de vetores de suporte (SVM), floresta aleatória (RF), rede neural perceptron multicamadas (MLP) e AdaBoosting), e em seguida foram testadas técnicas de aprendizado por transferência com redes neurais convolucionais (CNNs) para extrair características das imagens (VGG16, VGG19, ResNet50, InceptionV3 e Inception-ResNetV2) integrados com os mesmos modelos de predição. No primeiro capítulo, foram coletadas imagens RGB (vermelho-verde-azul) das sementes de cada parcela considerando sementes esparsamente e densamente distribuídas. Foi desenvolvido um pipeline de processamento de imagens para a segmentação das sementes, o que permitiu uma avaliação morfológica detalhada. Comparou-se algoritmos de ML e diferentes arquiteturas de CNNs na predição do peso de cem sementes. A técnica de segmentação de imagem conseguiu identificar corretamente mais de 98% das sementes, e as medições morfológicas alcançaram uma capacidade preditiva de 0,71, com um erro quadrático médio (MSE) de 3,15. 
Os mesmos resultados foram observados para as características da CNNs, destacando a eficiência das medidas morfológicas como extratores de recursos de imagem. O modelo ResNet-50 se destacou como a CNN mais acurada para a extração de características. No segundo capítulo, por sua vez, investigamos a herdabilidade e correlação entre índices de vegetação obtidos de imagens aéreas e as características fenotípicas tradicionais. Verificou-se alta herdabilidade dos índices de vegetação RGBVI e GLI (H² médio de 0,56) em comparação com outros índices baseados em RGB, o que os torna promissores para avaliações genéticas. O uso de técnicas avançadas de ML, em especial o aprendizado por transferência com a arquitetura ResNet 50, melhorou a predição de características como os dias até o estágio R7 (DR7) e a medição da altura da planta (PHM) a partir de imagens do dossel. A combinação do ResNet 50 com RF para a predição de DR7 e com MLP para a predição de PHM apresentou resultados promissores, evidenciando o potencial dessas abordagens para otimizar a tomada de decisões no melhoramento de soja. Em suma, a pesquisa conclui que a integração de dados imagéticos com modelos de aprendizado de máquina oferece um sistema robusto de suporte à decisão, permitindo a predição de características fenotípicas clássicas da soja por meio de imagens, visando otimizar a identificação de genótipos de alto desempenho.

WARNING - Viewing this document is conditioned on your acceptance of the following terms of use:
This document is for private use only, in research and teaching activities. Reproduction for commercial use is forbidden. These rights cover all data about this document as well as its contents. Any use or copy of this document, in whole or in part, must include the author's name.

Melissa_Cristina_de_Carvalho_Miranda_versao_revisada.pdf (6.05 Mbytes)

Publishing Date

2024-07-03
