Unlocking big data doubled the accuracy in predicting the grain yield in hybrid wheat

(A) Broad-sense heritability values for hybrids and lines within experimental series are shown as bars, and those across experimental series as vertical lines. This finding illustrates the potential to increase the predictive power for hybrids by exploiting the precision of estimating additive effects in the large population of inbred lines.

Experimental series II was based on elite winter bread wheat lines and their single-cross hybrid progenies. Details of the plant material and phenotypic data have been published in a previous study. Briefly, the parental lines were chosen to reflect a wide range of the diversity that exists in Central Europe.

The lines were divided into a female pool of lines and a male pool of 41 lines, depending on pollination capability, plant height, and flowering time. In each trial, an unreplicated alpha lattice design was used.

Different genotypes were evaluated in different trials linked by the 11 common checks. Plot sizes ranged from 5. Experimental series III was based on elite bread wheat lines and their single-cross hybrid progenies.

The lines were divided into a female pool of lines and a male pool of 40 lines, depending on pollination capability, plant height, and flowering time. Experimental series IV was based on elite winter bread wheat lines and their single-cross hybrid progenies. The lines were divided into a female pool of 8 lines and a male pool of lines, depending on pollination capability, plant height, and flowering time. The parental lines and hybrids were split into two series linked by 16 common checks.

An unreplicated alpha lattice design was used. Experimental series V included hybrids between elite lines and historic varieties or accessions obtained from the gene bank of the IPK Gatersleben.

Six hundred sixty-seven hybrids were produced by crossing 45 elite winter bread wheat lines adapted to the growing conditions of Central Europe with diverse accessions. Here, the elite lines were used as females for hybrid seed production.

The accessions were used as male parents and were selected for pronounced anther extrusion by screening a sample of accessions from the gene bank of IPK Gatersleben. In addition, hybrids were produced by crossing historic varieties with plants originating from four different seed mixtures, each including either two or three elite male lines (fig.).

Elite lines with good anther extrusion but different flowering times were combined in the four mixtures and used as male crossing partners. This optimized hybrid seed production through an almost perfect match of flowering time between male and female lines and guaranteed the unambiguous identification.

As a baseline, we randomly sampled grain yield data of all lines and hybrids for 3 of the 12 environments, corresponding to plots, and estimated the correlation between grain yield estimates for the data of the subsets and the total 12 environments Fig.

The historic varieties originated from all over Europe from the past decades and were characterized by a short plant height. Trials 1, 2, and 3 included, and entries evaluated in the years, and, respectively. Plot sizes ranged from 6 to 9 m².

Part of the phenotypic data of the lines evaluated in it has been published in a previous study. Briefly, the lines were evaluated in the years, and for grain yield in up to 10 sites in Germany. The lines were divided into 13 to 18 individual trials connected through five to six common checks.

The experimental design for each trial followed an alpha design with one to three replications per site, with the number of entries per trial ranging from 30 to Plot size ranged from 6. A linear mixed model was used including the effects of genotypes, trials, replications nested within trials, and blocks nested within trials and replications. All data were screened for outliers fig.

Outliers were removed, and best linear unbiased estimations (BLUEs) of the genotypes in each environment were obtained as outlined in detail elsewhere (18) and served as the input for the subsequent analyses.

All linear mixed models were implemented using the software ASReml-R 3. The genomic profiles of lines were determined using 15, or 90, SNP arrays based on an Illumina Infinium assay. The number of markers in each experimental series ranged from 11, to 81, To reduce the risk of a high proportion of missing values in the integrated data, we used only SNP markers common to all six experiments.

After imputation, we removed the monomorphic markers, and the remaining 10, SNP markers were used for subsequent analyses. Marker profiles of hybrids were derived from the corresponding parental lines. Linkage disequilibrium (LD) between all pairs of SNP markers within each chromosome was calculated as the squared Pearson correlation coefficient r² between vectors of SNP alleles using the lines.
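As a minimal sketch of the LD calculation described above, the following Python snippet computes r² as the squared Pearson correlation between vectors of SNP alleles. The 0/2 allele coding for inbred lines, the function names, and the toy genotypes are assumptions made for illustration; the authors' own analyses were run in R. Monomorphic markers must be removed first, as in the text, because their correlation is undefined.

```python
import numpy as np

def ld_r2(snp_a, snp_b):
    """Squared Pearson correlation between two SNP allele vectors."""
    r = np.corrcoef(snp_a, snp_b)[0, 1]
    return r ** 2

def pairwise_ld(markers):
    """r^2 for all marker pairs; rows are lines, columns are SNPs of one chromosome."""
    r = np.corrcoef(markers, rowvar=False)
    return r ** 2

# Toy data: 6 inbred lines x 3 SNPs, coded 0/2 (hypothetical values)
markers = np.array([
    [0, 0, 2],
    [2, 2, 0],
    [0, 0, 0],
    [2, 2, 2],
    [0, 2, 2],
    [2, 0, 0],
])
ld = pairwise_ld(markers)
```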

The persistence of linkage phase between the experiments was inferred by analyzing how similar or dissimilar the correlations between pairs of markers were, following the approach suggested previously. The squared correlation between values of two different experiments was defined as LD phase and plotted against the physical map distance to fit natural smoothing splines.

A two-step procedure was applied to analyze the grain yield data across environments. In the first step, the data for each environment were analyzed separately. BLUEs of the genotypes in each environment were obtained and served as the input for the second step, where a linear mixed model was applied including the effects of environments and genotypes.

Fixed genotypic effects were assumed to obtain the BLUEs of the genotypic values of the hybrids and their parents. Within each experimental series, broad-sense heritability was calculated from a one-step model.
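The second step of this procedure can be sketched in Python as follows. This is a simplified illustration, not the authors' ASReml-R model: it fits genotype and environment as fixed effects by ordinary least squares to within-environment BLUEs, without weights or random terms. The function name, the reference-level coding, and the toy records are assumptions.

```python
import numpy as np

def across_env_blues(records):
    """records: list of (genotype, environment, within-environment BLUE)."""
    genos = sorted({g for g, _, _ in records})
    envs = sorted({e for _, e, _ in records})
    # One column per genotype plus environment dummies; the first
    # environment is the reference level so the model stays identifiable.
    X = np.zeros((len(records), len(genos) + len(envs) - 1))
    y = np.zeros(len(records))
    for row, (g, e, value) in enumerate(records):
        X[row, genos.index(g)] = 1.0
        k = envs.index(e)
        if k > 0:
            X[row, len(genos) + k - 1] = 1.0
        y[row] = value
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    env_effects = np.concatenate(([0.0], beta[len(genos):]))
    # Report each genotype at the average environment.
    return {g: beta[i] + env_effects.mean() for i, g in enumerate(genos)}

# Hypothetical, unbalanced toy data: each genotype is seen in two of three environments.
records = [
    ("A", "E1", 80.0), ("A", "E2", 85.0),
    ("B", "E2", 90.0), ("B", "E3", 80.0),
    ("C", "E1", 90.0), ("C", "E3", 85.0),
]
blues = across_env_blues(records)
```

Because the environments are connected through shared genotypes, the genotype estimates are comparable even though no genotype was grown everywhere, which is the role the common checks play in the trials described here.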

For each experimental series, a submodel of the general model was applied, and only those factors relevant to the experimental design and the population used in a certain experimental series were retained. The following terms genotype-by-environment interaction effects. For integrated analysis across experimental series, the broad-sense heritability was calculated on the basis of the BLUEs within each environment with model.
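For orientation, one commonly used entry-mean definition of broad-sense heritability is given below. The source does not show its own formula here, so the exact form and all symbols are assumptions: $\sigma_G^2$ is the genotypic variance, $\sigma_{G \times E}^2$ the genotype-by-environment interaction variance, $\sigma_e^2$ the residual variance, $n_E$ the number of environments, and $n_R$ the number of replications per environment.

```latex
H^2 = \frac{\sigma_G^2}{\sigma_G^2 + \dfrac{\sigma_{G \times E}^2}{n_E} + \dfrac{\sigma_e^2}{n_E \, n_R}}
```

When the analysis starts from BLUEs within each environment, the interaction and residual terms cannot be separated and are typically merged into a single variance divided by $n_E$.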

Together, duplicate groups were identified representing lines or hybrids. These included 78 groups of hybrids and groups of lines.

The final genomic dataset comprised 10, unique and hybrids; the latter were derived by crossing male and female lines. The population of 10, genotypes for which 10, high-quality SNP markers had been assessed was used for the genome-wide prediction analyses. We used a genomic best linear unbiased prediction (G-BLUP) model including additive and dominance effects.

The above models were implemented using the R package with 30, iterations, with the first iteration used as burn-in.
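The following Python sketch illustrates the idea of a G-BLUP with additive and dominance relationships. It is not the authors' implementation, which used a Bayesian R package with iterative sampling. Here the variance components are fixed, assumed values, the scaling of the relationship matrices is one simple choice among several, and all data are simulated. Hybrid marker profiles are derived from inbred parents coded 0/2, and heterozygotes are coded 1 in the dominance matrix.

```python
import numpy as np

def hybrid_profiles(females, males):
    """Derive hybrid allele counts (0/1/2) from inbred parents coded 0/2.

    Returns one row per female x male cross, in row-major order.
    """
    return np.array([(f + m) // 2 for f in females for m in males])

def relationship(Z):
    """Centered cross-product matrix scaled to a mean diagonal of 1 (assumed scaling)."""
    W = Z - Z.mean(axis=0)
    G = W @ W.T
    return G / np.mean(np.diag(G))

def gblup_predict(G_a, G_d, y, train, test, var_a=1.0, var_d=0.5, var_e=1.0):
    """BLUP of test genotypes given fixed (assumed) variance components."""
    K = var_a * G_a + var_d * G_d
    mu = y[train].mean()
    V = K[np.ix_(train, train)] + var_e * np.eye(len(train))
    return mu + K[np.ix_(test, train)] @ np.linalg.solve(V, y[train] - mu)

rng = np.random.default_rng(1)
females = 2 * rng.integers(0, 2, size=(4, 50))  # hypothetical inbred lines, coded 0/2
males = 2 * rng.integers(0, 2, size=(3, 50))
Z_a = hybrid_profiles(females, males).astype(float)  # additive coding: 0, 1, 2
Z_d = (Z_a == 1).astype(float)                       # dominance coding: 1 = heterozygote

G_a, G_d = relationship(Z_a), relationship(Z_d)
y = rng.normal(loc=90.0, scale=5.0, size=len(Z_a))   # simulated yields, not real data
train, test = np.arange(0, 9), np.arange(9, 12)
y_hat = gblup_predict(G_a, G_d, y, train, test)
```

In practice the variance components would be estimated from the data rather than fixed in advance.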

We used chessboard-like (experimental series I, II, and III) and random fivefold (experimental series IV, and VI) cross-validations to assess the prediction ability of genomic prediction within experimental series (fig.).
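In a chessboard-like cross-validation, test hybrids are grouped by how many of their parents were also evaluated in training hybrids. The short Python sketch below assumes the usual convention that T2, T1, and T0 hybrids share two, one, or no parents with the training set; the function name and the example crosses are hypothetical.

```python
def relatedness_class(hybrid, training_hybrids):
    """Return 'T2', 'T1', or 'T0' for a (female, male) test hybrid."""
    female, male = hybrid
    tested_females = {f for f, _ in training_hybrids}
    tested_males = {m for _, m in training_hybrids}
    n_shared = (female in tested_females) + (male in tested_males)
    return f"T{n_shared}"

training = [("F1", "M1"), ("F2", "M2")]
classes = {
    hybrid: relatedness_class(hybrid, training)
    for hybrid in [("F1", "M2"), ("F1", "M3"), ("F3", "M3")]
}
```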

Basically, the data were divided into two sets, a training set and a test set. The model was fitted on the training set, and the genomic data of the test set were then used to predict the genetic values of hybrids and lines. The prediction ability for each test set was estimated as the Pearson correlation coefficient between the predictions and the observed phenotypic values. In addition, we tested the prediction ability across experimental series using different combinations of training sets.
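A random fivefold cross-validation of this kind can be sketched in Python as below. The `predict` callback stands in for any prediction model; here a simple linear regression on simulated data is used purely as a placeholder and is not the genomic model of the study. Function names, the seed, and the data are assumptions.

```python
import numpy as np

def fivefold_prediction_ability(y, predict, n_folds=5, seed=0):
    """Mean Pearson correlation between predicted and observed values over folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    abilities = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        y_hat = predict(train, test)  # the model sees only training phenotypes
        abilities.append(np.corrcoef(y_hat, y[test])[0, 1])
    return float(np.mean(abilities))

# Simulated placeholder data: one predictor, one trait
rng = np.random.default_rng(42)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(scale=0.5, size=100)

def linear_predict(train, test):
    slope, intercept = np.polyfit(x[train], y[train], 1)
    return slope * x[test] + intercept

ability = fivefold_prediction_ability(y, linear_predict)
```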

In the first scenario, we used one out of the six experimental series as training set.

Each of the other experimental series was used as test set. The prediction ability for each test set was estimated as the Pearson correlation coefficient between the predicted and the observed hybrid performances.

For the training sets in the second scenario, we incrementally added the experimental series except the one used as test set. The prediction ability for each test set was estimated again as the Pearson correlation coefficient between the predicted and the observed hybrid performances. We estimated the effective population size N_e as.

The estimated prediction accuracy depends on the heritability of the trait and the ratio between the effective number of segments in the genome and the number of individuals in the training population. From Eq. We also calculated the broad-sense heritability of each subpopulation and used Eq. All the above analyses were performed in R version 3.
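A widely used deterministic expression that matches this description is shown below. The source equation is not reproduced in this text, so the form and the symbols are assumptions: $r$ is the expected prediction accuracy, $h^2$ the heritability, $N$ the number of individuals in the training population, and $M_e$ the effective number of independent chromosome segments.

```latex
r = \sqrt{\frac{N h^2}{N h^2 + M_e}} = \sqrt{\frac{h^2}{h^2 + M_e / N}}
```

The second form makes the dependence on the ratio $M_e / N$ explicit: accuracy rises when the training population grows or when the effective number of segments, which increases with the effective population size, shrinks.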

On the basis of the data of experimental series II, we tested in silico three scenarios of field designs, each requiring a similar number of plots. The full data of experimental series II included 11 checks, elite lines, and single-cross hybrid progenies. In scenario I, a balanced missing design was considered in which all lines and hybrids were tested in three randomly selected environments, corresponding to plots.

We analyzed all combinations of three environments and estimated the across environment BLUEs in each subset. The random sampling was run for times as in scenario I, and the average number of plots used in this scenario was In scenario III, all lines and hybrids were divided into 10 subgroups, each of which was tested in only three environments, with the restriction that two environments overlapped with those of the next group.

The 11 checks were tested in all environments to estimate the environmental effects, and the average number of plots used in this scenario corresponded to The Pearson correlation between the BLUEs from the subsets of scenarios I, II, and III and those from the total data including 12 environments was used to estimate the precision of the estimates of the genotype effects.
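The environment allocations behind scenarios I and III can be sketched in Python as follows. Scenario I enumerates every subset of 3 out of the 12 environments. For scenario III, a sliding-window layout is one simple way, assumed here, to give 10 subgroups three environments each with two shared between neighbouring groups; the source does not state how the overlap was arranged. Names are hypothetical.

```python
from itertools import combinations

N_ENVS = 12

# Scenario I: every subset of 3 out of 12 environments is a candidate sample.
scenario_one_subsets = list(combinations(range(N_ENVS), 3))

def sliding_plan(n_groups=10, envs_per_group=3):
    """Assumed layout: group g is tested in environments g, g+1, ..., so that
    neighbouring groups share envs_per_group - 1 environments."""
    return [tuple(range(g, g + envs_per_group)) for g in range(n_groups)]

plan = sliding_plan()
overlaps = [len(set(a) & set(b)) for a, b in zip(plan, plan[1:])]
covered = set().union(*plan)
```

With 10 groups and a window of three, this layout uses exactly 12 environments, which matches the number of environments in experimental series II.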

To assess the role of interactions between genotypes and experimental series, we generated previously untested T2 hybrids that had both parental lines in common with experimental series II but were not tested in experimental series II. We selected the hybrids out of the 23, potential hybrids using the predicted yield performance and further information on producibility of single-cross hybrids, i.

The yield of the other 22 hybrids is somewhere in between these two groups. The hybrids were phenotyped in a separate validation experiment for grain yield in eight environments in Germany in the year We estimated the BLUEs as outlined above and studied their correlation with the predicted hybrid performance using data from experimental series.

The model is as follows. The elements of Z_D are 0 and 1: the homozygote classes are coded as 0, and the heterozygotes are coded as 1.

We thank GFPi and proWeizen for project coordination.

Competing interests: The authors declare that they have no competing interests. Requests for the data of the experimental series VI should be submitted to Patrick Thorwarth.

Published online Jun

Authors: Yusheng Zhao, Patrick Thorwarth, Yong Jiang, Norman Philipp, Albert W. Schulthess, Mario Gils, Philipp H. Boeven (Limagrain GmbH), C. Friedrich H., Johannes Schacht, Erhard Ebmeyer, Viktor Korzun, Vilson Mirdita, Ulrike Avenhaus, Ralf Horbach, Josef Holzapfel, Ludwig Ramgraber, Pierrick Varenne, Anne Starke, Sebastian Beier, Uwe Scholz, Fang Liu, Renate H. Schmidt, and Jochen C.


Received Nov 27; Accepted Apr. No claim to original U.S. Government Works. This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.

Abstract

The potential of big data to support businesses has been demonstrated in financial services, manufacturing, and telecommunications.

Prediction ability of hybrid grain yield is determined mainly by relatedness

Because each of the parents in experimental series I, II, and III was tested in several hybrid combinations, we investigated the ability to predict hybrid grain yield performance using a genomic-based unbiased prediction model incorporating both additive and dominance genomic relationships and a chessboard-like cross-validation with three levels of relatedness: T2, T1, and T0 (fig.).

Interactions between genotypes and experimental series affect across series prediction ability

The ability to predict the hybrid performance from one experiment to another across experimental series I, II, III, or IV was lower 0.

The potential of big data for hybrid prediction

One of the important tasks in hybrid wheat breeding is to predict for new environments the single-cross performance of parental lines that have not yet been evaluated in other hybrids.

Relationship between prediction ability and effective population size N_e in experimental series VI.

Optimized field designs to reduce interaction effects exemplified on the basis of yield trials of experimental series II in 12 environments.



The information on the environmental drivers can then be integrated as covariables into the statistical analyses to obtain more accurate estimates of the genotype main effects, thus reducing the estimation bias caused by interaction effects between genotypes and experimental series (38). The six experimental series were based on different crossing designs, comprising factorial mating and topcross designs, and include not only a very broad genetic diversity of the European wheat breeding pool but also plant genetic resources. The set of overlapping genotypes allowed an integrated analysis.
