Universidade Federal de Viçosa
Viçosa, MG. Brasil
A Software in the Area of Genetics and Experimental Statistics
Departamento de Biologia Geral
Viçosa, MG. 36570-00
To obtain superior genetic material, the selected individuals must simultaneously combine a series of favorable attributes that give them comparatively higher yield and meet consumer demands. One way to increase the chances of success of a breeding program is therefore to conduct reliable experiments that generate large volumes of experimental data. With adequate processing of these data, genetic parameters can be estimated and biological phenomena can be interpreted. In the analysis and interpretation phase, it is very important for researchers to have software systems and computing resources available.
The development of software for Genetics and Breeding is crucial because such resources are scarce in the scientific community. These tools meet a growing demand from users in research institutions that handle large volumes of data and need adequate processing for the accurate estimation of statistical and biological parameters.
In Genetics specifically, the intensive breeding of many species and the complexity of the most important characters have required increasingly accurate selection criteria. At all breeding stages, breeders rely on information expressed as parameters of biometric models, which are usually available in the outputs of most scientific software systems.
Hence, the Genes software was developed specifically to meet the needs of Genetics and Experimental Statistics.
The Genes software system runs on IBM PC-compatible computers under the Windows operating system.
The following settings are required:
- a screen resolution of 1024 x 768 (large fonts, 120 dpi); and
- the period (.) as the decimal symbol.
The system comprises 201 executable projects and 131 text documents in RTF format, occupies around 250 MB, and is available in English and Portuguese.
Provision of Data for Processing
Most procedures follow a common analysis sequence: users supply the name of the file containing the data to be processed, provide the analysis parameters (number of variables, treatments, blocks, etc.), optionally supply the variable names, and then print or save the results.
Data are supplied as a file laid out like a spreadsheet, in which each column represents a characteristic to be analyzed and each row an experimental observation. The first columns are sometimes reserved for classificatory variables (effect descriptors) such as treatments, blocks, years, and locations.
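As an illustration of this layout, the sketch below splits each row of a text file into classificatory columns and numeric variable columns. The exact file format Genes expects is not described here; the whitespace delimiter, the absence of a header row, and the function name `read_layout` are assumptions for illustration only.

```python
from typing import List, Tuple


def read_layout(text: str, n_class: int) -> Tuple[List[List[str]], List[List[float]]]:
    """Split rows into classificatory columns and numeric variable columns.

    Hypothetical helper: assumes one observation per line, whitespace-separated
    fields, and no header row.
    """
    effects: List[List[str]] = []
    values: List[List[float]] = []
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore blank lines inside the file
        effects.append(fields[:n_class])  # e.g. treatment, block
        # float() only accepts the period as decimal separator
        values.append([float(x) for x in fields[n_class:]])
    return effects, values


# Two classificatory columns (treatment, block) followed by two characteristics.
data = """
1 1 12.5 3.1
1 2 13.0 3.4
2 1 10.2 2.9
2 2 11.1 3.0
"""
effects, values = read_layout(data, n_class=2)
```

Because Python's `float()` rejects a comma as decimal separator, this sketch fails loudly on files that do not follow the decimal-symbol requirement stated earlier.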
The Genes software system is organized into analysis modules, described below, each comprising several biometric analysis procedures.
Genotype x Environment Interaction: stratification analysis, dissimilarity and correlations between environments.
Stability and Adaptability: analysis by methods based on ANOVA (traditional; Plaisted and Peterson, 1959; Wricke, 1965; Annicchiarico, 1992), regression (Eberhart and Russell, 1966; Finlay and Wilkinson, 1963; Tai, 1971), bisegmented regression (Verma, Chahal and Murty, 1978; Silva and Barreto, 1985; Cruz, Torres and Vencovsky, 1989), nonparametric methods (Huehn, 1990; visual analysis; Lin and Binns, 1988), factor analysis, principal components, and centroids.
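Two of the methods cited above reduce to short formulas on a genotype-by-environment table of means. The sketch below computes the Eberhart and Russell (1966) regression coefficient b_i = Σ_j Y_ij I_j / Σ_j I_j², with environmental index I_j equal to the environment mean minus the grand mean, and the Wricke (1965) ecovalence on the scale of means. It is a minimal illustration assuming a complete table of means; it omits the deviation-from-regression variance and the significance tests a full analysis would report, and the function name `stability` is hypothetical.

```python
def stability(means):
    """means[i][j]: mean of genotype i in environment j (complete table assumed).

    Returns, per genotype, (overall mean, Eberhart-Russell b_i, Wricke ecovalence).
    """
    g, e = len(means), len(means[0])
    grand = sum(map(sum, means)) / (g * e)
    env_mean = [sum(means[i][j] for i in range(g)) / g for j in range(e)]
    gen_mean = [sum(row) / e for row in means]
    index = [m - grand for m in env_mean]  # environmental index I_j (sums to zero)
    sum_i2 = sum(x * x for x in index)
    results = []
    for i, row in enumerate(means):
        # regression of genotype means on the environmental index
        b = sum(y * ij for y, ij in zip(row, index)) / sum_i2
        # ecovalence: sum of squared GxE interaction effects for genotype i
        w = sum((row[j] - gen_mean[i] - env_mean[j] + grand) ** 2 for j in range(e))
        results.append((gen_mean[i], b, w))
    return results


res = stability([[8, 10, 12], [6, 10, 14], [10, 10, 10]])
```

In this small example the three genotypes have slopes 1, 2, and 0, and only the genotypes that depart from the average environmental response get a nonzero ecovalence.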
Gains per Selection – Indices: calculation of selection gains among families, univariate or by indices, considering direct and indirect selection: the classic index of Smith (1936) and Hazel (1943), the rank-sum index of Mulamba and Mock (1978), the base index of Williams (1962), the multiplicative index of Subandi et al. (1973), the weight- and parameter-free index of Elston (1963), the desired-gains index of Pesek and Baker (1969), and the genotype-ideotype distance index. Calculation of selection gains among families by univariate methods or by the following restricted indices: the classic index of Smith (1936) and Hazel (1943), and those of Kempthorne and Nordskog (1959), Tallis (1962), James (1968), Cunningham et al. (1970), and Pesek and Baker (1969, desired gains). Calculation of selection gains among families using indices under collinearity; of gains from selection among and within families, in balanced and unbalanced experiments; and of gains from mass selection and stratified selection among and within families. Visual selection analysis, selection across several environments, and prediction of gains from within-family selection without information on individual plants within the plot.
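For the classic Smith (1936) and Hazel (1943) index, the weights b solve P b = G a, where P is the phenotypic covariance matrix, G the genotypic covariance matrix, and a the vector of economic weights. The sketch below, assuming NumPy and that phenotypic and genotypic effects are related by P = G + E with uncorrelated environmental effects, returns the index weights and the expected gain in each trait, i (G b)_j / sqrt(b'Pb). The selection intensity `i` defaults to 1.0 purely for illustration; in practice it depends on the proportion selected. The function name `smith_hazel` is hypothetical.

```python
import numpy as np


def smith_hazel(P, G, a, i=1.0):
    """Classic selection index: weights b from P b = G a, plus expected gains per trait."""
    P, G, a = (np.asarray(x, dtype=float) for x in (P, G, a))
    b = np.linalg.solve(P, G @ a)    # index weights
    sd_index = np.sqrt(b @ P @ b)    # standard deviation of the index I = b'x
    gains = i * (G @ b) / sd_index   # expected genetic gain in each trait
    return b, gains


b, gains = smith_hazel([[4, 0], [0, 9]], [[2, 0], [0, 3]], [1, 1])
```

Using `np.linalg.solve` instead of explicitly inverting P is the usual, numerically safer way to obtain the weights.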
Diallel Analysis: analysis of balanced diallels (methodologies of Griffing, 1956; Gardner and Eberhart, 1966; Hayman, 1954; and Cockerham and Weir, 1977; test between hybrids and reciprocals; prediction of composites, hybrids, and family indices); joint diallel analysis (of balanced diallels by Griffing, 1956, and Gardner and Eberhart, 1966, and of partial and circulant diallels); partial diallels (methodologies of Geraldi and Miranda Filho, 1988; Miranda Filho and Geraldi, 1984; Kempthorne, 1966; Viana et al., 1999 and 2000; and prediction of triple and double hybrids). Analysis of circulant, partial circulant, and unbalanced diallels.
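As one concrete case, Griffing's (1956) method 4 (F1 crosses only, without parents or reciprocals) estimates general and specific combining ability from the cross totals: ĝ_i = (p Y_i. − 2 Y..) / (p (p − 2)) and ŝ_ij = Y_ij − (Y_i. + Y_j.) / (p − 2) + 2 Y.. / ((p − 1)(p − 2)), where Y_i. sums the crosses involving parent i and Y.. sums all crosses once. The sketch below computes only these estimates, not the analysis of variance, and assumes a symmetric table of cross means; the function name is hypothetical.

```python
def griffing_method4(y):
    """y: p x p symmetric matrix of F1 means; diagonal entries are ignored.

    Returns (gca, sca) estimates under Griffing's method 4 fixed model.
    """
    p = len(y)
    if p < 3:
        raise ValueError("method 4 needs at least three parents")
    totals = [sum(y[i][j] for j in range(p) if j != i) for i in range(p)]  # Y_i.
    grand = sum(totals) / 2  # Y..: each cross counted once
    gca = [(p * t - 2 * grand) / (p * (p - 2)) for t in totals]
    sca = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i != j:
                sca[i][j] = (y[i][j] - (totals[i] + totals[j]) / (p - 2)
                             + 2 * grand / ((p - 1) * (p - 2)))
    return gca, sca


# Purely additive crosses: Y_ij = 10 + g_i + g_j, so all SCA effects should vanish.
g = [1.0, -1.0, 2.0, -2.0]
y = [[0.0 if i == j else 10 + g[i] + g[j] for j in range(4)] for i in range(4)]
gca, sca = griffing_method4(y)
```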
Segregating and Non-segregating Generations: joint scaling test (P1, P2, F1, and F2, with optional inclusion of the backcrosses RC1 and RC2), analysis of experiments with segregating lines and parents in alternating rows, and analysis of individuals in generation Ft and their derived lines in Ft+1.
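The joint scaling test fits the additive-dominance model of generation means by weighted least squares, with weights equal to the inverse variances of the means, and then tests the fit with a chi-square statistic on (number of generations − 3) degrees of freedom. The sketch below uses the usual coefficients (P1 = m + [a], P2 = m − [a], F1 = m + [d], F2 = m + [d]/2, RC1 = m + [a]/2 + [d]/2, RC2 = m − [a]/2 + [d]/2). It assumes NumPy and is an illustration of the technique, not a description of the Genes implementation; the function and dictionary names are hypothetical.

```python
import numpy as np

# Coefficients of m, [a], [d] for each generation mean.
DESIGN = {
    "P1": [1, 1, 0], "P2": [1, -1, 0], "F1": [1, 0, 1],
    "F2": [1, 0, 0.5], "RC1": [1, 0.5, 0.5], "RC2": [1, -0.5, 0.5],
}


def joint_scaling_test(means, variances):
    """means, variances: dicts keyed by generation; variances are of the means."""
    gens = list(means)
    X = np.array([DESIGN[g] for g in gens], dtype=float)
    y = np.array([means[g] for g in gens], dtype=float)
    w = np.array([1.0 / variances[g] for g in gens])  # inverse-variance weights
    XtW = X.T * w                                     # X'W without building diag(w)
    params = np.linalg.solve(XtW @ X, XtW @ y)        # estimates of m, [a], [d]
    chi2 = float(np.sum(w * (y - X @ params) ** 2))   # lack-of-fit statistic
    df = len(gens) - 3
    return params, chi2, df


means = {"P1": 12, "P2": 8, "F1": 11, "F2": 10.5, "RC1": 11.5, "RC2": 9.5}
params, chi2, df = joint_scaling_test(means, {k: 1.0 for k in means})
```

A significant chi-square suggests that the additive-dominance model is inadequate, for example because of epistasis.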
Repeatability: analysis of original or classified data.
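A common way to estimate repeatability is a one-way analysis of variance with individuals as the classification and repeated measurements within individuals: r = σ²_p / (σ²_p + σ²_e), with σ²_p = (MS_between − MS_within) / n. The sketch below assumes equal numbers of measurements per individual and a one-way model; other models, such as one that also removes a measurement-occasion effect, are possible. The function name is hypothetical.

```python
def repeatability(records):
    """records[i]: the n repeated measurements of individual i (same n for all)."""
    N, n = len(records), len(records[0])
    means = [sum(r) / n for r in records]
    grand = sum(means) / N
    ms_between = n * sum((m - grand) ** 2 for m in means) / (N - 1)
    ms_within = sum((x - m) ** 2 for r, m in zip(records, means) for x in r) / (N * (n - 1))
    # variance among individuals (genetic plus permanent environmental effects)
    var_perm = (ms_between - ms_within) / n
    return var_perm / (var_perm + ms_within)


r = repeatability([[10, 12], [20, 22], [30, 32]])
```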
Combined Selection: analysis of family experiments with balanced and unbalanced data. Analysis of the genetic designs proposed by Comstock and Robinson (1948), including designs involving several sets.
Genetic and Environmental Progress
2. Multivariate Analysis
Discriminant Analysis (by the method proposed by
Factor Analysis
Measures of Dissimilarity: based on continuous, multicategorical, or binary phenotypic variables. Analysis of molecular data from dominant or codominant markers.
Grouping Analysis: Tocher optimization method, hierarchical methods, graphic dispersion, and 2D and 3D projection. Identification of the most and least similar accessions.
Importance of Characters: by principal components, by the generalized Mahalanobis distance, or by canonical variable analysis.
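The generalized Mahalanobis distance mentioned above is D²_ij = (x_i − x_j)' Σ⁻¹ (x_i − x_j), where x_i is the vector of means of genotype i and Σ is, typically, the residual covariance matrix. The sketch below builds the full D² matrix, which is the kind of input grouping methods such as Tocher's or hierarchical clustering work from. It assumes NumPy and a nonsingular covariance matrix; the function name is hypothetical.

```python
import numpy as np


def mahalanobis_d2(means, resid_cov):
    """Pairwise generalized Mahalanobis distances D^2 between genotype mean vectors."""
    X = np.asarray(means, dtype=float)
    inv = np.linalg.inv(np.asarray(resid_cov, dtype=float))
    n = X.shape[0]
    D2 = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = X[i] - X[j]
            D2[i, j] = D2[j, i] = d @ inv @ d  # symmetric, zero diagonal
    return D2


D2 = mahalanobis_d2([[0, 0], [2, 0], [0, 3]], [[4, 0], [0, 9]])
```

Unlike the Euclidean distance, D² scales each characteristic by its residual variance and accounts for residual correlations between characteristics.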
Simulation of experiments
Simulation of Samples (p populations and v variables)
Optimal Number of Families
Optimal Number of Plants (Random or Predefined Sampling)
Optimal Number of Replications or Optimal Sample Size
4. Genetic Diversity
Diversity between Accessions: based on continuous, multicategorical, or binary phenotypic variables, and analysis of data from dominant and codominant (multiallelic) markers.
Diversity between Populations: calculation of Nei's (1972) genetic identity and of the following distances:
Diversity within Populations: calculation of the inbreeding coefficient and heterozygosity, the Shannon-Wiener index, and heterozygosity from binary data.
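At a single locus, these within-population measures follow from allele frequencies: expected heterozygosity H_e = 1 − Σ p_k², the Shannon-Wiener index H' = −Σ p_k ln p_k, and the inbreeding coefficient F = 1 − H_o / H_e, where H_o is the observed heterozygosity. The sketch below works from allele frequencies rather than from binary marker data, and uses the natural logarithm for H'; both are assumptions, and the function names are hypothetical.

```python
import math


def within_population_diversity(freqs):
    """freqs: allele frequencies at one locus (summing to 1)."""
    expected_het = 1.0 - sum(p * p for p in freqs)          # Nei's gene diversity
    shannon = -sum(p * math.log(p) for p in freqs if p > 0)  # Shannon-Wiener index
    return expected_het, shannon


def inbreeding_coefficient(observed_het, expected_het):
    """F = 1 - Ho/He: positive values indicate a deficit of heterozygotes."""
    return 1.0 - observed_het / expected_het


he, h = within_population_diversity([0.5, 0.5])
f = inbreeding_coefficient(0.25, he)
```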
Diversity between and within Populations: descriptive analysis, calculation of Nei's (1973) diversity, Wright's fixation indices (two alleles or multiple alleles) from heterozygosity (Weir, 1996), contingency table analysis, ANOVA of allele frequencies (F, f, and θ), AMOVA (Excoffier et al., 1992), and analysis of binary data.
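Nei's (1973) partition of gene diversity compares the mean within-population diversity H_S with the total diversity H_T computed from the pooled allele frequencies, giving G_ST = (H_T − H_S) / H_T. The sketch below handles one locus and weights all populations equally, which is an assumption; weighting by sample size or correcting for sample size gives somewhat different estimates. The function name is hypothetical.

```python
def nei_gst(pop_freqs):
    """pop_freqs: one list of allele frequencies per population, same allele order."""
    k = len(pop_freqs)
    # mean expected heterozygosity within populations
    hs = sum(1.0 - sum(p * p for p in pop) for pop in pop_freqs) / k
    # expected heterozygosity from the mean allele frequencies
    mean_freqs = [sum(col) / k for col in zip(*pop_freqs)]
    ht = 1.0 - sum(p * p for p in mean_freqs)
    return hs, ht, (ht - hs) / ht


hs, ht, gst = nei_gst([[1.0, 0.0], [0.0, 1.0]])  # populations fixed for different alleles
```

G_ST ranges from 0, when all populations share the same allele frequencies, to 1, when each population is fixed for a different allele.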
Discriminant Analysis: discriminant
Grouping Analysis: Tocher optimization and hierarchical methods, graphic dispersion, 2D and 3D projection, and analysis of the most and least similar accessions.
Matrices of Dissimilarity: calculation of the correlation between dissimilarity matrices and of their element-wise sum.
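Since dissimilarity matrices are symmetric with zero diagonals, their correlation is usually computed over the pairs above the diagonal only. The sketch below computes the Pearson correlation of those elements and the element-wise sum of two matrices. It does not assess significance; a Mantel-type permutation test is a common choice for that, but whether Genes uses one is not stated here. The function names are hypothetical.

```python
import math


def upper_triangle(m):
    """Elements above the diagonal, row by row."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def matrix_correlation(a, b):
    """Pearson correlation between the off-diagonal elements of two symmetric matrices."""
    x, y = upper_triangle(a), upper_triangle(b)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)


def matrix_sum(a, b):
    """Element-wise sum of two matrices of equal size."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


A = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
B = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]
r = matrix_correlation(A, B)
```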
Importance of Characters: based on quantitative phenotypic characters or molecular information, by means of MANOVA.
Optimization: analysis of the optimal number of binary or multiallelic markers for studying genetic variance.
Simulation: simulation of populations, crosses, and population samples under divergent selection or genetic drift.
Hardy-Weinberg Equilibrium: analysis of populations based on codominant diallelic or multiallelic markers.
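For a codominant diallelic marker, the classic check compares observed genotype counts with the Hardy-Weinberg expectations N p², 2 N p q, and N q² using a chi-square statistic with 1 degree of freedom (critical value about 3.84 at the 5% level). The sketch below covers only the diallelic case; multiallelic loci need more genotype classes and degrees of freedom, and exact tests are often preferred for small samples. The function name is hypothetical.

```python
def hardy_weinberg_chi2(n_AA, n_Aa, n_aa):
    """Chi-square goodness-of-fit test for HWE at a codominant diallelic locus."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_AA, n_Aa, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return p, chi2


p, chi2 = hardy_weinberg_chi2(30, 40, 30)  # heterozygote deficit
```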
Gametic Phase Disequilibrium
5. Experimental Statistics
Analysis of Variance: analysis of completely randomized designs and schemes, of experiments with regular and non-regular treatments, in randomized blocks, factorial schemes, and split plots. Analysis of provenance/progeny/plant data, simple and triple lattices, and hierarchical models.
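For the randomized block design, the sums of squares follow from treatment and block totals and a correction term C = (grand total)² / N. The sketch below assumes one plot per treatment in each block and no missing values, and returns the block, treatment, and error sums of squares plus the F statistic for treatments. The function name is hypothetical.

```python
def rcbd_anova(y):
    """y[i][j]: observation of treatment i in block j (complete, one plot per cell)."""
    t, b = len(y), len(y[0])
    n = t * b
    c = sum(map(sum, y)) ** 2 / n  # correction term
    ss_total = sum(v * v for row in y for v in row) - c
    ss_treat = sum(sum(row) ** 2 for row in y) / b - c
    ss_block = sum(sum(y[i][j] for i in range(t)) ** 2 for j in range(b)) / t - c
    ss_error = ss_total - ss_treat - ss_block
    df_treat, df_error = t - 1, (t - 1) * (b - 1)
    ms_treat, ms_error = ss_treat / df_treat, ss_error / df_error
    return {"SS": (ss_block, ss_treat, ss_error), "F_treat": ms_treat / ms_error}


res = rcbd_anova([[10, 12], [14, 15], [18, 20]])  # 3 treatments x 2 blocks
```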
Regressions: simple linear, nonlinear, multiple, and polynomial regression, response surfaces, and 3D graphical analysis.
Correlations: calculation of genetic correlations, Pearson and Spearman correlations, and partial and canonical correlations. Path analysis (involving one or two chains) and path analysis under multicollinearity.
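In a one-chain path analysis, the direct effects p of the explanatory variables on the main variable solve R p = r_xy, where R holds the correlations among explanatory variables and r_xy their correlations with the main variable; the indirect effect of x_i through x_j is r_ij p_j, and R² = Σ p_i r_iy. The sketch below, assuming NumPy, also accepts an optional constant `k` added to the diagonal of R, a ridge-type adjustment that is one common remedy under multicollinearity; whether and how Genes applies such a constant is not stated here, so treat `k` as an assumption.

```python
import numpy as np


def path_analysis(r_xx, r_xy, k=0.0):
    """Direct and indirect effects in a one-chain path analysis.

    k: optional ridge-type constant added to the diagonal (assumption; 0 = ordinary).
    """
    R = np.asarray(r_xx, dtype=float)
    r = np.asarray(r_xy, dtype=float)
    direct = np.linalg.solve(R + k * np.eye(len(r)), r)  # direct effects p
    effects = R * direct[np.newaxis, :]   # effects[i, j] = r_ij * p_j; diagonal = direct
    r2 = float(direct @ r)                # coefficient of determination
    residual = float(np.sqrt(max(0.0, 1.0 - r2)))  # residual path effect
    return direct, effects, r2, residual


direct, effects, r2, residual = path_analysis([[1, 0.5], [0.5, 1]], [0.7, 0.5])
```

With k = 0, each row of `effects` sums back to the corresponding correlation r_iy, which is the usual decomposition of a correlation into direct and indirect effects.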
Comparison of Means: Tukey, Duncan, Scheffé, and Scott-Knott tests, Tukey test with unequal numbers of replications, Dunnett test, t test, Tocher method, and chi-square tests of hypotheses, heterogeneity, and factorial linkage.
Stand Correction Methods
Diagnosis of Multicollinearity
Matrix Algebra
Solution of the System
References concerning the software system
CRUZ, C. D. Programa Genes - Análise multivariada e simulação. 1. ed. Viçosa, MG: Editora UFV, 2006. v. 1. 175 p.
CRUZ, C. D. Programa Genes - Biometria. 1. ed. Viçosa, MG: Editora UFV, 2006. v. 1. 382 p.
CRUZ, C. D. Programa Genes - Diversidade Genética. 1. ed. Viçosa, MG: Editora UFV, 2008. v. 1. 278 p.
CRUZ, C. D. Programa Genes - Estatística Experimental e Matrizes. 1. ed. Viçosa, MG: Editora UFV, 2006. v. 1. 285 p.