Universidade Federal de Viçosa

Viçosa, MG, Brazil

 

Program Genes

 

Software for Genetics and Experimental Statistics

 

Departamento de Biologia Geral

Viçosa, MG. 36570-00

 

 

 

 

 


 

Introduction

            To obtain superior genetic material, the selected individuals must combine a series of favorable attributes that give them comparatively higher yield and meet consumers’ demands. One way to increase the chances of success of a breeding program is therefore to conduct reliable experiments that provide a large volume of experimental data. With adequate processing of these data, genetic parameters can be estimated and biological phenomena interpreted. In the analysis and interpretation of results, researchers need access to suitable software and computing resources.

            The development of software for Genetics and Breeding is crucial because such resources are scarce in the scientific community. These tools meet a growing demand from users at research institutions who handle large volumes of data that require adequate processing for the accurate estimation of statistical and biological parameters.

            In Genetics specifically, the intensive breeding of many species and the complexity of the most important characters have required increasingly accurate selection criteria. At every breeding stage, breeders must use information expressed as parameters of biometric models, which are usually reported in the output of scientific software systems.

            Hence, the GENES software was developed specifically to meet the needs of the areas of Genetics and Experimental Statistics.

 

 

Description

            The Genes software system runs on IBM-compatible PCs under the Windows operating system.

 

            The following settings are required:

-          a screen resolution of 1024 x 768 (large fonts, 120 dpi), and

-          a period (point) as the decimal symbol.

 

The system comprises 201 executable projects and 131 text documents in RTF format, occupies about 250 MB, and is available in English and Portuguese.

 

 

Provision of Data for Processing

Most procedures follow a common sequence of data analysis. Users supply the name of the file containing the data to be processed, enter the analysis parameters (number of variables, treatments, blocks, etc.), optionally name the variables, and then print or save the results.

Data are supplied in a file laid out as a spreadsheet, in which each column represents a characteristic to be analyzed and each row an experimental observation. In some analyses, the first columns are reserved for classification variables (effect descriptors) such as treatments, blocks, years, and locations.
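The layout described above can be illustrated with a short Python sketch. It is not part of Genes: the function name, the whitespace-delimited format, the absence of a header row, and the integer coding of the classification columns are all assumptions made for illustration only.

```python
import io

def read_data_table(handle, n_class_cols):
    """Split each row into classification factors and trait values.

    Assumes (hypothetically) whitespace-delimited rows, no header line,
    integer-coded classification columns first, and a period as the
    decimal symbol for the trait columns.
    """
    rows = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        factors = tuple(int(x) for x in fields[:n_class_cols])
        traits = [float(x) for x in fields[n_class_cols:]]
        rows.append((factors, traits))
    return rows

# Hypothetical data: columns are treatment, block, trait 1, trait 2.
sample = io.StringIO(
    "1 1 12.5 3.1\n"
    "1 2 13.0 2.9\n"
    "2 1 10.2 3.4\n"
    "2 2 11.1 3.3\n"
)
table = read_data_table(sample, n_class_cols=2)
for factors, traits in table:
    print(factors, traits)
```

Keeping the classification columns separate from the trait columns mirrors the way the analysis parameters (number of treatments, blocks, and variables) are entered independently of the data file.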

 

Modules

The Genes software system is organized into the analysis modules described below, each comprising several biometric analysis procedures.

 

1. Biometrics

 

         Genotype x Environment Interaction: stratification analysis, dissimilarity and correlations between environments.

 

         Stability and Adaptability: analysis by methods based on ANOVA (traditional; Plaisted and Peterson, 1959; Wricke, 1965; and Annicchiarico, 1992), regression (Eberhart and Russell, 1966; Finlay and Wilkinson, 1963; and Tai, 1971), bisegmented regression (Verma, Chahal and Murty, 1978; Silva and Barreto, 1985; and Cruz, Torres and Vencovsky, 1989), nonparametric analysis (Huehn, 1990; visual analysis; and Lin and Binns, 1988), factor analysis, and principal components or centroids.

 

         Selection Gains and Indices: calculation of selection gains between families (univariate and by indices), considering direct and indirect selection; the classic index of Smith, 1936 and Hazel, 1943; the rank-sum index of Mulamba and Mock, 1978; the base index of Williams, 1962; the multiplicative index of Subandi et al., 1973; the weight- and parameter-free index of Elston, 1963; the desired-gains index of Pesek and Baker, 1969; and the genotype-ideotype distance index. Calculation of selection gains between families by univariate methods or by the following restricted indices: classic of Smith, 1936 and Hazel, 1943; Kempthorne and Nordskog, 1959; Tallis, 1962; James, 1968; Cunningham et al., 1970; and desired gains of Pesek and Baker, 1969. Calculation of selection gains between families using indices under collinearity; gains from selection between and within families in balanced and unbalanced experiments; and gains from mass selection and stratified selection between and within families. Analysis of visual selection, selection across several environments, and prediction of gains from selection within families without information from the plants within the plot.

 

         Diallel Analysis: analysis of balanced diallels (methods of Griffing, 1956; Gardner and Eberhart, 1966; Hayman, 1954; and Cockerham and Weir, 1977; tests between hybrids and reciprocals; prediction of composites and hybrids; and family indices); joint diallel analysis (balanced diallels of Griffing, 1956 and of Gardner and Eberhart, 1966, and partial and circulant diallels); partial diallels (methods of Geraldi and Miranda Filho, 1988; Miranda Filho and Geraldi, 1984; Kempthorne, 1966; and Viana et al., 1999 and 2000; and prediction of three-way and double-cross hybrids). Analysis of circulant, partial circulant, and unbalanced diallels.

         Segregating and Non-segregating Generations: joint scaling test (P1, P2, F1, F2, with optional inclusion of the backcrosses RC1 and RC2), analysis of experiments with segregating lines and parents in interspersed rows, and analysis of individuals in generation Ft and their derived lines in Ft+1.

 

         Repeatability: analysis of original or classified data.

         Combined Selection: analysis of family experiments with balanced and unbalanced data. Analysis of the genetic designs proposed by Comstock and Robinson (1948), including analyses involving several sets.

              Genetic and Environmental Progress

         Core Collection
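As an illustration of one of the ANOVA-based stability methods listed in this module, the sketch below computes Wricke's ecovalence from a table of genotype-by-environment means. It is a minimal example under stated assumptions, not the Genes implementation: the function name and the data are hypothetical, and the sums of squares are taken directly on cell means, without any scaling by the number of replications.

```python
def wricke_ecovalence(means):
    """Return one ecovalence value per genotype.

    `means[i][j]` is the mean of genotype i in environment j.
    The ecovalence of genotype i is the sum over environments of the
    squared interaction effect:
        (Y_ij - mean of genotype i - mean of environment j + grand mean)^2
    """
    g = len(means)
    e = len(means[0])
    grand = sum(sum(row) for row in means) / (g * e)
    gen_means = [sum(row) / e for row in means]
    env_means = [sum(means[i][j] for i in range(g)) / g for j in range(e)]
    return [
        sum(
            (means[i][j] - gen_means[i] - env_means[j] + grand) ** 2
            for j in range(e)
        )
        for i in range(g)
    ]

# Hypothetical means for 3 genotypes (rows) in 3 environments (columns).
means = [
    [10.0, 12.0, 14.0],
    [11.0, 13.0, 15.0],
    [9.0, 14.0, 13.0],
]
w = wricke_ecovalence(means)
total = sum(w)
for i, wi in enumerate(w, start=1):
    # Share of the total interaction sum of squares due to each genotype.
    print(f"genotype {i}: W = {wi:.3f} ({100 * wi / total:.1f}% of total)")
```

A smaller ecovalence means the genotype contributes less to the genotype x environment interaction and is therefore regarded as more stable under this criterion.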

 

2. Multivariate Analysis

         Principal Components

         Canonical Variables

         Canonical Correlations

         Discriminant Analysis (by the method proposed by Anderson or based on principal components)

         Factor Analysis

         Measures of Dissimilarity: based on continuous quantitative, multicategory, or binary phenotypic variables. Analysis of molecular data from dominant or codominant markers.

         Cluster Analysis: Tocher optimization method, hierarchical methods, graphical dispersion, and 2D and 3D projection. Identification of the most and least similar accessions.

         Importance of Characters: by principal components, by the generalized Mahalanobis distance, or by canonical variable analysis.

 

3. Simulation

         Simulation of experiments

         Simulation of Samples (p populations and v variables)

         Optimal Number of Families

         Optimal Number of Plants (Random or Predefined Sampling)

         Optimal Number of Replications or Optimal Sample Size

 

4. Genetic Diversity

         Diversity between Accessions: based on continuous, multicategory, and binary phenotypic variables, and analysis of data from dominant and codominant (multi-allelic) markers.

 

         Diversity between Populations: calculation of Nei’s genetic identity (1972) and of the following distances: Euclidean, Rogers, angular, Goldstein et al. (1985), and Hedrick.

           

         Diversity within Populations: calculation of the inbreeding coefficient and heterozygosity, the Shannon-Wiener index, and heterozygosity from binary data.

 

         Diversity between and within Populations: descriptive analysis, Nei’s diversity calculation (1973), and Wright’s fixation index (two alleles or multiple alleles) from the heterozygosity of Weir (1996). Contingency table analysis, ANOVA of allelic frequencies (F, f and θ), AMOVA of Excoffier et al. (1992), and analysis of binary data.

 

         Discriminant Analysis: discriminant analysis of Anderson, and analysis based on principal components or on k-nearest neighbors. Discriminant analyses from dissimilarity matrices.

 

          Relationship  Coefficient

 

         Cluster Analysis: Tocher optimization and hierarchical methods, graphical dispersion, 2D and 3D projection, and identification of the most and least similar accessions.

 

         Dissimilarity Matrices: calculation of the correlation between, and the sum of, elements of dissimilarity matrices.

 

         Importance of Characters: considering quantitative phenotypic characters or molecular information, by means of MANOVA.

 

         Optimization: analysis of the optimal number of binary or multi-allelic markers for studies of genetic variance.

 

         Simulation: simulation of populations, crosses, and samples of populations under the effect of divergent selection or genetic drift.

 

         Hardy-Weinberg Equilibrium: analysis of populations based on information from codominant diallelic or multi-allelic markers.

 

         Gametic Phase Disequilibrium
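To make the Hardy-Weinberg item in this module concrete, the sketch below runs the textbook chi-square goodness-of-fit test for a single codominant diallelic marker. It is an illustrative example only: the function name and genotype counts are hypothetical, and nothing here is taken from the Genes implementation, which may use a different test or correction.

```python
def hardy_weinberg_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg proportions for two alleles.

    Takes observed counts of the three genotypes (AA, AB, BB) and
    returns the estimated frequency of allele A and the chi-square
    statistic (1 degree of freedom). Assumes both alleles are present,
    so that no expected count is zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return p, chi2

# Hypothetical sample of 100 individuals.
p, chi2 = hardy_weinberg_chi_square(30, 40, 30)
print(f"p = {p:.2f}, chi-square = {chi2:.2f}")
```

With these counts the expected numbers are 25, 50, and 25, so the statistic is 4.0. This exceeds 3.84, the 5% critical value for one degree of freedom, so the hypothesis of Hardy-Weinberg proportions would be rejected at that level for this hypothetical sample.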

 

5. Experimental Statistics

         Descriptive Statistics

         Normality Test

         Analysis of Variance: analysis of completely randomized designs, of experiments with regular and non-regular treatments, and of randomized blocks, factorial arrangements, and split plots. Analysis of origin/progeny/plant experiments, simple and triple lattices, and hierarchical models.

         Regressions: simple linear, non-linear, multiple, and polynomial regression; response surface and 3D graphics analysis.

 

         Correlations: calculation of genetic, partial, canonical, Pearson, and Spearman correlations. Path analysis (with one or two chains) and path analysis under collinearity.

 

         Comparison of Means: Tukey, Duncan, Scheffé, and Scott-Knott tests; Tukey test with unequal numbers of replications; Dunnett test; t test; Tocher; and chi-square tests of hypotheses, heterogeneity, and factorial linkage.

 

         Stand Correction Methods

 

6. Matrices

         Diagnosis of Multicollinearity

         Matrix Algebra

         Solution of the System


 

References concerning the software system

 

 

 

CRUZ, C. D. Programa Genes - Análise multivariada e simulação. 1. ed. Viçosa, MG: Editora UFV, 2006. v. 1. 175 p.

 

 

 

CRUZ, C. D. Programa Genes - Biometria. 1. ed. Viçosa, MG: Editora UFV, 2006. v. 1. 382 p.

 

 

CRUZ, C. D. Programa Genes - Diversidade Genética. 1. ed. Viçosa, MG: Editora UFV, 2008. v. 1. 278 p.

 

CRUZ, C. D. Programa Genes - Estatística Experimental e Matrizes. 1. ed. Viçosa, MG: Editora UFV, 2006. v. 1. 285 p.