# Distribution of the Affinity Coefficient between Variables based on the Monte Carlo Simulation Method

## Authors

• Áurea Sousa Department of Mathematics, University of Azores, 9501-855-Ponta Delgada, Portugal
• Osvaldo Silva Department of Mathematics, CMATI, University of Azores, 9501-855-Ponta Delgada
• Helena Bacelar-Nicolau Laboratory of Statistics and Data Analysis, FP, University of Lisbon, 1649-013-Lisboa
• Fernando C. Nicolau Department of Mathematics, FCT, New University of Lisbon, 2829-516-Caparica

## Keywords

Affinity coefficient, Pearson's correlation coefficient, Monte Carlo simulation method, probability laws

## Abstract

The affinity coefficient and its extensions have been used in both hierarchical and non-hierarchical cluster analysis. This empirical study examines the distribution of the basic and generalized affinity coefficients, and of the affinity coefficient standardized by the method of Wald and Wolfowitz, under different assumptions. Its purpose is to assess how the probability distributions of the variables (columns) of the initial data matrix, and their parameters, affect the distribution of the values of these coefficients. We present results on the asymptotic distribution of these coefficients under the assumption that the variables for which they are calculated are independent and follow probability distributions specified a priori. The study, based on the Monte Carlo simulation method, considers ten well-known probability distributions with different settings of their parameters. The simulations lead to the conclusion that the coefficients' convergence to the normal distribution is quite fast and that, in general, a good approximation is obtained for small sample sizes: above 20, and in many cases above 10.
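The kind of experiment described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the abstract: the basic affinity coefficient between two non-negative columns is taken in its usual form, the sum over rows of the square roots of the products of the column profiles; the standardization uses the permutation mean and variance of a linear statistic, in the spirit of Wald and Wolfowitz; and the exponential distribution is an arbitrary illustrative choice, not necessarily one of the ten distributions studied. The function names `affinity`, `standardized_affinity` and `simulate` are hypothetical.

```python
import math
import random
import statistics


def affinity(x, y):
    """Basic affinity coefficient between two non-negative columns
    (assumed form: sum_i sqrt((x_i / sum(x)) * (y_i / sum(y))))."""
    sx, sy = sum(x), sum(y)
    return sum(math.sqrt((xi / sx) * (yi / sy)) for xi, yi in zip(x, y))


def standardized_affinity(x, y):
    """Affinity centred and scaled by its mean and variance under random
    permutation of the rows of one column (assumed standardization)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    a = [math.sqrt(xi / sx) for xi in x]
    b = [math.sqrt(yi / sy) for yi in y]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    # Permutation mean and variance of the linear statistic sum_i a_i * b_i.
    expected = n * mean_a * mean_b
    variance = (
        sum((ai - mean_a) ** 2 for ai in a)
        * sum((bi - mean_b) ** 2 for bi in b)
        / (n - 1)
    )
    observed = sum(ai * bi for ai, bi in zip(a, b))
    return (observed - expected) / math.sqrt(variance)


def simulate(n, replications, draw, seed=0):
    """Monte Carlo sample of the standardized coefficient for two
    independent columns of length n whose entries come from draw(rng)."""
    rng = random.Random(seed)
    values = []
    for _ in range(replications):
        x = [draw(rng) for _ in range(n)]
        y = [draw(rng) for _ in range(n)]
        values.append(standardized_affinity(x, y))
    return values


if __name__ == "__main__":
    # Illustrative setting only: exponential(1) columns with n = 20 rows.
    sample = simulate(20, 2000, lambda rng: rng.expovariate(1.0))
    print(statistics.mean(sample), statistics.stdev(sample))
```

Under these assumptions, a histogram or normal quantile plot of the simulated values for increasing `n` would give a rough picture of how quickly the standardized coefficient approaches normality for a given distribution of the columns.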

Published: 2013-12-27

## How to Cite

Sousa, Á., Silva, O., Bacelar-Nicolau, H., & Nicolau, F. C. (2013). Distribution of the Affinity Coefficient between Variables based on the Monte Carlo Simulation Method. Asian Journal of Applied Sciences, 1(5). Retrieved from https://www.ajouronline.com/index.php/AJAS/article/view/746
