Distribution of the Affinity Coefficient between Variables based on the Monte Carlo Simulation Method

Authors

  • Áurea Sousa, Department of Mathematics, University of Azores, 9501-855 Ponta Delgada, Portugal
  • Osvaldo Silva, Department of Mathematics, CMATI, University of Azores, 9501-855 Ponta Delgada
  • Helena Bacelar-Nicolau, Laboratory of Statistics and Data Analysis, FP, University of Lisbon, 1649-013 Lisboa
  • Fernando C. Nicolau, Department of Mathematics, FCT, New University of Lisbon, 2829-516 Caparica

Keywords:

Affinity coefficient, Pearson's correlation coefficient, Monte Carlo simulation method, probability laws

Abstract

The affinity coefficient and its extensions have been used in both hierarchical and non-hierarchical Cluster Analysis. This empirical study examines the distribution of the basic and generalized affinity coefficients, and of the affinity coefficient standardized by the method of Wald and Wolfowitz, under different assumptions. Its purpose is to assess how the probability distributions of the variables (columns) of the initial data matrix, and their parameters, affect the distribution of the values of these coefficients. We present some results on the asymptotic distribution of these coefficients under the assumption that the variables for which they are calculated are independent and follow probability distributions specified a priori. The study is based on the Monte Carlo simulation method and considers ten well-known probability distributions with different settings of their parameters. The simulations indicate that the coefficients converge to the normal distribution quite quickly and that, in general, a good approximation is obtained for small sample sizes: above 20 and, in many cases, above 10.
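
The kind of experiment described above can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes the basic affinity coefficient between two non-negative columns x and y takes the form sum_i sqrt((x_i / sum(x)) * (y_i / sum(y))), a definition that is not stated on this page. The function names, the exponential distribution, the sample sizes, and the number of replications are all hypothetical choices made for illustration. The Wald and Wolfowitz standardization is not reproduced.

```python
import math
import random
import statistics


def affinity(x, y):
    """Basic affinity coefficient between two non-negative columns.

    Assumed form (not given in the source page):
    sum_i sqrt((x_i / sum(x)) * (y_i / sum(y))).
    """
    sx, sy = sum(x), sum(y)
    if sx <= 0 or sy <= 0:
        raise ValueError("each column needs a positive total")
    return sum(math.sqrt((a / sx) * (b / sy)) for a, b in zip(x, y))


def simulate(n, replications, draw, seed=0):
    """Coefficient values for pairs of independent columns of length n."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    values = []
    for _ in range(replications):
        x = [draw(rng) for _ in range(n)]
        y = [draw(rng) for _ in range(n)]
        values.append(affinity(x, y))
    return values


# Hypothetical setting: both columns exponential with rate 1.
for n in (10, 20, 50):
    vals = simulate(n, 2000, lambda rng: rng.expovariate(1.0))
    print(n, round(statistics.mean(vals), 4), round(statistics.stdev(vals), 4))
```

Each printed row gives the sample size, the mean, and the standard deviation of the simulated coefficients. A histogram or normal probability plot of `vals` for each sample size would be one way to judge how closely the empirical distribution approaches a normal shape.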

Published

2013-12-27

How to Cite

Sousa, Á., Silva, O., Bacelar-Nicolau, H., & Nicolau, F. C. (2013). Distribution of the Affinity Coefficient between Variables based on the Monte Carlo Simulation Method. Asian Journal of Applied Sciences, 1(5). Retrieved from https://www.ajouronline.com/index.php/AJAS/article/view/746