Distribution of the Affinity Coefficient between Variables based on the Monte Carlo Simulation Method
Keywords: Affinity coefficient, Pearson's correlation coefficient, Monte Carlo simulation method, probability laws
The affinity coefficient and its extensions have been used in both hierarchical and non-hierarchical cluster analysis. This empirical study examines the distribution of the basic and generalized affinity coefficients, and of the affinity coefficient standardized by the method of Wald and Wolfowitz, under different assumptions. Its purpose is to assess how the probability distributions of the variables (columns) of the initial data matrix, and their parameters, affect the distribution of the values of these coefficients. We present results on the asymptotic distribution of these coefficients under the assumption that the variables for which they are calculated are independent and follow probability distributions specified a priori. The study is based on the Monte Carlo simulation method and considers ten well-known probability distributions with different settings of their parameters. The simulations indicate that convergence of the coefficients to the normal distribution is quite fast and that, in general, a good approximation is obtained for small sample sizes: above 20, and in many cases above 10.
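The simulation design described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state. The basic affinity coefficient between two non-negative columns is taken to be the sum over rows of the square root of the product of the two column profiles. The exponential distribution with rate 1, the sample size of 20, and the 2000 replications are hypothetical settings chosen only for illustration. The standardization uses the empirical mean and standard deviation of the simulated values, which is a simpler stand-in for the Wald and Wolfowitz method, not that method itself. The function names `affinity`, `simulate_affinity`, and `standardize` are illustrative.

```python
import math
import random
import statistics


def affinity(x, y):
    """Basic affinity coefficient between two non-negative columns.

    Assumed form: sum_i sqrt((x_i / sum(x)) * (y_i / sum(y))).
    """
    sx, sy = sum(x), sum(y)
    return sum(math.sqrt((a / sx) * (b / sy)) for a, b in zip(x, y))


def simulate_affinity(n, replications, draw, seed=0):
    """Simulate the coefficient for pairs of independent columns of size n.

    `draw` is a function taking a random.Random instance and returning
    one non-negative value from the chosen distribution.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(replications):
        x = [draw(rng) for _ in range(n)]
        y = [draw(rng) for _ in range(n)]
        values.append(affinity(x, y))
    return values


def standardize(values):
    """Center and scale by the empirical mean and standard deviation.

    This is NOT the Wald and Wolfowitz standardization, only a stand-in.
    """
    m = statistics.fmean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]


if __name__ == "__main__":
    # Hypothetical setting: exponential columns with rate 1, n = 20.
    sims = simulate_affinity(
        n=20, replications=2000, draw=lambda rng: rng.expovariate(1.0)
    )
    z = standardize(sims)
    # Share of standardized values inside the central 95% band of N(0, 1).
    inside = sum(abs(v) <= 1.96 for v in z) / len(z)
    print(f"mean={statistics.fmean(sims):.4f}, share within 1.96: {inside:.3f}")
```

Repeating the run for several values of `n` and comparing the printed share with 0.95 gives a rough, informal check of how quickly the standardized values approach a normal shape. A formal goodness-of-fit test would be needed for any firm conclusion.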