CLUSTER ANALYSIS WITH WEIGHTED BINARY VARIABLES

  • M. K. Kamundi Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
  • J. M. Kihoro Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
  • M. Mwalili Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
  • B. Kiula Research, Consultancy and Training, Department of ICT Directorate, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
Keywords: Binary variables, binary data, weights, cluster analysis, cluster membership, similarity, distance, dendrogram

Abstract

The objective of this study was to identify the distinct groupings (clusters) that result from performing
cluster analysis with weighted binary variables and binary proximity measures. Cluster analysis techniques
were applied both to simulated binary data and to real survey data originally collected to measure ICT
penetration among people in a certain county council in Kenya. Only a few indicators (binary variables)
from the survey data were selected for this study. For the simulated data, the clustering variables were
based on ownership of a Mobile Phone, a Desktop, a Laptop and a Palmtop; for the survey data, they were
based on usage of Mobile Data Processing, Mobile Internet, Computer Internet and Computer Data Processing.
The names used in both the simulated and the survey data were fictitious. Ten clusters were identified for
the simulated unweighted binary data, whereas four were identified for the simulated weighted binary data.
Twelve clusters were identified for the unweighted survey data, whereas seven were identified for the
weighted survey data. For both data sets, weighting the binary variables produced clusters that differed
markedly from those obtained without weights. Weighting was useful for expressing that some variables are
more important than others, and the clusters formed from the weighted binary variables reflected the
importance of those variables.
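
The effect of weighting on cluster membership can be illustrated with a minimal Python sketch. It is not the
procedure used in the study: the records, the weights, the cut-off of 0.3 and the function names are all
hypothetical, and the proximity measure is assumed to be a weighted simple-matching dissimilarity (the summed
weight of the mismatching variables divided by the total weight), clustered by single linkage cut at a fixed
threshold.

```python
from itertools import combinations


def weighted_mismatch(x, y, w):
    """Weighted simple-matching dissimilarity between two binary vectors.

    Returns the summed weight of the positions where x and y differ,
    divided by the total weight, so the result lies in [0, 1].
    """
    return sum(wi for xi, yi, wi in zip(x, y, w) if xi != yi) / sum(w)


def single_linkage(points, w, threshold):
    """Single-linkage clusters obtained by cutting at `threshold`.

    Two records end up in the same cluster when a chain of pairwise
    dissimilarities <= threshold connects them (union-find on the pairs).
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if weighted_mismatch(points[i], points[j], w) <= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical records: (Mobile Phone, Desktop, Laptop, Palmtop) ownership.
records = [
    (1, 0, 0, 0),
    (1, 1, 0, 0),
    (1, 0, 1, 0),
    (0, 0, 0, 1),
    (1, 0, 0, 1),
]

equal_weights = [1, 1, 1, 1]   # unweighted case
phone_weights = [4, 1, 1, 1]   # assumed: Mobile Phone counts four times as much

equal_clusters = single_linkage(records, equal_weights, threshold=0.3)
weighted_clusters = single_linkage(records, phone_weights, threshold=0.3)

print(equal_clusters)     # [[0, 1, 2, 3, 4]]
print(weighted_clusters)  # [[0, 1, 2, 4], [3]]
```

With equal weights all five records chain into a single cluster, whereas the heavier weight on Mobile Phone
separates the one record without a phone. The direction and size of the change depend entirely on the assumed
weights and cut-off, so this toy result should not be read as reproducing the cluster counts reported above.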

Published
2019-05-20