1. Import the data
decath <- read.table("https://r-stat-sc-donnees.github.io/decathlon.csv", sep=";", dec=".", header=TRUE, row.names=1, check.names=FALSE)
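The effect of `row.names = 1` and `check.names = FALSE` is easiest to see on a tiny inline table. The sketch below uses made-up values and the hypothetical names `txt`, `demo` and `demo_checked`; it does not read the real file.

```r
# Two made-up rows in the same layout as the CSV: ';' separator, '.' decimal mark
txt <- "Athlete;100m;110m H;Points
A;11.04;14.69;8217
B;10.76;14.05;8122"

# Same options as the import call above
demo <- read.table(text = txt, sep = ";", dec = ".", header = TRUE,
                   row.names = 1, check.names = FALSE)
names(demo)      # column names are kept verbatim, e.g. "110m H"
rownames(demo)   # the first column became the row names

# With the default check.names = TRUE, names are rewritten to be syntactically valid
demo_checked <- read.table(text = txt, sep = ";", dec = ".", header = TRUE,
                           row.names = 1)
names(demo_checked)
```

Keeping the original names means columns such as `100m` must be selected by position (as in `decath[,1:10]`) or with backticks.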
2 and 3. Build the k-means partition
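# Fix the seed so the random starts of kmeans() are reproducible
set.seed(123)
# Standardize the 10 events (columns 1 to 10), then keep the best of 100 random starts
classe <- kmeans(scale(decath[,1:10]), centers = 4, nstart = 100)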
classe
K-means clustering with 4 clusters of sizes 12, 13, 3, 13
Cluster means:
        100m    Longueur       Poids    Hauteur       400m     110m H     Disque     Perche    Javelot      1500m
1 -0.2713911 -0.06847836  0.11372756  0.3635437 -0.3949090 -0.2543941  0.1073831 -0.9020594  0.2519080 -0.7024505
2  1.0222463 -0.80958444 -0.43964769 -0.3362115  0.9594995  0.9463002 -0.3152426 -0.2467371 -0.1518058  0.2708137
3 -1.5260343  1.92792930  1.65317910  1.2722886 -1.2972738 -1.1781827  1.7272523  0.2550157  1.4378164  0.0869614
4 -0.4195697  0.42788847 -0.04683446 -0.2929723 -0.2955972 -0.4395866 -0.1824770  1.0205576 -0.4125285  0.3575341
Clustering vector:
     Sebrle        Clay      Karpov       Macey     Warners   Zsivoczky       Hernu        Nool     Bernard    Schwarzl
          3           3           3           1           4           1           1           4           1           4
  Pogorelov  Schoenbeck      Barras       Smith   Averyanov    Ojaniemi     Smirnov          Qi       Drews Parkhomenko
          4           4           1           1           4           1           1           1           4           2
      Terek       Gomez        Turi     Lorenzo   Karlivans Korkizoglou       Uldal     Casarsa      SEBRLE        CLAY
          4           1           2           2           2           2           2           2           4           4
     KARPOV     BERNARD      YURKOV     WARNERS   ZSIVOCZKY    McMULLEN   MARTINEAU       HERNU      BARRAS        NOOL
          4           4           2           4           1           1           2           2           2           2
BOURGUIGNON
          2
Within cluster sum of squares by cluster:
[1] 55.90563 100.18299 12.62547 73.25410
(between_SS / total_SS = 39.5 %)
Available components:
[1] "cluster" "centers" "totss" "withinss" "tot.withinss" "betweenss" "size"
[8] "iter" "ifault"
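The 39.5 % printed above can be re-derived from the other printed figures. After `scale()`, each of the 10 columns has variance 1, so its sum of squares is n - 1; with 41 athletes (12 + 13 + 3 + 13) the total sum of squares is 10 × 40 = 400. A minimal sketch, with variable names chosen here for illustration only:

```r
# Figures copied from the kmeans() printout above
sizes    <- c(12, 13, 3, 13)
withinss <- c(55.90563, 100.18299, 12.62547, 73.25410)

n <- sum(sizes)   # number of athletes
p <- 10           # number of standardized events

# Each standardized column contributes (n - 1) to the total sum of squares
total_ss   <- p * (n - 1)
between_ss <- total_ss - sum(withinss)
ratio      <- between_ss / total_ss

round(100 * ratio, 1)   # 39.5
```

On the fitted object, the same quantities are available as `classe$totss`, `classe$tot.withinss` and `classe$betweenss`.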
4. Characterize the clusters
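# Append the cluster labels to the data as a factor (column 14)
decath.comp <- cbind.data.frame(decath,classe=factor(classe$cluster))
library(FactoMineR)
# Describe the clusters; num.var is the index of the cluster column
catdes(decath.comp, num.var = 14)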
Link between the cluster variable and the quantitative variables
================================================================
                Eta2      P-value
Points     0.7959632 7.543476e-13
Perche     0.6072763 1.213468e-07
100m       0.5935880 2.263340e-07
Longueur   0.5526927 1.290912e-06
400m       0.5006104 9.468719e-06
110m.H     0.4773576 2.150925e-05
Classement 0.3481185 1.114882e-03
Poids      0.2723874 7.667540e-03
Disque     0.2703340 8.052236e-03
Javelot    0.2368840 1.746553e-02
Hauteur    0.2256861 2.241818e-02
1500m      0.2139786 2.895804e-02
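Eta2 is commonly defined as the correlation ratio: the between-group sum of squares divided by the total sum of squares, which equals the R² of a one-way ANOVA of the variable on the cluster factor. Assuming `catdes()` follows that definition, the sketch below computes it by hand with a hypothetical helper `eta2()`. It runs on the built-in `iris` data as a stand-in, so it works without the decathlon file.

```r
# Hypothetical helper: correlation ratio of a numeric vector x given groups g
eta2 <- function(x, g) {
  g <- factor(g)
  grand_mean  <- mean(x)
  group_means <- tapply(x, g, mean)
  group_n     <- tapply(x, g, length)
  ss_between  <- sum(group_n * (group_means - grand_mean)^2)
  ss_total    <- sum((x - grand_mean)^2)
  ss_between / ss_total
}

# Stand-in data: sepal length explained by species
e  <- eta2(iris$Sepal.Length, iris$Species)

# Cross-check against the R-squared of the one-way ANOVA model
r2 <- summary(lm(Sepal.Length ~ Species, data = iris))$r.squared
all.equal(e, r2)
```

With the objects of this document, the corresponding call would be `eta2(decath.comp$Points, decath.comp$classe)`.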
Description of each cluster by quantitative variables
=====================================================
$`1`
              v.test Mean in category Overall mean sd in category   Overall sd      p.value
1500m      -2.893339       270.825000   279.024878      5.8957039   11.5300118 0.0038117012
Perche     -3.715512         4.511667     4.762439      0.1635967    0.2745887 0.0002027925
$`2`
              v.test Mean in category Overall mean sd in category   Overall sd      p.value
100m        4.460054        11.266923     10.99805      0.1819292    0.2597956 8.193887e-06
400m        4.186290        50.723077     49.61634      1.0359268    1.1392975 2.835507e-05
110m.H      4.128702        15.052308     14.60585      0.3659583    0.4660000 3.648172e-05
Classement  3.196166        17.923077     12.12195      7.7604673    7.8217805 1.392670e-03
Longueur   -3.532212         7.003846      7.26000      0.2492308    0.3125193 4.120991e-04
Points     -4.463711      7655.076923   8005.36585    189.9592918  338.1839416 8.055212e-06
$`3`
              v.test Mean in category Overall mean sd in category   Overall sd      p.value
Points      4.242103       8812.66667  8005.365854    68.78145745 338.18394159 2.214348e-05
Longueur    3.468581          7.87000     7.260000     0.06480741   0.31251927 5.232144e-04
Disque      3.107539         50.16000    44.325610     1.19668988   3.33639725 1.886523e-03
Poids       2.974272         15.84000    14.477073     0.46568945   0.81431175 2.936847e-03
Javelot     2.586808         65.25667    58.316585     6.87867397   4.76759315 9.686955e-03
Hauteur     2.289003          2.09000     1.976829     0.02449490   0.08785906 2.207917e-02
110m.H     -2.119695         14.05000    14.605854     0.06531973   0.46599998 3.403177e-02
Classement -2.299627          2.00000    12.121951     0.81649658   7.82178048 2.146935e-02
400m       -2.333955         48.12000    49.616341     0.98634004   1.13929751 1.959810e-02
100m       -2.745523         10.59667    10.998049     0.18080069   0.25979560 6.041458e-03
$`4`
              v.test Mean in category Overall mean sd in category   Overall sd      p.value
Perche      4.452686         5.046154     4.762439      0.1763536    0.2745887 8.480264e-06
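The v.test column compares a cluster mean with the overall mean, scaled by the standard error of a mean drawn without replacement from the whole sample. The sketch below is an assumption about the formula, checked only against the printed figures: the helper name `v_test()` is hypothetical, and the inputs are the `Points` row of cluster 3 (3 athletes out of 41).

```r
# Hypothetical helper reproducing the printed v.test from summary figures
v_test <- function(mean_cat, mean_all, sd_all, n_cat, n_all) {
  # Standard error of the mean of n_cat values sampled without replacement
  se <- sd_all * sqrt((n_all - n_cat) / ((n_all - 1) * n_cat))
  (mean_cat - mean_all) / se
}

# Values copied from the printout: Points in cluster 3
v <- v_test(mean_cat = 8812.66667, mean_all = 8005.365854,
            sd_all = 338.18394159, n_cat = 3, n_all = 41)

# Two-sided p-value from the standard normal distribution
p <- 2 * pnorm(-abs(v))

round(v, 3)   # 4.242
```

A positive v.test means the cluster average is above the overall average. For timed events (100m, 400m, 110m.H, 1500m) a negative value therefore indicates faster athletes.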