GWS 2023: What do you like most?
By Jonathan Freitag
In a natural counterpoint and follow-up to the earlier analysis of what respondents liked least about wargaming in the 2023 edition of Wargames, Soldiers, and Strategy's Great Wargaming Survey (see What Do You Like Least About Miniatures Wargaming), today's instalment focuses on responses to the question: What do you like most about miniatures wargaming?
As with the Least Liked question, responses were captured as unstructured text with a limit of about 2,100 characters. Respondents could (and did) enter any text they wished. For the Most Liked question, the survey received 7,438 non-blank responses in this field, slightly more than the 7,278 responses to the Least Liked question. Perhaps the increase reflects a bias toward positive commentary? You know the saying: if you have nothing nice to say, say nothing.
Structuring the data
To make sense out of these unstructured texts, machine learning techniques are employed. Again, these text analytics techniques focus on cluster analysis and principal component analysis (PCA). The goal of these techniques is to gain insight into the underlying data structure and reduce this large body of text down to a manageable number of associated "words" without losing the essential information contained within each response.
To begin the analysis, a series of text transformations was applied to each response to standardize a collection of word tokens. After word tokenization and preprocessing, 4,144 unique terms, with counts of term frequencies, were available for the follow-on analysis steps. By contrast, the Least Liked word tokenization resulted in 4,918 terms. After removing terms having near-zero variance, these 4,144 terms were reduced to only fourteen word tokens. These remaining tokens are:
- games
- people
- painting
- miniatures
- history
- research
- collect
- model
- terrain
- table
- fun
- creative
- army
- build
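For readers who want to see the mechanics, here is a minimal sketch of the tokenization and near-zero-variance steps described above. It is not the code used for the survey analysis: the stop-word list, the `min_share` cutoff, and the five sample responses are all invented for illustration, and the variance test is applied to a simple present/absent indicator for each term.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the article does not say which one was used.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "i", "my", "with", "is", "in", "it"}

def tokenize(text):
    """Lowercase the text, keep alphabetic runs, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def document_frequencies(responses):
    """Count, for each term, how many responses contain it at least once."""
    counts = Counter()
    for text in responses:
        counts.update(set(tokenize(text)))
    return counts

def drop_near_zero_variance(responses, min_share=0.05):
    """Keep terms whose present/absent indicator varies enough across responses.

    A term found in a share p of responses has indicator variance p * (1 - p),
    so terms used by almost nobody (or almost everybody) score near zero.
    min_share is an assumed cutoff, not the one used in the article.
    """
    n = len(responses)
    threshold = min_share * (1 - min_share)
    kept = []
    for term, df in document_frequencies(responses).items():
        p = df / n
        if p * (1 - p) >= threshold:
            kept.append(term)
    return sorted(kept)

# Invented sample responses, for illustration only.
responses = [
    "Painting miniatures and playing games with friends",
    "The games and the people I meet",
    "Researching history, then painting the army",
    "Fun games with good people",
    "Building terrain for my table",
]
kept_terms = drop_near_zero_variance(responses, min_share=0.3)
print(kept_terms)
```

On this toy input only the terms that appear in more than one response survive the filter, which is the same idea that takes the survey's 4,144 terms down to fourteen.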
What is especially interesting about the final list of tokens presented here is that three of the tokens (people, painting, miniatures) made the final list in the Least Liked analysis as well. What does this result suggest? Well, there seems to be a love/hate relationship with other wargamers and painting miniatures. Even without word association, this list of fourteen tokens paints an interesting snapshot of the factors enjoyed in miniatures wargaming.
Dendrogram analysis
Having reduced the tokens from more than 4,000 to 14, it is time to examine the results from cluster analysis of categorical data. The dendrogram produced by cluster analysis is shown below:
The results in the graphic above illustrate an interesting and intuitive clustering of the words. Notice that "games" (and gaming) forms a cluster by itself with all else falling into a second cluster. Given that the hobby is "miniatures wargaming" in name, it is reassuring to see that gaming represents an important facet of the hobby. The two-cluster solution with "gaming" and everything else is illustrated in the dendrogram below:
In the three-cluster solution, "people" spins off from the large, 13-token group identified in the two-cluster solution. With three clusters, "games" and "people" show the most separation from the remaining word tokens. For most respondents, games and other wargamers are the most frequently liked facets of wargaming.
While the cluster analysis iterations could keep slicing the dendrogram to increase the number of clusters, I stop at the four-cluster solution. With four clusters, painting miniatures separates itself out from the pack. From the survey, three universal drivers for wargaming happiness seem to fall to gaming, people, and painting miniatures. Based upon my own experience, I agree with the survey results but, of course, opinions and rationales may differ. The survey does produce intuitive results from all of these unstructured texts.
One final point before moving on to principal component analysis. Notice how the token pairs research/collect, terrain/table, fun/creative, and army/build each group together within the clustering? Very interesting and intuitive given the nearly 7,500 responses and the 4,100 unique terms. After the prior cluster analysis of Least Liked, some suggested that results like these are akin to magic.
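To take a little of the magic out of it, the sketch below shows one way a token hierarchy can be built and then cut into a chosen number of clusters. It rests on several assumptions that the article does not confirm: Jaccard distance between tokens (based on which responses mention them), average linkage, and an invented toy data set. The function names are hypothetical as well.

```python
def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| for two sets of response ids."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def average_linkage(c1, c2, docs):
    """Mean pairwise distance between the tokens of two clusters."""
    pairs = [(t1, t2) for t1 in c1 for t2 in c2]
    return sum(jaccard_distance(docs[t1], docs[t2]) for t1, t2 in pairs) / len(pairs)

def cut_into_clusters(docs, k):
    """Merge the two closest clusters repeatedly until only k remain."""
    clusters = [[token] for token in sorted(docs)]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], docs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Invented data: each token maps to the ids of the responses that mention it.
docs = {
    "games": {1, 2, 3, 4, 5, 6},
    "people": {5, 6, 7},
    "painting": {8, 9, 10},
    "miniatures": {8, 9, 11},
    "history": {12, 13},
    "research": {12, 13, 14},
}
for k in (4, 3):
    print(k, cut_into_clusters(docs, k))
```

Tokens that tend to appear in the same responses sit close together and merge early, which is why pairs such as history/research fall out naturally. Asking for fewer clusters simply lets the merging run longer, the equivalent of slicing the dendrogram higher up. The toy groupings are not meant to reproduce the survey's dendrogram.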
Principal component analysis
Now we examine PCA. As in the Least Liked analysis, details of the technique are left to the reader. One useful visualization tool is PCA variable plotting, which illustrates the relative importance (loadings) of each of the fourteen variables in 2D space. The color and length of each vector denote its contribution to the PCA.
Graphical analysis of PCA results tends (in most cases) to lend itself to inferences that are easier to interpret. Of the two dimensions illustrated here, however, only one lends itself to easy interpretation.
When considering the DIM1 (x-axis) in the graph below, there is no clear interpretation of the yellow and green semicircles. Only the "creative" token projects into the negative (yellow) space of DIM1. With all other tokens in the positive (green) space of DIM1, inferences are fuzzy.
When examining the PCA variables plot in the DIM2 (y-axis) space, meaningful inferences regarding the underlying classification are easier to distinguish. In this case, the sign of the loadings suggests a contrast along this component between Social (positive) and Solo (negative) facets of the hobby.
A different visualisation
Another visualization technique in the PCA toolkit focuses on examining scree plots. Scree plots allow a quick visual assessment of the relative importance of factors and principal components as well as suggesting an underlying classification or interpretation of each principal component. Interpreting scree plots can be subjective, but assigning a classification to each of the first three principal components, in this exercise, seems straightforward.
Results from PCA on the Most Liked unstructured text responses suggest that the principal components separate into three classifications of respondents. These classifications I label as Craftsman, Gamer, and Historian. The scree plots for each of these principal components and their contributing factors are illustrated below:
The significant factors comprising the Craftsman principal component are 'painting', 'miniatures', 'build', 'terrain', 'collect'. The factors driving the Gamer principal component are 'people', 'games', 'fun'. Finally, the Historian principal component sees 'history' and 'research' as the significant factors. Now, these are remarkable results in that we can transform roughly 7,500 unstructured text responses, reduce the dimensionality of the data (from 4,144 word tokens down to 14) and retain sufficient information to assign meaningful classifications to the principal components.
Do these 14 Most Liked factors reflect your own wargaming preferences? If so, does the dendrogram hierarchy for the survey population fit your experience? If not, what is your Most Liked factor? Finally, do the classifications of Social/Solo and Craftsman/Gamer/Historian fit your own profile?
In the Least Liked and Most Liked analyses, there were many insights to uncover from very large bodies of unstructured texts. For me, these exercises have been a fascinating glimpse behind the curtain.
1 comment
I find that mathematical stuff very difficult to get my head around.