Friday, January 16, 2015

The Trouble with Data

While browsing my usual purveyors of up-to-date academic research in museology and museum studies, I came across what I would consider a notable effort to make sense of the varied, often criticised yet always poorly understood panorama of Italian museums: Sociometrica's "Musei Index: Cultura e Big Data", a quantitative survey of visitors' online reactions to Italian cultural sites, carried out under the direction of economist Antonio Preiti. The study semantically mined over 90 thousand online reactions to important Italian cultural sites (ranging from Torino's Egyptian Museum to the Colosseum and Villa Tivoli) in order to capture, order and classify the 'measure of attraction' of a site: that is to say, how well it ranks according to the emotional reactions expressed after the visit. It is essentially an exercise in the gathering and (surface) analysis of Big Data. While this is a well-established practice abroad and not entirely new in the Italian context (see, for example, commentary by Elisa Bonacini), in this country it has so far not been attempted in a systematic manner, nor has it gathered much interest from administrative policymakers (at least that's my perception: I would gladly hear dissenting voices).

There is surely value in daring innovation and novelty, and so I can easily sympathize with the enthusiastic tone throughout the report, which is fortunately well balanced by the acknowledgement that much still needs to be done in order to turn these sporadic experiments into common practice. Nonetheless, there are a few key arguments that leave me slightly perplexed, and that perhaps merit further thinking by Sociometrica, me or anyone else who feels up to the challenge.

A first possible problem that I see has to do with the materials that have been mined for information. According to the report, the semantic analysis avoided factoring in 'neutral information' (such as opening hours, directions and so on) and focused on information that actually expressed a judgement, be it positive or negative. Combined with the large array of data considered (almost 90,000 entries), this would seem to justify the authors in thinking that their research 'fully represents the point of view of a foreigner's experience in an Italian place of culture' (p. 8) without any a priori judgement (p. 6).
Nonetheless, I feel that the whole story is probably more complicated than this. First of all, the categories of materials analysed remain quite nebulous and poorly defined overall: as any Internet denizen will know, there is a gamut of extremely 'opinionated' literature about great cultural attractions to be found online; more often than not it is vague, or even deliberately misleading, about who actually produced it, and it is not rare to see automatically generated pages taking the form of an enthusiastic travelogue for advertising, spamming or scamming purposes. This, added to the necessary incompleteness of any dataset culled from an online setting and to the poorly understood nature of opinion-forming and expression in the remote Web context, makes statements such as the above disingenuous. One would wish that more care had been put into selecting, or at least into explaining the selection of, the actual sources, since the mere appearance of relevant semantic data cannot and should not be the only deciding factor.

Equally suspect, therefore, are assertions that the resulting report is an unbiased, authentic and immediate expression of visitors' views, free from judgement and interpretation (p. 6). We should all, by now, be familiar with the idea that gathering Big Data is in no way a guarantee of impartiality and transparency: every data set that is not the whole will always be selective, and the macro-scale of Big Data does not resolve this fundamental issue, which underlies all quantitative analysis.
Similarly, a report such as this one is bound to involve all kinds of interpretations and a priori evaluations, starting with the selection of sites: fifteen out of hundreds of thousands (Istat). Then comes the choice of grading reactions from 0 to 100, which establishes reasonable but still arbitrary parameters as to what constitutes a positive or a negative response (p. 7).

These considerations do not lessen the value of the report or its innovative potential within the Italian context. Nonetheless, as often happens in first applications of novel systems and methodologies, proclamations and enthusiasm must be kept in check and counterbalanced by the knowledge that no methodology and no system can or should aspire to express the whole of an experience, especially when that experience is eminently qualitative yet measured with quantitative instruments.