The Latin loanword data seems to be leading a double life in English usage. Whereas most speakers treat it as a singular or collective noun (“data is”), many prescriptive sources continue to stress the word’s origin and insist that this Latin plural form (the Latin singular is “datum”, meaning “a given”) should take a plural verb in English as well (“data are”).
The division in usage seems to come down to academic English versus most other registers. In the spoken section of the British National Corpus 2014, I have so far identified only two examples of data, out of 222, in which it is clearly used with a plural verb. In both instances the speakers are women aged between 30 and 39, one of whom is a language teacher and the other a non-native speaker. Both have a postgraduate degree, which may imply that their usage of data in the academic context has trickled down to other registers, including their spoken interactions.
According to a 2019 Twitter poll by Andy Bechtel, singular agreement is preferred over plural by a ratio of 4:1. The opposite trend is found in the Academic subsection of the Corpus of Contemporary American English, where “data are” outnumbers “data is” by 4:1. As Pam Peters observes (2018: 46), “academia has become a stronghold for resisting the singular use of data.”
The story, however, does not quite end there; there seems to be more to the variation. In a recent analysis of an unedited 12.5-million-word corpus of International Academic English, Adrian Stenton and I found a preference for data with plural verbs only in the linguistics journals, not in the law journals, where the singular is slightly more common. Together with Adrian, I am soon launching a survey among copy-editors to find out more about their views and practices concerning verb agreement with data. We wonder how much variation we may find among those who enforce prescriptive rules, and what that can tell us about the future acceptability of singular agreement with data in the academic register. All things considered, Bryan Garner’s estimate that singular use of data may remain unacceptable for the next fifty years seems highly unlikely to hold.