The classification of attribution principles proposed by Academician V. V. Vinogradov divides all attribution methods into groups according to whether they are subjective or objective. Among the objective methods, mathematical methods of linguistic analysis are the most fruitful, because the wide range of mathematical techniques already available for analyzing objects can be applied directly to attribution.
Historical-documentary and philological methods of research long dominated the field of attribution. Subjective methods of attribution were used to identify the peculiarities of each author. Under this approach, external details of an author’s individual style, such as the author’s favorite words, terms, or expressions, were selected subjectively.
Mathematical-statistical methods of attribution were first used in the second half of the 19th century to attribute authorship of works written in ancient times. The research of Campbell (Campbell, 1867) and Lutoslawski (Lutoslawski, 1897) was based on identifying the rarest (“original”) words used by a given author, as well as on establishing the relative position of determining and determined words.
Linguistic-mathematical methods gradually came to be used more and more widely to describe an author’s individual style. As a result, data were gathered on the properties of language units, and a dedicated scientific apparatus for the attribution of texts took shape. Specially designed indices for evaluating the lexical structure of a text grew out of researchers’ efforts to develop a new, universal apparatus for objective lexical analysis. Many researchers developed new methods for evaluating the lexical content of texts. Each of these methods had its own faults, and none of them took into account solutions that already existed in adjacent fields of knowledge, for example in mathematics, where pattern recognition theory had long been used successfully to describe objects of various kinds.
One of the most recent studies on the attribution of texts based on lexical analysis is that of Dominique Labbé, who in 2001 proposed a formula for calculating “intertextual distance”. Labbé’s research reveals serious methodological and statistical-probabilistic shortcomings in previously used methods. These shortcomings stem above all from the unreliability of attribution results obtained by analyzing only one level, the lexical one. In a stylistic analysis aimed at attributing texts, the study of a text’s lexical content should be supplemented with data from other language levels, above all from the syntactic structure of the text being analyzed.
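To make the idea of a frequency-based distance between two texts concrete, here is a minimal Python sketch. The text above does not give Labbé’s formula, so the normalization used here is an assumption: the word frequencies of the longer text are scaled down to the length of the shorter one, and the summed absolute differences are divided by the combined size. The function name `intertextual_distance` and the whitespace tokenizer are hypothetical and purely illustrative.

```python
from collections import Counter


def tokenize(text):
    """Very rough tokenizer (assumption): lowercase and split on whitespace."""
    return text.lower().split()


def intertextual_distance(tokens_a, tokens_b):
    """Frequency-based distance between two token lists, in the range [0, 1].

    Assumed simplified form, not taken from the source text:
    the longer text's frequencies are rescaled to the shorter text's length,
    then D = sum(|F_a - E_b|) / (N_a + N_a).
    """
    if not tokens_a or not tokens_b:
        raise ValueError("both texts must contain at least one token")

    # Make A the shorter text so that B is the one being scaled down.
    if len(tokens_a) > len(tokens_b):
        tokens_a, tokens_b = tokens_b, tokens_a

    freq_a = Counter(tokens_a)
    freq_b = Counter(tokens_b)
    scale = len(tokens_a) / len(tokens_b)

    # Counter returns 0 for words absent from a text.
    vocabulary = set(freq_a) | set(freq_b)
    total_diff = sum(abs(freq_a[w] - freq_b[w] * scale) for w in vocabulary)

    # The scaled frequencies of B sum to len(tokens_a), hence the factor 2.
    return total_diff / (2 * len(tokens_a))


if __name__ == "__main__":
    a = tokenize("the cat sat on the mat")
    b = tokenize("the dog sat on the log by the door")
    print(intertextual_distance(a, b))
```

Under these assumptions, identical texts give a distance of 0 and texts with no shared vocabulary give 1. The sketch works on the lexical level only, which is exactly the limitation the paragraph above describes.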
The period from the late 1970s to the present day has been marked by the rapid development of computing technology and software. As a result, more and more researchers have become interested in using computers to process data when analyzing the syntactic, grammatical, morphemic, and lexical aspects of texts.
Researchers’ efforts to automate stylistic diagnostics of texts and the search for individual characteristics of an author’s style have led stylistic analyses to favor every language level except the syntactic one. The dependence of stylistic analysis on computerized data processing, and on methods built around computer equipment, simplifies the methodological basis of research. This ultimately makes methods of attribution less effective.
The history of the development of methods of attribution has led to an understanding that an effective method for stylistic analysis with the aim of determining authorship should have the following characteristics:
- Stylistic analysis should determine the characteristics of the whole text, not of individual sentences;
- The description of the text should cover various levels of the language system, and the structure of the text should be analyzed as well as its vocabulary;
- Multidimensional classifications must be used.
Furthermore, the study of links between parameters has shown that simply increasing the number of parameters does not make the analysis more effective. A special mathematical apparatus is therefore needed to estimate the links between parameters. Informative parameters should be selected, and redundant parameters that correlate strongly with one another should be excluded.
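The exclusion of strongly correlated parameters can be illustrated with a short Python sketch. It is not the apparatus referred to in the text, only one simple way to do it: a greedy filter that keeps a parameter if its Pearson correlation with every parameter already kept stays below a threshold. The parameter names, the sample values, and the 0.9 threshold are all assumptions chosen for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant parameter carries no information; treat as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_uncorrelated(parameters, threshold=0.9):
    """Greedily keep parameters whose |r| with all kept ones is below threshold.

    `parameters` maps a parameter name to its values measured over the same
    set of texts. Order matters: earlier parameters take priority.
    """
    kept = []
    for name, values in parameters.items():
        if all(abs(pearson(values, parameters[k])) < threshold for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Hypothetical measurements over four texts.
    parameters = {
        "mean_sentence_length_words": [10, 12, 14, 16],
        "mean_sentence_length_chars": [50, 60, 70, 80],
        "share_of_subordinate_clauses": [0.3, 0.1, 0.4, 0.2],
    }
    print(select_uncorrelated(parameters))
```

In this made-up data the two sentence-length parameters are perfectly correlated, so only the first is kept. A greedy filter depends on the order of the parameters, so a fuller treatment would also rank them by how informative they are before filtering.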
All of these requirements are met by an existing, ready-to-use mathematical apparatus: the theory of pattern recognition. Pattern recognition methods were first applied to the attribution of anonymous and pseudonymous works, on the basis of the characteristics of an author’s individual style, by M. A. Marusenko in 1990 (Marusenko, 1990).