Doctoral Thesis
DOI
https://doi.org/10.11606/T.76.2017.tde-12092017-081937
Document
Author
Full name
Camilo Akimushkin Valencia
E-mail
Institute/School/College
Knowledge Area
Date of Defense
Published
São Carlos, 2017
Supervisor
Committee
Oliveira Junior, Osvaldo Novais de (President)
Martinez, Alexandre Souto
Mello, Rodrigo Fernandes de
Mesquita, Rickson Coelho
Pardo, Thiago Alexandre Salgueiro
Title in Portuguese
Propriedades de redes aplicadas à atribuição de autoria
Keywords in Portuguese
Línguas naturais
Reconhecimento de autoria
Redes complexas
Séries temporais
Abstract in Portuguese
O reconhecimento de autoria é uma área de pesquisa efervescente, com muitas aplicações, incluindo detecção de plágio, análise de textos históricos, reconhecimento de mensagens terroristas ou falsificação de documentos. Modelos teóricos de redes complexas já são usados para o reconhecimento de autoria, mas alguns aspectos importantes têm sido ignorados. Neste trabalho, exploramos a dinâmica de redes de co-ocorrência e a relação com as palavras que representam os nós e descobrimos que ambas são claras assinaturas de autoria. Com otimização dos descritores da topologia das redes e de algoritmos de aprendizado de máquina, foi possível obter taxas de acerto maiores que 85%, sendo atingida uma taxa de 98.75% em um caso específico, para coleções de 80 livros, cada uma compilada de 8 autores de língua inglesa com 10 livros por autor. Esta tese demonstra que existem ainda aspectos inexplorados das redes de co-ocorrência de textos, o que deve permitir avanços ainda maiores no futuro próximo.
Title in English
Network features for authorship attribution
Keywords in English
Authorship attribution
Complex networks
Natural languages
Time series
Abstract in English
Authorship attribution is an active research area with many applications, including plagiarism detection, analysis of historical texts, and identification of terrorist messages or falsified documents. Theoretical models of complex networks are already used for authorship attribution, but some important aspects have been overlooked. In this thesis, we explore the dynamics of co-occurrence networks and their relation to the words that represent the nodes, and find that both are clear signatures of authorship. By optimizing the descriptors of network topology and the machine learning algorithms, we achieved accuracy rates above 85%, reaching 98.75% in one particular case, for collections of 80 books, each compiled from 8 English-language authors with 10 books per author. This thesis also shows that co-occurrence networks of texts still have unexplored aspects, which should enable further advances in the near future.
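As a rough illustration of the kind of object the abstract refers to, the following Python sketch builds a word co-occurrence network from a short text and computes two simple topology descriptors (mean degree and local clustering). It is not the method used in the thesis: the function names, the adjacency window of one word, the undirected edges, and the absence of any preprocessing such as stopword removal are all assumptions made for the example.

```python
from collections import defaultdict
import re


def cooccurrence_network(text, window=1):
    """Link each word to the words that follow it within `window` positions.

    Assumption for this sketch: edges are undirected and unweighted.
    """
    words = re.findall(r"[a-z']+", text.lower())
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + 1 + window]:
            if other != word:  # ignore self-loops
                neighbors[word].add(other)
                neighbors[other].add(word)
    return dict(neighbors)


def mean_degree(neighbors):
    """Average number of distinct neighbors per node."""
    return sum(len(n) for n in neighbors.values()) / len(neighbors)


def clustering(neighbors, node):
    """Fraction of a node's neighbor pairs that are themselves linked."""
    adjacent = sorted(neighbors[node])
    k = len(adjacent)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i, a in enumerate(adjacent)
        for b in adjacent[i + 1 :]
        if b in neighbors[a]
    )
    return 2.0 * links / (k * (k - 1))


# Hypothetical toy input; a real study would use whole books.
sample = "the cat saw the dog and the dog saw the cat"
network = cooccurrence_network(sample)
```

Descriptors such as these, computed per text, could then serve as input features for a classifier; which descriptors and classifiers the thesis actually uses is not stated in this record.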
 
WARNING - Viewing this document is conditioned on your acceptance of the following terms of use:
This document is for private use in research and teaching activities only. Reproduction for commercial use is forbidden. These rights cover all data about this document as well as its contents. Any use or copy of this document, in whole or in part, must include the author's name.
Publishing Date
2017-09-29
 
All rights to the thesis/dissertation belong to the authors
CeTI-SC/STI
Digital Library of Theses and Dissertations of USP. Copyright © 2001-2024. All rights reserved.