• JoomlaWorks Simple Image Rotator
  • JoomlaWorks Simple Image Rotator
  • JoomlaWorks Simple Image Rotator
  • JoomlaWorks Simple Image Rotator
  • JoomlaWorks Simple Image Rotator
  • JoomlaWorks Simple Image Rotator
  • JoomlaWorks Simple Image Rotator
  • JoomlaWorks Simple Image Rotator
  • JoomlaWorks Simple Image Rotator
  • JoomlaWorks Simple Image Rotator
 
  Bookmark and Share
 
 
Master's Dissertation
DOI
https://doi.org/10.11606/D.3.2017.tde-23012017-141914
Document
Author
Full name
Rogério Guerra Borin
E-mail
Institute/School/College
Knowledge Area
Date of Defense
Published
São Paulo, 2016
Supervisor
Committee
Silva, Magno Teófilo Madeira da (President)
Attux, Romis Ribeiro de Faissol
Suyama, Ricardo
Title in Portuguese
Detecção de atividade vocal empregando máquinas de Boltzmann restritas.
Keywords in Portuguese
Inteligência artificial
Processamento de sinais
Processamento de som
Telefonia
Abstract in Portuguese
Neste trabalho, uma versão de RBM (Restricted Boltzmann Machine) tendo uma camada de classificação é adaptada a fim de permitir o seu uso com dados definidos num domínio contínuo. Essa adaptação dá origem a uma variante do modelo para o qual são desenvolvidas as regras de atualização de parâmetros dos treinamentos discriminativo, generativo e híbrido. A aplicação da variante como classificador no problema de VAD (Voice Activity Detection) é então investigada. Por meio de simulações envolvendo o corpus NOIZEUS e empregando como entradas do classificador tanto MFCCs (Mel-Frequency Cepstral Coefficients) quanto FBEs (Filter-Bank Energies), são obtidos resultados comparáveis aos de detectores considerados como estado da arte, com um menor custo computacional. A variante de RBM é comparada também com as SVMs (Support Vector Machines) lineares e com núcleo gaussiano. Com treinamento discriminativo, a RBM fornece desempenhos intermediários entre as duas versões de SVM, porém um custo computacional que é consideravelmente inferior aos de ambas. Adicionalmente, um conjunto de medidas do áudio que tiveram seu uso em VAD proposto recentemente são avaliadas com o emprego da RBM com treinamento discriminativo. Embora os resultados não sejam conclusivos, os desempenhos conseguidos indicam que essas medidas não são vantajosas quando comparadas com os tradicionais MFCCs.
Title in English
Voice activity detection employing restricted Boltzmann machines.
Keywords in English
Artificial intelligence
Signal processing
Sound processing
Telephony
Abstract in English
In this work, a type of Restricted Boltzmann Machine (RBM) having a classification layer is adapted to allow its use with data defined in a continuous domain. Such adaptation gives rise to a variant of the model for which the parameter update rules are developed for the discriminative, generative and hybrid types of training. The application of the variant as a classifier to the Voice Activity Detection (VAD) problem is then investigated. By means of simulations involving the corpus NOIZEUS and employing Mel-Frequency Cepstral Coefficients (MFCCs) or Filter-Bank Energies (FBEs) as classifier inputs, results comparable to those of state-of-the-art detectors are achieved with a lower computational cost. The RBM variant is also compared to the linear and Gaussian kernel Support Vector Machines (SVMs). With the discriminative training, the RBM provides intermediate performances between the two SVM types, but a computational cost that is considerably lower than theirs. Additionally, a set of measures from the audio whose application in VAD has been recently proposed are evaluated by employing the RBM with discriminative training. Although the results are not conclusive, the performances obtained indicate that the measures are not advantageous when compared to the traditional MFCCs.
 
WARNING - Viewing this document is conditioned on your acceptance of the following terms of use:
This document is only for private use for research and teaching activities. Reproduction for commercial use is forbidden. This rights cover the whole data about this document as well as its contents. Any uses or copies of this document in whole or in part must include the author's name.
Publishing Date
2017-01-26
 
WARNING: Learn what derived works are clicking here.
All rights of the thesis/dissertation are from the authors
CeTI-SC/STI
Digital Library of Theses and Dissertations of USP. Copyright © 2001-2024. All rights reserved.