Development of a deep learning system for hummed melody identification for BertsoBot

Document: English. Online

Author(s): Alkorta Zabaleta, Asier
Title: Development of a deep learning system for hummed melody identification for BertsoBot / Asier Alkorta Zabaleta ; tutors, Ignacio Arganda-Carreras, Elena Lazkano Ortega.
Publication: October 2020
Subjects: Bertsolaritza
Content: Full text

Other authors: Arganda Carreras, Ignacio ; EHU. Informatika Fakultatea ; Lazkano Ortega, Elena
Physical description: 105 pp.
Type: Document
Content type: Master's thesis
Notes: Master's thesis of the University of the Basque Country (Euskal Herriko Unibertsitatea).
On the cover: University Master's Degree in Computational Engineering and Intelligent Systems (Konputazio Ingeniaritza eta Sistema Adimentsuak) / Department of Computer Science and Artificial Intelligence (Konputazio Zientziak eta Adimen Artifiziala Saila).

The system introduced in this work addresses the problem of melody classification. The proposed approach extracts the spectrogram of each melody's audio and then uses deep supervised learning to classify the spectrograms into categories. Experiments showed that Transfer Learning, combined with Data Augmentation, is required to improve the accuracy of the system. The results presented in this thesis guide further work in this field by providing insight into the performance of the different learning models tested. Overall, DenseNets proved to be the best architectures to use in this context, reaching a significant prediction accuracy.
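
As an illustration of the spectrogram-extraction step mentioned in the abstract, the following is a minimal, dependency-free sketch of a magnitude spectrogram computed with a Hann window and a naive DFT. It is not the thesis's implementation: the function names, the frame size, the hop length and the synthetic test tone are all assumptions chosen for illustration, and a real pipeline would normally rely on an optimized FFT library.

```python
import cmath
import math


def hann(n):
    """Hann window of length n, used to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame):
    """Magnitudes of the non-negative-frequency DFT bins (naive O(n^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(signal, frame_size=64, hop=32):
    """Return a list of frames, each holding frame_size // 2 + 1 magnitudes."""
    window = hann(frame_size)
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        chunk = signal[start:start + frame_size]
        frames.append(magnitude_spectrum([s * w for s, w in zip(chunk, window)]))
    return frames


# Hypothetical input: a 1 kHz sine tone sampled at 8 kHz (assumed values).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]

spec = spectrogram(tone)

# Locate the strongest frequency bin in the first frame and convert it to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

With these assumed settings each frequency bin spans 125 Hz, so the strongest bin of the test tone corresponds to 1000 Hz. The resulting two-dimensional array of magnitudes is the kind of image-like input that a convolutional classifier can consume.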

Abstract
1. Introduction
2. Basic concepts
2.1. Physics
2.2. Artificial Neural Networks
2.2.1. History of ANNs
2.2.2. How does an AN work?
2.2.3. Activation functions
2.2.4. NNs
2.2.5. The learning process and Backpropagation
2.2.6. Loss Function and Backpropagation
2.2.7. Overfitting
2.2.8. Convolutional Neural Networks
2.2.8.1. Kernels
2.2.8.2. Pooling
2.2.9. Transfer Learning
2.2.10. Popular NN architectures
3. State of the art and bibliographic revision
4. Preliminary attempts
4.1. UrbanSounds 8k dataset
4.2. Proof of concept on the UrbanSounds 8k dataset
4.3. Results of the attempt
5. Methods for data processing, approach and testing phases
5.1. The EHU Bertso dataset
5.1.1. Data conversion
5.1.2. Audio filtering
5.1.3. Dataset preparation
5.1.3.1. File recordings
5.1.3.2. Metadata file
5.1.4. Number of samples
5.1.4.1. Data augmentation
5.1.5. Uneven number of samples per class
5.2. Approach
5.3. Some words on the challenges of the problem
5.4. Technological framework
5.5. The testing method proposed
6. Experiments and results
6.1. Round results
6.1.1. 1st round
6.1.2. 2nd round
6.1.3. 3rd round
6.2. Result analysis
7. Conclusions and further work
Appendix A
a) Full result tables
i) 1st round
ii) 2nd round
iii) 3rd round
b) Best result data breakdown
Bibliography

BDB Bertsolaritzaren datu-basea
