Development of a deep learning system for hummed melody identification for BertsoBot

Document: English. Online
Author(s)
Alkorta Zabaleta, Asier
Title
Development of a deep learning system for hummed melody identification for BertsoBot / Asier Alkorta Zabaleta ; tutors, Ignacio Arganda-Carreras, Elena Lazkano Ortega.
Publication
October 2020
Subjects
Bertsolaritza
Contents
Full text
Other authors
Arganda Carreras, Ignacio ; EHU. Informatika Fakultatea ; Lazkano Ortega, Elena
Physical description
105 p.
Type
Document
Content type
Master's thesis
Notes
Master's thesis, University of the Basque Country (Euskal Herriko Unibertsitatea).
On cover: University Master's Degree in Computational Engineering and Intelligent Systems / Department of Computer Science and Artificial Intelligence (Konputazio Zientziak eta Adimen Artifiziala Saila = Departamento de Ciencias de la Computación e Inteligencia Artificial).
The system introduced in this work addresses the problem of melody classification. The proposed approach extracts the spectrogram of each melody's audio and then uses supervised deep learning to classify the melodies into categories. Experiments showed that Transfer Learning, combined with Data Augmentation, is required to improve the accuracy of the system. The results presented in this thesis guide further work in this field by providing insight into the performance of the different learning models tested. Overall, DenseNets proved to be the best architectures to use in this context, reaching a significant prediction accuracy.
Abstract ... 7
1. Introduction ... 9
2. Basic concepts ... 13
2.1. Physics ... 13
2.2. Artificial Neural Networks ... 19
2.2.1. History of ANNs ... 19
2.2.2. How does an AN work? ... 21
2.2.3. Activation functions ... 22
2.2.4. NNs ... 23
2.2.5. The learning process and Backpropagation ... 24
2.2.6. Loss Function and Backpropagation ... 25
2.2.7. Overfitting ... 28
2.2.8. Convolutional Neural Networks ... 30
2.2.8.1. Kernels ... 30
2.2.8.2. Pooling ... 32
2.2.9. Transfer Learning ... 33
2.2.10. Popular NN architectures ... 34
3. State of the art and bibliographic revision ... 37
4. Preliminary attempts ... 41
4.1. UrbanSounds 8k dataset ... 41
4.2. Proof of concept on the UrbanSounds 8k dataset ... 42
4.3. Results of the attempt ... 43
5. Methods for data processing, approach and testing phases ... 45
5.1. The EHU Bertso dataset ... 45
5.1.1. Data conversion ... 47
5.1.2. Audio filtering ... 48
5.1.3. Dataset preparation ... 51
5.1.3.1. File recordings ... 51
5.1.3.2. Metadata file ... 51
5.1.4. Number of samples ... 53
5.1.4.1. Data augmentation ... 53
5.1.5. Uneven number of samples per class ... 58
5.2. Approach ... 59
5.3. Some words on the challenges of the problem ... 61
5.4. Technological framework ... 62
5.5. The testing method proposed ... 63
6. Experiments and results ... 65
6.1. Round results ... 65
6.1.1. 1st round ... 65
6.1.2. 2nd round ... 68
6.1.3. 3rd round ... 69
6.2. Result analysis ... 71
7. Conclusions and further work ... 77
Appendix A ... 81
a) Full result tables ... 81
i) 1st round ... 81
ii) 2nd round ... 85
iii) 3rd round ... 87
b) Best result data breakdown ... 96
Bibliography ... 103