Development of a deep learning system for hummed melody identification for BertsoBot

Document: English. Online

Author(s): Alkorta Zabaleta, Asier
Title: Development of a deep learning system for hummed melody identification for BertsoBot / Asier Alkorta Zabaleta ; tutors, Ignacio Arganda-Carreras, Elena Lazkano Ortega.
Publication: October 2020
Subjects: Bertsolaritza
Content: Full text

Other authors: Arganda Carreras, Ignacio ; EHU. Informatika Fakultatea ; Lazkano Ortega, Elena
Physical description: 105 pp.
Information format: Document
Content type: Master's thesis
Notes: Master's thesis, University of the Basque Country (Euskal Herriko Unibertsitatea).
On the cover: University Master's Degree in Computational Engineering and Intelligent Systems / Department of Computer Science and Artificial Intelligence (Konputazio Zientziak eta Adimen Artifiziala Saila = Departamento de Ciencias de la Computación e Inteligencia Artificial).

The system introduced in this work addresses the problem of melody classification. The proposed approach extracts the spectrogram of the audio of each melody and then uses deep supervised learning to classify the melodies into categories. Experiments showed that Transfer Learning, combined with Data Augmentation, is needed to improve the accuracy of the system. The results presented in this thesis help focus further work in this field by providing insight into the performance of the different learning models tested. Overall, DenseNets proved to be the best architectures to use in this context, reaching a significant prediction accuracy.
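The first stage of the approach described in the abstract (audio to spectrogram, plus a simple form of data augmentation) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: this record does not state which tools, frame sizes, or augmentation methods the thesis uses, and the function names `log_spectrogram` and `augment` are hypothetical. A synthetic 220 Hz tone stands in for a hummed melody, and the classifier itself (for example a pretrained DenseNet fine-tuned on the spectrograms) is left out.

```python
import numpy as np


def log_spectrogram(signal, frame_len=1024, hop=256):
    """Log-magnitude spectrogram, shape (frame_len // 2 + 1, n_frames).

    frame_len and hop are illustrative values, not taken from the thesis.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Magnitude of the real FFT of each frame, compressed with log(1 + x).
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude).T


def augment(signal, rng, max_shift=2000, noise_std=0.005):
    """Hypothetical augmentation: circular time shift plus Gaussian noise."""
    shift = int(rng.integers(-max_shift, max_shift + 1))
    shifted = np.roll(signal, shift)
    return shifted + rng.normal(0.0, noise_std, size=signal.shape)


# Synthetic stand-in for a hummed note: one second of a 220 Hz sine wave.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
hum = np.sin(2 * np.pi * 220.0 * t)

rng = np.random.default_rng(0)
spec = log_spectrogram(hum)
spec_aug = log_spectrogram(augment(hum, rng))

# Frequency (in Hz) of the strongest bin, averaged over time.
peak_hz = spec.mean(axis=1).argmax() * sample_rate / 1024
```

In a pipeline of the kind the abstract describes, arrays like `spec` and `spec_aug` would be treated as images and passed to a convolutional network; the augmented copies enlarge the training set without new recordings.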

Abstract ... 7
1. Introduction ... 9
2. Basic concepts ... 13
2.1. Physics ... 13
2.2. Artificial Neural Networks ... 19
2.2.1. History of ANNs ... 19
2.2.2. How does an AN work? ... 21
2.2.3. Activation functions ... 22
2.2.4. NNs ... 23
2.2.5. The learning process and Backpropagation ... 24
2.2.6. Loss Function and Backpropagation ... 25
2.2.7. Overfitting ... 28
2.2.8. Convolutional Neural Networks ... 30
2.2.8.1. Kernels ... 30
2.2.8.2. Pooling ... 32
2.2.9. Transfer Learning ... 33
2.2.10. Popular NN architectures ... 34
3. State of the art and bibliographic revision ... 37
4. Preliminary attempts ... 41
4.1. UrbanSounds 8k dataset ... 41
4.2. Proof of concept on the UrbanSounds 8k dataset ... 42
4.3. Results of the attempt ... 43
5. Methods for data processing, approach and testing phases ... 45
5.1. The EHU Bertso dataset ... 45
5.1.1. Data conversion ... 47
5.1.2. Audio filtering ... 48
5.1.3. Dataset preparation ... 51
5.1.3.1. File recordings ... 51
5.1.3.2. Metadata file ... 51
5.1.4. Number of samples ... 53
5.1.4.1. Data augmentation ... 53
5.1.5. Uneven number of samples per class ... 58
5.2. Approach ... 59
5.3. Some words on the challenges of the problem ... 61
5.4. Technological framework ... 62
5.5. The testing method proposed ... 63
6. Experiments and results ... 65
6.1. Round results ... 65
6.1.1. 1st round ... 65
6.1.2. 2nd round ... 68
6.1.3. 3rd round ... 69
6.2. Result analysis ... 71
7. Conclusions and further work ... 77
Appendix A ... 81
a) Full result tables ... 81
i) 1st round ... 81
ii) 2nd round ... 85
iii) 3rd round ... 87
b) Best result data breakdown ... 96
Bibliography ... 103


BDB: Bertsolaritza database (Bertsolaritzaren datu-basea)
