
Development of a deep learning system for hummed melody identification for BertsoBot

Document: English. Online
Author(s)
Alkorta Zabaleta, Asier
Title
Development of a deep learning system for hummed melody identification for BertsoBot / Asier Alkorta Zabaleta ; tutors, Ignacio Arganda-Carreras, Elena Lazkano Ortega.
Publication
October 2020
Subjects
Bertsolaritza
Content
Full text
Other authors
Arganda Carreras, Ignacio ; EHU. Informatika Fakultatea ; Lazkano Ortega, Elena
Physical description
105 p.
Information format
Document
Content type
Master's thesis
Notes
Master's thesis, University of the Basque Country (EHU).
On cover: University Master's Degree in Computational Engineering and Intelligent Systems / Department of Computer Science and Artificial Intelligence.
The system introduced in this work addresses the problem of melody classification. The proposed approach extracts a spectrogram from the audio of each melody and then uses deep supervised learning to classify these spectrograms into categories. Experiments showed that Transfer Learning, combined with Data Augmentation, is required to improve the accuracy of the system. The results presented in this thesis can guide further work in this field by providing insight into the performance of the different learning models tested. Overall, DenseNets proved to be the best architectures to use in this context, reaching a high prediction accuracy.
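The first step of the approach, turning each melody recording into a spectrogram that an image-oriented network can then classify, can be illustrated with a minimal pure-Python sketch. It is not the thesis's implementation: the frame length, hop size, Hann window and decibel scaling below are illustrative assumptions, and a practical pipeline would use an FFT-based audio library instead of this naive DFT.

```python
import cmath
import math


def hann_window(n):
    """Periodic Hann window of length n; tapers frame edges to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency DFT bins of one frame (naive O(n^2) DFT)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: one column of frequency-bin magnitudes per windowed frame.

    frame_len and hop are assumed example values, not parameters taken from the thesis.
    """
    window = hann_window(frame_len)
    columns = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        columns.append(dft_magnitudes(frame))
    return columns  # shape: [num_frames][frame_len // 2 + 1]


def to_decibels(spec, floor=1e-10):
    """Log-scale the magnitudes (in dB); the floor avoids log10(0) on silent bins."""
    return [[20 * math.log10(max(m, floor)) for m in col] for col in spec]


if __name__ == "__main__":
    sr = 8000  # sample rate in Hz (assumed for the example)
    tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(800)]  # 0.1 s of a 440 Hz "hum"
    spec = spectrogram(tone)
    peak_bin = max(range(len(spec[0])), key=spec[0].__getitem__)
    print(peak_bin * sr / 256)  # centre of the strongest bin, the one nearest 440 Hz: 437.5
```

Each column is one time slice and each row one frequency band, so a hummed melody becomes a 2-D array whose pitch contour a convolutional network can learn to recognise; the log (dB) scaling compresses the large dynamic range of audio magnitudes before training.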
Abstract ... 7
1. Introduction ... 9
2. Basic concepts ... 13
2.1. Physics ... 13
2.2. Artificial Neural Networks ... 19
2.2.1. History of ANNs ... 19
2.2.2. How does an AN work? ... 21
2.2.3. Activation functions ... 22
2.2.4. NNs ... 23
2.2.5. The learning process and Backpropagation ... 24
2.2.6. Loss Function and Backpropagation ... 25
2.2.7. Overfitting ... 28
2.2.8. Convolutional Neural Networks ... 30
2.2.8.1. Kernels ... 30
2.2.8.2. Pooling ... 32
2.2.9. Transfer Learning ... 33
2.2.10. Popular NN architectures ... 34
3. State of the art and bibliographic revision ... 37
4. Preliminary attempts ... 41
4.1. UrbanSounds 8k dataset ... 41
4.2. Proof of concept on the UrbanSounds 8k dataset ... 42
4.3. Results of the attempt ... 43
5. Methods for data processing, approach and testing phases ... 45
5.1. The EHU Bertso dataset ... 45
5.1.1. Data conversion ... 47
5.1.2. Audio filtering ... 48
5.1.3. Dataset preparation ... 51
5.1.3.1. File recordings ... 51
5.1.3.2. Metadata file ... 51
5.1.4. Number of samples ... 53
5.1.4.1. Data augmentation ... 53
5.1.5. Uneven number of samples per class ... 58
5.2. Approach ... 59
5.3. Some words on the challenges of the problem ... 61
5.4. Technological framework ... 62
5.5. The testing method proposed ... 63
6. Experiments and results ... 65
6.1. Round results ... 65
6.1.1. 1st round ... 65
6.1.2. 2nd round ... 68
6.1.3. 3rd round ... 69
6.2. Result analysis ... 71
7. Conclusions and further work ... 77
Appendix A ... 81
a) Full result tables ... 81
i) 1st round ... 81
ii) 2nd round ... 85
iii) 3rd round ... 87
b) Best result data breakdown ... 96
Bibliography ... 103
