About
What it is
This dataset contains 23 Persian consonants and 6 vowels. The sound samples cover all possible consonant-vowel combinations (138 samples for each speaker), and each sound sample is 30000 data samples long. The sample rate of all speech samples is 48000 Hz, which means there are 48000 data samples in every second. In each sample, the sound starts with a consonant, followed by a vowel, and ends with silence. The length of the silence depends on the length of the consonant-vowel combination. For example, if the combination ends at the 20000th data sample, the remaining 10000 data samples (up to 30000, the length of each sound sample) are silent.
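The arithmetic in the paragraph above can be checked with a short sketch. It uses only the numbers stated in the description; the function names are hypothetical and are not part of the dataset.

```python
SAMPLE_RATE = 48_000   # data samples per second, as stated in the description
CLIP_LENGTH = 30_000   # data samples per sound sample, as stated in the description


def clip_duration_seconds(n_samples=CLIP_LENGTH, sample_rate=SAMPLE_RATE):
    """Duration of a sound sample in seconds (hypothetical helper)."""
    return n_samples / sample_rate


def trailing_silence(speech_end, clip_length=CLIP_LENGTH):
    """Number of silent data samples after the consonant-vowel combination ends."""
    return clip_length - speech_end


print(clip_duration_seconds())   # 0.625
print(trailing_silence(20_000))  # 10000
```

Under these figures, every sound sample lasts 0.625 seconds, and the example from the description (speech ending at the 20000th data sample) leaves 10000 silent data samples.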
All the sound samples are denoised with the “Adaptive noise reduction” algorithm.
How to use
MATLAB
All files are “.mat” files, a data file format used by MATLAB. Every file consists of a matrix with dimensions 23*6*30000, where 23 is the number of consonants, 6 is the number of vowels, and 30000 is the length of the sound sample. The order of phonemes is as shown in Here. To use a file, open it in MATLAB and click the “Finish” button to import the data into the MATLAB workspace.
Python
To use the “.mat” data files in Python, you can use the code below, which copies the matrix from each file into the “aud” variable and collects all of them in “al” (replace the path in the code with your own path). Every file consists of a matrix with dimensions 23*6*30000, where 23 is the number of consonants, 6 is the number of vowels, and 30000 is the length of the sound sample. The order of phonemes is as shown in Here.
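import glob

import numpy as np
import scipy.io

# Find every .mat file in the dataset folder (adjust the path to your setup).
fns = glob.glob('../input/pcvcspeech/*.mat')

al = []
for fn in fns:
    mat = scipy.io.loadmat(fn)  # load one file as a dictionary
    aud = mat['x']              # the sound matrix is stored under the key 'x'
    al.append(aud)

# Stack the per-file matrices into a single array and inspect its shape.
al = np.array(al)
al.shape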
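Once the files are stacked, a single sound sample can be picked out by index. The following is a minimal sketch, assuming each file loads as a 23*6*30000 matrix so that the stacked array has shape (number of files, 23, 6, 30000); check `al.shape` on the real data before relying on this. A zero-filled array stands in for the real data, and `get_clip` is a hypothetical helper.

```python
import numpy as np

N_CONSONANTS, N_VOWELS, CLIP_LENGTH = 23, 6, 30_000

# Synthetic stand-in for the stacked array `al` (2 hypothetical files).
al = np.zeros((2, N_CONSONANTS, N_VOWELS, CLIP_LENGTH), dtype=np.float32)


def get_clip(data, file_idx, consonant_idx, vowel_idx):
    """Return one sound sample for a given file, consonant, and vowel index."""
    return data[file_idx, consonant_idx, vowel_idx]


clip = get_clip(al, 0, 4, 2)

# Flatten to one row per sound sample, e.g. as input to a classifier.
flat = al.reshape(-1, CLIP_LENGTH)

print(clip.shape)  # (30000,)
print(flat.shape)  # (276, 30000)
```

With this layout, the row order in `flat` follows file, then consonant, then vowel, so a row index can be mapped back to its phoneme pair with integer division and remainder.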