Introduction to Music Signals and Basic Audio Operations in MATLAB - Music Signal Analysis and Retrieval, Week 2

Posted by JSON on March 10, 2016

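Audio Signals

Audio signals are signals that the human ear can hear. The audible frequency range is roughly 20 Hz to 20,000 Hz.

Sound production starts with a vibration at some source, such as the vocal cords. The vibration excites resonance in nearby structures such as the mouth and nasal cavity, travels through air or another medium to the eardrum, and is finally recognized by the brain.

Audio signals have three basic features:

  1. Volume: the amplitude, or intensity, of the signal.
  2. Pitch: the number of fundamental periods in one second (the fundamental frequency).
  3. Timbre: the shape of the waveform within one fundamental period.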
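
As a rough illustration of volume and pitch, the sketch below synthesizes a pure tone and counts its fundamental periods. The sample rate, frequency, and amplitude are arbitrary values chosen for this example, not values from the course material.

```matlab
% Synthesize one second of a pure tone (all values here are illustrative).
fs = 16000;                 % sample rate in Hz
f0 = 440;                   % fundamental frequency in Hz, heard as the pitch
amplitude = 0.5;            % peak amplitude, heard as the volume
t = (0:fs-1)/fs;            % one second of sample times, starting at 0
y = amplitude*sin(2*pi*f0*t);

% Count rising zero crossings: a sine wave has one per fundamental period.
numPeriods = sum(y(1:end-1) <= 0 & y(2:end) > 0);
fprintf('Peak amplitude: %.2f, periods in one second: %d\n', max(abs(y)), numPeriods);
```

A pure sine has the simplest possible timbre; changing the waveform shape within each period while keeping f0 fixed changes the timbre but not the pitch.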

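Classifying Sounds

By number of sound sources:

  • Monophonic: a single sound source (for example, vocals only)
  • Polyphonic: a mix of two or more sound sources (for example, a song with accompaniment)

By type of sound source:

  • Sounds from animals
  • Sounds from non-animals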

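Recording Parameters

A recording has three main parameters:

  1. Sample rate: the number of samples taken per second
    • 8 kHz: telephone quality
    • 16 kHz: typical for speech recognition
    • 44.1 kHz: CD quality
  2. Bit resolution: the number of bits used to represent a single sample
  3. Number of channels
    • Mono: one channel
    • Stereo: two channels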

S/U/V in Speech

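Speech signals can be divided into three types:

  1. S (silence)
  2. U (unvoiced): speech in which the vocal cords do not vibrate, such as the "s" sound
  3. V (voiced): speech in which the vocal cords vibrate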

[Figure: S/U/V segments of a speech signal]

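Computing the Storage Size of an Audio File

For example, a one-minute recording (sample rate 16,000 Hz, 16 bits, mono) takes 60 s * 16,000 samples/s * 2 bytes (16 bits = 2 bytes) * 1 channel = 1,920,000 bytes = 1920 KB = 1.92 MB.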
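
The same arithmetic can be written as a short script. This is a minimal sketch that assumes uncompressed PCM audio and ignores the file header; the variable names are made up for this example.

```matlab
% Size of uncompressed PCM audio, ignoring the container header.
duration  = 60;      % seconds
fs        = 16000;   % samples per second
nBits     = 16;      % bits per sample
nChannels = 1;       % mono

bytesPerSample = nBits/8;
sizeBytes = duration*fs*bytesPerSample*nChannels;
fprintf('%d bytes = %.0f KB = %.2f MB\n', sizeBytes, sizeBytes/1e3, sizeBytes/1e6);
```

Changing nChannels to 2 or fs to 44100 shows how quickly stereo or CD-quality audio grows.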

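Reading, Playing, and Plotting Audio in MATLAB

[y, fs] = audioread('nyan-cat.mp3'); % y holds all samples of the file; fs is its sample rate
sound(y,fs); % play at the original sample rate
sound(y,fs*2); % play at twice the original sample rate
sound(y,fs*0.5); % play at half the original sample rate

time = (1:length(y))/fs; % time axis in seconds (the parentheses are required)
plot(time, y);

[y, fs] = audioread('nyan-cat.mp3', [1 1000]); % read only the first 1000 samples
plot(y);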

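Audio File Metadata

Besides the audio itself, an audio file often stores other information, such as the artist's name, the compression method, and the duration. This information is called metadata, and audioinfo(filename) reads it.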

[Figure: audio metadata]

Reading audio metadata in MATLAB:

info = audioinfo('nyan-cat.mp3');
disp(info);

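How Audio Files Store Samples

8-bit audio files store samples as unsigned integers from 0 to 255, and 16-bit files store them as signed integers from -32768 to 32767. MATLAB rescales the samples to the range -1 to 1 when it reads the file: it computes (y-128)/128 for 8-bit audio and y/32768 for 16-bit audio.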
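
The rescaling can be reproduced by hand. The sketch below applies both formulas to a few made-up raw sample values instead of reading a file; to inspect real raw samples, audioread also accepts a 'native' option that returns data in the file's native type.

```matlab
% Made-up raw samples covering the extremes of each integer format.
raw8  = uint8([0 64 128 255]);           % 8-bit unsigned samples
raw16 = int16([-32768 0 16384 32767]);   % 16-bit signed samples

% Convert to double first so the arithmetic does not saturate as integers.
y8  = (double(raw8) - 128)/128;
y16 = double(raw16)/32768;

disp(y8);
disp(y16);
```

Note that the largest positive value falls just short of 1 in both formats, because the integer ranges are not symmetric around zero.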

Stereo Audio

A stereo file has two channels, so the y returned by audioread has two columns:

[y, fs] = audioread('nyan-cat.mp3');
left = y(:,1); % left channel
right = y(:,2); % right channel

If you zero out the left channel and play the result, only the right speaker produces sound:

[y, fs] = audioread('nyan-cat.mp3');
y(:,1) = 0;
sound(y, fs);  

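Once we can read audio, we can process it: adjust the volume, change the pitch, remove noise, and so on. Playing the result back is then a quick way to verify the change.

MATLAB ships with a file named handel.mat. Loading it creates two variables, y and Fs, which are handy for simple audio experiments.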

>> load handel.mat
>> whos
  Name          Size             Bytes  Class     Attributes

  Fs            1x1                  8  double              
  y         73113x1             584904  double
>> length(y)/Fs % duration

ans =

    8.9249
    
>> sound(y, Fs);

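Synchronous and Asynchronous Playback

sound is an asynchronous command: if you call it several times in a row, the sounds overlap. For synchronous playback, first create a player object with audioplayer, then play it with playblocking:

load handel.mat
p = audioplayer(y, Fs);
playblocking(p); % the next line runs only after playback finishes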
playblocking(p);

Changing the Signal Amplitude

load handel.mat
p1 = audioplayer(y, Fs);
p2 = audioplayer(y*3, Fs);
p3 = audioplayer(y*5, Fs);
playblocking(p1);
playblocking(p2);
playblocking(p3);

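Perceived loudness is not linearly proportional to amplitude; the relationship is roughly logarithmic. Multiplying by 3 therefore makes the sound noticeably louder, but not three times as loud.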
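
Because of this roughly logarithmic relationship, amplitude gains are usually expressed in decibels as 20*log10(gain). The sketch below converts the gains used above and also checks whether the scaled signal leaves the range -1 to 1, since samples outside that range are generally clipped on playback. The variable names are chosen for this example only.

```matlab
load handel.mat              % provides y and Fs

gains  = [1 3 5];            % the amplitude factors used above
gainDb = 20*log10(gains);    % the same gains expressed in decibels

peak  = max(abs(y));         % largest sample magnitude in the original
clips = gains*peak > 1;      % true where the scaled signal exceeds [-1, 1]

for k = 1:numel(gains)
    fprintf('x%d = %+.2f dB, clips: %d\n', gains(k), gainDb(k), clips(k));
end
```

If clipping is reported for a gain, part of the loudness change heard at that gain comes from distortion rather than from a clean increase in amplitude.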

Sample Rate

load handel.mat
p = audioplayer(y, Fs);
p.SampleRate = 2 * Fs; 
playblocking(p);
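  • Increasing the sample rate produces a Donald Duck-like voice: the sound gets shorter and the pitch gets higher.
  • Decreasing the sample rate produces a deep voice: the sound gets longer and the pitch gets lower.
  • Raising the pitch without changing the duration requires pitch modification.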
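
The change in length follows directly from duration = number of samples / playback sample rate. A minimal sketch using handel.mat, with playback factors picked for illustration:

```matlab
load handel.mat                            % provides y and Fs

factors   = [0.5 1 2];                     % playback rate relative to Fs
durations = length(y)./(factors*Fs);       % playback length in seconds

for k = 1:numel(factors)
    fprintf('Playback at %.1f x Fs lasts %.2f s\n', factors(k), durations(k));
end
```

Every frequency in the signal is scaled by the same factor, so doubling the playback rate halves the duration and raises the pitch by one octave.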

Flipping the sign of the signal does not change what you hear:

load handel.mat
p = audioplayer(-y, Fs);
playblocking(p);
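
One way to see why the inverted signal sounds the same is to compare magnitude spectra: negating a signal changes only the phase of each frequency component, not its strength. This is a sketch of that check, not a full account of how hearing works.

```matlab
load handel.mat               % provides y and Fs

magOriginal = abs(fft(y));    % magnitude spectrum of the original signal
magInverted = abs(fft(-y));   % magnitude spectrum of the sign-flipped signal

maxDiff = max(abs(magOriginal - magInverted));
fprintf('Largest difference between magnitude spectra: %g\n', maxDiff);
```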

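Recording with MATLAB

The code below creates a recorder object with audiorecorder, records with recordblocking, then retrieves the audio data and plots the signal:

duration = 3; % recording length in seconds
recObj = audiorecorder;
recordblocking(recObj, duration);
play(recObj);
y = getaudiodata(recObj, 'double'); % retrieve the recorded samples
time = (1:length(y))/recObj.SampleRate; % use the recorder's sample rate instead of a hard-coded 8000
plot(time, y);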

audiorecorder defaults:

  • sample rate: 8000 Hz
  • bit resolution: 8 bits

To set the parameters yourself, use audiorecorder(sampleRate, nBits, nChannel);

After recording, retrieve the audio data with getaudiodata, then save it to a file with audiowrite:

y = getaudiodata(recObj, 'double');
audiowrite('myRec.wav', y, recObj.SampleRate);

Homework

  1. Record your own voice saying “my name is xxx and I am a student at xxx university”, and save the mono recording to myVoice.wav. Write a MATLAB script that reads the audio data from myVoice.wav, duplicates it to create a stereo signal, and then modifies the volume of each channel so that playback creates the illusion that the sound source is moving between your two speakers. (Hint: You can observe the waveforms of the two channels in flanger.wav.)
  2. Write a function to generate a sine wave with time-varying frequencies. The I/O format is

    outputSignal=mySine(duration, freq);
    

    where freq is a two-element vector [f1, f2], indicating the frequency of the sine wave should change linearly from f1 to f2.

    Note that the sample rate is 16 kHz. The first sample is zero, starting from time 0. (In other words, the time vector is (0:duration*fs-1)/fs, and the function to invoke is “sin”.)

    Hint: The instantaneous frequency of y=sin(2πϕ(t)) is equal to ϕ′(t).

  3. Write a function that can take a wave file, encrypt it, and save it as another wave file. The I/O format is

    myEncrypt(inputFileName, outputFileName);
    

    where “inputFileName” is a string specifying the input wave file, and “outputFileName” is a string specifying the output wave file. The encryption process is like this (assuming y is the original signal and z is the encrypted signal):

    1. z=y;
    2. if y(i)>0, z(i)=1-y(i) for all i
    3. if y(i)<0, z(i)=-1-y(i) for all i
    4. z=flipud(z);

    Note that:

    1. The encrypted file can be converted to the original file using the same function.
    2. Be aware that this is a naive encryption; better methods exist.