Estimation Of Pitch Of A Signal


1)      To load, display and manipulate a sample speech signal
2)      To estimate the pitch of a speech signal by the autocorrelation method
3)      To estimate the pitch of a speech signal by the cepstrum method
4)      To estimate the pitch of a female speech signal by the above methods

A speech signal can be classified into voiced, unvoiced and silence regions. The near-periodic vibration of the vocal folds is the excitation for the production of voiced speech. Random excitation is present for unvoiced speech. There is no excitation during a silence region. The majority of speech regions are voiced, including vowels, semivowels and other voiced components. A voiced region looks like a near-periodic signal in its time domain representation. Over a short interval, we may treat voiced speech segments as periodic for all practical analysis and processing. The periodicity associated with such segments is defined as the 'pitch period To' in the time domain and the 'pitch frequency or fundamental frequency Fo' in the frequency domain. Unless specified otherwise, the term 'pitch' refers to the fundamental frequency 'Fo'. Pitch is an important attribute of voiced speech. It contains speaker-specific information and is also needed for speech coding. Thus the estimation of pitch is one of the important issues in speech processing. A large number of methods have been developed in the speech processing area for the estimation of pitch. Among them, the three most widely used are autocorrelation of speech, cepstrum pitch determination and simplified inverse filter tracking (SIFT) pitch estimation. The success of these methods is due to the simple steps they involve. Although the autocorrelation method is mainly of theoretical interest, it provides the framework for the SIFT method.
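The relation between the pitch period To and the pitch frequency Fo can be made concrete with a small sketch. It assumes a sampled signal in which the period has been measured as a lag in samples; the function name lag_to_pitch and the 8000 Hz sampling rate are illustrative choices, not part of the original experiment.

```python
def lag_to_pitch(lag_samples, fs):
    """Convert a pitch lag in samples to (To in seconds, Fo in Hz).

    lag_samples and fs are hypothetical names used only for this sketch.
    """
    if lag_samples <= 0:
        raise ValueError("lag_samples must be positive")
    t0 = lag_samples / fs      # pitch period in seconds
    f0 = fs / lag_samples      # pitch frequency in Hz, i.e. 1 / To
    return t0, f0


# Example: a period of 80 samples at an assumed sampling rate of 8000 Hz
t0, f0 = lag_to_pitch(80, 8000.0)
print(f"To = {t0 * 1000:.2f} ms, Fo = {f0:.1f} Hz")
```

Because the lag is an integer number of samples, the achievable resolution in Fo depends on the sampling rate: neighbouring lags map to frequencies that are further apart when the pitch is high.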

2.1 Autocorrelation:
Autocorrelation can be described as the similarity between observations as a function of the time lag between them. It is often used in signal processing to analyze functions or series of values, such as time domain signals. It is a mathematical tool for finding repeating patterns, such as a periodic signal obscured by noise, or for identifying the missing fundamental frequency implied by the harmonic frequencies of a signal. To begin with, we need a basic understanding of how to identify the voiced, unvoiced and silence regions of speech from their time domain and frequency domain representations. For this we plot the speech signal in both domains. The time domain representation is termed the waveform and the frequency domain representation is termed the spectrum. We consider short segments of speech, typically 10-30 msec long, for plotting waveforms and spectra. The time domain and frequency domain characteristics are distinct for the three cases. A voiced segment shows periodicity in the time domain and harmonic structure in the frequency domain. An unvoiced segment is random and noise-like in the time domain, and its spectrum has no harmonic structure. A silence region has little or no energy in either the time or the frequency domain.
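The "missing fundamental" case can be illustrated with a minimal sketch, assuming NumPy and a synthetic 30 msec frame rather than real speech. The frame contains only the 200, 300 and 400 Hz harmonics of a 100 Hz fundamental plus a little noise; the sampling rate, the 50-400 Hz search range and the helper name autocorr_pitch are assumptions made for this example.

```python
import numpy as np

fs = 8000                       # assumed sampling rate in Hz
n = int(0.030 * fs)             # 30 msec frame (240 samples)
t = np.arange(n) / fs

# Harmonics of 100 Hz with the fundamental itself left out, plus mild noise
rng = np.random.default_rng(0)
frame = sum(np.sin(2 * np.pi * f * t) for f in (200.0, 300.0, 400.0))
frame = frame + 0.1 * rng.standard_normal(n)


def autocorr_pitch(x, fs, f0_min=50.0, f0_max=400.0):
    """Return (lag in samples, Fo in Hz) from the autocorrelation peak."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    # Keep only non-negative lags; index 0 is lag zero
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    lo = int(fs / f0_max)                       # shortest lag searched
    hi = min(int(fs / f0_min), len(x) - 1)      # longest lag searched
    lag = lo + int(np.argmax(r[lo:hi + 1]))
    return lag, fs / lag


# Spectrum of the same frame, for comparison with the autocorrelation result
spectrum = np.abs(np.fft.rfft(frame))
freqs = np.fft.rfftfreq(n, d=1.0 / fs)

lag, f0 = autocorr_pitch(frame, fs)
print("strongest spectral component (Hz):", freqs[np.argmax(spectrum)])
print("autocorrelation lag (samples):", lag, "-> Fo (Hz):", f0)
```

The spectrum has no component at 100 Hz, yet the autocorrelation peak should fall near the 10 msec lag shared by all three harmonics. Restricting the search to a plausible pitch range keeps the large values around lag zero from being picked as the peak.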

Analysis of voiced speech

We should be able to identify whether a given segment of speech, typically 20-50 msec long, is voiced or not. A voiced speech segment is characterized by its periodic nature, relatively high energy, fewer zero crossings and more correlation among successive samples. Voiced speech can be identified by observing the waveform in the time domain, because of its periodicity. In the frequency domain, the presence of harmonic structure is the evidence that the segment is voiced. Further, the spectrum will typically have more energy in the low frequency region, and it will show a downward trend as frequency increases from zero. The autocorrelation of a segment of voiced speech will have a strong peak at the pitch period. The high energy can be observed as high amplitude values in the voiced segment. However, energy alone cannot decide the voicing information. Periodicity is crucial, along with energy, to identify a voiced segment unambiguously. Similarly, the relatively low number of zero crossings can be observed indirectly as smooth variation among successive sample values. Figure 2 below shows the code to generate the waveform, spectrum and autocorrelation sequence for a given segment of voiced speech.
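The cues listed above (energy, zero crossings and the autocorrelation peak) can be computed for a single frame roughly as follows. This is only a sketch under stated assumptions and is not the code referred to as Figure 2: it uses NumPy, two synthetic 40 msec frames in place of recorded speech, and a hypothetical helper frame_features whose name, return keys and 50-400 Hz search range were chosen for illustration.

```python
import numpy as np


def frame_features(frame, fs, f0_min=50.0, f0_max=400.0):
    """Energy, zero-crossing rate and normalized autocorrelation peak."""
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()
    n = len(x)
    energy = float(np.sum(x ** 2) / n)                 # mean squared amplitude
    signs = np.signbit(x)
    zcr = float(np.count_nonzero(signs[1:] != signs[:-1]) / (n - 1))
    r = np.correlate(x, x, mode="full")[n - 1:]        # non-negative lags
    if r[0] > 0:
        r = r / r[0]                                   # normalize so r[0] == 1
    lo = int(fs / f0_max)
    hi = min(int(fs / f0_min), n - 1)
    lag = lo + int(np.argmax(r[lo:hi + 1]))
    return {"energy": energy, "zcr": zcr, "peak": float(r[lag]), "f0": fs / lag}


fs = 8000                                  # assumed sampling rate in Hz
t = np.arange(int(0.040 * fs)) / fs        # 40 msec frame

# Voiced-like frame: harmonics of 120 Hz with amplitudes falling as 1/k
voiced = sum(np.sin(2 * np.pi * 120.0 * k * t) / k for k in range(1, 6))
# Unvoiced-like frame: low-level white noise
unvoiced = 0.05 * np.random.default_rng(1).standard_normal(len(t))

for name, frame in (("voiced", voiced), ("unvoiced", unvoiced)):
    print(name, frame_features(frame, fs))
```

On these synthetic frames the voiced-like segment should show higher energy, a lower zero-crossing rate and a stronger autocorrelation peak than the noise frame. No decision thresholds are given here because the text does not specify any; in practice they would have to be tuned on real recordings, and periodicity should be checked alongside energy as noted above.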
Bhanu Namikaze
