From: drowe67 Date: Sun, 22 Aug 2010 00:56:52 +0000 (+0000) Subject: part way through coding up codec2 functions, tquantoc unittets written X-Git-Url: http://git.whiteaudio.com/gitweb/?a=commitdiff_plain;h=89c69f9c5731e999b2eae6f2554d7e48bfa3663f;p=freetel-svn-tracking.git part way through coding up codec2 functions, tquantoc unittets written git-svn-id: https://svn.code.sf.net/p/freetel/code@181 01035d8c-6547-0410-b346-abe4f91aad63 --- diff --git a/codec2/README.txt b/codec2/README.txt index 4b17b17c..186a6561 100644 --- a/codec2/README.txt +++ b/codec2/README.txt @@ -1,482 +1,41 @@ -Codec2 - Open Source Low Bit Rate Speech Codec -=============================================== - -David Rowe, VK5DGR - -[[introduction]] -Introduction ------------- - -Codec2 is an open source low bit rate speech codec designed for -communications quality speech at around 2400 bit/s. Applications -include low bandwidth HF/VHF digital radio. It fills a gap in open -source, free-as-in-speech voice codecs beneath 5000 bit/s. - -The motivations behind the project are summarised in this -link:/blog/?p=128[blog post]. - -You can help and support Codec2 development via a <>. - -[[status]] -Status ------- - -Still in experimental/development stage - no 2400 bit/s codec -available yet, but we are getting very close! - -Progress to date: - -1. Unquantised encoder (sinenc) and decoder (sinedec) running under - Linux/gcc. The decoder (sinedec) is a test-bed for various - modelling and quantisation options - these are controlled via - command line switches. - -2. LPC modelling is working nicely and there is a first pass LSP - vector quantiser working at 32 bits/frame with acceptable voice - quality. Lots could be done to improve this (e.g. improved quality - and reduced bit rate). - -3. A phase model has been developed that uses 0 bits for phase and 1 - bit/frame for voiced/unvoiced decision but delivers high quality - speech. 
This works suspiciously well - codecs with a single bit - voiced/unvoiced decision aren't meant to sound this good. Usually - mixed voicing at several bits/frame is required. - -4. An experimental post filter has been developed to improve - performance with speech mixed with background noise. The post - filter delivers many of the advantages of mixed voicing but unlike - mixed voicing requires zero bits. - -5. Non-Linear Pitch (NLP) pitch estimator working OK, and a simple - pitch tracker has been developed to help with some problem frames. - -6. An algorithm for decimating sinusoidal model parameters from the - native 10ms rate to a 20ms rate has been developed. High quality - speech is maintained, some small differences are only audible - through headphones. A 20ms frame rate is required for 2400 bit/s - coding. - -Current work areas: - -1. Reduce CPU Load of the first order phase model fit. Apart from - that the codec runs very quickly. - -2. Write separate encoder and decoder functions and demo programs for - the alpha 2400 bit/s codec. Currently most of the processing - happens inside the sinedec program. 
- -[[source]] -The Source Code ---------------- - -Browse: - -http://freetel.svn.sourceforge.net/viewvc/freetel/codec2/[http://freetel.svn.sourceforge.net/viewvc/freetel/codec2/] - -Check Out: - - $ svn co https://freetel.svn.sourceforge.net/svnroot/freetel/codec2 codec2 - -[[support]] -The Mailing List ----------------- - -For any questions, comments, support, suggestions for applications by -end users and developers please post to the https://lists.sourceforge.net/lists/listinfo/freetel-codec2[Codec2 Mailing List] - -[[quickstart]] -Quick Start ------------ - -To encode the file raw/hts1a.raw to a set of sinusoidal model -parameters (hts1.mdl) then decode to a raw file src/hts1a_uq: - - $ cd codec2/src - $ make - $ ./sinenc ../raw/hts1a.raw hts1.mdl 300 ../pitch/hts1a.p - $ ./sinedec ../raw/hts1a.raw hts1.mdl -o hts1a_uq.raw - $ play -f s -r 8000 -s w ../raw/hts1a.raw - $ play -f s -r 8000 -s w hts1a_uq.raw - -[[plan]] -Development Roadmap -------------------- - - [X] Milestone 0 - Project kick off - [X] Milestone 1 - Alpha 2400 bits/s codec - [X] Spectral amplitudes modelled using LPC - [X] Phase and voicing model developed - [X] Pitch estimator - [X] Spectral amplitudes quantised using LSPs - [X] Decimation of model parameters from 20ms to 10ms - [ ] Refactor to develop a seperate encoder/decoder functions - [ ] Complete 2400 bits/s codec demonstrated - [ ] Reduced complexity voicing estimator - [ ] Test phone call over LAN - [ ] Release 0.1 for Alpha Testing - [ ] Milestone 2 - Beta codec for digital radio - [ ] Gather samples from the community with different speakers, - input filtering, and background noise conditions that break - codec. 
- [ ] Further develop algorithm based on samples above - [ ] Design FEC scheme - [ ] Test over digital radio links - [ ] Milestone 3 - Fixed point port - [ ] Milestone 4 - codec2-on-a-chip embedded DSP/CPU port - -[[howitworks]] -How it Works ------------- - -Speech is modelled as a sum of sinusoids: - - for(m=1; m<=L; m++) - s[n] = A[m]*cos(Wo*m + phi[m]); - -The sinusoids are multiples of the fundamental frequency Wo -(omega-naught), so the technique is known as "harmonic sinusoidal -coding". For each frame, we analyse the speech signal and extract a set of -parameters: - - Wo, {A}, {phi} - -Where Wo is the fundamental frequency (also know as the pitch), { A -} is a set of L amplitudes and { phi } is a set of L phases. L is -chosen to be equal to the number of harmonics that can fit in a 4 KHz -bandwidth: - - L = floor(pi/Wo) - -Wo is specified in radians normalised to 4 kHz, such that pi radians -= 4 kHz. the fundamental frequency in Hz is: - - F0 = (8000/(2*pi))*Wo - -We then need to encode (quantise) Wo, { A }, { phi } and transmit them to -a decoder which reconstructs the speech. A frame might be 10-20ms in -length so we update the parameters every 10-20ms (100 to 50 Hz -update rate). - -The speech quality of the basic harmonic sinusoidal model is pretty -good, close to transparent. It is also relatively robust to Wo -estimation errors. Unvoiced speech (e.g. consonants) are well -modelled by a bunch of harmonics with random phases. Speech corrupted -with background noise also sounds OK, the background noise doesn't -introduce any grossly unpleasant artifacts. - -As the parameters are quantised to a low bit rate and sent over the -channel, the speech quality drops. The challenge is to achieve a -reasonable trade off between speech quality and bit rate. 
- -Bit Allocation +Codec 2 README -------------- -[grid="all"] -`-------------------------------.---------- -Parameter bits/frame -------------------------------------------- -Spectral magnitudes (LSPs) 32 -Low frequency LPC correction 1 -Energy 5 -Voicing 1 -Fundamental Frequency (Wo) 7 -Spare 2 -------------------------------------------- -Total 48 -------------------------------------------- +Codec 2 is an open source 2400 bit/s speech codec. For more +information please see: -At a 20ms update rate 48 bits/frame is 2400 bits/s. + http://rowetel.com.codec2.html -[[challenges]] -Challenges +Quickstart ---------- -The tough bits of this project are: - -1. Parameter estimation, in particular pitch (Wo) detection. - -2. Reduction of a time-varying number of parameters (L changes with Wo - each frame) to a fixed number of parameters required for a fixed - bit rate. The trick here is that { A } tend to vary slowly with - frequency, so we can "fit" a curve to the set of { A } and send - parameters that describe that curve. - -3. Discarding the phases { phi }. In most low bit rate codecs phases - are discarded, and synthesised at the decoder using a rule-based - approach. This also implies the need for a "voicing" model as - voiced speech (vowels) tends to have a different phase structure to - unvoiced (constants). The voicing model needs to be accurate (not - introduce distortion), and relatively low bit rate. - -4. Quantisation of the amplitudes { A } to a small number of bits - while maintaining speech quality. For example 30 bits/frame at a - 20ms frame rate is 30/0.02 = 1500 bits/s, a large part of our 2400 - bit/s "budget". - -5. Performance with different speakers and background noise - conditions. This is where you come in - as codec2 develops please - send me samples of it's performance with various speakers and - background noise conditions and together we will improve the - algorithm. This approach proved very powerful when developing - link:oslec.html[Oslec]. 
One of the cool things about open source! - -[[help]] -Can I help? ------------ - -Maybe; check out the the <> above and the latest -version of the -http://freetel.svn.sourceforge.net/viewvc/freetel/codec2/TODO.txt?view=log[TODO] -list and and see if there is anything that interests you. - -Not all of this project is DSP. There are many general C coding tasks -like refactoring and writing a command line soft phone application for -testing the codec over a LAN. - -I will happily accept *sponsorship* for this project. For example -research grants, or development contracts from companies interested in -seeing an open source low bit rate speech codec. +$ cd codec2/src +$ make +$ ./c2enc ../raw/hts1a.raw hts1a_c2.bit +$ ./c2dec hts1a_c2.bit hts1a_c2.raw +$ ../scripts/menu.sh ../raw/hts1a.raw hts1a_c2.raw -You can also donate to the codec2 project via PayPal (which also -allows credit card donations): +Programs +-------- -+++++++++++++++++++++++++++ -
-
-Donation in US$:
-
-+++++++++++++++++++++++++++ +1/ c2enc encodes a file of speech sample to a file of encoded bits. +One bit is stored in the LSB or each byte. -[[patents]] -Is it Patent Free? ------------------- +2/ c2dec decodes a file of bits to a file of speech samples. -I think so - much of the work is based on old papers from the 60, 70s -and 80's and the PhD thesis work [2] used as a baseline for this codec -was original. A nice little mini project would be to audit the -patents used by proprietary 2400 bit/s codecs (MELP and xMBE) and -compare. +3/ c2sim is a simulation/development version of codec 2. It allows +selective use of the various codec 2 algorithms. For example +switching phase modelling or LSP quantisation on and off. -Proprietary codecs typically have small, novel parts of the algorithm -protected by patents. However proprietary codecs also rely heavily on -large bodies of public domain work. The patents cover perhaps 5% of -the codec algorithms. Proprietary codec designers did not invent most -of the algorithms they use in their codec. Typically, the patents just -cover enough to make designing an interoperable codec very difficult. -These also tend to be the parts that make their codecs sound good. - -However there are many ways to make a codec sound good, so we simply -need to choose and develop other methods. - -[[compat]] -Is Codec2 compatible with xMBE or MELP? -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Nope - I don't think it's possible to build a compatible codec without -infringing on patents or access to commercial in confidence -information. - -[[hacking]] -Hacking -------- - -If you would like to work on the Codec2 code base here are some -notes: - -* src/code.sh will perform the several processing steps - required to output speech files at various processing steps, for - example: - - $ ./code.sh hts1a -+ -will produce hts1a_uq (unquantised, i.e. baseline sinusoidal model), -hts1a_phase0 (zero phase model), hts1a_lpc10 (10th order LPC model). 
- -* You can then listen to all of these samples (and the original) - using: - - $ ./listen.sh hts1a - -* Specific notes about LPC and Phase modelling are below. - -* There are some useful scripts in the scripts directory, for example - wav2raw.sh, raw2wav.sh, playraw.sh, menu.sh. Note that code.sh and - listen.sh are in the src directory as thats where they get used most - of the time. - -[[lpc]] -LPC Modelling Notes -------------------- - -Linear Prediction Coefficient (LPC) modelling is used to model the -sine wave amplitudes { A }. The use of LPC in speech coding is -common, although the application of LPC modelling to frequency domain -coding is fairly novel. They are mainly used for time domain codecs -like LPC-10 and CELP. - -LPC modelling has a couple of advantages: - -* From time domain coding we know a lot about LPC, for example how to - quantise them efficiently using Line Spectrum Pairs (LSPs). - -* The number of amplitudes varies each frame as Wo and hence L vary. - This makes the { A } tricky to quantise and transmit. However it is - possible to convey the same information using a fixed number of - LPCs which makes the quantisation problem easier. - -To test LPC modelling: - - $ ./sinedec ../raw/hts1a.raw hts1a.mdl --lpc 10 - hts1a_lpc10.raw - -The blog post [4] discusses why LPC modelling works so well when Am -recovered via RMS method (Section 5.1 of thesis). Equal area model of -LPC spectra versus harmonic seems to work remarkably well, especially -compared to sampling. SNRs up to 30dB on female frames. - -There is a problem with modelling the low order (e.g. m=1, -i.e. fundamental) harmonics for males. The amplitude of the m=1 -harmonic is raised by as much as 30dB after LPC modelling as (I think) -LPC spectra must have zero derivative at DC. This means it's poor at -modelling very low freq harmonics which unfortunately the ear is very -sensitive to. 
To correct this an extra bit has been added to correct -LPC modelling errors on the first harmonic. When set this bit -instructs the decoder to attenuate the LPC modelled harmonic by 30dB. - -[[phase]] -Phase Modelling Notes ---------------------- - -"Zero order" and "First order" phase models have been developed to -synthesise phases of each harmonic at the decoder side. These are -described in source code of -http://freetel.svn.sourceforge.net/viewvc/freetel/codec2/src/phase.c?view=log[phase.c]. - -The zero phase model required just one voicing bit to be transmitted -to the decoder, all other phase information is synthesised use a rule -based model. It seems to work OK for most speech samples, but adds a -"clicky" artcifact to some low picthed speakers. Also see the blog -posts below for more discussion of phase models. - -To determine voicing we attempt to fit a first order phase model, then -measure SNR of the fit. The frame is declared unvoiced if the SNR is -beneath a threshold. - -Current phase modelling makes no attempt to match harmonic phases at -frame edges. This area would be worth experimenting with, as it could -cause roughness. Then again it might be responsible for effective -mixed voicing modelling. - -Unvoiced speech can be represented well by random phases and a Wo -estimate that jumps around randomly. If Wo is small the number of -harmonics is large which makes the noise less periodic and more noise -like to the ear. With Wo jumping around phase tracks are -discontinuous between frames which also makes the synthesised signal -more noise like and prevents time domain pulses forming that the ear -is sensitive to. - -Running Phase Models -~~~~~~~~~~~~~~~~~~~~ - -In src/phase.c two phase models have been implemented, the algorithms -are explained in the source code. 
- -Zero Phase model (just one voicing bit/frame): - - $ ./sinedec ../raw/hts1a.raw hts1a.mdl --phase 0 - hts1a_phase0.raw - -First order Phase model (when quantised around 13 bits/frame bits for -pulse position, phase, and a voicing bit): - - $ ./sinedec ../raw/hts1a.raw hts1a.mdl --phase 1 - hts1a_phase1.raw - -[[octave]] -Octave Scripts --------------- - -* pl.m - plot a segment from a raw file - -* pl2.m - plot the same segments from two different files to compare - -* plamp.m - menu based GUI interface to "dump" files, move back and forward - through file examining time and frequency domain parameters, lpc - model etc - - $ ./sinedec ../raw/hts1a.raw hts1a.mdl --lpc 10 --dump hts1a - $ cd ../octave - $ octave - octave:1> plamp("../src/hts1a",25) - -* plphase.m - similar to plamp.m but for analysing phase models - - $ ./sinedec ../raw/hts1a.raw hts1a.mdl --phase [0|1] --dump hts1a_phase - $ cd ../octave - $ octave - octave:1> plphase("../src/hts1a_phase",25) - -* plpitch.m - plot two pitch contours (.p files) and compare - -* plnlp.m - plots a bunch of NLP pitch estimator states. 
link:/images/codec2/tnlp_screenshot.png[Screenshot] - -[[directories]] Directories ----------- - script - shell scripts to simply common operations - speex - LSP quantisation code borrowed from Speex for testing + script - shell scripts for playing and converting raw files src - C source code - octave - Matlab/Octave scripts + octave - Octave scripts used for visualising internal signals + during development pitch - pitch estimator output files raw - speech files in raw format (16 bits signed linear 8 KHz) - unittest - Unit test source code + unittest - unit test source code wav - speech files in wave file format -[[other]] -Other Uses ----------- - -The DSP algorithms contained in codec2 may be useful for other DSP -applications, for example: - -* The - http://freetel.svn.sourceforge.net/viewvc/freetel/codec2/src/nlp.c?view=log[nlp.c] - pitch estimator is a working, tested, pitch estimator for human - speech. NLP is an open source pitch estimator presented in C code - complete with a GUI debugging tool (plnlp.m - link:/images/codec2/tnlp_screenshot.png[screenshot]). It can be run - stand-alone using the - http://freetel.svn.sourceforge.net/viewvc/freetel/codec2/unittest/tnlp.c?view=log[tnlp.c] - unit test program. It could be applied to other speech coding - research. Pitch estimation is a popular subject in academia, - however most pitch estimators are described in papers, with the fine - implementation details left out. - -* The basic analysis/synthesis framework could be used for high - quality speech synthesis. 
- -[[refs]] -References ----------- - -[1] http://perens.com/[Bruce Perens] introducing the - http://codec2.org/[codec2 project concept] - -[2] David's PhD Thesis, - http://www.itr.unisa.edu.au/~steven/thesis/dgr.pdf["Techniques for - Harmonic Sinusoidal Coding"], used for baseline algorithm - -[3] http://www.rowetel.com/blog/?p=128[Open Source Low rate Speech - Codec Part 1 - Introduction] - -[4] http://www.rowetel.com/blog/?p=130[Open Source Low rate Speech - Codec Part 2 - Spectral Magnitudes] - -[5] http://www.rowetel.com/blog/?p=131[Open Source Low rate Speech - Codec Part 3 - Phase and Male Speech] - -[6] http://www.rowetel.com/blog/?p=132[Open Source Low rate Speech - Codec Part 4 - Zero Phase Model] - diff --git a/codec2/src/c2sim.c b/codec2/src/c2sim.c index 31addfea..26ac63ec 100644 --- a/codec2/src/c2sim.c +++ b/codec2/src/c2sim.c @@ -103,13 +103,12 @@ int main(int argc, char *argv[]) int phase0; float ex_phase[MAX_AMP+1]; - int voiced, voiced_1, voiced_2; int postfilt; float bg_est; - int hand_snr; - FILE *fsnr; + int hand_voicing; + FILE *fvoicing; MODEL model_1, model_2, model_3, model_synth, model_a, model_b; int transition, decimate; @@ -121,7 +120,6 @@ int main(int argc, char *argv[]) prev_Wo = TWO_PI/P_MAX; - voiced_1 = voiced_2 = 0; model_1.Wo = TWO_PI/P_MIN; model_1.L = floor(PI/model_1.Wo); for(i=1; i<=model_1.L; i++) { @@ -141,7 +139,7 @@ int main(int argc, char *argv[]) printf(" [--lsp]\n"); printf(" [--phase0]\n"); printf(" [--postfilter]\n"); - printf(" [--hand_snr]\n"); + printf(" [--hand_voicing]\n"); printf(" [--dec]\n"); printf(" [--dump DumpFilePrefix]\n"); exit(0); @@ -190,10 +188,10 @@ int main(int argc, char *argv[]) ex_phase[0] = 0; } - hand_snr = switch_present("--hand_snr",argc,argv); - if (hand_snr) { - fsnr = fopen(argv[hand_snr+1],"rt"); - assert(fsnr != NULL); + hand_voicing = switch_present("--hand_voicing",argc,argv); + if (hand_voicing) { + fvoicing = fopen(argv[hand_voicing+1],"rt"); + assert(fvoicing != NULL); } bg_est = 
0.0; @@ -240,8 +238,8 @@ int main(int argc, char *argv[]) if (phase0) { float Wn[M]; /* windowed speech samples */ - float ak_phase[PHASE_LPC+1]; /* autocorrelation coeffs */ - float Rk[PHASE_LPC+1]; /* autocorrelation coeffs */ + float ak_phase[LPC_ORD+1]; /* autocorrelation coeffs */ + float Rk[LPC_ORD+1]; /* autocorrelation coeffs */ COMP Sw_[FFT_ENC]; dump_phase(&model.phi[0], model.L); @@ -266,7 +264,7 @@ int main(int argc, char *argv[]) /* determine voicing */ - snr = est_voicing_mbe(&model, Sw, W, (FS/TWO_PI)*model.Wo, Sw_, &voiced); + snr = est_voicing_mbe(&model, Sw, W, (FS/TWO_PI)*model.Wo, Sw_); dump_Sw_(Sw_); dump_snr(snr); @@ -274,14 +272,13 @@ int main(int argc, char *argv[]) for(i=0; i 2.0; + if (hand_voicing) { + fscanf(fvoicing,"%d\n",&model.voiced); } - phase_synth_zero_order(&model, ak_phase, voiced, ex_phase); + phase_synth_zero_order(&model, ak_phase, ex_phase); if (postfilt) - postfilter(&model, voiced, &bg_est); + postfilter(&model, &bg_est); } /* optional LPC model amplitudes */ @@ -363,8 +360,8 @@ int main(int argc, char *argv[]) if (dump) dump_off(); - if (hand_snr) - fclose(fsnr); + if (hand_voicing) + fclose(fvoicing); return 0; } diff --git a/codec2/src/codec2.c b/codec2/src/codec2.c index c30737dd..e288c90d 100644 --- a/codec2/src/codec2.c +++ b/codec2/src/codec2.c @@ -161,16 +161,18 @@ void codec2_encode(void *codec2_state, char bits[], short speech[]) /* Estimate pitch */ nlp(c2->Sn,N,M,P_MIN,P_MAX,&pitch,Sw,&c2->prev_Wo); - prev_Wo = TWO_PI/pitch; + c2->prev_Wo = TWO_PI/pitch; model.Wo = TWO_PI/pitch; + model.L = PI/model.Wo; /* estimate model parameters */ dft_speech(Sw, c2->Sn, c2->w); two_stage_pitch_refinement(&model, Sw); estimate_amplitudes(&model, Sw, c2->W); - est_voicing_mbe(&model, Sw, c2->W, (FS/TWO_PI)*model.Wo, Sw_, &voiced1); - + est_voicing_mbe(&model, Sw, c2->W, (FS/TWO_PI)*model.Wo, Sw_); + voiced1 = model.voiced; + /* Second Frame - send all parameters --------------------------------*/ /* Read input speech */ @@ 
-184,20 +186,22 @@ void codec2_encode(void *codec2_state, char bits[], short speech[]) /* Estimate pitch */ nlp(c2->Sn,N,M,P_MIN,P_MAX,&pitch,Sw,&c2->prev_Wo); - prev_Wo = TWO_PI/pitch; + c2->prev_Wo = TWO_PI/pitch; model.Wo = TWO_PI/pitch; + model.L = PI/model.Wo; /* estimate model parameters */ dft_speech(Sw, c2->Sn, c2->w); two_stage_pitch_refinement(&model, Sw); estimate_amplitudes(&model, Sw, c2->W); - est_voicing_mbe(&model, Sw, c2->W, (FS/TWO_PI)*model.Wo, Sw_, &voiced2); + est_voicing_mbe(&model, Sw, c2->W, (FS/TWO_PI)*model.Wo, Sw_); + voiced2 = model.voiced; /* quantise */ nbits = 0; - encode_pitch(bits, &nbits, model.Wo); + encode_Wo(bits, &nbits, model.Wo); encode_voicing(bits, &nbits, voiced1, voiced2); encode_amplitudes(bits, &nbits, c2->Sn, c2->w); assert(nbits == CODEC2_BITS_PER_FRAME); @@ -213,48 +217,58 @@ void codec2_encode(void *codec2_state, char bits[], short speech[]) \*---------------------------------------------------------------------------*/ -void codec2_decode(float Sn_[], char bits[]) +void codec2_decode(void *codec2_state, short speech[], char bits[]) { CODEC2 *c2; MODEL model; float ak[LPC_ORD+1]; int voiced1, voiced2; - int i, nbits, transition; - MODEL model_fwd, model_back, model_interp; + int i, nbits; + MODEL model_interp; assert(codec2_state != NULL); c2 = (CODEC2*)codec2_state; nbits = 0; - model.Wo = decode_pitch(bits, &nbits); + model.Wo = decode_Wo(bits, &nbits); + model.L = PI/model.Wo; decode_voicing(&voiced1, &voiced2, bits, &nbits); decode_amplitudes(&model, ak, bits, &nbits); assert(nbits == CODEC2_BITS_PER_FRAME); /* First synthesis frame - interpolated from adjacent frames */ - interp(c2->prev_model, &model, &model_interp, &model_fwd, &model_back, &transition); - phase_synth_zero_order(&model, ak, voiced1, &c2->ex_phase); - - postfilter(&model, voiced, &c2->bg_est); - if (transition) { - synthesise(Sn_,&model_a,Pn,1); - synthesise(Sn_,&model_b,Pn,0); - } - else { - synthesise(Sn_,&model,Pn,1); - } - - /* Save output 
speech to disk */
-
-    for(i=0; i<N; i++) {
-	if (Sn_[i] > 32767.0)
-	    buf[i] = 32767;
-	else if (Sn_[i] < -32767.0)
-	    buf[i] = -32767;
-	else
-	    buf[i] = Sn_[i];
-    }
-
+    model_interp.voiced = voiced1;
+    interpolate(&model_interp, &c2->prev_model, &model);
+    phase_synth_zero_order(&model_interp, ak, &c2->ex_phase);
+    postfilter(&model_interp, &c2->bg_est);
+    synthesise(c2->Sn_, &model_interp, c2->Pn, 1);
+
+    for(i=0; i<N; i++) {
+	if (c2->Sn_[i] > 32767.0)
+	    speech[i] = 32767;
+	else if (c2->Sn_[i] < -32767.0)
+	    speech[i] = -32767;
+	else
+	    speech[i] = c2->Sn_[i];
+    }
+
+    /* Second synthesis frame */
+
+    model.voiced = voiced2;
+    phase_synth_zero_order(&model, ak, &c2->ex_phase);
+    postfilter(&model, &c2->bg_est);
+    synthesise(c2->Sn_, &model, c2->Pn, 1);
+
+    for(i=0; i<N; i++) {
+	if (c2->Sn_[i] > 32767.0)
+	    speech[i+N] = 32767;
+	else if (c2->Sn_[i] < -32767.0)
+	    speech[i+N] = -32767;
+	else
+	    speech[i+N] = c2->Sn_[i];
+    }
+
+    memcpy(&c2->prev_model, &model, sizeof(MODEL));
 }
diff --git a/codec2/src/codec2.h b/codec2/src/codec2.h
index bca92f52..8c3d4043 100644
--- a/codec2/src/codec2.h
+++ b/codec2/src/codec2.h
@@ -35,7 +35,7 @@
 void *codec2_create();
 void codec2_destroy(void *codec2_state);
 
-void codec2_encode(void *codec2_state, char bits[], short speech[]);
-void codec2_decode(float Sn_[], char bits[]);
+void codec2_encode(void *codec2_state, char bits[], short speech_in[]);
+void codec2_decode(void *codec2_state, short speech_out[], char bits[]);
 
 #endif
diff --git a/codec2/src/defines.h b/codec2/src/defines.h
index 393bcd0d..def9e307 100644
--- a/codec2/src/defines.h
+++ b/codec2/src/defines.h
@@ -76,9 +76,9 @@ typedef struct {
 typedef struct {
   float Wo;	      /* fundamental frequency estimate in radians  */
   int   L;	      /* number of harmonics                        */
-  float v[MAX_AMP];   /* voicing measures (unused at present)       */
   float A[MAX_AMP];   /* amplitude of each harmonic                 */
   float phi[MAX_AMP]; /* phase of each harmonic                     */
+  int   voiced;	      /* non-zero if this frame is voiced           */
 } MODEL;
 
 #endif
diff --git a/codec2/src/dump.c b/codec2/src/dump.c
index
386443c1..aba48f25 100644
--- a/codec2/src/dump.c
+++ b/codec2/src/dump.c
@@ -172,10 +172,7 @@ void dump_model(MODEL *model) {
 	fprintf(fmodel,"%f\t",model->A[l]);
     for(l=model->L+1; l<=MAX_AMP; l++)
 	fprintf(fmodel,"0.0\t");
-    for(l=1; l<=model->L; l++)
-	fprintf(fmodel,"%f\t",model->v[l]);
-    for(l=model->L+1; l<=MAX_AMP; l++)
-	fprintf(fmodel,"0.0\t");
+    fprintf(fmodel,"%d\t",model->voiced);
     fprintf(fmodel,"\n");
 }
diff --git a/codec2/src/interp.c b/codec2/src/interp.c
index 21112b57..2bdf38c2 100644
--- a/codec2/src/interp.c
+++ b/codec2/src/interp.c
@@ -26,81 +26,97 @@
   Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
 */
 
+#include <assert.h>
 #include <math.h>
 #include <string.h>
 #include "defines.h"
 #include "interp.h"
 
+float sample_log_amp(MODEL *model, float w);
+
 /*---------------------------------------------------------------------------*\
 
-  interp()
+  FUNCTION....: interpolate()
+  AUTHOR......: David Rowe
+  DATE CREATED: 22/8/10
 
-  Given two frames decribed by model parameters 20ms apart, determines the
-  model parameters of the 10ms frame between them.
-
+  Given two frames described by model parameters 20ms apart, determines
+  the model parameters of the 10ms frame between them.  Assumes
+  voicing is available for the middle (interpolated) frame.  Outputs
+  are amplitudes and Wo for the interpolated frame.
+
+  This version can interpolate the amplitudes between two frames of
+  different Wo and L.
+ \*---------------------------------------------------------------------------*/ -void interp( +void interpolate( + MODEL *interp, /* interpolated model params */ MODEL *prev, /* previous frames model params */ - MODEL *next, /* next frames model params */ - MODEL *synth, /* interp model params for cont frame */ - MODEL *a, /* prev frame extended into this frame */ - MODEL *b, /* next frame extended into this frame */ - int *transition /* non-zero if this is a transition frame, this - information is used for synthesis */ + MODEL *next /* next frames model params */ ) { - int m; - - if (fabs(next->Wo - prev->Wo) < 0.1*next->Wo) { - - /* If the Wo of adjacent frames is within 10% we synthesise a - continuous track through this frame by linear interpolation - of the amplitudes and Wo. This is typical of a strongly - voiced frame. - */ - - *transition = 0; - - synth->Wo = (next->Wo + prev->Wo)/2.0; - if (next->L > prev->L) - synth->L = prev->L; - else - synth->L = next->L; - for(m=1; m<=synth->L; m++) { - synth->A[m] = (prev->A[m] + next->A[m])/2.0; - } + int l; + float w,log_amp; + + /* Wo depends on voicing of this and adjacent frames */ + + if (interp->voiced) { + if (prev->voiced && next->voiced) + interp->Wo = (prev->Wo + next->Wo)/2.0; + if (!prev->voiced && next->voiced) + interp->Wo = next->Wo; + if (prev->voiced && !next->voiced) + interp->Wo = prev->Wo; + } + else { + interp->Wo = TWO_PI/P_MAX; + } + interp->L = PI/interp->Wo; + + /* Interpolate amplitudes using linear interpolation in log domain */ + + for(l=1; l<=interp->L; l++) { + w = l*interp->Wo; + log_amp = (sample_log_amp(prev, w) + sample_log_amp(next, w))/2.0; + interp->A[l] = pow(10.0, log_amp); + } +} + +/*---------------------------------------------------------------------------*\ + + FUNCTION....: sample_log_amp() + AUTHOR......: David Rowe + DATE CREATED: 22/8/10 + + Samples the amplitude envelope at an arbitrary frequency w. 
Uses + linear interpolation in the log domain to sample between harmonic + amplitudes. + +\*---------------------------------------------------------------------------*/ + +float sample_log_amp(MODEL *model, float w) +{ + int m; + float f, log_amp; + + assert(w > 0.0); assert (w <= PI); + + m = floor(w/model->Wo); + f = w - m*model->Wo; + assert(f <= 1.0); + + if (m < 1) { + log_amp = f*log10(model->A[1]); + } + else if ((m+1) > model->L) { + log_amp = (1.0-f)*log10(model->A[model->L]); } else { - /* - transition frame, adjacent frames have different Wo and L - so set up two sets of model parameters based on prev and - next. We then synthesise both of them and add them - together in the time domain. - - The transition case is typical of unvoiced speech or - background noise or a voiced/unvoiced transition. - */ - - *transition = 1; - - /* a is prev extended forward into this frame, b is next - extended backward into this frame. Note the adjustments to - phase to time-shift the model forward or backward N - samples. 
*/ - - memcpy(a, prev, sizeof(MODEL)); - memcpy(b, next, sizeof(MODEL)); - for(m=1; m<=a->L; m++) { - a->A[m] /= 2.0; - a->phi[m] += a->Wo*m*N; - } - for(m=1; m<=b->L; m++) { - b->A[m] /= 2.0; - b->phi[m] -= b->Wo*m*N; - } + log_amp = (1.0-f)*log10(model->A[m]) + f*log10(model->A[m+1]); } + return log_amp; } diff --git a/codec2/src/interp.h b/codec2/src/interp.h index 5cbfe909..2349859d 100644 --- a/codec2/src/interp.h +++ b/codec2/src/interp.h @@ -29,7 +29,6 @@ #ifndef __INTERP__ #define __INTERP__ -void interp(MODEL *prev, MODEL *next, MODEL *synth, MODEL *a, MODEL *b, - int *transition); +void interpolate(MODEL *interp, MODEL *prev, MODEL *next); #endif diff --git a/codec2/src/phase.c b/codec2/src/phase.c index 2d9e77cf..bcaa065c 100644 --- a/codec2/src/phase.c +++ b/codec2/src/phase.c @@ -190,7 +190,6 @@ void aks_to_H( void phase_synth_zero_order( MODEL *model, float aks[], - int voiced, float *ex_phase /* excitation phase of fundamental */ ) { @@ -203,7 +202,7 @@ void phase_synth_zero_order( float jitter; G = 1.0; - aks_to_H(model,aks,G,H,PHASE_LPC); + aks_to_H(model,aks,G,H,LPC_ORD); /* Update excitation fundamental phase track, this sets the position @@ -221,7 +220,7 @@ void phase_synth_zero_order( /* generate excitation */ - if (voiced) { + if (model->voiced) { /* This method of adding jitter really helped remove the clicky sound in low pitched makes like hts1a. This moves the onset of each harmonic over at +/- 0.25 of a sample. 
diff --git a/codec2/src/phase.h b/codec2/src/phase.h index 9ce2d057..dc2a9041 100644 --- a/codec2/src/phase.h +++ b/codec2/src/phase.h @@ -29,6 +29,6 @@ #ifndef __PHASE__ #define __PHASE__ -void phase_synth_zero_order(MODEL *model, float aks[], int voiced, float *ex_phase); +void phase_synth_zero_order(MODEL *model, float aks[], float *ex_phase); #endif diff --git a/codec2/src/postfilter.c b/codec2/src/postfilter.c index a2c36057..389f6e9d 100644 --- a/codec2/src/postfilter.c +++ b/codec2/src/postfilter.c @@ -93,7 +93,6 @@ void postfilter( MODEL *model, - int voiced, float *bg_est ) { @@ -112,7 +111,7 @@ void postfilter( of the threshold is to prevent updating during high level speech. */ - if ((e < BG_THRESH) && !voiced) + if ((e < BG_THRESH) && !model->voiced) *bg_est = *bg_est*(1.0 - BG_BETA) + e*BG_BETA; /* now mess with phases during voiced frames to make any harmonics @@ -120,7 +119,7 @@ void postfilter( */ uv = 0; - if (voiced) + if (model->voiced) for(m=1; m<=model->L; m++) if (20.0*log10(model->A[m]) < *bg_est) { model->phi[m] = TWO_PI*(float)rand()/RAND_MAX; diff --git a/codec2/src/postfilter.h b/codec2/src/postfilter.h index 1d5c4e81..11037174 100644 --- a/codec2/src/postfilter.h +++ b/codec2/src/postfilter.h @@ -29,6 +29,6 @@ #ifndef __POSTFILTER__ #define __POSTFILTER__ -void postfilter(MODEL *model, int voiced, float *bg_est); +void postfilter(MODEL *model, float *bg_est); #endif diff --git a/codec2/src/quantise.c b/codec2/src/quantise.c index a6fa4f74..3e87344c 100644 --- a/codec2/src/quantise.c +++ b/codec2/src/quantise.c @@ -1,7 +1,7 @@ /*---------------------------------------------------------------------------*\ FILE........: quantise.c - AUTHOR......: David Rowe + AUTHOR......: David Rowe DATE CREATED: 31/5/92 Quantisation functions for the sinusoidal coder. 
@@ -508,3 +508,81 @@ void aks_to_M2(
     }
 }
 
+/*---------------------------------------------------------------------------*\
+
+  FUNCTION....: encode_Wo()
+  AUTHOR......: David Rowe
+  DATE CREATED: 22/8/2010
+
+  Encodes Wo using a WO_BITS code and stuffs into bits[] array.
+
+\*---------------------------------------------------------------------------*/
+
+void encode_Wo(char *bits, int *nbits, float Wo)
+{
+    int   code, bit;
+    float Wo_min = TWO_PI/P_MAX;
+    float Wo_max = TWO_PI/P_MIN;
+    float norm;
+    int   i;
+
+    norm = (Wo - Wo_min)/(Wo_max - Wo_min);
+    code = floor(WO_LEVELS * norm + 0.5);
+    if (code < 0 ) code = 0;
+    if (code > (WO_LEVELS-1)) code = WO_LEVELS-1;
+
+    for(i=0; i<WO_BITS; i++) {
+        bit = (code >> (WO_BITS-i-1)) & 0x1;
+        bits[*nbits+i] = bit;
+    }
+
+    *nbits += WO_BITS;
+}
+
+/*---------------------------------------------------------------------------*\
+
+  FUNCTION....: decode_Wo()
+  AUTHOR......: David Rowe
+  DATE CREATED: 22/8/2010
+
+  Decodes Wo using a WO_BITS quantiser.
+
+\*---------------------------------------------------------------------------*/
+
+float decode_Wo(char *bits, int *nbits)
+{
+    int   code;
+    float Wo_min = TWO_PI/P_MAX;
+    float Wo_max = TWO_PI/P_MIN;
+    float step;
+    float Wo;
+    int   i;
+
+    code = 0;
+    for(i=0; i<WO_BITS; i++) {
+        code <<= 1;
+        code |= bits[*nbits+i];
+    }
+
+    step = (Wo_max - Wo_min)/WO_LEVELS;
+    Wo   = Wo_min + step*code;
+
+    *nbits += WO_BITS;
+
+    return Wo;
+}
diff --git a/codec2/src/sine.c b/codec2/src/sine.c
--- a/codec2/src/sine.c
+++ b/codec2/src/sine.c
     if (snr > V_THRESH)
-        *voiced = 1;
+        model->voiced = 1;
     else
-        *voiced = 0;
+        model->voiced = 0;
 
     return snr;
 }
diff --git a/codec2/src/sine.h b/codec2/src/sine.h
index 18541616..09a31b30 100644
--- a/codec2/src/sine.h
+++ b/codec2/src/sine.h
@@ -33,7 +33,7 @@ void make_analysis_window(float w[], COMP W[]);
 void dft_speech(COMP Sw[], float Sn[], float w[]);
 void two_stage_pitch_refinement(MODEL *model, COMP Sw[]);
 void estimate_amplitudes(MODEL *model, COMP Sw[], COMP W[]);
-float est_voicing_mbe(MODEL *model, COMP Sw[], COMP W[], float f0, COMP Sw_[], int *voiced);
+float est_voicing_mbe(MODEL *model, COMP Sw[], COMP W[], float f0, COMP Sw_[]);
 void make_synthesis_window(float Pn[]);
 void synthesise(float Sn_[], MODEL *model, float Pn[], int shift);
 
diff --git a/codec2/unittest/Makefile b/codec2/unittest/Makefile
index 9364eb00..35143c20 100644
--- a/codec2/unittest/Makefile
+++ b/codec2/unittest/Makefile
@@ -1,6 +1,6 @@
-CFLAGS = -I. -I../src -I../speex -Wall -g -DFLOATING_POINT -DVAR_ARRAYS
+CFLAGS = -I. -I../src -Wall -g -DFLOATING_POINT -DVAR_ARRAYS
 
-all: genres genlsp extract vqtrain tnlp
+all: genres genlsp extract vqtrain tnlp tinterp tquant
 
 genres: genres.o ../src/lpc.o
 	gcc $(CFLAGS) -o genres genres.o ../src/lpc.o -lm
@@ -12,6 +12,11 @@ TNLP_OBJ = tnlp.o ../src/sine.o ../src/nlp.o ../src/four1.o ../src/dump.o
 TCONTPHASE_OBJ = tcontphase.o ../src/globals.o ../src/dump.o ../src/synth.o \
 	../src/four1.c ../src/initdec.o ../src/phase.o
 
+TINTERP_OBJ = tinterp.o ../src/interp.o
+
+TQUANT_OBJ = tquant.o ../src/quantise.o ../src/lpc.o ../src/lsp.o \
+	../src/dump.o ../src/four1.o
+
 lsptest: $(LSP_TEST_OBJ)
 	gcc $(CFLAGS) -o lsptest $(LSP_TEST_OBJ) -lm
@@ -33,6 +38,12 @@ tcontphase: $(TCONTPHASE_OBJ)
 tmodel: tmodel.o
 	gcc $(CFLAGS) -o tmodel tmodel.o -lm
 
+tinterp: $(TINTERP_OBJ)
+	gcc $(CFLAGS) -o tinterp $(TINTERP_OBJ) -lm
+
+tquant: $(TQUANT_OBJ)
+	gcc $(CFLAGS) -o tquant $(TQUANT_OBJ) -lm
+
 %.o : %.c
 	$(CC) -c $(CFLAGS) $< -o $@
 
diff --git a/codec2/unittest/tinterp.c b/codec2/unittest/tinterp.c
new file mode 100644
index 00000000..f0cf8539
--- /dev/null
+++ b/codec2/unittest/tinterp.c
@@ -0,0 +1,67 @@
+/*---------------------------------------------------------------------------*\
+
+  FILE........: tinterp.c
+  AUTHOR......: David Rowe
+  DATE CREATED: 22/8/10
+
+  Tests interpolation functions.
+
+\*---------------------------------------------------------------------------*/
+
+/*
+  Copyright (C) 2010 David Rowe
+
+  All rights reserved.
+
+  This program is free software; you can redistribute it and/or modify
+  it under the terms of the GNU General Public License version 2, as
+  published by the Free Software Foundation.  This program is
+  distributed in the hope that it will be useful, but WITHOUT ANY
+  WARRANTY; without even the implied warranty of MERCHANTABILITY or
+  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+  License for more details.
+
+  You should have received a copy of the GNU General Public License
+  along with this program; if not, write to the Free Software
+  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+*/
+
+#include <assert.h>
+#include <math.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "defines.h"
+#include "dump.h"
+#include "interp.h"
+
+int main() {
+    MODEL  prev, next, interp;
+    int    l;
+    FILE  *f;
+
+    f = fopen("interp.txt","wt");
+
+    prev.Wo = PI/20;
+    prev.L = PI/prev.Wo;
+    prev.voiced = 1;
+    for(l=1; l<=prev.L; l++)
+        prev.A[l] = 1000.0;
+
+    next.Wo = PI/30;
+    next.L = PI/next.Wo;
+    next.voiced = 1;
+    for(l=1; l<=next.L; l++)
+        next.A[l] = 2000.0;
+
+    interp.voiced = 1;
+    interpolate(&interp, &prev, &next);
+    printf("Wo = %f\n", interp.Wo);
+
+    for(l=1; l<=interp.L; l++)
+        fprintf(f, "%f\n", interp.A[l]);
+
+    fclose(f);
+    return 0;
+}
diff --git a/codec2/unittest/tquant.c b/codec2/unittest/tquant.c
new file mode 100644
index 00000000..04753776
--- /dev/null
+++ b/codec2/unittest/tquant.c
@@ -0,0 +1,108 @@
+/*---------------------------------------------------------------------------*\
+
+  FILE........: tquant.c
+  AUTHOR......: David Rowe
+  DATE CREATED: 22/8/10
+
+  Generates quantisation curves for plotting on Octave.
+
+\*---------------------------------------------------------------------------*/
+
+/*
+  Copyright (C) 2010 David Rowe
+
+  All rights reserved.
+
+  This program is free software; you can redistribute it and/or modify
+  it under the terms of the GNU General Public License version 2, as
+  published by the Free Software Foundation.  This program is
+  distributed in the hope that it will be useful, but WITHOUT ANY
+  WARRANTY; without even the implied warranty of MERCHANTABILITY or
+  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+  License for more details.
+
+  You should have received a copy of the GNU General Public License
+  along with this program; if not, write to the Free Software
+  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+*/
+
+#include <assert.h>
+#include <math.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "defines.h"
+#include "dump.h"
+#include "quantise.h"
+
+int main() {
+    int    i,c,bit;
+    FILE  *f;
+    float  Wo,Wo_dec, error, step_size;
+    char   bits[WO_BITS];
+    int    code, nbits, code_in, code_out;
+
+    /* output pitch quant curve for plotting */
+
+    f = fopen("quant_pitch.txt","wt");
+
+    for(Wo=0.9*(TWO_PI/P_MAX); Wo<=1.1*(TWO_PI/P_MIN); Wo += 0.001) {
+        nbits = 0;
+        encode_Wo(bits, &nbits, Wo);
+        code = 0;
+        for(i=0; i<WO_BITS; i++) {
+            code <<= 1;
+            code |= bits[i];
+        }
+        fprintf(f, "%f %d\n", Wo, code);
+    }
+
+    fclose(f);
+
+    /* check that decoding then re-encoding each Wo code returns the
+       same code */
+
+    for(c=0; c<WO_LEVELS; c++) {
+        code_in = c;
+        for(i=0; i<WO_BITS; i++) {
+            bit = (code_in >> (WO_BITS-1-i)) & 0x1;
+            bits[i] = bit;
+        }
+        nbits = 0;
+        Wo = decode_Wo(bits, &nbits);
+        nbits = 0;
+
+        memset(bits, 0, sizeof(bits));
+        encode_Wo(bits, &nbits, Wo);
+        code_out = 0;
+        for(i=0; i<WO_BITS; i++) {
+            code_out <<= 1;
+            code_out |= bits[i];
+        }
+        assert(code_in == code_out);
+    }
+
+    /* check quantisation error is bounded by half a step */
+
+    f = fopen("quant_pitch_err.txt","wt");
+    step_size = ((TWO_PI/P_MIN) - (TWO_PI/P_MAX))/WO_LEVELS;
+
+    for(Wo=TWO_PI/P_MAX; Wo<TWO_PI/P_MIN; Wo += 0.001) {
+        nbits = 0;
+        encode_Wo(bits, &nbits, Wo);
+        nbits = 0;
+        Wo_dec = decode_Wo(bits, &nbits);
+        error = fabs(Wo - Wo_dec);
+        if (error > (step_size/2.0)) {
+            printf("error: %f  step_size/2: %f\n", error, step_size/2.0);
+            exit(0);
+        }
+        fprintf(f,"%f\n",error);
+    }
+    printf("OK\n");
+
+    fclose(f);
+    return 0;
+}