-Codec2 - Open Source Low Bit Rate Speech Codec
-===============================================
-
-David Rowe, VK5DGR
-
-[[introduction]]
-Introduction
-------------
-
-Codec2 is an open source low bit rate speech codec designed for
-communications quality speech at around 2400 bit/s. Applications
-include low bandwidth HF/VHF digital radio. It fills a gap in open
-source, free-as-in-speech voice codecs beneath 5000 bit/s.
-
-The motivations behind the project are summarised in this
-link:/blog/?p=128[blog post].
-
-You can help and support Codec2 development via a <<help,PayPal donation>>.
-
-[[status]]
-Status
-------
-
-Still in experimental/development stage - no 2400 bit/s codec
-available yet, but we are getting very close!
-
-Progress to date:
-
-1. Unquantised encoder (sinenc) and decoder (sinedec) running under
- Linux/gcc. The decoder (sinedec) is a test-bed for various
- modelling and quantisation options - these are controlled via
- command line switches.
-
-2. LPC modelling is working nicely and there is a first pass LSP
- vector quantiser working at 32 bits/frame with acceptable voice
- quality. Lots could be done to improve this (e.g. improved quality
- and reduced bit rate).
-
-3. A phase model has been developed that uses 0 bits for phase and 1
- bit/frame for voiced/unvoiced decision but delivers high quality
- speech. This works suspiciously well - codecs with a single bit
- voiced/unvoiced decision aren't meant to sound this good. Usually
- mixed voicing at several bits/frame is required.
-
-4. An experimental post filter has been developed to improve
- performance with speech mixed with background noise. The post
- filter delivers many of the advantages of mixed voicing but unlike
- mixed voicing requires zero bits.
-
-5. Non-Linear Pitch (NLP) pitch estimator working OK, and a simple
- pitch tracker has been developed to help with some problem frames.
-
-6. An algorithm for decimating sinusoidal model parameters from the
- native 10ms rate to a 20ms rate has been developed. High quality
-   speech is maintained; some small differences are only audible
- through headphones. A 20ms frame rate is required for 2400 bit/s
- coding.
-
-Current work areas:
-
-1. Reduce CPU Load of the first order phase model fit. Apart from
- that the codec runs very quickly.
-
-2. Write separate encoder and decoder functions and demo programs for
- the alpha 2400 bit/s codec. Currently most of the processing
- happens inside the sinedec program.
-
-[[source]]
-The Source Code
----------------
-
-Browse:
-
-http://freetel.svn.sourceforge.net/viewvc/freetel/codec2/[http://freetel.svn.sourceforge.net/viewvc/freetel/codec2/]
-
-Check Out:
-
- $ svn co https://freetel.svn.sourceforge.net/svnroot/freetel/codec2 codec2
-
-[[support]]
-The Mailing List
-----------------
-
-For any questions, comments, support, or suggestions for applications by
-end users and developers, please post to the https://lists.sourceforge.net/lists/listinfo/freetel-codec2[Codec2 Mailing List].
-
-[[quickstart]]
-Quick Start
------------
-
-To encode the file raw/hts1a.raw to a set of sinusoidal model
-parameters (hts1.mdl), then decode to a raw file (hts1a_uq.raw):
-
- $ cd codec2/src
- $ make
- $ ./sinenc ../raw/hts1a.raw hts1.mdl 300 ../pitch/hts1a.p
- $ ./sinedec ../raw/hts1a.raw hts1.mdl -o hts1a_uq.raw
- $ play -f s -r 8000 -s w ../raw/hts1a.raw
- $ play -f s -r 8000 -s w hts1a_uq.raw
-
-[[plan]]
-Development Roadmap
--------------------
-
- [X] Milestone 0 - Project kick off
- [X] Milestone 1 - Alpha 2400 bits/s codec
- [X] Spectral amplitudes modelled using LPC
- [X] Phase and voicing model developed
- [X] Pitch estimator
- [X] Spectral amplitudes quantised using LSPs
-    [X] Decimation of model parameters from 10ms to 20ms
-    [ ] Refactor to develop separate encoder/decoder functions
- [ ] Complete 2400 bits/s codec demonstrated
- [ ] Reduced complexity voicing estimator
- [ ] Test phone call over LAN
- [ ] Release 0.1 for Alpha Testing
- [ ] Milestone 2 - Beta codec for digital radio
- [ ] Gather samples from the community with different speakers,
- input filtering, and background noise conditions that break
-          the codec.
- [ ] Further develop algorithm based on samples above
- [ ] Design FEC scheme
- [ ] Test over digital radio links
- [ ] Milestone 3 - Fixed point port
- [ ] Milestone 4 - codec2-on-a-chip embedded DSP/CPU port
-
-[[howitworks]]
-How it Works
-------------
-
-Speech is modelled as a sum of sinusoids:
-
-    for(m=1; m<=L; m++)
-      s[n] += A[m]*cos(Wo*m*n + phi[m]);
-
-The sinusoids are multiples of the fundamental frequency Wo
-(omega-naught), so the technique is known as "harmonic sinusoidal
-coding". For each frame, we analyse the speech signal and extract a set of
-parameters:
-
- Wo, {A}, {phi}
-
-where Wo is the fundamental frequency (also known as the pitch), { A
-} is a set of L amplitudes and { phi } is a set of L phases. L is
-chosen to be equal to the number of harmonics that can fit in a 4 kHz
-bandwidth:
-
- L = floor(pi/Wo)
-
-Wo is specified in radians normalised to 4 kHz, such that pi radians
-= 4 kHz. The fundamental frequency in Hz is:
-
- F0 = (8000/(2*pi))*Wo
-
-We then need to encode (quantise) Wo, { A }, { phi } and transmit them to
-a decoder which reconstructs the speech. A frame might be 10-20ms in
-length so we update the parameters every 10-20ms (100 to 50 Hz
-update rate).
-
-The speech quality of the basic harmonic sinusoidal model is pretty
-good, close to transparent. It is also relatively robust to Wo
-estimation errors. Unvoiced speech (e.g. consonants) is well
-modelled by a bunch of harmonics with random phases. Speech corrupted
-with background noise also sounds OK; the background noise doesn't
-introduce any grossly unpleasant artifacts.
-
-As the parameters are quantised to a low bit rate and sent over the
-channel, the speech quality drops. The challenge is to achieve a
-reasonable trade off between speech quality and bit rate.
-
-Bit Allocation
+Codec 2 README
--------------
-[grid="all"]
-`-------------------------------.----------
-Parameter bits/frame
--------------------------------------------
-Spectral magnitudes (LSPs) 32
-Low frequency LPC correction 1
-Energy 5
-Voicing 1
-Fundamental Frequency (Wo) 7
-Spare 2
--------------------------------------------
-Total 48
--------------------------------------------
+Codec 2 is an open source 2400 bit/s speech codec. For more
+information please see:
-At a 20ms update rate 48 bits/frame is 2400 bits/s.
+  http://rowetel.com/codec2.html
-[[challenges]]
-Challenges
+Quickstart
----------
-The tough bits of this project are:
-
-1. Parameter estimation, in particular pitch (Wo) detection.
-
-2. Reduction of a time-varying number of parameters (L changes with Wo
- each frame) to a fixed number of parameters required for a fixed
- bit rate. The trick here is that { A } tend to vary slowly with
- frequency, so we can "fit" a curve to the set of { A } and send
- parameters that describe that curve.
-
-3. Discarding the phases { phi }. In most low bit rate codecs phases
- are discarded, and synthesised at the decoder using a rule-based
- approach. This also implies the need for a "voicing" model as
- voiced speech (vowels) tends to have a different phase structure to
-  unvoiced (consonants). The voicing model needs to be accurate (not
- introduce distortion), and relatively low bit rate.
-
-4. Quantisation of the amplitudes { A } to a small number of bits
- while maintaining speech quality. For example 30 bits/frame at a
- 20ms frame rate is 30/0.02 = 1500 bits/s, a large part of our 2400
- bit/s "budget".
-
-5. Performance with different speakers and background noise
- conditions. This is where you come in - as codec2 develops please
-  send me samples of its performance with various speakers and
- background noise conditions and together we will improve the
- algorithm. This approach proved very powerful when developing
- link:oslec.html[Oslec]. One of the cool things about open source!
-
-[[help]]
-Can I help?
------------
-
-Maybe; check out the <<plan, Development Roadmap>> above and the latest
-version of the
-http://freetel.svn.sourceforge.net/viewvc/freetel/codec2/TODO.txt?view=log[TODO]
-list and see if there is anything that interests you.
-
-Not all of this project is DSP. There are many general C coding tasks
-like refactoring and writing a command line soft phone application for
-testing the codec over a LAN.
-
-I will happily accept *sponsorship* for this project. For example
-research grants, or development contracts from companies interested in
-seeing an open source low bit rate speech codec.
+$ cd codec2/src
+$ make
+$ ./c2enc ../raw/hts1a.raw hts1a_c2.bit
+$ ./c2dec hts1a_c2.bit hts1a_c2.raw
+$ ../scripts/menu.sh ../raw/hts1a.raw hts1a_c2.raw
-You can also donate to the codec2 project via PayPal (which also
-allows credit card donations):
+Programs
+--------
-+++++++++++++++++++++++++++
-<form name="_xclick" action="https://www.paypal.com/cgi-bin/webscr" method="post">
-<input type="hidden" name="cmd" value="_xclick">
-<input type="hidden" name="business" value="david@rowetel.com">
-<input type="hidden" name="item_name" value="Codec2 donation">
-<input type="hidden" name="currency_code" value="USD">
-Donation in US$: <input name="amount" value="10.00">
-<input type="image" src="http://www.paypal.com/en_US/i/btn/btn_donate_LG.gif" border="0" name="submit" alt="Make payments with PayPal - it's fast, free and secure!">
-</form>
-+++++++++++++++++++++++++++
+1/ c2enc encodes a file of speech samples to a file of encoded bits.
+One bit is stored in the LSB of each byte.
-[[patents]]
-Is it Patent Free?
-------------------
+2/ c2dec decodes a file of bits to a file of speech samples.
-I think so - much of the work is based on old papers from the 60s, 70s
-and 80s, and the PhD thesis work [2] used as a baseline for this codec
-was original. A nice little mini project would be to audit the
-patents used by proprietary 2400 bit/s codecs (MELP and xMBE) and
-compare.
+3/ c2sim is a simulation/development version of codec 2. It allows
+selective use of the various codec 2 algorithms, for example
+switching phase modelling or LSP quantisation on and off.
-Proprietary codecs typically have small, novel parts of the algorithm
-protected by patents. However proprietary codecs also rely heavily on
-large bodies of public domain work. The patents cover perhaps 5% of
-the codec algorithms. Proprietary codec designers did not invent most
-of the algorithms they use in their codec. Typically, the patents just
-cover enough to make designing an interoperable codec very difficult.
-These also tend to be the parts that make their codecs sound good.
-
-However there are many ways to make a codec sound good, so we simply
-need to choose and develop other methods.
-
-[[compat]]
-Is Codec2 compatible with xMBE or MELP?
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Nope - I don't think it's possible to build a compatible codec without
-infringing on patents or gaining access to commercial-in-confidence
-information.
-
-[[hacking]]
-Hacking
--------
-
-If you would like to work on the Codec2 code base here are some
-notes:
-
-* src/code.sh performs the several processing steps required to
-  output speech files at various stages of processing, for
-  example:
-
- $ ./code.sh hts1a
-+
-will produce hts1a_uq (unquantised, i.e. baseline sinusoidal model),
-hts1a_phase0 (zero phase model), hts1a_lpc10 (10th order LPC model).
-
-* You can then listen to all of these samples (and the original)
- using:
-
- $ ./listen.sh hts1a
-
-* Specific notes about LPC and Phase modelling are below.
-
-* There are some useful scripts in the scripts directory, for example
- wav2raw.sh, raw2wav.sh, playraw.sh, menu.sh. Note that code.sh and
-  listen.sh are in the src directory as that's where they get used most
- of the time.
-
-[[lpc]]
-LPC Modelling Notes
--------------------
-
-Linear Predictive Coding (LPC) modelling is used to model the
-sine wave amplitudes { A }. The use of LPC in speech coding is
-common, although the application of LPC modelling to frequency domain
-coding is fairly novel. LPCs are mainly used in time domain codecs
-like LPC-10 and CELP.
-
-LPC modelling has a couple of advantages:
-
-* From time domain coding we know a lot about LPC, for example how to
- quantise them efficiently using Line Spectrum Pairs (LSPs).
-
-* The number of amplitudes varies each frame as Wo and hence L vary.
- This makes the { A } tricky to quantise and transmit. However it is
- possible to convey the same information using a fixed number of
- LPCs which makes the quantisation problem easier.
-
-To test LPC modelling:
-
-    $ ./sinedec ../raw/hts1a.raw hts1a.mdl --lpc 10 -o hts1a_lpc10.raw
-
-The blog post [4] discusses why LPC modelling works so well when the
-{ A } are recovered via the RMS method (Section 5.1 of the thesis).
-The equal area model of LPC spectra versus harmonics seems to work
-remarkably well, especially compared to sampling, with SNRs of up to
-30dB on female frames.
-
-There is a problem with modelling the low order (e.g. m=1,
-i.e. fundamental) harmonics for males. The amplitude of the m=1
-harmonic is raised by as much as 30dB after LPC modelling as (I think)
-LPC spectra must have zero derivative at DC. This means it's poor at
-modelling very low freq harmonics which unfortunately the ear is very
-sensitive to. To correct this an extra bit has been added to correct
-LPC modelling errors on the first harmonic. When set this bit
-instructs the decoder to attenuate the LPC modelled harmonic by 30dB.
-
-[[phase]]
-Phase Modelling Notes
----------------------
-
-"Zero order" and "First order" phase models have been developed to
-synthesise phases of each harmonic at the decoder side. These are
-described in source code of
-http://freetel.svn.sourceforge.net/viewvc/freetel/codec2/src/phase.c?view=log[phase.c].
-
-The zero phase model requires just one voicing bit to be transmitted
-to the decoder; all other phase information is synthesised using a
-rule-based model. It seems to work OK for most speech samples, but
-adds a "clicky" artifact to some low pitched speakers. Also see the blog
-posts below for more discussion of phase models.
-
-To determine voicing we attempt to fit a first order phase model, then
-measure SNR of the fit. The frame is declared unvoiced if the SNR is
-beneath a threshold.
-
-Current phase modelling makes no attempt to match harmonic phases at
-frame edges. This area would be worth experimenting with, as it could
-cause roughness. Then again it might be responsible for effective
-mixed voicing modelling.
-
-Unvoiced speech can be represented well by random phases and a Wo
-estimate that jumps around randomly. If Wo is small the number of
-harmonics is large which makes the noise less periodic and more noise
-like to the ear. With Wo jumping around phase tracks are
-discontinuous between frames which also makes the synthesised signal
-more noise like and prevents time domain pulses forming that the ear
-is sensitive to.
-
-Running Phase Models
-~~~~~~~~~~~~~~~~~~~~
-
-In src/phase.c two phase models have been implemented, the algorithms
-are explained in the source code.
-
-Zero Phase model (just one voicing bit/frame):
-
-    $ ./sinedec ../raw/hts1a.raw hts1a.mdl --phase 0 -o hts1a_phase0.raw
-
-First order phase model (when quantised, around 13 bits/frame for
-pulse position, phase, and a voicing bit):
-
-    $ ./sinedec ../raw/hts1a.raw hts1a.mdl --phase 1 -o hts1a_phase1.raw
-
-[[octave]]
-Octave Scripts
---------------
-
-* pl.m - plot a segment from a raw file
-
-* pl2.m - plot the same segments from two different files to compare
-
-* plamp.m - menu based GUI interface to "dump" files, move back and forward
- through file examining time and frequency domain parameters, lpc
- model etc
-
- $ ./sinedec ../raw/hts1a.raw hts1a.mdl --lpc 10 --dump hts1a
- $ cd ../octave
- $ octave
- octave:1> plamp("../src/hts1a",25)
-
-* plphase.m - similar to plamp.m but for analysing phase models
-
- $ ./sinedec ../raw/hts1a.raw hts1a.mdl --phase [0|1] --dump hts1a_phase
- $ cd ../octave
- $ octave
- octave:1> plphase("../src/hts1a_phase",25)
-
-* plpitch.m - plot two pitch contours (.p files) and compare
-
-* plnlp.m - plots a bunch of NLP pitch estimator states. link:/images/codec2/tnlp_screenshot.png[Screenshot]
-
-[[directories]]
Directories
-----------
- script - shell scripts to simply common operations
- speex - LSP quantisation code borrowed from Speex for testing
+ script - shell scripts for playing and converting raw files
src - C source code
- octave - Matlab/Octave scripts
+ octave - Octave scripts used for visualising internal signals
+ during development
pitch - pitch estimator output files
raw - speech files in raw format (16 bits signed linear 8 KHz)
- unittest - Unit test source code
+ unittest - unit test source code
wav - speech files in wave file format
-[[other]]
-Other Uses
-----------
-
-The DSP algorithms contained in codec2 may be useful for other DSP
-applications, for example:
-
-* The
- http://freetel.svn.sourceforge.net/viewvc/freetel/codec2/src/nlp.c?view=log[nlp.c]
- pitch estimator is a working, tested, pitch estimator for human
- speech. NLP is an open source pitch estimator presented in C code
- complete with a GUI debugging tool (plnlp.m
- link:/images/codec2/tnlp_screenshot.png[screenshot]). It can be run
- stand-alone using the
- http://freetel.svn.sourceforge.net/viewvc/freetel/codec2/unittest/tnlp.c?view=log[tnlp.c]
- unit test program. It could be applied to other speech coding
- research. Pitch estimation is a popular subject in academia,
- however most pitch estimators are described in papers, with the fine
- implementation details left out.
-
-* The basic analysis/synthesis framework could be used for high
- quality speech synthesis.
-
-[[refs]]
-References
-----------
-
-[1] http://perens.com/[Bruce Perens] introducing the
- http://codec2.org/[codec2 project concept]
-
-[2] David's PhD Thesis,
- http://www.itr.unisa.edu.au/~steven/thesis/dgr.pdf["Techniques for
- Harmonic Sinusoidal Coding"], used for baseline algorithm
-
-[3] http://www.rowetel.com/blog/?p=128[Open Source Low rate Speech
- Codec Part 1 - Introduction]
-
-[4] http://www.rowetel.com/blog/?p=130[Open Source Low rate Speech
- Codec Part 2 - Spectral Magnitudes]
-
-[5] http://www.rowetel.com/blog/?p=131[Open Source Low rate Speech
- Codec Part 3 - Phase and Male Speech]
-
-[6] http://www.rowetel.com/blog/?p=132[Open Source Low rate Speech
- Codec Part 4 - Zero Phase Model]
-
int phase0;
float ex_phase[MAX_AMP+1];
- int voiced, voiced_1, voiced_2;
int postfilt;
float bg_est;
- int hand_snr;
- FILE *fsnr;
+ int hand_voicing;
+ FILE *fvoicing;
MODEL model_1, model_2, model_3, model_synth, model_a, model_b;
int transition, decimate;
prev_Wo = TWO_PI/P_MAX;
- voiced_1 = voiced_2 = 0;
model_1.Wo = TWO_PI/P_MIN;
model_1.L = floor(PI/model_1.Wo);
for(i=1; i<=model_1.L; i++) {
printf(" [--lsp]\n");
printf(" [--phase0]\n");
printf(" [--postfilter]\n");
- printf(" [--hand_snr]\n");
+ printf(" [--hand_voicing]\n");
printf(" [--dec]\n");
printf(" [--dump DumpFilePrefix]\n");
exit(0);
ex_phase[0] = 0;
}
- hand_snr = switch_present("--hand_snr",argc,argv);
- if (hand_snr) {
- fsnr = fopen(argv[hand_snr+1],"rt");
- assert(fsnr != NULL);
+ hand_voicing = switch_present("--hand_voicing",argc,argv);
+ if (hand_voicing) {
+ fvoicing = fopen(argv[hand_voicing+1],"rt");
+ assert(fvoicing != NULL);
}
bg_est = 0.0;
if (phase0) {
float Wn[M]; /* windowed speech samples */
- float ak_phase[PHASE_LPC+1]; /* autocorrelation coeffs */
- float Rk[PHASE_LPC+1]; /* autocorrelation coeffs */
+ float ak_phase[LPC_ORD+1]; /* autocorrelation coeffs */
+ float Rk[LPC_ORD+1]; /* autocorrelation coeffs */
COMP Sw_[FFT_ENC];
dump_phase(&model.phi[0], model.L);
/* determine voicing */
- snr = est_voicing_mbe(&model, Sw, W, (FS/TWO_PI)*model.Wo, Sw_, &voiced);
+ snr = est_voicing_mbe(&model, Sw, W, (FS/TWO_PI)*model.Wo, Sw_);
dump_Sw_(Sw_);
dump_snr(snr);
for(i=0; i<MAX_AMP; i++)
model.phi[i] = 0;
- if (hand_snr) {
- fscanf(fsnr,"%f\n",&snr);
- voiced = snr > 2.0;
+ if (hand_voicing) {
+ fscanf(fvoicing,"%d\n",&model.voiced);
}
- phase_synth_zero_order(&model, ak_phase, voiced, ex_phase);
+ phase_synth_zero_order(&model, ak_phase, ex_phase);
if (postfilt)
- postfilter(&model, voiced, &bg_est);
+ postfilter(&model, &bg_est);
}
/* optional LPC model amplitudes */
if (dump)
dump_off();
- if (hand_snr)
- fclose(fsnr);
+ if (hand_voicing)
+ fclose(fvoicing);
return 0;
}
/* Estimate pitch */
nlp(c2->Sn,N,M,P_MIN,P_MAX,&pitch,Sw,&c2->prev_Wo);
- prev_Wo = TWO_PI/pitch;
+ c2->prev_Wo = TWO_PI/pitch;
model.Wo = TWO_PI/pitch;
+ model.L = PI/model.Wo;
/* estimate model parameters */
dft_speech(Sw, c2->Sn, c2->w);
two_stage_pitch_refinement(&model, Sw);
estimate_amplitudes(&model, Sw, c2->W);
- est_voicing_mbe(&model, Sw, c2->W, (FS/TWO_PI)*model.Wo, Sw_, &voiced1);
-
+ est_voicing_mbe(&model, Sw, c2->W, (FS/TWO_PI)*model.Wo, Sw_);
+ voiced1 = model.voiced;
+
/* Second Frame - send all parameters --------------------------------*/
/* Read input speech */
/* Estimate pitch */
nlp(c2->Sn,N,M,P_MIN,P_MAX,&pitch,Sw,&c2->prev_Wo);
- prev_Wo = TWO_PI/pitch;
+ c2->prev_Wo = TWO_PI/pitch;
model.Wo = TWO_PI/pitch;
+ model.L = PI/model.Wo;
/* estimate model parameters */
dft_speech(Sw, c2->Sn, c2->w);
two_stage_pitch_refinement(&model, Sw);
estimate_amplitudes(&model, Sw, c2->W);
- est_voicing_mbe(&model, Sw, c2->W, (FS/TWO_PI)*model.Wo, Sw_, &voiced2);
+ est_voicing_mbe(&model, Sw, c2->W, (FS/TWO_PI)*model.Wo, Sw_);
+ voiced2 = model.voiced;
/* quantise */
nbits = 0;
- encode_pitch(bits, &nbits, model.Wo);
+ encode_Wo(bits, &nbits, model.Wo);
encode_voicing(bits, &nbits, voiced1, voiced2);
encode_amplitudes(bits, &nbits, c2->Sn, c2->w);
assert(nbits == CODEC2_BITS_PER_FRAME);
\*---------------------------------------------------------------------------*/
-void codec2_decode(float Sn_[], char bits[])
+void codec2_decode(void *codec2_state, short speech[], char bits[])
{
CODEC2 *c2;
MODEL model;
float ak[LPC_ORD+1];
int voiced1, voiced2;
- int i, nbits, transition;
- MODEL model_fwd, model_back, model_interp;
+ int i, nbits;
+ MODEL model_interp;
assert(codec2_state != NULL);
c2 = (CODEC2*)codec2_state;
nbits = 0;
- model.Wo = decode_pitch(bits, &nbits);
+ model.Wo = decode_Wo(bits, &nbits);
+ model.L = PI/model.Wo;
decode_voicing(&voiced1, &voiced2, bits, &nbits);
decode_amplitudes(&model, ak, bits, &nbits);
assert(nbits == CODEC2_BITS_PER_FRAME);
/* First synthesis frame - interpolated from adjacent frames */
- interp(c2->prev_model, &model, &model_interp, &model_fwd, &model_back, &transition);
- phase_synth_zero_order(&model, ak, voiced1, &c2->ex_phase);
-
- postfilter(&model, voiced, &c2->bg_est);
- if (transition) {
- synthesise(Sn_,&model_a,Pn,1);
- synthesise(Sn_,&model_b,Pn,0);
- }
- else {
- synthesise(Sn_,&model,Pn,1);
- }
-
- /* Save output speech to disk */
-
- for(i=0; i<N; i++) {
- if (Sn_[i] > 32767.0)
- buf[i] = 32767;
- else if (Sn_[i] < -32767.0)
- buf[i] = -32767;
- else
- buf[i] = Sn_[i];
- }
-
+ model_interp.voiced = voiced1;
+    interpolate(&model_interp, &c2->prev_model, &model);
+    phase_synth_zero_order(&model_interp, ak, &c2->ex_phase);
+    postfilter(&model_interp, &c2->bg_est);
+ synthesise(c2->Sn_, &model_interp, c2->Pn, 1);
+
+    for(i=0; i<N; i++) {
+        if (c2->Sn_[i] > 32767.0)
+            speech[i] = 32767;
+        else if (c2->Sn_[i] < -32767.0)
+            speech[i] = -32767;
+        else
+            speech[i] = c2->Sn_[i];
+    }
+
+ /* Second synthesis frame */
+
+ model.voiced = voiced2;
+    phase_synth_zero_order(&model, ak, &c2->ex_phase);
+    postfilter(&model, &c2->bg_est);
+ synthesise(c2->Sn_, &model, c2->Pn, 1);
+
+    for(i=0; i<N; i++) {
+        if (c2->Sn_[i] > 32767.0)
+            speech[i+N] = 32767;
+        else if (c2->Sn_[i] < -32767.0)
+            speech[i+N] = -32767;
+        else
+            speech[i+N] = c2->Sn_[i];
+    }
+
+    memcpy(&c2->prev_model, &model, sizeof(MODEL));
}
void *codec2_create();
void codec2_destroy(void *codec2_state);
-void codec2_encode(void *codec2_state, char bits[], short speech[]);
-void codec2_decode(float Sn_[], char bits[]);
+void codec2_encode(void *codec2_state, char bits[], short speech_in[]);
+void codec2_decode(void *codec2_state, short speech_out[], char bits[]);
#endif
typedef struct {
float Wo; /* fundamental frequency estimate in radians */
int L; /* number of harmonics */
- float v[MAX_AMP]; /* voicing measures (unused at present) */
  float A[MAX_AMP];	/* amplitude of each harmonic */
float phi[MAX_AMP]; /* phase of each harmonic */
+ int voiced; /* non-zero if this frame is voiced */
} MODEL;
#endif
fprintf(fmodel,"%f\t",model->A[l]);
for(l=model->L+1; l<MAX_AMP; l++)
fprintf(fmodel,"0.0\t");
- for(l=1; l<=model->L; l++)
- fprintf(fmodel,"%f\t",model->v[l]);
- for(l=model->L+1; l<MAX_AMP; l++)
- fprintf(fmodel,"0.0\t");
+ fprintf(fmodel,"%d\t",model->voiced);
fprintf(fmodel,"\n");
}
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
*/
+#include <assert.h>
#include <math.h>
#include <string.h>
#include "defines.h"
#include "interp.h"
+float sample_log_amp(MODEL *model, float w);
+
/*---------------------------------------------------------------------------*\
- interp()
+ FUNCTION....: interp()
+ AUTHOR......: David Rowe
+ DATE CREATED: 22/8/10
- Given two frames decribed by model parameters 20ms apart, determines the
- model parameters of the 10ms frame between them.
-
+  Given two frames described by model parameters 20ms apart, determines
+ the model parameters of the 10ms frame between them. Assumes
+ voicing is available for middle (interpolated) frame. Outputs are
+ amplitudes and Wo for the interpolated frame.
+
+ This version can interpolate the amplitudes between two frames of
+ different Wo and L.
+
\*---------------------------------------------------------------------------*/
-void interp(
+void interpolate(
+ MODEL *interp, /* interpolated model params */
MODEL *prev, /* previous frames model params */
- MODEL *next, /* next frames model params */
- MODEL *synth, /* interp model params for cont frame */
- MODEL *a, /* prev frame extended into this frame */
- MODEL *b, /* next frame extended into this frame */
- int *transition /* non-zero if this is a transition frame, this
- information is used for synthesis */
+ MODEL *next /* next frames model params */
)
{
- int m;
-
- if (fabs(next->Wo - prev->Wo) < 0.1*next->Wo) {
-
- /* If the Wo of adjacent frames is within 10% we synthesise a
- continuous track through this frame by linear interpolation
- of the amplitudes and Wo. This is typical of a strongly
- voiced frame.
- */
-
- *transition = 0;
-
- synth->Wo = (next->Wo + prev->Wo)/2.0;
- if (next->L > prev->L)
- synth->L = prev->L;
- else
- synth->L = next->L;
- for(m=1; m<=synth->L; m++) {
- synth->A[m] = (prev->A[m] + next->A[m])/2.0;
- }
+ int l;
+ float w,log_amp;
+
+ /* Wo depends on voicing of this and adjacent frames */
+
+ if (interp->voiced) {
+ if (prev->voiced && next->voiced)
+ interp->Wo = (prev->Wo + next->Wo)/2.0;
+ if (!prev->voiced && next->voiced)
+ interp->Wo = next->Wo;
+ if (prev->voiced && !next->voiced)
+ interp->Wo = prev->Wo;
+ }
+ else {
+ interp->Wo = TWO_PI/P_MAX;
+ }
+ interp->L = PI/interp->Wo;
+
+ /* Interpolate amplitudes using linear interpolation in log domain */
+
+ for(l=1; l<=interp->L; l++) {
+ w = l*interp->Wo;
+ log_amp = (sample_log_amp(prev, w) + sample_log_amp(next, w))/2.0;
+ interp->A[l] = pow(10.0, log_amp);
+ }
+}
+
+/*---------------------------------------------------------------------------*\
+
+ FUNCTION....: sample_log_amp()
+ AUTHOR......: David Rowe
+ DATE CREATED: 22/8/10
+
+ Samples the amplitude envelope at an arbitrary frequency w. Uses
+ linear interpolation in the log domain to sample between harmonic
+ amplitudes.
+
+\*---------------------------------------------------------------------------*/
+
+float sample_log_amp(MODEL *model, float w)
+{
+ int m;
+ float f, log_amp;
+
+ assert(w > 0.0); assert (w <= PI);
+
+ m = floor(w/model->Wo);
+    f = (w - m*model->Wo)/model->Wo;
+    assert(f <= 1.0);
+
+ if (m < 1) {
+ log_amp = f*log10(model->A[1]);
+ }
+ else if ((m+1) > model->L) {
+ log_amp = (1.0-f)*log10(model->A[model->L]);
}
else {
- /*
- transition frame, adjacent frames have different Wo and L
- so set up two sets of model parameters based on prev and
- next. We then synthesise both of them and add them
- together in the time domain.
-
- The transition case is typical of unvoiced speech or
- background noise or a voiced/unvoiced transition.
- */
-
- *transition = 1;
-
- /* a is prev extended forward into this frame, b is next
- extended backward into this frame. Note the adjustments to
- phase to time-shift the model forward or backward N
- samples. */
-
- memcpy(a, prev, sizeof(MODEL));
- memcpy(b, next, sizeof(MODEL));
- for(m=1; m<=a->L; m++) {
- a->A[m] /= 2.0;
- a->phi[m] += a->Wo*m*N;
- }
- for(m=1; m<=b->L; m++) {
- b->A[m] /= 2.0;
- b->phi[m] -= b->Wo*m*N;
- }
+ log_amp = (1.0-f)*log10(model->A[m]) + f*log10(model->A[m+1]);
}
+ return log_amp;
}
#ifndef __INTERP__
#define __INTERP__
-void interp(MODEL *prev, MODEL *next, MODEL *synth, MODEL *a, MODEL *b,
- int *transition);
+void interpolate(MODEL *interp, MODEL *prev, MODEL *next);
#endif
void phase_synth_zero_order(
MODEL *model,
float aks[],
- int voiced,
float *ex_phase /* excitation phase of fundamental */
)
{
float jitter;
G = 1.0;
- aks_to_H(model,aks,G,H,PHASE_LPC);
+ aks_to_H(model,aks,G,H,LPC_ORD);
/*
Update excitation fundamental phase track, this sets the position
/* generate excitation */
- if (voiced) {
+ if (model->voiced) {
/* This method of adding jitter really helped remove the clicky
       sound in low pitched males like hts1a. This moves the onset
of each harmonic over at +/- 0.25 of a sample.
#ifndef __PHASE__
#define __PHASE__
-void phase_synth_zero_order(MODEL *model, float aks[], int voiced, float *ex_phase);
+void phase_synth_zero_order(MODEL *model, float aks[], float *ex_phase);
#endif
void postfilter(
MODEL *model,
- int voiced,
float *bg_est
)
{
of the threshold is to prevent updating during high level
speech. */
- if ((e < BG_THRESH) && !voiced)
+ if ((e < BG_THRESH) && !model->voiced)
*bg_est = *bg_est*(1.0 - BG_BETA) + e*BG_BETA;
/* now mess with phases during voiced frames to make any harmonics
*/
uv = 0;
- if (voiced)
+ if (model->voiced)
for(m=1; m<=model->L; m++)
if (20.0*log10(model->A[m]) < *bg_est) {
model->phi[m] = TWO_PI*(float)rand()/RAND_MAX;
#ifndef __POSTFILTER__
#define __POSTFILTER__
-void postfilter(MODEL *model, int voiced, float *bg_est);
+void postfilter(MODEL *model, float *bg_est);
#endif
/*---------------------------------------------------------------------------*\
FILE........: quantise.c
- AUTHOR......: David Rowe
+ AUTHOR......: David Rowe
DATE CREATED: 31/5/92
Quantisation functions for the sinusoidal coder.
}
}
+
+/*---------------------------------------------------------------------------*\
+
+ FUNCTION....: encode_Wo()
+ AUTHOR......: David Rowe
+ DATE CREATED: 22/8/2010
+
+  Encodes Wo using a WO_BITS bit code and stuffs it into the bits[] array.
+
+\*---------------------------------------------------------------------------*/
+
+void encode_Wo(char *bits, int *nbits, float Wo)
+{
+ int code, bit;
+ float Wo_min = TWO_PI/P_MAX;
+ float Wo_max = TWO_PI/P_MIN;
+ float norm;
+ int i;
+
+ norm = (Wo - Wo_min)/(Wo_max - Wo_min);
+ code = floor(WO_LEVELS * norm + 0.5);
+ if (code < 0 ) code = 0;
+ if (code > (WO_LEVELS-1)) code = WO_LEVELS-1;
+
+ for(i=0; i<WO_BITS; i++) {
+ bit = (code >> (WO_BITS-i-1)) & 0x1;
+ bits[*nbits+i] = bit;
+ }
+
+ *nbits += WO_BITS;
+}
+
+/*---------------------------------------------------------------------------*\
+
+ FUNCTION....: decode_Wo()
+ AUTHOR......: David Rowe
+ DATE CREATED: 22/8/2010
+
+ Decodes Wo using a WO_BITS quantiser.
+
+\*---------------------------------------------------------------------------*/
+
+float decode_Wo(char *bits, int *nbits)
+{
+ int code;
+ float Wo_min = TWO_PI/P_MAX;
+ float Wo_max = TWO_PI/P_MIN;
+ float step;
+ float Wo;
+ int i;
+
+ code = 0;
+ for(i=0; i<WO_BITS; i++) {
+ code <<= 1;
+ code |= bits[i];
+ }
+
+ step = (Wo_max - Wo_min)/WO_LEVELS;
+ Wo = Wo_min + step*(code);
+
+ *nbits += WO_BITS;
+
+ return Wo;
+}
+
+void encode_voicing(char bits[], int *nbits, int voiced1, int voiced2)
+{
+ bits[*nbits] = voiced1;
+ bits[*nbits+1] = voiced2;
+ *nbits += 2;
+}
+
+void decode_voicing(int *voiced1, int *voiced2, char bits[], int *nbits)
+{
+ *voiced1 = bits[0];
+ *voiced2 = bits[1];
+ *nbits += 2;
+}
#ifndef __QUANTISE__
#define __QUANTISE__
+#define WO_BITS 7
+#define WO_LEVELS (1<<WO_BITS)
+
void quantise_init();
-float lpc_model_amplitudes(float Sn[], float w[], MODEL *model, int order,int lsp,float ak[]);
+float lpc_model_amplitudes(float Sn[], float w[], MODEL *model, int order,
+ int lsp,float ak[]);
void aks_to_M2(float ak[], int order, MODEL *model, float E, float *snr);
float get_gmin(void);
+void encode_Wo(char bits[], int *nbits, float Wo);
+float decode_Wo(char bits[], int *nbits);
+
+void encode_voicing(char bits[], int *nbits, int voiced1, int voiced2);
+void decode_voicing(int *voiced1, int *voiced2, char bits[], int *nbits);
+
#endif
/*---------------------------------------------------------------------------*\
FILE........: sine.c
- AUTHOR......: David Rowe
- DATE CREATED: 119/8/2010
+ AUTHOR......: David Rowe
+ DATE CREATED: 19/8/2010
Sinusoidal analysis and synthesis functions.
\*---------------------------------------------------------------------------*/
-void hs_pitch_refinement(MODEL *model, COMP Sw[], float pmin, float pmax, float pstep);
+void hs_pitch_refinement(MODEL *model, COMP Sw[], float pmin, float pmax,
+ float pstep);
/*---------------------------------------------------------------------------*\
COMP Sw[],
COMP W[],
float f0,
- COMP Sw_[], /* DFT of all voiced synthesised signal for f0 */
+ COMP Sw_[] /* DFT of all voiced synthesised signal for f0 */
/* useful for debugging/dump file */
- int *voiced
)
{
int i,l,al,bl,m; /* loop variables */
snr = 10.0*log10(sig/error);
if (snr > V_THRESH)
- *voiced = 1;
+ model->voiced = 1;
else
- *voiced = 0;
+ model->voiced = 0;
return snr;
}
void dft_speech(COMP Sw[], float Sn[], float w[]);
void two_stage_pitch_refinement(MODEL *model, COMP Sw[]);
void estimate_amplitudes(MODEL *model, COMP Sw[], COMP W[]);
-float est_voicing_mbe(MODEL *model, COMP Sw[], COMP W[], float f0, COMP Sw_[], int *voiced);
+float est_voicing_mbe(MODEL *model, COMP Sw[], COMP W[], float f0, COMP Sw_[]);
void make_synthesis_window(float Pn[]);
void synthesise(float Sn_[], MODEL *model, float Pn[], int shift);
-CFLAGS = -I. -I../src -I../speex -Wall -g -DFLOATING_POINT -DVAR_ARRAYS
+CFLAGS = -I. -I../src -Wall -g -DFLOATING_POINT -DVAR_ARRAYS
-all: genres genlsp extract vqtrain tnlp
+all: genres genlsp extract vqtrain tnlp tinterp tquant
genres: genres.o ../src/lpc.o
gcc $(CFLAGS) -o genres genres.o ../src/lpc.o -lm
TCONTPHASE_OBJ = tcontphase.o ../src/globals.o ../src/dump.o ../src/synth.o \
../src/four1.c ../src/initdec.o ../src/phase.o
+TINTERP_OBJ = tinterp.o ../src/interp.o
+
+TQUANT_OBJ = tquant.o ../src/quantise.o ../src/lpc.o ../src/lsp.o \
+ ../src/dump.o ../src/four1.o
+
lsptest: $(LSP_TEST_OBJ)
gcc $(CFLAGS) -o lsptest $(LSP_TEST_OBJ) -lm
tmodel: tmodel.o
gcc $(CFLAGS) -o tmodel tmodel.o -lm
+tinterp: $(TINTERP_OBJ)
+ gcc $(CFLAGS) -o tinterp $(TINTERP_OBJ) -lm
+
+tquant: $(TQUANT_OBJ)
+ gcc $(CFLAGS) -o tquant $(TQUANT_OBJ) -lm
+
%.o : %.c
$(CC) -c $(CFLAGS) $< -o $@
--- /dev/null
+/*---------------------------------------------------------------------------*\
+
+ FILE........: tinterp.c
+ AUTHOR......: David Rowe
+ DATE CREATED: 22/8/10
+
+ Tests interpolation functions.
+
+\*---------------------------------------------------------------------------*/
+
+/*
+ Copyright (C) 2010 David Rowe
+
+ All rights reserved.
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License version 2, as
+ published by the Free Software Foundation. This program is
+ distributed in the hope that it will be useful, but WITHOUT ANY
+ WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
+ License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+*/
+
+#include <assert.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <math.h>
+
+#include "defines.h"
+#include "dump.h"
+#include "interp.h"
+
+int main() {
+ MODEL prev, next, interp;
+ int l;
+ FILE *f;
+
+ f = fopen("interp.txt","wt");
+
+ prev.Wo = PI/20;
+ prev.L = PI/prev.Wo;
+ prev.voiced = 1;
+ for(l=1;l<=prev.L; l++)
+ prev.A[l] = 1000.0;
+
+ next.Wo = PI/30;
+ next.L = PI/next.Wo;
+ next.voiced = 1;
+ for(l=1;l<=next.L; l++)
+ next.A[l] = 2000.0;
+
+ interp.voiced = 1;
+ interpolate(&interp, &prev, &next);
+ printf("Wo = %f\n", interp.Wo);
+
+ for(l=1; l<=interp.L; l++)
+ fprintf(f, "%f\n", interp.A[l]);
+
+ fclose(f);
+ return 0;
+}
--- /dev/null
+/*---------------------------------------------------------------------------*\
+
+ FILE........: tquant.c
+ AUTHOR......: David Rowe
+ DATE CREATED: 22/8/10
+
+ Generates quantisation curves for plotting on Octave.
+
+\*---------------------------------------------------------------------------*/
+
+/*
+ Copyright (C) 2010 David Rowe
+
+ All rights reserved.
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License version 2, as
+ published by the Free Software Foundation. This program is
+ distributed in the hope that it will be useful, but WITHOUT ANY
+ WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
+ License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+*/
+
+#include <assert.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <math.h>
+
+#include "defines.h"
+#include "dump.h"
+#include "quantise.h"
+
+int main() {
+ int i,c,bit;
+ FILE *f;
+ float Wo,Wo_dec, error, step_size;
+ char bits[WO_BITS];
+ int code, nbits, code_in, code_out;
+
+ /* output pitch quant curve for plotting */
+
+ f = fopen("quant_pitch.txt","wt");
+
+ for(Wo=0.9*(TWO_PI/P_MAX); Wo<=1.1*(TWO_PI/P_MIN); Wo += 0.001) {
+ nbits = 0;
+ encode_Wo(bits, &nbits, Wo);
+ code = 0;
+ for(i=0; i<WO_BITS; i++) {
+ code <<= 1;
+ code |= bits[i];
+ }
+ fprintf(f, "%f %d\n", Wo, code);
+ }
+
+ fclose(f);
+
+ /* check for all pitch codes we get 1:1 match between encoder
+ and decoder Wo levels */
+
+ for(c=0; c<WO_LEVELS; c++) {
+ code_in = c;
+ for(i=0; i<WO_BITS; i++) {
+ bit = (code_in >> (WO_BITS-1-i)) & 0x1;
+ bits[i] = bit;
+ }
+ nbits = 0;
+ Wo = decode_Wo(bits, &nbits);
+ nbits = 0;
+
+ memset(bits, 0, sizeof(bits));
+ encode_Wo(bits, &nbits, Wo);
+ code_out = 0;
+ for(i=0; i<WO_BITS; i++) {
+ code_out <<= 1;
+ code_out |= bits[i];
+ }
+ if (code_in != code_out)
+ printf(" Wo %f code_in %d code_out %d\n", Wo,
+ code_in, code_out);
+ }
+
+ /* measure quantisation error stats and compare to expected. Also
+ plot histogram of error file to check. */
+
+ f = fopen("quant_pitch_err.txt","wt");
+ step_size = ((TWO_PI/P_MIN) - (TWO_PI/P_MAX))/WO_LEVELS;
+
+ for(Wo=TWO_PI/P_MAX; Wo<0.99*TWO_PI/P_MIN; Wo += 0.0001) {
+ nbits = 0; encode_Wo(bits, &nbits, Wo);
+ nbits = 0; Wo_dec = decode_Wo(bits, &nbits);
+ error = Wo - Wo_dec;
+ if (error > (step_size/2.0)) {
+ printf("error: %f step_size/2: %f\n", error, step_size/2.0);
+ exit(1);
+ }
+ fprintf(f,"%f\n",error);
+ }
+ printf("OK\n");
+
+ fclose(f);
+ return 0;
+}