Introduction
------------
-Codec2 is a open source low bit rate speech codec designed for
-communications quality speech at around 2400 kbit/s. Applications
+Codec2 is an open source low bit rate speech codec designed for
+communications quality speech at around 2400 bit/s. Applications
include low bandwidth HF/VHF digital radio. It fills a gap in open
source, free-as-in-speech voice codecs beneath 5000 bit/s.
The motivations behind the project are summarised in this
link:/blog/?p=128[blog post].
+You can help and support Codec2 development via a <<help,PayPal donation>>.
+
[[status]]
Status
------
modelling and quantisation options - these are controlled via
command line switches.
-2. LPC modelling working nicely and there is a first pass LSP vector
- quantiser working at 32 bits/frame with acceptable voice quality.
- Lots could be done to improve this (e.g. improved quality and
- reduced bit rate).
+2. LPC modelling is working nicely and there is a first pass LSP
+ vector quantiser working at 32 bits/frame with acceptable voice
+ quality. Lots could be done to improve this (e.g. improved quality
+ and reduced bit rate).
+
+3. A phase model has been developed that uses 0 bits for phase and 1
+ bit/frame for voiced/unvoiced decision but delivers high quality
+ speech. This works suspiciously well - codecs with a single bit
+ voiced/unvoiced decision aren't meant to sound this good. Usually
+ mixed voicing at several bits/frame is required.
-3. Phase model developed that uses 0 bits for phase and 1 bit/frame
- for voiced/unvoiced decision. An experimental post filter has been
- developed to improve performance for speech with background noise.
+4. An experimental post filter has been developed to improve
+ performance with speech mixed with background noise. The post
+ filter delivers many of the advantages of mixed voicing but unlike
+ mixed voicing requires zero bits.
-4. Non-Linear Pitch (NLP) pitch estimator working OK, and a simple
+5. Non-Linear Pitch (NLP) pitch estimator working OK, and a simple
pitch tracker has been developed to help with some problem frames.
-5. An algorithm for decimating sinusoidal model parameters from the
- native 10ms rate to a 20ms rate has been developed. A 20ms frame
- rate is required for 2400 bit/s coding.
+6. An algorithm for decimating sinusoidal model parameters from the
+ native 10ms rate to a 20ms rate has been developed. High quality
+ speech is maintained; some small differences are only audible
+ through headphones. A 20ms frame rate is required for 2400 bit/s
+ coding.
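As a rough check of the figures in the list above, the per-frame bit budget follows directly from the bit rate and the frame period. This is a minimal sketch only: the function name is hypothetical, and how the leftover bits are shared between pitch, energy and other parameters is not stated in the text.

```python
def bits_per_frame(bit_rate, frame_ms):
    """Bits available in one frame at bit_rate (bit/s) and frame period frame_ms (ms)."""
    return bit_rate * frame_ms / 1000.0

LSP_BITS = 32      # first pass LSP vector quantiser (item 2)
VOICING_BITS = 1   # single voiced/unvoiced bit (item 3)
PHASE_BITS = 0     # the phase model uses no bits (item 3)

for frame_ms in (10, 20):
    budget = bits_per_frame(2400, frame_ms)
    remaining = budget - LSP_BITS - VOICING_BITS - PHASE_BITS
    print(frame_ms, "ms frame:", budget, "bits available,", remaining, "left over")
```

At the native 10ms rate the LSP bits alone exceed the budget, which is consistent with the statement that a 20ms frame rate is needed for 2400 bit/s.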
Current work areas:
Can I help?
-----------
-Maybe; check out the latest version of the
+Maybe; check out the <<plan, Development Roadmap>> above and the latest
+version of the
http://freetel.svn.sourceforge.net/viewvc/freetel/codec2/TODO.txt?view=log[TODO]
-list and the development roadmap above and see if there is anything
-that interests you.
-
-Not all of this project is DSP. If you can code in C there are lots
-of general processing tasks like refactoring and writing a command
-line soft phone application for testing the codec over a LAN.
-
-I will happily accept sponsorship for this project. For example
-research grants, or development contracts for companies interested in
-seeing an open source low bit rate speech codec. One interesting
-project would be funding a real time port to a single DSP/CPU chip.
+list and see if there is anything that interests you.
+
+Not all of this project is DSP. There are many general C coding tasks
+like refactoring and writing a command line soft phone application for
+testing the codec over a LAN.
+
+I will happily accept *sponsorship* for this project. For example
+research grants, or development contracts from companies interested in
+seeing an open source low bit rate speech codec.
+
+You can also donate to the codec2 project via PayPal (which also
+allows credit card donations):
+
++++++++++++++++++++++++++++
+<form name="_xclick" action="https://www.paypal.com/cgi-bin/webscr" method="post">
+<input type="hidden" name="cmd" value="_xclick">
+<input type="hidden" name="business" value="david@rowetel.com">
+<input type="hidden" name="item_name" value="Codec2 donation">
+<input type="hidden" name="currency_code" value="USD">
+Donation in US$: <input name="amount" value="10.00">
+<input type="image" src="http://www.paypal.com/en_US/i/btn/btn_donate_LG.gif" border="0" name="submit" alt="Make payments with PayPal - it's fast, free and secure!">
+</form>
++++++++++++++++++++++++++++
[[patents]]
Is it Patent Free?
TODO for codec2
---------------
+29 July 2010, current status:
+- g729a is a bit better, but c2 is not put to shame
+- c2 betters GSM in some cases
+- c2 has a "brighter" sound compared to CELP ref codecs, more HF
+- main artifact is "clicky" and poor UV sounds
+- artefacts in bg noise in mmt1
+- lacks hoarseness of CELP ref codecs
+- pretty good for 2400 bit/s!
+
+Next Steps
+----------
+
+Would be nice to nail phase issues a bit better:
+
+- sync phase model not referenced to one onset
+- let phase track by keeping the phase relationship to onset constant to avoid
+ reverb
+- try removing every second harmonic
+- lower level for UV
+- pitch in UV sections - set to max Wo to best approx noise
+- start with orig phases, then subs in model for UV sounds
+
+Results
+-------
+
+1/
+
+- if we just replace phase of UV frames, it sounds great
+- but this probably is biassed towards voicing, ie low probability of error
+ on V frames
+- but it shows that when correct decn is made, we get good results
+- so perhaps the trick is to correctly detect UV
+
+2/ Drop every second harmonic during synthesis
+- this actually sounds pretty good
+- if you ignore spaceman sounds the voicing sounds great
+- not clicky!
+- fewer harmonics, less chance of forming a click
+- maybe we should look at the time domain waveform
+- ears hairs can't rest in time? Something JVM said....
+
+3/ Can we look at phases of high amplitude harmonics?
+- maybe plot them as separate sine waves
+- see if high amplitude ones have similar phase
+- maybe in human speech production they are highly dispersed
+- c.f. with our phase model
+- female has more widely spaced harmonics
+- less than one per formant
+
+Blog post ideas
+---------------
+
++ Show how sum of sinusoids makes up speech. Show what it sounds like.
+
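A minimal sketch of the sum-of-sinusoids idea, for illustration only. It follows the same per-harmonic form as the Octave test scripts, `A(m)*cos(m*Wo*n + phi(m))`; the function name, the 100 Hz pitch, and the three amplitudes are made-up example values, and the 8 kHz sample rate is an assumption inferred from the `4000/pi` scaling used in the plots.

```python
import math

FS = 8000  # assumed sample rate in Hz

def synth_frame(Wo, A, phi, N):
    """Synthesise N samples as a sum of len(A) harmonics of fundamental Wo (rad/sample)."""
    samples = []
    for n in range(N):
        s_n = 0.0
        for m in range(1, len(A) + 1):
            # harmonic m has frequency m*Wo, amplitude A[m-1] and phase phi[m-1]
            s_n += A[m - 1] * math.cos(m * Wo * n + phi[m - 1])
        samples.append(s_n)
    return samples

Wo = 2 * math.pi * 100 / FS   # hypothetical 100 Hz pitch
A = [1.0, 0.5, 0.25]          # hypothetical harmonic amplitudes
phi = [0.0, 0.0, 0.0]         # zero phase for every harmonic
s = synth_frame(Wo, A, phi, 160)
print(len(s), s[0])
```

With all phases set to zero the harmonics line up at the start of every pitch period and form a pulse, which relates to the "clicky" artifact and pulse-forming notes elsewhere in this list.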
+
+After a day of listening and chasing my tail, conclusion is that 10ms frame,
+LSP quantised version is not too shabby. Thru speaker it's roughly comparable
+to g729a and Speex at 8kbit/s. Certainly not put to shame.
+
+So todo:
+ [ ] transparent 20ms frame conversion when fully quantised
+ [ ] faster voicing model
+ [ ] conversation test gadget, way to change codecs while we listen
+
+[ ] phase modelling
+ + is problem only due to very low pitch speakers (hts1, mmt1)?
+ + better chance to form a pulse
+ + could be not a problem at all for moderate pitch speakers
+ + ways to unform pulse for low pitch speakers
+ start with something that works:
+ + morig and hts1 and forig sound OK
+ + is it better to just release as is?
+ + still need some work on 20ms frames, but could do a 4kbit version soon
+ + need efficient voicing search
+
+[ ] Key Issues
+ [ ] LSP quantisation
+ [ ] decoupling of phase and LPC and voicing
+ + phase0 sounds OK, lpc10 sounds OK
+ + but together not so great
+
+[ ] LSP quantisation plan
+ + build an understanding of LSP quantisation
+ + using a series of experiments
+ + visualisation tools
+ + find out what matters perceptually
+ + then use this understanding to build quantisers
+ [ ] outlier thesis
+ + errors less than a certain amount don't matter
+ + quantiser overload errors have a big effect
+ + averaging SD over spectrum is wrong
+ + it's maximum deviation which counts
+ + one bad LSP can ruin an entire utterance
+ + maybe focus on just one LSP for a start
+ [ ] listen to some samples with LPC modelling
+ [ ] add RMS and overload distortion
+ [ ] view effect on Octave plots
+ [ ] come up with way of plotting cumulative errors over an utterance
+ [ ] come up with outlier objective model and distortion model
+ [ ] compare to perceptual response to averaged SD and outlier model
+ [ ] quantiser design
+ [ ] write code to visualise k-means 1-D training
+ [ ] write code to visualise quantiser errors
+ + SD plus scatter plots
+ [ ] try k++ algorithm
+ [ ] idea: linear differential quantiser
+ + single bad outlier can ruin a frame
+ + overload distortion is the biggest problem
+ + differentially quantise with fixed grid to ensure no
+ outliers
+
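One possible reading of the "linear differential quantiser" idea above, sketched under stated assumptions: each LSP is coded as a difference from the previous quantised LSP on a uniform grid, so the error stays within half a step unless the index is clamped. The function name, the 25 Hz step, the 64 levels, and the example LSP values are all hypothetical and not taken from the codec.

```python
def quantise_lsps_diff(lsps_hz, step_hz=25.0, levels=64):
    """Quantise each LSP (Hz, ascending) as a difference from the previous
    quantised LSP, using a uniform grid of step_hz with `levels` indexes."""
    indexes = []
    quantised = []
    prev = 0.0
    for lsp in lsps_hz:
        index = round((lsp - prev) / step_hz)
        index = max(0, min(levels - 1, index))  # clamp to the available levels
        prev = prev + index * step_hz           # track the decoder's reconstruction
        indexes.append(index)
        quantised.append(prev)
    return indexes, quantised

# made-up LSP frequencies for one frame, in Hz
example = [312.0, 487.0, 811.0, 1190.0, 1523.0,
           1987.0, 2304.0, 2710.0, 3096.0, 3489.0]
idx, q = quantise_lsps_diff(example)
worst = max(abs(a - b) for a, b in zip(example, q))
print(idx, q, worst)
```

Because each difference is taken against the previously quantised value, errors do not accumulate along the frame, and the worst-case error (rather than the average) is what the grid step controls.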
+[ ] decoupling of phase and lpc
+ + hook up a loud speaker as reference transducer
+ + another way to generate dispersive phase model?
+ [ ] try various combinations
+ + uq, lpc10, phase0
+ + why drop in quality? Some coupling of models, zero phase?
+ + what the hell is dispersion - need to understand further
+ + come up with alternative model
+ + good understanding of phase and subj quality
+ + artificial signals to test, e.g. synth phase + real amps OR
+ real phase plus LPC smoothed amps
+ + suggest LPC smoothing of amps or smooth phases causes clickiness
+
+[ ] 10/20ms frame rate
+ + why any quality drop with 20ms?
+ + what is the effect of different parameters having different update rates?
+ + are quantisation jumps between frames a big issue? How can
+ we test this? Maybe enforce smoothness
+
+ + measure frame-frame variations in amplitudes Am. Might give some feel for
+ allowable quant errors, perhaps some crispness is induced by slight variations.
+
+ + 10ms frames have very big overlap. No reason 20ms couldn't sound just as
+ good. Spectrum should change very little. Devise an experiment to tease
+ out key issues here. Maybe have extra bit to say if speech starts/ends in
+ one frame.
+
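The frame-to-frame amplitude measurement suggested above could be sketched as follows. This is only one way to define it: the function name is hypothetical, harmonics are compared by index over the harmonics both frames share (which ignores the fact that harmonic m moves in frequency when Wo changes), and all amplitudes are assumed to be greater than zero.

```python
import math

def frame_to_frame_variation_db(frames):
    """Mean absolute change in harmonic amplitude (dB) between consecutive frames.

    frames is a list of per-frame amplitude lists Am (linear, > 0)."""
    diffs = []
    for prev, cur in zip(frames, frames[1:]):
        shared = min(len(prev), len(cur))  # compare only harmonics present in both
        for m in range(shared):
            prev_db = 20 * math.log10(prev[m])
            cur_db = 20 * math.log10(cur[m])
            diffs.append(abs(cur_db - prev_db))
    return sum(diffs) / len(diffs) if diffs else 0.0

# made-up amplitudes for three frames
frames = [[1.0, 1.0], [10.0, 10.0], [10.0, 10.0, 5.0]]
variation = frame_to_frame_variation_db(frames)
print(variation)
```

Running the same measurement on 10ms and 20ms parameter sets would give a first objective number to set against the listening tests.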
[ ] Important Open Issues
[X] Why zero phase model doesn't work for mmt1
+ error in LPC window and accidental random phase component and
+ this gets tricky with split codebooks
+ VQ could also be trained with perceptual distortion modelled
+ perhaps work on weighting function
- + try differential in time as well, voiced speech doesn't chnage much
+ + try differential in time as well, voiced speech doesn't change much
+ LPC envelope (mag spectrum) could possibly be quantised using some
other form of VQ, use LPC just to get a constant sampling rate. This
is a bit blue sky
--- /dev/null
+% load_raw.m
+% David Rowe 7 Oct 2009
+
+function s = load_raw(fn)
+ fs=fopen(fn,"rb");
+ s = fread(fs,Inf,"short");
+ fclose(fs);
+ plot(s)
+endfunction
% LSPs
- figure(1);
+ figure(3);
clg;
[x,y] = hist(lsp(:,1),100);
- plot(y,x,";1;");
+ plot(y*4000/pi,x,";1;");
hold on;
for i=2:c
[x,y] = hist(lsp(:,i),100);
legend = sprintf(";%d;",i);
- plot(y,x,legend);
+ plot(y*4000/pi,x,legend);
endfor
hold off;
grid;
% LSP differences
- figure(2);
+ figure(4);
clg;
subplot(211)
[x,y] = hist(lsp(:,1),100);
phi(1) = phi(1) + Wo*N;
phi(1) = mod(phi(1),2*pi);
- for m=2:L
+ for m=1:L
phi(m) = m*phi(1);
end
x = zeros(1,N);
for m=1:L
- x = x + A*cos(m*Wo*(0:(N-1)) + phi(m) + phi_off(m));
+ x = x + A*cos(m*Wo*(0:(N-1)) + phi(m));
endfor
s((f-1)*N+1:f*N) = x;
endfor
f=45;
model = load("../src/hts1a_model.txt");
+ phase = load("../src/hts1a_phase_phase.txt");
Wo = model(f,1);
P=2*pi/Wo;
L = model(f,2);
A = model(f,3:(L+2));
-
+ phi = phase(f,1:L);
phi = zeros(1,L);
- for m=1:L
- phi(m) = -m*Wo*0.3*rand(1,1)*L;
+ for m=ceil(L/2):L
+ phi(m) = 2*pi*rand(1,1);
end
+
s = zeros(1,N);
for m=1:L
- s = s + A(m)*cos(m*Wo*(0:(N-1)) + phi(m));
+ s_m = A(m)*cos(m*Wo*(0:(N-1)) + phi(m));
+ s = s + s_m;
endfor
figure(1);
do
figure(1);
clg;
+% s = [ Sn(2*(f-2)-1,:) Sn(2*(f-2),:) ];
s = [ Sn(2*f-1,:) Sn(2*f,:) ];
plot(s);
axis([1 length(s) -20000 20000]);
plot((1:L)*Wo*4000/pi, 20*log10(Am),";Am;");
axis([1 4000 -10 80]);
hold on;
+% plot((0:255)*4000/256, Sw(f-2,:),";Sw;");
plot((0:255)*4000/256, Sw(f,:),";Sw;");
if (file_in_path(".",modelq_name))
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
files=0
-items="Q-Quit "
+items="Q-Quit\n"
while [ ! -z "$1" ]
do
case "$1" in
*) files=`expr 1 + $files`;
new_file=$1;
file[$files]=$new_file;
- items="${items} ${files}-${new_file}";;
+ items="${items} ${files}-${new_file}\n";;
esac
shift
done
readchar=1
+echo -n -e "\r" $items"- "
while [ $readchar -ne 0 ]
do
- echo -n -e "\r" $items "- "
+ echo -n -e "\r -"
stty cbreak # or stty raw
readchar=`dd if=/dev/tty bs=1 count=1 2>/dev/null`
stty -cbreak