From: drowe67
Date: Wed, 11 Aug 2010 07:47:52 +0000 (+0000)
Subject: experiments in phase peerception, rough TODO notes that need editing
X-Git-Url: http://git.whiteaudio.com/gitweb/?a=commitdiff_plain;h=a08c0eb97edc28b165ab307422b6f0c3bb1cbd68;p=freetel-svn-tracking.git

experiments in phase peerception, rough TODO notes that need editing

git-svn-id: https://svn.code.sf.net/p/freetel/code@167 01035d8c-6547-0410-b346-abe4f91aad63
---

diff --git a/codec2/README.txt b/codec2/README.txt
index 50c3ae88..4b17b17c 100644
--- a/codec2/README.txt
+++ b/codec2/README.txt
@@ -7,14 +7,16 @@ David Rowe, VK5DGR
 Introduction
 ------------
 
-Codec2 is a open source low bit rate speech codec designed for
-communications quality speech at around 2400 kbit/s. Applications
+Codec2 is an open source low bit rate speech codec designed for
+communications quality speech at around 2400 bit/s. Applications
 include low bandwidth HF/VHF digital radio. It fills a gap in open
 source, free-as-in-speech voice codecs beneath 5000 bit/s.
 
 The motivations behind the project are summarised in this
 link:/blog/?p=128[blog post].
 
+You can help and support Codec2 development via a <>.
+
 [[status]]
 Status
 ------
@@ -29,21 +31,30 @@ Progress to date:
    modelling and quantisation options - these are controlled via
    command line switches.
 
-2. LPC modelling working nicely and there is a first pass LSP vector
-   quantiser working at 32 bits/frame with acceptable voice quality.
-   Lots could be done to improve this (e.g. improved quality and
-   reduced bit rate).
+2. LPC modelling is working nicely and there is a first pass LSP
+   vector quantiser working at 32 bits/frame with acceptable voice
+   quality. Lots could be done to improve this (e.g. improved quality
+   and reduced bit rate).
+
+3. A phase model has been developed that uses 0 bits for phase and 1
+   bit/frame for voiced/unvoiced decision but delivers high quality
+   speech.
This works suspiciously well - codecs with a single bit
+   voiced/unvoiced decision aren't meant to sound this good. Usually
+   mixed voicing at several bits/frame is required.
 
-3. Phase model developed that uses 0 bits for phase and 1 bit/frame
-   for voiced/unvoiced decision. An experimental post filter has been
-   developed to improve performance for speech with background noise.
+4. An experimental post filter has been developed to improve
+   performance with speech mixed with background noise. The post
+   filter delivers many of the advantages of mixed voicing but unlike
+   mixed voicing requires zero bits.
 
-4. Non-Linear Pitch (NLP) pitch estimator working OK, and a simple
+5. Non-Linear Pitch (NLP) pitch estimator working OK, and a simple
    pitch tracker has been developed to help with some problem frames.
 
-5. An algorithm for decimating sinusoidal model parameters from the
-   native 10ms rate to a 20ms rate has been developed. A 20ms frame
-   rate is required for 2400 bit/s coding.
+6. An algorithm for decimating sinusoidal model parameters from the
+   native 10ms rate to a 20ms rate has been developed. High quality
+   speech is maintained, some small differences are only audible
+   through headphones. A 20ms frame rate is required for 2400 bit/s
+   coding.
 
 Current work areas:
 
@@ -213,19 +224,32 @@ The tough bits of this project are:
 Can I help?
 -----------
 
-Maybe; check out the latest version of the
+Maybe; check out the the <> above and the latest
+version of the
 http://freetel.svn.sourceforge.net/viewvc/freetel/codec2/TODO.txt?view=log[TODO]
-list and the development roadmap above and see if there is anything
-that interests you.
-
-Not all of this project is DSP. If you can code in C there are lots
-of general processing tasks like refactoring and writing a command
-line soft phone application for testing the codec over a LAN.
-
-I will happily accept sponsorship for this project.
For example
-research grants, or development contracts for companies interested in
-seeing an open source low bit rate speech codec. One interesting
-project would be funding a real time port to a single DSP/CPU chip.
+list and and see if there is anything that interests you.
+
+Not all of this project is DSP. There are many general C coding tasks
+like refactoring and writing a command line soft phone application for
+testing the codec over a LAN.
+
+I will happily accept *sponsorship* for this project. For example
+research grants, or development contracts from companies interested in
+seeing an open source low bit rate speech codec.
+
+You can also donate to the codec2 project via PayPal (which also
+allows credit card donations):
+
++++++++++++++++++++++++++++
+
+
+
+
+
+Donation in US$:
+
+
++++++++++++++++++++++++++++
 
 [[patents]]
 Is it Patent Free?
diff --git a/codec2/TODO.txt b/codec2/TODO.txt
index 3dc705d1..dac0925d 100644
--- a/codec2/TODO.txt
+++ b/codec2/TODO.txt
@@ -1,6 +1,144 @@
 TODO for codec2
 ---------------
 
+29 July 2010, current status:
+- g729a is a bit better, but c2 is not put to shame
+- c2 betters GSM in some cases
+- c2 has a "brighter" sound compared to CELP ref codecs, more HF
+- main artifact is "clicky" and poor UV sounds
+- artefacts in bg noise in mmt1
+- lacks hoarseness of CELP ref codecs
+- pretty good for 2400 bit/s!
+
+Next Steps
+----------
+
+Would be nice to nail phase issues a bit better:
+
+- sync phase model not referenced to one onset
+- let phse track by keep phase relationship to onset constant to avoid
+  reverb
+- try removing every second harmonic
+- lower level for UV
+- pitch in UV sections - set to max Wo to best approx noise
+- start with orig phases, then subs in model for UV sounds
+
+Results
+-------
+
+1/
+
+- if we just replace phase of UV frames, it sounds great
+- but this probably is biassed towards voicing, ie low probability of error
+  on V frames
+- but it shows that when correct decn is made, we get good results
+- so perhaps trick is getting correctly detect UV
+
+2/ Drop every second harmonic during synthesis
+- this actually sounds pretty good
+- if you ignore spaceman sounds the voicing sounds great
+- not clicky!
+- less harmonics, less chance of forming a click
+- maybe we should look at tme doamin waveform
+- ears hairs can't rest in time? Something JVM said....
+
+3/ Can we look at phases of high amplitude harmonics?
+- maybe plot them as separate sine waves
+- see if high ampltude ones have similar phase
+- maybe in human speech production they are highly dispersed
+- c.f. with our phase model
+- female has more widely spaced harmonics
+- less that one per formant
+
+Blog post ideas
+---------------
+
++ Show how sum of sinusoids makes up speech. Show what it sounds like.
+
+
+After a day of listening and chasing my tail, conclusion is that 10ms frame,
+LSP quantised version is not to shabby. Thru speaker it's roughly comparable
+to g729a and Speex at 8kbit/s. Certainly not put to shame.
+
+So todo:
+    [ ] transparent 20ms frame conversion when fully quantised
+    [ ] faster voicing model
+    [ ] conversation test gadget, way to change codecs while we listen
+
+[ ] phase modelling
+    + is problem only due to very low pitch speakers (hts1, mmt1)?
+    + better chance to form a pulse
+    + could be not a problem at all for moderate pitch speakers
+    + ways to unfrom pulse for low pitch speakers
+    start with something that works:
+    + morig an hts1 and forig sound OK
+    + is it better to just release as is?
+    + still need some work on 20ms frames, but could do a 4kbit version soon
+    + need efficient voicing search
+
+[ ] Key Issues
+    [ ] LSP quantisation
+    [ ] decoupling of phase and LPC and voicing
+        + phase0 sounds OK, lpc10 sounds OK
+        + but together not so great
+
+[ ] LSP quantisation plan
+    + build an understanding of LSP quantisation
+        + using a series of experiments
+        + visualisation tools
+        + find out what matters perceptually
+    + then use this understanding to build quantisters
+    [ ] outlier thesis
+        + errors less than a certain amount don't matter
+        + quantiser overload errors have a big effect
+        + averaging SD over spectrum is wrong
+        + it's maximum deviation which counts
+        + one bad LSP can ruin an entire utterance
+        + maybe focus on just one LSP for a start
+    [ ] listen to some samples with LPC modelling
+        [ ] add RMS and overload distortion
+        [ ] view effect on Octave plots
+        [ ] come up with way of plotting cumulative errors over at utterance
+        [ ] come up with outlier objective model and distortion model
+        [ ] compare to perceptual response to averaged SD and outlier model
+    [ ] quantiser design
+        [ ] write code to visualise k-means 1-D training
+        [ ] write code to visualise quantiser errors
+            + SD plus scatter plots
+        [ ] try k++
algorithm
+    [ ] idea: linear differential quantiser
+        + single bad outlier can ruin a frames
+        + overload distortion is the biggest problem
+        + differentially quantise with fixed grid to ensure no
+          outliers
+
+[ ] decoupling of phase and lpc
+    + hook up a loud speaker as reference tranducer
+    + another way to generate dispersive phase model?
+    [ ] try various combinations
+        + uq, lpc10, phase0
+    + why drop in quality? Some coupling of models, zero phase?
+    + what the hell is dispersion - need to understand further
+    + come up with alternative model
+    + good understanding of phase and subj quality
+    + artificial signals to test, e.g. synth phase + real amps OR
+      real phase plus LPC smooted amps
+    + suggest LPC smoothing of amps or smooth phases causes clickiness
+
+[ ] 10/20ms frame rate
+    + why any quality drop with 20ms?
+    + what effect of different parametsr different update rate.
+    + is quantisation jumps between frames a big issue? How can
+      we test this? Maybe enforce smoothness
+
+    + measure frame-frame variations in amplitudes Am. Might give some feel for
+      allowable quant errors, perhaps some cripness is induced by slight variations.
+
+    + 10ms frames have very big overlap. No reason 20ms couldnt sound just as
+      good. Spectrum should change very little. Devise an experiment to tease
+      out key issues here. Maybe have extra bit to say if speech starts/ends in
+      one frame.
+
 [ ] Important Open Issues
     [X] Why zero phase model doesn't work for mmt1
         + error in LPC window and accidental random phase component and
@@ -64,7 +202,7 @@ TODO for codec2
         + this gets tricky with split codebooks
         + VQ could also be trained with perceptual distortion modelled
         + perhaps work on weighting function
-        + try differential in time as well, voiced speech doesn't chnage much
+        + try differential in time as well, voiced speech doesn't change much
         + LPC envelope (mag spectrum) could possibly be quantised using some
           other form of VQ, use LPC just to get a constant sampling rate.
This is a bit blue sky
diff --git a/codec2/octave/load_raw.m b/codec2/octave/load_raw.m
new file mode 100644
index 00000000..1f7868d4
--- /dev/null
+++ b/codec2/octave/load_raw.m
@@ -0,0 +1,8 @@
+% load_raw.m
+% David Rowe 7 Oct 2009
+
+function s = load_raw(fn)
+  fs=fopen(fn,"rb");
+  s = fread(fs,Inf,"short");
+  plot(s)
+endfunction
diff --git a/codec2/octave/lsp_pdf.m b/codec2/octave/lsp_pdf.m
index a5a1b9a6..990abc8b 100644
--- a/codec2/octave/lsp_pdf.m
+++ b/codec2/octave/lsp_pdf.m
@@ -7,22 +7,22 @@ function lsp_pdf(lsp)
 
   % LSPs
 
-  figure(1);
+  figure(3);
   clg;
   [x,y] = hist(lsp(:,1),100);
-  plot(y,x,";1;");
+  plot(y*4000/pi,x,";1;");
   hold on;
   for i=2:c
     [x,y] = hist(lsp(:,i),100);
     legend = sprintf(";%d;",i);
-    plot(y,x,legend);
+    plot(y*4000/pi,x,legend);
   endfor
   hold off;
   grid;
 
   % LSP differences
 
-  figure(2);
+  figure(4);
   clg;
   subplot(211)
   [x,y] = hist(lsp(:,1),100);
diff --git a/codec2/octave/phase.m b/codec2/octave/phase.m
index d9bab906..7a2bc626 100644
--- a/codec2/octave/phase.m
+++ b/codec2/octave/phase.m
@@ -21,13 +21,13 @@ function phase(samname, F0, png)
     phi(1) = phi(1) + Wo*N;
     phi(1) = mod(phi(1),2*pi);
 
-    for m=2:L
+    for m=1:L
       phi(m) = m*phi(1);
     end
 
     x = zeros(1,N);
     for m=1:L
-      x = x + A*cos(m*Wo*(0:(N-1)) + phi(m) + phi_off(m));
+      x = x + A*cos(m*Wo*(0:(N-1)) + phi(m));
     endfor
     s((f-1)*N+1:f*N) = x;
   endfor
diff --git a/codec2/octave/phase2.m b/codec2/octave/phase2.m
index 5539bf13..c28b104b 100644
--- a/codec2/octave/phase2.m
+++ b/codec2/octave/phase2.m
@@ -8,19 +8,22 @@ function phase2(samname, png)
 
   f=45;
   model = load("../src/hts1a_model.txt");
+  phase = load("../src/hts1a_phase_phase.txt");
 
   Wo = model(f,1);
   P=2*pi/Wo;
   L = model(f,2);
   A = model(f,3:(L+2));
-
+  phi = phase(f,1:L);
   phi = zeros(1,L);
-  for m=1:L
-    phi(m) = -m*Wo*0.3*rand(1,1)*L;
+  for m=L/2:L
+    phi(m) = 2*pi*rand(1,1);
   end
+
   s = zeros(1,N);
   for m=1:L
-    s = s + A(m)*cos(m*Wo*(0:(N-1)) + phi(m));
+    s_m = A(m)*cos(m*Wo*(0:(N-1)) + phi(m));
+    s = s + s_m;
   endfor
 
   figure(1);
diff --git
a/codec2/octave/plamp.m b/codec2/octave/plamp.m
index 44bd98af..a82a8412 100644
--- a/codec2/octave/plamp.m
+++ b/codec2/octave/plamp.m
@@ -54,6 +54,7 @@ function plamp(samname, f)
   do
     figure(1);
     clg;
+%    s = [ Sn(2*(f-2)-1,:) Sn(2*(f-2),:) ];
     s = [ Sn(2*f-1,:) Sn(2*f,:) ];
     plot(s);
     axis([1 length(s) -20000 20000]);
@@ -65,6 +66,7 @@ function plamp(samname, f)
     plot((1:L)*Wo*4000/pi, 20*log10(Am),";Am;");
     axis([1 4000 -10 80]);
     hold on;
+%    plot((0:255)*4000/256, Sw(f-2,:),";Sw;");
     plot((0:255)*4000/256, Sw(f,:),";Sw;");
 
     if (file_in_path(".",modelq_name))
diff --git a/codec2/script/menu.sh b/codec2/script/menu.sh
index 48ec3b33..75072841 100755
--- a/codec2/script/menu.sh
+++ b/codec2/script/menu.sh
@@ -39,7 +39,7 @@
 # Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
 
 files=0
-items="Q-Quit "
+items="Q-Quit\n"
 while [ ! -z "$1" ]
 do
   case "$1" in
@@ -47,15 +47,16 @@ do
     *) files=`expr 1 + $files`;
        new_file=$1;
        file[$files]=$new_file;
-       items="${items} ${files}-${new_file}";;
+       items="${items} ${files}-${new_file}\n";;
   esac
   shift
 done
 
 readchar=1
+echo -n -e "\r" $items"- "
 while [ $readchar -ne 0 ]
 do
-  echo -n -e "\r" $items "- "
+  echo -n -e "\r -"
   stty cbreak         # or stty raw
   readchar=`dd if=/dev/tty bs=1 count=1 2>/dev/null`
   stty -cbreak
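
Note on the menu.sh hunks above: the change stores a literal `\n` after each menu item and relies on `echo -e` to expand it when the menu is printed once before the read loop. The same one-item-per-line menu could probably be built with `printf`, which does not depend on how the local `echo` treats `-e`. This is only a minimal sketch, assuming a POSIX shell; `build_menu` and the sample file names are hypothetical and are not part of menu.sh.

```shell
# build_menu: hypothetical helper, NOT part of menu.sh.
# Prints "Q-Quit" and then one numbered entry per argument,
# each on its own line, mirroring the "<n>-<file>" items in menu.sh.
build_menu() {
    n=0
    printf '%s\n' "Q-Quit"
    for f in "$@"; do
        n=$((n + 1))
        printf '%s-%s\n' "$n" "$f"
    done
}

# Example call; the file names are placeholders for raw speech samples.
build_menu hts1a.raw mmt1.raw
```

Quoting `"$@"` keeps file names that contain spaces intact, which the unquoted `$items` expansion in the patched script would split into separate words.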