From 30ccdf0ecf93f2f202bf9d176a9b7c455fa5cb0e Mon Sep 17 00:00:00 2001
From: drowe67 <drowe67@01035d8c-6547-0410-b346-abe4f91aad63>
Date: Fri, 9 Oct 2009 02:33:18 +0000
Subject: [PATCH] updated doco and todo

git-svn-id: https://svn.code.sf.net/p/freetel/code@75 01035d8c-6547-0410-b346-abe4f91aad63
---
 codec2/README.txt | 99 ++++++++++++++++++++++++++++++++++-------------
 codec2/TODO.txt   | 60 +++++++++++++++++++++++-----
 2 files changed, 123 insertions(+), 36 deletions(-)
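
The Bit Allocation table added to README.txt can be checked with a few
lines of arithmetic.  This is a minimal sketch in Python, used purely
for illustration (codec2 itself is C); the dictionary keys and function
names are hypothetical labels, not identifiers from the codec2 source.

```python
# Bits per frame, copied from the Bit Allocation table in README.txt.
# The dictionary keys are illustrative labels, not codec2 identifiers.
BITS_PER_FRAME = {
    "spectral_magnitudes_lsps": 32,
    "low_frequency_lpc_correction": 1,
    "energy": 5,
    "voicing": 1,
    "fundamental_frequency_wo": 7,
    "spare": 2,
}

FRAME_PERIOD_MS = 20  # 20ms update rate, as stated in the README


def total_bits(allocation):
    """Sum the bits spent on every parameter in one frame."""
    return sum(allocation.values())


def bit_rate(allocation, frame_period_ms):
    """Bit rate in bits/s: bits per frame times frames per second.

    Integer division is exact for the 10ms and 20ms periods used here.
    """
    return total_bits(allocation) * 1000 // frame_period_ms


print(total_bits(BITS_PER_FRAME))                 # 48
print(bit_rate(BITS_PER_FRAME, FRAME_PERIOD_MS))  # 2400
```

Calling bit_rate(BITS_PER_FRAME, 10) gives 4800, which illustrates why
decimating the model parameters to a 20ms rate is needed for 2400 bit/s.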

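The README text on the low frequency LPC correction bit says the decoder
attenuates the LPC modelled first harmonic by 30dB when the bit is set.
A minimal sketch of that step follows, assuming the harmonic amplitudes
{Am} are held as linear magnitudes with the m=1 harmonic first.  The
function name, argument names and list layout are hypothetical and are
not taken from the codec2 source.

```python
ATTENUATION_DB = 30.0  # from the README: attenuate the m=1 harmonic by 30dB


def apply_lpc_correction(amplitudes, correction_bit):
    """Return a copy of the amplitudes with m=1 attenuated if the bit is set.

    amplitudes[0] is taken to be the m=1 (fundamental) harmonic; this
    indexing and the linear-magnitude representation are assumptions.
    """
    corrected = list(amplitudes)
    if correction_bit and corrected:
        # Convert a dB cut into a linear amplitude ratio: 10^(-dB/20).
        gain = 10.0 ** (-ATTENUATION_DB / 20.0)
        corrected[0] *= gain
    return corrected
```

A 30dB cut corresponds to multiplying the amplitude by about 0.0316, so
apply_lpc_correction([1.0, 0.5], 1) leaves the second harmonic at 0.5
and scales the first to roughly 0.0316.
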
diff --git a/codec2/README.txt b/codec2/README.txt
index a756cf51..50c3ae88 100644
--- a/codec2/README.txt
+++ b/codec2/README.txt
@@ -20,22 +20,39 @@ Status
 ------
 
 Still in experimental/development stage - no 2400 bit/s codec
-available yet.  Progress to date:
+available yet, but we are getting very close!
+
+Progress to date:
 
 1. Unquantised encoder (sinenc) and decoder (sinedec) running under
-   Linux/gcc, pitch estimator untested.  The decoder (sinedec) is a
-   test-bed for various modelling and quantisation options - these are
-   controlled via command line switches.
+   Linux/gcc.  The decoder (sinedec) is a test-bed for various
+   modelling and quantisation options - these are controlled via
+   command line switches.
 
-2. LPC modelling working and first pass LSP vector quantiser working
-   at 37 bits/frame with acceptable voice quality.  Lots could be done
-   to improve this.
+2. LPC modelling working nicely and there is a first pass LSP vector
+   quantiser working at 32 bits/frame with acceptable voice quality.
+   Lots could be done to improve this (e.g. improved quality and
+   reduced bit rate).
 
 3. Phase model developed that uses 0 bits for phase and 1 bit/frame
-   for voiced/unvoiced decision.
+   for voiced/unvoiced decision.  An experimental post filter has been
+   developed to improve performance for speech with background noise.
+
+4. Non-Linear Pitch (NLP) pitch estimator working OK, and a simple
+   pitch tracker has been developed to help with some problem frames.
+
+5. An algorithm for decimating sinusoidal model parameters from the
+   native 10ms rate to a 20ms rate has been developed.  A 20ms frame
+   rate is required for 2400 bit/s coding.
+
+Current work areas:
+
+1. Reduce CPU load of the first order phase model fit.  Apart from
+   that the codec runs very quickly.
 
-4. Non-Linear Pitch (NLP) pitch estimator working OK, could use a pitch
-   tracker to improve a few problem frames.
+2. Write separate encoder and decoder functions and demo programs for
+   the alpha 2400 bit/s codec.  Currently most of the processing
+   happens inside the sinedec program.
 
 [[source]]
 The Source Code
@@ -76,11 +93,14 @@ Development Roadmap
 
   [X] Milestone 0 - Project kick off
   [X] Milestone 1 - Alpha 2400 bits/s codec
-      [X] Spectral amplitudes modelled and quantised 
+      [X] Spectral amplitudes modelled using LPC
       [X] Phase and voicing model developed
-      [ ] Pitch estimator
-      [ ] Frame rate/quantisation schemes for 2400 bit/s developed
+      [X] Pitch estimator
+      [X] Spectral amplitudes quantised using LSPs
+      [X] Decimation of model parameters from 10ms to 20ms
       [ ] Refactor to develop a seperate encoder/decoder functions
+      [ ] Complete 2400 bits/s codec demonstrated
+      [ ] Reduced complexity voicing estimator
       [ ] Test phone call over LAN
       [ ] Release 0.1 for Alpha Testing
   [ ] Milestone 2 - Beta codec for digital radio
@@ -137,6 +157,25 @@ As the parameters are quantised to a low bit rate and sent over the
 channel, the speech quality drops.  The challenge is to achieve a
 reasonable trade off between speech quality and bit rate.
 
+Bit Allocation
+--------------
+
+[grid="all"]
+`-------------------------------.----------
+Parameter                       bits/frame
+-------------------------------------------
+Spectral magnitudes (LSPs)      32
+Low frequency LPC correction     1
+Energy                           5
+Voicing                          1
+Fundamental Frequency (Wo)       7
+Spare                            2
+-------------------------------------------
+Total                           48
+-------------------------------------------
+
+At a 20ms update rate 48 bits/frame is 2400 bits/s.
+
 [[challenges]]
 Challenges
 ----------
@@ -179,6 +218,10 @@ http://freetel.svn.sourceforge.net/viewvc/freetel/codec2/TODO.txt?view=log[TODO]
 list and the development roadmap above and see if there is anything
 that interests you.
 
+Not all of this project is DSP.  If you can code in C, there are lots
+of general programming tasks, such as refactoring and writing a command
+line soft phone application for testing the codec over a LAN.
+
 I will happily accept sponsorship for this project.  For example
 research grants, or development contracts for companies interested in
 seeing an open source low bit rate speech codec.  One interesting
@@ -195,13 +238,12 @@ patents used by proprietary 2400 bit/s codecs (MELP and xMBE) and
 compare.
 
 Proprietary codecs typically have small, novel parts of the algorithm
-protected by patents.  However the designers of these codecs rely
-heavily on large bodies of existing, public domain work.  The patents
-cover perhaps 5% of the codec algorithms.  Proprietary codec designers
-did not invent most of the algorithms they use in their
-codec. Typically, the patents just cover enough to make designing an
-interoperable codec very difficult.  These also tend to be the parts
-that make their codecs sound good.
+protected by patents.  However proprietary codecs also rely heavily on
+large bodies of public domain work.  The patents cover perhaps 5% of
+the codec algorithms.  Proprietary codec designers did not invent most
+of the algorithms they use in their codec. Typically, the patents just
+cover enough to make designing an interoperable codec very difficult.
+These also tend to be the parts that make their codecs sound good.
 
 However there are many ways to make a codec sound good, so we simply
 need to choose and develop other methods.
@@ -276,9 +318,9 @@ i.e. fundamental) harmonics for males. The amplitude of the m=1
 harmonic is raised by as much as 30dB after LPC modelling as (I think)
 LPC spectra must have zero derivative at DC.  This means it's poor at
 modelling very low freq harmonics which unfortunately the ear is very
-sensitive to.  Consider automatic lowering for 20dB of this harmonic
-or maybe a few extra bits to quantise error.  Or maybe just don't
-synthesise anything beneath 200Hz.
+sensitive to.  To compensate, an extra bit has been added to correct
+LPC modelling errors on the first harmonic.  When set, this bit
+instructs the decoder to attenuate the LPC modelled harmonic by 30dB.
 
 [[phase]]
 Phase Modelling Notes
@@ -291,7 +333,9 @@ http://freetel.svn.sourceforge.net/viewvc/freetel/codec2/src/phase.c?view=log[ph
 
 The zero phase model required just one voicing bit to be transmitted
 to the decoder, all other phase information is synthesised use a rule
-based model.  It seems to work OK for most speech samples.
+based model.  It seems to work OK for most speech samples, but adds a
+"clicky" artifact to some low pitched speakers.  Also see the blog
+posts below for more discussion of phase models.
 
 To determine voicing we attempt to fit a first order phase model, then
 measure SNR of the fit.  The frame is declared unvoiced if the SNR is
@@ -404,8 +448,11 @@ References
     Codec Part 1 - Introduction]
     
 [4] http://www.rowetel.com/blog/?p=130[Open Source Low rate Speech
-    Codec Part 1 - Spectral Magnitudes]
+    Codec Part 2 - Spectral Magnitudes]
     
 [5] http://www.rowetel.com/blog/?p=131[Open Source Low rate Speech
-    Codec Part 2 - Phase and Male Speech]
+    Codec Part 3 - Phase and Male Speech]
+
+[6] http://www.rowetel.com/blog/?p=132[Open Source Low rate Speech
+    Codec Part 4 - Zero Phase Model]
 
diff --git a/codec2/TODO.txt b/codec2/TODO.txt
index 3a55dfc0..3dc705d1 100644
--- a/codec2/TODO.txt
+++ b/codec2/TODO.txt
@@ -17,8 +17,10 @@ TODO for codec2
         + or develop manual pitch tracks and check estimator
           with tracker against these.
     [X] removal of LPC modelling errors for males
-        + first few haromic energies (e.g. mmt1, hts1a)  get raised
-    [ ] good quality LSP quantisation of {Am}
+        + first few harmonic energies (e.g. mmt1, hts1a) get raised
+        + added a single bit to compensate for these errors, works OK
+    [X] good quality LSP quantisation of {Am}
+        + first pass at this, lots of further work ideas below...
     [ ] conversion to 20ms frames
         + without significant distortion
 
@@ -32,18 +34,15 @@ TODO for codec2
         [X] GPL2 notice in each file
     [ ] Replace Numerical Recipes in C (NRC) four1.c and four1.h with Gnu
         Science Lib (GSL) SL FFT as NRC code has restrictive licencing
-    [ ] A way to handle m=1 harmonic for males when LPC modelling
-    [ ] Is BW expansion and Rk noise floor required before LSP quant
-    [ ] test split VQ to make sure no silly errors
-        + for example test MSE or index historgram for training data
-
+    [X] A way to handle m=1 harmonic for males when LPC modelling
+        + used a single bit to correct LPC modelling errors
+    [ ] Is BW expansion and Rk noise floor required before LSP quant?
+        + initial tests lead to large LPC modelling errors
     [ ] Go through papers referenced in thesis and credit various
         techniques to papers.
       + sure there was something about zero phase synthesis is those papers
-
     [ ] voicing errors can be clearly seen in synthesised speech using pl2.m
-
-    [ ] Voicing improvement
+    [ ] Voicing improvement idea
         + voicing tracker, if enery about the same in frame n-1,n,n+1, and n-1, n+1
           are voiced then make frame n voiced.
 
@@ -56,3 +55,44 @@ TODO for codec2
           noise to clean samples and explore effect on zero phase model
         + wrote plphase.m to start analysing this
 
+    [ ] LSP quantisation further work
+        + need codebook search to match perceptual error
+        + for example error in close LSPs has a large perceptual effect
+        + PDF-optimised quantisation may not be ideal, check this assumption
+        + test split VQ to make sure no silly errors
+        + for example test MSE or index histogram for training data
+        + this gets tricky with split codebooks
+        + VQ could also be trained with perceptual distortion modelled
+        + perhaps work on weighting function
+        + try differential in time as well, voiced speech doesn't change much
+        + LPC envelope (mag spectrum) could possibly be quantised using some
+          other form of VQ, use LPC just to get a constant sampling rate.  This
+          is a bit blue sky
+
+    [ ] Blue Sky {Am} Spectral Envelope quantisation ideas #1
+        + what is really important is location of formants
+        + therefore how about we just locate peak of each formant
+          and encode position and height, then interpolate between them
+        + could perhaps locate anti-formants as half way and just encode
+          depth
+        + interesting factoid: a gentle HP or LP (3-6dB/oct) won't affect
+          subjective quality much, but would have a big impact on
+          quantisation error measures.  Maybe remove/normalise out
+          before quantisation?  Could try quantisation
+          with/without HP filter.
+        + envelope could be time domain LPC or some other interpolation
+          technique (see #2 below)
+
+    [ ] Blue Sky {Am} Spectral Envelope quantisation ideas #2
+        + encode spectral envelope directly
+        + problem is variable number of harmonics L
+        + use a pitch sync DFT, i.e. DFT of one pitch cycle to get continuous
+          envelope without harmonic pitch structure
+        + then over sample this DFT to up/down sample to an appropriate rate
+        + we effectively already have pitch-sync DFT in {Am}
+        + they could be interpolated at a non-uniform rate, like a bark scale
+        + can we somehow separately encode shape and height?
+
+
+
+ 
\ No newline at end of file
-- 
2.25.1