From: drowe67
Date: Tue, 24 Aug 2010 09:19:35 +0000 (+0000)
Subject: new wave file samples and script mods
X-Git-Url: http://git.whiteaudio.com/gitweb/?a=commitdiff_plain;h=10d443dd28d2cde8bdecf41388d40a23cf410ec1;p=freetel-svn-tracking.git

new wave file samples and script mods

git-svn-id: https://svn.code.sf.net/p/freetel/code@187 01035d8c-6547-0410-b346-abe4f91aad63
---

diff --git a/codec2/TODO.txt b/codec2/TODO.txt
deleted file mode 100644
index fd5e2223..00000000
--- a/codec2/TODO.txt
+++ /dev/null
@@ -1,246 +0,0 @@
-TODO for codec2
----------------
-
-Mysteries
----------
-
-[ ] buzzy sound in bg noise for b0067
-[ ] LSP quantiser has problems for b0067
-[ ] synthetic sound for hts2a and in bg noise on _uq models
-    + breakdown in sinusoidal model for noise
-    + MELP does better
-    + perhaps synthesising UV using white noise might work better
-
-29 July 2010, current status:
-- g729a is a bit better, but c2 is not put to shame
-- c2 betters GSM in some cases
-- c2 has a "brighter" sound compared to CELP ref codecs, more HF
-- main artefact is "clicky" and poor UV sounds
-- artefacts in bg noise in mmt1
-- lacks hoarseness of CELP ref codecs
-- pretty good for 2400 bit/s!
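[Editor's aside: the sinusoidal model this TODO keeps referring to can be sketched in a few lines of Python. This is an illustrative sketch only, not codec2 code; the 100 Hz pitch, the 1/m amplitude roll-off, and the all-zero phases are made-up example parameters chosen to show the zero-phase case.]

```python
import math

def synthesize_frame(Wo, A, phi, N):
    """Sum-of-sinusoids synthesis: s(n) = sum over m of A[m]*cos(n*m*Wo + phi[m]).

    Wo is the fundamental frequency in radians/sample; A and phi hold
    per-harmonic amplitudes and phases, index 0 corresponding to harmonic m=1.
    """
    return [sum(A[m] * math.cos(n * (m + 1) * Wo + phi[m]) for m in range(len(A)))
            for n in range(N)]

# Example parameters (made up): 100 Hz pitch at an 8 kHz sample rate.
Wo = 2.0 * math.pi * 100.0 / 8000.0
L = int(math.pi / Wo)                  # number of harmonics up to Nyquist
A = [1.0 / (m + 1) for m in range(L)]  # illustrative 1/m amplitude roll-off
phi = [0.0] * L                        # zero-phase model: harmonic onsets aligned
s = synthesize_frame(Wo, A, phi, N=160)  # 20 ms of samples at 8 kHz
```

With all phases zero, every harmonic peaks together at n = 0, which is exactly the pulse-forming ("clicky") behaviour discussed below.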
-
-Next Steps
-----------
-
-Would be nice to nail the phase issues a bit better:
-
-- sync phase model not referenced to one onset
-- let phase track by keeping the phase relationship to the onset constant,
-  to avoid reverb
-- try removing every second harmonic
-- lower level for UV
-- pitch in UV sections - set to max Wo to best approximate noise
-- start with orig phases, then substitute the model phases for UV sounds
-
-Results
--------
-
-1/
-
-- if we just replace the phase of UV frames, it sounds great
-- but this is probably biased towards voicing, ie low probability of error
-  on V frames
-- but it shows that when the correct decision is made, we get good results
-- so perhaps the trick is correctly detecting UV
-
-2/ Drop every second harmonic during synthesis
-
-- this actually sounds pretty good
-- if you ignore the spaceman sounds, the voicing sounds great - not clicky!
-- fewer harmonics, less chance of forming a click
-- maybe we should look at the time domain waveform
-- ear hairs can't rest in time?  Something JVM said....
-
-3/ Can we look at the phases of high amplitude harmonics?
-
-- maybe plot them as separate sine waves
-- see if the high amplitude ones have similar phase
-- maybe in human speech production they are highly dispersed
-- c.f. with our phase model
-- female has more widely spaced harmonics - less than one per formant
-
-Blog post ideas
----------------
-
-+ Show how a sum of sinusoids makes up speech.  Show what it sounds like.
-
-
-After a day of listening and chasing my tail, the conclusion is that the
-10ms frame, LSP quantised version is not too shabby.  Through a speaker it's
-roughly comparable to g729a and Speex at 8kbit/s.  Certainly not put to shame.
-
-So todo:
-  [ ] transparent 20ms frame conversion when fully quantised
-  [ ] faster voicing model
-  [ ] conversation test gadget, a way to change codecs while we listen
-
-[ ] phase modelling
-    + is the problem only due to very low pitch speakers (hts1, mmt1)?
-      + better chance to form a pulse
-    + could be not a problem at all for moderate pitch speakers
-    + ways to un-form the pulse for low pitch speakers
-    start with something that works:
-    + morig, hts1 and forig sound OK
-    + is it better to just release as is?
-    + still need some work on 20ms frames, but could do a 4kbit version soon
-    + need an efficient voicing search
-
-[ ] Key Issues
-    [ ] LSP quantisation
-    [ ] decoupling of phase and LPC and voicing
-        + phase0 sounds OK, lpc10 sounds OK
-        + but together not so great
-
-[ ] LSP quantisation plan
-    + build an understanding of LSP quantisation
-      + using a series of experiments
-      + visualisation tools
-      + find out what matters perceptually
-      + then use this understanding to build quantisers
-    [ ] outlier thesis
-        + errors less than a certain amount don't matter
-        + quantiser overload errors have a big effect
-        + averaging SD over the spectrum is wrong - it's the maximum
-          deviation which counts
-        + one bad LSP can ruin an entire utterance
-        + maybe focus on just one LSP for a start
-    [ ] listen to some samples with LPC modelling
-    [ ] add RMS and overload distortion
-    [ ] view the effect on Octave plots
-    [ ] come up with a way of plotting cumulative errors over an utterance
-    [ ] come up with an outlier objective model and a distortion model
-    [ ] compare the perceptual response to the averaged SD and outlier models
-    [ ] quantiser design
-        [ ] write code to visualise k-means 1-D training
-        [ ] write code to visualise quantiser errors
-            + SD plus scatter plots
-        [ ] try the k++ algorithm
-        [ ] idea: linear differential quantiser
-            + a single bad outlier can ruin a frame
-            + overload distortion is the biggest problem
-            + differentially quantise with a fixed grid to ensure no
-              outliers
-
-[ ] decoupling of phase and lpc
-    + hook up a loudspeaker as a reference transducer
-    + another way to generate a dispersive phase model?
-    [ ] try various combinations
-        + uq, lpc10, phase0
-        + why the drop in quality?  Some coupling of models, zero phase?
-    + what the hell is dispersion - need to understand it further
-    + come up with an alternative model
-    + build a good understanding of phase and subjective quality
-    + artificial signals to test, e.g. synth phase + real amps OR
-      real phase plus LPC smoothed amps
-    + suggests LPC smoothing of amps or smoothed phases causes clickiness
-
-[ ] 10/20ms frame rate
-    + why any quality drop with 20ms?
-    + what is the effect of different parameters at different update rates?
-    + are quantisation jumps between frames a big issue?  How can we test
-      this?  Maybe enforce smoothness
-
-    + measure frame-to-frame variations in the amplitudes Am.  Might give
-      some feel for allowable quant errors; perhaps some crispness is
-      induced by slight variations.
-
-    + 10ms frames have a very big overlap.  No reason 20ms couldn't sound
-      just as good.  The spectrum should change very little.  Devise an
-      experiment to tease out the key issues here.  Maybe have an extra bit
-      to say if speech starts/ends in one frame.
-
-[ ] Important Open Issues
-    [X] Why the zero phase model doesn't work for mmt1
-        + error in the LPC window, an accidental random phase component, and
-          the effect of background noise on interformant harmonics
-    [X] residual noise on the zero phase model
-        + "navy" on hts1a, "frog" on morig
-        + perhaps due to mis-alignment of phases at frame boundaries?
-        + or possibly pitch errors
-        + need a way to research this
-    [X] Pitch errors on mmt1, morig
-        + we may need a tracker
-        + suggest manually stepping through each file, checking for pitch
-          errors, designing a tracker and checking again
-        + or develop manual pitch tracks and check the estimator with the
-          tracker against these.
-    [X] removal of LPC modelling errors for males
-        + the first few harmonic energies (e.g. mmt1, hts1a) get raised
-        + added a single bit to compensate for these errors, works OK
-    [X] good quality LSP quantisation of {Am}
-        + first pass at this, lots of further work ideas below...
-    [ ] conversion to 20ms frames
-        + without significant distortion
-
-[ ] Planned tasks and half formed ideas for codec2.....
-    [X] Convert files from DOS to Unix
-    [X] Get sinenc and sinedec to build and run under gcc
-    [ ] refactor
-        [ ] each source file has its own header
-        [ ] no globals
-    [ ] Consistent file headers
-        [X] GPL2 notice in each file
-    [ ] Replace the Numerical Recipes in C (NRC) four1.c and four1.h with
-        the GNU Science Library (GSL) FFT, as the NRC code has restrictive
-        licencing
-    [X] A way to handle the m=1 harmonic for males when LPC modelling
-        + used a single bit to correct LPC modelling errors
-    [ ] Are BW expansion and an Rk noise floor required before LSP
-        quantisation?
-        + initial tests lead to large LPC modelling errors
-    [ ] Go through the papers referenced in the thesis and credit various
-        techniques to papers.
-        + sure there was something about zero phase synthesis in those papers
-    [ ] voicing errors can be clearly seen in synthesised speech using pl2.m
-    [ ] Voicing improvement idea
-        + voicing tracker: if the energy is about the same in frames n-1, n,
-          n+1, and frames n-1 and n+1 are voiced, then make frame n voiced.
-
-    [ ] mmt1 with the zero phase model
-        + male with lots of background noise
-        + the zero order synth model sounds more "clicky" on mmt1; this
-          suggests less dispersion
-        + could be due to IRS filtering or background noise
-        + exploring this could lead to a better understanding of phase, e.g.
-          add noise to clean samples and explore the effect on the zero
-          phase model
-        + wrote plphase.m to start analysing this
-
-    [ ] LSP quantisation further work
-        + need a codebook search to match the perceptual error
-        + for example, an error in close LSPs has a large perceptual effect
-        + PDF-optimised quantisation may not be ideal; check this assumption
-        + test split VQ to make sure there are no silly errors
-          + for example, test MSE or an index histogram for the training data
-          + this gets tricky with split codebooks
-        + VQ could also be trained with perceptual distortion modelled
-          + perhaps work on a weighting function
-        + try differential in time as well; voiced speech doesn't change much
-        + the LPC envelope (mag spectrum) could possibly be quantised using
-          some other form of VQ; use LPC just to get a constant sampling
-          rate.  This is a bit blue sky
-
-    [ ] Blue Sky {Am} Spectral Envelope quantisation ideas #1
-        + what is really important is the location of the formants
-        + therefore how about we just locate the peak of each formant
-          and encode position and height, then interpolate between them
-        + could perhaps locate anti-formants as half way and just encode
-          depth
-        + interesting factoid: a gentle HP or LP (3-6dB/oct) won't affect
-          subjective quality much, but would have a big impact on
-          quantisation error measures.  Maybe remove/normalise it out
-          before quantisation?  Could try quantisation with/without the
-          HP filter.
-        + the envelope could be time domain LPC or some other interpolation
-          technique (see #2 below)
-
-    [ ] Blue Sky {Am} Spectral Envelope quantisation ideas #2
-        + encode the spectral envelope directly
-        + the problem is the variable number of harmonics L
-        + use a pitch-sync DFT, ie a DFT of one pitch cycle, to get a
-          continuous envelope without the harmonic pitch structure
-        + then oversample this DFT to up/down sample to an appropriate rate
-        + we effectively already have a pitch-sync DFT in {Am}
-        + they could be interpolated at a non-uniform rate, like a bark scale
-        + can we somehow separately encode shape and height?
-
-
-
diff --git a/codec2/script/raw2wav.sh b/codec2/script/raw2wav.sh
index 010a1b9d..a05efb72 100755
--- a/codec2/script/raw2wav.sh
+++ b/codec2/script/raw2wav.sh
@@ -1,3 +1,3 @@
 #!/bin/sh
 # Converts 16 bit signed short 8 kHz raw (headerless) files to wave
-sox -t raw -r 8000 -s -w -c 1 $1 $2
+sox -r 8000 -s -2 $1 $2
diff --git a/codec2/wav/hts1a_g729a.wav b/codec2/wav/hts1a_g729a.wav
new file mode 100644
index 00000000..f475775b
Binary files /dev/null and b/codec2/wav/hts1a_g729a.wav differ
diff --git a/codec2/wav/hts2a_g729a.wav b/codec2/wav/hts2a_g729a.wav
new file mode 100644
index 00000000..7709a1cd
Binary files /dev/null and b/codec2/wav/hts2a_g729a.wav differ
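[Editor's aside: for anyone without sox handy, the conversion the updated raw2wav.sh performs (headerless 16-bit signed 8 kHz mono PCM wrapped in a WAV container) can be sketched with Python's standard-library wave module. This is not part of the codec2 tree; the function name is made up for illustration.]

```python
import wave

def raw2wav(raw_path, wav_path, rate=8000):
    """Wrap headerless 16-bit signed mono PCM in a WAV container --
    roughly what `sox -r 8000 -s -2 in.raw out.wav` does."""
    with open(raw_path, "rb") as f:
        pcm = f.read()
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(1)     # mono
        w.setsampwidth(2)     # 2 bytes per sample = 16-bit signed
        w.setframerate(rate)  # 8 kHz sample rate
        w.writeframes(pcm)    # raw samples pass through unchanged
```

The sample data itself is copied verbatim; only the 44-byte WAV header is added, so the conversion is lossless in both directions.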