F I N D L I N E 
       
                                   by SÖren Molander
                              soren.molander@bredband.net
                              http://hem.bredband.net/sormol
                              Copyright Sören Molander, 2002


An experimental program for spectral line fitting, developed for supernova spectra for Supernova Group 
at Stockholm University. The following people have helped in developing the program: 

- Gabriele Garavini (ideas, criticism)
- Ariel Goobar      (ideas and support)
- Robert Davies     (The matrix library)

----------------------------------------------------------------------------------------------------------
1.0 Introduction
----------------
1.1 Purpose
This C++/Linux software contains algorithms to analyze min-max of spectra and perform automatic 
spectral-line fits and give quality assessments of the fits. It also fits a Planck-curve to the data.
When compiled it yields a single executable program "findline". It is best run using a script, 
which you will find in the distribution. The programs should be compilable under windows but scripts would
have to be replaced with a wrapper of some kind, since windows is notorious for its lack of support for
automation. 


To be useful the "gnuplot" program should be used to inspect the results (see www.rpmfind.net)
The distributions contains two parts: 

- A nifty matrix library, newmat10-src (by Robert Davies, http://webnz.com/robert/. Works under windows). 
- The application code (by me). 

Tested environments: Linux (SuSE and RedHat 8.x, 9.x) and Cygwin. Compiles under windows (without script support),
if you have any problems, let me know. 

1.2 
To install: 
1. Edit the file "make.top" and enter the home directory for the code. Note that it has to be
   an absolute path!
2. Type "make all" at this level. This will create the directories lib and bin, 
   compile and copy the matrix libraries, and finally compile the main code
3. If you have doxygen, type "make docs". This will create a html browsable code documentation
   in "doc/html"
4. "make clean" removes the html doc and ".o" files.


The code is still very experimental, and any hints on improvements on readability or methods are welcome.
If you decide to modify the code, please feel free to do so. Depending on time/interest I will maintain the code
so I would appreciate if you let me know about any major changes you have made. 

There is an example script in <dir>/data "dofiles" that will produce data in in <dir>/data/output. Just
type "./dofiles", go to "./output" and type "gnuplot *.dem". The text files with results are used by gnuplot
so be careful in manipulating them manually!


2.0 Some of the algorithms used
-------------------------------
The general principle here is to perform fits to normalized and smoothed data. The (supernova) data are usually
noisy and the fits would never converge on the raw data. 

2.1 Maxima-Minima detection
   Max/minima detection (example for minimum)
   Sweep a user-defined window over the signal. For each point where the derivative is close to 
   0, place the window and extract the values in the window. Sort this list. If the lowest
   value is below the current minimum, then it is accepted. If you increase the window, fewer extrem values are found. 
   For maxima the same algorithm is used on the negative signal. 

2.2 Spectral line model
  The model used here is y = a + b*x + c*exp(-(x-d)² * s). The code uses two methods for the fitting: 
  - Gauss-Newton with line search (by Robert Davies)
  - Levengerg-Marquardt (by me)
 
  If one method does not converge (a common case), the other method is tried. 

  In both cases the Jacobian is calulated analytically. The output is written  to individual files ready to 
  inspect in gnuplot. A formfactor defined as 

   f = sqrt(2/s)/height(spectra_line) 

  which is written to a file for each fitted line.  In the files you also find data and model versions of 
  the form factor. The idea is that any larger deviations from the fit will show up in large differences 
  between the model and data parameters. 


2.3 Quality
 The criterion for extracting "good" parts to be fitted is based on max-min amplitude of the 
 "balanced" line. A balanced line is a line where the end max (or min) points have been "balanced"
  down to the other end point. The rationale is that it only makes sense
  to fit points not too skew, although this is a part of the model (the b-term). You can choose the amount
  of balancing. 
  
  Empirical experience shows that for noisy spectra, it make sense to keep the skew parameter low; otherwise
  the fitting may not converge.

  As of recently (see below) an empirical quality assessment is done when fitting the spectral line. 
  it based on the following properties. 
  - the difference beyween the location of the model minimum and data minimum. 
  - the difference betweeen the estimated modle height and actual heigth from data.
  - the normalized rms error. 

  All errors - except the normalized rms error - are squared and summed, and passed through the squashing function
 
  quality = (1+tanh(k*sqrt(rms_error) - m))/2           (quality in [0..1])
  
  where m and k are constants (see specdata.cpp) determined empirically. A "good" line has a quality > 0.5 and 
  and rms error > 0.005 (a constant). It should be possible to determine the constants using staistics and your "eyes"
  and data mining. 

 
3. Fitting of black-Body Planck spectrum

  The strategy here is to derive the envelope of the maximum points of the spectrum and feed this
  into the same Gauss-Neston procedure as above. The results have to interpreted with care, but may serve as
  an initial estimate of the temperature. 

  The Hilbert Transform envelope detection for better estimates of Planck, does not seem to 
  make any sense here. Feel free to experiment with it anyway (see signal.h). 

4. Filters used
  It is well known that most FIR filters in standard text books in signal processing are useless in practice in
  spectrosopy. I am not an expert in the field, so based on the recommendations of others I have used 2 approches: 

  - Adaptive Triangular FIR filtering (ATF) based on a user-defined SNR ratio
  - Savizky-Golay filters (SM), essentially an polynomial LMS fit in filter form (see Numerical Recipes)
 
  The SM filters retain the spectral shapes better than the ATF filters. I usually use a filterlength of 25-33 which
  seems to work fine for the data here. 

  The latter ATF retain the spectral signal to noise ratio and remove noise more efficiently. 
  The ATF has a varying length determined by the error bars in the data: 

  length = desired_snr / (signal / noise) = desired_snr / snr 

  For the supernove data I have seen the snr is in the range [15,30], so I usually put the desired snr to 250-300. 


------------------------------------------------------------------------------
Changes: 
2002-05-13: - Added choice of wavelength interval in parameter list in findline

2002-05-14: - Added simpson integration in signal.cpp. 
            - Added energy ratio between normalized spectrum and planck curve (in <file>-planck.txt)
            - Cleaned up fineline and put procedures as methods in specfind. 

2002-05-15: - Now write energy ratio only if planck model converged. Also
              added energy window params in findline. 

2002-05-21: - Added Levenberg-Marquardt method for fitting, seems to converge more often than (??)
              Conjugated gradient. It resides under a special namespace, LMS (see specfind.cpp)
            - Split files into specdata.cpp/h and specfind.cpp/h
            - findline has an extra option (method), 0 = NEWMAT, 1 = Levenberg-Marquardt.

2002-05-23: - Added annotation in output, now write spectral scale in "-norm.txt"

2002-05-24: - Added separate choice of fitting methods for both planck and line fits.
            - Added script "doplot" to generate gnuplot files, currently only for "-norm" files
            - Changed form factor measure to sqrt(2/s)/h
            - Added automatic generation of gnuplot files

2002-05-25: - Fixed crashbug in levmarq, now use real C++ exception handling
            - Fixed problem in trying to plot empty planck files
            - Added centralized names of files ("-norm" etc) in specdata.cpp. 
            - Added namespaces and std::cerrfor better compatibility with windows

2002-05-28: - Fixed bug in calculation of normalized error bars. 
            - Removed error-weights in one step of LM iteration
            - Changed criterion for LM-convergence. Now MUCH more liberal and produces good results
            - Added statistics (rms-error and DOF) in all "-lineparam" files. 

2002-05-30: - Now perform Matrix inversion in 2 step seems to prevent NEWMAT from hanging and core dumps.
            - Only write Planck/spectrum energy ratio if values reasonable

2002-05-31: - Now scale spectral values and error bars with the same factor
            - Added model and data spectral minima (lambda) and corresponding form factors for
              both LM and CG methods.
2002-06-09: - Added model and data versions of the form factors in the parameter files. These give an indicator
              if the fit makes amy sense or not.
            - Removed the "min_deriv" and "max_deriv" and use "0" as default values. 
            - Added 2 parameters: 
                "fact_keep": The amount of spectral trough to use in fit. Might be useful for P-cygni profiles. 
                "min_nofp" : Only do the above if there is sufficient number of data points.

2002-06-14: - Added recovery when one of LM or Gauss-Newton fails; try the other method first before signal a failure. 
            - Added Triangular weigthing in moving average. Much better spectral properties, and variance reduction only
              slightly worse (4/3N instead of 1/N). 
            - Now write most of the messages to log file instead of stderr. 

2002-06-16: - Added adaptive filtering based on the normalized errors. At every point the spectrum is averaged, the length
              of which is determined by the ratio between the desired and current signal-to-noise ratio. Updated scripts
              accordingly. 

2002-06-18: - Replaced moving average by a Savritzky-Golay (n,4) filter which preserves spectral peaks better than normal
              averaging. The adaptive filtering is still performed with the triangular filter, but the non-snr filter is a
              SG-filter.

2002-06-29: - Added a first attempt to determine the quality of the spectral fit (see specdata.cpp, SpectralLineProp).
              The fit is always done and the quality is written into dem and result files. 

2002-09-03: - Should now work for the new gcc3 compiler series. 

2003-08-15: - removed using namespace in .h files, upgraded to newmat11

Sören Molander, August 15, 2003