F I N D L I N E by SÖren Molander soren.molander@bredband.net http://hem.bredband.net/sormol Copyright Sören Molander, 2002 An experimental program for spectral line fitting, developed for supernova spectra for Supernova Group at Stockholm University. The following people have helped in developing the program: - Gabriele Garavini (ideas, criticism) - Ariel Goobar (ideas and support) - Robert Davies (The matrix library) ---------------------------------------------------------------------------------------------------------- 1.0 Introduction ---------------- 1.1 Purpose This C++/Linux software contains algorithms to analyze min-max of spectra and perform automatic spectral-line fits and give quality assessments of the fits. It also fits a Planck-curve to the data. When compiled it yields a single executable program "findline". It is best run using a script, which you will find in the distribution. The programs should be compilable under windows but scripts would have to be replaced with a wrapper of some kind, since windows is notorious for its lack of support for automation. To be useful the "gnuplot" program should be used to inspect the results (see www.rpmfind.net) The distributions contains two parts: - A nifty matrix library, newmat10-src (by Robert Davies, http://webnz.com/robert/. Works under windows). - The application code (by me). Tested environments: Linux (SuSE and RedHat 8.x, 9.x) and Cygwin. Compiles under windows (without script support), if you have any problems, let me know. 1.2 To install: 1. Edit the file "make.top" and enter the home directory for the code. Note that it has to be an absolute path! 2. Type "make all" at this level. This will create the directories lib and bin, compile and copy the matrix libraries, and finally compile the main code 3. If you have doxygen, type "make docs". This will create a html browsable code documentation in "doc/html" 4. "make clean" removes the html doc and ".o" files. The code is still very experimental, and any hints on improvements on readability or methods are welcome. If you decide to modify the code, please feel free to do so. Depending on time/interest I will maintain the code so I would appreciate if you let me know about any major changes you have made. There is an example script in /data "dofiles" that will produce data in in /data/output. Just type "./dofiles", go to "./output" and type "gnuplot *.dem". The text files with results are used by gnuplot so be careful in manipulating them manually! 2.0 Some of the algorithms used ------------------------------- The general principle here is to perform fits to normalized and smoothed data. The (supernova) data are usually noisy and the fits would never converge on the raw data. 2.1 Maxima-Minima detection Max/minima detection (example for minimum) Sweep a user-defined window over the signal. For each point where the derivative is close to 0, place the window and extract the values in the window. Sort this list. If the lowest value is below the current minimum, then it is accepted. If you increase the window, fewer extrem values are found. For maxima the same algorithm is used on the negative signal. 2.2 Spectral line model The model used here is y = a + b*x + c*exp(-(x-d)² * s). The code uses two methods for the fitting: - Gauss-Newton with line search (by Robert Davies) - Levengerg-Marquardt (by me) If one method does not converge (a common case), the other method is tried. In both cases the Jacobian is calulated analytically. The output is written to individual files ready to inspect in gnuplot. A formfactor defined as f = sqrt(2/s)/height(spectra_line) which is written to a file for each fitted line. In the files you also find data and model versions of the form factor. The idea is that any larger deviations from the fit will show up in large differences between the model and data parameters. 2.3 Quality The criterion for extracting "good" parts to be fitted is based on max-min amplitude of the "balanced" line. A balanced line is a line where the end max (or min) points have been "balanced" down to the other end point. The rationale is that it only makes sense to fit points not too skew, although this is a part of the model (the b-term). You can choose the amount of balancing. Empirical experience shows that for noisy spectra, it make sense to keep the skew parameter low; otherwise the fitting may not converge. As of recently (see below) an empirical quality assessment is done when fitting the spectral line. it based on the following properties. - the difference beyween the location of the model minimum and data minimum. - the difference betweeen the estimated modle height and actual heigth from data. - the normalized rms error. All errors - except the normalized rms error - are squared and summed, and passed through the squashing function quality = (1+tanh(k*sqrt(rms_error) - m))/2 (quality in [0..1]) where m and k are constants (see specdata.cpp) determined empirically. A "good" line has a quality > 0.5 and and rms error > 0.005 (a constant). It should be possible to determine the constants using staistics and your "eyes" and data mining. 3. Fitting of black-Body Planck spectrum The strategy here is to derive the envelope of the maximum points of the spectrum and feed this into the same Gauss-Neston procedure as above. The results have to interpreted with care, but may serve as an initial estimate of the temperature. The Hilbert Transform envelope detection for better estimates of Planck, does not seem to make any sense here. Feel free to experiment with it anyway (see signal.h). 4. Filters used It is well known that most FIR filters in standard text books in signal processing are useless in practice in spectrosopy. I am not an expert in the field, so based on the recommendations of others I have used 2 approches: - Adaptive Triangular FIR filtering (ATF) based on a user-defined SNR ratio - Savizky-Golay filters (SM), essentially an polynomial LMS fit in filter form (see Numerical Recipes) The SM filters retain the spectral shapes better than the ATF filters. I usually use a filterlength of 25-33 which seems to work fine for the data here. The latter ATF retain the spectral signal to noise ratio and remove noise more efficiently. The ATF has a varying length determined by the error bars in the data: length = desired_snr / (signal / noise) = desired_snr / snr For the supernove data I have seen the snr is in the range [15,30], so I usually put the desired snr to 250-300. ------------------------------------------------------------------------------ Changes: 2002-05-13: - Added choice of wavelength interval in parameter list in findline 2002-05-14: - Added simpson integration in signal.cpp. - Added energy ratio between normalized spectrum and planck curve (in -planck.txt) - Cleaned up fineline and put procedures as methods in specfind. 2002-05-15: - Now write energy ratio only if planck model converged. Also added energy window params in findline. 2002-05-21: - Added Levenberg-Marquardt method for fitting, seems to converge more often than (??) Conjugated gradient. It resides under a special namespace, LMS (see specfind.cpp) - Split files into specdata.cpp/h and specfind.cpp/h - findline has an extra option (method), 0 = NEWMAT, 1 = Levenberg-Marquardt. 2002-05-23: - Added annotation in output, now write spectral scale in "-norm.txt" 2002-05-24: - Added separate choice of fitting methods for both planck and line fits. - Added script "doplot" to generate gnuplot files, currently only for "-norm" files - Changed form factor measure to sqrt(2/s)/h - Added automatic generation of gnuplot files 2002-05-25: - Fixed crashbug in levmarq, now use real C++ exception handling - Fixed problem in trying to plot empty planck files - Added centralized names of files ("-norm" etc) in specdata.cpp. - Added namespaces and std::cerrfor better compatibility with windows 2002-05-28: - Fixed bug in calculation of normalized error bars. - Removed error-weights in one step of LM iteration - Changed criterion for LM-convergence. Now MUCH more liberal and produces good results - Added statistics (rms-error and DOF) in all "-lineparam" files. 2002-05-30: - Now perform Matrix inversion in 2 step seems to prevent NEWMAT from hanging and core dumps. - Only write Planck/spectrum energy ratio if values reasonable 2002-05-31: - Now scale spectral values and error bars with the same factor - Added model and data spectral minima (lambda) and corresponding form factors for both LM and CG methods. 2002-06-09: - Added model and data versions of the form factors in the parameter files. These give an indicator if the fit makes amy sense or not. - Removed the "min_deriv" and "max_deriv" and use "0" as default values. - Added 2 parameters: "fact_keep": The amount of spectral trough to use in fit. Might be useful for P-cygni profiles. "min_nofp" : Only do the above if there is sufficient number of data points. 2002-06-14: - Added recovery when one of LM or Gauss-Newton fails; try the other method first before signal a failure. - Added Triangular weigthing in moving average. Much better spectral properties, and variance reduction only slightly worse (4/3N instead of 1/N). - Now write most of the messages to log file instead of stderr. 2002-06-16: - Added adaptive filtering based on the normalized errors. At every point the spectrum is averaged, the length of which is determined by the ratio between the desired and current signal-to-noise ratio. Updated scripts accordingly. 2002-06-18: - Replaced moving average by a Savritzky-Golay (n,4) filter which preserves spectral peaks better than normal averaging. The adaptive filtering is still performed with the triangular filter, but the non-snr filter is a SG-filter. 2002-06-29: - Added a first attempt to determine the quality of the spectral fit (see specdata.cpp, SpectralLineProp). The fit is always done and the quality is written into dem and result files. 2002-09-03: - Should now work for the new gcc3 compiler series. 2003-08-15: - removed using namespace in .h files, upgraded to newmat11 Sören Molander, August 15, 2003