# CASSIS/LR pipeline documentation

The CASSIS pipeline is fully automatic, from image cleaning through spectral extraction to the creation of the output products. Please read this page entirely to make sure the spectra fit your quality criteria. Spectra are expected to be of publishable quality. If you find that this is not the case, please notify us with the problematic AORkeys so that future versions can be improved.

This documentation applies to the CASSIS spectra versions LR4, LR5, and LR6. Specific information is given for each version if needed.

The following is a more detailed version of the paper accompanying the online low-resolution database: "Cornell Atlas of Spitzer/IRS Spectra", Lebouteiller, V., Barry, D.J., Spoon, H.W.W., Bernard-Salas, J., Sloan, G.C., Houck, J.R., Weedman, D.W., 2011, ApJS, 196, 8. (ADS services)

Spectra are available via the online database where they can be downloaded along with various other products:

--- CASSIS online database ---

Here is a tree of the first part of the pipeline (until the flux calibrated spectra). The main and default branch is highlighted.

## 1. Input data

The input data for the CASSIS pipeline are the BCD images, more exactly the single exposure images also referred to as DCE images, for Data Collection Events. BCD images were processed by the SSC pipeline which removes the electronic and optical artifacts, including dark current, droop effect, non-linearity, radhit, jail-bar pattern, stray light, and flat-fielding. Exposure images are carried through the extraction pipeline along with the corresponding uncertainty image file and the mask of flagged pixels (BMASK).

The first public version of CASSIS/LR used data created with the SSC pipeline release S18.7.0. Internally, this version corresponded to the 4th iteration of the pipeline, so the products are labelled "LR4". The latest and final S18.18.0 calibration files are used for versions LR5 and LR6 of CASSIS. New public versions are expected whenever modifications in the algorithm warrant a new release. The online database is updated accordingly.

## 2. Image cleaning

The BMASK image contains the flag value of each pixel. CASSIS considers a pixel bad if its mask value is greater than 256. Pixels with such flag values have one or more of the following issues identified by the SSC BCD pipeline: the flat field could not be applied; the stray-light removal or crosstalk correction could not be applied; the pixel is saturated beyond correctable non-linearity in sample(s) along the ramp; data are missing in the downlink in sample(s) along the ramp; only one or no usable plane is available; the pixel is identified as permanently bad. We refer to the Spitzer/IRS documentation for more details on the flags. Although pixels with BMASK values above 256 could still be considered usable data, it is important that the pipeline remains conservative in order to minimize potential artifacts.
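The flagging rule above can be sketched in a few lines. This is only an illustration: the function name `flag_bad_pixels` and the toy mask values are hypothetical, and the mask is represented as a plain list of rows rather than a FITS image.

```python
BMASK_BAD_THRESHOLD = 256  # threshold quoted in the text


def flag_bad_pixels(bmask, threshold=BMASK_BAD_THRESHOLD):
    """Return a map that is True where the BMASK value exceeds the threshold."""
    return [[value > threshold for value in row] for row in bmask]


# Toy 2x3 mask (illustrative values, not taken from a real BCD file).
bmask = [[0, 256, 512],
         [4096, 128, 257]]
bad = flag_bad_pixels(bmask)
```

Note that a value of exactly 256 is not flagged, since the criterion is "greater than 256".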

Another group of bad pixels, referred to as "rogue" pixels, are pixels that misbehave over long periods of time, randomly switching states on much shorter timescales. They cannot be properly calibrated. The SSC has released a series of masks identifying the long-term rogues for each IRS campaign. All the bad pixels, including rogue pixels, are cleaned using the IRSCLEAN tool. The badfix method is used with the BMASK file along with a conservative "super-rogue" mask that combines rogue pixels from the relevant observation campaign and earlier campaigns. We refer to the IRSCLEAN documentation for more details on the cleaning process.

Note that when several bad pixels are contiguous, the pixel replacement algorithm badfix will not fix the pixel(s) in the middle of the cluster. Further pixel verifications are performed in the next CASSIS pipeline steps to attempt a correction.

The uncertainty file is cleaned using the same mask as for the data image. The BMASK values are updated, i.e., the pixels that were cleaned successfully are changed to a null mask value. Check the error propagation page for more details.

## 3. Combination of exposures

Before co-adding the individual exposure images (DCEs), CASSIS checks the dispersion of the reconstructed coordinates over the observation. If the dispersion is significant, co-adding the images would blur the source spatial profile, so that the optimal extraction could become unreliable. For this reason, CASSIS co-adds the exposures only when the coordinate dispersion is lower than a certain threshold (currently 0.1 pixel on the detector).
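A minimal sketch of this check is given below. The text does not state how the dispersion is computed; the standard deviation of the per-exposure source positions, expressed in detector pixels, is assumed here, and `can_coadd` is a hypothetical name.

```python
import statistics

DISPERSION_THRESHOLD_PX = 0.1  # threshold quoted in the text


def can_coadd(positions_px, threshold=DISPERSION_THRESHOLD_PX):
    """Decide whether exposures can be co-added before extraction.

    positions_px: reconstructed source position of each exposure, in pixels.
    Assumption: the dispersion is the population standard deviation.
    """
    if len(positions_px) < 2:
        return True  # nothing to disperse with a single exposure
    return statistics.pstdev(positions_px) < threshold


stable = can_coadd([10.00, 10.02, 9.99])
drifting = can_coadd([10.0, 10.4, 9.7])
```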

Several cases can be distinguished for the image co-addition depending on the number of exposures available.

1 exposure. Single exposures are not modified and simply transferred to the next step.

2 exposures. Rather than performing a simple combination (such as an error-weighted average), the pipeline takes advantage of having 2 exposures to flag bad pixels that were not cleaned or could not be cleaned to produce a better result. First, pixels are compared using their BMASK value and if one pixel has a higher value than the other, it is ignored. Further bad pixel flagging is achieved by analyzing each column of the image separately. A column can be seen almost as a spectrum, since the wavelength axis is almost parallel to the detector y-axis. Outliers are then identified using their deviation from the local flux variance (see figures below). When the 2 pixels are equally bad, the average is taken. In other cases when one pixel dominates the difference to the local variance, only the pixel from the other image is used. All the other pixels that were not flagged as outliers have error-weighted average fluxes.

The final uncertainty on the combined pixel is the sum of the average of the individual errors and of the flux difference between the 2 pixels (to account for the fact that RMS errors in the input images might be underestimated in some cases). Check the error propagation page for more details.
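For a pair of pixels that were not flagged as outliers, the combination and its uncertainty can be sketched as follows. The function name is hypothetical, and the error-weighted average is assumed to use inverse-variance weights, which the text does not specify.

```python
def combine_two_pixels(f1, e1, f2, e2):
    """Combine the same pixel from 2 exposures.

    Flux: error-weighted average (inverse-variance weights assumed).
    Error: average of the individual errors plus the absolute flux difference.
    """
    w1, w2 = 1.0 / e1 ** 2, 1.0 / e2 ** 2
    flux = (w1 * f1 + w2 * f2) / (w1 + w2)
    error = 0.5 * (e1 + e2) + abs(f1 - f2)
    return flux, error


flux, error = combine_two_pixels(10.0, 1.0, 12.0, 1.0)
```

In this example the two pixels differ by twice their quoted errors, so the flux-difference term dominates the final uncertainty.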

Fig: Pixels are flagged in the difference function between the 2 columns using their deviation from the local variance.

More than 2 exposures. For each pixel, the median over all the images is averaged with the error-weighted average. This is done to minimize systematic errors due to unreliable uncertainty values.

Uncertainties are combined accordingly, i.e., taking the average between the error on the median flux and the average error. Check the error propagation page for more details.
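The flux combination for more than 2 exposures can be illustrated as below. Only the flux is shown, since the error on the median is described on the error propagation page; the function name and the inverse-variance weighting are assumptions.

```python
import statistics


def combine_pixel(fluxes, errors):
    """Average the median with the error-weighted mean of one pixel."""
    weights = [1.0 / e ** 2 for e in errors]  # inverse-variance weights (assumed)
    weighted_mean = sum(w * f for w, f in zip(weights, fluxes)) / sum(weights)
    return 0.5 * (statistics.median(fluxes) + weighted_mean)


# One exposure is hit by an outlier (30.0) whose error is not inflated.
value = combine_pixel([10.0, 11.0, 30.0], [1.0, 1.0, 1.0])
```

Here the weighted mean alone would give 17.0, while the median is 11.0; averaging the two limits the impact of a discrepant value whose uncertainty is unreliable.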

In a few cases (representing 0.35% of the observations) there is a significant dispersion over the DCEs (0.1 pixels, i.e., about 0.2'' in SL and 0.5'' in LL) that requires the pipeline to perform steps in a different order:

• The low-level rogue pixels are removed using all the available background images (see next section). For this reason, co-added images are always calculated, regardless of the coordinate dispersion.
• The individual exposures are extracted separately.
• The corresponding spectra are combined, resulting in one spectrum per module, order, and nod position. Nod spectra are then combined.

At this point of the pipeline, there is one image per module, order and nod position. Note that for staring cluster observations, there can be several cycles with several exposures for a given position. In this case, the spectra corresponding to the various cycles are first combined.

## 4. Removal of the background emission and of low-level rogue pixels

Although images were cleaned, low-level rogue pixels can remain. It is possible to remove their contribution to the intrinsic source emission by subtracting a background image or a set of images from the same module. This step also subtracts the large-scale background emission (mostly dominated by zodiacal dust emission).

Two methods are used:

• Subtraction by order. The two nod images corresponding to the other order are averaged (error-weighted). Contamination at the location of the current nod extraction is also checked (see Subtraction by nod below). Since the by order background is more distant from the science source than the by nod background, it is possible that the by order background does not fully remove the large-scale background emission (or in some cases it overcorrects). The residual background is weak but important to take into account for faint sources. While in LR4 and LR5 no further correction was performed other than the regular background subtraction, the LR6 pipeline removes the residual background that remains after the background image subtraction. The residual background is calculated as the median of the spatial profile (ignoring the source) for each row (i.e., each wavelength). The resulting background is then smoothed as a function of wavelength, assuming that the residual background is due to the zodiacal dust (or due to an extended dust component in the source).
• Subtraction by nod. The (single) image corresponding to the other nod position is used for the difference. The SMART-AdOpt algorithm first checks if there is a point-like source or partially extended source at the location of the current nod extraction. The contaminating source detection and spatial extent are registered. Very extended emission is not a problem since it does not significantly modify the source's profile, and it can be removed during extraction. For very faint sources, the background removed by differencing the 2 nod images could result in a significant residual background offset. In LR7, the local background is removed for nod-subtracted images, similarly to what was done for order-subtracted images in LR6 (see above).
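The residual background removal described for order-subtracted images can be sketched as follows. This is an illustration under stated assumptions: the image is a list of rows (one row per wavelength), the columns covered by the source are supplied by the caller, and the smoothing is a simple running mean, whereas the actual smoothing used by the pipeline is not specified here. All names are hypothetical.

```python
import statistics


def residual_background(image, source_columns, half_width=1):
    """Row-by-row median of the spatial profile, smoothed along wavelength."""
    per_row = [
        statistics.median(v for c, v in enumerate(row) if c not in source_columns)
        for row in image
    ]
    smoothed = []
    for i in range(len(per_row)):
        lo = max(0, i - half_width)
        hi = min(len(per_row), i + half_width + 1)
        smoothed.append(sum(per_row[lo:hi]) / (hi - lo))  # running mean (assumed)
    return smoothed


def subtract_residual(image, background):
    """Subtract one background value per row (i.e., per wavelength)."""
    return [[v - b for v in row] for row, b in zip(image, background)]


# Toy image: the source sits in column 2 and a small offset grows with row index.
image = [[1.0, 1.0, 9.0, 1.0],
         [2.0, 2.0, 9.0, 2.0],
         [3.0, 3.0, 9.0, 3.0]]
background = residual_background(image, source_columns={2})
cleaned = subtract_residual(image, background)
```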

When the image difference is performed, uncertainties are combined quadratically. Check the error propagation page for more details.
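As a short illustration of the quadratic combination, the sketch below subtracts a background image from a science image and propagates the uncertainties pixel by pixel. The function name and the toy values are hypothetical.

```python
import math


def subtract_with_errors(sci, sci_err, bkg, bkg_err):
    """Image difference with uncertainties added in quadrature."""
    diff = [[s - b for s, b in zip(s_row, b_row)]
            for s_row, b_row in zip(sci, bkg)]
    err = [[math.hypot(es, eb) for es, eb in zip(es_row, eb_row)]
           for es_row, eb_row in zip(sci_err, bkg_err)]
    return diff, err


diff, err = subtract_with_errors(
    sci=[[10.0, 8.0]], sci_err=[[3.0, 0.6]],
    bkg=[[4.0, 3.0]], bkg_err=[[4.0, 0.8]],
)
```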

In some cases, there are contaminating sources in both the by nod and by order images, so that a third method is used, referred to as "in situ", which simply removes the extended emission during optimal extraction, i.e., without removing the low-level rogue pixels. This is the last resort to remove extended emission. The by nod and by order methods usually produce far better results.

The spectral extraction using the various background subtraction methods can be checked on the webpage. The best background subtraction is chosen automatically for the default spectrum (see Section 8), but all the spectra are available upon request.

## 5. Pre-extraction diagnostic

### 5.1 Spatial extent

All the images (exposure-combined, background-subtracted) go through the SMART-AdOpt program, which first estimates the source spatial extent. The source extent diagnostic is essential to determine which extraction method is suitable. For point-like sources, the preferred extraction is the optimal extraction (i.e., the PSF is used to scale the spatial profile of the source) which produces the best SNR. For partially-extended sources, a tapered column extraction is preferred (i.e., the flux is integrated in a given spatial window) since it recovers most of the source’s flux. For very extended sources, a full slit extraction is more appropriate.

To derive the spatial extent, AdOpt computes the ratio between the full width at half maximum (FWHM) of the source spatial profile and that of the PSF profile. Both profiles are collapsed over the 20 detector rows corresponding to the shortest wavelengths, i.e., where the PSF is the narrowest. The resulting ratio is directly related to the intrinsic spatial extent of the source:

S =  P \sqrt{x^2-1},

where S is the intrinsic FWHM of the source, P is the FWHM of the PSF, and x is the ratio between the FWHM of the observed profile and the FWHM of the PSF. Note that this method assumes that the source can be thought of as a broadened PSF, so it is reliable only for partially extended sources.
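The relation is what one obtains when the intrinsic width and the PSF width add in quadrature, as they do for Gaussian profiles: (xP)^2 = S^2 + P^2. A minimal numerical sketch follows; the function name is hypothetical, and returning zero for profiles that are not broader than the PSF is an assumption made here to keep the square root defined.

```python
import math


def intrinsic_fwhm(observed_fwhm, psf_fwhm):
    """Intrinsic source FWHM, S = P * sqrt(x^2 - 1), with x = observed / PSF."""
    x = observed_fwhm / psf_fwhm
    if x <= 1.0:
        return 0.0  # assumption: treat the source as unresolved
    return psf_fwhm * math.sqrt(x * x - 1.0)


# Observed FWHM of 5 px with a PSF FWHM of 3 px gives S = 4 px (3-4-5 triangle).
s = intrinsic_fwhm(5.0, 3.0)
```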

The global extent of a source is calculated using the average of the S values derived for each module and order. Weights are applied to each value based on the detection level and assuming a systematic 5% uncertainty on the spatial profile FWHM determination.

Note that values of S can be different for each module measurement because the intrinsic source extent might vary with wavelength.

### 5.2 Multiple sources (applies to version 4 of CASSIS)

Before the extraction of the intended source is performed, the presence of another source in the slit is checked. Depending on the relative brightness between the intended source and the contaminating source, it is possible that the local background emission is not well determined. If there is a positive detection of a contaminating source within the slit, the extraction continues but a flag is carried through for the corresponding module/order spectrum and appears in the result page of the online interface. In such cases the users can check the diagnostics plots provided by the interface to judge the quality of the extraction.

Versions LR5 and up: the residual background is removed differently than in version LR4. While in version LR4 the residual emission was estimated simultaneously with the PSF scaling factor, in versions LR5 and up the residual emission is removed before the extraction (Section 4).

## 6. Optimal extraction

Optimal extraction is performed with the SMART-AdOpt algorithm, which makes use of an empirical super-sampled PSF (Lebouteiller et al. 2010). It is necessary to use a super-sampled PSF since it can be shifted and resampled anywhere along the slit, making the optimal extraction valid for any source position. For this reason the AdOpt optimal extraction is an ideal choice for extracting the full sample of IRS observations.

Since the CASSIS atlas is primarily meant to provide the spectra of targeted sources (as opposed to serendipitous sources), the source finder is limited to positions around the requested nod, with a range of +/-2 px to account for slight mispointings (changed to +/-3 px in version LR5). The source position is then fine-tuned to an accuracy of better than a tenth of a pixel.

For version LR4: Depending on the geometry, local extended emission might still be present despite the background image subtraction. This is because the background image does not correspond exactly to the background at the source's position. For this reason, a 0-degree polynomial background is calculated on-the-fly by AdOpt. It is important that other bright sources are not contaminating the observation so that the polynomial background is well estimated. Note that when a serendipitous (contaminating) source is found in the slit, the 0-degree polynomial subtraction is skipped because it might introduce systematic errors. In some cases, only 1 nod is affected by this problem, while for the other one a local background subtraction is performed. This implies that a flux offset could be observed between the 2 nod spectra. This problem was solved in version LR5 and up.

In order to calculate the flux at a given wavelength, SMART-AdOpt performs a multiple linear regression (Lebouteiller et al. 2010). Pixels from the input image have weights defined as 1/σ, where σ is the propagated uncertainty.

The uncertainty on the extracted flux is given by the (1σ) error output from the multiple linear regression algorithm. Check the error propagation page for more details.
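To illustrate the idea, the sketch below fits one detector row as a scaled PSF plus a constant (the LR4-style 0-degree polynomial background) by weighted least squares. It is not the SMART-AdOpt implementation: the function name, the two-component model, and the toy PSF are assumptions. Scaling each pixel equation by 1/σ is equivalent to the 1/σ² terms that appear in the normal equations below.

```python
import math


def fit_psf_scale(data, sigma, psf):
    """Weighted least squares for data ~ a * psf + b.

    Returns (a, b, sigma_a), where sigma_a is the 1-sigma error on a
    taken from the covariance matrix of the fit.
    """
    w = [1.0 / s ** 2 for s in sigma]  # (1/sigma)^2 from row scaling by 1/sigma
    s_pp = sum(wi * p * p for wi, p in zip(w, psf))
    s_p1 = sum(wi * p for wi, p in zip(w, psf))
    s_11 = sum(w)
    s_pd = sum(wi * p * d for wi, p, d in zip(w, psf, data))
    s_1d = sum(wi * d for wi, d in zip(w, data))
    det = s_pp * s_11 - s_p1 ** 2
    a = (s_11 * s_pd - s_p1 * s_1d) / det
    b = (s_pp * s_1d - s_p1 * s_pd) / det
    sigma_a = math.sqrt(s_11 / det)
    return a, b, sigma_a


# Noise-free toy row: source flux 2.0 on top of a constant background of 0.5.
psf = [0.1, 0.2, 0.4, 0.2, 0.1]
data = [2.0 * p + 0.5 for p in psf]
sigma = [1.0] * len(psf)
a, b, sigma_a = fit_psf_scale(data, sigma, psf)
```

With noise-free input the fit recovers the injected scale and background; in versions LR5 and up the background term would already have been removed before this step.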

Important caveats for version LR4: The flux uncertainties were underestimated in version LR4. The problem was fixed in version LR5.

A quantitative detection level is calculated by the AdOpt algorithm:

• In CASSIS LR4, the detection level represents the percentage improvement between the initial image and the residual image in which the source has been subtracted. For simplicity, the detection level is recoded as an integer value illustrating the quality of the detection:
  • 0 for sources not detected, 1 for tentatively detected, 2 for barely detected, 3 for well detected, 4 for very well detected.
  • The global detection level given in the 'Source Properties' box is the maximum level among the module/order/nod spectra.
• In CASSIS LR5 and up, the detection level approximates the detection in sigma units, averaged over the wavelength range for a given module/order/nod.
  • The global detection level given in the 'Source Properties' box is the maximum detection, in sigma, among the module/order/nod spectra, ignoring order 3 spectra.

The table below gives the correspondence between the detection level (DL) and the local SNR per pixel (mean of the median SNR per pixel):

| Module | DL1 | DL2 | DL3 | DL4 |
|--------|------|-------|-------|--------|
| SL2 | 2.75 | 13.45 | 22.73 | 38.85 |
| SL1 | 2.29 | 14.54 | 22.69 | 49.53 |
| LL2 | 2.87 | 6.86 | 26.13 | 85.44 |
| LL1 | 2.51 | 6.20 | 16.81 | 116.00 |

Important: For partially extended sources or extended sources, the detection level will be low, because even after removing a point-like source, the residuals can be comparable to, if not larger than, the point-like source flux.

### 6.1 Wavelength grid

As explained in Lebouteiller et al. (2010), AdOpt extracts detector rows instead of pseudo-rectangles (corresponding to a zone in the image where pixels have the same wavelength). As a result, the wavelength grid depends on the exact source position. The wavelength grid can also be interpolated on a common reference grid (the SSC "wavesamp" calibration file). By default, CASSIS uses the interpolated spectra in order to provide wavelength grids compatible with the tapered column extractions. For convenience, the observed (exact) wavelength grid is also provided in the later steps (see the SMART online documentation for more details).

By keeping the observed wavelength grid, the spectral resolution is slightly better than what can be achieved with the optimal extraction of the SSC SPICE software, at the expense of SNR. In contrast, the SNR obtained when interpolating the wavelength grid is similar to that of the optimal extraction in SPICE. It must be kept in mind, however, that the wavelength grid interpolation results in a smoother spectrum but can make two bad pixels out of one (which is also an undesired effect of pseudo-rectangle extractions).

The CASSIS webpage provides access to the plots of spectra obtained for the various wavelength grids explained above. The spectra themselves are available upon request.

### 6.2 Nod spectra combination

Although the spectrum of each nod is available as a product (upon request), CASSIS provides by default a single, nod-combined spectrum. The combination process depends on whether the spectra were interpolated or not (Section 6.1). We discuss both methods separately.

• Reference ("wavesamp") wavelength grid. The two nod spectra are on the same reference wavelength grid. By default, the co-added spectrum is the error-weighted average of the two nod spectra. The error function (difference between the spectra) is calculated, smoothed using a multi-resolution algorithm, and used to identify outliers in the individual nod spectra. Pixels are corrected according to their relative discrepancy, i.e., if the pixels in both nod spectra are outliers, the error-weighted average is used, but if only one pixel is an outlier, the other nod spectrum is used.
The errors are interpolated the same way as the flux. Check the error propagation page for more details.

• Observed wavelength grid. The two nod spectra are first interleaved and aligned (see Figure 4 in Lebouteiller et al. 2010). Alignment is performed by calculating the smoothed error function (difference between the spectra). The result is split equally between the two nods to align the spectra. Outliers are identified essentially the same way as for the interpolated nod spectra. The only difference is that a pixel identified as an outlier is flagged and given a not-a-number (NaN) value. The final spectrum (one per module: SL, LL) has twice as many points as the input nod spectra. It is referred to as the "fully sampled" spectrum.

Fig: Pixels are flagged in the difference function between the 2 nods using their deviation from the local variance.

• Another version of the final spectrum is calculated by interpolating the interleaved spectrum on the reference wavelength grid. Note that this is different from the co-addition of interpolated nod spectra. The interpolation of the interleaved spectrum includes more points and is usually more accurate.
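The two combination modes described in this section can be sketched as follows. The outlier flags are taken as inputs because the multi-resolution smoothing used to identify them is not reproduced here, and the alignment step of the interleaved spectra is skipped. Function names and the inverse-variance weighting are assumptions.

```python
import math


def coadd_on_reference_grid(flux1, err1, flux2, err2, outlier1, outlier2):
    """Combine two nod spectra sampled on the same reference grid."""
    combined = []
    for f1, e1, f2, e2, o1, o2 in zip(flux1, err1, flux2, err2, outlier1, outlier2):
        if o1 and not o2:
            combined.append(f2)  # only nod 1 is an outlier: use nod 2
        elif o2 and not o1:
            combined.append(f1)  # only nod 2 is an outlier: use nod 1
        else:
            # Neither or both flagged: error-weighted average.
            w1, w2 = 1.0 / e1 ** 2, 1.0 / e2 ** 2
            combined.append((w1 * f1 + w2 * f2) / (w1 + w2))
    return combined


def interleave(wave1, flux1, outlier1, wave2, flux2, outlier2):
    """Merge two nod spectra on their observed grids; outliers become NaN."""
    points = [(w, math.nan if o else f) for w, f, o in zip(wave1, flux1, outlier1)]
    points += [(w, math.nan if o else f) for w, f, o in zip(wave2, flux2, outlier2)]
    points.sort(key=lambda point: point[0])  # order by wavelength
    return points


coadded = coadd_on_reference_grid(
    [1.0, 2.0, 3.0], [0.1, 0.1, 0.1],
    [1.2, 9.0, 3.0], [0.1, 0.1, 0.1],
    [False, False, False], [False, True, False],
)
full = interleave([5.0, 5.2], [1.0, 2.0], [False, False],
                  [5.1, 5.3], [1.5, 2.5], [False, True])
```

The interleaved result has twice as many points as each input spectrum, which corresponds to the "fully sampled" spectrum.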

The CASSIS webpage provides access to the plots of spectra obtained for the various wavelength grids explained above. The spectra themselves are available upon request.

### 6.3 Flux calibration

Important note on flux calibration: The first public version (internally "v4") used S18.7.0 data. CASSIS spectra version LR5 and up use the final calibration, i.e., S18.18.0.

The flux density is converted from e-/sec to Jy using the default option in SMART-AdOpt, i.e., a relative spectral response function (RSRF) derived from observations of calibration stars. For S18.7.0 data, a "super-RSRF" was created from 3 different calibration stars (HR6348, HD173511, and HR7341).

The errors are also multiplied by the RSRF. Check the error propagation page for more details.
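As a short illustration, the sketch below applies the same wavelength-dependent factor to the flux and to the error, as described above. The function name and the numerical values are hypothetical; real RSRF values come from the calibration files.

```python
def apply_rsrf(flux_e_per_s, err_e_per_s, rsrf):
    """Convert flux and error from e-/sec to Jy with the same RSRF factor."""
    flux_jy = [f * r for f, r in zip(flux_e_per_s, rsrf)]
    err_jy = [e * r for e, r in zip(err_e_per_s, rsrf)]
    return flux_jy, err_jy


flux_jy, err_jy = apply_rsrf([100.0, 200.0], [10.0, 10.0], [0.01, 0.02])
# The SNR of each point is unchanged by the conversion.
snr = [f / e for f, e in zip(flux_jy, err_jy)]
```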

The flux calibration for optimal extraction assumes that the source is point-like. If the source is partially extended, then the tapered column extraction should be used instead. Using the point-like source calibration on a partially extended source would result in an underestimated flux density and it could slightly modify the spectrum slope and the relative fluxes of features at different wavelengths.

Additional calibration is required in cases when the source is not centered in the slit in the dispersion direction. In such cases, a fraction of the PSF lies outside the slit so that the regular optimal extraction fails to fit the proper profile, providing an underestimated flux. Although the manual extraction of AdOpt can solve this problem by modifying the PSF profile on-the-fly as a function of the shift in the dispersion direction, this is not possible automatically. Instead, CASSIS checks the PTGDIFFY header keyword, which gives the pointing error in the dispersion direction between the coordinates requested by the user and the field of view coordinates (effective coordinates at the nod position in the current module and order). This is by no means definite proof of a genuine offset, as the MIR centroid might not coincide with the requested coordinates. In any case, the data are flagged if PTGDIFFY is significantly large.

For each background subtraction method, the pipeline tree continues as follows. The main and default branch is highlighted.

## 7. Defringing

Fringes are common in infrared detectors (e.g., Kester et al. 2003). For Spitzer/IRS, spectra from the LL1 module are the most affected. The IRSFRINGE algorithm is used by CASSIS to remove the fringes in that module only. Since defringing can produce undesired artifacts for sources with a low SNR, only sources with LL1 spectra with SNR>5 are considered.

Since defringing is a complex process that relies on several hypotheses and parameters, we also make the uncorrected spectra available as a product upon request. Contact the CASSIS team for more information.

The fluxes and errors are corrected the same way when removing the fringes. Check the error propagation page for more details.

## 8. Creating the default optimal spectrum

The default spectrum is calibrated and defringed.

The main parameter that decides the default spectrum is the way the background was subtracted (Section 4). In practice, no subtraction method is always better than the others. For this reason, the method is chosen using the following constraints, in order of priority:

• no contaminating source in the background image. A contaminating source is defined as being point-like or partially-extended, with a detection level equal to or above 2 (at least barely detected), and with a signal-to-noise ratio of at least 10% of that of the main source.
• a signal-to-noise ratio higher than that of the other methods
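The priority rule can be sketched as a sort on two keys. The data layout, the function names, and the example values are hypothetical, and the "in situ" fallback used when both background images are contaminated (Section 4) is not modelled.

```python
def is_contaminating(extent, detection_level, snr, main_snr):
    """Contaminating source criteria as listed above."""
    return (
        extent in ("point-like", "partially-extended")
        and detection_level >= 2
        and snr >= 0.1 * main_snr
    )


def choose_default(candidates):
    """Pick a method: uncontaminated first, then highest SNR.

    candidates: list of dicts with keys 'method', 'contaminated', 'snr'.
    """
    best = min(candidates, key=lambda c: (c["contaminated"], -c["snr"]))
    return best["method"]


candidates = [
    {"method": "by nod", "snr": 50.0,
     "contaminated": is_contaminating("point-like", 3, 12.0, 50.0)},
    {"method": "by order", "snr": 30.0,
     "contaminated": is_contaminating("point-like", 1, 2.0, 30.0)},
]
default_method = choose_default(candidates)
```

In this example the by nod spectrum has the higher SNR, but its background image contains a well-detected contaminating source, so the by order spectrum is selected.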

## 9. Default extraction method

Depending on the source spatial extent, the tapered column extraction might be preferred (see Other extraction methods). While the choice was left to the user in LR4, with diagnostic messages as help, the extraction method is now chosen automatically in LR5 and up. A message accompanies the spectrum displayed on the webpage to explain the choice.

The alternative extraction method and corresponding products are always available through the webpage. We encourage users to verify the choice made by CASSIS by checking the spatial profiles and by comparing the optimal extraction to the tapered column spectra.

## 10. Website

The following notes explain some of the information displayed on the website. More information will be added as needed.

Observation panel

• The object name is the name given by the observer. The resolved name is given in the pipeline panel (below the spectrum, unroll the twisty called "Click here to access the information provided by the pipeline...".)
• The coordinates correspond to the position where the spectral extraction was performed. They may differ, sometimes significantly, from the coordinates requested by the observer. This usually happens when the observer chose peak-up acquisition to point the telescope at the source. In such cases, the target was chosen as the brightest infrared source in the field.

Pipeline panel

• The CASSIS website provides the redshift (when available) using the extracted coordinates as constraints to query NED.

Log file

• Click on the twisty "Click here to display the full log file" to open the log file. This is a long text file describing all the pipeline steps. Warnings, if present, are also copied in the pipeline information section. The log file may be made of several consecutive pipeline runs, some of them starting at various starting points within the pipeline.