Private Database

specdb can generate a private database that is used in tandem with the public database(s). This document describes the procedure.

Notebooks

Setup

Directory Tree

The main input is a simple directory tree containing the FITS files of individual spectra. Each branch off the main tree generates a unique group in the database. Each branch is expected (although not required) to contain FITS files from a single instrument. Sub-folders within a branch are allowed but not recommended.

Here is an example of a directory tree (from the test dataset in specdb):

├── test_privateDB
|  ├── testDB_ztbl.fits
|  ├── ESI
|  |  ├── ESI_meta.json
|  |  ├── ESI_meta.ascii
|  |  ├── SDSSJ220758.30+125944.3_F.fits
|  |  ├── SDSSJ220758.30+125944.3_E.fits
|  ├── LRIS
|  |  ├── LRIS_meta.json
|  |  ├── SDSSJ230044.36+015541.7_r600_F.fits
|  |  ├── SDSSJ230044.36+015541.7_b400_F.fits
|  ├── COS
|  |  ├── COS_meta.json
|  |  ├── COS_ssa.json
|  |  ├── J095240.17+515250.03.fits.gz
|  |  ├── J095243.05+515121.15.fits.gz
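
To check a tree before building, it can help to list what each branch contains. The following is a minimal sketch using only the standard library; summarize_tree is a hypothetical helper, not part of specdb, and it looks only at the top level of each branch (no sub-folders). The demo builds a small throwaway tree that mimics part of the layout above.

```python
import tempfile
from pathlib import Path


def summarize_tree(tree_path):
    """Map each branch (sub-directory) of the tree to the files found in it."""
    summary = {}
    for branch in sorted(p for p in Path(tree_path).iterdir() if p.is_dir()):
        names = sorted(f.name for f in branch.iterdir() if f.is_file())
        summary[branch.name] = {
            # Candidate spectra: FITS files, optionally gzipped
            "fits": [n for n in names if n.endswith((".fits", ".fits.gz"))],
            # Optional per-branch parameter files
            "meta_json": [n for n in names if n.endswith("_meta.json")],
            "ssa_json": [n for n in names if n.endswith("_ssa.json")],
        }
    return summary


# Demo on an empty-file copy of part of the example tree
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "test_privateDB"
    for rel in (
        "ESI/ESI_meta.json",
        "ESI/SDSSJ220758.30+125944.3_F.fits",
        "COS/COS_ssa.json",
        "COS/J095240.17+515250.03.fits.gz",
    ):
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    summary = summarize_tree(root)

for group, files in summary.items():
    print(group, files)
```

Each key of the returned dict corresponds to one prospective group in the database.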

Meta parameter file

Every group in specdb includes a meta table, and there is a required set of columns (see Meta Data for the list and descriptions).

The code can automatically generate the meta table from the data files alone, but it is highly recommended that you guide this process by providing a meta parameter file in each branch of the tree. It must be a JSON file and it must end in _meta.json.

Here is an example file:

{
   "maxpix": 60000,
   "meta_dict": {
      "TELESCOPE": "HST"
   },
   "parse_head": {
      "DATE-OBS": "DATE",
      "GRATING": "OPT_ELEM",
      "INSTR": "INSTRUME",
      "R": true
   }
}

This example sets maxpix, the maximum number of pixels in any given file within the branch, to 60000 (default: 10000). The values in meta_dict are applied to all the files in the branch. The items in parse_head indicate which header keyword to use for retrieving the corresponding value for each individual file of the branch. These values may differ from file to file, and this is a convenient way to get them directly from the headers of the FITS files. The spectral resolution (R) may be calculated dynamically within specdb. That is the default, and it is requested explicitly with true in this example.
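
If you prefer to generate the parameter file from Python, a sketch along the following lines may be useful. write_meta_params is a hypothetical helper, not a specdb function, and naming the file after the branch (COS_meta.json) simply follows the example tree; the only stated requirement is that the name ends in _meta.json.

```python
import json
import tempfile
from pathlib import Path


def write_meta_params(branch, maxpix, meta_dict, parse_head):
    """Write <branch>/<branch name>_meta.json and return its path."""
    branch = Path(branch)
    params = {
        "maxpix": int(maxpix),
        "meta_dict": dict(meta_dict),
        "parse_head": dict(parse_head),
    }
    out = branch / f"{branch.name}_meta.json"
    with open(out, "w") as fh:
        # Python's True is serialized as the JSON literal true
        json.dump(params, fh, indent=3, sort_keys=True)
    return out


# Demo: reproduce the example file in a temporary COS branch
with tempfile.TemporaryDirectory() as tmp:
    cos = Path(tmp) / "COS"
    cos.mkdir()
    meta_path = write_meta_params(
        cos,
        60000,
        {"TELESCOPE": "HST"},
        {"DATE-OBS": "DATE", "GRATING": "OPT_ELEM", "INSTR": "INSTRUME", "R": True},
    )
    meta_name = meta_path.name
    reloaded = json.loads(meta_path.read_text())

print(meta_name, reloaded)
```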

Meta table

It is recommended that one extract as much of the meta data from the spectra files as possible. However, it is possible to include additional meta data (or to override the meta data in the spectra) by including a meta data table. The format is either ASCII or FITS, with a .ascii or .fits extension, and the file must be readable by astropy.table.Table.read().

SPEC_FILE is a required column which gives the name of the spectral file to match against the meta data.

Here is an example from the test suite (ESI_meta.ascii):

SPEC_FILE                         tGRB
SDSSJ172524.66+303803.9_F.fits    2009-11-23:10:12:13.2
SDSSJ220758.30+125944.3_F.fits    2007-08-13:10:22:23.3

This will add the time of the GRB to the meta data table.
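
Because rows are matched on SPEC_FILE, it can be worth checking that every entry names a file that is actually in the branch. The sketch below does this with the standard library only; both functions are hypothetical helpers, and the parser assumes a whitespace-delimited table whose values contain no spaces (specdb itself reads the table with astropy). How specdb treats rows without a matching file is not covered here.

```python
def read_spec_files(table_text):
    """Return the SPEC_FILE column of a whitespace-delimited ASCII table."""
    lines = [ln for ln in table_text.splitlines() if ln.strip()]
    header = lines[0].split()
    col = header.index("SPEC_FILE")  # ValueError if the required column is absent
    return [ln.split()[col] for ln in lines[1:]]


def unmatched_entries(table_text, branch_files):
    """List SPEC_FILE entries that do not name a file in the branch."""
    present = set(branch_files)
    return [name for name in read_spec_files(table_text) if name not in present]


table_text = """\
SPEC_FILE                         tGRB
SDSSJ172524.66+303803.9_F.fits    2009-11-23:10:12:13.2
SDSSJ220758.30+125944.3_F.fits    2007-08-13:10:22:23.3
"""
# File names of the ESI branch in the example tree
branch_files = ["SDSSJ220758.30+125944.3_F.fits", "SDSSJ220758.30+125944.3_E.fits"]

missing = unmatched_entries(table_text, branch_files)
print(missing)
```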

SSA info

specdb includes software to enable SSA queries of your database. For this to work, however, one must provide a few additional fields for each data group. These are provided in a JSON file in each branch whose name ends in _ssa.json.

The required keys are:

Key      Type  Description
Title    str   Title for the data group
flux     str   Sets units and UCD for the flux. Allowed values are flambda, normalized
fxcalib  str   Sets Calibration field. Allowed values are NORMALIZED, ABSOLUTE, RELATIVE

See the COS_ssa.json file in the test suite for an example.

One is also required to include a Publisher value. This defaults to 'Unknown', but can be set in the call to mk_db() or with the --publisher keyword in the script.
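
The required keys and allowed values can be checked before building. This is a minimal sketch; check_ssa is a hypothetical helper rather than part of specdb, and the Title strings are made-up placeholders.

```python
REQUIRED_SSA_KEYS = ("Title", "flux", "fxcalib")
ALLOWED_VALUES = {
    "flux": {"flambda", "normalized"},
    "fxcalib": {"NORMALIZED", "ABSOLUTE", "RELATIVE"},
}


def check_ssa(ssa):
    """Return a list of problems found in the contents of an _ssa.json file."""
    problems = []
    for key in REQUIRED_SSA_KEYS:
        if key not in ssa:
            problems.append(f"missing key: {key}")
        elif not isinstance(ssa[key], str):
            problems.append(f"{key} must be a string")
        elif key in ALLOWED_VALUES and ssa[key] not in ALLOWED_VALUES[key]:
            problems.append(f"{key}={ssa[key]!r} is not an allowed value")
    return problems


good = {"Title": "Example COS spectra", "flux": "flambda", "fxcalib": "ABSOLUTE"}
bad = {"Title": "Example", "flux": "fnu"}  # wrong flux value, no fxcalib

print(check_ssa(good))
print(check_ssa(bad))
```

In practice the dict would come from json.load() on the branch's _ssa.json file.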

Redshift table

Each unique source in your database (RA, DEC) is required to also have a redshift. This must be supplied as a separate Table with at least the following columns:

Key         Type   Description
RA          float  RA in degrees
DEC         float  DEC in degrees
ZEM         float  Redshift value
ZEM_SOURCE  str    Name of the source (e.g. SDSS, BOSS)

One can generate one's own table or specify any of the public specdb databases (e.g. igmspec). If you generate your own, place it at the top level of the database tree and give it a name ending in _ztbl.fits (see the example tree structure above).
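
Before writing your own redshift table, the rows can be checked against the required columns. The sketch below works on plain dicts so that it needs nothing beyond the standard library; check_redshift_rows is a hypothetical helper, the check is deliberately strict (an integer ZEM would be flagged), and the ZEM values are placeholders, not measurements.

```python
REQUIRED_COLUMNS = {"RA": float, "DEC": float, "ZEM": float, "ZEM_SOURCE": str}


def check_redshift_rows(rows):
    """Return (row index, message) pairs for rows that break the expected layout."""
    problems = []
    for i, row in enumerate(rows):
        for name, expected in REQUIRED_COLUMNS.items():
            if name not in row:
                problems.append((i, f"missing column {name}"))
            elif not isinstance(row[name], expected):
                problems.append((i, f"{name} should be {expected.__name__}"))
    return problems


rows = [
    # Coordinates in decimal degrees; ZEM is a placeholder value
    {"RA": 331.99292, "DEC": 12.99564, "ZEM": 2.5, "ZEM_SOURCE": "SDSS"},
    # Sexagesimal string for RA and no ZEM_SOURCE: both are flagged
    {"RA": "22:07:58.30", "DEC": 12.99564, "ZEM": 2.5},
]

problems = check_redshift_rows(rows)
print(problems)
```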

Spectra

Spectra will be ingested provided they can be read with linetools.spectra.io.readspec. You can test whether this is the case by running:

lt_xspec name_of_spectrum

on any of your files. The default is to take any FITS file in the branch (and sub-folders) except those files with these extensions: ‘c.fits’, ‘C.fits’, ‘e.fits’, ‘E.fits’, ‘N.fits’, ‘old.fits’.
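
The default selection can be approximated as follows. This is a sketch under two assumptions that are not confirmed by this document: that the listed strings are matched as literal endings of the file name, and that a gzipped file (x.fits.gz) is treated like x.fits. select_spectra is a hypothetical helper, not the specdb implementation.

```python
SKIP_ENDINGS = ("c.fits", "C.fits", "e.fits", "E.fits", "N.fits", "old.fits")


def select_spectra(filenames, skip=SKIP_ENDINGS):
    """Keep FITS files whose names do not end in one of the skipped endings."""
    keep = []
    for name in filenames:
        # Assumption: compare gzipped files by their name without the .gz
        base = name[:-3] if name.endswith(".gz") else name
        if base.endswith(".fits") and not base.endswith(tuple(skip)):
            keep.append(name)
    return keep


names = [
    "SDSSJ220758.30+125944.3_F.fits",
    "SDSSJ220758.30+125944.3_E.fits",  # ends in E.fits: skipped
    "J095240.17+515250.03.fits.gz",
    "ESI_meta.ascii",                  # not a FITS file
    "spec_old.fits",                   # ends in old.fits: skipped
]
selected = select_spectra(names)
print(selected)
```

Under literal suffix matching, a name such as J1234_blue.fits would also be skipped, because it ends in e.fits. It is worth choosing file names with that in mind.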

Another option is to feed the ingest_spectra() method an XSpectrum1D object with all the spectra, aligned to the input meta Table.

Quick go

Script

The database construction is intended to be run in one go with a single command from the command line. One uses the specdb_privatedb script. Here is the current usage:

usage: specdb_privatedb [-h] [--ztbl ZTBL] [--zspecdb ZSPECDB]
                        [--version VERSION] [--fname]
                        db_name tree_path outfile

Generate a private specdb DB

positional arguments:
   db_name            Name of your private DB
   tree_path          Path to the directory tree of spectral files
   outfile            Filename for the private DB HDF5

optional arguments:
   -h, --help         show this help message and exit
   --ztbl ZTBL        Name of data file containing redshift info
   --zspecdb ZSPECDB  Name of specdb DB to use for redshifts
   --version VERSION  Version of the DB; default is `v00`
   --fname            Parse RA/DEC from filename?

And here is an example of running it on the test DB:

cd specdb/specdb/data/
specdb_privatedb testDB test_privateDB tst_DB.hdf5

This will create a private DB called testDB from the directory tree test_privateDB; the database itself is contained in a single .hdf5 file named tst_DB.hdf5.
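
When the build is driven from another Python program, the command line can be assembled from the options in the usage message above. The sketch below only builds and prints the argument list; privatedb_command is a hypothetical helper, and actually running the command requires specdb to be installed.

```python
import shlex


def privatedb_command(db_name, tree_path, outfile, ztbl=None, zspecdb=None,
                      version=None, fname=False):
    """Build the argument list for the specdb_privatedb script."""
    cmd = ["specdb_privatedb"]
    if ztbl is not None:
        cmd += ["--ztbl", str(ztbl)]
    if zspecdb is not None:
        cmd += ["--zspecdb", str(zspecdb)]
    if version is not None:
        cmd += ["--version", str(version)]
    if fname:
        cmd.append("--fname")
    # Positional arguments, in the order given by the usage message
    cmd += [str(db_name), str(tree_path), str(outfile)]
    return cmd


cmd = privatedb_command("testDB", "test_privateDB", "tst_DB.hdf5", fname=True)
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```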

Within Python

Here is a call for the test database in one go from within Python:

import specdb
from astropy.table import Table
from specdb.build import privatedb as pbuild
# Read z table
ztbl = Table.read(specdb.__path__[0]+'/data/test_privateDB/testDB_ztbl.fits')
# Go
tree2 = specdb.__path__[0]+'/data/test_privateDB'
pbuild.mk_db('testDB', tree2, 'tst_DB.hdf5', ztbl, fname=True)

If ztbl == 'igmspec', the code will attempt to load the IgmSpec database and use the quasars catalog for redshifts.

Step by Step in Python

It is possible that you will need to customize things further. This section describes the step-by-step approach from within Python. Note that there is additional code within mk_db() that may be required.

Get Started

Start the main catalog and set your private ID_KEY:

from specdb.build import utils as spbu
id_key = 'TEST_ID'
maindb, tkeys = spbu.start_maindb(id_key)

This sets a global variable within specdb.build.utils.

Grab Files

The grab_files() method searches through a given branch to find all FITS files and a meta parameter file. By default, the code ignores any files with the following extensions: ‘c.fits’, ‘C.fits’, ‘e.fits’, ‘E.fits’, ‘N.fits’, ‘old.fits’.

Here is an example call:

branch = specdb.__path__[0]+'/data/test_privateDB/ESI/'
flux_files, meta_file, custom_meta_table = pbuild.grab_files(branch)

Meta

From the list of FITS files, a META table is generated. The redshift table must be supplied (as an astropy Table). Here is an example call:

meta = pbuild.mk_meta(flux_files, ztbl, fname=True, mdict=mdict, parse_head=pdict, skip_badz=True)

The fname flag indicates that the RA/DEC are to be parsed from the FITS filename. The skip_badz flag allows the code to skip sources that are not cross-matched to the redshift table (instead of terminating). Here mdict and pdict hold the meta_dict and parse_head entries of the meta parameter file.
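
To illustrate what parsing RA/DEC from a filename involves, here is a minimal sketch that converts a J-name (JHHMMSS.ss+DDMMSS.s) embedded in a filename to decimal degrees. radec_from_filename and its regular expression are hypothetical and written for this example; they are not the routine specdb uses.

```python
import re

# JHHMMSS(.ss)+/-DDMMSS(.s) anywhere in the filename
_JNAME = re.compile(
    r"J(\d{2})(\d{2})(\d{2}(?:\.\d+)?)([+-])(\d{2})(\d{2})(\d{2}(?:\.\d+)?)"
)


def radec_from_filename(filename):
    """Return (RA, DEC) in decimal degrees parsed from a J-name in filename."""
    match = _JNAME.search(filename)
    if match is None:
        raise ValueError(f"no J-name coordinates found in {filename!r}")
    ra_h, ra_m, ra_s, sign, de_d, de_m, de_s = match.groups()
    # Hours of RA are converted to degrees with a factor of 15
    ra = 15.0 * (int(ra_h) + int(ra_m) / 60.0 + float(ra_s) / 3600.0)
    dec = int(de_d) + int(de_m) / 60.0 + float(de_s) / 3600.0
    if sign == "-":
        dec = -dec
    return ra, dec


ra, dec = radec_from_filename("SDSSJ220758.30+125944.3_F.fits")
print(round(ra, 5), round(dec, 5))
```

The same function handles the gzipped COS names in the example tree, since the trailing .fits.gz is not part of the match.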

Add Groups and IDs

Update the main catalog and adjust a few tags in the meta table. The group dict is updated as well:

gdict = {}
flag_g = spbu.add_to_group_dict('COS', gdict)
maindb = pbuild.add_ids(maindb, meta, flag_g, tkeys, id_key, first=(flag_g==1))

The group dict is eventually written to the HDF5 file.

Ingest Spectra

The ingest_spectra() method loops through the spectral files, reads each, and populates an HDF5 dataset. It also converts the meta table into a separate, parallel HDF5 dataset.

Here is an example:

import h5py
hdf = h5py.File('tmp.hdf5','w')
pbuild.ingest_spectra(hdf, 'test', meta, max_npix=50000)

Finish

Before writing, the code tests whether the meta data tables can be stacked using the specdb.utils.clean_vstack() method. This may be required for queries of the meta data and spectra extraction. The code will hit a pdb.set_trace() if this fails.

Write the catalog and close the HDF5 file:

zpri = [str('SDSS'), str('BOSS')]
pbuild.write_hdf(hdf, 'TEST_DB', maindb, zpri, gdict, 'v01')

This call also writes the creation date and sets the version. zpri is a list of strings indicating the priority given to redshift assignment.