Private Database¶

It is possible within specdb to generate a private database that can be used in tandem with the public database(s). The following document describes the procedure.

Notebooks¶

Private

Setup¶

Directory Tree¶

The main input is a simple directory tree containing the FITS files of individual spectra. Each branch off the main tree generates a unique group in the database. It is expected (although not required) that each branch contains FITS files from a single instrument. Sub-folders within a branch are allowed but not recommended.

Here is an example of a directory tree (from the test dataset in specdb):

├── test_privateDB
|  ├── testDB_ztbl.fits
|  ├── ESI
|  |  ├── ESI_meta.json
|  |  ├── ESI_meta.ascii
|  |  ├── SDSSJ220758.30+125944.3_F.fits
|  |  ├── SDSSJ220758.30+125944.3_E.fits
|  ├── LRIS
|  |  ├── LRIS_meta.json
|  |  ├── SDSSJ230044.36+015541.7_r600_F.fits
|  |  ├── SDSSJ230044.36+015541.7_b400_F.fits
|  ├── COS
|  |  ├── COS_meta.json
|  |  ├── COS_ssa.json
|  |  ├── J095240.17+515250.03.fits.gz
|  |  ├── J095243.05+515121.15.fits.gz
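
To see how a tree like this maps onto groups, the following sketch lists each branch and the FITS files beneath it using only the standard library. The helper name list_branches is hypothetical and is not part of specdb; the actual ingestion is done by specdb itself, and the files created here are empty stand-ins.

```python
from pathlib import Path
import tempfile

def list_branches(tree_path):
    """Map each branch (sub-directory) of the tree to its sorted FITS file names.

    Hypothetical inspection helper; not part of specdb.
    """
    branches = {}
    for branch in sorted(Path(tree_path).iterdir()):
        if not branch.is_dir():
            continue  # top-level files (e.g. the redshift table) are not groups
        # rglob also descends into any sub-folders of the branch
        fits = sorted(p.name for p in branch.rglob("*")
                      if p.is_file() and p.name.endswith((".fits", ".fits.gz")))
        branches[branch.name] = fits
    return branches

# Build a tiny stand-in tree with empty files and inspect it
with tempfile.TemporaryDirectory() as tmp:
    tree = Path(tmp) / "test_privateDB"
    (tree / "ESI").mkdir(parents=True)
    (tree / "COS").mkdir()
    (tree / "testDB_ztbl.fits").touch()
    (tree / "ESI" / "ESI_meta.json").touch()
    (tree / "ESI" / "SDSSJ220758.30+125944.3_F.fits").touch()
    (tree / "COS" / "J095240.17+515250.03.fits.gz").touch()
    groups = list_branches(tree)

print(groups)
```

Running a check like this before building the database is a quick way to confirm that every branch holds the files you expect.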

Meta parameter file¶

Every group in specdb includes a meta table with a required set of columns (see Meta Data for the list and descriptions).

The code can automatically generate the meta table from the data files alone, but it is highly recommended that you guide this process by providing a meta parameter file in each branch of the tree. It must be a JSON file and it must end in _meta.json.

Here is an example file:

{
   "maxpix": 60000,
   "meta_dict": {
      "TELESCOPE": "HST"
   },
   "parse_head": {
      "DATE-OBS": "DATE",
      "GRATING": "OPT_ELEM",
      "INSTR": "INSTRUME",
      "R": true
   }
}

This example sets the maximum number of pixels in any given file within the branch (default: 10000). The values in meta_dict are applied to all the files in the branch. The items in parse_head indicate which header keyword to use for retrieving the corresponding value for each individual file of the branch. These values may differ from file to file, and this is a convenient way to get them directly from the headers of the FITS files. The spectral resolution (R) may be calculated dynamically within specdb. That is the default, and it is requested explicitly with true in this example.
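
The parse_head mapping can be pictured as a lookup from meta column to header keyword. The sketch below applies such a mapping to a plain dict standing in for a FITS header. The function apply_parse_head and the header values are made up for illustration; specdb's own handling may differ in detail.

```python
def apply_parse_head(header, parse_head):
    """Return {meta column: header value} for each column mapped to a header keyword.

    Illustrative only; not specdb's implementation.
    """
    values = {}
    for column, keyword in parse_head.items():
        if not isinstance(keyword, str):
            continue  # e.g. "R": true is a flag, not a header keyword
        if keyword in header:
            values[column] = header[keyword]
    return values

# A plain dict stands in for a FITS header; the values are invented
header = {"DATE": "2010-01-01", "OPT_ELEM": "G130M", "INSTRUME": "COS"}
parse_head = {"DATE-OBS": "DATE", "GRATING": "OPT_ELEM",
              "INSTR": "INSTRUME", "R": True}

meta_values = apply_parse_head(header, parse_head)
print(meta_values)
```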

Meta table¶

It is recommended that one attempt to extract as much of the meta data from the spectra files as possible. However, it is possible to include additional meta data (or to override the meta data in the spectra) by including a meta data table. The format is either ASCII or FITS, with a .ascii or .fits extension, and the file must be readable by astropy.table.Table.read().

SPEC_FILE is a required column which gives the name of the spectral file to match against the meta data.

Here is an example from the test suite (ESI_meta.ascii):

SPEC_FILE                         tGRB
SDSSJ172524.66+303803.9_F.fits    2009-11-23:10:12:13.2
SDSSJ220758.30+125944.3_F.fits    2007-08-13:10:22:23.3

This will add the time of the GRB to the meta data table.

SSA info¶

specdb includes software to enable SSA queries of your database. For this to work, however, one must provide a few additional fields for each data group. These are provided with a JSON file in each branch with extension _ssa.json.

The required keys are:

Key	Type	Description
Title	str	Title for the data group
flux	str	Sets units and ucd for the flux. Allowed values are flambda, normalized
fxcalib	str	Sets Calibration field. Allowed values are NORMALIZED, ABSOLUTE, RELATIVE

See the COS_ssa.json file in the test suite for an example.
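
As a rough sketch, an SSA file with the three required keys could be written and checked as follows. The values are hypothetical (they are not copied from the test suite's COS_ssa.json), and check_ssa is an illustrative helper rather than a specdb function; the allowed values are those listed in the table above.

```python
import json
import tempfile
from pathlib import Path

ALLOWED_FLUX = {"flambda", "normalized"}
ALLOWED_FXCALIB = {"NORMALIZED", "ABSOLUTE", "RELATIVE"}

def check_ssa(ssa):
    """Raise ValueError if a required key is missing or a value is not allowed."""
    missing = {"Title", "flux", "fxcalib"} - set(ssa)
    if missing:
        raise ValueError("Missing SSA keys: {}".format(sorted(missing)))
    if ssa["flux"] not in ALLOWED_FLUX:
        raise ValueError("Bad flux value: {}".format(ssa["flux"]))
    if ssa["fxcalib"] not in ALLOWED_FXCALIB:
        raise ValueError("Bad fxcalib value: {}".format(ssa["fxcalib"]))
    return True

# Hypothetical values for a COS group
ssa = {"Title": "HST/COS spectra (example)",
       "flux": "flambda",
       "fxcalib": "ABSOLUTE"}
check_ssa(ssa)

# Write the file with the required _ssa.json suffix and read it back
with tempfile.TemporaryDirectory() as tmp:
    ssa_file = Path(tmp) / "COS_ssa.json"
    with open(ssa_file, "w") as fh:
        json.dump(ssa, fh, indent=3)
    with open(ssa_file) as fh:
        reloaded = json.load(fh)

print(reloaded)
```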

One is also required to include a Publisher value. This defaults to ‘Unknown’, but can be set in the call to mk_db() or with the --publisher keyword in the script.

Redshift table¶

Each unique source in your database (RA, DEC) is required to also have a redshift. This must be supplied as a separate Table with at least the following columns:

Key	Type	Description
RA	float	RA in degrees
DEC	float	DEC in degrees
ZEM	float	Redshift value
ZEM_SOURCE	str	Name of the source (e.g. SDSS, BOSS)

One can generate one’s own table or specify any of the public specdb databases (e.g. igmspec). If you generate your own, place it in the top level of the database tree and give it a name ending in _ztbl.fits (see the example tree structure above).

Spectra¶

Spectra will be ingested provided they can be read with linetools.spectra.io.readspec. You can test whether this is the case by running:

lt_xspec name_of_spectrum

on any of your files. The default is to take any FITS file in the branch (and sub-folders) except those files with these extensions: ‘c.fits’, ‘C.fits’, ‘e.fits’, ‘E.fits’, ‘N.fits’, ‘old.fits’.
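
The default selection rule can be approximated with a simple suffix test. This is a sketch of the rule as described above, not specdb's code; the function name select_spectra is hypothetical, and treating .fits.gz files as FITS files is an assumption based on the example tree.

```python
# Suffixes skipped by default, as listed in the text
SKIP_SUFFIXES = ("c.fits", "C.fits", "e.fits", "E.fits", "N.fits", "old.fits")

def select_spectra(filenames, skip=SKIP_SUFFIXES):
    """Keep FITS files whose names do not end with one of the skipped suffixes."""
    keep = []
    for name in filenames:
        # Assumption: gzipped FITS files count as FITS files
        if not name.endswith((".fits", ".fits.gz")):
            continue  # not a FITS file (e.g. the JSON parameter files)
        if name.endswith(skip):
            continue  # matches one of the skipped suffixes
        keep.append(name)
    return keep

files = ["ESI_meta.json",
         "SDSSJ220758.30+125944.3_F.fits",
         "SDSSJ220758.30+125944.3_E.fits",
         "J095240.17+515250.03.fits.gz"]
selected = select_spectra(files)
print(selected)
```

In the ESI branch of the example tree, this rule keeps the _F.fits file and skips the _E.fits file.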

Another option is to feed the ingest_spectra() method an XSpectrum1D object with all the spectra, aligned to the input meta Table.

Quick go¶

Script¶

The database construction is intended to be run in one go with a single command from the command line. One uses the specdb_privatedb script. Here is the current usage:

usage: specdb_privatedb [-h] [--ztbl ZTBL] [--zspecdb ZSPECDB]
                        [--version VERSION] [--fname]
                        db_name tree_path outfile

Generate a private specdb DB

positional arguments:
   db_name            Name of your private DB
   tree_path          Path to the directory tree of spectral files
   outfile            Filename for the private DB HDF5

optional arguments:
   -h, --help         show this help message and exit
   --ztbl ZTBL        Name of data file containing redshift info
   --zspecdb ZSPECDB  Name of specdb DB to use for redshifts
   --version VERSION  Version of the DB; default is `v00`
   --fname            Parse RA/DEC from filename?

And here is an example of running it on the test DB:

cd specdb/specdb/data/
specdb_privatedb testDB test_privateDB tst_DB.hdf5

This will create a private DB called testDB from the directory tree test_privateDB. The database itself is contained in a single HDF5 file named tst_DB.hdf5.

Within Python¶

Here is a call for the test database in one go from within Python:

import specdb
from astropy.table import Table
from specdb.build import privatedb as pbuild
# Read z table
ztbl = Table.read(specdb.__path__[0]+'/data/test_privateDB/testDB_ztbl.fits')
# Go
tree2 = specdb.__path__[0]+'/data/test_privateDB'
pbuild.mk_db(tree2, 'testDB', 'tst_DB.hdf5', ztbl, fname=True)

If ztbl == ‘igmspec’, the code will attempt to load the IgmSpec database and use the quasars catalog for redshifts.

Step by Step in Python¶

It is possible that you will need to customize things further. This section describes the step-by-step approach from within Python. Note that there is additional code within mk_db() that may be required.

Get Started¶

Start the main catalog and set your private ID_KEY:

from specdb.build import utils as spbu
id_key = 'TEST_ID'
maindb, tkeys = spbu.start_maindb(id_key)

This sets a global variable within specdb.build.utils.

Grab Files¶

The grab_files() method searches through a given branch to find all FITS files and a meta parameter file. By default, the code ignores any files with the following extensions: ‘c.fits’, ‘C.fits’, ‘e.fits’, ‘E.fits’, ‘N.fits’, ‘old.fits’.

Here is an example call:

branch = specdb.__path__[0]+'/data/test_privateDB/ESI/'
flux_files, meta_file, custom_meta_table = pbuild.grab_files(branch)
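
The meta parameter file of a branch can be read with the standard json module to obtain the dictionaries used in the next step. The sketch below assumes that the mdict and parse_head arguments of mk_meta() correspond to the meta_dict and parse_head entries of the file; load_meta_params is a hypothetical helper, and a temporary file stands in for a real COS_meta.json.

```python
import json
import tempfile
from pathlib import Path

def load_meta_params(meta_json):
    """Return (maxpix, mdict, pdict) from a *_meta.json file.

    Missing keys come back as None here; specdb's own defaults may differ.
    """
    with open(meta_json) as fh:
        params = json.load(fh)
    return params.get("maxpix"), params.get("meta_dict"), params.get("parse_head")

# Write a stand-in parameter file, then load it
with tempfile.TemporaryDirectory() as tmp:
    meta_json = Path(tmp) / "COS_meta.json"
    meta_json.write_text(json.dumps({
        "maxpix": 60000,
        "meta_dict": {"TELESCOPE": "HST"},
        "parse_head": {"DATE-OBS": "DATE", "GRATING": "OPT_ELEM",
                       "INSTR": "INSTRUME", "R": True},
    }))
    maxpix, mdict, pdict = load_meta_params(meta_json)

print(maxpix, mdict, pdict)
```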

Meta¶

From the list of FITS files, a META table is generated. The redshift table must be supplied (as an astropy Table). Here is an example call:

meta = pbuild.mk_meta(flux_files, ztbl, fname=True, mdict=mdict, parse_head=pdict, skip_badz=True)

The fname flag indicates that the RA/DEC are to be parsed from the FITS filename. The skip_badz flag allows the code to skip sources that are not cross-matched to the redshift table (instead of terminating).
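
To illustrate what parsing coordinates from a filename involves, here is a minimal sketch that extracts a J-name of the form JHHMMSS.ss+DDMMSS.s and converts it to decimal degrees. The regular expression and the function radec_from_filename are assumptions made for this example; specdb may parse filenames differently.

```python
import re

# JHHMMSS(.ss)+/-DDMMSS(.s), as in the example filenames of the test tree
JNAME = re.compile(
    r"J(\d{2})(\d{2})(\d{2}(?:\.\d+)?)([+-])(\d{2})(\d{2})(\d{2}(?:\.\d+)?)")

def radec_from_filename(filename):
    """Return (RA, DEC) in decimal degrees parsed from a J-name in the filename."""
    match = JNAME.search(filename)
    if match is None:
        raise ValueError("No J-name found in {}".format(filename))
    rah, ram, ras, sign, dd, dm, ds = match.groups()
    # Hours to degrees: 1 hour of RA is 15 degrees
    ra = 15.0 * (int(rah) + int(ram) / 60.0 + float(ras) / 3600.0)
    dec = int(dd) + int(dm) / 60.0 + float(ds) / 3600.0
    if sign == "-":
        dec = -dec
    return ra, dec

ra, dec = radec_from_filename("SDSSJ220758.30+125944.3_F.fits")
print(round(ra, 5), round(dec, 5))
```

Coordinates obtained this way are only as precise as the filename, which is one reason the cross-match to the redshift table can fail for some sources.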

Add Groups and IDs¶

Update the main catalog and adjust a few tags in the meta table. Also update the group dict:

gdict = {}
flag_g = spbu.add_to_group_dict('COS', gdict)
maindb = pbuild.add_ids(maindb, meta, flag_g, tkeys, id_key, first=(flag_g==1))

The group dict is eventually written to the HDF5 file.

Ingest Spectra¶

The ingest_spectra() method loops through the spectral files, reads each, and populates an HDF5 dataset. It also converts the meta table into a separate, parallel HDF5 dataset.

Here is an example:

import h5py
hdf = h5py.File('tmp.hdf5','w')
pbuild.ingest_spectra(hdf, 'test', meta, max_npix=50000)

Finish¶

Before writing, the code tests whether the meta data tables can be stacked using the specdb.utils.clean_vstack() method. This may be required for queries of the meta data and spectra extraction. The code will hit a pdb.set_trace() if this fails.

Write the catalog and close the HDF5 file:

zpri = [str('SDSS'), str('BOSS')]
pbuild.write_hdf(hdf, 'TEST_DB', maindb, zpri, gdict, 'v01')

This also writes the creation date and sets the version. zpri is a list of strings indicating the priority given to redshift assignment.
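
The idea of a priority list can be sketched in a few lines. The example below assumes that sources earlier in zpri take precedence, which is an interpretation of the description above rather than documented behavior; pick_redshift is a hypothetical helper and the redshift values are made up.

```python
def pick_redshift(candidates, zpri):
    """Pick the (zem, source) pair whose source ranks first in zpri.

    candidates is a list of (zem, zem_source) tuples for one object.
    Sources absent from zpri are ignored; returns None if none match.
    Assumption: earlier entries in zpri have higher priority.
    """
    rank = {source: i for i, source in enumerate(zpri)}
    ranked = [c for c in candidates if c[1] in rank]
    if not ranked:
        return None
    return min(ranked, key=lambda c: rank[c[1]])

zpri = ["SDSS", "BOSS"]
candidates = [(2.101, "BOSS"), (2.098, "SDSS")]  # made-up values
best = pick_redshift(candidates, zpri)
print(best)
```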
