Details of the GSAS2CIF program

National Institute of Standards and Technology

Please note, the software documented on these web pages is slightly out of date with respect to the current GSAS distribution. The web pages will be updated soon.
Template files
As discussed further below, the GSAS2CIF program uses three CIF template files:

template_publ.cif
A template file with publication information and descriptive information about how a refinement was performed.
template_phase.cif
A template file with descriptive information about a chemical phase.
template_instrument.cif
A template file with descriptive information about a diffraction instrument used for powder diffraction data collection.

Initialization

The standard GSAS header file, DISAGLCM.FOR, is used to define the PARNAMS array, needed later for the call to RDCOVAR. The standard GSAS header file, ARRAYSZE.FOR, defines the array sizes used in DISAGLCM.FOR. The first two subroutine calls (STRTRN & PROGNAM) in GSAS2CIF are the same as those found in every GSAS program.
pdCIFs have a unique feature, a block id, which is used to make references between blocks. The block id is intended to be a unique string that will never appear in any other CIF, so for this reason, it is typically composed of several items, including:
date & time,
author name,
instrument name &
project name.
The next step in the code sets variable EXPRNAME to a name based on the GSAS experiment file. This is used as the project name for both the block_id and the data block name. Note that the EXPRNAME variable is restricted to 20 characters. Subroutine VSTRNG is used to make sure that the project name is valid (printable ASCII characters, no spaces & no vertical bars ("|")). Subroutine LENCH is used to determine the length of an ASCII string.
The file for CIF output is then opened.
The date & time is obtained in the CIF format using the GSAS routine GSDATE. Note that this code is somewhat compiler-specific.
The author name is read from the GSAS experiment file. If it is not present, it is requested from the user and is then saved in the experiment file. The name is saved in two forms: AUTHOR, as entered, and SAUTHOR, without spaces & special characters, for use in block_ids.
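The way these ingredients could be combined into a block id is sketched below in Python. This is an illustration only, not the GSAS2CIF Fortran code: the function names, the vertical-bar separator and the field order are assumptions, and the cleaning step is a simplification that keeps punctuation. Only the list of ingredients, the 20-character limit and the character restrictions are taken from the description above.

```python
import string

# Characters accepted in a name: printable ASCII, minus whitespace and "|".
_ALLOWED = set(string.printable) - set(string.whitespace) - {"|"}

def clean_name(name, max_len=None):
    """Drop characters that are not allowed and optionally truncate (hypothetical helper)."""
    cleaned = "".join(ch for ch in name if ch in _ALLOWED)
    return cleaned[:max_len] if max_len is not None else cleaned

def make_block_id(timestamp, project, author, instrument):
    """Join the four ingredients; the "|" separator and the order are assumptions."""
    parts = [
        timestamp,
        clean_name(project, 20),   # project name is limited to 20 characters
        clean_name(author),
        clean_name(instrument),
    ]
    return "|".join(parts)

block_id = make_block_id("2003-04-04T12:00", "MY PROJ", "B. Toby", "BT-1")
```

Because the cleaned names cannot contain a vertical bar, a separator of that kind would keep the individual fields recoverable from the combined string.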
At this stage the number of histograms and phases are counted and several flags are set:

IFPWDR
True if one or more powder histograms are present
IFSNGL
True if one or more single crystal histograms are present
NUMPHAS
Number of phases
NPWDHIST
Number of powder histograms
ONEBLOCK
True if the CIF can be a single block -- one phase and one histogram
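The logic behind these flags can be sketched in Python as follows. The function name and the encoding of histogram types as the strings "powder" and "single" are assumptions for illustration; only the flag names and their meanings come from the list above.

```python
def count_flags(histogram_types, num_phases):
    """Derive the flags from a list of histogram types (hypothetical encoding)."""
    npwdhist = sum(1 for h in histogram_types if h == "powder")
    nsngl = sum(1 for h in histogram_types if h == "single")
    return {
        "IFPWDR": npwdhist > 0,      # at least one powder histogram
        "IFSNGL": nsngl > 0,         # at least one single crystal histogram
        "NUMPHAS": num_phases,
        "NPWDHIST": npwdhist,
        # A single block is possible only with one phase and one histogram.
        "ONEBLOCK": num_phases == 1 and len(histogram_types) == 1,
    }
```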

For each histogram a check is made to see if a name exists for the instrument and if not the user is requested to input the name. This name is used in the block_id for data blocks. Ideally this instrument name is read from the instrument parameter file associated with the original raw data for the histogram.
To generate uncertainties on coordinates, the variance-covariance matrix written by program GENLES is read using the standard GSAS routine RDCOVAR. If the .CMT file, which contains this information, is out of date (detected because its cycle number does not match that in the current experiment file), RDCOVAR generates a warning message and sets variable NUMPAR to zero. If this happens, or the file cannot be read, the user is asked whether the program should continue, since uncertainties may not be needed when a CIF is created to export coordinates to a plotting program, but are needed to fully document a result for publication.
The cycle number and the most recent sum of squares of differences (SUMDSQ1) are read from the .EXP file. A file of interatomic distances and angles (.DIS file), written by program DISAGL, is opened. Both the cycle number and the sum of squares of differences must match the file contents. If the file does not match or cannot be read, the user must, as before, choose between exiting and continuing.

Overall CIF Information

Writing to the CIF file then starts. The first (or only, if ONEBLOCK is true) data name is created from variable EXPRNAME (which is restricted to 20 characters or fewer to avoid overly long data names). Then a block_id is created and written. In the single-block (ONEBLOCK is true) case, the block_id includes the instrument name. In the multi-block case, the first block will have information relevant to all blocks, but the histogram(s) and phase(s) will be in separate blocks. Then a few audit records are created. In the cases where it is unclear if quotes will be needed, subroutine WRVAL is used. This in turn calls subroutine ADDQUOTE to check the string to be written and to add quotes as needed.
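The kind of check that a routine such as ADDQUOTE performs can be sketched in Python. This is not the GSAS implementation: the function name and the exact rule set are assumptions, based on a simplified reading of the CIF syntax (values containing whitespace, starting with certain characters, or matching reserved words cannot be written bare).

```python
RESERVED_PREFIXES = ("data_", "save_")
RESERVED_WORDS = ("loop_", "global_", "stop_")

def add_quotes(text):
    """Return text in a form that can be written as a CIF value (simplified rules)."""
    lower = text.lower()
    needs_quotes = (
        text == ""
        or any(ch.isspace() for ch in text)
        or text[0] in "_#$'\"[];"
        or lower.startswith(RESERVED_PREFIXES)
        or lower in RESERVED_WORDS
    )
    if not needs_quotes:
        return text
    # Multi-line values, or values containing both quote characters,
    # are written as a semicolon-delimited text field.
    if "\n" in text or ("'" in text and '"' in text):
        return "\n;" + text + "\n;"
    if "'" in text:
        return '"' + text + '"'
    return "'" + text + "'"
```

Under these rules `add_quotes("two words")` gives `'two words'`, while a plain token such as `abc` is returned unchanged.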
The publication information template is then copied using subroutine CPTMPLTE, as is described further below. The overall template is read from file EXPNAM_publ.cif, where EXPNAM.EXP is the name of the GSAS experiment file. If file EXPNAM_publ.cif does not exist, then it is created using the contents of file template_publ.cif, which is read from the current directory, or if not present, from the distribution version in the GSAS data directory. The template contents are also copied directly into the output CIF file.
Results that pertain to the overall refinement are then written using subroutine OVERALL. Subroutine OVERALL creates CIF entries that describe how the refinement has progressed. For example, _refine_ls_shift/su_max describes the maximum parameter shift in the last cycle of refinement. Powder profile R-factors are written later, when the powder diffraction histograms are written, in subroutine WRPOWDHIST, but if more than one powder histogram is present, GSAS also computes overall powder R-factors for all histograms combined, and these overall R-factor values are written out here. Note that in this case, a multi-block CIF will be created.

Phase Information

GSAS2CIF then loops over phases. Note that if there is more than one phase, the information for each phase must be placed in a separate block. This is also true if more than one histogram is present. Thus, if more than one phase or more than one histogram is present (or both), then the phase information and histogram information will be in separate blocks. However, in the case where there is one histogram and one phase, variable ONEBLOCK is set to true and both the phase information and histogram information will be included in the same CIF block. This is why a data block is started and a block_id created in this loop only for multiblock CIFs.
The phase information template is then copied using subroutine CPTMPLTE (described further below). This template is read from file EXPNAM_phaseN.cif, where EXPNAM.EXP is the name of the GSAS experiment file and N is the phase number. If file EXPNAM_phaseN.cif does not exist, then it is created using the contents of file template_phase.cif, which is read from the current directory, or if not present, from the distribution version in the GSAS data directory. The template contents are also copied directly into the output CIF file.
The next step is to write out the phase information. This is done using subroutine WRITEPHASE, discussed further below.

Histogram Information

After phase processing is complete, processing of histograms starts. First, the instrument name, input earlier, is read from the .EXP file. The program then processes powder diffraction histograms differently from single crystal histograms.

Powder Histograms

The first step in processing powder diffraction histograms is to begin a data block and create a block_id, unless a single block CIF is being created.
The next step is to insert the histogram template file. This is done by creating two file names, EXPNAM_instnameNN.cif and instname.cif, where instname is the instrument name that was input before. Subroutine CPTMPLTE (see below) first attempts to read file EXPNAM_instnameNN.cif from the current directory. If this file is not found, it is created and filled with the contents of a more generic histogram template file. To locate that generic template, subroutine CPTMPLTE attempts to read file instname.cif, first from the current directory, or if not present, from the GSAS data directory. The instname.cif file is intended as a template file that has been customized for a particular instrument. If this file cannot be found, then file template_instrument.cif is read from the current directory, or if not present there, from the distribution version of this file in the GSAS data directory.
Parameters and powder data are written in subroutine WRPOWDHIST, as is described further below.
Finally, the reflections are listed using subroutine WRREFLIST.

Single-Crystal Histograms

For single crystal histograms, the only output that is generated is that the reflections are listed using subroutine WRREFLIST.

Copying of Template Files

Subroutine CPTMPLTE is used to copy a template file into a CIF. The strategy is that descriptive information to be included in the output CIF will be placed in a set of project-specific template files, rather than added directly to the CIF. In this way, GSAS2CIF can be rerun at any point and the descriptive information will be included in the output CIF. The project-specific template files are named similarly to the GSAS experiment file. If any of these project-specific template files are not found, they are created using either customized template files or, if those are not found, a standard version distributed with GSAS. This allows a user to reuse customized CIF template files, so that, for example, the instrument description can be reused.
Subroutine CPTMPLTE first attempts to read a version of the template file that has been customized for the current project from the current directory. The name of this file is passed to CPTMPLTE in variable LOCALCOPY. If this file is not found, a second file name, found in variable TEMPLATE1, is tried (if this name is non-blank). The subroutine looks first in the current directory and if not there, in the GSAS data directory, which is determined by an environment variable (gsas). If the TEMPLATE1 file is opened, the file LOCALCOPY is created and opened for output. If neither the LOCALCOPY nor the TEMPLATE1 file is found, a third file name, found in variable TEMPLATE2 is opened. The subroutine looks first in the current directory and if not there, in the GSAS data directory, which is determined by an environment variable (gsas). If this file is not found, the program stops, as this implies that the environment variable or required files are not properly installed. If the TEMPLATE2 file is opened, the file LOCALCOPY is created and opened for output.
After either file LOCALCOPY, TEMPLATE1 or TEMPLATE2 is opened, it is copied one line at a time. If TEMPLATE1 or TEMPLATE2 is being read, all lines are copied to the LOCALCOPY file. Each line is checked for a string starting with "data_"; lines following this data flag are copied into the output CIF. The template file should not have any lines longer than 80 characters, so if any are noted, a warning message is produced.
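The search order and copy step described above can be sketched in Python. This is an illustration of the fallback logic only, not a translation of CPTMPLTE: the function signature, the explicit work_dir and data_dir arguments (standing in for the current directory and the directory named by the gsas environment variable) and the use of a list for the output CIF are all assumptions.

```python
import os

def copy_template(localcopy, template1, template2, work_dir, data_dir, out_lines):
    """Copy template lines that follow the data_ line into out_lines; return the file used."""
    def find(name):
        # Look in the working directory first, then in the GSAS data directory.
        for folder in (work_dir, data_dir):
            path = os.path.join(folder, name)
            if os.path.isfile(path):
                return path
        return None

    local_path = os.path.join(work_dir, localcopy)
    source = local_path if os.path.isfile(local_path) else None
    make_copy = source is None          # a project-specific copy must be created
    if source is None:
        for name in (template1, template2):
            if name.strip():            # a blank TEMPLATE1 is skipped
                source = find(name)
                if source:
                    break
    if source is None:
        raise FileNotFoundError("no template file found; check the GSAS installation")

    with open(source) as src:
        lines = src.read().splitlines()
    if make_copy:
        with open(local_path, "w") as dst:
            dst.write("\n".join(lines) + "\n")

    copying = False
    for line in lines:
        if len(line) > 80:
            print("warning: template line exceeds 80 characters")
        if copying:
            out_lines.append(line)
        elif line.lstrip().startswith("data_"):
            copying = True              # everything after the data flag is copied
    return source
```

On a first run the generic template is found and a project-specific copy is written; on later runs that copy is read directly, so edits made to it carry over to each regenerated CIF.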

Subroutine WRITEPHA

Subroutine WRITEPHA is used to write information about a phase into the CIF output. This information includes the unit cell parameters, symmetry, atomic parameters and refinement parameters that are phase-specific.
The first step in subroutine WRITEPHA is to call the standard GSAS routine DSGREAD, which reads in the coordinates and their uncertainties, as well as unit cell parameters and symmetry information. Note that much of this information is read into common blocks.
The GSAS phase name is read from the .EXP file and is written out as CIF item _pd_phase_name. Unit cell parameters are then read from the .EXP file. GSAS subroutine BMATRX is used to compute the reciprocal unit cell parameters for later use. The unit cell parameters are then written out, where only the unique parameters (i.e. a & c for a tetragonal cell) are given with uncertainties. The unit cell volume is computed (alas, without an uncertainty estimate at present) using GSAS subroutine CELVOL and the unit cell type is written by translating the Laue class.
The space group is written in exactly the same format as used by GSAS, except that the trailing "R" flag, which is used by GSAS to indicate a rhombohedral setting, is removed if present.
The symmetry operations are then written from the matrices generated by subroutine DSGREAD. This requires a bit of extra work, as GSAS does not generate the symmetry operations corresponding to a center of symmetry or lattice centering, if present. Note that offsets are applied to symmetry operations to bring them into agreement with the International Tables. For example, after -x,1/2+y,-z is operated on by the body center +1/2,+1/2,+1/2, the resulting symmetry operation, 1/2-x,1+y,1/2-z, is conventionally written 1/2-x,y,1/2-z. The offsets applied to symmetry operations are saved in array OFFSET, for later use with interatomic distance and angle listings.
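The body-centering example can be worked through in a few lines of Python. The helper below is hypothetical and handles only the translation part of an operation; the sign convention used for the stored offset (the whole-number part that was removed) is an assumption, as the text does not say how array OFFSET is defined.

```python
from fractions import Fraction
import math

def normalize_translation(translation):
    """Split each component into a part in [0, 1) and a whole-number offset."""
    reduced, offset = [], []
    for t in translation:
        whole = math.floor(t)
        offset.append(whole)         # amount removed, kept for later use
        reduced.append(t - whole)    # conventional translation
    return reduced, offset

half = Fraction(1, 2)
operation = [Fraction(0), half, Fraction(0)]    # translation part of -x,1/2+y,-z
body_center = [half, half, half]
combined = [a + b for a, b in zip(operation, body_center)]   # 1/2, 1, 1/2
reduced, offset = normalize_translation(combined)
```

Here `reduced` holds 1/2, 0, 1/2, matching the conventional form 1/2-x,y,1/2-z, and `offset` records the unit shift along y that was dropped.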
Each symmetry operation is assigned a code (_symmetry_equiv_pos_site_id) which is later referenced in the interatomic distance and angle listings. This corresponds to the GSAS symmetry element number, plus 100 times the centering operation number and multiplied by -1, for elements generated by a center of symmetry. Note that centric space groups in GSAS always have their origin at the center of symmetry (Origin 2, where a choice is offered). So, the center of symmetry operation is always -x,-y,-z.
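The numbering rule can be written out as a short Python sketch. The function names are hypothetical, and whether the centering operation number starts at 0 or 1 is not stated in the text, so the values in the usage note are illustrative only.

```python
def symmetry_code(element, centering, inverted):
    """Element number plus 100 times the centering number, negated for inversion."""
    code = element + 100 * centering
    return -code if inverted else code

def decode_symmetry_code(code):
    """Recover (element, centering, inverted); assumes element numbers below 100."""
    inverted = code < 0
    centering, element = divmod(abs(code), 100)
    return element, centering, inverted
```

For instance, element 3 combined with centering operation 1 would give 103, or -103 for the copy generated by the center of symmetry.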
Atoms are then processed. First, counters used for unit cell contents are zeroed. Then the atom table loop headers are written, and the atom labels are checked to make sure that all atom labels are unique, since this is required by the CIF standard. It would be confusing if GSAS2CIF changed atom labels, so if any atoms have the same label, a warning message is generated. Users are given the option to produce a CIF that contains duplicate atom labels, since few, if any, programs that read CIFs will even notice.
The atom table is generated. GSAS subroutine SYTSYM is used to compute the site multiplicity for each atom. The composition of the unit cell is then noted using arrays COMPTBL and FRACTBL, where FRACTBL is used for atoms that have partial occupancy. Note that if no atoms are written a series of "?" values are written to match the table header. If any atoms with anisotropic displacement parameters are noted, a second atom table is generated with the anisotropic U_ij values.
The numbers of atoms of each type are then listed. Note that due to categorization rules, these numbers of atoms and the scattering factor values can only appear in the same loop, if in the same block. Thus, if a single block CIF will be created, this loop is skipped and these numbers are reported in subroutine WRPOWDHIST.
A value for Z is determined (_cell_formula_units_Z) by dividing the unit cell composition for all of the fully occupied atoms by 2 and 3 as many times as is possible, without resulting in non-integer values. The chemical formula (_chemical_formula_sum) and the mass (not weight!) of a formula unit (_chemical_formula_weight) are then computed by dividing the total values for a unit cell by the value of Z. Note that the determination of Z is sometimes a matter of style and on occasion users may decide to edit the resulting CIF file to change Z. If done, be sure to change _chemical_formula_sum and _chemical_formula_weight accordingly.
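The repeated division by 2 and 3 can be illustrated with a short Python sketch. The function name and the dictionary representation of the cell contents are assumptions; the procedure itself follows the description above.

```python
def formula_units(cell_contents):
    """Return (Z, reduced composition) for integer counts of fully occupied atoms."""
    counts = {el: n for el, n in cell_contents.items() if n > 0}
    if not counts:
        return 1, {}
    z = 1
    for factor in (2, 3):
        # Divide by this factor for as long as every count stays an integer.
        while all(n % factor == 0 for n in counts.values()):
            counts = {el: n // factor for el, n in counts.items()}
            z *= factor
    return z, counts
```

A cell containing 8 Si and 16 O reduces to SiO2 with Z = 8, while 12 Al and 18 O reduces to Al2O3 with Z = 6. A cell with 4 C and 8 H would be reduced to CH2 with Z = 4, which shows why a user may sometimes prefer a different choice of Z.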
GSAS offers two types of preferred orientation corrections, the traditional March-Dollase correction and a spherical harmonic expansion representation of the orientation distribution function. The March-Dollase terms are set by histogram and phase, while each phase has a single set of spherical harmonic terms for all histograms. In the case where a multiblock CIF is being written, the spherical harmonic terms are written in subroutine WRITEPHASE. In the single block case, these terms are written in WRPOWDHIST.
Interatomic distances are then written. This is done by reading through the .DIS file and then writing out distances matching the current phase. Note that each pair of atoms has a code that identifies the symmetry operations needed to generate the site from the coordinates in the list. These codes are written by program DISAGL into the .DIS file, but must be corrected with the offsets generated previously. Note that no operations are applied to the first atom, so that its site code is always ".".
Interatomic angles are then written. This is done as before by reading through the .DIS file and then writing out angles matching the current phase. Note that no operations are applied to the central atom, so that its site code is always ".", but the two outer atoms each have a code that identifies the symmetry operations needed to generate the sites from the coordinates in the list. These codes are written by program DISAGL into the .DIS file, but must also be corrected with the offsets generated previously.

Subroutine WRPOWDHIST

Subroutine WRPOWDHIST is used to write histogram-related information into the output CIF. This information includes the powder data and the computed pattern, as well as the many parameters used within GSAS in order to reproduce the experimental data.
Subroutine WRPOWDHIST starts by counting the number of phases present in the histogram and by calling GSAS subroutine OPNPRF, which opens the binary file containing the observed and computed pattern.
In preparation for writing the preferred orientation parameters, the number of March-Dollase & (when needed) spherical harmonic terms are counted. The March-Dollase terms are stored as IMD and the spherical harmonic terms are stored as IODF. The treatment of these preferred orientation parameters is a bit complex, since there are n x m March-Dollase terms, but only n spherical harmonic terms, where there are n phases and m histograms. If a multiblock CIF will be created, the spherical harmonic terms are included in the phase data block(s), while the March-Dollase terms are included in the histogram data block(s). CIF only defines one term for recording the preferred orientation correction, so in the single-block case, care is taken to make sure that both sets of terms are output together, should both ever be used together. It makes little sense for both types of corrections to be used together, but the goal is that the CIF should reflect how the refinement was performed.
For a multi-block CIF, a phase table is the first information written to the CIF by WRPOWDHIST. Some of the items contained in the phase table are:

_pd_phase_block_id
a pointer to the block that defines the phase
_pd_phase_mass_%
the mass percentage of the current phase
_pd_proc_ls_profile_function
the profile function and terms, described as a text item. Much of the text is generated in subroutine LISTPRF.
_pd_proc_ls_pref_orient_corr
the March-Dollase correction, when needed.

Alternatively, in the single-block case, the unit cell contents are determined, so that they can be included with the scattering factors. A table of atoms, with scattering factors or scattering lengths and optionally with the unit cell contents, is then written.
The next section writes information about the probe species: x-ray vs. neutron, wavelength(s), polarization & other calibration information. Note that in the case where two wavelengths are present, these values must be placed in a loop and are labeled with _diffrn_radiation_type. This creates a violation of the CIF categorization rules, as the category of _diffrn_radiation_type differs from the _diffrn_radiation_wavelength data items. Alas, there is no other way at present to solve this.
Subsequent sections of subroutine WRPOWDHIST write out different types of histogram information. R-factors are read from the .EXP file and are written to the CIF. Background terms are then written. These terms are written as a text field, as there are no formal definitions for expressing these values yet. Absorption corrections are then written, again as a text field. Then, the maximum and minimum extinction & absorption corrections are written. At present, CIF does not define such terms for extinction, so _gsas_exptl_extinct_corr_T_min and _gsas_exptl_extinct_corr_T_max are used.
While the preferred orientation correction was written in previously described sections of subroutines WRITEPHASE and WRPOWDHIST for the case of multi-block CIFs, it has not been processed in the case of single-block CIFs. Preferred orientation corrections are written here for March-Dollase and/or spherical harmonic terms. Likewise, profile terms were written previously for the case of multi-block CIFs; profile terms for a single block CIF are written here.
The file is then given a time-stamp and the calculation method is defined as the Rietveld method. There is no particular reason to do this here or anywhere else in this subroutine.

Listing of Powder Data
In the final section of subroutine WRPOWDHIST, the observed and computed data are written. These data are written first to a scratch file and are then read back and are written to the CIF. In this way the numbers can be aligned in columns. There is no requirement within CIF to do this, but it looks nice and makes the numbers much easier to peruse.
However, before the data can be read, a number of flags are set to determine how the data will be stored. The pdCIF dictionary defines two different two-theta data items, one for data with a fixed step size and the other for variable step sizes. In the case of constant wavelength data, the two-theta values are checked to see whether they are in constant steps. Note that the data are retrieved from the binary histogram file using subroutine READPRF. Also a flag, FIXEDBKG, is set if fixed background points are used to define the background for the pattern.
In certain circumstances, GSAS does not include all the observed diffraction data in the binary histogram file. This can happen when data points are skipped or are averaged. This also happens when sections of the observed histogram at the beginning or end of the pattern are not used. At present, this latter condition is not tested. However, when data points are skipped or are averaged, the initial unprocessed histogram is written in a separate loop from the processed observed and computed patterns. The unprocessed histogram data is written by subroutine WRITERAWDATA. Note that this subroutine uses the _pd_meas_ data items, so if this routine is called, noted by variable MOREOBS set to true, the later part of WRPOWDHIST uses the _pd_proc_ data items.
If the x-axis corresponds to two-theta values in constant steps, the starting, ending and step values are written to the CIF as _pd_meas_2theta_range_ CIF items, unless variable MOREOBS is true, in which case _pd_proc_2theta_range_ items are used. In the latter case, zero corrections are applied to the values.
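A constant-step test and the choice between the two sets of range items can be sketched in Python. The function names, the tolerance and the output formatting are assumptions, and the zero correction applied in the _pd_proc_ case is omitted for brevity.

```python
def constant_step(two_theta, tol=1e-4):
    """Return the step size if all increments agree within tol, else None."""
    if len(two_theta) < 2:
        return None
    step = (two_theta[-1] - two_theta[0]) / (len(two_theta) - 1)
    for a, b in zip(two_theta, two_theta[1:]):
        if abs((b - a) - step) > tol:
            return None
    return step

def range_items(two_theta, more_obs):
    """Build the three range lines; more_obs plays the role of MOREOBS."""
    step = constant_step(two_theta)
    if step is None:
        return []      # variable steps: each point would need its own two-theta value
    prefix = "_pd_proc_2theta_range_" if more_obs else "_pd_meas_2theta_range_"
    return [
        f"{prefix}min {two_theta[0]:.4f}",
        f"{prefix}max {two_theta[-1]:.4f}",
        f"{prefix}inc {step:.4f}",
    ]
```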
Depending on settings previously determined, different options are used to write the header for the observed and computed pattern. Then the data are written, using care to only write the items matching the header entries to the scratch file. Finally, the values are read from the scratch file and are written to the CIF.

Subroutine FESD

Subroutine FESD is used to format numbers for CIF in a variation of crystallographic notation. Note that if the uncertainty value is negative, the uncertainty is not printed, but rather, the uncertainty determines the number of significant digits. This routine does not currently handle numbers in exponential notation.
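The notation can be illustrated with a small Python function. This is not the FESD algorithm: the function name and the choice of reporting the uncertainty to two significant digits are assumptions, since the rounding convention used by GSAS is not described here. Like FESD, the sketch does not handle exponential notation.

```python
import math

def format_esd(value, esd, sig_digits=2):
    """Format value(esd); a negative esd only sets the number of decimals shown."""
    if esd == 0:
        return str(value)
    magnitude = abs(esd)
    exponent = math.floor(math.log10(magnitude))
    # Number of decimal places needed to show sig_digits digits of the esd.
    decimals = max(0, sig_digits - 1 - exponent)
    text = f"{value:.{decimals}f}"
    if esd < 0:
        return text                  # uncertainty suppressed, precision kept
    digits = round(magnitude * 10 ** decimals)
    return f"{text}({digits})"
```

With these assumptions, a value of 1.2345 with an uncertainty of 0.0012 is written as 1.2345(12), and 5.4321 with an uncertainty of -0.002 is written as 5.4321 with no parenthesized part.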

Subroutine LISTPRF

Subroutine LISTPRF is used to describe the current peak profile function and list some of the profile parameter values.

Subroutine WRITERAWDATA

Subroutine WRITERAWDATA is used to copy the contents of a GSAS raw data file (sometimes named .RAW or .GSAS) directly to a CIF file. Data are read using GSAS subroutine READHST. If the uncertainty values match the square root of the intensity values at each point, then it is assumed the intensity values are counts, so that uncertainties are not specified. Note that if points are two-theta values and are in constant steps, the _pd_meas_2theta_range CIF items are used in place of _pd_meas_2theta_scan.
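The square-root test can be sketched in Python as shown below. The function name and the tolerances are assumptions, and the handling of points with zero intensity in GSAS is not described, so the sketch simply applies the same comparison to them.

```python
import math

def esds_match_counts(intensities, esds, rel_tol=1e-3):
    """True if every uncertainty equals the square root of its intensity."""
    for y, sigma in zip(intensities, esds):
        if y < 0:
            return False             # negative values cannot be raw counts
        if not math.isclose(sigma, math.sqrt(y), rel_tol=rel_tol, abs_tol=1e-6):
            return False
    return True
```

When the function returns True, the uncertainties carry no extra information and can be left out of the CIF; otherwise they need to be written alongside the intensities.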

Subroutine WRREFLIST

Subroutine WRREFLIST is used to write a table of reflections for both single crystal and powder histograms. The first step in WRREFLIST is to determine if the reflection table corresponds to a powder or single-crystal histogram. This makes a difference when writing the reflection loop header, as a wavelength id, _pd_refln_wavelength_id, is written for multichromatic powder diffraction data, a phase id, _pd_refln_phase_id, is written when more than one phase is present, as well as the reflection d-space, _refln_d_spacing, and a local data item, that defines the relative reflection intensity, _gsas_i100_meas.
The reflection values are written on a scratch file, so that the data can be written into the CIF in nice neat columns. Again, this is not needed, but makes the data more easily read by humans.
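The two-pass approach, writing once to find the column widths and then reading back to produce aligned output, can be sketched in Python. An in-memory buffer stands in for the scratch file, and the function name, the tab separator and the right-justified layout are assumptions. The sketch also assumes that every row has the same number of values and that no value contains whitespace.

```python
import io

def aligned_rows(rows):
    """Write rows to a scratch buffer, then read them back padded into columns."""
    scratch = io.StringIO()                 # stands in for the scratch file
    widths = [0] * len(rows[0]) if rows else []
    for row in rows:
        cells = [str(c) for c in row]
        scratch.write("\t".join(cells) + "\n")
        # Track the widest entry seen so far in each column.
        widths = [max(w, len(c)) for w, c in zip(widths, cells)]
    scratch.seek(0)
    out = []
    for line in scratch:
        cells = line.rstrip("\n").split("\t")
        out.append(" ".join(c.rjust(w) for c, w in zip(cells, widths)))
    return out
```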

Comments, corrections or questions: crystal@NIST.gov
Last modified 04-April-2003 by website owner: NCNR (attn: Craig Brown)