Taxometrics

The program code to perform taxometric analyses is now available in the RTaxometrics package (Ruscio & Wang, 2021). The old code, available through mid-2017 in many updated versions of the TaxProg.R file, was reworked from scratch. This page provides a brief introduction to the RTaxometrics package. If you have any comments, questions, or suggestions regarding the RTaxometrics package, please be in touch.

I strongly urge you to download, install, and use the RTaxometrics package rather than using the program code it replaces. If you need a copy of the latter, however, I can send that to you on request.

The RTaxometrics Package

The code was prepared with a number of features and improvements in mind:

  • Ease of use. Each of the functions that users might run (see below for an overview) is designed to be as simple as possible to execute. For example, rather than running separate functions to perform each taxometric procedure, a main function is now used that allows users to choose all procedures to include in the analysis.
  • Checks. Both main functions for running taxometric analyses incorporate a number of checks to ensure that the data and program parameter specifications are acceptable. If any changes are necessary, they are reported.
  • Efficiency. The code has been written in ways that save on memory usage and run time. For example, rather than generating populations of comparison data each time a new taxometric procedure is performed, the new functions save the populations for use with each procedure.
  • Status updates. Once a command is run, progress is reported as various actions are taken. This includes preliminary checks of the data and program parameter specifications as well as analyses of empirical and comparison data.
  • Streamlined output. The text and graphical output has been simplified to help users focus on the most important results and incorporate them into their documents. For example, a single graph sheet is created with the results from all taxometric procedures performed.
  • CCFI profiles. In addition to running procedures using a single population of categorical comparison data, it’s now possible to use a large number of populations of categorical comparison data that vary systematically in the taxon base rate. This yields a profile of CCFI values that can be helpful for differentiating categorical from dimensional data and, if the results support categorical structure, estimating the taxon base rate.
  • Readability. The code is written and documented in ways consistent with contemporary R style.
  • Modularity. Redundancy in the code is eliminated by creating special-purpose functions for repeated actions.
  • Ease of modification. Data and program parameters necessary for many functions are passed as a single list object rather than lengthy, special-purpose lists for each function.

Functions in RTaxometrics

There are only five functions in the package that users should access directly; all others are called, as needed, by higher-order functions. Below is a brief overview of these five functions, with sample commands you can try to see how they work. Full details, including the available options for each function, are provided in the reference manual and in Wang and Ruscio (in press).

CreateData()

This function creates an artificial data set, either categorical or dimensional. It returns a data set with each variable in its own column plus, in the final column, a classification variable. For categorical data, this variable contains actual group membership; for dimensional data, cases are assigned to groups using the base-rate classification method. The results should be assigned to an object, which can then be submitted to other functions. For example, the following commands would generate data sets that are categorical and dimensional, respectively:

test.cat <- CreateData("cat")

test.dim <- CreateData("dim")
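To see the layout described above, the generated objects can be inspected with base R. The following is a minimal sketch, assuming the RTaxometrics package is installed (the block is skipped otherwise) and relying only on the documented fact that the final column holds the classification variable; the helper name `last` is introduced here for illustration.

```r
# Assumes the RTaxometrics package is installed; the block is skipped otherwise.
if (requireNamespace("RTaxometrics", quietly = TRUE)) {
  library(RTaxometrics)

  test.cat <- CreateData("cat")
  test.dim <- CreateData("dim")

  # All columns but the last are indicators; the last is the classification.
  last <- ncol(test.cat)
  print(dim(test.cat))
  print(head(test.cat))

  # Group sizes implied by the classification variable in each data set
  print(table(test.cat[, last]))
  print(table(test.dim[, last]))
}
```

The counts printed by table() show how many cases fall in each group, which is a quick sanity check before passing the data to other functions.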

ClassifyCases()

This function assigns cases to groups using the base-rate classification method. It takes a data set as input and returns the data plus, in the final column, a new classification variable. Its results should be assigned to an object which can then be submitted to other functions. For example, the following command would classify the cases in data matrix “x” using a taxon base rate of .25 and store the results in object x.groups:

x.groups <- ClassifyCases(x, p = .25)

* Note that any data set created using the CreateData() function (e.g., “test.cat” or “test.dim”) will already have a classification variable in the final column, so ClassifyCases() should not be used.
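For readers who want an intuition for what a base-rate classification does, here is a rough base-R sketch. It is not the source of ClassifyCases(); it assumes the method ranks cases by their total score across indicators and assigns the top proportion p to the taxon. The function name classify_by_base_rate and the group coding (2 = taxon, 1 = complement) are assumptions made for illustration.

```r
# Hypothetical helper illustrating the idea; NOT the package's ClassifyCases().
# Assumption: cases are ranked by total score and the top proportion p is
# assigned to the taxon (coded 2), the remainder to the complement (coded 1).
classify_by_base_rate <- function(x, p) {
  x <- as.matrix(x)
  total <- rowSums(x)                          # total score per case
  n.taxon <- round(p * nrow(x))                # number of cases in the taxon
  ranks <- rank(total, ties.method = "first")  # 1 = lowest total score
  group <- ifelse(ranks > nrow(x) - n.taxon, 2, 1)
  cbind(x, group = group)                      # classification in final column
}

set.seed(1)
x <- matrix(rnorm(200 * 4), ncol = 4)          # toy data: 200 cases, 4 indicators
x.groups <- classify_by_base_rate(x, p = .25)
print(table(x.groups[, "group"]))
```

With 200 cases and p = .25, the 50 highest-scoring cases land in group 2. The real function may differ in details such as tie handling.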

CheckData()

This function checks to see whether the data are appropriate for taxometric analysis. It takes as input a data set (with the final column containing a classification of cases to putative groups), checks a number of characteristics, and provides text output that includes notes expressing any concerns that a user should consider (e.g., small sample size, low between-group validity, large within-group correlations). Any data that will be submitted for taxometric analysis should be screened first using this function. For example, the following command would check the “test.cat” data set:

CheckData(test.cat)
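The concerns mentioned above (between-group validity and within-group correlations) can also be computed by hand. The sketch below uses simulated data and base R only; it is an illustration of the two quantities, not a reproduction of what CheckData() computes or reports. The helper names cohens_d and mean_offdiag, the group coding (2 = putative taxon), and the simulated values are all assumptions.

```r
set.seed(1)
n <- 600; k <- 4
group <- rep(c(1, 2), times = c(450, 150))        # 2 = putative taxon (assumed coding)
# Independent indicators; the taxon is shifted up by 2 SD on every indicator
x <- matrix(rnorm(n * k), ncol = k) + 2 * (group == 2)
dat <- cbind(x, group)                            # classification in final column

indicators <- dat[, -ncol(dat), drop = FALSE]
grp <- dat[, ncol(dat)]

# Between-group validity: standardized mean difference (Cohen's d) per indicator
cohens_d <- function(v, g) {
  n1 <- sum(g == 1); n2 <- sum(g == 2)
  s.pooled <- sqrt(((n1 - 1) * var(v[g == 1]) + (n2 - 1) * var(v[g == 2])) /
                     (n1 + n2 - 2))
  (mean(v[g == 2]) - mean(v[g == 1])) / s.pooled
}
validity <- apply(indicators, 2, cohens_d, g = grp)

# Within-group correlations: mean off-diagonal correlation inside each group
mean_offdiag <- function(m) {
  r <- cor(m)
  mean(r[lower.tri(r)])
}
r.taxon <- mean_offdiag(indicators[grp == 2, , drop = FALSE])
r.complement <- mean_offdiag(indicators[grp == 1, , drop = FALSE])

print(round(validity, 2))
print(round(c(taxon = r.taxon, complement = r.complement), 2))
```

In this toy setup the validities should sit near 2 and the within-group correlations near 0, the kind of pattern that would raise no concerns.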

RunTaxometrics()

This is the main function to perform taxometric analyses. It takes as input a data set (with the final column containing a classification of cases to putative groups), allows the specification of many program parameters (e.g., which taxometric procedures to perform), and produces text output plus a graph sheet. Any data submitted to this function should first be screened using the CheckData() function described above. For example, the following command would perform taxometric analyses using the “test.cat” data set:

RunTaxometrics(test.cat)

Note that this function plots the graph(s) and returns a single list object with all of the program specifications and numerical output. If you call it as shown above, the list object will be displayed. Alternatively, you can assign this to an object to store the results:

output <- RunTaxometrics(test.cat)

RunCCFIProfile()

This is the main function to construct a CCFI profile by performing all required taxometric analyses. Ruscio et al. (2018; accepted draft available below) found that a CCFI profile can be used to differentiate between categorical and dimensional data with accuracy equal to or better than that of other CCFI-based techniques, and to estimate the taxon base rate with less bias and greater precision than the conventional formulas. This function takes as input a data set (without any classification of cases to putative groups), allows the specification of many program parameters (e.g., which taxometric procedures to perform), and produces text output plus a graph sheet. Any data submitted to this function should first be screened using the CheckData() function described above. For example, the following command would construct a CCFI profile using the first four columns of the “test.cat” data set, which omits the classification variable in the final column:

RunCCFIProfile(test.cat[, 1:4])

As with the RunTaxometrics() function, the RunCCFIProfile() function returns a single list object with all of the program specifications and numerical output. If you call it as shown above, the list object will be displayed. Alternatively, you can assign this to an object to store the results:

output <- RunCCFIProfile(test.cat[, 1:4])
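As background for interpreting the output, the CCFI itself compares how well the empirical curve is reproduced by the averaged curves from categorical and dimensional comparison data. The sketch below follows the standard published definition, CCFI = Fit_dim / (Fit_dim + Fit_cat), where each Fit is a root-mean-square residual. The function names rmsr and ccfi and the toy curves are made up for illustration and are not taken from the package's code.

```r
# Root-mean-square residual between two curves of equal length
rmsr <- function(a, b) sqrt(mean((a - b)^2))

# CCFI ranges from 0 (empirical curve matches the dimensional comparison curve)
# to 1 (matches the categorical comparison curve); values near .5 are ambiguous.
ccfi <- function(empirical, cat.curve, dim.curve) {
  fit.cat <- rmsr(empirical, cat.curve)
  fit.dim <- rmsr(empirical, dim.curve)
  fit.dim / (fit.dim + fit.cat)
}

# Toy curves (made-up numbers, for illustration only)
peaked <- c(.1, .3, .6, .3, .1)
flat   <- c(.2, .2, .2, .2, .2)

print(ccfi(peaked, cat.curve = peaked, dim.curve = flat))  # 1
print(ccfi(flat,   cat.curve = peaked, dim.curve = flat))  # 0
```

A CCFI profile repeats this comparison with categorical comparison data generated at many different taxon base rates, yielding one CCFI value per base rate.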

References

Ruscio, J., Carney, L. M., Dever, L., Pliskin, M., & Wang, S. B. (2018). Using the Comparison Curve Fit Index (CCFI) in taxometric analyses: Averaging curves, standard errors, and CCFI profiles. Psychological Assessment, 30(6), 744-754. PDF

Ruscio, J., & Wang, S. B. (2022). Taxometric analysis. In G. J. G. Asmundson (Ed.), Comprehensive Clinical Psychology (2nd ed., Vol. 3, pp. 148-175). New York: Elsevier. PDF

Ruscio, J., & Wang, S. B. (2023). RTaxometrics: Taxometric analysis. R package version 3.2.1. Available at https://CRAN.R-project.org/package=RTaxometrics.