Help
Glossary
Categorical – refers to data in which variable values are mutually exclusive, such as nominal or ordinal
Continuous – refers to data in which variable values are numerically related, such as interval or ratio
Interval – values are related by sequence and imply an equal distance between adjacent values, such as on a scale
Nominal – data that are measured as membership in named categories, such as the presence or absence of some quality, and are unrelated numerically
Ordinal – data that are measured as related in sequence only, such as rankings
Ratio – data that are measured as related by sequence and equal distance and have an absolute zero on a scale, so that ratios between values are meaningful (e.g., a 10-second response time is twice as long as a 5-second one)
Scoring – the process of converting raw data from a data collection instrument into a matrix of numbers based on the instrument used to collect the data, the level of data measurement, the type of statistics planned for analyzing the data, and the requirements of the statistical program used for analysis
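Significance criterion – the probability threshold (alpha, α), typically set at .05 in the social sciences, which is the complement of 95% confidence (1 − .95 = .05); a statistical test result is considered significant, that is, unlikely to be due to chance alone, when its p-value (reported in output as Sig. or p) is at or below this criterion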
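The Python sketch below is a hypothetical illustration of scoring. It converts text responses to three five-point agreement items into numeric codes, reverse-scores one negatively worded item, and assembles a respondents-by-items matrix. The item names (`q1`–`q3`), the response labels, and the 1–5 coding scheme are assumptions made for this example, not a Stat Tree™ convention. Because the codes are ordinal, the sketch summarizes each item with a median rather than a mean.

```python
from statistics import median

# Hypothetical coding scheme for a five-point agreement item (ordinal level).
LIKERT_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

ITEMS = ["q1", "q2", "q3"]  # hypothetical item names
REVERSED = {"q2"}           # assume q2 is negatively worded

# Hypothetical raw data: one dict of text responses per respondent.
raw_responses = [
    {"q1": "Agree", "q2": "Disagree", "q3": "Strongly agree"},
    {"q1": "Neutral", "q2": "Strongly disagree", "q3": "Agree"},
    {"q1": "Strongly agree", "q2": "Agree", "q3": "Neutral"},
]


def score_response(item: str, label: str) -> int:
    """Convert one raw label to a numeric code, reverse-scoring if needed."""
    code = LIKERT_CODES[label.strip().lower()]
    # On a 1-5 scale, reverse-scoring maps 1<->5 and 2<->4; 3 stays 3.
    return 6 - code if item in REVERSED else code


def score_matrix(responses: list[dict[str, str]]) -> list[list[int]]:
    """Build a respondents-by-items matrix of numeric codes."""
    return [[score_response(item, row[item]) for item in ITEMS] for row in responses]


matrix = score_matrix(raw_responses)
print(matrix)

# The codes are ordinal, so summarize each item (column) with its median.
item_medians = [median(column) for column in zip(*matrix)]
print(item_medians)
```

Running the script prints the scored matrix `[[4, 4, 5], [3, 5, 4], [5, 2, 3]]` and the item medians `[4, 4, 4]`. Whether such codes may instead be treated as interval data, for example to compute means, is a separate analytic decision.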
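The Python sketch below shows how the significance criterion is applied. It computes a two-tailed p-value for a z statistic from the standard normal distribution and compares it with α = .05. The z test, the function names, and the example statistics are assumptions chosen for illustration. In practice the p-value comes from the output of whichever test was run, and the comparison with α works the same way.

```python
import math

ALPHA = 0.05  # significance criterion commonly used in the social sciences


def two_tailed_p_from_z(z: float) -> float:
    """Two-tailed p-value for a standard normal test statistic.

    For Z ~ N(0, 1), P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2))


def is_significant(p: float, alpha: float = ALPHA) -> bool:
    """Treat the result as significant when p is at or below alpha."""
    return p <= alpha


# Hypothetical test statistics, one non-significant and two significant.
for z in (1.50, 2.20, 3.00):
    p = two_tailed_p_from_z(z)
    print(f"z = {z:.2f}, p = {p:.4f}, significant at alpha = {ALPHA}: {is_significant(p)}")
```

Because the test is two-tailed, z = 1.96 yields p ≈ .05. This is the familiar cutoff that corresponds to 95% confidence.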
References
Note: Links to code sources are available in the scripts provided for each test demonstration. Additional sources may be found in the relevant modules in Stat-Tree.
American Psychological Association. (2020). Publication manual of the American Psychological Association (7th ed.). https://doi.org/10.1037/0000165-000
Andrews, F. M., Klem, L., Davidson, T. N., O'Malley, P. M., & Rodgers, W. L. (1981). A guide for selecting statistical techniques for analyzing social science data (2nd ed.). Institute for Social Research, The University of Michigan.
Babbie, E. (2002). The basics of social research (2nd ed.). Wadsworth.
Bostrom, R. N. (1998). Communication research. Waveland.
Brewer, J., & Hunter, A. (1989). Multimethod research: A synthesis of styles. Sage.
Christensen, L. B., & Stoup, C. M. (1991). Introduction to statistics for the social and behavioral sciences (2nd ed.). Brooks/Cole.
Doornik, J. A., & Hansen, H. (2008). An omnibus test for univariate and multivariate normality. Oxford Bulletin of Economics and Statistics, 70(Supplement). https://doi.org/10.1111/j.1468-0084.2008.00537.x
Frey, L. R., Botan, C. H., & Kreps, G. L. (2000). Investigating communication: An introduction to research methods (2nd ed.). Allyn and Bacon.
Hemphill, J. F. (2003). Interpreting the magnitudes of correlation coefficients. American Psychologist, 58(1), 78–79. https://doi.org/10.1037/0003-066X.58.1.78
Keyton, J. (2018). Communication research: Asking questions, getting answers (5th ed.). McGraw-Hill.
Knapp, T. R. (1978). Canonical correlation analysis: A general parametric significance-testing system. Psychological Bulletin, 85(2), 410–416. https://doi.org/10.1037/0033-2909.85.2.410
Leys, C., Klein, O., Dominicy, Y., & Ley, C. (2018). Detecting multivariate outliers: Use a robust variant of the Mahalanobis distance. Journal of Experimental Social Psychology, 74, 150–156. https://doi.org/10.1016/j.jesp.2017.09.011
Mertler, C. A., & Vannatta, R. A. (2005). Advanced and multivariate statistical methods: Practical application and interpretation (3rd ed.). Pyrczak.
Nimon, K. F. (2012). Statistical assumptions of substantive analyses across the general linear model: A mini-review. Frontiers in Psychology, 3, Article 322. https://doi.org/10.3389/fpsyg.2012.00322
Pearson, K. (1895, June 20). Note on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58, 240–242.
Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine, Series 5, 50(302), 157–175. https://doi.org/10.1080/14786440009463897
Reinard, J. C. (2006). Communication research statistics. Sage.
Salkind, N. J. (2000). Statistics for people who (think they) hate statistics. Sage.
Salkind, N. J. (2010). Omega squared. In N. J. Salkind (Ed.), Encyclopedia of research design (p. 289). Sage. https://doi.org/10.4135/9781412961288.n289
Sommer, B., & Sommer, R. (2002). A practical guide to behavioral research: Tools and techniques (5th ed.). Oxford University Press.
Student. (1908). The probable error of a mean. Biometrika, 6(1), 1–25. https://doi.org/10.2307/2331554
Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Pearson/Allyn & Bacon.
Verardi, V., & Dehon, C. (2010). Multivariate outlier detection in Stata. The Stata Journal, 10(2), 259–266.
Links
Below are some helpful links:
DataNovia (https://www.datanovia.com/en/)
Laerd Statistics (https://statistics.laerd.com/)
LibreTexts™ Statistics (https://stats.libretexts.org/)
Rosane Rech: Statistics and design of experiments (https://statdoe.com/)
Statology (https://www.statology.org/tutorials/)
Statistical Methods (https://stat-methods.com/)
STHDA: Statistical tools for high-throughput data analysis (http://www.sthda.com/english/)
UCLA Advanced Research Computing (https://stats.oarc.ucla.edu/)
What is R? (https://www.datacamp.com/blog/all-about-r)
Python for Beginners (https://www.python.org/about/gettingstarted/)
The Julia Language (https://julialang.org/)
About
Stat Tree™ began as a project to show students in my undergraduate research methods class how to choose the best statistical test for a given research hypothesis by answering a few simple questions. The first draft was created as a visual flowchart in Microsoft Excel™ in October 2006. In creating the flowchart, I drew most directly on the flowchart created by Andrews, Klem, Davidson, O'Malley, and Rodgers (1981). However, their guide was created for statisticians and statistics students. What was needed was a simplified decision tree for non-statisticians and students learning research design for the first time. Thus began the development of what would become Stat Tree™.
For years, I made a flat-file statistics decision tree, based on my edits to the original Excel flowchart, available to students in my classes in PDF format. In Fall 2007, I taught my first graduate-level research methods class and began adding more sophisticated statistical techniques to the flowchart. As students’ research hypotheses diversified, the need to add more techniques, including non-parametric statistics, grew.
In Fall 2014, I set about developing my first hybrid online course in research methods. It became apparent that this teaching modality called for an online interactive statistics decision tree. Michael Brand, an instructional designer at the University of Texas at San Antonio (UTSA), suggested that I use Quandary, a software program designed for making interactive web-based puzzles, to create the decision tree. Over the next few months, I used this program to create the infrastructure for the Statistics Decision Tree. I edited the first HTML prototype using Adobe Dreamweaver and other tools and added content from the research methods courses that I had authored over the years. The original goal of making a simplified version for undergraduates had evolved into a need for two separate interactive Statistics Decision Trees: one for the undergraduate class and a more advanced version for the graduate class. My project caught the attention of the UTSA Office of Online Learning, and I was invited to present it at the Innovations of Online Learning Conference in May 2015.
In Spring 2018, my project caught the attention of the UTSA Office of Commercialization, which encouraged me to participate in regional National Science Foundation Innovation Corps (Southwest NSF i-Corps™) training in Houston in May 2018. I put together a team including Les Doss and David Cortez, and we travelled to Houston for a 3-week training program. The training was a success and resulted in a recommendation from the Regional i-Corps™ to submit a proposal to the National Science Foundation based on the Statistics Decision Tree prototype. In Spring 2019, that proposal was awarded a $50,000 NSF i-Corps™ grant (Award ID: 1925391). The Stat Tree™ team was born.
The Stat Tree™ team travelled to Nashville for national training in customer discovery and commercial development of the Statistics Decision Tree prototype. Over the course of seven weeks, the team travelled across the country to identify the needs of potential users. The 128 interviews conducted during this period confirmed a strong need for an interactive Statistics Decision Tree tool outside the classroom in multiple industries, as reported in the peer-reviewed Journal of Strategic Innovation and Sustainability. The most common challenge in these industries was that non-statisticians needed to make quick decisions about which statistics to use for hypothesis testing. Additionally, the interviews revealed that users needed a tool that would demonstrate how to conduct statistical tests using the scripting languages of the most common statistical packages, including R, SAS™, Stata™, and SPSS™.
This latest iteration of Stat Tree™ provides a statistics decision tree covering 30 different parametric and non-parametric bivariate and multivariate tests. It includes scripting samples for all tests in R, SAS™, Stata™, and SPSS™, and now for common tests in Julia and Python. Stat Tree™ also provides demonstrations of several statistical diagnostics and of univariate and multivariate descriptive statistics, including normality testing and outlier detection.
Stat Tree was featured in the UTSA News. Watch the StoryTellers Movement podcast on YouTube for a discussion about Stat Tree development.
H. Paul LeBlanc III, Ph.D., (ORCID iD: 0000-0001-5053-0403, Google Scholar, LinkedIn)
Founder and CEO
Stat Tree™, LLC
What's New
Stat Tree™ has added demonstrations for common univariate, bivariate, and multivariate statistical tests in Julia and Python, with scripts and outputs. See Tests for a complete list of statistical tests demonstrated in Stat-Tree™. Also, check out the new and improved Help!
Version 5.1 (released December 10, 2024)
Provided demonstrations for an additional six tests in Python and Julia (Mann-Whitney U, Kruskal-Wallis H, Wilcoxon Signed-Rank, Spearman and Biserial Correlation, and Multiple Linear Regression).
Provided demonstrations for univariate descriptive statistics (in SPSS™, SAS™, Stata™, R, Python, and Julia) including:
Categorical descriptives and data visualization
Ordinal descriptives and data visualization
Expanded demonstrations for univariate descriptive statistics in Excel including copyable formulas.
Provided demonstrations for multivariate descriptive statistics in Python and Julia including:
Multivariate normality test
Multivariate outlier detection tests
Revised help section includes:
Reshaping data from long to wide format (in SPSS™, SAS™, Stata™, R, Python, and Julia)
Creating new environments (in R, Python, and Julia)
Version 5.0 (released September 1, 2024)
Provided demonstrations for the original six tests (Chi-Square, Paired and Independent Samples t-Tests, One-way ANOVA, Pearson Correlation, and Simple Linear Regression) in Python and Julia.
Provided demonstrations for univariate descriptive statistics in Julia and Python including:
Measures of central tendency and dispersion: Means and Standard Deviations.
Skewness and Kurtosis.
Scatterplots and Histograms.
Univariate Normality and Outlier Detection tests.
Included demonstrations (in SPSS™, R, SAS™, Stata™, Julia, and Python) for Levene’s Test for Homogeneity of Variance under Descriptive Statistics: Assumption Testing.
Revised help section to include:
Working with command prompts for terminal-based programming in R, Julia, and Python.
Importing data in Julia and Python.
Installing packages in R, Julia, and Python.
Restarting the command prompt kernel in R, Julia, and Python.
Version 4.1 (August 15, 2023), First public release.
Fixed broken links discovered after posting to the online server.
Posted to the online server.
Version 4.0 (January 2023)
Provided demonstrations for all tests in SPSS™, SAS™, Stata™, and R.
Added branches for descriptions of Exploratory and Confirmatory Factor Analysis, Path Analysis, and Structural Equation Modelling.
Version 3.0 (May 2019), Revised prototype for NSF i-Corps™.
Provided demonstrations in SPSS™ for:
Canonical Correlation,
Discriminant Analysis, and
Logistic Regression.
Version 2.0 (2016), Revised Statistics Decision Tree for graduate students.
Provided demonstrations in SPSS™ for factorial, multivariate, and Repeated Measures ANOVA.
Provided demonstrations in SPSS™ for Multiple Regression.
Provided demonstrations in SPSS™ for non-parametric equivalents of presented parametric tests.
Version 1.0 (2014), First iteration of the Statistics Decision Tree for undergraduate students.
Provided demonstrations in SPSS™ for Chi-Square, Paired and Independent Samples t-Tests, One-way ANOVA, Pearson Correlation, and Simple Linear Regression.
Provided demonstrations of univariate descriptive statistics and visualizations in SPSS™.
Contact
Stat-Tree™