The complete package for data analysis, data management and graphics


Stata is a complete and powerful statistical package intended for researchers in all disciplines. With Stata, you get everything you need in one comprehensive package. Thanks to Stata's easy-to-use environment, which combines a point-and-click interface, an intuitive command language, and online help, you will quickly be able to start using this advanced tool. All analyses and results can then be reproduced and documented for your publications.

In Stata, you have hundreds of statistical tools and methods available. These include everything from basic to advanced statistical methods and analysis. Stata also contains a wide range of commands for handling data files and large data sets. You will also find many useful methods to produce graphs and statistical charts of high quality, which you can directly use in your publications and articles.

Stata is available for Windows, Mac and Unix / Linux.

Contact us for a demo by email, or call +44 (0) 203 695 7810.


Stata 16 was announced on 26 June 2019!

Please click the link below to read more about the new features in Stata 16:

New features in Stata 16


Lasso

Stata's new lasso tools let you extract real features from mountains of data. With those features, you can do the following:

  • Predict outcomes
  • Characterize groups and patterns in your data
  • Search over highly nonlinear potential relationships
  • Perform inference on covariates of interest
  • Handle endogenous covariates or unobserved confounders

We give you the tools to be sure you are finding real features and not just artifacts in a particular sample.

Why so many uses?

Lasso has its roots in:

  • machine learning
  • statistics
  • econometrics

That is to say, lasso combines established capabilities in real-world applications, the rigour of known statistical properties, and the promise of yet more applications.
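To make the idea of keeping real features and dropping artifacts concrete, here is a minimal sketch of the lasso in plain Python. It is not Stata's implementation: the function names, the toy data, and the penalty value are all assumptions chosen for illustration, and no intercept is fitted.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return exactly 0.0 inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, sweeps=100):
    """Minimise (1/2n) * sum of squared residuals + penalty * sum(|coef|).

    No intercept is fitted, so the columns are assumed to be centred.
    """
    n, p = len(X), len(X[0])
    coefs = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0    # association of column j with the partial residual
            scale = 0.0  # mean square of column j
            for row, target in zip(X, y):
                others = sum(row[k] * coefs[k] for k in range(p) if k != j)
                rho += row[j] * (target - others) / n
                scale += row[j] ** 2 / n
            coefs[j] = soft_threshold(rho, penalty) / scale
    return coefs


# Hypothetical toy data: y is driven by the first column and has only a
# faint association with the second.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2.1, -1.9, 1.9, -2.1]

coefs = lasso_coordinate_descent(X, y, penalty=0.5)
print(coefs)
```

In this toy example the strong first coefficient is shrunk but retained, while the weak second coefficient is set exactly to zero. That is the selection behaviour the lasso is known for.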

Truly reproducible reporting

Stata's commands for report generation allow you to create complete Word®, Excel®, PDF, and HTML documents that include formatted text, as well as summary statistics, regression results, and graphs produced by Stata.


Meta-analysis

Meta-analysis combines the results of multiple studies that answer similar research questions. Does exercise prolong life? Does lack of sleep increase the risk of cancer? Does daylight saving save energy? And more. Many studies attempt to answer such questions, and some report inconclusive or even conflicting results. Meta-analysis helps aggregate the information, often overwhelming, from many studies in a principled way into one unified final conclusion or provides the reason why such a conclusion cannot be reached.

Stata has a long history of meta-analysis methods contributed by Stata researchers, for instance, Palmer and Sterne (2016). Stata now offers the new suite of commands, meta, to perform meta-analysis. The new suite is broad, yet one of its strengths is its simplicity.
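As a rough illustration of how study results are combined, the sketch below computes an inverse-variance (common-effect) pooled estimate in plain Python. The effect sizes and standard errors are invented for the example, and the function name is hypothetical; this is not the code behind Stata's meta suite.

```python
import math


def pool_common_effect(effects, std_errors):
    """Inverse-variance pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]  # precise studies count more
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se


# Hypothetical study results (for example, log hazard ratios)
effects = [-0.30, -0.10, -0.25]
std_errors = [0.10, 0.20, 0.10]

pooled, pooled_se = pool_common_effect(effects, std_errors)
low = pooled - 1.96 * pooled_se
high = pooled + 1.96 * pooled_se
print(f"pooled = {pooled:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

The pooled standard error is smaller than that of any single study, which is why a meta-analysis can reach a conclusion where the individual studies are inconclusive.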

Choice models

Finally, answers to real-world and real-research questions.

Prior to Stata 16, the nonlinearities and extra correlations in most choice models made it difficult to answer truly interesting questions. You could easily test whether a covariate was significant and positive but not measure its effect on the probability of a choice. Either you accepted answers to limited questions or you derived solutions to your specific questions and programmed them by hand.

That all changes with Stata 16. Even with complicated models such as multinomial probit or mixed logit, you can now get the answers to truly interesting questions. Your favorite restaurant introduces a new chicken entree? How does that affect demand for its other chicken entrees? Its beef entrees? Its fish entrees?

With Stata 16, answers to such questions, including tests and confidence intervals, are a simple command away.
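The restaurant question can be sketched with the simplest choice model, a multinomial logit, in plain Python. The utilities below are invented and the function name is hypothetical. Stata's choice-model commands estimate such quantities from data, with tests and confidence intervals, which this sketch does not attempt.

```python
import math


def choice_probabilities(utilities):
    """Multinomial logit choice probabilities from a dict of utilities."""
    top = max(utilities.values())  # subtract the maximum for numerical stability
    exp_u = {name: math.exp(u - top) for name, u in utilities.items()}
    total = sum(exp_u.values())
    return {name: value / total for name, value in exp_u.items()}


# Hypothetical utilities for the current menu
menu = {"chicken": 1.0, "beef": 0.5, "fish": 0.0}

before = choice_probabilities(menu)
after = choice_probabilities({**menu, "new chicken": 0.8})

for dish in menu:
    print(f"{dish}: {before[dish]:.3f} -> {after[dish]:.3f}")
```

Under a plain logit every existing dish loses demand in the same proportion. More flexible models such as multinomial probit and mixed logit relax that restriction, so the new chicken entree can draw more heavily from the other chicken entrees than from beef or fish.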

Python integration

In Stata 16, you can embed and execute Python code from within Stata. Stata's new python command provides a suite of subcommands allowing you to easily call Python from Stata and output Python results within Stata.

You can invoke Python interactively or in do-files and ado-files so that you can leverage Python's extensive language features. You can also execute a Python script file (.py) directly through Stata.

Additionally, the Stata Function Interface (sfi) Python module is included. It provides a bidirectional connection between Stata and Python, letting you combine Python's capabilities with Stata's core features. Within the module, classes are defined to provide access to Stata's current dataset, frames, macros, scalars, matrices, value labels, characteristics, global Mata matrices, and more.
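A minimal sketch of reading the current dataset through sfi might look like the following. It assumes that `Data.get` returns a flat list when a single variable is requested and that `missingval=None` maps Stata missing values to `None`. Both should be checked against the sfi documentation for your Stata version. The helper names are hypothetical, and the import guard exists only so the pure-Python part can be tried outside Stata.

```python
try:
    from sfi import Data  # importable only in Stata's embedded Python
except ImportError:
    Data = None  # lets the helper below be exercised outside Stata


def mean_ignoring_missing(values):
    """Mean of the non-missing entries, or None if there are none."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


def mean_of_stata_variable(varname):
    """Mean of one variable in Stata's current dataset (hypothetical helper)."""
    if Data is None:
        raise RuntimeError("sfi is only available when Python runs inside Stata")
    # Assumption: a single variable name yields a flat list of values,
    # and missingval=None returns Stata missing values as None.
    values = Data.get(var=varname, missingval=None)
    return mean_ignoring_missing(values)


print(mean_ignoring_missing([21.0, None, 25.0]))
```

Inside Stata, code like this would sit between `python:` and `end` in a do-file, after which `mean_of_stata_variable("mpg")` could be called on a loaded dataset.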

Stata supports both Python 2 and Python 3 starting from Python 2.7. You can choose which one to bind to from within Stata.

New in Bayesian analysis

Stata 16 offers extensive additions to Stata's Bayesian suite of commands, which include

  • Multiple chains
  • Gelman–Rubin convergence diagnostics
  • Bayesian predictions
  • Posterior summaries of simulated values
  • MCMC replicates
  • Posterior predictive p-values

In addition, bayes: and bayesmh support new priors pareto(), dirichlet(), and geometric() for specifying, respectively, Pareto, multivariate beta (Dirichlet), and geometric prior distributions. Pareto is a power-law-based distribution. Dirichlet can be used for specifying priors for probability vector parameters. Geometric priors are suitable for modeling count parameters.

Last but not least, bayes: with multilevel models, such as bayes: mixed, now runs faster!
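To show what a convergence diagnostic based on multiple chains measures, here is a sketch of the basic Gelman–Rubin statistic in plain Python. It omits the degrees-of-freedom correction that full implementations may apply, assumes chains of equal length, and uses invented chains and a hypothetical function name.

```python
from statistics import mean, variance


def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    chain_means = [mean(chain) for chain in chains]
    within = mean([variance(chain) for chain in chains])  # W
    between = n * variance(chain_means)                   # B
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


# Hypothetical chains: the first pair explores the same region,
# the second pair is stuck in different regions.
agreeing = [[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 1.0]]
disagreeing = [[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]]

print(gelman_rubin(agreeing), gelman_rubin(disagreeing))
```

Values near 1 suggest the chains agree. Values well above 1 indicate that the chains have not converged to the same distribution.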

Extended regression models for panel-data/multilevel models

Stata's Extended Regression Models (ERMs) now support panel data. ERMs were added to Stata in the previous release. They fit models with problems.

By models, we mean linear regression and interval regression for continuous outcomes, probit for binary outcomes, and ordered probit for ordered outcomes.

By problems, we mean any combination of endogenous and exogenous sample selection, endogenous covariates (unobserved confounders), and nonrandom treatment assignment.

New this release is that ERMs handle yet another problem—panel data (also known as longitudinal data or two-level multilevel data).

Random effects are included in each equation by default. Random effects are correlated; you can omit specific random effects, and you can test the correlations.

Other commands in Stata can fit models with any one of the problems listed. ERMs can handle any combination of the above problems and can fit models with continuous, interval, binary, and ordered outcomes.

Import data from SAS and SPSS

One of the first tasks of any research project is reading in data. import sas allows us to import SAS® data from version 7 or higher into Stata. We can import the entire dataset or only a subset of it. With import sas we may also import value labels. Dates, value labels, and missing values are all converted properly from SAS to Stata format.

import spss allows us to bring IBM® SPSS® files (version 16 or higher) and compressed IBM SPSS files (version 21 or higher) into Stata. We can import the entire dataset or only a subset of it. Dates, value labels, and missing values are all converted properly from SPSS to Stata format.

Nonparametric series regression

Nonparametric series regression (NPSR) estimates mean outcomes for a given set of covariates, just like linear regression. Unlike linear regression, NPSR is agnostic about the functional form of the outcome in terms of the covariates, which means that NPSR is not subject to functional-form misspecification error.

Multiple datasets in memory

This is about changing the way you work. Datasets in memory are stored in frames, and frames are named. When Stata launches, it creates a frame named default, but there is nothing special about it, and the name has no special or secret meaning. You can rename it.

Precision and sample-size analysis for CIs

The new ciwidth command performs precision and sample-size analysis for confidence intervals (CIs). The goal is to optimally allocate study resources when CIs are to be used for inference or, said differently, to estimate the sample size required to achieve the desired precision of a CI.

ciwidth also lets you investigate the precision in various scenarios, which is useful at the planning stage. You can investigate the tradeoffs among sample size, required CI width, and the probability that the actual CI width will be less than required. And you can examine how each varies with other parameters.

Results can be presented in a table or graph.
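The tradeoff between sample size and CI width can be sketched for the simplest case: a normal-based CI for a mean with a known standard deviation. This is an assumption made to keep the arithmetic transparent. It does not reproduce ciwidth, which also accounts for the probability of achieving the target width, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_ci_width(sd, width, level=0.95):
    """Smallest n for which a known-sd normal CI is no wider than width."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # The CI is mean +/- z * sd / sqrt(n), so its width is 2 * z * sd / sqrt(n)
    return math.ceil((2 * z * sd / width) ** 2)


print(sample_size_for_ci_width(sd=10, width=4))
print(sample_size_for_ci_width(sd=10, width=2))
```

Halving the required width roughly quadruples the required sample size, which is the kind of tradeoff worth examining at the planning stage.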

Panel-data mixed logit

Mixed logit models are models for choice outcomes. Choices might be modes of transportation, car insurance providers, or types of vacations.

Sometimes individuals make the same decision repeatedly:

  • You choose whether to bike or take a car to work each day.
  • You choose your car insurance provider each year.
  • You choose to vacation at the beach, mountains, or city each summer.

When data contain repeated choices, we have panel data.

With Stata 16's new cmxtmixlogit command, you can fit panel-data mixed logit models.

Nonlinear DSGE

Dynamic stochastic general equilibrium (DSGE) models are used in macroeconomics to describe the structure of the economy. These models consist of systems of equations that are derived from economic theory. In these models, expectations play an important role in determining the values of variables today. What distinguishes the DSGE model from other time-series models is its close connection to theory and the appearance of expectations in the model.

Macroeconomists use DSGEs to evaluate the impact of policy on outcomes such as output growth, inflation, and interest rates. A DSGE model can nest multiple theories. Researchers can then use estimated parameter values to determine which theory better fits the data.

Stata's new dsgenl command estimates the parameters of DSGEs that are nonlinear in both the parameters and variables by using a first-order approximation to the model's equations at the steady state.

Multiple-group IRT models

Many researchers study cognitive abilities, personality traits, attitudes, quality of life, patient satisfaction, and other attributes that cannot be measured directly. To quantify these types of latent traits, researchers develop instruments (questionnaires or tests consisting of binary, ordinal, or categorical items) to determine individuals' levels of the trait.

Item response theory (IRT) models can be used to evaluate the relationships between a latent trait and items intended to measure the trait. With IRT models, we can determine which test items are more difficult and which ones are easier. We can determine which test items provide much information about the latent trait and which ones provide only a little.
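Difficulty and information can be made concrete with the two-parameter logistic (2PL) model, one common IRT model for binary items. The sketch below is plain Python with invented item parameters and hypothetical function names. It evaluates a model and does not fit one.

```python
import math


def prob_correct(theta, discrimination, difficulty):
    """2PL probability that a person with trait level theta answers correctly."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


def item_information(theta, discrimination, difficulty):
    """Fisher information the item provides about theta under the 2PL model."""
    p = prob_correct(theta, discrimination, difficulty)
    return discrimination ** 2 * p * (1.0 - p)


# Hypothetical items: same discrimination, different difficulty
easy = {"discrimination": 1.0, "difficulty": -1.0}
hard = {"discrimination": 1.0, "difficulty": 1.5}

theta = 0.0  # an average respondent
print(prob_correct(theta, **easy), prob_correct(theta, **hard))
print(item_information(theta, 2.0, 0.0), item_information(theta, 0.5, 0.0))
```

A higher difficulty lowers the probability of a correct answer at a given trait level. A higher discrimination concentrates more information around the item's difficulty.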

Stata's existing IRT suite fits IRT models.

In Stata 16, we can now make comparisons across groups. This means we can evaluate whether an instrument measures a latent trait in the same way for different subpopulations.


Heckman selection models for panel data

Heckman selection models adjust for bias when some outcomes are missing not at random. Imagine modeling income. The problem is that income is observed only for those who work. Missingness is not random.

Stata fits Heckman selection models and, new in Stata 16, Stata can fit them with panel (two-level) data.

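You want to fit the model

$$y_{it} = x_{it}\beta + \alpha_i + \varepsilon_{it}$$

where $y_{it}$ is sometimes missing. The equation that determines which $y_{it}$ are not missing is

$$s_{it} = 1(z_{it}\gamma + v_i + u_{it} > 0)$$

Here $x_{it}$ and $z_{it}$ denote the covariates of the outcome and selection equations, and $s_{it} = 1$ indicates that $y_{it}$ is observed. In these equations, $\alpha_i$, $\varepsilon_{it}$, $v_i$, and $u_{it}$ will not be estimated. Their correlations with each other, however, will be estimated along with $\beta$ and $\gamma$.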

The above model can be fit even though income is not observed for everyone and even if their employment status changes over time.

Why fit a selection model? Because it is possible that people who work and whose income is therefore observed systematically differ from those who do not, and those differences are for unobserved reasons.

For instance, if more productive people work, their income will be higher than that of those who do not work. Or, if the income of the less productive is lower, they might need to work more. Allowing for selection accommodates either of these alternatives, and others too. After estimation, we can test whether selection matters.
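The bias that selection introduces can be seen in a small simulation. The sketch below is plain Python with invented parameters and a hypothetical function name. It is cross-sectional rather than panel and only demonstrates the problem; it does not fit a Heckman model.

```python
import random
from statistics import mean


def simulate_incomes(n=20000, rho=0.8, seed=1):
    """Return (mean income of everyone, mean income of those who work)."""
    rng = random.Random(seed)
    everyone, workers = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)      # unobserved propensity to work
        noise = rng.gauss(0, 1)
        # Income shock constructed so that its correlation with u is rho
        e = rho * u + (1 - rho ** 2) ** 0.5 * noise
        income = 10 + e
        everyone.append(income)
        if u > 0:                # income is observed only for those who work
            workers.append(income)
    return mean(everyone), mean(workers)


population_mean, observed_mean = simulate_incomes()
print(round(population_mean, 2), round(observed_mean, 2))
```

When the unobservables behind working and earning are positively correlated, the mean income of workers overstates the mean income of the whole population. With `rho=0` the two means agree up to sampling noise, which is the case where selection does not matter.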

NLME models with lags, leads, and differences: Growth models, multiple-dose PK models, and more

The existing menl command has new features for fitting nonlinear mixed-effects models (NLMEMs) that may include lag, lead (forward), and difference operators. One important class of such models is the class of pharmacokinetic (PK) models and, specifically, multiple-dose PK models. menl's new features can also be used to fit other models, such as certain growth models and time-series nonlinear multilevel models.

Heteroskedastic ordered probit models

Point sizes for graphics

In Stata 16, you can now specify sizes of graph elements in printer points, inches, and centimetres. Simply add a unit suffix to the size: pt for printer points, in for inches, cm for centimetres, and rs for relative size.
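The absolute units relate to one another by fixed factors, which the following plain-Python sketch spells out. It assumes a printer point is 1/72 of an inch. The function name is hypothetical, and relative sizes (rs) are rejected because they depend on the size of the graph.

```python
POINTS_PER_INCH = 72.0  # assumption: 1 printer point = 1/72 inch
CM_PER_INCH = 2.54

FACTORS = {
    "pt": 1.0,
    "in": POINTS_PER_INCH,
    "cm": POINTS_PER_INCH / CM_PER_INCH,
}


def size_to_points(size):
    """Convert a size such as '12pt', '0.5in', or '1cm' to printer points."""
    for suffix, factor in FACTORS.items():
        if size.endswith(suffix):
            return float(size[: -len(suffix)]) * factor
    raise ValueError(f"cannot convert {size!r} to an absolute size")


print(size_to_points("12pt"), size_to_points("0.5in"), size_to_points("1cm"))
```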

Numerical integration

Stata provides statistical solutions developed by StataCorp, and it provides programming tools for those who want to develop their own solutions. There are two Stata programming languages: ado, which is easy to use, and Mata, which performs numerical heavy lifting. And Stata is integrated with Python.

Linear programming

New features in the Do-file Editor

Stata's Do-file Editor has always provided syntax highlighting for Stata. It still does. In Stata 16, it also provides syntax highlighting for Python and Markdown.

And Stata 16's Do-file Editor has autocompletion. The editor autocompletes words that already exist in the document, autocompletes Stata commands, and autocompletes quotes, parentheses, braces, and brackets.

Last but not least, you can now use spaces for indentation as well as tabs.

Dark Mode and tabbed windows for Mac

Stata in Korean

System requirements

  • Windows 8, 7, Vista, XP, 2012 Server, 2008 Server, 2003 Server (32/64-bit)
  • Mac OSX 10.7 or later, 64-bit, minimum Intel Core 2 Duo processor
  • Linux: Any 64-bit (x86-64 or compatible) or 32-bit (x86 or compatible) running Linux
  • Memory: 512 MB, Hard drive: 600 MB, DVD-ROM
  • Stata for Unix requires a video card that can display thousands of colors or more (16-bit or 24-bit color)

License options

Editions of Stata

  • Stata IC
    Standard edition of Stata (up to 2,047 variables and 2.14 billion observations).
  • Stata SE
    Stata for large datasets (up to 32,767 variables and 2.14 billion observations).
  • Stata MP
    Faster version designed to take advantage of multicore processors and parallel processing. Licensed by number of cores (MP2, MP4, MP8, etc.). Up to 20 billion observations (limited by hardware memory).

Learn more about the different Stata editions!

Licensing options

  • Single User
  • Network
  • Floating Network licenses
  • Volume Licensing

Subscription licenses are available for all of the options above, both single-user and volume/network.

Contact us today for a quote!


We provide technical support for all our Stata customers.

Please describe your problem in as much detail as possible when contacting our support. Always tell us your product version and your operating system (both platform and version).
You can find instructions for support here.

We also recommend these online support pages and resources: