20th German Stata Conference

Friday, June 16, 2023 at the Humboldt University Berlin

The 20th German Stata Conference will be held on Friday, 16 June 2023, in Berlin at the Humboldt University’s Jacob-und-Wilhelm-Grimm-Zentrum. Everybody interested in using Stata is invited to attend. The academic program of the meeting is being organized by Johannes Giesecke (Humboldt University Berlin) and Ulrich Kohler (University of Potsdam). Owing to the international nature of the meeting and the participation of non-German guest speakers, the conference language will be English.


8:15 – 8:45 Registration

8:45 – 9:00 Welcome
Johannes Giesecke

9:00 – 10:00 Drivers of COVID-19 deaths in the United States: A two-stage modelling approach
Christopher F. Baum, Andrés Garcia-Suaza, Miguel Henry, Jesús Otero

10:00 – 10:30 Discrete-Time Multistate Regression Models in Stata
Daniel C. Schneider

10:30 – 10:45 mfcurve: Visualizing Results From Multifactorial Designs
Daniel Krähmer

10:45 – 11:15 Coffee

11:15 – 11:45 Estimating the Price Elasticity of Gasoline Demand in Correlated Random Coefficient Models with Endogeneity
Michael Bates and Seolah Kim

11:45 – 12:15 Influence Analysis with Panel Data using Stata
Annalivia Polselli

12:15 – 12:45 nopo: An implementation of a matching-based decomposition technique with postestimation
Maik Hamjediers and Maximilian Sprengholz

12:45 – 13:45 Lunch Break

13:45 – 14:45 Linking frames in Stata
Jeff Pitblado, StataCorp

14:45 – 15:45 Causal inference and treatment-effect decomposition with Stata
Joerg Luedicke, StataCorp

15:45 – 16:15 Coffee

16:15 – 16:45 lgrgtest: Lagrange-Multiplier Test after Constrained Maximum-Likelihood Estimation using Stata
Harald Tauchmann

16:45 – 17:00 Assessing the fit of Generalized Structural Equation Models with Stata: Theory and practical application
Wolfgang Langer

17:00 – 17:30 Power boost or source of bias? Monte Carlo evidence on ML covariate adjustment in randomized trials in education
Lukas Fervers

17:30 – 18:00 Open panel discussion with Stata developers

18:00 End of meeting

9:00–10:00 Drivers of COVID-19 deaths in the United States: A two-stage modelling approach Kit Baum (Boston College), Andrés Garcia-Suaza (Universidad del Rosario), Miguel Henry (Greylock McKinnon Associates), Jesús Otero (Universidad del Rosario)

Abstract: We offer a two-stage (time-series and cross-section) econometric modelling approach to examine the drivers behind the spread of COVID-19 deaths across counties in the United States. Our empirical strategy exploits the availability of two years (January 2020 through January 2022) of daily data on the number of confirmed deaths and cases of COVID-19 in the 3,000 U.S. counties of the 48 contiguous states and the District of Columbia. In the first stage of the analysis, we use daily time-series data on COVID-19 cases and deaths to fit mixed models of deaths against lagged confirmed cases for each county. Because the resulting coefficients are county specific, they relax the homogeneity assumption that is implicit when the analysis is performed using geographically aggregated cross-section units. In the second stage of the analysis, we assume that these county estimates are a function of economic and sociodemographic factors that are taken as fixed over the course of the pandemic. Here we employ the novel one-covariate-at-a-time variable-selection algorithm proposed by Chudik et al. (2018) to guide the choice of regressors.

10:00–10:30 Discrete-Time Multistate Regression Models in Stata Daniel C. Schneider (MPI for Demographic Research, Rostock)

Abstract: Multistate life tables (MSLTs), or multistate survival models, have become a widely used analytical framework among epidemiologists, social scientists, and demographers. MSLTs can be cast in continuous time or discrete time. While the choice between the two approaches depends on the concrete research question and available data, discrete-time models have a number of appealing features: They are easy to apply; the computational cost is typically low; and today’s empirical studies are frequently based on regularly spaced longitudinal data, which naturally suggests modelling in discrete time. Despite these appealing features, Stata add-on packages have so far only been developed for continuous-time models (Crowther and Lambert, 2017; Metzger and Jones, 2018) or for traditional demographic life table calculations that do not allow for covariate adjustment (Muniz, 2020). This presentation introduces the recently published Stata package -dtms-, which seeks to fill the gap in software availability for discrete-time multistate model estimation. The -dtms- package provides a well-documented and easy-to-apply set of commands that cover a large set of the discrete-time MSLT techniques that currently exist in the literature. It also features inference based on newly derived asymptotic covariance matrices as well as inference on group contrasts.

Crowther, M.J., Lambert, P.C., 2017. Parametric multistate survival models: Flexible modelling allowing transition-specific distributions with application to estimating clinically useful measures of effect differences. Statistics in Medicine 36, 4719–4742. https://doi.org/10.1002/sim.7448

Metzger, S.K., Jones, B.T., 2018. mstatecox: A package for simulating transition probabilities from semiparametric multistate survival models. The Stata Journal 18, 533–563.

Muniz, J.O., 2020. Multistate Life Tables Using Stata. The Stata Journal 20(3):721–45. doi: 10.1177/1536867X20953577.

10:30–10:45 mfcurve: Visualizing Results From Multifactorial Designs Daniel Krähmer (Ludwig-Maximilians-University, Munich)

Abstract: Multifactorial designs are used to study the (joint) impact of two or more factors on an outcome. They typically occur in conjoint, choice, and factorial survey experiments but have recently gained increasing popularity in field experiments, too. Technically, they allow researchers to investigate moderation as an instance of treatment heterogeneity by crossing multiple treatments.

Naturally, multifactorial designs quickly spawn a spiraling number of distinct treatment combinations: Even a moderately complex design of two factors with three levels each yields 3² = 9 unique combinations. For more elaborate setups, full factorials can easily produce dozens of distinct combinations, rendering the visualization of results difficult.

This presentation introduces the new Stata command mfcurve as a potential remedy. Mimicking the appearance of a specification curve, mfcurve produces a two-part chart: The graph’s upper panel displays average effects for all distinct treatment combinations; its lower panel indicates the presence or absence of any level given the respective treatment condition. Unlike existing visualization techniques, this enables researchers to plot and inspect results from multifactorial designs much more comprehensively. Highlighting potential applications, the presentation will demonstrate mfcurve’s most important features and options, which currently include replacing point estimates with box plots and testing results for statistical significance.

11:15–11:45 Estimating the Price Elasticity of Gasoline Demand in Correlated Random Coefficient Models with Endogeneity Michael Bates (University of California, Riverside) and Seolah Kim (University of California, Riverside)

Abstract: We propose a per-cluster instrumental variables approach (PCIV) for estimating correlated random coefficient models in the presence of contemporaneous endogeneity and two-way fixed effects. We use variation across clusters to estimate coefficients with homogeneous slopes (such as time effects) and within-cluster variation to estimate the cluster-specific heterogeneity directly. We then aggregate them to population averages. We demonstrate consistency, showing robustness over standard estimators, and provide analytic standard errors for robust inference. Basic implementation is straightforward using standard software such as Stata. In Monte Carlo simulation, PCIV performs relatively well against pooled 2SLS and fixed effects IV (FEIV) with a finite number of clusters or finite observations per cluster. We apply PCIV in estimating the price elasticity of gasoline demand using state fuel taxes as instrumental variables. PCIV estimation allows for greater transparency of the underlying data. In our setting, we provide evidence of correlation between heterogeneity in the first and second stages, violating a key assumption underpinning consistency of standard estimators. We see significant divergence of the implicit weighting applied by FEIV from the natural weights applied in PCIV. Overlooking effect heterogeneity with standard estimators is consequential. Our estimated distribution of elasticities reveals significant heterogeneity and meaningful differences in estimated averages.

11:45–12:15 Influence Analysis with Panel Data using Stata Annalivia Polselli (University of Essex)

Abstract: The presence of anomalous cases in a data set (i.e., vertical outliers, good and bad leverage points) can severely affect least-squares estimates (coefficients and/or standard errors), which are sensitive to extreme cases by construction. Cook’s (1979) distance is usually used to detect such anomalies in cross-sectional data. This metric may fail to flag multiple atypical cases (Atkinson, 1985; Chatterjee and Hadi, 1988; Rousseeuw and Van Zomeren, 1990), while a local approach overcomes this limit (Lawrance, 1995). I formalise statistical measures to quantify the degree of leverage and outlyingness of units in a panel-data framework. I then develop a unit-wise method to visually detect the type of anomaly, quantify its joint and conditional influence, and determine the direction of the enhancing and masking effects. I conduct the proposed influence analysis using two user-written commands. First, xtinfluence calculates the joint and conditional influence of unit i on unit j, along with the relative enhancing and masking effects. A two-way scatter plot or the SSC package heatplot can be used to visualize the influence exerted by each unit in the sample. Second, xtlvr2plot (a panel-data version of lvr2plot) produces unit-wise plots displaying the average individual influence and the average normalised squared residual of unit i.

Atkinson, A. C. (1985). Plots, transformations and regression: An introduction to graphical methods of diagnostic regression analysis. Technical report.

Chatterjee, S. and Hadi, A. S. (1988). Impact of simultaneous omission of a variable and an observation on a linear regression equation. Computational Statistics & Data Analysis, 6(2):129–144.

Cook, R. D. (1979). Influential observations in linear regression. Journal of the American Statistical Association, 74(365):169–174.

Lawrance, A. (1995). Deletion influence and masking in regression. Journal of the Royal Statistical Society: Series B (Methodological), 57(1):181–189.

Rousseeuw, P. J. and Van Zomeren, B. C. (1990). Unmasking multivariate outliers and leverage points. Journal of the American Statistical Association, 85(411):633–639.

12:15–12:45 nopo: An implementation of a matching-based decomposition technique with postestimation commands Maik Hamjediers (HU-Berlin) and Maximilian Sprengholz (HU-Berlin)

Abstract: Ñopo (2008) proposed a non-parametric decomposition technique based on matching, which decomposes the observed gap in an outcome between groups into four components. Among the matched sample, the (1) explained component is the part of the gap attributed to compositional differences between groups in predictors of the outcome, and the (2) unexplained component is the part of the gap which would remain if these compositional differences were eliminated. Two additional components capture how unmatched individuals in (3) group A and (4) group B contribute to the gap in the outcome. Ñopo’s technique directly addresses the issue of lacking common support between groups that can bias linear-regression-based decompositions, exhibits a general robustness against functional-form mis-specification, and allows evaluating gaps over the full distribution of the outcome. However, high dimensionality means that there is always a trade-off between the detail of the matching set (to achieve balance between groups) and common support (the share of matches), particularly in small samples. Extending the user-written Stata command nopomatch (Atal et al., 2010), our command nopo provides a comprehensive implementation of Ñopo’s matching, including different matching procedures. Postestimation commands allow users to investigate the balance after matching, explore the lack of common support, and visualize the unexplained component over the outcome distribution. We highlight the merit of this approach and of our command by comparing matching to regression-based techniques using a simulation and observational data.

Ñopo, H. (2008). Matching as a Tool to Decompose Wage Gaps. The Review of Economics and Statistics, 90, 290–299.

Atal, J. P., Hoyos, A., Ñopo, H. (2010). NOPOMATCH: Stata module to implement Nopo’s decomposition. Statistical Software Components S457157, Boston College Department of Economics.

13:45–14:45 Linking Frames in Stata Jeff Pitblado (Executive Director of Statistical Software at StataCorp)

Abstract Forthcoming

14:45–15:45 Causal inference and treatment-effect decomposition with Stata Joerg Luedicke (Senior Social Scientist and Software Developer at StataCorp)

Abstract Forthcoming

16:15–16:45 Lagrange-Multiplier Test after Constrained Maximum-Likelihood Estimation using Stata Harald Tauchmann (FAU Erlangen-Nürnberg)

Abstract: Besides the Wald and the likelihood-ratio test, the Lagrange-multiplier test (Rao, 1948; Aitchison and Silvey, 1958; Silvey, 1959), also known as the score test, is the third canonical approach to testing hypotheses after maximum-likelihood estimation. While the Stata commands test and lrtest implement the former two, official Stata does not include a general command implementing the latter. This paper introduces the new community-contributed Stata postestimation command lgrgtest, which allows for straightforward use of the Lagrange-multiplier test after constrained maximum-likelihood estimation. lgrgtest is intended to be compatible with all Stata estimation commands that use maximum likelihood, allow for the options constraints(), iterate(), and from(), and obey Stata’s standards for the syntax of estimation commands. lgrgtest can also be used after cnsreg. lgrgtest draws on Stata’s constraint command and the accompanying option constraints(), which only allows for imposing linear restrictions on a model. As a result, lgrgtest is confined to testing linear constraints. A (partial) replication of Egger et al. (2011) illustrates the use of lgrgtest in applied empirical work.
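The workflow described in the abstract can be sketched in a few lines of Stata. The variable names are hypothetical placeholders, and the exact syntax of the community-contributed lgrgtest command may differ from the published package:

```
* Define a linear restriction (constraint is built-in Stata syntax;
* y, x1, x2, x3 are hypothetical placeholder variables)
constraint define 1 x1 = x2

* Fit a constrained maximum-likelihood model, imposing restriction 1
logit y x1 x2 x3, constraints(1)

* Lagrange-multiplier (score) test of the restriction imposed above
lgrgtest
```

Because the test only requires the constrained fit, the unconstrained model never has to be estimated, which is the practical appeal of the score test over the Wald and likelihood-ratio alternatives.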

Aitchison, J., and S. D. Silvey (1958): Maximum-Likelihood Estimation of Parameters Subject to Restraints. The Annals of Mathematical Statistics 29(3): 813–828.

Egger, P., M. Larch, K. E. Staub, and R. Winkelmann (2011): The Trade Effects of Endogenous Preferential Trade Agreements. American Economic Journal: Economic Policy 3(3): 113–43.

Rao, C. R. (1948): Large Sample Tests of Statistical Hypotheses concerning Several Parameters with Applications to Problems of Estimation. Mathematical Proceedings of the Cambridge Philosophical Society 44(1): 50–57.

Silvey, S. D. (1959): The Lagrangian Multiplier Test. The Annals of Mathematical Statistics 30(2): 389–407.

16:45–17:00 Assessing the fit of Generalized Structural Equation Models with Stata: Theory and practical application Wolfgang Langer (University of Halle-Wittenberg)

Abstract: Unlike for its Structural Equation Model implementation, Stata does not provide any measure of practical significance for Generalized Structural Equation Models estimated with the gsem command. I propose a proportional-reduction coefficient of determination for each structural equation. Using a single-factor comparison model, I estimate the variance of the latent factor, which corresponds to error set one. For each dependent factor, Stata estimates its residual variance, corresponding to error set two. Using the formula of the proportional reduction of error, I calculate the coefficient of determination, the PRE-R². It has a very clear interpretation: What percentage of the variance of the dependent factor can be explained by the exogenous variables? I demonstrate this with an empirical example and discuss the application for Rasch and Partial Credit Models.
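The proportional-reduction-of-error logic described in the abstract can be written out as follows; the notation is a plausible reconstruction for illustration, not the author's own:

```latex
% E_1: error set one -- variance of the latent factor in the
%      single-factor comparison model (no structural predictors)
% E_2: error set two -- residual variance of the dependent factor
%      in the full model
\mathrm{PRE}\text{-}R^2 = \frac{E_1 - E_2}{E_1} = 1 - \frac{E_2}{E_1}
```

A value of, say, 0.40 would then mean that 40% of the variance of the dependent factor is explained by the exogenous variables.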

17:00–17:30 Power boost or source of bias? Monte Carlo evidence on ML covariate adjustment in randomized trials in education Lukas Fervers (University of Cologne and Leibniz-Centre for Life-Long Learning, Bonn)

Abstract: Statistical theory makes ambiguous predictions about covariate adjustment in randomized trials. While proponents highlight possible efficiency gains, opponents point to possible finite-sample bias, a loss of precision in the case of many and/or weak covariates, and the increased danger of false-positive results due to repeated model specification. This theoretical reasoning suggests that machine-learning (variable-selection) methods may be a promising tool to keep the advantages of covariate adjustment while protecting against its downsides. In this paper, I rely on recent developments in machine-learning methods for causal effects and their implementation in Stata to assess the performance of ML methods in randomized trials. I rely on real-world data and simulate treatment effects on a wide range of different data structures, including different outcomes and sample sizes. (Preliminary) results suggest that ML-adjusted estimates are unbiased and show considerable efficiency gains compared with unadjusted analysis. The results are fairly similar across the different data structures used and robust to the choice of tuning parameters of the ML estimators. These results tend to support the more optimistic view on covariate adjustment and highlight the potential of ML methods in this field.

17:30–18:00 Open panel discussion with Stata developers

Abstract Contribute to the Stata community by sharing your feedback with StataCorp’s developers. From feature improvements to bug fixes and new ways to analyze data, we want to hear how Stata can be made better for you.

Workshop: Stata meets Python
Thursday, June 15, 2023; 9:00 – 17:00
Presenter: Nikos Askitas – Institute of Labor Economics (IZA)
Room: Westpool

On the day before the conference, there will be a one-day workshop, “Stata meets Python”. It introduces how to use Python from within Stata and how to use Stata from within Python.

Scientific Organizers

Johannes Giesecke
Humboldt University Berlin


Ulrich Kohler
University of Potsdam


Logistics Organizer

DPC Software GmbH (dpc-software.de), the distributor of Stata in several countries, including Germany, the Netherlands, Austria, the Czech Republic, and Hungary.

You can enroll by contacting Natascha Hütter by email or by phone.

Natascha Hütter
DPC Software GmbH
Phone: +49-212-224716-21
E-Mail: natascha.huetter@dpc-software.de

Registration fee

The fee includes lunch, coffee and soft drinks during the morning and afternoon breaks, as well as pens and books at the live event.

Meeting fees (all prices incl. VAT):
Meeting only (professionals): €44.99
Meeting only (students): €35
Workshop only: €65
Workshop only (students): €50
Workshop + meeting: €85
Workshop + meeting (students): €70

Registration for the 2023 German Stata Conference – June 16, 2023 (binding)


  • Workshop - Only
  • Stata Conference - Only
  • Workshop and Stata Conference


Free cancellation is possible for up to 14 days after registration. For cancellations after this 14-day period, 100% of the booked amount will be charged.


Pay by PayPal or Bank transfer

Pay by PayPal


Pay by Bank transfer

Please transfer the payment by 10 June 2023.

Account holder: DPC Software GmbH

Account number: 237 689 17
Bank code (BLZ): 720 200 70
IBAN: DE26 7202 0070 0023 7689 17

Payment reference: Stata Conference 2023

