# German Stata Users Group meeting 2019

23–24 May 2019 in Munich

## Announcement and Program

### Meeting

The German Stata Users Group Meeting will be held at the Seidlvilla in Munich on Friday, May 24, 2019. Everyone interested in using Stata is invited to attend. The meeting will include presentations on causal models, general statistics, and data management, by both researchers and StataCorp staff. It will also include a "wishes and grumbles" session, during which you may air your thoughts to Stata developers.

### Workshop

On the day before the conference, Jan Paul Heisig from the WZB Berlin Social Science Center will hold a workshop on "Multiple Imputation". Details about the workshop are given below the program.

### Conference Dinner

There is (at additional cost) the option of an informal meal at a restaurant in Munich on Friday evening. Details about accommodations and fees are given below the program.

### Language

The conference language will be English because of the international nature of the meeting and the participation of non-German guest speakers.

### Timetable

Time | Session | Speaker(s)
---|---|---
8:30–9:00 | Registration |
9:00–9:15 | Welcome | Katrin Auspurg, Josef Brüderl
9:15–10:15 | On the shoulders of giants, or not reinventing the wheel | Nicholas J. Cox
10:15–10:45 | Stata export for metadata documentation | Anne Balz, Klaus Pforr, Florian Thirolf
10:45–11:00 | Coffee |
11:00–12:00 | Agent Based Models in Mata | Maarten Buis
12:00–12:30 | How to use Stata's sem command with nonnormal data? | Wolfgang Langer
12:30–13:15 | Lunch |
13:15–13:45 | xtoaxaca: Extending Oaxaca-Blinder-Decomposition to longitudinal data | Hannes Kröger, Jörg Hartmann
13:45–14:15 | Linear Discrete-Time Hazard Estimation using Stata | Harald Tauchmann
14:15–14:45 | Heat (and hexagon) plots in Stata | Ben Jann
14:45–15:00 | Coffee |
15:00–15:30 | Extending the label commands (cont'd) | Daniel Klein
15:30–16:00 | The production process of the Global MPI | Nicolai Suppa
16:00–16:15 | Coffee |
16:15–17:00 | Performing and interpreting discrete choice analyses in Stata | Joerg Luedicke
17:00–17:30 | Wishes and Grumbles |
17:30 | End of meeting |

### Timetable – Workshop

Time | Program
---|---
10:00 | Registration
10:30 | Workshop begins
18:30 (expected) | End

## How to get to the venue

##### Google route planner: http://www.seidlvilla.de/kontakt.html

### From the airport

Take the S1 or S8 (it does not matter which) to "Marienplatz". Change to the U3 (direction "Moosach") or U6 (direction "Fröttmaning") and get off at "Giselastraße" (third stop). Leave the subway station, follow "Leopoldstraße" north for about 200 m, and turn right into "Nikolaistraße". After another 100 m you reach "Nicolaiplatz".

### From the main railway station

Take any S-Bahn toward "Marienplatz". At "Marienplatz" change to the U3 (direction "Moosach") or U6 (direction "Fröttmaning") and get off at "Giselastraße" (third stop). Leave the subway station, follow "Leopoldstraße" north for about 200 m, and turn right into "Nikolaistraße". After another 100 m you reach "Nicolaiplatz".

## Registration

Participants are asked to travel at their own expense. The meeting fee covers costs for refreshments and lunch.

Meeting fees (all prices incl. VAT) | Price
---|---
Meeting only: professionals | 45€
Meeting only: students | 35€
Workshop only | 65€
Workshop + meeting | 85€

There will also be an optional informal meal at a restaurant in Munich on Friday evening at an additional cost.

Contact for registration:

Elena Tsittser

Phone: +49-212-26066-51

E-Mail: elena.tsittser@dpc-software.de

# Abstracts – Stata Users Group Meeting

## 9:00–9:15 Welcome

Katrin Auspurg (Ludwig-Maximilians-University Munich), Josef Brüderl (Ludwig-Maximilians-University Munich)

## 9:15–10:15 On the shoulders of giants, or not reinventing the wheel

Nicholas J. Cox (Department of Geography, University of Durham, UK)

njcoxstata@gmail.com

*Abstract:* Part of the art of coding is writing as little as possible to do as much as possible. In this presentation, I expand on this truism and give examples of Stata code to yield tables and graphs in which most of the real work is delegated to workhorse commands. In tabulations and listings, the better known commands sometimes seem to fall short of what you want. However, some preparation commands (such as generate, egen, collapse or contract) followed by list, tabdisp, or tab can get you a long way. In graphics, a key principle is that graph twoway is the most general command, even when you do not want rectangular axes. Variations on scatter and line plots are precisely that, variations on scatter and line plots. More challenging illustrations include commands for circular and triangular graphics, in which x and y axes are omitted with an inevitable but manageable cost in re-creating scaffolding, titles, labels, and other elements. The examples range in scope from a few lines of interactive code to fully developed programs. This presentation is thus pitched at all levels of Stata users.
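A minimal sketch of the "delegate the real work to workhorse commands" idea (my illustration, not Cox's own code, using Stata's bundled auto data):

```stata
* Preparation with egen, display with tabdisp (illustrative sketch)
sysuse auto, clear
egen mean_mpg = mean(mpg), by(foreign rep78)
tabdisp foreign rep78, cellvar(mean_mpg) format(%4.1f)
```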

## 10:15–10:45 Stata export for metadata documentation

Anne Balz (GESIS – Leibniz Institute for the Social Sciences), Klaus Pforr (GESIS – Leibniz Institute for the Social Sciences), Florian Thirolf (GESIS – Leibniz Institute for the Social Sciences)

Klaus.Pforr@gesis.org

*Abstract:* Precise and detailed data documentation is essential for the secondary analysis of scientific data, whether survey or official microdata. Among the most important metadata in this perspective are variable and category labels as well as frequency distributions and descriptive statistics. To generate and publish these metadata from Stata datafiles, an efficient export interface is essential. It must be able to handle large and complex data sets, take into account the specifics of different studies and generate flexible output formats (depending on the requirements of the documentation system). As a solution to the problem described above, we present the process developed in the GML (German Microdata Lab) at GESIS. In the first step, we show how an aggregated file with all required metadata can be generated from the microdata. In the second step, this file is transformed into a standardized DDI format. Additionally, we will present the implementation for MISSY (the metadata information system for official microdata at GESIS), which includes some practical additions (e.g. communication with the MISSY database to retrieve existing element identifiers, writing an output tailored to the MISSY data model).
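As a hedged illustration of the first step (mine, not the authors' GML code): official Stata can already turn variable-level metadata of the data in memory into a dataset, which can then be processed and exported.

```stata
* Harvest variable-level metadata into a dataset (illustrative sketch)
sysuse nlsw88, clear
describe, replace clear          // data in memory now describe the variables
list name varlab vallab in 1/5   // names, variable labels, value-label names
```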

## 10:45–11:00 Coffee

## 11:00–12:00 Agent Based Models in Mata

Maarten Buis (University of Konstanz)

maarten.buis@uni-konstanz.de

*Abstract:* An Agent Based Model (ABM) is a simulation in which agents that each follow simple rules interact with one another and thereby produce an often surprising outcome at the macro level. The purpose of an ABM is to explore mechanisms through which the actions of individual agents add up to a macro outcome, by varying the rules that agents follow or varying with whom an agent can interact (i.e., varying the network).

A simple example of an ABM is Schelling's segregation model, with which he showed that one does not need racists to produce segregated neighbourhoods. The model starts with 25 red and 25 blue agents, each of which lives in a cell of a chessboard, so each agent can have up to 8 neighbours. For an agent to be happy, she needs some minimum share of same-colored agents in her neighbourhood, e.g. 30%. If the agent is unhappy, she will move to an empty cell that makes her happy. If we repeat this until everybody is happy or nobody can move, we will often end up with segregated neighbourhoods.

Implementing a new ABM will always require programming, but many of the tasks are similar across ABMs. For example, in many ABMs the agents live on a square grid (like a chessboard) and can only interact with their neighbours. I have created a set of Mata functions that perform those tasks and that users can import into their own ABMs. In this talk I illustrate how to build an ABM in Mata with these functions.
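To give a flavour of what such Mata code looks like, here is a minimal sketch of a neighbourhood-similarity check for a Schelling-style grid. This is purely illustrative and is not taken from Buis's function library:

```stata
mata:
// Share of occupied Moore neighbours with the same color as cell (i,j).
// Grid coding: 0 = empty, 1 = red, 2 = blue. Illustrative sketch only.
real scalar share_same(real matrix g, real scalar i, real scalar j)
{
    real scalar r, c, same, occ
    same = 0
    occ  = 0
    for (r = max((1, i-1)); r <= min((rows(g), i+1)); r++) {
        for (c = max((1, j-1)); c <= min((cols(g), j+1)); c++) {
            if ((r != i | c != j) & g[r, c] != 0) {
                occ++
                if (g[r, c] == g[i, j]) same++
            }
        }
    }
    return(occ == 0 ? 1 : same / occ)
}

g = (1, 2, 0 \ 2, 1, 1 \ 0, 0, 2)
share_same(g, 2, 2)
end
```

An agent at (2,2) would compare this share against her happiness threshold (e.g. 0.3) to decide whether to move.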

## 12:00–12:30 How to use Stata's -sem- command with nonnormal data? A new nonnormality correction for the RMSEA and the incremental fit indices CFI and TLI

Wolfgang Langer (University of Luxembourg and Martin-Luther-Universität Halle-Wittenberg)

wolfgang.langer@soziologie.uni-halle.de

*Abstract:* Traditional fit measures like the RMSEA, TLI, or CFI are based on the noncentral chi-squared distribution and assume multivariate normality of the observed indicators (Jöreskog 1970). If this assumption is violated, programs like Stata, EQS, or LISREL calculate the fit indices using the Satorra-Bentler correction, which rescales the likelihood-ratio chi-squared test statistics of the baseline and the hypothesized model (Satorra & Bentler 1994; Nevitt & Hancock 2000). Brosseau-Liard et al. (2012, 2014) and Savalei (2018) showed two results in their simulation studies with nonnormal data. First, they demonstrated that the ad hoc nonnormality corrections of the fit indices provided by SEM software can make the reported fit worse, better, or leave it unchanged compared with the uncorrected counterparts. Second, the authors proposed new robust versions of the RMSEA, CFI, and TLI that performed very well in their simulation studies, in which they systematically varied the sample size and the extent of misspecification and nonnormality. The same rules of thumb and criteria used for normally distributed data can therefore be applied to assess the fit of the structural equation model.

My robust_gof.ado estimates the robust RMSEA, CFI, and TLI fit measures using the corrections proposed by Brosseau-Liard et al. and Savalei. It also estimates a 90 percent confidence interval for the root mean squared error of approximation. robust_gof can be run as a postestimation command, by simply typing robust_gof, after the sem command with the vce(sbentler) option and estat gof, stats(all). It returns the estimated fit indices and scalars in r(). An analysis of Islamophobia survey data from Germany will be presented to demonstrate the usefulness of robust_gof.ado.
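Based on the abstract, usage would look roughly like the following sketch. The measurement model and variable names are invented for illustration; only the command sequence is described by the author:

```stata
* Hypothetical usage sketch; the model and variable names are made up
sem (Xeno -> x1 x2 x3 x4), vce(sbentler)  // Satorra-Bentler adjustment
estat gof, stats(all)                     // conventional fit statistics
robust_gof                                // robust RMSEA, CFI, TLI in r()
```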

- Asparouhov, T. & Muthén, B. (2010): Simple second order chi-square correction. Los Angeles, CA: Mplus Working Papers
- Brosseau-Liard, P.E., Savalei, V. & Li, L. (2012): An investigation of the sample performance of two nonnormality corrections for RMSEA. Multivariate Behavioral Research, 47, 6, pp. 904–930
- Brosseau-Liard, P.E. & Savalei, V. (2014): Adjusting incremental fit indices for nonnormality. Multivariate Behavioral Research, 49, 5, pp. 460–470
- Jöreskog, K.G. (1970): A general method for analysis of covariance structures. Biometrika, 57, 2, pp. 239–251
- Jöreskog, K.G., Olsson, U.H. & Wallentin, F.Y. (2016): Multivariate Analysis with LISREL. Cham, Switzerland: Springer

## 13:15–13:45 xtoaxaca: Extending the Oaxaca-Blinder Decomposition Approach to longitudinal data analyses

Hannes Kröger (DIW – German Institute for Economic Research, Berlin), Jörg Hartmann (University of Göttingen)

HKroeger@diw.de

*Abstract:* The Oaxaca-Blinder decomposition approach (Oaxaca, 1973) has been widely used to attribute group-level differences in an outcome to differences in endowments, coefficients, and their interactions. The method has been implemented for Stata in the popular oaxaca program for cross-sectional analyses (Jann, 2008). In recent decades, however, research questions have increasingly focused on the decomposition of group-based differences in change over time (e.g., diverging income trajectories) as well as on the decomposition of change in differences between groups (e.g., change in the gender pay gap). Decomposition analyses can also be extended to longitudinal data via repeated cross-sectional decompositions and time-point-specific decompositions of group-level differences based on latent growth curve models. We propose to unify these different research interests under a more general longitudinal perspective that contains each of these applications as a special case of the Oaxaca-Blinder decomposition. We present this general view, give examples of applied research questions that can be answered within the framework, and propose a first version of the program xtoaxaca, which works as a postestimation command in Stata in order to maximize flexibility in modeling and in the form of longitudinal decomposition, according to the user's preferences.

- Jann, B. (2008). The Blinder-Oaxaca decomposition for linear regression models. The Stata Journal, 8(4), 453–479.
- Oaxaca, R. (1973). Male-female wage differentials in urban labor markets. International Economic Review, 693–709.

## 13:45–14:15 Linear Discrete-Time Hazard Estimation using Stata

Harald Tauchmann (Friedrich-Alexander-University, Erlangen-Nürnberg; RWI–Leibniz Institute for Economic Research, Essen; CINCH – Health Economics Research Center, Essen)

harald.tauchmann@fau.de

*Abstract:* Linear fixed-effects estimators (first-differences, within-transformation) are workhorses of applied econometrics because they straightforwardly eliminate unobserved time-invariant individual heterogeneity that may otherwise cause bias. I show that these popular estimators are, however, biased and inconsistent when applied in a discrete-time hazard setting, that is, with the outcome variable being a binary dummy indicating an absorbing state. I suggest an alternative, computationally simple, adjusted first-differences estimator. This estimator is shown to be consistent in the considered non-repeated event setting, under the assumption that unobserved time-invariant individual heterogeneity is uncorrelated with the changes in the explanatory variables. Using higher-order differences instead of first differences allows consistent estimation under weaker assumptions. Finally, I introduce the new user-written command xtlhazard, which implements the suggested estimation procedure in Stata.

## 14:15–14:45 Heat (and hexagon) plots in Stata

Ben Jann (University of Bern)

ben.jann@soz.unibe.ch

*Abstract:* In this talk I will present two new Stata commands to produce heat plots. Generally speaking, a heat plot is a graph in which one of the dimensions of the data is visualized using a color gradient. One example of such a plot is a two-dimensional histogram in which the frequencies of combinations of binned X and Y are displayed as rectangular (or hexagonal) fields using a color gradient. Another example is a plot of a trivariate distribution where the color gradient is used to visualize the (average) value of Z within bins of X and Y. Yet another example is a plot that displays the contents of a matrix, say, a correlation matrix or a spatial weights matrix, using a color gradient. The two commands I will present are called heatplot and hexplot.
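A hedged sketch of what such calls can look like (heatplot and its dependencies are community-contributed packages from the SSC archive; the syntax shown reflects my understanding, not the talk's materials):

```stata
* Install heatplot and its dependencies from SSC
ssc install heatplot
ssc install palettes
ssc install colrspace

* Two-way binned frequency plots (illustrative)
sysuse nlsw88, clear
heatplot wage tenure     // rectangular bins, frequency shown as color
hexplot wage tenure      // the same with hexagonal bins
```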

## 14:45–15:00 Coffee

## 15:00–15:30 Extending the label commands (cont'd)

Daniel Klein (INCHER – International Centre for Higher Education Research, Kassel)

klein@incher.uni-kassel.de

*Abstract:* Four years ago, I first suggested extending Stata's label commands to manipulate variable labels and value labels in a more systematic way. I have since refined my earlier approach and released a new suite of commands, elabel, that facilitates these everyday data-management tasks. In contrast to most existing community-contributed commands for manipulating labels, elabel does not focus on solving specific problems. Combined with any of Stata's label commands, it addresses any problem related to variable and value labels. elabel accepts wildcard characters in value-label names, allows referring to value labels via variable names, selects subsets of integer-to-text mappings, and applies any of Stata's functions to define new labels or modify existing ones. I demonstrate these features drawing on various examples and show how to write new ado-files that further extend the elabel commands.

## 15:30–16:00 The production process of the Global MPI

Nicolai Suppa (Juan de la Cierva Research Fellow, Centre d’Estudis Demogràfics, Spain)

nsuppa@ced.uab.es

*Abstract:* The Global Multidimensional Poverty Index (MPI) is a cross-country poverty measure published by the Oxford Poverty and Human Development Initiative since 2010. Its estimation requires household survey data, as multidimensional poverty measures seek to exploit the joint distribution of deprivations in the identification step of poverty measurement. Moreover, analyses of multidimensional poverty draw on several aggregate measures (e.g., the headcount ratio and intensity) as well as on dimensional quantities (e.g., indicator contributions). Robustness analyses of key parameters (e.g., poverty cutoffs and weighting schemes) further increase the number of estimates.
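For orientation (standard Alkire-Foster notation, added here and not taken from the abstract): the headline index, the adjusted headcount ratio, combines two of the aggregate measures mentioned above,

```latex
M_0 = H \times A
```

where \(H\) is the multidimensional headcount ratio (incidence) and \(A\) is the average deprivation share among the poor (intensity).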

During the 2018 revision, figures for 105 countries were for the first time calculated in a single round. For a large-scale project like this, a clear and efficient workflow is essential. This paper introduces key elements of such a workflow and presents Stata solutions to particular problems, including (i) the structure of a comprehensive results file, which facilitates both analysis and the production of deliverables, (ii) the usability of the estimation files, (iii) the collaborative nature of the project, (iv) the labelling of 1,200 subnational units, and (v) the documentation of code and decisions. The paper seeks to share the experience gained and to subject both the principal workflow and selected solutions to public scrutiny.

## 16:00–16:15 Coffee

## 16:15–17:00 Performing and interpreting discrete choice analyses in Stata

Joerg Luedicke (StataCorp)

*Abstract:* Discrete choice models are used across a variety of disciplines to analyze choices made by individuals or other decision-making entities. Stata supports a variety of discrete choice models such as multinomial logit and mixed logit models. While applying these models to a given dataset can be straightforward, it is often challenging to interpret their results. In this talk, I will provide an overview of Stata’s discrete choice modeling capabilities and show how to use postestimation commands to get the most out of these models and their interpretation.
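As one hedged example of the official toolkit (the dataset and syntax follow, to the best of my recollection, the [R] asclogit manual example; they are not from the talk itself):

```stata
* Alternative-specific conditional logit (McFadden's choice model)
webuse choice, clear
asclogit choice dealer, case(id) alternatives(car) casevars(sex income)
estat mfx          // marginal effects on the choice probabilities
```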

## 17:00–17:30 Wishes and Grumbles

*Abstract:* Users air their wishes and grumbles, and StataCorp responds.

# Abstract – Stata Workshop

### Location: Department of Sociology at the LMU

CIP-Pool, 4th floor, Room 409

www.en.soziologie.uni-muenchen.de

### Time

Registration: 10:00

Workshop: 10:30 until approximately 18:30

## Workshop: Multiple Imputation

### by Jan Paul Heisig, Wissenschaftszentrum Berlin für Sozialforschung (WZB)

Missing data are a pervasive problem in the social sciences. Data for a given unit may be missing entirely, for example, because a sampled respondent refused to participate in a survey (unit nonresponse). Alternatively, information may be missing only for a subset of variables (item nonresponse), for example, because a respondent refused to answer some of the questions in a survey. The traditional way of dealing with item nonresponse, referred to as "complete case analysis" (CCA) or "listwise deletion", excludes any observation with missing information from the analysis. While easy to implement, complete case analysis is wasteful and can lead to biased estimates. Multiple imputation (MI) addresses these issues and provides more efficient and unbiased estimates if certain conditions are met. It is therefore increasingly replacing CCA as the method of choice for dealing with item nonresponse in applied quantitative work in the social sciences. The goals of the course are to introduce participants to the principles of MI and its implementation in Stata, with a primary focus on MI using iterated chained equations (also known as "fully conditional specification").
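A minimal sketch of MI by chained equations with Stata's official mi suite (my illustration on the bundled auto data, not course material; rep78 is the only variable with missing values here):

```stata
* Multiple imputation by chained equations for one ordinal variable
sysuse auto, clear
mi set wide
mi register imputed rep78              // rep78 has missing values
mi impute chained (ologit) rep78 = price weight foreign, add(20) rseed(2019)
mi estimate: regress mpg i.rep78 price weight foreign
```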

### Prerequisites

Basic knowledge of Stata.