{smcl}
{* 18May2009}{...}
{title:Steps in creating a dataset}

{p2colset 4 9 9 2}{...}
{p2col:1.}Select observations.{p_end}
{p2col:2.}Select variables.{p_end}
{p2col:3.}Create new variables.{p_end}
{p2col:4.}Add variable labels.{p_end}
{p2col:5.}Add notes to variables.{p_end}
{p2col:6.}Add value labels.{p_end}
{p2col:7.}Rearrange variables.{p_end}
{p2col:8.}Add internal documentation.{p_end}
{p2col:9.}Add a datasignature.{p_end}
{p2col:10.}Save the dataset.{p_end}
{p2col:11.}Summary of key commands.{p_end}

    {it:Note:} For naming conventions for datasets, see {helpb wfstyle:Workflow style conventions}.

{title:1. Select observations}

    {it:Select observations based on a condition}

    {cmd:keep if} {it:exp}
    {cmd:drop if} {it:exp}

    For example:

    {cmd:drop if female==1}

    {it:Select observations by observation number}

    {cmd:keep in} {it:range}
    {cmd:drop in} {it:range}

    For example:

    {cmd:drop in 1/22}

{title:2. Select variables}

    {cmd:keep} {it:varlist}
    {cmd:drop} {it:varlist}

    For example:

    {cmd:drop female year tempvar}
    {cmd:keep var1-var090}

{title:3. Create new variables}

    {cmd:generate} {it:newvar} {cmd:=} {it:exp} [{cmd:if}] [{cmd:in}]
    {cmd:clonevar} {it:newvar} {cmd:=} {it:sourcevar} [{cmd:if}] [{cmd:in}]
    {cmd:replace} {it:oldvar} {cmd:=} {it:exp} [{cmd:if}] [{cmd:in}]

{p}Give each new variable a new name instead of overwriting an existing one.
Verify that new variables are constructed correctly.
Keep the source variables used to create new variables.{p_end}

{title:4. New variables should have a variable label}

    {cmd:label variable} {it:varname} "{it:label}"

    For example:

    {cmd:label var artsqrt "Square root of # of articles"}

    {it:Commands for listing variable labels and other information}

    {cmd:codebook} [{it:varlist}] [{cmd:if}] [{cmd:in}] {cmd:, compact}
    {cmd:describe} [{it:varlist}] [{cmd:,} {cmd:simple} | {cmd:fullnames numbers}]
    {cmd:nmlab} {it:varlist}
    {cmd:tab1} {it:varlist}

{title:5. New variables should have notes documenting their provenance}

    {cmd:note} {it:varname}: {it:text}

    For example:

    {cmd:local tag "pub# truncated at 20 \ wf5-varnotes.do jsl 2008-04-09."}
    {cmd:note pub1trunc: `tag'}

{title:6. Add value labels to all categorical variables}

    {it:Step 1: Define the labels} (ideally, 10 characters or shorter)

    {cmd:label define Lyesno 1 1_yes 0 0_no}

    {it:Step 2: Assign the labels}

    {cmd:label values wc Lyesno}

{title:7. Rearrange variables}

    {cmd:aorder} [{it:varlist}]
    {cmd:order} {it:varlist}
    {cmd:move} {it:variable-to-move target-variable}

    For example:

    {cmd:aorder}
    {cmd:order id}

{title:8. Add internal documentation to a dataset}

    {cmd:label data "Workflow data from Russian ISSP 2002 \ 2008-04-02"}
    {cmd:note: wf-isspru02.dta \ workflow ch 6 \ wf6-save.do jsl 2008-04-05}

{title:9. Add a datasignature}

    {cmd:datasignature set}

    To confirm that the data have not changed since the signature was set:

    {cmd:use wf-datasig02, clear}
    {cmd:datasignature confirm}

    To change the signature after modifications to the dataset:

    {cmd:datasignature set, reset}

{title:10. Save the dataset}

    {cmd:quietly compress}
    {cmd:saveold mydata, replace}

{title:11. Summary of key commands}

    {cmd:keep if} {it:exp}
    {cmd:drop if} {it:exp}
    {cmd:keep in} {it:range}
    {cmd:drop in} {it:range}

    {cmd:keep} {it:varlist}
    {cmd:drop} {it:varlist}

    {cmd:generate} {it:newvar} {cmd:=} {it:exp} [{cmd:if}] [{cmd:in}]
    {cmd:clonevar} {it:newvar} {cmd:=} {it:sourcevar} [{cmd:if}] [{cmd:in}]
    {cmd:replace} {it:oldvar} {cmd:=} {it:exp} [{cmd:if}] [{cmd:in}]

    {cmd:label variable} {it:varname} "{it:label}"

    {cmd:note} {it:varname}: {it:text}
    {cmd:local tag "pub# truncated at 20 \ wf5-varnotes.do jsl 2008-04-09."}
    {cmd:note pub1trunc: `tag'}

    {cmd:aorder} [{it:varlist}]
    {cmd:order} {it:varlist}

    {cmd:label data "Workflow data from Russian ISSP 2002 \ 2008-04-02"}
    {cmd:note: wf-isspru02.dta \ workflow ch 6 \ wf6-save.do jsl 2008-04-05}

    {cmd:datasignature set}
    {cmd:datasignature set, reset}

    {cmd:quietly compress}
    {cmd:saveold mydata, replace}
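
{p}As a rough illustration, the steps listed in this file might be strung together in a single do-file along the following lines.
This is only a sketch under stated assumptions: it uses the {cmd:auto} dataset that ships with Stata rather than the workflow data, and the names {cmd:lnprice}, {cmd:Lforeign}, and {cmd:auto-small} are hypothetical, chosen for the example.{p_end}

    {cmd:sysuse auto, clear}
    {cmd:drop if missing(rep78)}
    {cmd:keep make price mpg rep78 foreign}
    {cmd:generate lnprice = ln(price)}
    {cmd:label variable lnprice "Log of price"}
    {cmd:note lnprice: ln(price) \ example do-file}
    {cmd:label define Lforeign 0 0_domestic 1 1_foreign}
    {cmd:label values foreign Lforeign}
    {cmd:order make lnprice}
    {cmd:label data "Subset of auto.dta \ example"}
    {cmd:note: auto-small.dta \ example do-file}
    {cmd:datasignature set, reset}
    {cmd:quietly compress}
    {cmd:saveold auto-small, replace}

{p}The sketch uses {cmd:datasignature set, reset} rather than plain {cmd:datasignature set} so that it also runs if the starting dataset already carries a signature.
The source variable {cmd:price} is kept alongside {cmd:lnprice} so that the construction can be verified later.{p_end}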