Postfile can be used to generate new, computed datasets and to subset data into new datasets in Stata.
There is an awesome command in Stata you may not yet have heard of called postfile. It lets you create Stata datasets. It can be used for a variety of tasks:
- For creating subsets of your data.
- For building datasets of generated statistics.
- For Monte Carlo-type experiments.
If you haven't already, I suggest creating a folder for code snippets or using a snippet manager to store them. You could try Snippely, which is free and available on Windows, Mac, and Linux as an Adobe AIR application.
The following can be the first snippet you create. If you program in multiple languages you'll want to sort them into groups or folders.
Postfile Snippet
tempname memhold
postfile `memhold' str10 stringvar numvar using "C:/Filename.dta", replace

post `memhold' (your_str_var) (your_num_var)

postclose `memhold'
Line 1 creates a temporary name (a "tempname") that serves as a handle for the file you are building. Each postfile-related command that follows references this tempname (`memhold').
Line 2 declares the variables that will be in your new dataset. Here, I've specified the tempname (`memhold') followed by the variables I'm going to save. Order is important: the order you specify them in will be their order in your new dataset. Additionally, you have to declare string variables with str followed by the maximum number of characters (str10 here). This can be confusing because a string variable then takes two words (its type and its name), while a numeric variable takes just one.
Notice that I have used the replace option. That is because I tend to rerun postfile scripts frequently, as they are usually written through trial and error. This is convenient, but be careful: you may overwrite something you want!
Line 4 writes one observation to your dataset. You have to specify the tempname (`memhold') again. The value for each variable is enclosed in parentheses, and the values follow the order declared in line 2: the first set of parentheses fills stringvar and the second fills numvar. You will have to loop or repeat this command for every observation you want to write.
Line 6 closes the postfile and saves your dataset.
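To see the snippet run end to end, here is a minimal sketch that posts two made-up observations and then loads the result. It assumes a tempfile (the hypothetical local `results') in place of "C:/Filename.dta" so that nothing on disk is overwritten, and the posted values are placeholders rather than anything from a real dataset. Run it as a single do-file, since tempnames and tempfiles only live for the duration of the run.

```stata
* Minimal sketch: write two observations with postfile, then load them.
* Assumption: a tempfile stands in for "C:/Filename.dta".
clear
tempname memhold
tempfile results

* Declare one string variable (up to 10 characters) and one numeric variable.
postfile `memhold' str10 stringvar numvar using "`results'", replace

* Each post writes one observation; values follow the declared order.
post `memhold' ("first") (1.5)
post `memhold' ("second") (2.5)

* Close the handle so the dataset is written to disk.
postclose `memhold'

* Load the new dataset and look at it.
use "`results'", clear
list
```

Swapping the tempfile for a real path gives you the original snippet back.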
Example 1: Using Postfile to create subsets of data
The easiest way to create a subset of your data is to use a loop. Suppose you had a dataset from which you wanted only a specific set of observations and two of the variables. You could use postfile to write a file containing this pared-down dataset. This example uses the auto dataset (you can load it using line 1 of the code below). Take a moment to familiarize yourself with the dataset before proceeding.
sysuse auto

tempname memhold
postfile `memhold' str18 car_make car_price using "C:/Filename.dta", replace

foreach j in 2 4 7 18 23 29 33 38 39 47 51 53 67 74 {
    post `memhold' (make[`j']) (price[`j'])
}

postclose `memhold'
Again, line 3 sets up a tempname to act as the handle for the new file.
In line 4, you'll have to change the filename ("C:/Filename.dta") to a location on your computer. Notice how the variables I named in line 4 correspond to the items in parentheses in line 7.
Line 6 starts a loop over the observations we want to copy into our new dataset.
Line 7 writes the observation (using subscripting), and because it is inside the loop it will write an observation for 2, 4, 7, etc. So, the first time through the loop, Stata works through something along these lines:
- post `memhold' (make[`j']) (price[`j'])
- post `memhold' (make[2]) (price[2])
- post `memhold' ("AMC Pacer") (4749)
Subset of auto data created using postfile command in Stata
Example 2: Using Postfile to build datasets of generated statistics
If you know how to use return list, you can create tables of returned results (statistics) for further manipulation or organization. The following example saves the average weight of cars at each value of mpg in the auto dataset. Take a moment to familiarize yourself with the dataset before proceeding.
To simplify writing your postfile scripts, you might want to try running a statistical command on its own first and then adapting it to run within a loop for use with postfile.
Ran a simple summarize command and checked the return list to see what data we could grab for postfile.
I used the return list command to see what values were available. All of them [r(N), r(sum_w), r(mean), etc.] could be inserted into a new dataset using postfile, but we would have to declare a variable for each after the postfile command. I'm only interested in mean weights by mpg.
Notice in the code below how I adapted summarize weight if mpg == 12 for use within a loop and with post, so that it applies to all of the mpg values, which range from 12 to 41 (lines 6 and 7).
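The exploratory step described here can be sketched as follows: run the command once for a single value of mpg, then list what it left behind in r(). Nothing here is specific to postfile; it is just a way to find out which returned results are available to post.

```stata
* Sketch: run summarize once and see which results it returns.
sysuse auto, clear
summarize weight if mpg == 12

* Show everything summarize stored in r(), such as r(N) and r(mean).
return list

* Any one of these can be used directly in an expression.
display r(mean)
```

Whatever appears in the return list is a candidate for a column in the posted dataset.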
sysuse auto

tempname memhold
postfile `memhold' mpg avg_weight using "C:/Filename.dta", replace

forvalues mpg = 12/41 {
    summarize weight if mpg == `mpg'
    if "`r(mean)'" != "" {
        post `memhold' (`mpg') (`r(mean)')
    }
}

postclose `memhold'
Remember to change the filename to an appropriate location on your computer (line 4).
Again, notice that the order of the variables following the postfile command corresponds to each piece of data written by the post command (line 4, line 9).
Line 6 starts a loop which cycles through all of the possible values for mpg in the dataset.
Line 7 uses the summarize command to calculate the mean weight for the current value of mpg.
Line 8: I had to use an if command to prevent empty values from being written. If you look closely at the dataset you will see that mpg ranges from 12 to 41 but doesn't include certain values, like 13. No mean is returned for mpg == 13, and this if statement is necessary to prevent an error: post expects a value where `r(mean)' appears, and without one it will stop with an error.
Line 9: The first variable in our new dataset is the mpg. I used `mpg', which is set by the current iteration of our loop, as the mpg variable. To insert the average weight for each mpg, I put (`r(mean)') in the second slot to correspond with its position as the second variable, avg_weight.
Line 13: As usual, closes the postfile and saves the file.
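An alternative that sidesteps the empty-value check is to loop only over the values of mpg that actually occur, using levelsof. This is a sketch of a swapped-in technique, not the approach used above. The tempfile `results' and the local names mpg_values and m are assumptions made for illustration.

```stata
* Sketch: loop over existing mpg values only, so every summarize has data.
* Assumption: a tempfile stands in for "C:/Filename.dta".
sysuse auto, clear
tempname memhold
tempfile results

postfile `memhold' mpg avg_weight using "`results'", replace

* levelsof stores the distinct values of mpg in the local mpg_values.
levelsof mpg, local(mpg_values)

foreach m of local mpg_values {
    quietly summarize weight if mpg == `m'
    post `memhold' (`m') (r(mean))
}

postclose `memhold'

* Load the posted results: one row per mpg value present in the data.
use "`results'", clear
list
```

Because every value in the loop exists in the data, r(mean) is always defined and no guard is needed. The same pattern extends to any grouping variable.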
The end result of Example 2.
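The introduction also lists Monte Carlo-type experiments as a use for postfile. A minimal sketch of that idea, under assumptions chosen purely for illustration (200 replications, 50 standard-normal draws per replication, seed 12345, and a tempfile for the output), might look like this:

```stata
* Sketch: simulate the sampling distribution of a mean with postfile.
* Assumptions: 200 replications, 50 draws each, an arbitrary seed.
clear
set seed 12345
tempname sim
tempfile simresults

postfile `sim' rep sample_mean using "`simresults'", replace

forvalues i = 1/200 {
    quietly {
        * Start each replication with a fresh, empty dataset.
        drop _all
        set obs 50
        generate x = rnormal()
        summarize x
    }
    * Record the replication number and the mean of this sample.
    post `sim' (`i') (r(mean))
}

postclose `sim'

* Load the simulation results and summarize the simulated means.
use "`simresults'", clear
summarize sample_mean
```

With these settings the standard deviation of sample_mean should come out somewhere near 1/sqrt(50), which is a quick sanity check that the loop is recording what you expect.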