Convert USDA National Nutrient Database to Stata Format

Here is some code to help convert the USDA’s National Nutrient Database (ASCII format) into Stata format. This database is useful for quantifying the nutritional content of common food items. Prior to using Stata, this method requires a pass with the ‘sed’ command (as performed on a Mac in Terminal).

After unzipping the downloaded file, run the following in Terminal (or a bash shell):

# Make sure you're in the directory containing ONLY the 
# downloaded files before running the next commands, or you 
# could end up modifying important files here!

# This converts quotation marks to two single quotes
LANG=ASCII sed -i '' 's/\"/'\'\''/g' *.txt

# This converts tildes to quotation marks
LANG=ASCII sed -i '' 's/~/\"/g' *.txt
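The same two substitutions can also be made without sed (on Windows, for example) using a short Python script. This is an illustrative sketch, not part of the original workflow; the sample record below is made up to show the effect:

```python
def convert(text):
    # Replicate the two sed substitutions, in the same order:
    # first convert quotation marks to two single quotes,
    # then convert tildes to quotation marks.
    return text.replace('"', "''").replace('~', '"')

# A made-up tilde-quoted record with an embedded inch mark
sample = '~01001~^~BUTTER,WITH SALT, 1" PAT~'
cleaned = convert(sample)
print(cleaned)

# To convert the files in place, uncomment:
# import glob
# for path in glob.glob('*.txt'):
#     with open(path) as f:
#         text = f.read()
#     with open(path, 'w') as f:
#         f.write(convert(text))
```

The order of the two replacements matters: the quote-to-double-apostrophe pass must run first, or the tildes converted to quotation marks would themselves get mangled by the second pass.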

Then in Stata:

insheet using FOOD_DES.txt,clear delimiter("^")
gen str5 NDB_No = string(v1,"%05.0f")
drop v1
rename v2 FdGrp_Cd 
rename v3 Long_Desc 
rename v4 Shrt_Desc 
rename v5 ComName 
rename v6 ManufacName 
rename v7 Survey 
rename v8 Ref_desc 
rename v9 Refuse 
rename v10 SciName 
rename v11 N_Factor 
rename v12 Pro_Factor 
rename v13 Fat_Factor 
rename v14 CHO_Factor 
save FOOD_DES,replace
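A note on the string(v1,"%05.0f") step: insheet reads the NDB numbers as integers, which drops their leading zeros, so padding them back out to five characters keeps the key consistent across every table for later merging. The equivalent zero-padding in Python (illustration only; the codes are just examples):

```python
# insheet reads NDB_No as a number, so a code like 01001 arrives as 1001;
# zero-padding restores the five-character key shared across the tables.
raw_codes = [1001, 1002, 43546]
ndb_no = [format(code, "05d") for code in raw_codes]
print(ndb_no)  # -> ['01001', '01002', '43546']
```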

insheet using FD_GROUP.txt,clear delimiter("^")
rename v1 FdGrp_Cd 
rename v2 FdGrp_Desc
save FD_GROUP,replace

insheet using LANGUAL.txt,clear delimiter("^")
gen str5 NDB_No = string(v1,"%05.0f")
drop v1
rename v2 Factor_Code
save LANGUAL,replace

insheet using LANGDESC.txt,clear delimiter("^")
rename v1 Factor_Code 
rename v2 Description
save LANGDESC,replace

insheet using NUT_DATA.txt,clear delimiter("^")
gen str5 NDB_No = string(v1,"%05.0f")
drop v1
gen str3 Nutr_No = string(v2,"%03.0f")
drop v2
rename v3 Nutr_Val 
rename v4 Num_Data_Pts 
rename v5 Std_Error 
gen str2 Src_Cd = string(v6,"%02.0f")
drop v6
rename v7 Deriv_Cd 
rename v8 Ref_NDB_No 
rename v9 Add_Nutr_Mark
rename v10 Num_Studies
rename v11 Min
rename v12 Max 
rename v13 DF
rename v14 Low_EB
rename v15 Up_EB
rename v16 Stat_cmt
rename v17 AddMod_Date
rename v18 CC
save NUT_DATA,replace

insheet using NUTR_DEF.txt,clear delimiter("^")
gen str3 Nutr_No = string(v1,"%03.0f")
drop v1
rename v2 Units 
rename v3 Tagname 
rename v4 NutrDesc 
rename v5 Num_Dec 
rename v6 SR_Order
save NUTR_DEF,replace

insheet using SRC_CD.txt,clear delimiter("^")
gen str2 Src_Cd = string(v1,"%02.0f")
drop v1
rename v2 SrcCd_Desc
save SRC_CD,replace

insheet using DERIV_CD.txt,clear delimiter("^")
rename v1 Deriv_Cd
rename v2 Deriv_Desc
save DERIV_CD,replace

insheet using WEIGHT.txt,clear delimiter("^")
gen str5 NDB_No = string(v1,"%05.0f")
drop v1
rename v2 Seq
rename v3 Amount 
rename v4 Msre_Desc 
rename v5 Gm_Wgt
rename v6 Num_Data_Pts
rename v7 Std_Dev
save WEIGHT,replace

insheet using FOOTNOTE.txt,clear delimiter("^")
gen str5 NDB_No = string(v1,"%05.0f")
drop v1
rename v2 Footnt_No
rename v3 Footnt_Typ
gen str3 Nutr_No = string(v4,"%03.0f")
drop v4
rename v5 Footnt_Txt
save FOOTNOTE,replace

insheet using DATSRCLN.txt,clear delimiter("^")
gen str5 NDB_No = string(v1,"%05.0f")
drop v1
gen str3 Nutr_No = string(v2,"%03.0f")
drop v2
rename v3 DataSrc_ID
save DATSRCLN,replace

insheet using DATA_SRC.txt,clear delimiter("^")
rename v1 DataSrc_ID 
rename v2 Authors
rename v3 Title
rename v4 Year
rename v5 Journal
rename v6 Vol_City
rename v7 Issue_State
rename v8 Start_Page
rename v9 End_Page
save DATA_SRC,replace

Posted by on August 9, 2012 in Epidemiology, Programming



MESA Air Data Request Form

Next time I’m requesting data from the Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air) as an internal investigator, I’ll probably reflexively do another Google search before wading through my e-mails to find the link to the MESA Air Data Request Form at the MESA Air internal website.  Posting to this blog seems to be an effective way to register this information in Google. Let’s see if I can save myself a few minutes next time…

(If you’re an outside investigator interested in MESA Air, contact the Collaborative Health Studies Coordinating Center).


Posted by on May 23, 2012 in Epidemiology



Equivalent or alternative to Stata’s lincom and test or testparm in R

I often test for interactions using Stata’s ‘test’ command (to perform a multiple partial F test on a list of dummy variables) and ‘lincom’ (to estimate linear combinations of terms).  Recently I’ve been using R more and more, so I needed to figure out how to construct linear combinations of terms to come up with interaction / effect modification results.

Here’s how (for someone with little statistical expertise). You’ll need to load the ‘survey’ package in R using the library(survey) command:

Let’s say you have a group of people of four different race categories, and you want to know whether the relationship between particulate matter air pollution (pm25) and systolic blood pressure (sysbp) is modified by race (race4cats: 1=White, 2=Black, 3=Hispanic, 4=Other).  You decide to run a regression model as follows.

In Stata:

xi:reg sysbp age i.race4cats*pm25

In R:

myModel <- lm(sysbp ~ pm25*race4cats + age, data=mesa)

Now you test (“multiple partial F test” or “Wald test”) to see whether there is evidence of interaction overall by testing all the interaction terms together. The more flexible alternative to the ‘test’ or ‘testparm’ command in Stata is the regTermTest command in the ‘survey’ package in R.

In Stata:

testparm _IracX*

In R:

regTermTest(myModel, ~pm25:race4cats)
Next, if there is significant evidence of interactions, you’ll want to get the effect sizes and confidence intervals for each group.  I suppose you could just run stratified analyses to do this, but my statistically-minded colleagues seem to think it’s better to use the same interaction model to compute these effects using linear combinations of the parameters.  This may be especially important when you’re looking at interactions with continuous variables.  In any case, I use the lincom command in Stata to do this.  It appears that there are a number of alternative ways to do this same thing in R using ‘contrasts,’ but probably the most flexible method appears to involve use of the svycontrast() method in the survey package.

In Stata:

lincom pm25 // association between sysbp and pm25 for Whites
lincom pm25 + _IracXpm25_2 // association ... for Blacks
lincom pm25 + _IracXpm25_3 // association ... for Hispanics
lincom pm25 + _IracXpm25_4 // association ... for Other

In R:

svycontrast(myModel,c("pm25"=1))
svycontrast(myModel,c("pm25"=1, "pm25:race4cats2) Black"=1))
svycontrast(myModel,c("pm25"=1, "pm25:race4cats3) Hispanic"=1))
svycontrast(myModel,c("pm25"=1, "pm25:race4cats4) Other"=1))
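For intuition, both lincom and svycontrast compute the same quantities: the point estimate is the linear combination c'b of the fitted coefficients, and its standard error is sqrt(c'Vc), where V is the coefficient covariance matrix. A bare-bones sketch with made-up numbers (the coefficients and covariances below are hypothetical, not taken from the model above):

```python
import math

# Hypothetical values (NOT from the model above):
# beta = [pm25, pm25:race4cats2], V = their 2x2 covariance matrix
beta = [0.50, 0.30]
V = [[0.040, -0.010],
     [-0.010, 0.090]]

# Contrast selecting "pm25 + pm25:race4cats2" (the slope in the second group)
c = [1.0, 1.0]

# Point estimate c'beta and standard error sqrt(c'Vc)
est = sum(ci * bi for ci, bi in zip(c, beta))
var = sum(c[i] * V[i][j] * c[j] for i in range(2) for j in range(2))
se = math.sqrt(var)
ci95 = (est - 1.96 * se, est + 1.96 * se)
print(est, se, ci95)
```

The contrast vector (1, 1) picks out pm25 plus its interaction term, which is exactly what ‘lincom pm25 + _IracXpm25_2’ asks Stata to compute.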

Now I can use R for almost everything! The only issue for me is that (as you can see) it seems to require a LOT more typing.

Also FYI – A useful pdf tutorial / guide to regression in R that contains more R code examples for Stata users (with side-by-side Stata code) is available from Oscar Torres-Reyna (Princeton). I saved it to my Documents folder so I can find it anytime I want by doing a Finder search for his name!


Posted by on May 17, 2012 in Epidemiology, Statistics



Quieting the noisy Fisher-Price Kid Tough Camera

We just got a Fisher-Price Kid-Tough Digital Camera for our two-year-old, and our kid loves it, but it’s extremely loud.  Some kids might enjoy the bizarre noises that emanate from its tinny speaker, but most parents probably won’t. It would have been nice if Fisher-Price had allowed parents to turn off the speaker or at least turn the volume down, but it appears there is no way to do this. So we’ll have to silence it ourselves.  Before you begin, note that this will definitely void your warranty and will permanently mute / disable the speaker. (Though technically, with minimal soldering skills, you could re-enable the speaker if you really wanted to.) Here’s how to do it:

First, gather together the following tools:

  • A triangle head screwdriver. This won’t be easy to find offline. I added a link to the set I purchased. If you get this set, use the largest (orange) driver of the set. A regular flathead screwdriver of just the right size might work, but it will be a challenge…
  • Some scissors
  • A butter knife

If you have already installed batteries, take them out of the camera before you begin.

The first step is probably the hardest.  Using the butter knife, remove the 4 screw covers from the back of the camera.  Push the tip of the butter knife deep into the crack between the screw cover and the soft plastic of the case, attempting to catch the lip of the screw cover where it meets the hard plastic of the case.  I wouldn’t worry too much about damaging the soft plastic of the case — it’s pretty resilient.  Exerting firm pressure, slowly work the screw cover out of its hole:

Next, using your triangular head screwdriver, unscrew and remove the four screws holding the back of the case onto the camera:

Detach the camera back from its front by gently prying the camera apart with your hands.  The shutter button will probably pop out — try not to lose it!  You’ll see later that there are four wires holding the two sides together — don’t be too vigorous in prying the camera apart because you could damage these wires if you really yank on them after getting the sides apart.

Now you can see the fairly simple inner workings of the camera.  As mentioned previously, there should be four wires connecting the sides.  The 2 blue ones go from the main circuit board to the annoying speaker, and the red and black ones go to the batteries.  In the picture below, the blue ones have already been snipped, disabling the speaker!

Next step: Using your scissors, cut the blue wires.  This will completely silence the speaker.  You could cover the ends of the wires with something like electrical tape after cutting them if you’re worried about them touching something on the circuit board, but that seemed like too much work for a cheap toy camera and an unlikely problem.  If you want to get really fancy and are into electronics, you could probably snip one blue wire and solder in a potentiometer to dial down the noise incrementally, but that seemed like way too much work for me.

Next, put the camera back together and screw in the 4 screws. Other than simply lining up the front and back, you’ll need to re-insert the shutter button.  Line up one of the grooves on the shutter button with the outcropping on the camera frame, like in the following picture:

Now you’re done!  You could put back the screw covers, but I didn’t bother because they’re probably choking hazards.  If your kid is apt enough to find the triangle head screwdriver and unscrew the four screws, she’s entitled to mess with the innards of the camera as much as she wants to!

Enjoy your quiet camera.


Posted by on March 19, 2012 in Parenting



Problem upgrading Stata on Mac

I had been having trouble upgrading my Stata 12 on my Mac. This is the error message I was getting:

. update all
cannot write in directory /Applications/Stata/.tmp

It appears to have something to do with the fact that I installed Stata to the Applications directory as an administrator, so updates also need to be installed with root privileges.

To do this, run Stata from Terminal using the ‘sudo’ command, which escalates your privileges to root if you have an Admin account. In the Mac Spotlight (command-space), type “Terminal,” then hit enter. You’ll get a command line. Type the following. Note that this depends on your having installed Stata in the /Applications/ directory:

sudo "/Applications/Stata/"

Be sure to include the quotation marks above. Enter your password. Stata should open. Then ‘update all’ in Stata should work.


Posted by on February 21, 2012 in Statistics



Merge a large number of shapefiles using ArcGIS

Using the magic of ArcGIS’ python scripting, it is possible to merge a large number of shapefiles together programmatically. After downloading all census blocks using the python methods in a prior post, I used the python code below to merge all the census blocks for the entire United States into one big happy shapefile. Put this code in a python script file in the same directory as all the shapefiles you want to merge and run it. It is probably necessary for all the shapefiles to have matching coordinate systems, although I don’t know this for sure. Modify the workspacedirectory and output bigshapefile names to match your situation.

*UPDATE: This doesn’t work for merged shapefiles with any part greater than 2GB, so it doesn’t work for the huge census blocks files. If you’re lucky enough to have an ArcInfo license for ArcGIS, then you can create a file geodatabase (which allows for maps of any size) and use the Data Management -> General -> Merge tool or a script like the one below to merge the shapefiles to an enormous map in the file geodatabase. Otherwise, you’ll need to merge the shapefiles into several less-than-2GB chunks instead of one big entire U.S. map.

import os
import fnmatch
import arcpy

workspacedirectory = "Z:/Downloads/temp"
bigshapefile = "Z:/Downloads/allblocks.shp"

# Build a list of all shapefiles in the current directory
shpfilearray = []
for file in os.listdir('.'):
    if fnmatch.fnmatch(file, '*.shp'):
        shpfilearray.append(file)

# Merge them all into one output shapefile
arcpy.env.workspace = workspacedirectory
arcpy.Merge_management(shpfilearray, bigshapefile)
Posted by on January 25, 2012 in Geographic Information Systems



Download all U.S. census block shapefiles

I wanted to download all the United States census block shapefiles for a project I’ve been working on. Since you can only download them for individual states, and 50+ clicks (not to mention unzipping and merging all the files) take far too long, I made a python script that downloads all the files. A future post will go through merging the shapefiles into a unified, huge country-wide file for ArcGIS.

Save the following code as a python script in a new directory with enough space to hold several gigabytes of shapefiles. Modify the year and url depending upon your needs; the code below gets 2009 census block maps. The commented ‘os.system’ commands will also unzip each file and then delete the zip archive on linux or unix-based systems (like my Mac) with a bash shell.

import urllib
import csv
import sys
import os
year = '2009'

fips_csv = csv.DictReader(open('fips.csv', 'rU'), dialect='excel')

for row in fips_csv:
    statename = row['statename'].upper().replace(' ','_')
    statecode = row['code'].zfill(2)
    filename = 'tl_' + year + '_' + statecode + ''
    url = '' + year + '/' + statecode + '_' + statename + '/' + filename
    print 'getting ' + filename
    try:
        urllib.urlretrieve(url, filename)
        # os.system('unzip ' + filename)
        # os.system('rm ' + filename)
    except:
        print "Unexpected error:", sys.exc_info()[0]

Then save the following text containing state names and FIPS codes as fips.csv in the same directory as the script:

American Samoa,60,AS
District of Columbia,11,DC
Commonwealth of the Northern Mariana Islands,69,MP
North Carolina,37,NC
North Dakota,38,ND
New Hampshire,33,NH
New Jersey,34,NJ
New Mexico,35,NM
New York,36,NY
Puerto Rico,72,PR
Rhode Island,44,RI
South Carolina,45,SC
South Dakota,46,SD
Virgin Islands of the United States,78,VI
West Virginia,54,WV
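The filename-building logic in the script can be checked on its own. This standalone snippet mirrors the same string transformations, using one row from the list above (plus a made-up single-digit code to show what zfill does):

```python
row = {'statename': 'New Hampshire', 'code': '33'}

# Mirror the transformations used in the download script
statename = row['statename'].upper().replace(' ', '_')
statecode = row['code'].zfill(2)

print(statename)  # -> NEW_HAMPSHIRE
print(statecode)  # -> 33

# zfill left-pads short codes with zeros, e.g.
assert '9'.zfill(2) == '09'
```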

Finally, run the file using python and all United States census block shapefiles will be automatically downloaded to the same directory as the two files you created.


Posted by on January 25, 2012 in Geographic Information Systems

