Forums: Troubleshooting and Problems
components\config\data\SNOMED\2014-09-01\sct2_Description_Full-en.txt
or whatever dated version you download.
However...
The SNOMED file originally has 9 columns. I found that there were issues with one or more of columns’ data. However since OpenCDISC validation only needs 2 columns (conceptId ,term) I removed the other 7. Ideally you could just open the file in excel, delete the 7 coumns, and save as text. Unfortunately the file is over 1 million records. (about 1.7 mil). You have to load it to a decent text editor like textpad ….copy the first 1M rows to excel and delete the unnecessary columns. Then take next 700k records and do the same. Then save each excel file as tab delimited and merge those files back in the text editor.
Ideally we could provide the modified file to all US customers who have accepted the license. I can send a zippled 16 MB file in email as well so that you can avoid this step. We also found a similar issue with UNII or NDF-RT but since we can distribute those you would not face this challenge unless you need older or later versions than we provide.
Here is some sample SAS code that should handle removing the unwanted columns from the SNOMED file.
* SAS program to remove unwanted columns from SNOMED dictionary ;
* First, store the original SNOMED sct2_Description_Full-en.txt file in a new ;
* folder named Original under the SNOMED\(version) folder in the OpenCDISC ;
* Community application folder. ;
options compress=yes ;
%let OpenCDISCRoot=[*** Path to your OpenCDISC Community installation ***];
%let SNOMEDVersion=[*** ISO date of your SNOMED version ***];
* Read the raw data ;
proc import
dbms=tab
datafile = "&OpenCDISCRoot.\components\config\data\SNOMED\&SNOMEDVersion.\Original\sct2_Description_Full-en.txt"
out=work.SNOMED_Data
replace
;
run ;
* Create a separate dataset, and apply a format to CONCEPTID ;
proc sql ;
create table work.SNOMED_Narrow as
select distinct CONCEPTID format=32., TERM
from work.SNOMED_Data
;
quit ;
* Create the file with only the selected columns ;
proc export
dbms=tab data=work.SNOMED_Narrow
outfile = "&OpenCDISCRoot.\components\config\data\SNOMED\&SNOMEDVersion.\sct2_Description_Full-en.txt"
;
run ;
The code above is useful however depending on your SAS system options, the imported text file may be truncated. To ensure this does not happen I have added the GUESSINGROWS options to the PROC IMPORT (please note this does drastically increase the run time).
proc import
dbms=tab
datafile = "&OpenCDISCRoot.\components\config\data\SNOMED\&SNOMEDVersion.\Original\sct2_Description_Full-en.txt"
out=work.SNOMED_data
replace;
guessingrows=max;
run;
Alternatively, run the PROC IMPORT code above, look at the SAS log and alter the INFILE statement to avoid the truncation, the run time may be quicker.
Also, anyone curious where you can get the SNOMED CT from, you can find it here: https://isd.digital.nhs.uk/trud3/user/guest/group/0/pack/26/subpack/101/releases or google "TRUD SNOMED CT". It will take you to an NHS website (for UK based users).
1. Create an account.
2. Log in.
3. Click Subscribe on the applicable SNOMED CT.
4. Click Download releases.