Forums: Define.xml
The define.xml should describe your datasets. It seems that you created a variable with length 200, so the define.xml describes this accurately. If you want the define.xml to show a smaller length, you should change the length in the dataset.
See also: CDER Common Data Standards Issues Document (Version 1.1/December 2011)
(http://www.fda.gov/downloads/Drugs/DevelopmentApprovalProcess/FormsSubmissionRequirements/ElectronicSubmissions/UCM254113.pdf)
"For both CDISC and non-CDISC datasets, in order to significantly reduce dataset file sizes, the allotted character variable length/size for each column in a dataset should be the maximum length used. Lengths/Sizes of columns should not arbitrarily be set to 200, For example, if your USUBJID column has a maximum length of 18 being used throughout the dataset, the USUBJID’s column size should be set to 18, not to 200. Alternative solutions to this problem that involve some inclusion of a small amount of padding to column width may be acceptable as long as they don’t result in significant increases in file size due to the padding."
Disclaimer: The opinions expressed above are my personal thoughts and may not reflect the opinions of my employer (SAS) or CDISC.
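As a rough illustration of the FDA guidance quoted above, the "maximum length used" for each character column can be computed directly from the data. This is only a sketch in Python rather than SAS: the function name `max_used_lengths` and the sample records are made up for illustration, and lengths are counted in UTF-8 bytes on the assumption that this is what the dataset format stores.

```python
def max_used_lengths(rows):
    """Return {column: longest value length in bytes} for character columns."""
    lengths = {}
    for row in rows:
        for name, value in row.items():
            # Only character (string) values count; numeric columns are skipped.
            if isinstance(value, str):
                used = len(value.encode("utf-8"))
                lengths[name] = max(lengths.get(name, 0), used)
    return lengths


# Hypothetical DM records, invented for this example.
dm_rows = [
    {"USUBJID": "STUDY01-SITE01-001", "RACE": "WHITE", "AGE": 34},
    {"USUBJID": "STUDY01-SITE01-002", "RACE": "ASIAN", "AGE": 51},
]

print(max_used_lengths(dm_rows))
```

The resulting lengths could then be used to resize the columns in the dataset, so that the define.xml and the data agree.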
Hi Lex,
Thanks for your response. When I create the define.xml from OpenCDISC, the lengths do not match the actual variable lengths in my dataset. For instance, in define.xml DM.RACE is given a length of 200 (the variable length is 32), and --SEQ is given a length of 10 with 5 significant digits and datatype=float (in my dataset I am using integers with 1 significant digit and a length of 8).
Since I am still trying to figure out a process to develop define.xml (we are not yet submitting in this format), I was wondering whether others are manually correcting this to match their data, or whether there is a way to pull this information from the datasets when creating define.xml.
Thanks for your help!!
Maria
Hey Maria,
We create the define file from our specification files; all the attributes must be declared in the specification file. The define creation is a relatively simple SAS program that writes text to a file using put statements. There are alternatives, but this way gave us the best control over the output.
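To make the spec-driven idea concrete, here is a minimal sketch that writes ItemDef elements from specification rows. It is in Python rather than the SAS put statements described above, and it is not a complete or valid define.xml: namespaces, ItemGroupDef, code lists and the other required parts are left out. The specification rows, the function name `build_item_defs` and the `IT.<dataset>.<variable>` OID pattern are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical specification rows; in practice these would be read
# from the specification file.
spec = [
    {"dataset": "DM", "name": "USUBJID", "datatype": "text", "length": 18},
    {"dataset": "DM", "name": "RACE", "datatype": "text", "length": 32},
    {"dataset": "AE", "name": "AESEQ", "datatype": "integer", "length": 8},
]


def build_item_defs(spec_rows):
    """Build a simplified MetaDataVersion fragment with one ItemDef per row."""
    root = ET.Element("MetaDataVersion")
    for row in spec_rows:
        ET.SubElement(root, "ItemDef", {
            # The OID naming pattern is an assumed convention, not a rule.
            "OID": f"IT.{row['dataset']}.{row['name']}",
            "Name": row["name"],
            "DataType": row["datatype"],
            "Length": str(row["length"]),
        })
    return root


fragment = build_item_defs(spec)
print(ET.tostring(fragment, encoding="unicode"))
```

Because every attribute comes from the specification, the length in the output is whatever the specification declares, independent of what is in the data.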
It would also be possible to get the attributes from the actual data, but we decided that this leaves too much room for error: the data is not necessarily what you wanted it to be, and if it contains mistakes you will not spot them when the define is created from the data. (The same goes for controlled terms: not all possible values may occur in the data, while you do want all of them in the define.)
There is a lot more to it: for example, we check the data against the specifications down to the level of individual cells. As for specifications vs. data and which should be the master, there is a lot to be said for both sides, and even more for a hybrid approach. There's lots of food for thought here, maybe even process-altering thoughts :-)
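A cell-level check of data against the specification might look roughly like the sketch below. It is a Python illustration, not the actual SAS process described in this post; the layout of the `spec` dictionary, the function name `check_against_spec` and the sample rows are all hypothetical. Note that the code list lives in the specification, so it can hold every allowed value even when only some of them occur in the data.

```python
def check_against_spec(rows, spec):
    """Return a list of findings; an empty list means no issues were found."""
    findings = []
    for i, row in enumerate(rows, start=1):
        for name, rules in spec.items():
            value = row.get(name)
            if value is None:
                continue
            # Check 1: the value must fit in the declared length.
            if len(str(value)) > rules["length"]:
                findings.append(f"row {i}: {name} exceeds length {rules['length']}")
            # Check 2: the value must be in the code list, if one is declared.
            allowed = rules.get("codelist")
            if allowed is not None and value not in allowed:
                findings.append(f"row {i}: {name}={value!r} not in codelist")
    return findings


# Hypothetical specification and data, invented for this example.
spec = {
    "USUBJID": {"length": 18},
    "RACE": {"length": 32,
             "codelist": {"WHITE", "ASIAN", "BLACK OR AFRICAN AMERICAN"}},
}
rows = [
    {"USUBJID": "STUDY01-SITE01-001", "RACE": "WHITE"},
    {"USUBJID": "STUDY01-SITE01-002", "RACE": "Caucasian"},
]

for finding in check_against_spec(rows, spec):
    print(finding)
```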
Thanks, Dirk! I think I'll roll up my sleeves and try out a SAS program to create define.xml.
Maria
Well, you would not have to start from scratch:
http://www.lexjansen.com/pharmasug/2011/SAS/PharmaSUG-2011-SAS-HW02.pdf
http://support.sas.com/rnd/base/cdisc/cst/index.html
Lex Jansen
Disclaimer: I work at SAS.
When I create define.xml, I notice that the length listed is not based on my datasets (e.g., COREF has a length of 200, but in my dataset it is much smaller). Are there any plans to make this data-driven? It seems this could be a problem down the road.
Thanks,
Maria