Forums: Define.xml
The define file and aCRFs describe and explain your data and your data collection process. Terminology codelists should be used when designing the data collection process and, possibly, during SDTM data conversion/mapping. A general rule is to include:
I. ONLY values which were used during data collection (e.g., only the Lab Tests, Units, etc. that were actually collected, or values presented on CRFs)
II. ALL values used during data collection (e.g., all options presented on the CRF, not just the values present in the collected data).
This is a very common issue in submission data. Sponsors and data vendors often create a define.xml whose codelists represent the collected data rather than the data collection process. It is much easier to generate such incomplete codelists from collected data with a simple programming technique (e.g., SAS Proc Freq, SQL Select Distinct, etc.) than to use the various sources of data collection metadata (EDC configurations, CRF design specs, etc.). For example, the CRF AE Severity field may have Mild, Moderate, and Severe options, while only Mild and Moderate AEs were collected during study conduct. Some sponsors create a define.xml with (Mild;Moderate) values. It's easy to do, but it's wrong. You should include everything presented on the CRF: (Mild;Moderate;Severe).
Regarding your two questions:
1. All collected LB tests, and only LB tests that were collected or pre-specified by the protocol, should be included in your define.xml codelist.
2. The same as above. In your example, if Y is the only option on the CRF, then Y is the only value in your codelist. If the CRF includes Y and N options, then your codelist should also be (N, Y), regardless of the data actually collected (Y value only).
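To make the AE Severity example concrete, here is a minimal Python sketch contrasting the two approaches. The `crf_options` and `collected` structures, the subject IDs, and both function names are hypothetical and only stand in for real CRF design metadata and collected data.

```python
# Hypothetical CRF design metadata: every option shown on the AE Severity field.
crf_options = {"AESEV": ["MILD", "MODERATE", "SEVERE"]}

# Hypothetical collected records: no severe events occurred in this study.
collected = [
    {"USUBJID": "001", "AESEV": "MILD"},
    {"USUBJID": "002", "AESEV": "MODERATE"},
    {"USUBJID": "003", "AESEV": "MILD"},
]


def codelist_from_data(records, variable):
    """The shortcut: distinct values found in the data (like SELECT DISTINCT)."""
    return sorted({r[variable] for r in records if r.get(variable)})


def codelist_from_design(design, variable):
    """The recommended source: all options presented on the CRF."""
    return list(design[variable])


from_data = codelist_from_data(collected, "AESEV")
from_design = codelist_from_design(crf_options, "AESEV")

# Options that the data-derived codelist silently drops.
missing = [v for v in from_design if v not in from_data]

print("From data:  ", from_data)
print("From design:", from_design)
print("Missing:    ", missing)
```

The data-derived list loses SEVERE simply because nobody reported a severe event, which is exactly the incomplete codelist described above.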
Excellent points you make. I believe the issues do indeed arise when the define.xml is seen as an artifact that has to be created from the data. In a well-established end-to-end process, the define.xml would be the rendition of the metadata that has been used to drive the process.
Most of the metadata in the define.xml cannot be derived from the data alone; it needs to be actively managed in a consistent way across studies and end-to-end.
Disclaimer: The opinions expressed above are my personal thoughts and may not reflect the opinions of my employer (SAS) or CDISC.
Sergiy/Lex,
I appreciate the feedback, and this makes a lot of sense, albeit it is quite a cumbersome endeavour. Also, including a validation check for this within OpenCDISC would be impossible. I'm sure this question will come up often, especially as sponsors begin to utilize the newly released MSG. Moreover, I wonder how much more in-depth define.xml will become with the future release of ODM 1.3, whenever that happens.
Thank you!
Sergiy,
Yes, I agree with the enumerated value-level specifications. They create a more refined and valid correlation at the micro level, which one would assume is the primary purpose of the define to begin with.
Thanks
Even the current define.xml standard allows you to attach a CodeList to a ValueList item.
Both variable-level and value-level entities get their attributes and sub-elements from the XML ItemDef construct.
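As a rough illustration of that structure, the Python sketch below walks a simplified fragment from a variable-level ItemDef through its value list to the codelists attached to the value-level ItemDefs. The fragment is not a valid define.xml: namespaces are dropped (the real standard uses the `def:` prefix on the value list elements), all OIDs and unit values are made up, and using the value-level item's Name to stand for the test code is a naming convention assumed here.

```python
import xml.etree.ElementTree as ET

# Simplified, illustrative fragment only: no namespaces, hypothetical OIDs.
FRAGMENT = """
<MetaDataVersion>
  <ItemDef OID="VS.VSORRESU" Name="VSORRESU">
    <ValueListRef ValueListOID="VL.VSORRESU"/>
  </ItemDef>
  <ValueListDef OID="VL.VSORRESU">
    <ItemRef ItemOID="VL.VSORRESU.HEIGHT"/>
    <ItemRef ItemOID="VL.VSORRESU.WEIGHT"/>
  </ValueListDef>
  <ItemDef OID="VL.VSORRESU.HEIGHT" Name="HEIGHT">
    <CodeListRef CodeListOID="CL.UNIT.HEIGHT"/>
  </ItemDef>
  <ItemDef OID="VL.VSORRESU.WEIGHT" Name="WEIGHT">
    <CodeListRef CodeListOID="CL.UNIT.WEIGHT"/>
  </ItemDef>
  <CodeList OID="CL.UNIT.HEIGHT" Name="Height units">
    <CodeListItem CodedValue="cm"/>
    <CodeListItem CodedValue="in"/>
  </CodeList>
  <CodeList OID="CL.UNIT.WEIGHT" Name="Weight units">
    <CodeListItem CodedValue="kg"/>
    <CodeListItem CodedValue="LB"/>
  </CodeList>
</MetaDataVersion>
"""


def value_level_codelists(xml_text, variable_oid):
    """Map each value-level item name to the coded values of its codelist."""
    root = ET.fromstring(xml_text.strip())
    item_defs = {i.get("OID"): i for i in root.findall("ItemDef")}
    code_lists = {c.get("OID"): c for c in root.findall("CodeList")}
    value_lists = {v.get("OID"): v for v in root.findall("ValueListDef")}

    # Variable-level ItemDef -> value list -> value-level ItemDefs.
    value_list_oid = item_defs[variable_oid].find("ValueListRef").get("ValueListOID")
    result = {}
    for ref in value_lists[value_list_oid].findall("ItemRef"):
        item = item_defs[ref.get("ItemOID")]
        code_list_ref = item.find("CodeListRef")
        if code_list_ref is None:
            continue  # value-level item without a codelist
        code_list = code_lists[code_list_ref.get("CodeListOID")]
        result[item.get("Name")] = [
            ci.get("CodedValue") for ci in code_list.findall("CodeListItem")
        ]
    return result


units_by_item = value_level_codelists(FRAGMENT, "VS.VSORRESU")
print(units_by_item)
```

Both the variable-level and the value-level entries are plain ItemDef elements here, which is why the same CodeListRef lookup works for either.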
Ok, I see what you are saying.
For define 1.0 we do not have the metadata to explicitly define what you want.
We can attach valuelists to VSORRESU and VSSTRESU, but we would not have the metadata to tell us explicitly which item in those valuelists is associated with which VSTESTCD.
You can do it by convention, but not explicitly. Define 2.0 will allow you to do that.
I do not think there are any references or guidances besides the CRT-DDS spec and the Metadata Submission Guideline.
Further, what about when I have both core and local labs? I will have a consistent LBORRESU for each LBCAT, LBTESTCD pair. Therefore, I will have no problem expressing that in the define.
Not so true for local labs for the same LBCAT, LBTESTCD pair. Yes, that's the reason to derive LBSTRESC, LBSTRESN, and LBSTRESU. But, there are no codelists for LBORRESU, because they are not meant to be controlled.
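One way to see the difference is to count the distinct LBORRESU values per (LBCAT, LBTESTCD) pair. The Python sketch below does this under stated assumptions: the two record lists and all of their values are hypothetical, and `units_per_test` is just an illustrative helper name.

```python
from collections import defaultdict

# Hypothetical central-lab records: one unit per test.
central_records = [
    {"LBCAT": "CHEMISTRY", "LBTESTCD": "GLUC", "LBORRESU": "mg/dL"},
    {"LBCAT": "CHEMISTRY", "LBTESTCD": "GLUC", "LBORRESU": "mg/dL"},
]

# Hypothetical local-lab records: each site reports in its own unit.
local_records = [
    {"LBCAT": "CHEMISTRY", "LBTESTCD": "GLUC", "LBORRESU": "mg/dL"},
    {"LBCAT": "CHEMISTRY", "LBTESTCD": "GLUC", "LBORRESU": "mmol/L"},
]


def units_per_test(records):
    """Collect the distinct original units seen for each (LBCAT, LBTESTCD)."""
    units = defaultdict(set)
    for r in records:
        units[(r["LBCAT"], r["LBTESTCD"])].add(r["LBORRESU"])
    return {pair: sorted(values) for pair, values in units.items()}


central_units = units_per_test(central_records)
local_units = units_per_test(local_records)

print("Central:", central_units)
print("Local:  ", local_units)
```

With a single unit per pair, the central-lab case maps cleanly onto the define; the local-lab case yields several units for the same pair.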
Good point!
Do you think that the check CT0050 "Value for --ORRESU not found in UNIT controlled terminology codelist" should be removed?
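For readers unfamiliar with this kind of check, the logic can be sketched roughly as follows. This is not OpenCDISC's actual implementation; the `UNIT_CODELIST` subset, the sample records, and the `check_orresu` function are hypothetical and serve only to show what such a rule flags.

```python
# Hypothetical subset of the UNIT codelist; the real terminology is far larger.
UNIT_CODELIST = {"mg/dL", "mmol/L", "g/L", "%"}

# Hypothetical LB records, including one free-text local-lab unit.
lb_records = [
    {"LBTESTCD": "GLUC", "LBORRESU": "mg/dL"},
    {"LBTESTCD": "GLUC", "LBORRESU": "mg per dl"},
    {"LBTESTCD": "HCT", "LBORRESU": "%"},
    {"LBTESTCD": "PH", "LBORRESU": ""},
]


def check_orresu(records, unit_variable, codelist):
    """Return (row number, value) for every populated unit not in the codelist."""
    findings = []
    for row_number, record in enumerate(records, start=1):
        value = record.get(unit_variable)
        if value and value not in codelist:
            findings.append((row_number, value))
    return findings


findings = check_orresu(lb_records, "LBORRESU", UNIT_CODELIST)
for row_number, value in findings:
    print(f"Row {row_number}: LBORRESU value '{value}' not found in UNIT codelist")
```

Every local-lab unit that was recorded as collected, rather than standardized, would show up as a finding under a rule like this.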
All,
This would be part of UNIT CT, which is extensible in any case. Having it present should not harm the validity of the data; it would just need an explanation justifying its presence.
Anthony's point is that there is not much scientific sense in providing codelists for non-standardized data. Why bother with something like this? A codelist of all the local lab units used will not add any value; it creates noise and reduces the overall quality of the define.xml content.
I believe it should be removed. Others have already stated it here: for local labs, there's a lot of noise.
Hi,
After reading the CRT-DDS and the newly released CDISC Metadata Submission Guidelines (MSG), I'm curious about the CodeList and its overall representation. Therefore, here are some questions that I was hoping this forum could assist with.
The MSG and CRT-DDS are a little vague when it comes to this, and interpretation can sometimes be misleading. I'd like to get some opinions on this, as well as ask whether the OpenCDISC development team would include such a validation crosscheck.
Thank you in advance!