Do's and Don'ts of Define.xml Webinar Q&A

  1. When we are adding new terms to an extended codelist, can we leave the NCI codes blank, or should we add a placeholder for them?
    The NCI codes must be left blank. Within the XML code, the ExtendedValue attribute is set to "Yes" (ExtendedValue="Yes"). When the XML is viewed using the stylesheet, the stylesheet interprets ExtendedValue="Yes" and displays an asterisk next to the value in the codelist, indicating that this value was not part of the original CDISC codelist.
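
    As a rough illustration of the answer above, the sketch below lists the extended terms in a codelist. The codelist, its OID, the term names, and the placeholder NCI code C00000 are all hypothetical; the def: namespace and the Alias Context value are assumed to follow Define-XML v2.0.

```python
import xml.etree.ElementTree as ET

# Define-XML v2.0 extension namespace (assumed for this sketch).
DEF_NS = "http://www.cdisc.org/ns/def/v2.0"

# Hypothetical codelist: the standard term carries a placeholder NCI code,
# while the sponsor-added term has def:ExtendedValue="Yes" and no NCI code.
SAMPLE_CODELIST = """
<CodeList xmlns="http://www.cdisc.org/ns/odm/v1.3"
          xmlns:def="http://www.cdisc.org/ns/def/v2.0"
          OID="CL.HYPOTHETICAL" Name="Hypothetical codelist" DataType="text">
  <EnumeratedItem CodedValue="STANDARD TERM">
    <Alias Context="nci:ExtCodeID" Name="C00000"/>
  </EnumeratedItem>
  <EnumeratedItem CodedValue="SPONSOR TERM" def:ExtendedValue="Yes"/>
</CodeList>
"""


def extended_terms(codelist_xml):
    """Return the coded values flagged as extensions of the codelist."""
    root = ET.fromstring(codelist_xml)
    flag = "{%s}ExtendedValue" % DEF_NS
    terms = []
    for item in root:
        name = item.tag.split("}")[-1]  # drop the namespace part of the tag
        if name in ("EnumeratedItem", "CodeListItem") and item.get(flag) == "Yes":
            terms.append(item.get("CodedValue"))
    return terms


print(extended_terms(SAMPLE_CODELIST))
```

    These flagged terms are the ones the stylesheet would mark with an asterisk.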
     
  2. Is there a standard template available for the Reviewer's Guides (nSDRG, cSDRG and ADRG)?

    There is a standard template created by PhUSE for all three Reviewer's Guides.

    Study Data Reviewer's Guide templates (Search for Title of "nSDRG" and "cSDRG")

    Analysis Data Reviewer's Guide template (Search for Title of "ADRG")

    FDA Technical Conformance Guide (sections 2.2 and 2.3) recommends using PhUSE templates
     

  3. Can both eDT and CRF be added as reference?
    If a variable has multiple origins, leave the origin blank at the variable level. Provide value level metadata to specify which parts of the variables are collected by CRF or eDT.
     
  4. When will the new version of Community be available with ADaM 1.1 validation rules?
    ADaM v1.1 validation rules have not yet been released by CDISC. As soon as CDISC publishes the validation rules, we will make them available in Community. For our Enterprise users, ADaM 1.1-beta rules are available with the v3.4 Enterprise release.
     
  5. Should/can the complex algorithm(s) be put into the ADRG or SDRG?
    Yes, putting them in the Reviewer's Guide would be valid as there is no set guidance on where these long and complex derivations should be documented.
     
  6. For variables that are predecessors, instead of setting Origin=Assigned, is it more appropriate to set Origin=Predecessor and set the Define Comment to SDTM.AE.XXXXX or RAW.AE.XXXX?

    For your ADaM Define.XML file, whenever the values are copied straight out of an SDTM variable, you MUST have Origin=Predecessor and the Comment set to the SDTM dataset.variable.

    The raw data variables should not be referenced in any SEND/SDTM/ADaM Reviewer's Guides or Define.XML.
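
    The rule above can be sketched as a small check on a specification row. This is only an illustration: the function name and the DATASET.VARIABLE pattern are assumptions made for this example, not part of any Pinnacle 21 tool.

```python
import re

# Assumed convention: a predecessor is written as DATASET.VARIABLE,
# for example "AE.AETERM". Anything with an extra level, such as
# "RAW.AE.AETERM", does not match and is reported as invalid.
PREDECESSOR = re.compile(r"^[A-Z][A-Z0-9]{1,7}\.[A-Z][A-Z0-9_]{0,7}$")


def check_predecessor(origin, comment):
    """Return True when a Predecessor origin has a DATASET.VARIABLE comment.

    Rows with any other origin are not covered by this check and pass.
    """
    if origin != "Predecessor":
        return True
    return bool(PREDECESSOR.match(comment or ""))


print(check_predecessor("Predecessor", "AE.AETERM"))
print(check_predecessor("Predecessor", "RAW.AE.AETERM"))
```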
     

  7. Is the comment text recommended for representing Raw variables in Define.XML?

    Raw data variables should not be referenced in any SEND/SDTM/ADaM Reviewer's Guides or Define.XML files.

    The text must be reworded so any raw data references are changed to SEND/SDTM/ADaM terms.
     

  8. Can the Enterprise Define Designer tool create annotated CRF or annotate CRFs? Or just read aCRF and populate origin in Define.XML?
    Pinnacle 21 Enterprise reads the aCRF in order to populate Origin = CRF where applicable and populates the individual page numbers for the hyperlinks in the Define.XML, saving a considerable amount of time. It does not annotate CRFs. 
     
  9. Do we need origin for every variable? Can you give more examples of Assigned beyond protocol defined?

    The origin must be populated for every variable and also every value level metadata.

    The SDTM-IG v3.2, section 4.1.1.8.1, gives the following definition of an assigned origin as: "A value that is determined by individual judgment (by an evaluator other than the subject or investigator), rather than collected as part of the CRF or derived based on an algorithm. This may include third party attributions by an adjudicator. Coded terms that are supplied as part of a coding process (as in --DECOD) are considered to have an Origin of “Assigned”. Values that are set independently of any subject-related data values in order to complete SDTM fields such as DOMAIN and --TESTCD are considered to have an Origin of “Assigned”."
     

  10. Isn't the "Complex Algorithm document" supposed to go in the SDRG/ADRG? What are the advantages of creating another document?

    You could put any complex derivations into the Reviewer's Guide, as there are no rules stating otherwise.

    After PhUSE created the Reviewer's Guide templates, the FDA then recommended using these Reviewer's Guide templates in their Technical Conformance Guide. The Reviewer's Guide templates do not have a placeholder for complex derivations (although this does not mean you cannot add in your derivations).

    There is no rule stating what your complex algorithm document should be called. In the Define-XML v2.0 specification, CDISC provided an example SDTM data package where they do have a Reviewer's Guide and a separate "complexalgorithms.pdf".

    We do not have any compelling arguments to creating a "complexalgorithms.pdf", other than the fact that CDISC does it in their example and it doesn't affect the Reviewer's Guide templates that the FDA recommends using.
     

  11. What should be included in the UNIT codelist? When you capture the UNIT as a codelist and as an Other, specify field and map those values to CT?
    All of the unit options available on the CRF should be listed in the codelist, along with any values collected in your Other, specify field.
     
  12. Do you recommend that sponsors use the Pinnacle 21 Community tool for Define.XML generation?
    The overwhelming feedback from our Enterprise users is that the Define Designer is significantly better than the Community tool, as it automates a number of steps and saves a lot of time. Creating a Define.XML with the Pinnacle 21 Community tool is sufficient for some users and in certain situations, but it does have limitations.
     
  13. In ADaM, must the derivation be populated even for assigned variables?

    A derivation is only required when the variable or value level metadata specifies an origin of 'Derived'.

    Adding a derivation to something that is assigned will result in an error.

    However, comments are valid (not required) for assigned variables/VLM.

    To the end user viewing the Define.XML via the stylesheet, comments and derivations look the same and appear in the same section of the Define.XML.

    With the exception of variables that reference an external dictionary, or variables where the origin is very obvious, we strongly recommend adding comments to assigned variables. When the origin is listed as Assigned and no comment is provided, there is practically no useful information at all as to where the variable came from or what it contains.
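
    The rules in this answer can be summarized in a short sketch. The function and the message wording are hypothetical and only mirror the points made above; they are not the text of any validation rule.

```python
def derivation_issues(origin, derivation=None, comment=None):
    """Return a list of issues for one variable or value level item."""
    has_derivation = bool(derivation and derivation.strip())
    has_comment = bool(comment and comment.strip())
    issues = []
    if origin == "Derived" and not has_derivation:
        issues.append("error: Derived origin requires a derivation")
    if origin == "Assigned" and has_derivation:
        issues.append("error: Assigned origin must not carry a derivation")
    if origin == "Assigned" and not has_comment:
        # A recommendation from the answer above, not a validation error.
        issues.append("note: consider adding a comment")
    return issues


print(derivation_issues("Assigned", derivation="Set to 'Y'"))
```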
     

  14. Are you saying to always leave the comments blank for a variable that has origin=ASSIGNED?

    With the exception of variables that reference an external dictionary, or variables where the origin is very obvious, we strongly recommend adding comments to assigned variables. When the origin is listed as Assigned and no comment is provided, there is practically no useful information at all as to where the variable came from or what it contains.

    In the webinar we were pointing out that having a comment of "AE.AELLT" for the AE.AELLT variable does not add any value to the Define.XML.
     

  15. Should there be separate codelists for PARAM and units, or can we use one codelist that includes PARAM plus units?
    Generally, your PARAM values will include the unit values (as the PARAM is a combination of the SDTM xxTEST and xxSTRESU values), and in these situations you would have one codelist showing the PARAM values for that dataset.
     
  16. Regarding the derivation/comment column: if EXSEQ is assigned based on sort order, how can the origin be "Derived"? Can you help us understand how to differentiate the origins?

    The CDISC SDTM sample provided with the CDISC Define-XML v2.0 spec has the xxSEQ variables listed as derived with the following derivation: "Sequential number identifying records within each USUBJID in the domain."

    xxSEQ is a fundamental part of SDTM and I would not want to spend too much time debating derived versus assigned, or spend time carefully wording/formatting the derivation. How you sort the data in order to assign xxSEQ should be fairly simple.

    I'd explain how you sort the data to apply SEQ, or I would put in a generic derivation like CDISC have.
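
    To make the "sort, then number within each USUBJID" idea concrete, here is a minimal sketch. The sort keys are whatever you document in your derivation; EXSTDTC in the usage line is only an assumed example.

```python
from itertools import groupby
from operator import itemgetter


def assign_seq(records, sort_keys, seq_name):
    """Sort records by USUBJID plus sort_keys, then number them 1..n per subject."""
    ordered = sorted(
        records,
        key=lambda r: (r["USUBJID"],) + tuple(r[k] for k in sort_keys),
    )
    result = []
    for _, group in groupby(ordered, key=itemgetter("USUBJID")):
        for number, record in enumerate(group, start=1):
            result.append({**record, seq_name: number})
    return result


# Hypothetical exposure records, deliberately out of order.
ex = [
    {"USUBJID": "S1", "EXSTDTC": "2020-01-02"},
    {"USUBJID": "S2", "EXSTDTC": "2020-01-01"},
    {"USUBJID": "S1", "EXSTDTC": "2020-01-01"},
]
for row in assign_seq(ex, ["EXSTDTC"], "EXSEQ"):
    print(row)
```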
     

  17. In the main domain page in the mapping document, should the 'derivation/comment' cell contain "see complex algorithms document" for complex derivations? I'm thinking recording the same derivation in two separate locations could result in discrepancies.

    If you have the derivation in a document called "complexalgorithms.pdf", then your Define.XML MUST indicate that the derivation is in the PDF document, and you really should provide a hyperlink to that document.

    There is no common industry practice to reference, but personally, I like to keep the full derivation in the Define.XML and also have it in the PDF. My reasoning is that anyone attempting machine-readable exercises using the Define.XML can still access the full derivation from the define.

    My approach is flawed because, as you have mentioned, now I have the full derivation in the Define.XML, plus a second copy of it in the PDF, which introduces a HUGE chance for discrepancies.

    In my defense, I only create the complexalgorithms.pdf when it's absolutely necessary, and even then I use it only for the large derivations that need it. It is not necessary to put ALL derivations into complexalgorithms.pdf.

    In my experience, I have never needed to add more than 10 derivations to the complexalgorithms.pdf file.
     

  18. Would leaving Origin = NULL at the variable level generate an error when validating the Define.XML in Pinnacle 21 Enterprise?

    If the origin is null at the variable level, then the origins must be declared at the value level metadata level.

    Failure to properly define origins will result in DD0072 firing.

    DD0072 message description: "For regulatory submission data, Origin is required for all SDTM, ADaM, and SEND variables. It is at the sponsor's discretion whether to provide Origin at the Variable or Value Level. When Origin is provided for each Value Level item, then providing Origin on the Variable is optional. Define-XML specification represents variable Origin as def:Origin element under ItemDef element."
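
    The logic described in the DD0072 message can be sketched as follows. This is not the Pinnacle 21 implementation; the dictionary layout and the function name are assumptions made for this example.

```python
def missing_origins(variables):
    """Return the names of variables whose origin is not fully defined.

    variables maps a variable name to a dict with an optional "origin" and
    an optional "vlm" list of value level items, each with its own "origin".
    """
    missing = []
    for name, meta in variables.items():
        if meta.get("origin"):
            continue  # origin given at the variable level
        vlm = meta.get("vlm", [])
        # Without a variable-level origin, every value level item needs one.
        if not vlm or any(not item.get("origin") for item in vlm):
            missing.append(name)
    return missing


spec = {
    "STUDYID": {"origin": "Assigned"},
    "LBORRES": {"vlm": [{"origin": "CRF"}, {"origin": "eDT"}]},
    "LBDTC": {"vlm": [{"origin": "CRF"}, {"origin": None}]},
}
print(missing_origins(spec))
```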
     

  19. For ADaM define.xml, in a case of having two DTYPE values in lab data, and another one in vitals, would you create two separate codelists?

    If the unique DTYPE values are identical in labs and vitals, then you could create just one codelist.

    However if there are one or more differences between the DTYPE values, then two separate codelists will need to be created.
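
    A quick way to apply this rule is to group datasets by their set of unique DTYPE values. The sketch below assumes you have already collected the unique values per dataset; the dataset names and values in the usage lines are illustrative.

```python
def plan_codelists(values_by_dataset):
    """Group datasets that can share one codelist (identical value sets)."""
    groups = {}
    for dataset, values in values_by_dataset.items():
        groups.setdefault(frozenset(values), []).append(dataset)
    return sorted(sorted(datasets) for datasets in groups.values())


# Identical values: one shared codelist.
print(plan_codelists({"ADLB": ["LOCF", "AVERAGE"], "ADVS": ["AVERAGE", "LOCF"]}))
# One difference: two separate codelists.
print(plan_codelists({"ADLB": ["LOCF", "AVERAGE"], "ADVS": ["LOCF"]}))
```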
     

  20. For laboratory data, if the same parameter is collected from a local lab on the CRF and from a central lab, which origin should be given: CRF or eDT?

    You would have to use value level metadata to specify the multiple origins.

    At the variable level, the origin is left blank for variables such as results (LBORRES), units (LBORRESU) and dates (LBDTC).

    At the value level, where clauses will need to be created to show when the data comes from CRF and when it is from eDT. The same where clauses can be used across all variables with multiple origins. The simpler the where clauses, the better.

    Maybe LBNAM can be used in the where clauses to distinguish between the CRF and eDT data.
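
    The idea of reusing one pair of where clauses can be sketched like this. The clause IDs and the LBNAM value "LOCAL LAB" are hypothetical; only a few of the Define-XML comparators are handled.

```python
# Hypothetical where clauses keyed on LBNAM. The same pair could be attached
# to LBORRES, LBORRESU, LBDTC and any other variable with both origins.
WHERE_CLAUSES = [
    {"id": "WC.LB.LOCAL", "variable": "LBNAM", "comparator": "EQ",
     "values": ["LOCAL LAB"], "origin": "CRF"},
    {"id": "WC.LB.CENTRAL", "variable": "LBNAM", "comparator": "NE",
     "values": ["LOCAL LAB"], "origin": "eDT"},
]


def matches(clause, record):
    """Evaluate one where clause against one record."""
    value = record.get(clause["variable"])
    comparator = clause["comparator"]
    if comparator in ("EQ", "IN"):
        return value in clause["values"]
    if comparator in ("NE", "NOTIN"):
        return value not in clause["values"]
    raise ValueError(f"Unsupported comparator: {comparator}")


def origin_for(record, clauses=WHERE_CLAUSES):
    """Return the origin of the single clause that selects this record."""
    hits = [c["origin"] for c in clauses if matches(c, record)]
    if len(hits) != 1:
        raise ValueError("Where clauses should select exactly one origin per record")
    return hits[0]


print(origin_for({"LBNAM": "LOCAL LAB"}))
print(origin_for({"LBNAM": "HYPOTHETICAL CENTRAL LAB"}))
```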
     

  21. Where should we give translated text for supplemental qualifiers in Pinnacle 21 Define.xml specification?
    For datasets it’s the Description column, for variables it’s the Label column, etc.
     
  22. What should the variable's length be in Define.XML?
    From the Define-XML Specification v2.0, Section 5.3.11 ItemDef Element, page 75, the business rule for the length is: "Length should be defined as the maximum expected variable length."
     
  23. Is there any plan to allow for more than 1000 characters for derivations?

    XML and ODM do not specify a length restriction. However the FDA's current system only supports a maximum of 1000 characters.

    The Define.XML rule DD0086 flags Define.XML files with elements longer than 1000 characters.
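
    Before validating, you can scan a Define.XML for long text yourself. The sketch below is a simplified stand-in and not the DD0086 implementation; it checks element text and attribute values against an assumed limit of 1000 characters.

```python
import xml.etree.ElementTree as ET

MAX_LEN = 1000  # limit quoted in the answer above


def long_text(xml_string, limit=MAX_LEN):
    """Return (element, location, length) for every value longer than limit."""
    root = ET.fromstring(xml_string)
    hits = []
    for elem in root.iter():
        tag = elem.tag.split("}")[-1]  # drop any namespace prefix
        text = (elem.text or "").strip()
        if len(text) > limit:
            hits.append((tag, "text", len(text)))
        for attr, value in elem.attrib.items():
            if len(value) > limit:
                hits.append((tag, attr.split("}")[-1], len(value)))
    return hits


# Hypothetical method with a 1001-character derivation.
sample = (
    "<MethodDef OID='MT.HYPOTHETICAL'><Description><TranslatedText>"
    + "x" * 1001
    + "</TranslatedText></Description></MethodDef>"
)
print(long_text(sample))
```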
     

  24. Are you suggesting one unit codelist per domain (e.g. UNIT_LB, UNIT_VS) or even multiple per domain (e.g. UNIT_VITALSIGNS and UNIT_BODYMEASUREMENTS)?

    At the variable level for LB and VS, each unit variable (LBORRESU, LBSTRESU, VSORRESU and VSSTRESU) should have a codelist assigned to it.

    The codelist should show all possible values for that variable or value level metadata which it is assigned to.

    Generally, the units in your LB data will be different from the units in your VS data, therefore you need a minimum of one codelist for your LB units and another codelist for your VS units.

    For LB, if your original unit (LBORRESU) values are different from your standardized unit (LBSTRESU) values, then you will need separate codelists for each variable.

    Sometimes for VS, the original units (VSORRESU) are the same as your standardized units (VSSTRESU), and in this situation it would be valid to have the one codelist assigned to both variables (VSORRESU and VSSTRESU).

    Your value level metadata is where you can then make some more customized codelists, based on your needs.

    For your example, maybe height and weight are collected on the CRF, while heart rate, blood pressure and temperature are collected via eDT.

    Here you might want to use your category (VSCAT) value to distinguish the data, then go ahead and make your UNIT_BODYMEASUREMENTS and UNIT_VITALSIGNS codelists.
     

  25. What's the origin for an ADaM variable, if it is copied from a variable in SUPPXX? Assign, or Derived or Predecessor?

    The suppqual variables are treated just like the parent domain variables.

    Where all values for a given QNAM are brought into the ADaM dataset, then the origin is Predecessor and the source shows SUPPxx.QVAL where QNAM = "ABC"
     

  26. Can you give more detail on how to present an origin with multiple sources, for example data from both the CRF and a vendor? The Define.XML Origin column will be null as you mentioned, so where will the details go?
    If a variable has multiple origins, you should leave the origin blank at the variable level, and provide value level metadata to specify which parts of the variable are CRF collected and which parts are eDT.
     
  27. Does the Define.XML tool automate the collection of CT from CRF as well along with the ones collected in Datasets, if some were intended to be collected but did not get into datasets?
    No, values of CT from the CRF that are not in the datasets will not be automatically populated by the Define.XML tool. Any extra values will need to be manually entered.
     
  28. One slide mentioned "Do not overcrowd your Define.XML with codelists from other data packages".  Does this mean ADaM Define.XML should not include codelists for SDTM predecessor variables carried into ADaM datasets?

    No, SDTM codelists within your ADaM Define.XML are fine as long as they all belong to predecessor variables.

    Far too often we have seen ADaM Define.XML files with all the codelists from SDTM present, even codelists for SDTM variables that were never copied over to ADaM.

    For example, I haven't yet seen all the TSPARMCD/TSPARM values copied over into ADaM as predecessors, but I often see the TSPARMCD and TSPARM codelists in the ADaM Define.XML.
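
    One way to spot leftover codelists is to list every CodeList that no ItemDef references. The sketch below assumes the ODM v1.3 namespace and uses a hypothetical, heavily trimmed metadata fragment.

```python
import xml.etree.ElementTree as ET

ODM = "{http://www.cdisc.org/ns/odm/v1.3}"

# Hypothetical ADaM metadata: CL.SEX is used by a variable, while
# CL.TSPARMCD was copied from the SDTM define and is never referenced.
SAMPLE_DEFINE = """
<MetaDataVersion xmlns="http://www.cdisc.org/ns/odm/v1.3"
                 OID="MDV.HYPOTHETICAL" Name="Hypothetical ADaM define">
  <ItemDef OID="IT.ADSL.SEX" Name="SEX" DataType="text" Length="1">
    <CodeListRef CodeListOID="CL.SEX"/>
  </ItemDef>
  <CodeList OID="CL.SEX" Name="Sex" DataType="text"/>
  <CodeList OID="CL.TSPARMCD" Name="Trial Summary Parameter Short Name" DataType="text"/>
</MetaDataVersion>
"""


def unreferenced_codelists(define_xml):
    """Return the OIDs of codelists that no CodeListRef points to."""
    root = ET.fromstring(define_xml)
    defined = {cl.get("OID") for cl in root.iter(ODM + "CodeList")}
    referenced = {ref.get("CodeListOID") for ref in root.iter(ODM + "CodeListRef")}
    return sorted(defined - referenced)


print(unreferenced_codelists(SAMPLE_DEFINE))
```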
     

  29. If an expected variable is not collected which origin should it get? Can it be left blank?

    If the variable was intended to be collected via CRF, eDT, etc. but no data was collected, then you should put in the intended collection method as the origin.

    If the expected variable was NEVER collected but you have it in your domain, you should absolutely put a comment in the define for that variable explaining that it was not collected.

    For the origin, you could leave it null, resulting in a validation error which you would explain in your Reviewer's Guide.
     

  30. Some Sponsors use STUDYID in codelist. Do we need to add STUDYID in Controlled Term?

    It seems excessive to create a codelist for STUDYID.

    For an ISE/ISS where there are multiple values for STUDYID it might be valid to create a codelist for STUDYID.

    For the vast majority of studies, I do not see any need to create a STUDYID codelist.
     

  31. Is Pinnacle 21 Enterprise free for public use to create Define.XML?

    Pinnacle 21 Enterprise is a subscription-based service; however, Define.XML files can be made for free using P21 Community.
     

  32. We keep hearing from some clients that the FDA still needs Define.PDF even if Define.XML is generated using v2.0. Can you clarify whether Define.PDF is still needed with Define.XML 2.0?

    The FDA's Data Standards Catalog does not list define.pdf as a supported or required standard, so technically there is no need for define.pdf.

    However, individual review teams within the FDA might make specific requests, such as asking for define.pdf.

    Hopefully you have some software available to automatically transform the Define.XML to define.pdf.

    With P21 Enterprise, creating a define.pdf from an existing Define.XML takes a matter of seconds.
     

  33. Some records are derived from other records (e.g., % differential records in Haematology). In this case, what would the origin be?
    If lab test results are being derived from other lab test results, then your value level metadata should break out each test and list its origin as CRF/eDT or Derived.
     
  34. Currently the CLI version of the Pinnacle 21 Community validator does not support Define.XML validation. Do you have any plans to fix this in the near future?
    We are making a number of enhancements to the upcoming version of Community, and this will be one of them. 
     
  35. Currently Pinnacle 21 provides the CDISC CT of 03/25/2016, but a newer CDISC CT is out. How do we use the new CDISC CT?

    All versions of CDISC CT, including the latest version, can be downloaded for free from:
    https://www.pinnacle21.com/downloads/cdisc-terminology

    Specific instructions on configuring Community with different versions of CDISC CT can be found here:
    https://www.pinnacle21.com/configuring-pinnacle21-community-validator-cdisc-controlled-terminology

    For our Enterprise clients, all versions of CDISC CT are automatically available and configuring is not necessary.
     

  36. Regarding the example of the AESPID variable mapped to the AE.RECORDPOSxx variable: does it mean we should not use any raw data variables?

    Raw data variables should not be referenced in any SEND/SDTM/ADaM Reviewer's Guides or Define.XML files.

    The text must be reworded so any raw data references are changed to SEND/SDTM/ADaM terms.
     

  37. For complex algorithms, is it fine if we just state "Refer to Complex Algorithms document" in the derivation when we have the derivation in a separate document?

    That would be ok, however I would attempt to provide a hyperlink in the Define.XML file.

    An SDTM Define.XML file will contain hyperlinks to individual pages within the aCRF.pdf. This same XML code that hyperlinks to the aCRF could be copied out and used to link to your complex algorithms document.
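
    As a sketch of what such a link can look like, the fragment below pairs a def:leaf (the document) with a def:DocumentRef inside a method, following the pattern Define-XML v2.0 uses for aCRF links. All OIDs, the page number, and the method text are hypothetical; the Python only resolves the reference to show that it is consistent.

```python
import xml.etree.ElementTree as ET

ODM = "{http://www.cdisc.org/ns/odm/v1.3}"
DEF = "{http://www.cdisc.org/ns/def/v2.0}"
XLINK = "{http://www.w3.org/1999/xlink}"

SAMPLE = """
<MetaDataVersion xmlns="http://www.cdisc.org/ns/odm/v1.3"
                 xmlns:def="http://www.cdisc.org/ns/def/v2.0"
                 xmlns:xlink="http://www.w3.org/1999/xlink"
                 OID="MDV.HYPOTHETICAL" Name="Hypothetical study">
  <MethodDef OID="MT.HYPOTHETICAL" Name="Hypothetical algorithm" Type="Computation">
    <Description>
      <TranslatedText>See the complex algorithms document.</TranslatedText>
    </Description>
    <def:DocumentRef leafID="LF.ComplexAlgorithms">
      <def:PDFPageRef PageRefs="3" Type="PhysicalRef"/>
    </def:DocumentRef>
  </MethodDef>
  <def:leaf ID="LF.ComplexAlgorithms" xlink:href="complexalgorithms.pdf">
    <def:title>Complex Algorithms</def:title>
  </def:leaf>
</MetaDataVersion>
"""


def method_links(xml_string):
    """Map each method OID to the (file, pages) its DocumentRef resolves to."""
    root = ET.fromstring(xml_string)
    leaves = {leaf.get("ID"): leaf.get(XLINK + "href") for leaf in root.iter(DEF + "leaf")}
    links = {}
    for method in root.iter(ODM + "MethodDef"):
        ref = method.find(DEF + "DocumentRef")
        if ref is None:
            continue
        page = ref.find(DEF + "PDFPageRef")
        pages = page.get("PageRefs") if page is not None else None
        links[method.get("OID")] = (leaves.get(ref.get("leafID")), pages)
    return links


print(method_links(SAMPLE))
```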
     

  38. Should the link to MedDRA dictionary CT be included in ADaM define as well as SDTM define?
    It would be best to show in your ADaM Define.XML what variables conform to the MedDRA dictionary and what version of the dictionary is being used.
     
  39. Regarding codelists for observational findings: I can understand the requirement to list ranges (e.g. severities), but would you expect to see all the selectable clinical observations in the entire data collection system listed?
    Best practice is to have a codelist in the Define.XML contain all allowable values for that variable (from the CRF, etc.). You would not include all values from the entire data collection system, just the values applicable to your study.
     
  40. If you have a value level metadata with different type & length, what do you display for type & length in the domain metadata?

    At the variable (domain) level, you should show the max length for that variable. If the variable contains a mixture of numeric and text values, then at the variable level the type would be 'text'.

    The value level metadata can be more specific. For a specific where clause, the variable might only contain integer values and therefore the type would be set to 'integer' and the length left null.

    For the same variable but different where clause, the values might all be text with a max length of 100. For this value level metadata, the type would be set to 'text' with a length of 100
     

  41. Is there a rule that specifies the maximum number of values that should be displayed in the domain metadata (above the codelist link) for long codelists?

    No, there is no rule for this.

    Most people use the stylesheet provided by CDISC. Somewhere within this stylesheet (I looked but couldn't find where) there is some logic so that a small codelist gets all the values displayed plus a hyperlink to the actual codelist, while large codelists simply get a hyperlink to the codelist.

    Near the top of the CDISC stylesheet there are the following two comments in the code:

    1. The CDISC Define-XML standard does not dictate how a stylesheet should display a Define-XML v2 file.

    2. This example stylesheet can be altered to satisfy alternate visualization needs.
     

  42. Do we need to capture dictionary details in Supp domains as well apart from Define.XML file?

    Personally, I do not like to have dictionary details in SUPP.

    I have seen sponsors insist on migrating the MedDRA dictionary version to SUPPAE. To do this, they have a SUPPAE record with the dictionary version attached to EVERY SINGLE AE record. This creates a large number of repetitive, unnecessary SUPPAE records.

    Then the same sponsor puts the MedDRA version into the Reviewer's Guide and also the Define.XML.

    Many times I have seen data packages that are supposedly passed QC, yet the MedDRA versions documented in the Define.XML, Reviewer's Guide and SUPPAE are never consistent.

    Your ADaM programmers should be able to programmatically get the MedDRA version without going into SUPPAE.

    If people are insisting that the MedDRA version be migrated to SUPPAE for all records, because there is the possibility of having some records coded to version 18.0 while other records might be coded to 18.1, then your entire coding process should probably be looked into because migrating the MedDRA version to SUPP is simply a bandaid solution (your entire submission should be coded to one MedDRA dictionary).
     

  43. When setting up the define file, is there any restriction on text length? If yes, could you please provide the complete list of length restrictions? For example: Method description should be only 200 characters in length, Where Clause ID should be only 40 characters in length, etc.

    XML and ODM do not specify a length restriction. However the FDA's current system only supports a maximum of 1000 characters.

    The Define.XML rule DD0086 flags Define.XML files with elements longer than 1000 characters.
     

  44. Why should we enter a comment which only states that the variable itself was used (like comment ex.exadj for variable exadj)?
    It's better to leave these types of comments out of the Define.XML. Simply stating that EXADJ goes to EX.EXADJ does not add any value to the Define.XML and, if anything, could lead to human error such as typos and require additional QC time.
     
  45. Referring to your comment that no programming experience should be assumed: you had SAS code piece in your example of derivations. So, should it or should it not be presented?

    Sometimes the SAS code itself can be easy to read and is quite self explanatory. Personally I think the SAS code in my examples is easy to read so I left it as is.

    As soon as the programming gets too complex, it's best to convert your code to pseudo code. There are a lot of people who will review your Define.XML in the future and many of these people will not have any programming experience.

    If you feel the need to include your exact code, then there is the "FormalExpression" option available.

    Personally I have never seen this used in the real world.

    Please take a look at the Define-XML Specification v2.0. Section 5.3.13.1 (page 89) gives the definition of FormalExpression, while section 4.6.1.4 (page 45) shows an example using FormalExpression.
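
    For orientation, here is a sketch of a method that keeps a plain-language description for reviewers and the exact code in a FormalExpression. The method OID, the description, and the SAS statement are hypothetical and are not taken from the specification's example; the Python simply reads both parts back.

```python
import xml.etree.ElementTree as ET

ODM = "{http://www.cdisc.org/ns/odm/v1.3}"

SAMPLE_METHOD = """
<MethodDef xmlns="http://www.cdisc.org/ns/odm/v1.3"
           OID="MT.HYPOTHETICAL.BMI" Name="Hypothetical BMI algorithm"
           Type="Computation">
  <Description>
    <TranslatedText>Weight in kg divided by the square of height in m.</TranslatedText>
  </Description>
  <FormalExpression Context="SAS">BMI = WEIGHT / ((HEIGHT / 100) ** 2);</FormalExpression>
</MethodDef>
"""


def read_method(method_xml):
    """Return the readable description and the expressions keyed by Context."""
    method = ET.fromstring(method_xml)
    text = method.find(ODM + "Description/" + ODM + "TranslatedText")
    expressions = {
        fe.get("Context"): (fe.text or "").strip()
        for fe in method.findall(ODM + "FormalExpression")
    }
    return text.text.strip(), expressions


description, expressions = read_method(SAMPLE_METHOD)
print(description)
print(expressions)
```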
     

  46. Is there any tool to review the define, or should it only be done manually?

    Pinnacle 21 Enterprise and Community both validate the Define.XML. The validation checks things such as whether an origin of "CRF" has associated page numbers, whether an origin of "Derived" has a derivation, and whether the XML code is properly formed.

    This webinar focused on items that can only be checked manually, such as how easy it is to read the comments and derivations, or removing raw data references from the define, etc.
     

  47. How do the define expectations differ between SDTM and ADaM?

    The main purpose of the Define.XML is to describe the data, so in that regard there really is no difference in the expectations of an SDTM Define.XML versus ADaM.

    There are slight differences in the defines when comparing SDTM to ADaM.

    The origins of "CRF" and "eDT" are not valid in ADaM, while the origin of "Predecessor" is only valid for ADaM.

    Derivations need to be fully described, no matter if the data is SDTM or ADaM. In general there are more derivations in ADaM data and the derivations are also more complex. This led to the creation of the Analysis Results Metadata Specification Version 1.0 for Define-XML Version 2, prepared by the CDISC ADaM Metadata Sub-Team. This additional specification is only for ADaM; however, at its core it is simply a way to better describe complex derivations and does not add an expectation for an ADaM Define.XML.
     

  48. Does the programming code-free standard in the derivation sections apply to the ADaM define as well?

    There are a lot of people who will review your Define.XML in the future and many of these people will not have any programming experience. Hence I like to provide pseudo code, even in ADaM defines.

    If you feel the need to include your exact code, then there is the "FormalExpression" option available.

    Personally I have never seen this used in the real world.

    Please take a look at the Define-XML Specification v2.0. Section 5.3.13.1 (page 89) gives the definition of FormalExpression, while section 4.6.1.4 (page 45) shows an example using FormalExpression.
     

  49. For VSCAT, on the CRF page, “Body measurements” and “Seated Vital Signs” are pre-printed. On the SDTM dataset, values for VSCAT are “BODY MEASUREMENTS” and “HEART RATE AND BLOOD PRESSURE”. According to the SDTM-IG 3.2, section 4.1.1.8.1 “Origin Metadata for Variables”: “An origin of “CRF” includes information that is preprinted on the CRF”. So VSCAT should have an Origin of “CRF”, but for VSCAT = HEART RATE AND BLOOD PRESSURE, it does not correspond to the pre-printed text. My understanding of the SDTM-IG is that the Origin for VSCAT should instead be set to “Assigned”. That’s why I use the annotation VSCAT = XXXX on the aCRF. Is that OK?

    I think your approach is sufficient and in line with what I would do.

    If your pre-printed CRF text exactly matched your VSCAT values and you still had VSCAT as Assigned, I doubt anyone would notice it or question it, and it wouldn't matter at all.

    I like to annotate my xxCAT/xxSCAT values in the same way you have, because I think it makes reviewing the data and CRF easier.
     

  50. What should have been provided for the MedDRA variables instead of just the SDTM variable references?

    Nothing. The MedDRA variables are fairly self explanatory. The Define.XML should list them as Assigned and should clearly state what version of the MedDRA dictionary they have been coded to.

    Generally, there is nothing more to document for these variables.

    Personally I think that having a comment of "AE.AELLT" for the AE.AELLT variable does not add any value to the Define.XML. If anything, comments like this take time to make, would need to be QC'ed and still might come out with typos.
     

  51. Is it the Define.XML itself that does not support, for example, carriage returns, or is it the standard CDISC stylesheet?

    At its core, the Define.XML is a machine-readable file. By applying a stylesheet we can make the Define.XML human-readable.

    Since this is a machine-readable document, there is no formatting.

    By modifying the stylesheet, there might be a way to get your carriage returns to be displayed; however, I have not seen anyone do this, and the preferred approach is to keep your formatted text in a separate PDF document.
     

  52. Could you give a brief outline of the difference between an assigned variable and a derived variable please? If we set a value "xx" to an SDTM variable based on the value of another, is that an assignment or derivation?

    Section 4.1.1.8.1 in the SDTM-IG v3.2 gives the following definitions for derived and assigned origins:

    Derived: Derived data are not directly collected on the CRF but are calculated by an algorithm or reproducible rule, which is dependent upon other data values. This algorithm is applied across all values and may reference other SDTM datasets. The derivation is assumed to be performed by the Sponsor. This does not apply to derived lab test results performed directly by labs (or by devices).

    Examples illustrating the distinction between collected and derived values include the following:

    • A value derived by an eCRF system from other entered fields has an origin of "Derived," since the sponsor controls the derivation.

    • A value derived from collected data by the sponsor, or a CRO working on their behalf, has an origin of "Derived."

    • A value derived by an investigator and written/entered on a CRF has an origin of "CRF" (along with a reference) rather than “derived”.

    • A value derived by a vendor (e.g., a central lab) according to their procedures is considered collected rather than derived, and would have an origin of “eDT”.

    Assigned: A value that is determined by individual judgment (by an evaluator other than the subject or investigator), rather than collected as part of the CRF or derived based on an algorithm. This may include third party attributions by an adjudicator. Coded terms that are supplied as part of a coding process (as in --DECOD) are considered to have an Origin of “Assigned”. Values that are set independently of any subject-related data values in order to complete SDTM fields such as DOMAIN and --TESTCD are considered to have an Origin of “Assigned”.

    There are cases where it is not exactly clear (some grey area) if an origin is Assigned or Derived. Without knowing the details of your variables, we cannot say for sure which is more appropriate. If it is not clear, then perhaps using Derived, and listing a clear computational method (detailing how you are setting a value "xx" based on the value of another variable), would be the most useful way (to someone reviewing the Define.XML) to handle the situation.
     

  53. If SDTM variables are carried to ADaM, does the Define.XML for ADaM need to repeat the codelists for the SDTM variables?

    You may not need to create codelists for these SDTM variables in your ADaM Define.XML; however, doing so is not wrong and in many cases (depending on your data) might be necessary to help facilitate reviewing the data.

    If you have a completed SDTM Define.XML available, then it should be easy to copy-paste some of the relevant codelists into your ADaM Define.XML.
     

  54. Regarding codelists, do you recommend making a separate codelist for each individual SDTM domain abbreviation?

    It may not be necessary to repeat these SDTM codelists in your ADaM define, however it cannot hurt.

    The one thing to avoid in your ADaM Define.XML is codelists for SDTM variables that were NOT carried over to ADaM. Including them greatly increases the size of the Define.XML and makes it harder to decipher what is relevant to the ADaM data and what is not.
     

  55. Ongoing debate within our group. It appears that on occasion data management would create Concomitant Medication CRFs with almost any and all units that could ever be possible for medications. Sometimes this list can be on the order of 60-75 units. Would you still advocate that the codelist contain all 75 potential values? Even if, given study design, etc., some units would never be selected?

    This is a good question. I would create the codelist with all 75 possible values.

    The problem with this approach is that there are no validation rules that can automatically flag when there is a CRF value NOT in the codelist. There are rules to check that all the data values are present in the codelist.

    If data management can standardize their values better, so the same 75 units are present in the majority of studies, then this would cut down on QC time.
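
    Since no validation rule compares the CRF to the codelist, that comparison has to be done by hand or with a small script. The sketch below is one possible approach, not a P21 feature: the function name and the unit values are hypothetical, and it assumes you can list the planned CRF values yourself.

```python
def codelist_gaps(codelist_terms, crf_planned_values, data_values):
    """Return (CRF values missing from the codelist, data values missing from the codelist)."""
    codelist = set(codelist_terms)
    # Planned CRF options that never made it into the Define.XML codelist.
    crf_not_in_codelist = sorted(set(crf_planned_values) - codelist)
    # Collected values (ignoring nulls/blanks) that are absent from the codelist.
    data_not_in_codelist = sorted({v for v in data_values if v} - codelist)
    return crf_not_in_codelist, data_not_in_codelist


# Hypothetical CMDOSU example: the CRF offers TABLET, but the codelist omits it.
codelist = ["mg", "g", "mL"]
crf_values = ["mg", "g", "mL", "TABLET"]
collected = ["mg", "mg", "mL", ""]

crf_gap, data_gap = codelist_gaps(codelist, crf_values, collected)
print("On CRF but not in codelist:", crf_gap)
print("In data but not in codelist:", data_gap)
```

    In this example the data check alone would pass, because nobody happened to select TABLET; only the CRF comparison reveals the missing codelist term.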
     

  56. Regarding the codelist: In the codelist, ALL values that are 'planned' (Available on the CRF) are to be listed, correct? Not just the values that actually occurred with the patients? (example with Race - Asian listed as well even if it didn't occur, right?)

    Yes, the codelist should show the planned data collection, not just the values in the data.

    Technically the Define.XML could (some say "should") be created after the data collection design has been finalized but before the study starts.
     

  57. Can we have both Comment and Method specified for the same variable?
    Yes, it is fine to include a comment, in addition to a computational method, for a derived variable, if that is appropriate to provide clarity for a variable in the Define.XML.
     
  58. What derivations should be provided in the computational method section, and what derivations should be provided as complex algorithms that go into an external PDF?

    The external PDF should only be used on very long and complex derivations.

    The only advantage of the external PDF is to use formatting like bullet points, new lines, numbered lists, etc. in order to make the derivation easier to read for humans.

    Most of the time the external PDF document is not needed. When it is needed, you probably need fewer than 10 derivations in the external PDF.
     

  59. About the UNIT codelist, are there extra P21 Warnings that are generated when creating a custom UNIT codelist instead of using just the one UNIT codelist across the submission? I prefer to create sep UNIT codelists for the domains, but have gotten pushback about it.

    There are no rules to enforce separate codelists for separate UNIT variables.

    There could quite easily be two UNIT variables with the exact same values, e.g. VSORRESU and VSSTRESU often have the exact same distinct values and therefore it is perfectly acceptable to assign the one UNIT codelist to both VSORRESU and VSSTRESU.

    If you are experiencing pushback when you try to create separate codelists, I would ask whoever disagrees with you what they think the benefits are of having the one UNIT codelist. The exposure and laboratory example is the best to use. Get a distinct count of the units in labs and a distinct count of the units in exposure, then ask what the benefit would be of assigning a single UNIT codelist to EXDOSU when almost all the values in that codelist have nothing to do with EXDOSU.
     

  60. How are multiple codelists handled. For example DSDECOD

    At the variable level, I would create a codelist to capture all the DSDECOD values.

    Then at the value level I would create two codelists, one with values from the "Protocol Milestone" CDISC codelist and the other with values from the "Completion/Reason for Non-Completion" codelist. The where clauses for the value level metadata would most likely reference the category (DSCAT) variable.
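
    As a rough illustration of that setup, the sketch below models the two value level codelists keyed by DSCAT. The term sets are small illustrative subsets, and the function and constant names are hypothetical; a real Define.XML would list the terms used in your study.

```python
# Illustrative subsets of the two CDISC codelists (not the full terminology).
PROTOCOL_MILESTONE = {"INFORMED CONSENT OBTAINED", "RANDOMIZED"}
NON_COMPLETION = {"COMPLETED", "ADVERSE EVENT", "WITHDRAWAL BY SUBJECT"}

# Value level: each where clause "DSCAT EQ <category>" points at one codelist.
VALUE_LEVEL_CODELISTS = {
    "PROTOCOL MILESTONE": PROTOCOL_MILESTONE,
    "DISPOSITION EVENT": NON_COMPLETION,
}

# Variable level: one codelist capturing every DSDECOD value.
VARIABLE_LEVEL_CODELIST = PROTOCOL_MILESTONE | NON_COMPLETION


def dsdecod_is_valid(dscat, dsdecod):
    """Check DSDECOD against the value level codelist selected by DSCAT.

    Falls back to the variable level codelist when no where clause matches.
    """
    allowed = VALUE_LEVEL_CODELISTS.get(dscat, VARIABLE_LEVEL_CODELIST)
    return dsdecod in allowed


print(dsdecod_is_valid("PROTOCOL MILESTONE", "RANDOMIZED"))  # matches its category
print(dsdecod_is_valid("DISPOSITION EVENT", "RANDOMIZED"))   # wrong category
```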
     

  61. how to handle ny response error message? is it p21 updated?

    I believe you are referring to: INVALID TERM IN CODELIST 'NO YES RESPONSE (YES ONLY)' CODELIST (DD0024)

    This validation rule fires when a variable should only have a value of ‘Y’ or null, per CDISC implementation guidance, but in the Define.XML that variable references a codelist that contains other values.

    A common reason for this issue is that a sponsor will create one No/Yes codelist and have many variables reference it, regardless of whether all of its values apply to each variable.

    An example of this is the Subject Death Flag (DTHFL) variable in the Demographics domain. This variable, per CDISC guidance, should be ‘Y’ or null. However, it is common for sponsors to reference a No/Yes codelist in the Define.XML for this variable that contains values of ‘N’, ‘U’, etc. By referencing a codelist with these other values, it becomes unclear whether the sponsor is using values that aren’t allowed.

    The solution to this issue is, for variables where only values of ‘Y’ are allowed, to reference a separate codelist with only this value.
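
    A quick pre-check along these lines can catch the problem before validation. This is only a sketch of the idea, not the DD0024 implementation: the set of "Y only" variables below is an assumption for illustration, and the function name is hypothetical.

```python
# Assumption: variables that may only be 'Y' or null. Extend for your study.
YES_ONLY_VARIABLES = {"DTHFL"}


def yes_only_violations(variable_codelists):
    """Return {variable: extra terms} for 'Y only' variables whose codelist has other values."""
    problems = {}
    for variable, terms in variable_codelists.items():
        if variable in YES_ONLY_VARIABLES:
            extra = sorted(set(terms) - {"Y"})
            if extra:
                problems[variable] = extra
    return problems


# DTHFL shares the general No/Yes codelist, so it would be flagged.
shared = {"DTHFL": ["N", "U", "Y"], "AESER": ["N", "Y"]}
print(yes_only_violations(shared))

# After pointing DTHFL at a separate 'Y'-only codelist, nothing is flagged.
fixed = {"DTHFL": ["Y"], "AESER": ["N", "Y"]}
print(yes_only_violations(fixed))
```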
     

  62. how to handle pkunit error message? is p21 updated?
    The SDTM Implementation Guide lists the UNIT codelist for the PCORRESU/PCSTRESU variables, therefore that is what we validate against. When the SDTM Implementation Guide is updated to show that the PCORRESU/PCSTRESU variables should use the PKUNIT codelist, we will validate these variables against that codelist.
     
  63. Instead of using separate file (supplemental data definitions), can extra long derivations just be put in the reviewers guide?
    There is no set guidance on where these long and complex derivations should be documented, so yes, putting them in the Reviewer's Guide would also be valid.
     
  64. How often do you see sponsors submitting separate documents to explain complex algorithms other than SDRG and/or ADRG documents

    Only some studies will need a complex algorithms document and even when one is used, there probably only needs to be a handful of algorithms in the external document.

    Having said that, in reality we don't see enough complex algorithms documents. At a guess, maybe 1 in 5 studies need a complex algorithms document but in reality we see 1 in 20 studies with a complex algorithms document.
     

  65. If a variable has more than one origins eg (CRF and Derived) what should go in to the origin column?
    If you have multiple origins for a variable, then at the variable level the origin must be left null and the origins must be declared in the value level metadata.
     
  66. Should we use one codelist for the variables (e.g., QSCAT) in split domains, or should we use separate codelists?

    I don't think there is a definitive right or wrong way to do this. It mainly depends on how you want to set up your Define.XML file.

    Will your define show the entire QS domain, just the smaller split QS domains, or both the entire QS domain plus the split domains?

    The CDISC Define.XML v2.0 SDTM example shows just the split domains, so in that instance I would just create QSCAT codelists for each split dataset.

    If the main QS domain was shown in the define, then a codelist for all QSCAT values would be appropriate.

    If your questionnaires have different origins, then you could possibly use the QSCAT values in your value level metadata where clauses.
     

  67. Does CDISC intend to provide firm annotation rules?

    CDISC is currently working on version 2.0 of the Metadata Submission Guidelines.

    This will give more guidance on annotations. Sorry I do not know if CDISC have forecasted a completion date for this.
     

  68. when we leave origin to null for multi-origin variables in the variable tab, do you suggest to leave a comment to guide reviewers to the VLM to see the origins?

    You could use a comment. It will not hurt anything, but it is not strictly necessary.

    The variable will have a hyperlink to take people to the value level metadata, where they will see the multiple origins.
     

  69. Do you usually use FREQ codelist for CMDOSFRQ. This is often a messy field. We code them as much as possible.  Is that a good way to do it. 

    Unfortunately, that's exactly how I would do it.

    Go through and convert as much as you can to CDISC CT, add whatever doesn't match as extended codelist values in your Define.XML codelist (to avoid SD0037), and lastly add an explanation to your Reviewer's Guide, explaining why CT2002 is firing for CMDOSFRQ.
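
    The first two steps could be scripted roughly as follows. This is a minimal sketch under stated assumptions: FREQ_CT holds only a handful of terms from the CDISC FREQ codelist, and the synonym map and function name are hypothetical and would need to reflect what actually appears in your data.

```python
# A small subset of the CDISC FREQ codelist, for illustration only.
FREQ_CT = {"QD", "BID", "TID", "QID", "PRN", "ONCE"}

# Hypothetical mapping from free-text entries to CT terms.
SYNONYMS = {"ONCE DAILY": "QD", "TWICE DAILY": "BID", "AS NEEDED": "PRN"}


def code_frequencies(raw_values):
    """Map raw CMDOSFRQ text to CT where possible; collect the rest as extended values."""
    coded = {}
    extended = set()
    for raw in raw_values:
        cleaned = " ".join(raw.upper().split())  # uppercase, collapse whitespace
        term = SYNONYMS.get(cleaned, cleaned)
        coded[raw] = term
        if term not in FREQ_CT:
            extended.add(term)  # candidate extended codelist value
    return coded, sorted(extended)


coded, extended = code_frequencies(["twice daily", "BID", "every other  Tuesday"])
print(coded)
print("Extended values to add to the codelist:", extended)
```

    Anything left in the extended list still needs a human decision, plus the explanation in the Reviewer's Guide.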
     

  70. We have an oncology study with > 10 amendments. We put the first 200 characters of the inc/exc in the test code in IE and TI. Then, we include the complete inc/exc criteria in a PDF that is linked to in the Define. Is this a good approach?

    Yes, that approach is acceptable and it's actually my preferred approach.

    I find that if you abbreviate words and sentences to try to get the text below 200 characters, there's way too much back and forth on which words to include or exclude, as nobody can come to an agreement.

    Personally I don't think there's any harm in truncating at 200 characters, it's quick, and as you said you are providing the full text in PDF. Plus the protocol should have the full criteria.

    Generally people can also easily tell if a sentence has been truncated and this might prompt them to look in your Reviewer's Guide for the full text. Easily spotting abbreviated text is a little harder and the end user might not get the full meaning if they do not review the full text in the Reviewer's Guide / protocol.
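
    The truncation itself is trivial to automate, which is part of its appeal. A minimal sketch, assuming a plain cut at 200 characters after collapsing whitespace (the function name and sample text are made up for illustration):

```python
MAX_LENGTH = 200  # character limit discussed above


def truncate_criterion(full_text, limit=MAX_LENGTH):
    """Collapse whitespace, then keep at most `limit` characters of the criterion text."""
    text = " ".join(full_text.split())
    return text if len(text) <= limit else text[:limit]


# Made-up criterion text that runs well past the limit.
long_criterion = "Subject has adequate organ function as defined by " + "laboratory values " * 20
short_criterion = "Age >= 18 years"

print(len(truncate_criterion(long_criterion)))
print(truncate_criterion(short_criterion))
```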
     

  71. if the aCRF has Visit page, should you annotate SVSTDTC/SVENDTC even if these are derived and what should the origin be

    I would probably avoid annotating SVSTDTC/SVENDTC.

    I'm guessing that this Visit page is mainly used to populate dates like VSDTC for assessments captured on the following CRF pages at the same visit. If this is the case, then I would annotate VSDTC and any other date variables populated from this page. Lastly I would set SVSTDTC and SVENDTC with origins of Derived in the Define.XML.
     

  72. What are the main differences between the SDTM Define and the ADaM Define? Many variables are copied from SDTM; do we need to reference the CRF pages again in the ADaM Define?

    The basic principles behind the Define.XML are the same for SEND, SDTM and ADaM.

    The main difference between SDTM and ADaM Define.XML's is what you mentioned with the origins.

    For SDTM, every study has origin values of CRF and/or eDT. The SDTM data should then be used as the input to creating the ADaM data and because of this, it is not valid in an ADaM Define.XML to have origins of CRF and eDT. Instead for ADaM define files there is the "Predecessor" origin, to identify the SDTM variable where this data originated from.
     

  73. Can P21 scan an aCRF to make connections between aCRF and Define.XML? If so, is there any reference on how to encode annotations in any particular way to facilitate this connection?

    Yes, P21 Enterprise will read in your aCRF and make any Define.XML page origin corrections.

    We have not encountered any particular formatting issues.
     

  74. For dual origin fields, what is the recommended extent of value level needed (just ORRES)? Or do we generate value meta for every variable of dual nature (DTC, QSTESTCD, etc.)?

    I would show the multiple origins for all variables involved.

    Using QS as an example, there really aren't many variables that will have multiple origins, probably only QSTEST, QSORRES, QSDTC and maybe VISIT.

    I'd try and use QSCAT in my Where Clauses, to avoid having to list out all the QSTESTCD values which apply to each case.
     

  75. Back to the Complex Algorithms topic. Does the link to the document need to point to the specific section within the document that explains a particular variable?

    It would definitely be better to have the link take you to the right spot, but this is not required.

    An SDTM Define.XML should already have some hyperlinks that take you to specific pages in the aCRF, so you should be able to use the same method or same code to create good hyperlinks to your complex algorithms document.
     

  76. Is there a limit on the characters allowed in the derivation cell? or is it okay to add >700 characters as long as it's readable?

    XML and ODM do not specify a length restriction. However the FDA's current system only supports a maximum of 1000 characters.

    The Define.XML rule DD0086 flags Define.XML's with elements longer than 1000 characters.
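
    If you want to catch overlong text before running validation, a simple length check is enough. This sketch is not how DD0086 is implemented; the function name and the sample method OIDs are hypothetical.

```python
MAX_LENGTH = 1000  # maximum length mentioned above


def overlong_texts(texts, limit=MAX_LENGTH):
    """Return {name: length} for every text longer than `limit` characters."""
    return {name: len(text) for name, text in texts.items() if len(text) > limit}


# Hypothetical method descriptions keyed by method OID.
methods = {
    "MT.AGE": "Integer years between BRTHDTC and RFSTDTC.",
    "MT.COMPLEX": "Step 1: ... " * 100,  # 1200 characters
}
print(overlong_texts(methods))
```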
     

  77. Can we create a separate doc for algos? I mean can it be part of SDRG?
    There is no set guidance on where these long and complex derivations should be documented, so yes, putting them in the Reviewer's Guide would also be valid.
     
  78. if long derivation is directed with a link to complex algorithm document, should we limit our wording of derivation in source/comment?
    I like to keep the full derivation in the Define.XML, even if it looks messy and is hard to understand. I do this purely so that if anyone is doing some automated tasks pulling information out of the Define.XML, then they can still access the full derivation.
     
  79. Can you show an example of how the value level would look for multiple origins. Can you also clarify what origin you put at the variable level? Sorry. Did you say to leave the origin null at the variable level for those with multiple origins and add the multiple origins with a method at the value level?
    If you have multiple origins for a variable, then at the variable level the origin must be left null and the origins must be declared in the value level metadata.
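
    To make the idea concrete, here is a rough model of it in Python rather than in Define.XML syntax. Everything in it is a hypothetical example: the class names, the choice of LBORRES, and the LBCAT-based where clauses are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ValueLevelItem:
    where_variable: str   # variable tested by the where clause, e.g. LBCAT
    where_values: tuple   # values for which this item applies
    origin: str           # origin declared at the value level


@dataclass
class Variable:
    name: str
    origin: Optional[str] = None              # null when origins differ by value
    value_level: list = field(default_factory=list)

    def origin_for(self, record):
        """Return the origin that applies to one data record."""
        if self.origin is not None:
            return self.origin
        for item in self.value_level:
            if record.get(item.where_variable) in item.where_values:
                return item.origin
        return None


# Variable level origin is left null; value level metadata carries the origins.
lborres = Variable(
    name="LBORRES",
    origin=None,
    value_level=[
        ValueLevelItem("LBCAT", ("CHEMISTRY", "HEMATOLOGY"), "eDT"),
        ValueLevelItem("LBCAT", ("URINALYSIS",), "CRF"),
    ],
)

print(lborres.origin_for({"LBCAT": "CHEMISTRY"}))
print(lborres.origin_for({"LBCAT": "URINALYSIS"}))
```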
     
  80. What is the standard for including reference data, i.e. data that is used in SDTM derivation? Should it be sent as an xpt file and referred to in the RG, or should it be linked as a PDF file in the Define.XML itself? For example, lab details or ranges from a spreadsheet that is used to create the LB domain.

    Lab ranges are essentially assigned. Even if the ranges are provided with your centralized lab data, someone had to assign those range values when they were creating the lab data.

    It isn't required to provide your spreadsheet with the tests and ranges, however it would be fairly easy to provide.

    I'd put these ranges into a PDF and link to it from the Define.XML, or you could put the ranges as an appendix in the SDRG.
     

  81. Should sas programs be part of the submission?

    Define.XML v2.0 does provide a way to link the Define.XML to an external file containing code.

    CDISC have provided an example on how to perform this. Please look at the Define-XML-2-0-Specification, page 45, section 4.6.1.3. Example of Method Definition with Programming Code Reference

    CDISC also released the standards Analysis Results Metadata v1.0 for Define-XML v2. This is specifically to facilitate explaining complex derivations in ADaM data and details how to submit code.
     

  82. Is our understanding correct that if the variable such as AE.AELLT is going to AELLT you do not need to present it in the define (comments)

    Yes, there's really no need to document that, since it's redundant information.

    Additionally, if AE.AELLT is your raw dataset and raw variable names, then technically documenting this in the Define.XML is incorrect since they are raw data references.
     

  83. May I ask whether it is acceptable, in one codelist, to leave some decode values blank while others are populated, since some of the terms are not in SDTM CT?
    Per the Define.XML rules, this is not acceptable and will result in OD0082 firing: "When Codelist contains at least one item with Term and Decoded Value (CodeListItem), then all other items must be of the same type."
     
  84. If avisit is derived would avisitn be assigned or derived. I think it should be derived as it is part of the avisit but I have seen it done both ways.

    This is a bit of a grey area and I too have seen AVISITN with origins of assigned and derived.

    Sometimes I see the derivation listed for AVISITN as a CASE statement based on the AVISIT values.

    AVISIT and AVISITN are paired variables. If AVISIT has a clear derivation, then I don't think it really matters whether AVISITN has an origin of derived or assigned.
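
    The CASE-statement style of derivation mentioned above amounts to a one-to-one lookup from AVISIT to AVISITN. A minimal sketch, with made-up visit names and numbers:

```python
# Hypothetical AVISIT -> AVISITN pairing; use your study's analysis visits.
AVISIT_TO_AVISITN = {
    "Baseline": 0,
    "Week 2": 2,
    "Week 4": 4,
}


def derive_avisitn(avisit):
    """Return the AVISITN paired with AVISIT, or None when AVISIT is missing or unmapped."""
    return AVISIT_TO_AVISITN.get(avisit)


print(derive_avisitn("Week 2"))
print(derive_avisitn(None))
```

    Because the mapping is fixed once AVISIT is known, either origin tells the reviewer much the same thing.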
     
