Wednesday 18 January 2017

Nursing NANDA Diagnosis

[music playing] >> Mary Engler: Well, welcome back from break. After such an incredible morning with such great speakers, I'm delighted to introduce our next speaker, Dr. Bonnie Westra, who'll be presenting big data analytics for healthcare. Dr. Westra is director of the Center for Nursing Informatics and associate professor in the School of Nursing at the University of Minnesota.

She works to improve the exchange and use of electronic health data. Her important work aims to help older adults remain in their communities and live healthy lives. Dr. Westra is committed to using nursing and health data to support improved patient outcomes, as well as to developing the next generation of nurse informaticists -- informatistatcians. [laughter]

Okay. Please join me in a warm welcome for Dr. Westra. [applause] >> Bonnie Westra: Is it potato or potato [laughs]? So, I am just absolutely thrilled to be here, and this is an amazing audience. It's grown since last year, so this is great. Today what I'd like to do is relate the importance of big data in healthcare to what we're talking about today; identify some of the critical steps to make data useful -- when you think of electronic health record data, or secondary use of existing data, there is a lot that has to be done to make it usable for purposes of research; look at some of the principles of big data analytics; and then talk about some examples of the science, and you'll hear a lot more about that, in more depth, during the week.

So, when we think about big data science, it's really the application of mathematical algorithms to large data sets to infer probabilities for prediction. That's the very simple definition. You'll hear a number of other definitions as you go through the week as well. And the purpose is really to find novel patterns in data to enable data-driven decisions.
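That simple definition -- algorithms over large data sets to infer probabilities for prediction -- can be sketched in a few lines. This is a toy illustration, not anything from the talk: the records, field meanings, and counts below are invented.

```python
# Hypothetical, tiny records: (has_fever, elevated_lactate, developed_sepsis).
# A real data set would have millions of rows; the idea is the same.
records = [
    (True, True, True), (True, True, True), (True, False, False),
    (False, True, True), (False, False, False), (False, False, False),
    (True, True, False), (False, True, False),
]

def p_sepsis_given(fever, lactate):
    """Estimate P(sepsis | fever, lactate) by counting matching records."""
    matching = [r for r in records if r[0] == fever and r[1] == lactate]
    if not matching:
        return None  # no evidence for this combination
    return sum(r[2] for r in matching) / len(matching)

print(p_sepsis_given(True, True))  # 2 of the 3 matching records -> about 0.67
```

With enough rows, counting like this becomes the kind of probability estimate that prediction models are built on; real analytics replaces the counting with statistical or machine-learning algorithms.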

I think as we continue to progress with big data science, we won't only find novel patterns; in fact, we'll be able to do much more in terms of testing hypotheses. One of my students was at a big data conference that Mayo in Minnesota was putting on, and one of the things they're starting to do now is replicate clinical trials using big data, and in some cases they're able to come up with results that are 95 percent similar to having done the clinical trials themselves. So we're going to see a real shift in the use of big data in the future.

When I think about big data analytics, what this picture is really portraying is that big data analytics exists on a continuum of clinical translational science from T1 to T4, where there's foundational work that needs to be done, but we actually need to apply the results in clinical practice, and to learn from clinical practice so that it then informs foundational science again. When you look at the middle of this picture, what it's really showing is what nursing is about. If you look at the ANA's scope and standards of practice and the social policy statement, nursing is really about protecting and promoting health, and then alleviating suffering.

So when we think about big data science in nursing, that's really our area of expertise. And what you see at the bottom of this graph is that we don't lack data; we lack information and knowledge. So it's really about how we transform data into information, into knowledge, and then the wise use of that information within practice itself. We were doing a conference back in Minnesota on big data, and I happened to run into this graphic -- it's like, how fast is data growing nowadays? What you can see is that data flows so fast that the total accumulation in the past two years is a zettabyte. And I'm like, "well, what is a zettabyte?"

A zettabyte is a one with 21 zeroes after it. And what you can see is that the amount of data we've accumulated in the last two years equals all the total information in the last century. So the rate of growth of data is getting to be huge. Data by itself, though, isn't sufficient. It really needs to be transformed into information and knowledge.

Well, when we think about healthcare, what we can see is that the definition says big data is a large volume -- but it might not be large volume. When you think about genomics, sometimes it's not a large volume, but it's very complex data. And as we think about getting beyond genomics and where we're at, it's really looking at all the variety of data sources, and it's the integration of multiple datasets that we're really running into now.

And it's data that accumulates over time, so it's ever changing, and the speed of it is ever changing. What you can see in the right-hand corner here is that, as we think about the new health sciences and data sources, genomics is a really critical piece, but there's also the electronic health record, patient portals, social media, the drug research test results, all the monitoring and sensor technology, and, more recently, geocoding. As we think about geocoding, it's really the ability to pinpoint the latitude and longitude of where patients exist. It's a more precise way of looking at the geographical setting in which patients exist, and there's a lot of secondary data around geocodes that can give us background information about neighborhoods, including such things as financial class and education.

Now, it doesn't mean that it always applies to me, because I might be an odd person in a neighborhood, but it gives us background information that we may not be able to get from other resources. So, big data is really about volume, velocity, and veracity, as Dr. Grady pointed out earlier today. Now, as we think about big data: 10 years ago, when I went to the University of Minnesota, my dean, Connie Delaney [phonetic sp], had talked about doing data mining, and I thought, "oh, that sounds really interesting," because I was in the software business before, and our whole goal was to collect data in a standardized way that could be reused for purposes of research and quality improvement. I just didn't know what to do with it once I got it. And so I've had the fortune to work with data miners.

We have a large computer science department that is internationally known for its data mining, and a lot of that work was funded primarily by the National Science Foundation at that time, because it was really about methodologies. Well, now we're starting to see big data science funded much more in the mainstream; NIH, the CTSAs, et cetera, are all working on how we fund the knowledge, the new methodologies, that we need in terms of big data science.

So, an example of some of the big data science that is already funded today: look at our CTSAs. There are 61-plus CTSAs -- Clinical and Translational Science Awards -- across the country, and the goal is to share methodologies, to have clinical data repositories and clinical data warehouses, and then to begin to say, "how do we do some research that goes across these CTSAs? How do we collaborate together?" Or look at PCORnet. PCORnet is another example. There are 11 clinical data research networks -- this may have increased by now -- as well as 18 patient-powered research networks. We happen to participate in one that has 10 different academic healthcare systems working together, and it means that for our data warehouse we have to have a common data model, with common data standards and common data queries, in order to look at research -- we're looking at ALS, obesity, and breast cancer. And wouldn't it be nice if we could look at some of the signs and symptoms that nurses are interested in, in addition to specific kinds of diseases?

When we look at some of the work that Optum Health, as well as other insurance companies, is doing, they're really beginning to amass large datasets. Optum Labs happens to have 140 million lives from claims data, and they're adding in 40 million lives from electronic health records, so that provides really large data sets for us to ask questions in ways that we haven't been able to before. I'm excited about reuse of existing data, and hopefully some of that enthusiasm will rub off on you today, because it's really a great opportunity.

Now, in order to use large data sources, we need a common data model, we need standardized coding of data, and we need standardized queries. What I mean by that is, if we don't ask about the same variables, and we don't collect or code the data in the same ways, it makes it hard for us to do comparisons across software vendors or health systems or academic institutions. With the PCORI grant, for instance, we're actually looking at how we do common queries, so that if we've got the common models, we can write a query and share it with others, to pull data out from multiple health systems in a similar way. So I'm going to talk about what I mean by that a little bit more and show you examples of how we have to be thinking about this in nursing, as well as thinking interprofessionally.

When you look at PCORnet, they started with common data model version one, then they went to version two, and now version three is being worked on at this time. You can see in the top left-hand corner we have conditions, which might be patient-reported conditions as well as healthcare provider conditions, but you can also see, down in the left-hand corner, that there are also diagnoses. Diagnoses are the ICD-9 coding that goes with it; ICD-10 is now unfolding. Notice, when you think about your science: where is the data that you want for your science, and is it represented in this common data model? I would suggest that there are many types of data in the common data model that are important to all of us as we think about where we're going, whether it's demographics or medications or, you know, what kinds of diseases people have.

And there's also something missing as we move forward. Before I get to what's missing, one thing I want to point out that's critical is that in order for PCORI or NCATS or any of these other organizations to do queries across multiple institutions, they have to have data standards. When we look at demographics, for instance, OMB is the standard that we use. When we look at medications, it's RxNorm. Laboratory is coded with LOINC. Procedures are coded with CPT, HCPCS, or ICD-9/ICD-10 codes. We also have diagnoses that have ICD-9/ICD-10 but, in addition, SNOMED CT codes or another type of standard. And when we look at vital status, we're looking at the CDC standard for vital status, and for vital signs they're using LOINC.
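Those pairings of data domain and code system can be restated as a small lookup table. This only restates the standards named above; the variable and function names are invented for illustration.

```python
# Code systems named in the talk for each common-data-model domain.
DOMAIN_STANDARDS = {
    "demographics": ["OMB"],
    "medications": ["RxNorm"],
    "laboratory": ["LOINC"],
    "procedures": ["CPT", "HCPCS", "ICD-9", "ICD-10"],
    "diagnoses": ["ICD-9", "ICD-10", "SNOMED CT"],
    "vital_status": ["CDC"],
    "vital_signs": ["LOINC"],
}

def standards_for(domain):
    """Return the expected code system(s) for a data domain, or None if unknown."""
    return DOMAIN_STANDARDS.get(domain.lower())

print(standards_for("medications"))  # ['RxNorm']
```

The point of a table like this is the one she makes: unless every participating site codes each domain with the same system, cross-institution queries can't line the data up.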

So, LOINC started as laboratory data. It's expanded to include types of documents. It also has expanded now to include a lot of clinical assessments. You're going to find the MDS used in nursing homes, OASIS that's used in homecare, you'll see things like the Braden or the Morse fall scales, and we're expanding more types of assessments that are important to nurses in the LOINC coding.

It also, by the way, includes the Nursing Management Minimum Data Set; the announcement just came out this week that we've finished updating the variables, and they've been coded in LOINC. So if you wanted to look at the work of Linda Aiken, for instance, you'd find standard codes that can be used across multiple settings. So, our vision of what we want to see in clinical data repositories that are critical for nurses is that when we look at clinical data, we need to expand it to include the nursing management minimum data -- the Nursing Management Minimum Data Set. What that means is we need to look at nursing diagnoses, nursing interventions, nursing outcomes, and acuity, and we also have to take a look at a national identifier for nurses. By the way, every registered nurse can apply for an NPI -- the National Provider Identifier -- so that we could track nurses across settings, just like we do the physicians or the advanced nurse practitioners; it's available for any RN to apply for.

So, when we extend what data's available -- if we added in what interventions nurses do, and what additional kinds of assessments nurses do -- that data is really critical for us to be able to do big data science. What you can also see is that there's management data -- oftentimes we think of that as claims data -- but management data needs to go beyond that when we start talking about standardized units. Like, if I see a patient in an ICU, does it matter? And how do we even name ICUs? Or psychiatric units?

At Mayo, we used to call it Three Mary Brigh. Well, how generalizable is that? So there are ways to generalize the naming of units, and that actually builds off of the NDNQI database. And then, when we look at the workforce in nursing, Linda Aiken's work, I think, is just stellar in terms of really trying to understand what we know about nurses, because those things affect patients' outcomes, and they also affect our nursing workforce outcomes as well. So our clinical data repositories need to expand to include additional data that's sensitive to nurses and nursing practice, and they also need to go across the continuum of care.

Now, at the University of Minnesota, we have a CTSA award, and our partner is Fairview Health Services. You can see here that as we built our clinical data repository, we have a variety of different kinds of data about patients and about encounters available to reuse for purposes of research. You can bet that the students I have in the doctoral program are all being trained to be big data researchers. It's like, "stick with me, kid, because this is the way we're going." So they use this, but they also use some of the tumor registries or transplant registries as other data sources as well. And this data's available then for cohort discovery, recruitment, observational studies, and predictive analytics.

Now, when you look at what's actually in there and we characterize that data, we basically have over 2 million patients just in this one data repository, and we have about 4 billion rows of unique data. What's important to take a look at is: what is the biggest piece of the pie here? It's flowsheet data. And what is flowsheet data? >> Female Speaker: [inaudible] >> Bonnie Westra: Yeah, it's primarily nursing data, but it's also interprofessional -- so PT, OT, speech and language, dieticians, social workers -- and there's specialized data collection for, like, radiation oncology and that kind of stuff. But a lot of it is nurse-sensitive data.

So one of the things we've been doing as part of our CTSI, or CTSA, award is looking at what we call extended clinical data, and developing a process to standardize how we move from the raw data, mapping the flowsheet data to clinical data models. These clinical data models will then become generalizable across institutions; the actual mapping to the flowsheet IDs will be unique to each institution. One of the reasons this is important: I was just working on our pain clinical data model this last weekend, trying to get ready to move it into a tool we call i2b2, and we had something like 364 unique IDs for the way we collect pain data, and those 364 unique IDs actually represented -- I think -- 36 concepts. And when you do pain rating on a scale of 0 to 10, we had 54 different flowsheet IDs that are a pain rating of 0 to 10. Why don't we have one? So, what that means is that we have a concept in our clinical data model called pain rating, specifically 0 to 10. We also have the FLACC and the Wong-Baker and, you know, every other pain rating scale possible in the system. But it means that we have to identify a topic, like pain.
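The mapping work she describes -- hundreds of local flowsheet IDs collapsing to a handful of shared concepts -- might look, in miniature, something like this. The IDs below are made up for illustration; a real model maps hundreds of site-specific rows.

```python
# Hypothetical site-specific flowsheet row IDs, each mapped to a shared concept.
FLOWSHEET_TO_CONCEPT = {
    "fs_30410": "pain_rating_0_10",
    "fs_30577": "pain_rating_0_10",  # many local IDs, one concept
    "fs_41202": "pain_rating_0_10",
    "fs_52009": "flacc_score",
    "fs_60114": "wong_baker_faces",
}

def normalize(row):
    """Attach the shared concept for a local flowsheet ID, or None if unmapped."""
    concept = FLOWSHEET_TO_CONCEPT.get(row["flowsheet_id"])
    return {**row, "concept": concept}

print(normalize({"flowsheet_id": "fs_30577", "value": 7})["concept"])
```

Only the left-hand column of a table like this is site-specific; the concepts on the right are what make comparative research across institutions possible.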

We have to identify what concepts are associated with that. Then we have to look at how we map our flowsheets to those concepts. We then present it to our group in an interactive process for validation, before we can actually make it useful for researchers. So we now have a standardized process that we've been able to develop, and now we're moving into trying to develop open-source software, so that if you wanted to come play with us and say, "I like the model you're using and I want to use it; let's see if we can do some comparative effectiveness research," it's something that can be shared with others. And that's part of the nature of the CTSA awards: we develop things that can be used across sites, so everybody doesn't have to do it independently.

So here are examples of some of the clinical data models we've been developing. For behavioral health, we have somebody who's a specialist in that area working on a couple of models. Most of them are physiological at this point, and we started that way because of another project we're working with.

But one of the things we started with internally was to ask, "what are the quality metrics that we're having to report out that are sensitive to nursing?" So, prevention of falls, prevention of pain, CAUTI, VTE, and one other I can't think of right now -- we really tried to take a look at what those things are that are sensitive to nursing practice, and then how we build our data models so they can be used for quality improvement, but also for purposes of research. If we do certain things at a certain point in time, does it really matter? And then we've extended it to some other areas, based on the most frequent kinds of measures that might be important for nurse researchers to work with.

Now, one of the things the CTSAs do is use a tool called i2b2. i2b2 can do many things, but one of the first things it does is provide you with de-identified counts of how many patients you have that meet certain criteria -- so if you're going to submit a grant, you would know whether you had enough patients to actually, potentially, recruit. One of the things missing out of it is almost everything that's in flowsheets. So, Judy Warren and colleagues proposed an example of what it would look like in i2b2 if we added in some of the kinds of measures we're looking at, like review of systems or some of the clinical quality measures. So we're in the process of working out a whole methodology for how to move that flowsheet data from the data models into i2b2, so that anybody could say, "oh, I'd like to study prevention of pressure ulcers. How many stage four pressure ulcers do we actually have, and what kind of treatments are they getting, and does it matter?" And so that's an example of how this tool will be used.

Now, in order to make data useful, it also has to be coded. Remember the slide I showed you, where we're using RxNorm and we're using LOINC and we're using OMB and we're using CDC codes? Well, when we look at what code set should be used for standardizing the data that's not part of those kinds of data, you'll see that the American Nurses Association has actually recognized 12 terminologies or datasets, and they're done recognizing new ones; now it's just continuing to keep them up to date. And the ANA just came out with a new position statement, "Inclusion of Recognized Terminologies Supporting Nursing Practice within Electronic Health Records and Other Health Information Technology Solutions." What they say in that new paper is that all healthcare settings should use some type of standardized terminology within their electronic health records to represent nursing data. It makes that data reusable for purposes of quality improvement and comparative effectiveness research.

However, when data is stored within clinical data repositories, or when we're looking at interoperability across systems, then SNOMED CT is the standard that would be used for nursing diagnoses. So you might use the Omaha System or NANDA or CCC or any of these, but it has to be mapped to SNOMED CT, so that if I'm using the Omaha System and you're using ICNP, they can actually talk to each other where they have comparable terms. What the ANA has also recommended is that nursing interventions -- while there are many standardized terminologies -- actually use SNOMED CT for information exchange, and for building your data warehouses if you're using different systems that you want to do research with; that nursing outcomes use SNOMED CT, sometimes maybe LOINC; and that assessments use LOINC. I won't go into all the details underneath that, because it's more complicated than that -- sometimes the answers are LOINC and sometimes they're SNOMED CT, depending.

So there's a lot that goes on behind the scenes, but this is really important because -- and this actually comes off of the ONC recommendations for interoperability for clinical quality measures; that's how these standards came about -- it's consistent with federal policy when we're doing this. So, it's on the ANA's website. The URL was so long that we had permission to just put it on our website and give you a short URL, so if you want to learn more about it, the URL is listed down here.

Another effort that is going on, in addition to some of the foundational work we're doing through the CTSA, is a whole group headed by Susan Matney that is about how we build out an assessment framework, with very specific coding for the kinds of questions we ask for physiological measures. So, when we look at the LOINC assessment framework, we start first with physiological measures, and then there are other things shown in orange, called the future domains, that also have to look at what assessment criteria documented in electronic health records need standardized code sets. So there's a group that Susan Matney is heading up that includes software vendors and different healthcare systems -- people with EHRs that aren't the same EHRs -- and they're pulling together a minimum set of assessment questions and getting standardized codes for that minimum set, and those were just submitted to LOINC, I think at the end of June, for final coding and distribution in the next release of LOINC. And this group is continuing on to build out additional criteria for assessment, so that we have comparable standards across different systems.

Now, I mentioned the Nursing Management Minimum Data Set. This was actually developed back in about 1997, recognized by the American Nurses Association, and has just been updated for two out of the three areas. So in the environment area, you can see the types of data elements that are included -- and these are very high-level data elements; there's a lot of detail underneath these.

And you can see nursing resources. Now, when this was updated, we harmonized it with every standard we could possibly find. A lot of that has been NDNQI -- the National Database of Nursing Quality Indicators -- but it's also been harmonized with every other standard we could find, so that there weren't different standards for these types of variables. Also -- if you've followed the Future of Nursing work from the IOM report and the Robert Wood Johnson Foundation -- it matches the workforce data that they're trying to collect through the national -- the state boards of nursing. So again, if you're collecting data for one reason, you can in fact use it for multiple reasons when you're using a standard across the country.

So, there is a reference here. You can go to loinc.org, and if you look under news you'll see the release that came out this last week about this. And you'll also see, if you go to the University of Minnesota website, that the implementation guide is available, which gives you all of the details that you never wanted to know but need if you're actually going to standardize your data. So, the point of all this is that when you think about using big data and you want to do nursing research, it's really critical that we think about all of our multiple data sources -- whether it's the electronic health record, or, if you're thinking about the nursing management minimum dataset, for instance, scheduling data and HR data -- and that doesn't even begin to get into all the device data and the personal data contributed by patients. So that's additional data; think about what it's going to take to standardize that in addition.

It won't be on my plate, but many of you might want to actually do that, because it's a really good way to begin to move forward. So the message I wanted to leave you with on that is: there's lots of data. When we think about nursing research, we are at the very beginning of starting to ask: what data? How do we standardize that data? And how do we store and retrieve that data in ways that let us do comparative effectiveness research with it, or some of the big data science?

Just one example, which I'm not going to cover today but will talk a little bit about tomorrow: we're pulling data out of electronic health records to try to understand which patients are likely to have sepsis. And then there's the sepsis bundle -- if you do certain types of evidence-based practice quickly and on time, you can actually prevent complications. Well, we're pulling out electronic health record data, and guess what? This is really interesting. We got an NSF grant to do this, and we said, "well, we're going to look at evidence-based practice guidelines, nurses' and physicians'." Well, guess what? The evidence-based practice guidelines for nurses aren't really being used. And so we're having to figure out how you would find the data -- not because nurses aren't doing a good job; the guideline type of software just wasn't used in the way we thought. So then we said, "well, we'll look at certain data elements, and then we're also going to look at physician guidelines -- are they being used?"

So, in order to know if you did something in a timely manner, you have to know: when did somebody suspect that sepsis began? Do you know where that's located? Maybe in a physician's note. And the best way to find out if patients are likely to develop sepsis is nurses' vital signs and the flowsheet data. So consistent documentation in those flowsheet data becomes really critical. And then, if patients are being followed and adjusted, you have to understand things like fluid balance, cognitive status, your laboratory data, as well as the vital sign data that goes with that, and lots of other stuff. So this EHR data is critical in terms of really looking at how we prevent complications.

So I'm going to talk a little bit now, moving into more of the analytics. When we think about analytics, there is a book -- it's free online; this is not an advertisement for them, but it was one that changed my life -- called "The Fourth Paradigm," and it really talks about how we move into data-intensive scientific discovery. And one of the things that I think is really interesting is -- how many of you have ever read a fiction book called "The Time Keeper"? It is really a fun book. The thing that's fun about it is it talks about before people knew time existed: they hadn't picked up the observational pattern, thousands of years ago, that basically said, "oh, there is this repetitious thing called time." It then goes on to talk about the consequences for us of how we want more of it, you know? So it's not always a good thing to discover things. But our first science was really about observations, really trying to understand: what do we notice? What's the empirical data? We then moved into a theoretical branch: what are our models? How do we increase the generalizability of our science?

From there, in the last few decades, we've moved into a computational branch, which is really: how do we simulate complex phenomena? And now we're moving into data exploration, or something that's called e-science. So you can hear the term big data, or big data science; e-science is another term used for that. When you look at that, what you can see is that we have data being captured by all kinds of instruments, we have data processed by software, and we have information and knowledge stored in computers. And so what we really have to do is analyze the data from these files and these databases and come up with new knowledge. It requires new ways of thinking, and it requires new tools and new methods as we move forward.

So, foundational to big data science are algorithms and artificial intelligence. How do we take a look at "if this, then that; if this, then that"? It requires structured data, so that we can develop these algorithms to come to conclusions. Now, machines are much faster at processing these algorithms than the human mind is, and they can process much more complexity. So our big data science is really about the use of algorithms that are able to process data in really rapid ways -- semi-automated, we might call it. Not totally automated, where you just throw the data in and it gives you the answer; there's a lot more to it than that.

So there are some principles of big data science that are important, and one of those principles is: let the data speak. What that means is -- for example, CAUTI is one of the subjects that one of my students is working on. She's really trying to understand: we have these guidelines for catheter-associated urinary tract infection; how do we prevent that?

So if we follow the guidelines, why aren't we doing any better? What's missing is that we probably don't have the right data that we're looking at. So she's actually combining some of the management data along with the clinical data to try to ask: are there certain units? Are there certain types of staffing? How does staff satisfaction play into all of this? What's the experience? What's the education? What's the certification, the background? And so she is throwing in more types of data and then trying to let the data speak: does this provide us any new insights that we can think about? Another principle is to repurpose existing data.

So, once you have data -- 80 percent of big data science is the data preparation. I think it's closer to 90. It takes forever to get the data set up, because it's not like you're collecting new data with a standardized instrument that has all this validity and reliability, so there's a lot of data preparation and transformation that needs to go on. Once you've got that done, and you understand the data and the metadata -- that is, the context, the meaning, the background of why we collect this and what it actually means -- then we can understand: how is it collected? Why was it collected? What are its strengths? What are its limitations?

When I first started in this, I worked in homecare software. There wasn't anything I didn't know about OASIS, because I learned a ton by making every mistake, working with everybody I could, and understanding it thoroughly. When I went to working with big health system data, I was like a novice all over again. So once I get a good dataset set up, believe me, I'm going to be working with that forever.
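The preparation step she estimates at 80 to 90 percent of the effort is mostly mundane cleanup: deduplicating, coercing types, dropping impossible values. A toy sketch, with invented field names and rules:

```python
# Toy raw flowsheet rows; the field names and cleanup rules are invented.
raw_rows = [
    {"patient": "A", "pain_0_10": "7"},
    {"patient": "A", "pain_0_10": "7"},      # duplicate row
    {"patient": "B", "pain_0_10": "unable"}, # free-text, non-numeric entry
    {"patient": "C", "pain_0_10": "15"},     # out of range for a 0-10 scale
    {"patient": "D", "pain_0_10": "3"},
]

def prepare(rows):
    """Deduplicate, coerce to integers, and drop values outside the 0-10 scale."""
    seen, clean = set(), []
    for row in rows:
        key = (row["patient"], row["pain_0_10"])
        if key in seen:
            continue
        seen.add(key)
        try:
            value = int(row["pain_0_10"])
        except ValueError:
            continue  # metadata would tell us what "unable" means here
        if 0 <= value <= 10:
            clean.append({"patient": row["patient"], "pain_0_10": value})
    return clean

print(prepare(raw_rows))  # only the rows for patients A and D survive
```

Even this toy version shows why metadata matters: whether to silently drop "unable", or treat it as its own finding, is a decision that requires knowing why and how the data was collected.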

and so you'll see some examples of that tomorrow in a different talk. so in big data science, another thing that we have to think about is n equals all versus sampling. so it's not necessarily about random sampling; it's really about, once you've got all the data, you know, how does that affect your assumptions about what you're doing in science? and there's another principle called

correlations versus causality. so, you know, randomized clinical trials are trying to understand the why. why did this happen? and what we're trying to understand when we've got big data is, you know, what's the frequency with which certain things occur? what's the sensitivity? what's the specificity?

how do we understand the probabilities that go with it? and so we're oftentimes looking at correlations versus trying to look at causation. big data's messy. i've had a chance to work with our ctsi database, where they've done a lot of cleanup and standardization, and then i've worked with the raw data, same software vendor. i've certainly learned that once you have the data and

you clean it up, it really makes a difference. and will it ever be perfect? absolutely not. but we think our instruments are perfect, you know? and they're actually not either. so there is a certain probability that things occur, and you get a large enough dataset -- you know, it really makes a difference in how you work with the data.
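the kind of cleanup she's describing can be sketched in a few lines. everything below -- the field names, the gender map, the plausibility rule -- is invented for illustration, not taken from any real ehr system:

```python
# minimal sketch of the kind of cleanup messy EHR extracts need:
# normalize inconsistent codings, drop exact duplicates, flag impossible
# values. all field names and rules here are illustrative only.

RAW_ROWS = [
    {"patient_id": "001", "gender": "M",      "temp_f": 98.6},
    {"patient_id": "001", "gender": "M",      "temp_f": 98.6},   # duplicate row
    {"patient_id": "002", "gender": "male",   "temp_f": 212.0},  # impossible temp
    {"patient_id": "003", "gender": "Female", "temp_f": 99.1},
]

GENDER_MAP = {"m": "male", "male": "male", "f": "female", "female": "female"}

def clean(rows):
    seen, out = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:                 # drop exact duplicates
            continue
        seen.add(key)
        row = dict(row)
        row["gender"] = GENDER_MAP.get(row["gender"].lower(), "unknown")
        # flag (rather than silently drop) physiologically impossible values
        row["temp_valid"] = 90.0 <= row["temp_f"] <= 110.0
        out.append(row)
    return out

cleaned = clean(RAW_ROWS)
```

even a toy pass like this shows why preparation dominates the work: every field needs its own normalization rule, and every rule is a judgment call about the data's meaning.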

and then there's also a concept called data storage location. so, there are some people that think you should put all the world's data into a central database and work with it, and then there are others that do something called federated data queries. so federated data queries is where, like with our pcori grant, everybody has their own data. it's modeled in the same way, and so we can send our

queries to be able to do big data research without having all the data in the same pot at the same time. another thing that's really critical is big data is a team sport. i can't say that enough. if you ask me all the mathematical foundation for the kind of research we're doing, i'm not the one that can tell you that. i work with these computer science guys that have very

strong mathematical backgrounds, and i get educated every day i work with them. and so we need to -- and i also know from example that they really don't understand clinical. and so, you know, when we had a variable gender, they were going to take male and do male/not male, female/not female. and it's like, you only have two answers in the database, so why do we need four answers [laughs], you know,

for this? that's just a simple thing, but they don't understand, like, you know, what's a cvp, for instance. i have to actually look some of that up now too, as i'm getting further away from clinical, but it's really trying to understand you need a domain specialist. you need a data scientist. a data scientist is an expert in databases, machine learning, statistics, and visualization.

and you need an informatician. so how do you standardize and translate the data to information and knowledge? so, you know, understanding all that database stuff and the terminology stuff is really important. as i said, 80 percent is preprocessing of the data. and then there's a whole thing called dimension reduction and transformation of data. so, one of my students said, "well, i want to use icd9

codes, so i'll ask for those." and i'm like, "what are you going to do with them?" and so she finally got down to, what i really need to understand is there are certain diseases that predispose people to having cauti. and so, i only need to be able to aggregate them at a very high level to see -- and so it means you have to know all your icd9 structure and be able to go up to immunosuppressive drugs, for instance, or other diseases

that predispose you to getting infections, or previous history of infections. so, you don't want 13,000 icd9 codes. you really want high-level categories. so it's learning how to use the data, how to transform the data. a lot of times we have many questions that represent the same thing, so do you create a scale? if your assumption for your data model is that you need

binary data, how do you do your data cuts? you know? so with oasis data we use no problem or little problem, and moderate to severe problem, because we need a binary variable. and so it's that kind of stuff that you need to do. and then there's all kinds of ways of saying, how do you understand the strength of your answers? you can quantify uncertainties, so you're

looking at things like accuracy, precision, recall, trying to understand sensitivity, specificity, using aucs to try and understand the strength of your models. so i'm going to quickly go through just a few examples of how we're now moving into using some of these types of analysis and some of the newer methods of being able to analyze data. so, one is natural language processing. another is visualization, and a third is data mining.
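the binary cuts and the uncertainty measures she describes can be sketched together. the 0-4 severity item and the cut point below are invented stand-ins (oasis items use similar ordinal scales), and the "model output" is just a hand-written toy vector:

```python
# illustrative sketch: collapse an ordinal 0-4 severity item into a binary
# variable, then quantify a toy predictor with the measures mentioned in
# the talk. the scale, cut point, and predictions are all invented.

def binarize(score, cut=2):
    """0-1 = no/little problem -> 0; cut and above = moderate/severe -> 1."""
    return 1 if score >= cut else 0

def confusion(actual, predicted):
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp, tn, fp, fn

def metrics(actual, predicted):
    tp, tn, fp, fn = confusion(actual, predicted)
    return {
        "sensitivity": tp / (tp + fn),   # a.k.a. recall
        "specificity": tn / (tn + fp),
        "precision":   tp / (tp + fp),
        "accuracy":    (tp + tn) / len(actual),
    }

severity = [0, 1, 2, 3, 4, 1, 3, 0]       # raw ordinal scores
actual = [binarize(s) for s in severity]  # binary cut of the scale
predicted = [0, 1, 1, 1, 1, 0, 0, 0]      # toy model output
m = metrics(actual, predicted)
```

auc would come from sweeping the model's score threshold and plotting sensitivity against 1 minus specificity; the counts above are the building blocks for that too.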

what i'm not going to do is address genomics. i wouldn't touch that one; it's not my forte. so, natural language processing -- another name for it is text mining. and that is, as we take a look at this, five percent of our data is really structured data, and most of it is not structured data. so we really need to think about, how do we deal with that unstructured data? because it

has a lot of value within it. so nlp can actually help us create structured data from unstructured data, so that we then can use that data more effectively. so, it really uses computer-based linguistics and artificial intelligence to identify and extract information, and so free-text data's really the source. so when you think of nurses' notes, for instance.
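a bare-bones sketch of that idea -- pulling structured symptom flags out of free text with simple pattern matching. real clinical nlp pipelines handle linguistics and negation far more carefully; the note, the vocabulary, and the 12-character negation window here are all invented:

```python
import re

# toy sketch of turning an unstructured note into structured fields.
# real clinical NLP uses full linguistic pipelines; the terms, negators,
# and note below are invented examples, not a real vocabulary.

SYMPTOM_TERMS = {"shortness of breath", "edema", "fatigue", "orthopnea"}
NEGATORS = ("no ", "denies ", "without ")

def extract_symptoms(note):
    note = note.lower()
    found = {}
    for term in SYMPTOM_TERMS:
        for match in re.finditer(re.escape(term), note):
            # peek at a short window before the term for a negation cue
            window = note[max(0, match.start() - 12):match.start()]
            negated = any(neg in window for neg in NEGATORS)
            found[term] = not negated   # True = present, False = negated
    return found

note = "Patient reports fatigue and ankle edema. Denies shortness of breath."
structured = extract_symptoms(note)
```

the output is a small dictionary of present/absent flags -- exactly the kind of structured variable that can then feed the downstream algorithms she describes.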

the goal is to create useful data across the various sites and to be able to get structured data for knowledge discovery. and there are very specific criteria for trustworthiness. when i did my doctoral program and we wanted to do qualitative research -- that was many years ago -- people were a lot like, well, that sounds like foo foo. well, now there is, like, you know, really trustworthy

criteria, and there's trustworthy criteria for data mining as well. so when you look at -- how many of you have heard of watson? yeah, so when you think about watson, watson was initially tested with jeopardy, you know? and finally it beat human beings. so now ibm is actually moving into, how can we use that for purposes of healthcare?

and how do we begin to harness the algorithmic potential of watson? so, watson is really an opportunity to begin to think about big data science, and do you know how they're training it? they're asking -- they're doing almost kind of like a think-out-loud with physicians. like, how do you make decisions? you know, they're reviewing the literature to see what's

in the literature. we need some nurses feeding data into watson so that we can get other kinds of data in addition. but watson uses natural language processing to then create structured data to do the algorithms. so when you think about another example, how many have heard of google flu trends? yeah, so with google flu trends, one of the things is, how do you mine data on the internet?

what kinds of things are people actually searching for that are about flu? what are the symptoms of flu? what are the medications you take for managing the symptoms of flu? and what they found is that actually google flu trends could predict a flu epidemic before the cdc could, because it was based on patients trying to look up their symptoms, and based on that, they could see that

there was this trend emerging. now when they actually looked at who had flu -- the reported flu and the google trends -- cdc outdid google, but it pointed to an emerging trend that was occurring. and actually what we're seeing now is we're doing some of that kind of mining of data with pharmaceutical reports, looking for adverse events. and so we're using the -- the fda has an adverse event

reporting system, and what they're finding is that as they're looking at the combination of different drugs that people are taking, they're beginning to see where adverse events are occurring through combinations of different drugs that previously weren't known. so when you think about, we do these clinical trials, we get our drugs out on the market. after the drug's out on the market, it's like, how do

they actually work in the real population? and i think eric's presentation earlier, with that new graphic that just came out of nature, that one out of 10 or one out of 15 people actually benefits -- the question is, how many people get harmed? and how do we know what the combination of drugs is that could actually cause harm? so there's some really interesting stuff that's going on with mining data and looking at combinations

to try to understand, are there things we just don't know? so another area's looking at novel associative diagnoses. when i first read this i'm like, "i don't get it." and what it is, is that we're really trying to understand what kinds of meaningful diseases co-occur together that we previously didn't know about. so an example is obesity and hypertension. that's a real common one.

we know that those two go together frequently. but how many combinations of diseases that we just don't understand go together? so there's a team of researchers that compared literature mining with clinical data mining, and what they did is, with this massive dataset, they looked at all the icd9 codes. so this person has these three or five or 14 diagnoses that all co-occur together, and they said,

"what do we see in theliterature of what diagnosis co-occur together?" because they thought thatthey could validate commonly known ones which they couldand they could discover new ones that neededfurther investigation. well, what they did is theylooked at that, is that they found there's very littleoverlap between diagnosis in the clinical datasetand in the literature. so the question is, is itthat the methodology needs

to be improved? is it that we only know the tip of the iceberg of what kinds of things co-occur together? can we gain new insights about new combinations that frequently co-occur together that can help us predict problems that people have and try to get ahead of it? another example is early detection of heart failure. so there was a study that was done -- and i won't pronounce the name of this person and the team --

and what they were really trying to do is, can they determine whether automated analytics of encounter notes in the electronic health record might enable the differentiation of subjects who would ultimately be diagnosed with heart failure. so if you look at signs and symptoms that people are getting, can you begin to see early on that this person's going to be moving into heart failure, or that their heart failure might actually be worsening?

so that you can anticipate and try to prevent problems, and make sure that the right treatment is being done? so they used novel tools for text mining notes for early symptoms, and then they compared patients who did and did not get heart failure. the good news is, they found that they could detect heart failure early.

the bad news is, people who didn't get heart failure also had some of those symptoms. so again, we're at the beginning of this kind of science, and it really needs to be refined so that we can begin to get better specificity and sensitivity as we develop these algorithms for predicting. now, visualization is another type of tool, and so, as you think about, how do we understand massive amounts

of information? so there's a lot of different tools for helping us to be able to quickly see what is going on, and so these are just examples of visualization -- not to read what the details are about this. but what you can see is there was a study done by lee [phonetic sp] and colleagues where they were trying to understand older adults and their patterns of wellness from point a to eight weeks later, in terms

of their wellness patterns. but what they were really trying to do in this study is to say, what kind of way can you visualize holistic health? and do you visualize holistic health, and the change in holistic health over these eight weeks, by using a stacked bar graph, you know, or one of the other types of devices? and then they had focus groups and they tried to

say, "what do youthink about this?" you know, "how well doesthat help you to process the information?" and so it helped them to beable to think about it -- it's really a cognitivescience kind of background of how people processinformation, what kind of colors, how much contrast,what shapes and design help people be able toprocess information? so this is kind of anemerging area where we're

really trying to understand patterns related to different phenomena. karen munson, for instance, one of my colleagues, has been looking at this with public health data, and she's looking at, what are the patterns of care for maternal child health patients -- moms who have a lot of support needs from public health nurses -- and are there individual signatures of nurses in how they provide care, and are certain

patterns more effective, and with what subgroup of patients are those patterns more effective? so she's using visualization, more like this stream graphic over on the top left side here, to look at signatures of nursing practice over time. so one of the things i find is that as we're doing data mining, the genetic algorithms are increasing in their accuracy and their abilities. so if you think about the financial market -- i don't

know about you, but i came back from a trip to taiwan one time, went to purchase something at radioshack, and my credit card was declined. and i'm like, "what do you mean my credit card's declined?" and they said, "it's declined." and so i'd used it in taiwan. what i didn't know is that was an unusual pattern for me, and they happened to pick it up and they said, "were

you in taiwan?" and i'm like, "yeah,i was in taiwan." they said, "okay, fine. we'll enableyour card again." well, it used to be thatthey would do a 25 percent sample of all thetransactions and be able to pick up these abnormalpatterns to try to look for fraud. now they actually canprocess 100 percent of

transactions with fairly good accuracy. so if they can do that with bank transactions, why can't we do that with ehr data? and part of it is they have nice, structured data [laughs], you know, compared to what we're using. so data mining is really about, how do you look at a data repository, select out the type of data you want, look at preprocessing that data -- which is 80 percent of

the work -- do transformation, so creating scales or looking at levels of granularity. but then it uses some different kinds of algorithms and different analytic methods. so up until i got to data mining on this graphic, we're really talking about traditional research in many ways. but when we get to data mining, we're then looking at all kinds of different algorithms that get run that

are semi-automated, that can do a lot of the processing that we have to do manually in traditional statistical analysis. and, in order to come up with results, the next step is critical. we can come up with lots of really weird results. i can't remember the one that eric showed earlier, or maybe patricia grady did, when she said, you know, "diapers and candy bars."

or something like that. but whatever it was, it doesn't make sense, and so we really have to make sure that we're using our domain knowledge in order to see, is this actually clinically interpretable as we move forward? so, data mining is also known as knowledge discovery in databases. it's automated or semi-automated processing of data using very strong mathematical formulas to do

this, and there are absolutely ways of being able to look at the trustworthiness of the data. so we use -- a lot of it is sensitivity, specificity, recall, accuracy, precision. there's also something called false discovery rates, which is another way of checking the validity of what you're finding. and there are lots of different methods, so some of those methods are association rule learning,

there's clustering analysis, there's classification like decision trees, and many new methods that are emerging constantly. so it's not like you can say data mining is just data mining. it's like saying quantitative analysis, you know? there's lots of different methods of being able to do this. i think an example of data mining is the fusion of big data and little babies.
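association rule learning, the first method she lists, comes down to counting co-occurrence. here is a minimal sketch of support and confidence for one rule over invented diagnosis sets, using the obesity/hypertension pairing she mentioned earlier:

```python
# bare-bones association rule measures over invented patient diagnosis sets.
# real tools (apriori-style implementations) search the rule space
# efficiently; this only shows what support and confidence mean for one rule.

patients = [
    {"obesity", "hypertension", "diabetes"},
    {"obesity", "hypertension"},
    {"hypertension"},
    {"obesity", "diabetes"},
    {"obesity", "hypertension", "cauti"},
]

def rule_stats(patients, antecedent, consequent):
    n = len(patients)
    both = sum(1 for p in patients if antecedent <= p and consequent <= p)
    ante = sum(1 for p in patients if antecedent <= p)
    return {
        "support": both / n,        # how often the full rule holds overall
        "confidence": both / ante,  # P(consequent | antecedent)
    }

stats = rule_stats(patients, {"obesity"}, {"hypertension"})
```

novel associative diagnoses are essentially rules like this with unexpectedly high confidence, surfaced from the data rather than from the literature.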

so there was actually a study that was done looking at all the sensor data in a nicu and trying to understand who's likely to develop infections, and by capturing continuous data from multiple machines, they were able to find, 24 hours earlier than the usual methods, who was going to run into trouble, and to head it off with the nicu babies. so, it has very practical applications.
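the nicu work she describes used much richer continuous-monitoring analytics; the sketch below only illustrates the general idea of an early-warning signal -- a rolling mean crossing a soft threshold before any single reading hits the hard alarm limit. all numbers and thresholds are invented:

```python
# invented illustration of early warning on a vital-sign stream: flag a
# sustained drift (rolling mean over a soft threshold) before the raw
# value reaches the hard alarm limit. not the actual NICU method.

def first_alert(readings, window=3, soft=0.5, baseline=0.0):
    """Return index of the first rolling-mean drift alert, or None."""
    for i in range(window, len(readings) + 1):
        mean = sum(readings[i - window:i]) / window
        if mean - baseline >= soft:
            return i - 1          # alert fires at the window's last reading
    return None

def first_hard_alarm(readings, hard=2.0, baseline=0.0):
    """Return index of the first reading over the hard limit, or None."""
    for i, r in enumerate(readings):
        if r - baseline >= hard:
            return i
    return None

# a slow upward drift: the soft rolling-mean alert fires several readings
# before any single value reaches the hard alarm limit.
stream = [0.0, 0.1, 0.3, 0.6, 0.9, 1.3, 1.8, 2.4]
early = first_alert(stream)
late = first_hard_alarm(stream)
```

the gap between the two indices is the lead time -- the toy analogue of the 24-hour head start she mentions.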

another example is looking at type 2 diabetes risk assessment, and really trying to understand with not just association rules -- now we're moving into newer methods of looking at time series along with association rules, and trying to see patterns over time, and how those patterns over time and the rules you can create from the data will predict who's likely to run into problems. and so, some of the work that george simon [phonetic sp]

has done with his group has really looked at survival association rules, and they substantially outperform the framingham score in terms of being able to look at the development of complications. so, in conclusion, big data are readily available. we don't lack data. the information infrastructure is critical for big data analytics. one of my colleagues i've done research with, she

said, "i just keep hopingone of the days you can just throw it all in the pot andsomething will happen." and it's like, that is notwhat big data analysis is about. there are rules just likethere are for qualitative research orquantitative research. and that the analyticmethods are now becoming mainstream. so 10 years ago it wouldbe really hard to get data mining studies fundedunless you went to the nsf.

now that's getting to be more and more mainstream. as a matter of fact, if you look in nursing journals and you look for nurses who are doing data mining, you won't find a lot out there yet. so it's still really just at the beginning, but at least we're starting to get some funding available now for doing it. so, one of the implications, though, out of this that we really need to be thinking about is, how are we training

our students, the emerging scientists? how are we training ourselves here today? but how are we training the emerging scientists to really be prepared to do this kind of science of big data analysis, and the newer methods that need to be done? how do we think about integrating nurses into existing interprofessional research teams? so, i don't know about you, but how many nurses do you know that are on ctsas that are doing the data mining

with nursing data as part of the data warehouse? or on pcori grants where they're building out, you know, some of the signs and symptoms that nurses are interested in, or the interventions, in addition to the interprofessional data? and so, it's really important that we take a look at making sure that we're including nurse-sensitive data as part of interprofessional data, and that means that we really need to be paying attention

to the data standards, you know? so that we are collecting consistent data in consistent ways, with consistent coding, so we can do the consistent queries to be able to really play in the big data science arena. so with that, i'll stop and see if you have any questions. i think we have one minute [laughs]. we have a question over here.

okay, so the question is, how do you find the colleagues, like in computer science, who can really help you? well, i tell you, i was really ignorant when i started. i actually worked with somebody from the university of pennsylvania the first time i did it, because i didn't know any data miners at the university of minnesota. and then i got talking with colleagues who said, "oh, do you know so and so, who knows so and so?" and then i started actually paying attention to what's

being published at the university of minnesota. it turns out that vipin kumar, who's head of the computer science department, is actually one of the best internationally known computer scientists. actually, he and michael steinbach, one of my research partners, have their own book published on data mining for the class that my students take along with the computer science students. so, one, start with looking at -- if you look at some of

the publications coming out of your university, it's the first place to start to figure out if you have anybody around who can do data mining. and i just didn't even know to think about that when i first started. so, it's a good way to start. part of it is paying attention to -- there's a number of -- if you go to amia, for instance, there's a whole strong track of data miners that have their own

working group at amia. also, there's a lot of data mining conferences going on, and so if you just start searching -- i mean, personally, i would do "data mining" and "university of minnesota" in google, and that's a really fast way of finding out who's doing that, as another strategy to try to find partners. and they were thrilled to death, believe me, to get hooked up with people in healthcare, because they knew

that was an emerging area, big data. they just knew that they didn't know it, and i didn't know what they knew, so together it made a good partnership. okay, thank you. >> mary engler: thank you, dr. westra, that was just wonderful.
