OHNLP News

From Open Health Natural Language Processing (OHNLP) Consortium

OHNLP at AMIA NLP-WG PreSymposium

MedTime 1.0.2 released

MedXN 1.0.1 released

MedTagger 1.0.1 released

MedTime 1.0.1 released

MedTime 1.0 released

  • Nov 18, 2013 by Hongfang
  • The Mayo Clinic NLP team has released MedTime, a system to detect and normalize temporal expressions in the clinical domain.
  • MedTime documentation
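
To make "detect and normalize temporal expressions" concrete, here is a minimal sketch that finds dates written like "Nov 18, 2013" and normalizes them to ISO 8601. It is an illustration only and does not reflect how MedTime works internally; the pattern, the `MONTHS` table, and the `normalize_dates` function are hypothetical names introduced for this example.

```python
import re
from datetime import date

# Hypothetical month table and pattern: covers only "Mon DD, YYYY" style dates.
MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}
DATE_RE = re.compile(
    r"\b(" + "|".join(MONTHS) + r")[a-z]*\.? (\d{1,2}), (\d{4})\b")

def normalize_dates(text):
    """Return (surface form, ISO 8601 date) pairs for each date found in text."""
    results = []
    for match in DATE_RE.finditer(text):
        month = MONTHS[match.group(1)]
        day, year = int(match.group(2)), int(match.group(3))
        try:
            normalized = date(year, month, day).isoformat()
        except ValueError:
            # Skip impossible dates such as "Feb 31, 2013".
            continue
        results.append((match.group(0), normalized))
    return results

if __name__ == "__main__":
    print(normalize_dates("Patient seen on Nov 18, 2013 for follow-up."))
```

A real clinical system also has to handle relative expressions ("two days ago"), durations, and frequencies, which this sketch does not attempt.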

MedXN 1.0 released

  • Nov 18, 2013 by Hongfang
  • The Mayo Clinic NLP team has released MedXN, a system to normalize medication data to clinical drugs.
  • MedXN 1.0

MedTagger 1.0 released

  • Nov 18, 2013 by Hongfang
  • The Mayo Clinic NLP team has released MedTagger, a system consisting of three components: dictionary lookup for indexing, pattern-based information extraction, and CRF-based named entity recognition.
  • MedTagger 1.0

cTAKES is an Apache Top Level Project

  • cTAKES has graduated from the Apache Incubator to become a Top Level Project at apache.org
  • See Mailing lists to search within or post to the new lists (you need to subscribe to a list before you post to it)
  • Apache cTAKES releases are available from the apache.org website

cTAKES 2.5 released

  • May 2012
  • The SHARP (SHARPn) Natural Language Processing (NLP) team has released an updated version (2.5) of the Clinical Text Analysis and Knowledge Extraction System (cTAKES). cTAKES is a free and open source NLP system for accessing clinical information stored in free text through NLP techniques.
  • cTAKES 2.5 includes
  • a new attributes (assertion) extractor
  • a semantic role labeler
  • an additional sectionizer
  • updates to the coreference resolver
  • an updated part-of-speech model

cTAKES 2.0 actively under development

  • cTAKES 2.0 will include
  • changes to the type system
  • a different tokenizer (implementing Penn Treebank tokenization rules)
  • If you have created your own dictionaries, for best results when switching to cTAKES 2.0, you should re-create them using the new tokenizer.
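
The advice to re-create custom dictionaries can be illustrated with a small sketch. Dictionary lookup typically matches token sequences, so a dictionary tokenized one way may silently miss text tokenized another way. The two tokenizers below are toy stand-ins (not the cTAKES tokenizers), and `build_dictionary` and `lookup` are hypothetical helpers; the only Penn Treebank behavior assumed is that possessive 's is split into its own token.

```python
import re

def whitespace_tokenize(text):
    """Toy 'old' tokenizer: lowercases and splits on whitespace only."""
    return text.lower().split()

def clitic_tokenize(text):
    """Toy 'new' tokenizer: also splits off possessive 's and punctuation."""
    return re.findall(r"'s|[a-z0-9]+|[^\sa-z0-9]", text.lower())

def build_dictionary(terms, tokenize):
    """Store each term as a tuple of tokens produced by the given tokenizer."""
    return {tuple(tokenize(term)) for term in terms}

def lookup(text, dictionary, tokenize):
    """Return every dictionary entry that appears as a token n-gram in text."""
    tokens = tokenize(text)
    max_len = max((len(entry) for entry in dictionary), default=0)
    hits = []
    for start in range(len(tokens)):
        for length in range(1, max_len + 1):
            candidate = tuple(tokens[start:start + length])
            if len(candidate) == length and candidate in dictionary:
                hits.append(" ".join(candidate))
    return hits

if __name__ == "__main__":
    note = "History of Crohn's disease."
    stale = build_dictionary(["Crohn's disease"], whitespace_tokenize)
    rebuilt = build_dictionary(["Crohn's disease"], clitic_tokenize)
    print(lookup(note, stale, clitic_tokenize))    # dictionary built with old tokenizer
    print(lookup(note, rebuilt, clitic_tokenize))  # dictionary rebuilt with new tokenizer
```

In this toy setup the stale dictionary finds nothing, while the rebuilt one matches the term, which is the mismatch the recommendation guards against.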

cTAKES 1.3 - Bundled SNOMED CT and RxNorm dictionaries, a constituency parser, and a coreference resolver

  • Dec 2011
  • The SHARP (SHARPn) Natural Language Processing (NLP) team has released an updated version (1.3) of the Clinical Text Analysis and Knowledge Extraction System (cTAKES). cTAKES is a free and open source NLP system for accessing clinical information stored in free text through NLP techniques.
  • cTAKES 1.3 includes
  • a constituency parser (a new annotator)
  • a coreference resolver (a new annotator)
  • a bundled set of UMLS dictionaries (SNOMED-CT and RxNorm). Anyone with a UMLS username and password can now perform SNOMED-CT and RxNorm Named Entity Recognition out of the box by putting those credentials in a config file, provided an internet connection is available at runtime. You no longer need to separately download the UMLS Metathesaurus and convert the UMLS files to a format cTAKES expects (unless you wish to use additional vocabularies or customize the subset of UMLS yourself).

SideEffect annotator and icTAKES available

  • Oct 2011
  • The SHARPn team has released an updated version of the Clinical Text Analysis and Knowledge Extraction System (cTAKES), cTAKES 1.2.
  • cTAKES 1.2 includes
  • a new annotator (beta version), SideEffect, which extracts physician-asserted drug side effects from clinical notes.
  • This release also introduces an integrated version of cTAKES, icTAKES for end users and developers.
  • To distinguish these two cTAKES versions, cTAKES is distributed as cTAKES1.2.1 and icTAKES is distributed as cTAKES1.2.2.
  • Download

New cTAKES Annotator now available

  • March 2011
  • The SHARP 4 NLP team is excited to announce the release of a new annotator in cTAKES. This is the second cTAKES release from efforts completed through the ONC-funded SHARP project. For more on the SHARP 4 project, see the SHARP 4 Wiki
  • This release includes a Smoking Status Classifier that processes clinical documents and identifies patients' smoking status at the patient level as well as the document level. This pipeline will generate one of five smoking status categories:
  • Past smoker
  • Current smoker
  • Smoker
  • Non-smoker
  • Unknown
  • This annotator is modular, so if you already have cTAKES installed you can add this piece without having to re-install cTAKES.
  • Click here to download: OHNLP Documentation and Downloads
  • Stay tuned. The team is working on more updates to cTAKES that will advance the secondary use of Electronic Health Record data.

The SHARP 4 NLP team is excited to announce the release of cTAKES 1.1.

  • December 2010
  • The SHARP 4 NLP team is excited to announce the release of cTAKES 1.1. This is the first cTAKES release from efforts completed through the ONC-funded SHARP project. For more on the SHARP 4 project, see the SHARP 4 Wiki
  • The new software includes an updated medication annotator that will allow researchers to extract drug mentions from clinical free text. This was first developed under a grant from the AT&T Foundation. It includes features such as:
  • Frequency
  • Dosage
  • Strength
  • Form
  • Route
  • Duration
  • Drug change status
  • This version of cTAKES also includes a dependency parser — a foundational component that analyzes syntactic structure. This building block enables the development of future cTAKES components that utilize grammatical context to extract events, attributes, and relations from clinical documents.
  • Click here to download: OHNLP Documentation and Downloads
  • Stay tuned. The team is working on more updates to cTAKES that will advance the secondary use of Electronic Health Record data.

A beta version of a PAD annotator is now available for download.

  • July 28, 2010
  • The PAD (Peripheral Artery Disease) annotator is an optional addition to the Mayo Clinic cTAKES pipeline.
  • The new annotator includes an aggregate analysis engine for running the 'PAD term spotter' pipeline.
  • The 'PAD term spotter' pipeline extracts textual mentions pertaining to the diagnosis, treatment, etc. of lower limb Peripheral Artery Disease (PAD).
  • The pipeline assesses the presence of phrases indicative of PAD in one or more sentences of radiology reports, and classifies documents for the presence or absence of PAD.
  • There are four possible classifications for PAD (present, probable, absent, unknown) and two levels of classification: document level and patient level.
  • Funding provided by the NHGRI in association with the eMERGE network
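
The two classification levels described above can be sketched as a roll-up from document-level labels to a single patient-level label. The announcement does not say how the PAD annotator combines documents, so the precedence used here (present over probable over absent over unknown) is purely an assumption, and `PadStatus` and `patient_level_status` are hypothetical names.

```python
from enum import IntEnum

class PadStatus(IntEnum):
    """The four PAD classifications, ordered by an ASSUMED precedence."""
    UNKNOWN = 0
    ABSENT = 1
    PROBABLE = 2
    PRESENT = 3

def patient_level_status(document_statuses):
    """Roll document-level labels up to one patient-level label.

    Assumption: the strongest label among a patient's documents wins;
    a patient with no documents is classified as UNKNOWN.
    """
    return max(document_statuses, default=PadStatus.UNKNOWN)

if __name__ == "__main__":
    reports = [PadStatus.ABSENT, PadStatus.PROBABLE, PadStatus.UNKNOWN]
    print(patient_level_status(reports).name)
```

Other roll-up rules, such as majority vote or most-recent-report-wins, are equally plausible; only the annotator's documentation can confirm which one is actually used.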

Two challenge grants have been funded that will build upon cTAKES

  • December 3, 2009
  • The first project, "Natural Language Processing for Cancer Research Network Surveillance Studies" launches a collaborative effort to advance adoption within the HMO Cancer Research Network (CRN) of "industrial-strength" natural language processing systems.
  • The project is a collaborative effort of six CRN sites to address two residual barriers to adoption: the cost of adopting a comprehensive and flexible NLP system, previously feasible only in large, academically oriented biomedical informatics research programs, and the need to provide NLP training for local staff.
  • A significant impact of this grant is the plan to integrate into the cTAKES system an open-source de-identification tool.
  • The proposed NLP algorithm in the grant will classify sets of patient charts as either containing or not containing a diagnosis of breast cancer which could dramatically reduce the cost of research in this area.
  • Original aspects of the grant include a transdisciplinary research team and a resourceful strategy for building the technical resources and staff to support an ongoing program of applied NLP research.
  • The second project, "Multi-source Clinical Question Answering System" proposes to develop a unified platform for clinical Question Answering (QA) from multiple sources of clinical and biomedical narrative that implements semantic processing of the questions by fusing two existing technologies - Mayo Clinic's cTAKES (Clinical Text Analysis and Knowledge Extraction System) and the University of Colorado's Question Answering System.
  • The challenges in the Question Answering field are mainly attributed to the number of components that require domain specific training along with strict system requirements in terms of high precision and recall complemented by an accessible and user-friendly presentation.
  • The grant proposes to overcome these challenges by re-using components already in place as part of the two existing technologies.
  • The approach brings together information from encyclopedic sources and the EMR and presents it in a unified form to the clinician at the point of care or to an investigator in the lab.

Announcing the release of the Automated Retrieval Console (ARC)

  • November 12, 2009
  • ARC comes from the Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC) and can use cTAKES and/or medKAT.
  • The Automated Retrieval Console (ARC) is open source software that provides a suite of tools to mix and match NLP with machine learning and interfaces to quickly calculate performance. ARC requires only examples of targeted information in order to develop, evaluate, and deploy IR algorithms.
  • More information, code, a step-by-step tutorial, and how-to videos for using ARC can be found at MAVERIC.

IBM and Mayo Clinic jointly present the Open Health NLP Consortium Workshop at AMIA 2009

  • Presenters will be
  • Anni Coden and Michael Tanenblatt (IBM, T.J. Watson Research), James Masanz and Guergana Savova (Mayo Clinic)
  • Sunday, November 15, 2009 at 5:30 p.m. in room Yosemite C
  • The focus of this workshop is to describe the various open source components in more detail with an emphasis on how to deploy them in your institution and how to adapt them to different types of clinical reports (e.g. radiology notes, discharge summaries).
  • The open source components address lexical, syntactic and semantic analysis and their adaptation both to the clinical domain and to the type of report. Several dictionary lookup methodologies will be discussed, as well as dictionary development and negation. An implementation of relationship extraction will be shared.
  • The formal presentation will conclude with a discussion on how to execute an NLP system within an institutional system and how to become an active member of the consortium.
  • The second part will be "hands-on" - so, if you are interested, bring your laptop and install Eclipse and UIMA on it. We will walk through examples of code and complete "pipelines" and address any particular development questions you may encounter.
  • Please view Workshop Announcement for more information
  • There is no cost for the workshop. Registration is not required but would be appreciated; to register, e-mail medNLP@us.ibm or clinicalnlp@mayo.edu.

OHNLP launch

  • The News Release for the launch of the Open Health Natural Language Processing (OHNLP) Consortium is now available.
  • medKAT/P 1.0 released March 2009. Contributed by IBM.
  • cTAKES 1.0 released March 2009. Contributed by Mayo Clinic.