Background
Pathway diagrams are fundamental tools for describing biological processes in all aspects of science, including training, generating hypotheses, describing new knowledge and, ultimately, as communication tools in published work. Thousands of pathway diagrams are published each year as figures in papers. But because these figures are static images, the pathway knowledge they represent is not accessible to researchers for computational queries and analyses. In this study, we aimed to identify pathway figures published in the past 25 years, to characterize their human gene content by optical character recognition, and to describe their utility as a resource for pathway knowledge.

Approach
To identify pathway figures representing 25 years of published research, we trained a machine learning service on manually classified figures and applied it to 235,081 image query results from PubMed Central. Our previously described pipeline [1] was used to extract human genes from the pathway figure images. These figures were characterized in terms of their parent papers, human gene content and enriched disease terms. Diverse use cases were explored for this newly accessible pathway resource.

Results
We identified 64,643 pathway figures published between 1995 and 2019, depicting 1,112,551 instances of human genes (13,464 unique NCBI Genes) in various interactions and contexts. This represents more genes than are found in the text of the same papers, as well as genes not found in any pathway database. We developed an interactive web tool to explore the results from the 65k set of figures, and used this tool to explore the history of scientific discovery of the Hippo signaling pathway. We also defined a filtered set of 32k pathway figures useful for enrichment analysis.
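The abstract notes that the filtered 32k figure set is useful for enrichment analysis. As an illustration only, the sketch below treats each figure's extracted gene list as a gene set and scores a query gene list against it with a one-sided hypergeometric (over-representation) test. This is a generic, widely used technique, not necessarily the procedure the authors used; the function names `hypergeom_sf` and `enrich`, the figure IDs and the toy gene sets are all hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K marked items, n draws)."""
    upper = min(K, n)
    # Sum the exact hypergeometric probabilities of every overlap from k up to the maximum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enrich(
    query: set[str],
    figure_sets: dict[str, set[str]],
    universe: set[str],
) -> list[tuple[str, int, float]]:
    """Rank figures by over-representation of the query genes (hypothetical helper).

    Assumption: `universe` is the background of all genes that could appear in
    any figure; genes outside it are ignored on both sides.
    """
    q = query & universe
    N, n = len(universe), len(q)
    results = []
    for fig_id, genes in figure_sets.items():
        g = genes & universe
        k = len(q & g)
        if k == 0:
            continue  # no shared genes: nothing to test
        results.append((fig_id, k, hypergeom_sf(k, N, len(g), n)))
    # Smallest p-value (strongest over-representation) first.
    return sorted(results, key=lambda r: r[2])


if __name__ == "__main__":
    # Toy data only: ten background "genes" and three hypothetical figures.
    universe = set("ABCDEFGHIJ")
    figures = {"fig1": {"A", "B", "C"}, "fig2": {"D", "E", "F", "G"}, "fig3": {"X"}}
    for fig, k, p in enrich({"A", "B", "D"}, figures, universe):
        print(fig, k, round(p, 3))
    # prints:
    # fig1 2 0.183
    # fig2 1 0.833
```

With tens of thousands of figure gene sets tested at once, a real analysis would also need a multiple-testing correction (for example, Benjamini-Hochberg false discovery rate), which this sketch leaves out.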
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.