POSet Ontology Categorizer (POSOC) v. 1.0, updated May 2005
The POSet Ontology Categorizer (POSOC) software package provides tools for what we call the categorization task in poset-structured taxonomic ontologies such as the Gene Ontology (GO): given a collection of possibly weighted nodes of interest (e.g. genes, proteins, EC nodes, and/or textual phrases), how can we analyze their distribution within the GO? In other words, are they concentrated in one area, distributed to multiple areas, located "high" or "low" in the structure, etc.? Given such a query, POSOC returns the rank-ordered list of GO nodes that best "summarize" or "categorize" that query with respect to various parameter settings, including:
POSOC (Joslyn, Mniszewski et al. 2004) was developed at the Los Alamos National Laboratory and has been used for analyzing gene expression experiments, automated annotation of protein function (Verspoor, Cohn et al. 2004b), and the determination of textual evidence for protein annotations (Verspoor, Cohn et al. 2004a). Developing POSOC required novel computer science and mathematical techniques in applied finite order theory, and these are now part of a broader research program addressing their application to knowledge discovery generally (Joslyn 2004a, 2004b; Joslyn and Bruno 2005). Future applications of POSOC and related methods include rapid response to novel biothreats (Verspoor, Joslyn et al. 2005) and the management of generalized semantic hierarchies (Verspoor, Joslyn et al. 2003). POSOC consists of a set of Java interfaces, classes, and programs that run on Linux or Windows platforms, and incorporates graph classes from OpenJGraph. A copy of the November 2004 Gene Ontology is included in the distribution.
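To make the categorization task described at the top of this page more concrete, here is a minimal Python sketch, assuming the ontology is given as a map from each node to its parents and the query as a map from nodes to weights. It is not POSOC's scoring function or its Java API: the names `categorize` and `upward_distances`, the toy node labels, and the `decay` parameter are all hypothetical. Each query node passes its weight up to every ancestor, discounted by `decay` raised to the shortest upward path length, and the nodes are then rank-ordered by total score.

```python
from collections import deque

# Toy poset (hypothetical labels, not real GO terms): node -> list of parents.
# "A2" has two parents, so the structure is a DAG rather than a tree.
PARENTS = {"A": ["R"], "B": ["R"], "A1": ["A"], "A2": ["A", "B"], "B1": ["B"]}


def upward_distances(parents, start):
    """Shortest number of child-to-parent steps from `start` to each ancestor.

    `start` itself is included with distance 0 (breadth-first search).
    """
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, ()):
            if parent not in dist:
                dist[parent] = dist[node] + 1
                queue.append(parent)
    return dist


def categorize(parents, query, decay=0.5):
    """Rank nodes by the decayed weight they receive from the query nodes.

    Hypothetical scoring: each query node q with weight w adds
    w * decay**d(q, v) to every ancestor-or-self v, where d is the
    shortest upward path length from q to v.
    """
    scores = {}
    for node, weight in query.items():
        for ancestor, d in upward_distances(parents, node).items():
            scores[ancestor] = scores.get(ancestor, 0.0) + weight * decay ** d
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


ranking = categorize(PARENTS, {"A1": 1.0, "A2": 1.0}, decay=0.75)
print(ranking)
# [('A', 1.5), ('R', 1.125), ('A1', 1.0), ('A2', 1.0), ('B', 0.75)]
```

Here `A` ranks first because it subsumes both query nodes while staying close to them. Lowering `decay` pushes the ranking toward the query nodes themselves, and raising it pushes it toward the root; the single knob is only an illustration of how a parameter can shift results between specific and general nodes.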
A graphical representation of POSOC output for the test query "2C-" (included in the distribution). GO nodes with positive score are listed, with colored groups indicating categories or "clusters", inclusive of ancestors (e.g. the brown cluster headed by "GO:8152 metabolism" includes the red cluster headed by "GO:16070 RNA metabolism"). Nodes are labeled by their score rank, cluster rank (if a cluster head, in italics), and the percentage of the query gene list they cover.
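The coverage percentages and nested clusters in the figure can be illustrated with another small Python sketch, assuming each query gene is annotated to one or more ontology nodes and that a gene counts as covered by a node when at least one of its annotations lies at or below that node. The functions `coverage_percent` and `cluster_contains`, the gene names, and the toy hierarchy are hypothetical; this is not how POSOC itself computes or draws these values.

```python
# Toy poset (hypothetical labels, not real GO terms): node -> list of parents.
PARENTS = {"A": ["R"], "B": ["R"], "A1": ["A"], "A2": ["A", "B"], "B1": ["B"]}

# Hypothetical query gene list: gene -> ontology nodes it is annotated to.
ANNOTATIONS = {"g1": ["A1"], "g2": ["A2"], "g3": ["B1"]}


def ancestors_or_self(parents, node):
    """All nodes reachable from `node` by child-to-parent steps, including `node`."""
    seen = {node}
    stack = [node]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def coverage_percent(parents, annotations, node):
    """Percentage of query genes with at least one annotation at or below `node`."""
    covered = sum(
        1
        for terms in annotations.values()
        if any(node in ancestors_or_self(parents, t) for t in terms)
    )
    return 100.0 * covered / len(annotations)


def cluster_contains(parents, outer_head, inner_head):
    """True if the cluster headed by `outer_head` includes the one headed by `inner_head`."""
    return outer_head in ancestors_or_self(parents, inner_head)


print(coverage_percent(PARENTS, ANNOTATIONS, "R"))  # 100.0
print(round(coverage_percent(PARENTS, ANNOTATIONS, "A"), 1))  # 66.7
print(cluster_contains(PARENTS, "R", "A"))  # True
```

In this toy example `A` and `B` each cover two of the three genes because `A2` sits under both, the kind of overlap a poset (rather than a tree) allows, so coverage percentages of sibling clusters need not sum to 100.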
Primary publication on POSOC
LANL submission to BioCreAtIvE, use of Gene Ontology Categorizer
Towards distance measures in quantified semantic hierarchies.