POSOC logo

POSet Ontology Categorizer (POSOC)

v. 1.0, updated May, 2005

The POSet Ontology Categorizer (POSOC) software package provides tools for what we call the categorization task in poset-structured taxonomic ontologies such as the Gene Ontology (GO): given a collection of possibly weighted nodes of interest (e.g. genes, proteins, EC nodes, and/or textual phrases), how can we analyze their distribution within the GO? In other words, are they concentrated in one area, distributed across multiple areas, located "high" or "low" in the structure, etc.? Given such a query, POSOC returns a rank-ordered list of the GO nodes that best "summarize" or "categorize" that query with respect to various parameter settings, including:
  • A node or nodes to "focus" on;
  • Selection of a scoring function;
  • Selection of a pseudo-distance measure (a measure of vertical distance between nodes "in a chain", currently including minimum chain length, maximum chain length, average of extreme chain lengths, and average of all chain lengths);
  • Selection of a specificity level (where low specificity results in a smaller set of "general" or "high" clusters, while increasing specificity results in more clusters that are "deeper" or "more specific").
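The four pseudo-distance variants listed above can be illustrated with a minimal Java sketch. It is not taken from the POSOC source: the class name `PseudoDistanceSketch`, its method names, and the toy hierarchy are all hypothetical, and it assumes that a chain length is the number of edges along one upward path from a node to one of its ancestors.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Illustrative only: names and data are assumptions, not the POSOC API.
class PseudoDistanceSketch {
    // Hypothetical toy hierarchy (not GO data): each node maps to its direct parents.
    static final Map<String, List<String>> PARENTS = Map.of(
        "root", List.of(),
        "a", List.of("root"),
        "b", List.of("root"),
        "f", List.of("root"),
        "c", List.of("a"),
        "e", List.of("b"),
        "d", List.of("c", "e", "f"));

    // Edge counts of every upward path from node to ancestor.
    // An empty result means the two nodes are not comparable.
    static List<Integer> chainLengths(String node, String ancestor) {
        List<Integer> out = new ArrayList<>();
        walk(node, ancestor, 0, out);
        return out;
    }

    private static void walk(String node, String ancestor, int depth, List<Integer> out) {
        if (node.equals(ancestor)) {
            out.add(depth);
            return;
        }
        for (String parent : PARENTS.getOrDefault(node, List.of())) {
            walk(parent, ancestor, depth + 1, out);
        }
    }

    // The four variants; each expects a non-empty list of chain lengths.
    static int minChain(List<Integer> lengths) {
        return Collections.min(lengths);
    }

    static int maxChain(List<Integer> lengths) {
        return Collections.max(lengths);
    }

    static double avgExtremes(List<Integer> lengths) {
        return (minChain(lengths) + maxChain(lengths)) / 2.0;
    }

    static double avgAll(List<Integer> lengths) {
        return lengths.stream().mapToInt(Integer::intValue).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        // "d" reaches "root" by two chains of length 3 and one of length 2.
        List<Integer> lengths = chainLengths("d", "root");
        System.out.println("min chain length:    " + minChain(lengths));
        System.out.println("max chain length:    " + maxChain(lengths));
        System.out.println("average of extremes: " + avgExtremes(lengths));
        System.out.println("average of all:      " + avgAll(lengths));
    }
}
```

In this toy hierarchy the average of the extremes (2.5) differs from the average over all chains (8/3), which is why the two are offered as separate choices. Enumerating every path is only practical for small examples; a structure the size of the GO would likely need memoized or dynamic-programming counts.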
POSOC cluster results can be compared against desired or known results by calculating precision, recall, and F-score for graph neighborhood relationships.
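As a rough illustration of these evaluation measures, the following Java sketch computes precision, recall, and F-score from plain set overlap between returned and expected nodes. This is a simplification: it does not reproduce POSOC's graph-neighborhood variant, and the class name `ClusterEvalSketch`, its methods, and the node labels are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: exact-match set overlap, not POSOC's neighborhood-based scoring.
class ClusterEvalSketch {
    // Number of nodes present in both sets.
    static int overlap(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }

    // Fraction of returned nodes that were expected.
    static double precision(Set<String> returned, Set<String> expected) {
        return returned.isEmpty() ? 0.0 : (double) overlap(returned, expected) / returned.size();
    }

    // Fraction of expected nodes that were returned.
    static double recall(Set<String> returned, Set<String> expected) {
        return expected.isEmpty() ? 0.0 : (double) overlap(returned, expected) / expected.size();
    }

    // Harmonic mean of precision and recall.
    static double fScore(double precision, double recall) {
        double sum = precision + recall;
        return sum == 0.0 ? 0.0 : 2.0 * precision * recall / sum;
    }

    public static void main(String[] args) {
        // Hypothetical node labels standing in for GO node identifiers.
        Set<String> returned = Set.of("A", "B", "C");
        Set<String> expected = Set.of("B", "C", "D", "E");
        double p = precision(returned, expected); // 2 of 3 returned nodes are expected
        double r = recall(returned, expected);    // 2 of 4 expected nodes are returned
        System.out.println("precision = " + p);
        System.out.println("recall    = " + r);
        System.out.println("F-score   = " + fScore(p, r));
    }
}
```

A neighborhood-based version would presumably also give credit to returned nodes that are near an expected node in the graph (for example a parent or child) instead of requiring an exact match.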

POSOC (Joslyn, Mniszewski et al. 2004) was developed at the Los Alamos National Laboratory, and has been used for analyzing gene expression experiments, automated annotation of protein function (Verspoor, Cohn et al. 2004b), and the determination of textual evidence for protein annotations (Verspoor, Cohn et al. 2004a). POSOC has required the development of novel computer science and mathematical techniques in applied finite order theory, and these are now part of a broader research program addressing their application to knowledge discovery generally (Joslyn 2004a, 2004b; Joslyn and Bruno 2005). Future applications of POSOC and related methods include rapid response to novel biothreats (Verspoor, Joslyn et al. 2005) and the management of generalized semantic hierarchies (Verspoor, Joslyn et al. 2003).

POSOC consists of a set of Java interfaces, classes, and programs that run on Linux or Windows platforms, and incorporates graph classes from OpenJGraph. A copy of the November 2004 Gene Ontology is included in the distribution.



A graphical representation of POSOC output for the test query "2C-" (included in the distribution). GO nodes with positive score are listed, with colored groups indicating categories or "clusters", inclusive of ancestors (e.g. the brown cluster headed by "GO:8152 metabolism" includes the red cluster headed by "GO:16070 RNA metabolism"). Nodes are labeled by their score rank, cluster rank (if a cluster head, in italics) and percentage of the query gene list they cover.


Software


Contacts and Acknowledgements

This work was sponsored by the Department of Energy under contract W-7405-ENG-36 to the University of California, and by a Cooperative Research and Development Agreement (CRADA) with Procter & Gamble Corp.


Publications

Primary Publications

CA Joslyn, SM Mniszewski, A Fulmer, and GG Heaton: (2004) "The Gene Ontology Categorizer", Bioinformatics, v. 20:s1, pp.169-177
Primary publication on POSOC
KM Verspoor, JD Cohn, CA Joslyn, SM Mniszewski, A Rechtsteiner, LM Rocha, and T Simas: (2004a) "Protein Annotation as Term Categorization in the Gene Ontology Using Word Proximity Networks", BMC Bioinformatics, in press
LANL submission to BioCreAtive, use of Gene Ontology Categorizer

Secondary Publications and Presentations

JD Cohn, KM Verspoor, SM Mniszewski, CA Joslyn: (2004) "Predicting Protein Function Using Nearest Neighbor Categorization", Proc. 2nd Annual Rocky Mountain Regional Bioinformatics Conf. (Rocky 04)
CA Joslyn, SM Mniszewski, A Fulmer, and GG Heaton: (2003) "Measures on Ontological Spaces of Biological Function", in: Pacific Symposium on Biocomputing PSB 03, LAUR 02-6864, 03-0043
CA Joslyn, SM Mniszewski, A Fulmer, and GG Heaton: (2003) "Structural Classification in the Gene Ontology", in: Proc. 6th Bio-Ontologies Workshop, Intelligent Systems for Molecular Biology (ISMB 03), LAUR 03-2988
CA Joslyn, SM Mniszewski, and KM Verspoor: (2004) "Combinatorial Knowledge Discovery for Bio-Ontology Management", invited lecture at Stanford Medical Informatics, LAUR 04-3935
KM Verspoor, JD Cohn, SM Mniszewski, and CA Joslyn: (2004b) "Nearest Neighbor Categorization for Function Prediction", in: Proc. 5th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP 05), in press

Related Technical Areas

CA Joslyn: (2004a) "Poset Ontologies and Concept Lattices as Semantic Hierarchies", in: Conceptual Structures at Work, Lecture Notes in Artificial Intelligence, v. 3127, ed. Wolff, Pfeiffer and Delugach, pp.287-302, Springer-Verlag, Berlin
CA Joslyn: (2004b) "Order Theoretical Knowledge Discovery", in: DIMACS Workshop on Applications of Order Theory to Homeland Defense and Computer Security, LAUR 04-6208
CA Joslyn and WJ Bruno: (2005) "Weighted Pseudo-Distances for Categorization in Semantic Hierarchies", to appear in Lecture Notes in AI, 2005 Int. Conference on Conceptual Structures
Towards distance measures in quantified semantic hierarchies.
CA Joslyn and SM Mniszewski: (2004) "Combinatorial Approaches to Bio-Ontology Management with Large Partially Ordered Sets", in: SIAM Workshop on Combinatorial Scientific Computing (CSC 04), LAUR 03-8213
CA Joslyn, JS Oliveira, and C Scherrer: (2004) "Order Theoretical Knowledge Discovery: A White Paper", LAUR 04-5812
KM Verspoor, CA Joslyn, and GJ Papcun: (2003) "Gene Ontology as a Source of Lexical Semantic Knowledge for a Biological Natural Language Processing Application", in: Workshop on Text Analysis and Search for Bioinformatics (SIGIR 03)
KM Verspoor, CA Joslyn, JA Ambrosiano, A Bäcker, O Bodenreider, L Hirschman, P Karp, H Kelly, S Loranger, M Musen, R Sriram, and C Wroe: (2005) "Knowledge Integration for Biothreat Response", Los Alamos Technical Report