Workshop on Hardware Performance Monitor Design and Functionality

2/25/2005 - Slides and more info at http://lacsi.rice.edu/workshops/hpca11



The purpose of this workshop is to improve the design of performance counter hardware in microprocessors and to strengthen kernel and software support for accessing the counters. The workshop will result in a summary white paper that outlines the major issues and summarizes the recommendations of the participants. This document is intended as a reference for both designers of future processors and operating system engineers.

Organized by:

Olaf Lubeck (Los Alamos National Labs), olubeck@lanl.gov
Phil Mucci (University of Tennessee), mucci@cs.utk.edu
Mike Lang (Los Alamos National Labs), mlang@lanl.gov
Rob Fowler (Rice University), rlf@cs.rice.edu

Format:

Selected invited talks and 15-minute “Position Presentations” followed by focused discussions.


Call for Participation:

Please send an abstract for your position presentation to mlang@lanl.gov by January 3rd.


Program (available 1/24/05)


Audience:

HPC Tool Developers, Architects, and Performance Analysts


Important Dates:

Position Abstracts due: January 3, 2005
Author Notifications: January 10, 2005
Workshop Date: Saturday, Feb 13, 2005


Contact:

Mike Lang, mlang@lanl.gov, 505-665-5756


Abstract:

Virtually all modern microprocessors can monitor the performance of code through detailed on-chip event counters. This hardware is used in three different modes for performance analysis: aggregate start/stop counting for code regions, trace record generation upon execution of a code region, and statistical methods based on the context of the processor when a hardware counter event occurs.
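As a rough illustration of the three modes, the following sketch drives a simulated counter with a toy workload. Everything here is an assumption made for illustration: `SimulatedCounter`, `WORKLOAD`, and the region names are hypothetical and do not correspond to any real counter interface such as PAPI or CHUD.

```python
class SimulatedCounter:
    """Stand-in for one hardware event counter (hypothetical, not a real API)."""

    def __init__(self, overflow_threshold=None, on_overflow=None):
        self.value = 0
        self.overflow_threshold = overflow_threshold
        self.on_overflow = on_overflow
        self._since_overflow = 0

    def count(self, events, context):
        """Record `events` occurrences while code region `context` is running."""
        self.value += events
        if self.overflow_threshold is None:
            return
        self._since_overflow += events
        # Fire the overflow handler once per threshold crossing.
        while self._since_overflow >= self.overflow_threshold:
            self._since_overflow -= self.overflow_threshold
            self.on_overflow(context)


# Toy workload: (code region, number of counted events), in execution order.
WORKLOAD = [("init", 100), ("solve", 700), ("solve", 500), ("output", 200)]


def aggregate(workload, region):
    """Mode 1: read the counter at region start and stop, sum the deltas."""
    counter = SimulatedCounter()
    total = 0
    for name, events in workload:
        start = counter.value
        counter.count(events, name)
        if name == region:
            total += counter.value - start
    return total


def trace(workload):
    """Mode 2: emit one trace record per executed region."""
    counter = SimulatedCounter()
    records = []
    for name, events in workload:
        start = counter.value
        counter.count(events, name)
        records.append((name, start, counter.value))
    return records


def sample(workload, threshold):
    """Mode 3: on each counter overflow, charge one sample to the active region."""
    histogram = {}

    def on_overflow(context):
        histogram[context] = histogram.get(context, 0) + 1

    counter = SimulatedCounter(threshold, on_overflow)
    for name, events in workload:
        counter.count(events, name)
    return histogram


if __name__ == "__main__":
    print(aggregate(WORKLOAD, "solve"))
    print(trace(WORKLOAD))
    print(sample(WORKLOAD, 250))
```

In the sampling mode, each overflow is charged entirely to whichever region is active when the threshold is crossed, so the resulting histogram is a statistical estimate, not an exact count.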


Vendor-specific software interfaces such as Apple's CHUD and portable interfaces such as PAPI from the University of Tennessee have been developed and are giving application engineers and tool developers convenient and robust access to the functionality that the performance counting hardware provides. Furthermore, production-quality tools like TAU, HPCToolkit and PSRUN have been developed that use these interfaces to present the performance counter data in a meaningful and useful manner. Both single- and multi-processor performance models based on the extrapolation of this data are being developed, allowing better understanding of the performance of the application under a variety of different hardware and software environments. Given the prevalence of this hardware, it will not be long before feedback-directed compilation techniques can be applied using the above technology.


However, some major problems are impeding progress in this area. First and foremost is the lack of common functionality across the various microprocessors. While it is impossible to completely standardize the semantics of every event, the fundamentals of cache-based RISC/CISC architectures provide the opportunity to agree upon a subset of meaningful events on every platform. The data from these microprocessors lack a model that would allow adequate, coherent performance accounting. This is continually compounded by a lack of validation and adequate documentation of the hardware. Additionally, the requisite software infrastructure in the operating system is often not present.
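The idea of an agreed-upon event subset can be sketched as a set intersection over per-platform event inventories. The platform names and event names below are hypothetical placeholders chosen for illustration, not taken from any vendor documentation.

```python
# Hypothetical event inventories; names are illustrative only.
PLATFORM_EVENTS = {
    "cpu_a": {"cycles", "instructions", "l1d_miss", "l2_miss", "branch_mispredict"},
    "cpu_b": {"cycles", "instructions", "l1d_miss", "tlb_miss"},
    "cpu_c": {"cycles", "instructions", "l1d_miss", "l2_miss", "fp_ops"},
}


def common_events(inventories):
    """Return the events that every platform in `inventories` can count."""
    sets = list(inventories.values())
    return set.intersection(*sets) if sets else set()


def missing_events(inventories, wanted):
    """For each platform, list the wanted events it cannot count."""
    return {
        platform: sorted(wanted - events)
        for platform, events in inventories.items()
        if wanted - events
    }


if __name__ == "__main__":
    print(sorted(common_events(PLATFORM_EVENTS)))
    print(missing_events(PLATFORM_EVENTS, {"cycles", "l2_miss"}))
```

A portable tool could rely on the common set everywhere and consult the per-platform gaps before requesting anything outside it.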


This workshop invites chip designers/architects, tool developers, performance analysts, and model developers to come and address these problems. The one-day workshop envisions a series of focused presentations followed by guided discussion sessions that will be captured by workshop recorders. The tangible output from the workshop will be a summary white paper that outlines the major issues and summarizes the recommendations of the participants. This document is intended as a reference for both designers of future processors and operating system engineers.