The University of Florida, the University of Arizona, and Rutgers, the State University of New Jersey, have established a national Center for Autonomic Computing (CAC).

This center is funded by the Industry/University Cooperative Research Center program of the National Science Foundation, CAC members from industry and government, and university matching funds.

Projects at CAC Rutgers

Here you will find descriptions of ongoing projects at the Rutgers CAC.

Autonomic Cloud Bursts on Amazon EC2

Cluster-based data centers have become dominant computing platforms in industry and research for enabling complex and compute-intensive applications. However, as scales, operating costs, and energy requirements increase, maximizing the efficiency, cost-effectiveness, and utilization of these systems becomes paramount. Furthermore, the complexity, dynamism, and often time-critical nature of application workloads make on-demand scalability, integration of geographically distributed resources, and incorporation of utility computing services extremely critical. Finally, the heterogeneity and dynamics of the system, application, and computing environment require context-aware dynamic scheduling and runtime management.
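To make the idea of a cloud burst concrete, here is a minimal sketch of a threshold-based burst policy: the local cluster's backlog is monitored, and just enough EC2 capacity is leased to absorb the overflow. The thresholds, instance counts, and provisioning hooks below are hypothetical placeholders, not the project's actual implementation.

```python
# Illustrative sketch only: a simple threshold-based cloud-burst policy.
# The thresholds, instance counts, and the provision/release hooks are
# hypothetical, not the CAC implementation.

from dataclasses import dataclass

@dataclass
class ClusterState:
    queued_tasks: int      # tasks waiting on the local cluster
    local_capacity: int    # tasks the local cluster can absorb per cycle
    cloud_nodes: int       # EC2 instances currently leased

def burst_decision(state: ClusterState,
                   tasks_per_cloud_node: int = 8,
                   max_cloud_nodes: int = 32) -> int:
    """Return how many cloud nodes to add (positive) or release (negative)."""
    backlog = state.queued_tasks - state.local_capacity
    if backlog > 0:
        # Burst: lease just enough EC2 capacity to cover the backlog.
        needed = -(-backlog // tasks_per_cloud_node)  # ceiling division
        return min(needed, max_cloud_nodes - state.cloud_nodes)
    if state.cloud_nodes > 0 and state.queued_tasks == 0:
        # Shrink: release leased nodes once the local cluster catches up.
        return -state.cloud_nodes
    return 0

if __name__ == "__main__":
    # 100 queued tasks, local capacity 40: burst 8 nodes at 8 tasks per node.
    print(burst_decision(ClusterState(queued_tasks=100, local_capacity=40, cloud_nodes=0)))
```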

Read more...

High-Throughput Asynchronous Data Transfers

Large-scale applications, such as financial analytics, engineering simulations, process monitoring and control, or enterprise system management, which typically run on distributed platforms such as large datacenters, clusters, and HPC systems, generate massive amounts of data. This data must be extracted from the system and transported in a timely manner to remote consumers for online processing, analysis, monitoring, and decision-making. However, managing and transporting this data is becoming a significant bottleneck, imposing considerable overheads on the applications and leading to inefficient resource utilization and frequent QoS violations. Advanced interconnect architectures and innovative communication protocols, such as customized high-speed interconnection buses and one-sided remote direct memory access with zero-copy and OS- and application-bypass data transfers, have been introduced to address these challenges. However, these advances also significantly increase the complexity exposed to the applications, which must be adapted and managed at runtime to effectively use these capabilities.
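The following is a minimal sketch of the core idea of asynchronous data extraction: the application posts buffers to a bounded queue and keeps computing while a background thread performs the transfer. The class and function names are hypothetical and stand in for the project's middleware and transport layer.

```python
# Illustrative sketch: asynchronous data extraction with a bounded buffer queue.
# The application hands buffers to a background "transfer" thread and keeps
# computing, instead of blocking on synchronous I/O. Names are hypothetical,
# not the CAC middleware API.

import queue
import threading

class AsyncExtractor:
    def __init__(self, send_fn, depth: int = 4):
        self._send = send_fn                    # e.g. a socket or RDMA-put wrapper
        self._buffers = queue.Queue(maxsize=depth)
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def post(self, data: bytes) -> None:
        """Called by the application; blocks only if all buffers are in flight."""
        self._buffers.put(data)

    def _drain(self) -> None:
        while True:
            data = self._buffers.get()
            self._send(data)                    # overlapped with application compute
            self._buffers.task_done()

    def flush(self) -> None:
        self._buffers.join()

# Usage: overlap a compute loop with transfers.
if __name__ == "__main__":
    sent = []
    ex = AsyncExtractor(send_fn=sent.append)
    for step in range(10):
        result = bytes([step] * 16)             # stand-in for simulation output
        ex.post(result)                         # returns immediately in the common case
    ex.flush()
    print(f"transferred {len(sent)} buffers")
```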

Read more...

A Computational Engine for Financial Modeling on the Cell Broadband Engine

Computational finance addresses problems such as risk estimation and management, volatility estimation, and option pricing from a numerical point of view. The importance of the results requires accurate and precise mathematical models and systems, which are usually difficult to solve. An alternative and tractable solution is to use numerical simulation models, e.g., Monte Carlo or recursive binomial algorithms. However, to achieve the desired accuracy, we have to run a large number of simulations, which takes a significant amount of time. Existing solutions have reduced the execution time by providing parallel implementations of the numerical algorithms and by using clusters or grids of commodity computers. The hardware and maintenance costs, the failure rate of commodity clusters, the heat dissipation, and the cooling costs motivate the search for alternative solutions.
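As a concrete example of the kind of simulation kernel involved, here is a minimal Monte Carlo pricer for a European call option under geometric Brownian motion. The parameters are arbitrary illustration values; on the Cell Broadband Engine, many such independent paths would be run in parallel.

```python
# Minimal Monte Carlo pricer for a European call under geometric Brownian
# motion, illustrating the simulation workload described above.

import math
import random

def mc_european_call(spot, strike, rate, vol, maturity, n_paths=100_000, seed=42):
    rng = random.Random(seed)
    drift = (rate - 0.5 * vol * vol) * maturity
    diffusion = vol * math.sqrt(maturity)
    payoff_sum = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)
        terminal = spot * math.exp(drift + diffusion * z)
        payoff_sum += max(terminal - strike, 0.0)
    return math.exp(-rate * maturity) * payoff_sum / n_paths

if __name__ == "__main__":
    # Converges toward ~10.45 for these inputs (the Black-Scholes closed-form value).
    print(round(mc_european_call(100.0, 100.0, 0.05, 0.2, 1.0), 2))
```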

Read more...

Robust Clustering Analysis for Self-Monitoring Distributed Systems

The control and timely management of large-scale distributed systems, such as device networks, data centers, and compute clusters, are tasks that are rapidly exceeding human ability, given the complexity and dynamics of these systems and the large amounts of data involved. Thus, the automated and online management of these systems is essential to ensure their continued performance and robust operation. Fortunately, the systems' available in-network resources can be harnessed to perform self-monitoring and data analysis tasks, which are crucial for effective management.
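As a simple illustration of clustering-based self-monitoring, the sketch below groups per-node metrics with a tiny k-means and flags points far from any cluster center. This is a generic example, not the robust clustering algorithm developed in the project; the metrics and thresholds are invented.

```python
# Illustrative only: grouping node-level metrics (CPU load, memory fraction)
# with a tiny k-means, then flagging points far from their cluster center.
# Deterministic initialization keeps the example reproducible.

import math

def kmeans(points, k=2, iters=20):
    centers = list(points[:k])              # simple deterministic initialization
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[idx].append(p)
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers

def outliers(points, centers, threshold=0.3):
    return [p for p in points
            if min(math.dist(p, c) for c in centers) > threshold]

if __name__ == "__main__":
    # (cpu_load, mem_fraction) samples from two healthy groups of nodes,
    # plus one misbehaving node that is flagged as an outlier.
    metrics = [(0.20, 0.30), (0.25, 0.35), (0.22, 0.28),
               (0.80, 0.70), (0.75, 0.72), (0.78, 0.68),
               (0.95, 0.10)]
    centers = kmeans(metrics, k=2)
    print(outliers(metrics, centers))       # -> [(0.95, 0.1)]
```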

Read more...

Autonomic Management of Instrumented Datacenters

Technical advances are leading to a pervasive computational ecosystem that integrates computing infrastructures with embedded sensors and actuators, giving rise to a new paradigm for monitoring, understanding, and managing natural and engineered systems - one that is information/data-driven and autonomic. The overarching goal of this research is to develop sensor system middleware and programming support that will enable distributed networks of sensors to function not only as passive measurement devices, but as intelligent data processing instruments, capable of data quality assurance, statistical synthesis, and hypothesis testing as they stream data from the physical environment to the computational world.
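A minimal sketch of in-network data quality assurance follows: a sensor node keeps simple running statistics and labels implausible readings before streaming them onward. The sensor type, ranges, and thresholds are hypothetical and stand in for whatever policy the middleware would actually apply.

```python
# Illustrative sketch of in-network data quality assurance on a sensor node:
# readings are range-checked and compared against running statistics before
# being streamed onward. Field names and thresholds are hypothetical.

class QualityFilter:
    def __init__(self, lo, hi, z_limit=3.0):
        self.lo, self.hi, self.z_limit = lo, hi, z_limit
        self.n, self.mean, self.m2 = 0, 0.0, 0.0   # Welford running statistics

    def _update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def check(self, x):
        """Return (value, status) with status 'ok', 'out_of_range', or 'outlier'."""
        if not (self.lo <= x <= self.hi):
            return x, "out_of_range"
        if self.n > 10:
            std = (self.m2 / (self.n - 1)) ** 0.5
            if std > 0 and abs(x - self.mean) / std > self.z_limit:
                return x, "outlier"        # flagged; not folded into the statistics
        self._update(x)
        return x, "ok"

if __name__ == "__main__":
    f = QualityFilter(lo=-20.0, hi=60.0)   # e.g. an air-temperature sensor, in Celsius
    stream = [21.0, 21.3, 20.9, 21.1, 21.2, 20.8,
              21.0, 21.4, 20.7, 21.1, 21.0, 55.0, 99.0]
    for reading in stream:
        print(f.check(reading))
```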

Read more...

Autonomic Computing Engines on MS-HPCS

Consolidated and virtualized cluster-based computing centers have become dominant computing platforms in industry and research for enabling complex and compute-intensive applications. However, as scales, operating costs, and energy requirements increase, maximizing the efficiency, cost-effectiveness, and utilization of these systems becomes paramount. Furthermore, the complexity, dynamism, and often time-critical nature of application workloads make on-demand scalability, integration of geographically distributed resources, and incorporation of utility computing services extremely critical. Finally, the heterogeneity and dynamics of the system, application, and computing environment require context-aware dynamic scheduling and runtime management.

Read more...

Autonomic Data Streaming and In-transit Processing

Emerging enterprise/Grid applications consist of complex workflows, which are composed of interacting components/services that are separated in space and time and execute in widely distributed environments. The couplings and interactions between components/services in these applications are varied, data-intensive, and time-critical. As a result, high-throughput, low-latency data acquisition, data streaming, and in-transit data manipulation are critical.

The goal of this project is to develop and deploy autonomic data management services that support high-throughput, low-latency data streaming and in-transit data manipulation. Key attributes/requirements of the service include: (1) support for high-throughput, low-latency data transfers to enable near real-time access to the data, (2) the ability to stream data over wide area networks with shared resources and varying loads while maintaining the desired QoS, (3) minimal performance overheads on the application, (4) adaptation to dynamic application, system, and network states, (5) proactive control to prevent loss of data, and (6) effective management of in-transit processing while satisfying the above requirements.
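The sketch below illustrates one way an in-transit node might trade processing for forwarding under load: data flows through a bounded buffer, and when the buffer starts to fill the optional processing step is skipped so data is not dropped. The class, thresholds, and transform are invented for illustration and are not the project's service API.

```python
# Illustrative sketch of an adaptive in-transit processing stage: when the
# buffer occupancy signals congestion, the stage forwards data unmodified
# instead of processing it, to avoid overflow and data loss.

from collections import deque

class InTransitStage:
    def __init__(self, process_fn, capacity=64, pressure=0.75):
        self.process_fn = process_fn
        self.buffer = deque()
        self.capacity = capacity
        self.pressure = pressure          # occupancy fraction that triggers pass-through

    def enqueue(self, item) -> bool:
        if len(self.buffer) >= self.capacity:
            return False                  # caller must throttle or reroute
        self.buffer.append(item)
        return True

    def dequeue_for_forwarding(self):
        """Pop one item; process it only if there is headroom in the buffer."""
        item = self.buffer.popleft()
        if len(self.buffer) < self.pressure * self.capacity:
            return self.process_fn(item)  # e.g. filtering, aggregation, format conversion
        return item                       # congested: forward unmodified

if __name__ == "__main__":
    stage = InTransitStage(process_fn=lambda x: x * 2, capacity=4, pressure=0.5)
    for v in [1, 2, 3, 4]:
        stage.enqueue(v)
    while stage.buffer:
        print(stage.dequeue_for_forwarding())   # first items forwarded raw, later ones processed
```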

Read more...

Autonomic Computing Engines

Consolidated and virtualized cluster-based computing centers have become dominant computing platforms in industry and research for enabling complex and compute-intensive applications. However, as scales, operating costs, and energy requirements increase, maximizing the efficiency, cost-effectiveness, and utilization of these systems becomes paramount. Furthermore, the complexity, dynamism, and often time-critical nature of application workloads make on-demand scalability, integration of geographically distributed resources, and incorporation of utility computing services extremely critical. Finally, the heterogeneity and dynamics of the system, application, and computing environment require context-aware dynamic scheduling and runtime management.

Read more...

Sensor-driven Autonomic Management of Transportation Ecosystems

The national transportation ecosystem constitutes a major part of the national investment (estimated at $25 trillion) and is critical for the mobility of our society as well as its economic growth and prosperity. Infrastructure system components such as bridges, highways, tunnels, traffic systems, air- and seaports, dams, etc., are clearly critical assets that should be protected and properly managed. However, unprecedented demands coupled with an aging infrastructure and new classes of threats are together making this ecosystem brittle and vulnerable. The importance and immediate nature of this problem, along with the need for a more effective infrastructure, has recently been recognized by the Congressional allocation of $84 million for bridge monitoring and evaluation in 2007. In spite of such investments, the enormity of the transportation ecosystem and its integration into the basic fabric of every aspect of our lives require immediate short- and long-term decisions about how to allocate the limited resources for maintaining, safeguarding, and optimizing this national asset.

Read more...

Programming Sensor-driven Autonomic Applications

Technical advances are leading to a pervasive computational ecosystem that integrates computing infrastructures with embedded sensors and actuators, giving rise to a new paradigm for monitoring, understanding, and managing natural and engineered systems - one that is information/data-driven and autonomic. The overarching goal of this research is to develop sensor system middleware and programming support that will enable distributed networks of sensors to function not only as passive measurement devices, but as intelligent data processing instruments, capable of data quality assurance, statistical synthesis, and hypothesis testing as they stream data from the physical environment to the computational world.

Read more...

Autonomic Reconfigurable Financial Data Processor with Power Management and Fault Tolerance for Grid Computing with Stock Exchange Data

Power management is the most serious issue confronting grid computing. Fault tolerance is also becoming an issue, because the hardware is now vulnerable to logic errors caused by cosmic rays. The complexity of the reconfigurable hardware and software needed to meet these objectives requires an autonomic approach in a data center with 30,000+ microprocessors. The goal of this project is to design a reconfigurable, autonomic, low-power, fault-tolerant custom processor that translates stock exchange trading data into a canonical format for a data center. This processor will be used in the Message Board middleware system at Merrill Lynch, at the Distributed Queue level, in a large-scale, multiprocessor data center environment. The processor will use standard streaming I/O hardware interfaces to translate stock exchange data packets into the Merrill Lynch canonical format, accelerating posting to the company's databases by IBM file servers. Merrill Lynch needs this processor to generate canonical-format trading data that can be reliably posted in real time to the company's databases, because file servers do this too slowly. The trading data is not standardized, and its format keeps changing, so this processor must be reconfigurable. It will be a high-speed hardware interpreter of the tables for a lexical analyzer and grammar designed with lex and yacc, so the processor can be easily reconfigured as stock exchange data formats change. This is very likely the first of a new class of reconfigurable processors.
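To illustrate the table-driven translation idea in software terms, the sketch below validates a delimited trade record against a description table and emits a canonical record; reconfiguring for a new feed format means replacing the table, not the engine. The exchange record layout, field names, and canonical format are invented for the example and do not reflect the actual Merrill Lynch formats.

```python
# Illustrative only: a table-driven translator in the spirit of the lex/yacc
# approach described above. Changing feed formats means swapping the table.

import re

# "Table" describing one hypothetical exchange format: field name, pattern, converter.
FEED_A_TABLE = [
    ("symbol",   r"[A-Z]{1,5}",  str),
    ("price",    r"\d+\.\d{2}",  float),
    ("quantity", r"\d+",         int),
    ("side",     r"[BS]",        lambda s: "BUY" if s == "B" else "SELL"),
]

def translate(record: str, table, delimiter="|"):
    """Validate a delimited record against the table and emit a canonical dict."""
    fields = record.strip().split(delimiter)
    if len(fields) != len(table):
        raise ValueError(f"expected {len(table)} fields, got {len(fields)}")
    canonical = {}
    for raw, (name, pattern, convert) in zip(fields, table):
        if not re.fullmatch(pattern, raw):
            raise ValueError(f"field {name!r} does not match {pattern!r}: {raw!r}")
        canonical[name] = convert(raw)
    return canonical

if __name__ == "__main__":
    print(translate("IBM|125.40|300|B", FEED_A_TABLE))
    # {'symbol': 'IBM', 'price': 125.4, 'quantity': 300, 'side': 'BUY'}
```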

Read more...

Interconnect Serializer-Deserializer Jitter Reduction Hardware for Autonomic Computing

In recent multi-core microprocessor and data center computers, interconnect scaling has emerged as a critical issue. At present, parallel 32- and 64-bit buses are being replaced with serializer-deserializer (SERDES) buses, where one pair of differential wires carries the bus between two chips. The issue that forced this change was the skewing of data transitions on the bus. In going from chip to chip, not all of the data or address lines on the bus would change simultaneously, due to electrical crosstalk, electromagnetic interference, and variability in transistors within a single chip. The result was data errors on the bus. With SERDES, instead, we serialize 10 bits of parallel data and shift them out of the transmitter chip at a much higher clock rate than the normal system clock. The clock, in fact, is embedded in the data. The receiver chip recovers the clock and data from the SERDES stream, and parallelizes the data while synchronizing it with the receiver chip's clock.
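As a grossly simplified model of the parallel-to-serial conversion described above, the sketch below shifts 10-bit words out as a single bit stream and reassembles them on the receive side. Real SERDES links also encode the data so the clock can be recovered from bit transitions and must deal with jitter; none of that is modeled here.

```python
# Simplified serialize/deserialize model: 10-bit parallel words become a flat
# bit stream (MSB first) and are reassembled into words on the receive side.

WORD_BITS = 10

def serialize(words):
    """Convert a list of 10-bit words into a flat bit stream."""
    bits = []
    for w in words:
        assert 0 <= w < (1 << WORD_BITS)
        bits.extend((w >> i) & 1 for i in reversed(range(WORD_BITS)))
    return bits

def deserialize(bits):
    """Reassemble 10-bit words from the bit stream."""
    words = []
    for start in range(0, len(bits), WORD_BITS):
        word = 0
        for b in bits[start:start + WORD_BITS]:
            word = (word << 1) | b
        words.append(word)
    return words

if __name__ == "__main__":
    data = [0b1010101010, 0b0000011111, 0b1111100000]
    assert deserialize(serialize(data)) == data
    print("round trip ok")
```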

Read more...

Flip-Flop Architectures Tolerant to Multiple-bit Upsets from Cosmic Rays in Autonomic Hardware

There is an ongoing problem with transient logic errors in hardware caused by cosmic ray hits and by alpha-particle hits. As the solar wind from the Sun hits the Earth's magnetic field and atmosphere, the proton stream undergoes 4 to 5 levels of reaction and is converted into a neutron flux. These neutrons then hit transistors in VLSI circuits and cause a logic upset if their effective charge transfer is greater than the critical charge (Qcrit). Unfortunately, the only way to screen out the neutron flux is with 5 to 6 feet of cement shielding. The alpha-particle flux comes from radioactive decay of trace elements in the chip package. Due to Moore's-law scaling, we went from 90 nm chip features to 45 nm features (this year), and Qcrit decreased by a factor of 4. Baumann has projected that the transient logic errors due to this bombardment in 45 nm technology will be greater than the logic errors in unprotected static RAM. RAM memories had to be protected with Error Correcting Codes starting in the 1970s, and now logic must be protected as well.
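For background on the error-correcting codes mentioned above, here is the classic Hamming(7,4) single-error-correcting code of the kind used to protect memories. Note that it corrects only a single flipped bit per word, which is precisely why multiple-bit upsets, and logic rather than memory, require the new flip-flop architectures this project studies; the code below is background, not the project's technique.

```python
# Classic Hamming(7,4): 4 data bits protected by 3 parity bits, correcting any
# single-bit upset in the 7-bit codeword.

def hamming74_encode(d):                  # d: list of 4 data bits [d1, d2, d3, d4]
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]   # bit positions 1..7

def hamming74_decode(c):                  # c: 7-bit codeword, possibly with one flipped bit
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]        # checks positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]        # checks positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]        # checks positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s3       # position of the erroneous bit, or 0
    if syndrome:
        c = list(c)
        c[syndrome - 1] ^= 1              # correct the single-bit error
    return [c[2], c[4], c[5], c[6]]       # recover d1..d4

if __name__ == "__main__":
    data = [1, 0, 1, 1]
    word = hamming74_encode(data)
    word[5] ^= 1                          # simulate a single-event upset
    assert hamming74_decode(word) == data
    print("corrected single-bit upset")
```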

Read more...