Traces of filesystem activity are central to analyzing system behavior, but accurate and comprehensive traces are hard to obtain. Collecting them often requires modifying the filesystems being monitored and adds runtime overhead. As a result, filesystem traces are scarce, and researchers have few sample workloads to draw on.

In this article, we present a portable toolkit that addresses these challenges by deriving approximate traces of NFS (Network File System) activity through passive network monitoring. The toolkit uses a promiscuous Ethernet listener interface to capture and reconstruct NFS-related RPC (Remote Procedure Call) packets, and from them produces detailed traces of NFS activity and of the client system calls that likely caused it. The toolkit is currently in use at Princeton University and other sites, and it is available through anonymous ftp.

Motivation

Traces are a foundation for studying program performance, memory access patterns, and filesystem activity. Filesystem traces, however, are particularly demanding to capture. Filesystem behavior often plays out over long periods, so useful traces may need to span weeks or even months. Modifying the filesystem itself to gather trace data is complex and can introduce undesirable runtime overhead. These difficulties are amplified in distributed filesystems, especially when the network comprises heterogeneous machines. Consequently, relatively few traces of Unix filesystem workloads have been collected, and most of them come from computing research environments.

Given that distributed filesystems transmit their activity over a network, it seems reasonable to leverage network monitoring to obtain traces of such systems. Ethernet-based networks are well-suited for this purpose since traffic is broadcast to all connected machines within a subnetwork. Numerous general-purpose network monitoring tools exist that listen “promiscuously” to the Ethernet they are connected to. However, the information provided by these tools is often insufficient for constructing meaningful filesystem traces, as filesystem operations can span multiple packets and rely on the context of previous operations.

Previous studies have characterized the impact of NFS traffic on network load, but their focus has been on traffic patterns and on statistical models of individual packet sources, destinations, and types. In contrast, our toolkit collects traces of NFS file access activity from the monitored Ethernet traffic, which gives a higher-level view of file access patterns. A “spy” machine with a promiscuous Ethernet interface, connected to the same network as the file server, examines each NFS-related packet and generates traces at an appropriate level of abstraction. These traces give researchers deeper insight into filesystem behavior and can drive trace-driven simulation of filesystem algorithms.

The rpcspy Program

The rpcspy program serves as the interface to the system-dependent Ethernet monitoring facility. Its primary function is to monitor network traffic, extract packets containing NFS data, and present the data in a user-friendly format. By maintaining a table of pending call packets, rpcspy ensures that a complete RPC transaction, consisting of a call and a reply, is emitted as a single record. The output format of rpcspy includes information such as timestamps, execution times, server and client names, RPC command names, and command-specific arguments and return data.
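The pending-call table described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not rpcspy's actual code: the `Packet` and `Transaction` types, the `PendingCalls` class, and the 30-second timeout are all hypothetical names and values chosen for the example. The one protocol fact it relies on is that an RPC reply carries the same transaction ID (XID) as the call it answers.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Packet:
    """A decoded RPC packet (hypothetical structure for this sketch)."""
    time: float        # capture timestamp, in seconds
    src: str           # sending host
    dst: str           # receiving host
    xid: int           # RPC transaction ID, shared by a call and its reply
    is_call: bool      # True for a call, False for a reply
    proc: str = ""     # RPC procedure name (only meaningful on calls)
    body: str = ""     # decoded arguments (call) or results (reply)


@dataclass
class Transaction:
    """One complete call/reply pair, emitted as a single record."""
    call_time: float
    elapsed: float
    server: str
    client: str
    proc: str
    args: str
    results: str


class PendingCalls:
    """Table of calls that are still waiting for their reply."""

    def __init__(self, timeout: float = 30.0):  # timeout value is an assumption
        self.timeout = timeout
        self._pending: Dict[Tuple[str, str, int], Packet] = {}

    def feed(self, pkt: Packet) -> Optional[Transaction]:
        """Record a call, or match a reply and return the finished transaction."""
        self._expire(pkt.time)
        if pkt.is_call:
            self._pending[(pkt.src, pkt.dst, pkt.xid)] = pkt
            return None
        # A reply travels server -> client, so swap the hosts to find its call.
        call = self._pending.pop((pkt.dst, pkt.src, pkt.xid), None)
        if call is None:
            return None  # call was never seen (or already timed out)
        return Transaction(call.time, pkt.time - call.time,
                           call.dst, call.src, call.proc, call.body, pkt.body)

    def _expire(self, now: float) -> None:
        """Drop calls whose reply has not arrived within the timeout."""
        stale = [k for k, c in self._pending.items() if now - c.time > self.timeout]
        for key in stale:
            del self._pending[key]


table = PendingCalls()
table.feed(Packet(1.00, "clientA", "server1", 7, True, "read", "fh=abc off=0"))
txn = table.feed(Packet(1.25, "server1", "clientA", 7, False, body="ok 8192"))
```

Keying the table on the client, the server, and the XID keeps transactions from different hosts apart even if their XIDs happen to collide, and the timeout keeps the table from growing when a reply is lost or never captured.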

One of the key advantages of rpcspy is its configurability. Users can customize which hosts and RPC commands are traced, specify call and reply fields to be printed, tap into specific Ethernet interfaces, and set timeouts and runtime durations. While the primary purpose of rpcspy is to provide input for nfstrace, it can also be used as a simple NFS diagnostic and performance monitoring tool, offering valuable insights into current NFS activity and facilitating the identification of potential issues.

nfstrace: The Filesystem Tracing Package

While rpcspy provides traces of low-level NFS commands, it alone does not capture user-level activity, limiting its usefulness for comprehensive filesystem traces. To bridge this gap, we introduce nfstrace, a filter for rpcspy that produces a log of user-level filesystem commands that are likely to have triggered the monitored activity. nfstrace generates records each time a file is opened, providing a summary of the events. This summary includes timestamps, command times, read or write directions, file IDs, client information, transferred bytes, and file sizes.

It’s important to note that nfstrace produces an approximation of the underlying user activity since NFS does not directly expose open and close commands. Instead, nfstrace employs heuristics to infer the occurrence of these system calls. For example, it assumes that any sequence of NFS read calls on the same file issued by the same client represents a single read open, and the close is assumed to occur when the last read in the sequence completes. Similar rules apply to write operations. While these approximations may not provide perfect accuracy, extensive testing has demonstrated their reliability and effectiveness in capturing the essence of user-level filesystem activity.
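The grouping heuristic can be illustrated with a short Python sketch. The article does not say how nfstrace decides that a sequence of reads has ended, so this example assumes a simple idle-gap rule: if the same client issues no further operation of the same kind on the same file for `idle_gap` seconds, the sequence is considered closed. That rule, the 5-second default, and the `Op`, `Session`, and `infer_sessions` names are assumptions made for illustration, not the toolkit's actual logic.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Op:
    """One completed NFS read or write transaction (hypothetical structure)."""
    time: float       # time the call was issued
    elapsed: float    # time until the reply arrived
    client: str
    fh: str           # file handle
    direction: str    # "read" or "write"
    nbytes: int


@dataclass
class Session:
    """An inferred open..close interval summarizing a run of operations."""
    open_time: float
    close_time: float
    direction: str
    fh: str
    client: str
    nbytes: int


def infer_sessions(ops: Iterable[Op], idle_gap: float = 5.0) -> List[Session]:
    """Group operations by (client, file, direction) into inferred opens.

    The close time is taken to be the completion of the last operation in
    the run. The idle_gap cutoff is an assumption of this sketch.
    """
    active: Dict[Tuple[str, str, str], Session] = {}
    done: List[Session] = []
    for op in sorted(ops, key=lambda o: o.time):
        key = (op.client, op.fh, op.direction)
        finished = op.time + op.elapsed
        cur = active.get(key)
        if cur is not None and op.time - cur.close_time > idle_gap:
            done.append(cur)  # the previous run ended; treat it as closed
            cur = None
        if cur is None:
            active[key] = Session(op.time, finished, op.direction,
                                  op.fh, op.client, op.nbytes)
        else:
            cur.close_time = max(cur.close_time, finished)
            cur.nbytes += op.nbytes
    done.extend(active.values())
    done.sort(key=lambda s: s.open_time)
    return done


ops = [
    Op(0.0, 0.1, "clientA", "fh1", "read", 8192),
    Op(0.2, 0.1, "clientA", "fh1", "read", 8192),
    Op(10.0, 0.1, "clientA", "fh1", "read", 4096),
]
sessions = infer_sessions(ops)
```

With these inputs the first two reads fall into one inferred open and the third, arriving well after the gap, starts a second one. A smaller `idle_gap` splits activity into more opens and a larger one merges them, which is one reason the resulting trace is only an approximation.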

Additionally, nfstrace can optionally map file handles to file names and modes. The mapping is not always complete, but where it is available it adds valuable context by linking each file handle to its name and mode. Because the name associated with a handle can change over time, nfstrace applies simple heuristics to keep the mappings current, which helps keep the trace data accurate.
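One plausible way to build such a mapping is to watch NFS lookup transactions, in which a client supplies a directory handle and a name and the server returns the handle of the named file. The Python sketch below assumes that approach; the article does not describe nfstrace's actual method, and the `HandleNames` class and its method names are hypothetical. File modes are omitted to keep the example short.

```python
from typing import Dict, List, Set, Tuple


class HandleNames:
    """Best-effort map from file handles to path names."""

    def __init__(self) -> None:
        # child handle -> (directory handle, name within that directory)
        self._parent: Dict[str, Tuple[str, str]] = {}

    def saw_lookup(self, dir_fh: str, name: str, fh: str) -> None:
        """Record that looking up `name` in `dir_fh` returned `fh`."""
        self._parent[fh] = (dir_fh, name)

    def saw_remove(self, dir_fh: str, name: str) -> None:
        """Forget any handle known by this directory entry."""
        for fh, entry in list(self._parent.items()):
            if entry == (dir_fh, name):
                del self._parent[fh]

    def path(self, fh: str) -> str:
        """Return the longest path that can be reconstructed for `fh`.

        The path starts at the first handle with no known parent, shown
        in angle brackets, since the monitor may never have seen it named.
        """
        parts: List[str] = []
        seen: Set[str] = set()
        while fh in self._parent and fh not in seen:
            seen.add(fh)  # guard against cycles in inconsistent data
            fh, name = self._parent[fh]
            parts.append(name)
        return "/".join(["<" + fh + ">"] + parts[::-1])


names = HandleNames()
names.saw_lookup("root", "home", "h1")
names.saw_lookup("h1", "notes.txt", "h2")
full_path = names.path("h2")
unknown = names.path("h9")
```

A handle that was looked up before monitoring began simply has no entry, which matches the article's point that the mapping is not always complete.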

Using rpcspy and nfstrace for Filesystem Tracing

The passive-monitoring approach offered by rpcspy and nfstrace presents a compelling alternative to more complex and intrusive methods of obtaining filesystem traces. Unlike approaches that involve kernel modifications or specialized hardware, our toolkit requires no changes to client and server kernels, making it highly accessible and easily deployable across heterogeneous environments. With just under 5000 lines of code, written by a single programmer over a few weeks, rpcspy and nfstrace provide a lightweight and efficient solution for capturing filesystem traces.

The accuracy and reliability of the traces depend heavily on the performance of the machine running rpcspy and nfstrace. It is crucial to verify that the machine can keep up with the network traffic, since packet buffer overruns cause packets to be dropped and trace data to be lost. Because a passive trace can reveal file names and user activity, it also raises privacy concerns, particularly where sensitive data is involved. Users should be informed about the nature and purpose of the trace, and measures should be taken to protect individual privacy, such as disabling name translation or employing encryption techniques when sharing trace data.

Conclusion

The NFS tracing toolkit consisting of rpcspy and nfstrace offers an accessible way to capture and analyze filesystem activity through passive network monitoring. By listening promiscuously on the Ethernet and decoding NFS-related RPC packets, the toolkit lets researchers obtain meaningful, though approximate, traces of NFS file access. Its configurability makes it applicable to a variety of computing environments, and because it requires no kernel changes it is easy to deploy. The resulting traces can be used both to study system behavior and to drive simulations of filesystem algorithms.
