Defining IO is an interesting problem in itself. Technically, IO is probably anything that communicates with something off-chip, so accessing main memory could be considered IO, although I don't plan to treat it that way. My current plan is to focus on disk and network issues, but even there things get fuzzy: solid state drives sit behind a traditional disk API, yet they have nothing akin to rotational latency. At some point I may decide to create one chapter focusing on storage (cache, memory, and "disks," regardless of technology) and another on network IO, but for now, the traditional view that IO largely means disk and network access seems reasonable to me. I welcome comments on the best way to organize these issues.
So what are the topics relevant to the creation of sizzling disk and network IO? Here are the main ones on my list:
- Prefetching, buffering, and caching (by hardware, drivers, OSes, language runtime systems, and applications)
- Asynchronous and concurrent reads/writes (including disk striping)
- Memory-mapping files
- Avoiding disk fragmentation
- Network protocol choices (e.g., TCP vs. UDP)
- Doing IO on deltas instead of full data sets
- Reducing network distances (e.g., via CDNs)
- Data compression and "bundling" (e.g., CSS sprites)
- Latency vs. bandwidth
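
To make two of the items above a bit more concrete (buffering and memory-mapping files), here is a minimal Python sketch that counts newlines in a file two ways: once by reading fixed-size chunks, and once through a memory mapping. The function names, the 64 KB chunk size, and the sample file are all my own assumptions for illustration. The sketch makes no claim about which approach is faster; that depends on the OS, the storage device, and the access pattern, so measure before choosing.

```python
import mmap
import os
import tempfile


def count_newlines_buffered(path, chunk_size=1 << 16):
    """Count newline bytes by reading the file in fixed-size chunks.

    chunk_size (64 KB here) is an arbitrary illustrative value.
    """
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            total += chunk.count(b"\n")
    return total


def count_newlines_mmap(path):
    """Count newline bytes by memory-mapping the file read-only."""
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return 0  # an empty file cannot be memory-mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            total = 0
            pos = m.find(b"\n")
            while pos != -1:
                total += 1
                pos = m.find(b"\n", pos + 1)
            return total


def make_sample_file(lines=1000):
    """Create a temporary file with the given number of lines (hypothetical test data)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for i in range(lines):
            f.write(b"line %d\n" % i)
    return path


if __name__ == "__main__":
    sample = make_sample_file()
    try:
        print(count_newlines_buffered(sample), count_newlines_mmap(sample))
    finally:
        os.remove(sample)
```

Both functions return the same count; the difference is in who manages the data movement. In the chunked version the application chooses the buffer size, while in the memory-mapped version paging is left to the OS.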