RAID Configuration over NT Servers

RAID Configuration over NT Servers There is significant confusion in many organizations using ORACLE on NT over when and what RAID technologies should be used. This article will attempt to make this information very clear and understandable.

Types of RAID available with NT-compatible hardware
Vendors typically offer the following choices for RAID configurations:

RAID0
Disk stripping. RAID0 requires at least two physical disks. Data is read and written across multiple drives, so disk I/O is relatively evenly spread. Writes can occur in a block or streaming manner (similar to non-RAIDed disks) as requested by the operating system. Disk failure results in lost data. Compared to a single disk drive, RAID0 has the following attributes:
- Better read performance
- Better write performance
- Inexpensive in cost
- Not fault-tolerant
- Storage equivalent to sum of physical drive storage in the array
- Readily available from most vendors

RAID1
Disk mirroring. RAID1 requires two physical disks. Logical writes are done by physically writing the data to both disks simultaneously, and can typically be done in a block manner or streaming manner, as requested by the operating system. Reads can be done using either disk. In the event of a disk failure, data can still be retrieved and written to the surviving disk. Compared to a single disk drive, RAID1 has the following attributes:
- Better read performance
- Similar write performance
- Expensive
- Fault-tolerant
- Storage equivalent to 1/2 the sum of the physical drive storage in the mirrored set.
- Readily available from most vendors

RAID5
Disk stripping with parity. RAID5 requires at least three physical disks. On a logical write, a block of data is physically written to disk, parity information is calculated using the block just written plus blocks already existing on disk, then the parity information is written to disk. In RAID5, the parity information is rotated among the physical disks to prevent bottlenecks caused by a dedicated parity disk. Note that writes occur in a block manner regardless of whether the O/S is sending a stream of data to be written or requests to write whole blocks. On a logical read, data is read from multiple disks in a manner very similar to RAID0. In the event of a disk failure, data can be reconstructed on the fly using the parity information. Compared to a single disk drive, RAID5 has the following attributes:
- Data is stripped across multiple physical disks and parity data is stripped across storage equivalent to one disk.
- Better read performance
- Poorer write performance
- Inexpensive
- Fault-tolerant
- Storage equivalent to N - 1 times the number of physical drives in the array.
- Readily available from most vendors

RAID10 (or RAID0+1)
Mirrored stripe sets. RAID10 requires at least 4 physical drives, and combines the performance gains of RAID0 with the fault-tolerance and expense of RAID1. Data is written simultaneously to two mirrored sets of striped disks in blocks or streams. Reads can be performed against either striped set. In the event of a failure of a disk drive in one striped set, data can be written to and read from the surviving striped set. Compared to a single disk drive, RAID10 has the following attributes:
- Better read performance
- Better write performance
- Expensive
- Fault-tolerant
- Storage is 1/2 of the sum of the physical drives' storage
- Currently available from only a few vendors (at the time of this writing)

Possible configurations using 4 physical disks:

Configuration Number of disks Available space Max Reads/Sec Max Writes/Sec

Single disk 1 4 GB 60 60

RAID0 4 16 GB 240 240

RAID1 4 8 GB 240 (2 arrays) 120 (2 arrays)

RAID5 4 12 GB 180 60

Possible configurations using 6 physical disks:

Configuration Number of disks Available space Max Reads/Sec Max Writes/Sec

Single disk 1 4 GB 60 60

RAID0 6 24 GB 360 360

RAID1 6 12 GB 360 (3 arrays) 180 (3 arrays)

RAID5 6 20 GB 300 90

As can be seen from the charts, RAID0 offers good read and write performance, but no fault tolerance. RAID1 offers good read performance, and half as much write performance, but provides fault-tolerance. RAID5 reclaims most of the space lost to RAID1, provides fault-tolerance, offers reasonably good read performance, but poor write performance. (In fact, RAID5 requires 4 disks to regain the same write performance as a single disk). Also, note that streaming logical writes, as well as block-level logical writes, to RAID5 arrays are handled as block-level physical writes. Finally, read or write workload capacity can be increased in any RAID configuration by adding physical disks.

ORACLE database files on RAID
Given the information regarding the advantages and disadvantages of various RAID configurations, how does this information apply to an ORACLE instance? The discussion below will provide information about how database files are used by an ORACLE instance under OLTP and DSS classifications of workload. In general, it is the author's opinion that fault-tolerant disk configurations should be used to improve availability unless performance requirements and budgetary constraints entirely prohibit it's use. Note that the perspectives presented below are very sensitive to the number of users: if your organization has a 10-20 user OLTP system (and thus, a low throughput requirement), then you may get very acceptable performance with all database files stored on RAID5 arrays. On the other hand, if your organization has a 100 user OLTP system (resulting in a higher throughput requirement), then a different RAID configuration may be absolutely necessary. An initial configuration can be outlined by estimating the number of transactions (based on the number of users), performing adjustments to encompass additional activity (such as hot backups, nightly batch jobs, etc.), then performing the necessary mathematical calculations. It is strongly suggested to compare this information to published benchmarks for validation. (Please see the article in this section on sizing.)

OLTP (On-line transaction processing) workloads
Characterized by multi-user concurrent INSERTS, UPDATES, and DELETES during normal working hours, plus possibly some mixture of batch jobs nightly. Large SELECTS may generate reports, but the reports will typically be "canned" reports rather than ad-hok queries. The focus, though, is on enabling update activity that completes within an acceptable response time. Ideally, each type of database file would be spread out over it's own private disk subsystem, although grouping certain types of files together (when the number of disks, arrays, and controllers is less than ideal) may yield adequate performance. (Please see the article on Instance tuning for information regarding groupings of database files in an OLTP system.)

Redo logs.
During update activity, redo logs are written to in a continuous and sequential manner, and are not read under normal circumstances. RAID5 would be the worst choice for performance. Oracle Corporation recommends placing redo logs on single non-RAIDed disk drives, under the assumption that this configuration provides the best overall performance for simple sequential writes. Redo logs should always be multiplexed at the ORACLE software level, so RAID1 provides few additional benefits. Since non-RAID and RAID0 configurations can vary with hardware from different vendors, the organization should contact their hardware vendor to determine whether non-RAIDed disks or RAID0 arrays will yield the best performance for continuous sequential writes. Note that even if redo logs are placed on RAID1 arrays that the redo logs should still be mirrored at the ORACLE level. When the log writer process determines that it does not know whether the contents of a particular redo log are valid, it will mark that redo log as "STALE" in the V$LOG table. If this redo log is the only copy, then it cannot be archived, and will cause a database halt (assuming the database is running in archivelog mode). If redo logs are multiplexed as recommended by Oracle Corporation, then the archiver process will choose a copy of the redo log that is not marked as "STALE", thus generating no interruptions. If the redo logs are mirrored only at the hardware level, then both copies of the redo log are "STALE".

Archive logs
As redo logs are filled, archive logs are written to disk one whole file at a time (assuming, of course, that the database is running in archivelog mode), and are not read under normal circumstances. Any RAID or non-RAID configuration could be used, depending upon the performance requirements and size of the redo logs. For instance, if the redo logs are large, then they will become full and be archived less often. If an archive log is likely to be written no more than once per minute, then RAID5 may provide acceptable performance. If RAID5 proves too slow, then a different RAID configuration can be chosen, or the redo logs can simply be made larger. Numerous early ORACLE installations wrote redo logs directly to tape rather than disk, so reasonable sizing of redo logs can unquestionably minimize the write requirements enough to make RAID5 performance acceptable in small to medium volume installations. Note that a fault-tolerant configuration is advisable: if the archive log destination becomes unavailable, the database will halt.

Rollback Segments
As modifications are made to the database tables, undo information is written to the buffer cache in memory. These rollback segments are used to to maintain commitment control and read consistency. Rollback segment data is periodically flushed to disk by checkpoints. Consequently, the changes to the rollback segments are also recorded in the redo logs. However, a smaller amount of information is typically written to the rollback segments than to the redo logs, so the write rate is less stringent. A fault-tolerant configuration is advisable, since the database cannot operate without rollback segments, and recovery of common rollback segments will typically require an instance shutdown. If the transaction rate is reasonably small, RAID5 may provide adequate performance. If it does not, then RAID1 (or RAID10) should be considered.

User tables and indexes
As updates are performed, these changes are stored in memory. Periodically, a checkpoint will flush the changes to disk. Checkpoints occur under two normal circumstances: a redo log switch occurred, or the time interval for a checkpoint expired. (There are a variety of other situations that trigger a checkpoint. Please check the ORACLE documentation for more detail.) Like redo log switches and generation of archive logs, checkpoints can normally be configured so that they occur approximately once per minute. Recovery can be performed up to the most recent checkpoint, so the interval should not be too large for an OLTP system. If the volume of updated data written to disk at each checkpoint is reasonably small (ie. the transaction rate is not extremely large), then RAID5 may provide acceptable performance. Additionally, analysis should be performed to determine the ratio of reads to writes. Recalling that RAID5 offers reasonably good read performance, if the percentage of reads is much larger than the percentage of writes (for instance, 80% to 20%), then RAID5 may offer acceptable performance for small, medium, and even some large installations. A fault-tolerant configuration is preferable to maximize availability (assuming availability is an objective of the organization), although only failures damaging datafiles for the SYSTEM tablespace (and active rollback segments) require the instance to be shutdown. Disk failures damaging datafiles for non-SYSTEM tablespaces can be recovered with the instance on-line, meaning that only the applications using data in tablespaces impacted by the failure will be unavailable. With this in mind, RAID0 could be considered if RAID5 does not provide the necessary performance. If high availability and high performance on a medium to large system are explicit requirements, then RAID1 or RAID10 should be considered.

Temp segments
Sorts too large to be performed in memory are performed on disk. Sort data is written to disk in a block-oriented. Sorts do not normally occur with INSERT/UPDATE/DELETE activity. Rather, SELECTS with ORDER BY or GROUP BY clauses and aggregate functions (ie. operational reports) , index rebuilds, etc., will use TEMP segments only if the sort is too large to perform in memory. Temp segments are good candidates for non-RAIDed drives or RAID0 arrays. Fault-tolerance is not critical: if a drive failure occurs and datafiles for a temp segment are lost, then the temp segment can either be recovered in the normal means (restore from tape and perform a tablespace recovery), or the temp segment can simply be dropped and re-created since there is no permanent data stored in the temp segment. Note that while a temp segment is unavailable, certain reports or index creations may not execute without errors, but update activity will typically not be impacted. With this in mind, RAID1 arrays are a bit unnecessary for temp segments, and should be used for more critical database files. RAID5 will provide adequate performance if the sort area hit ratios are such that very few sorts are performed on disk rather than in memory.

Control files
Control files are critical to the instance operation, as they contain the structural information for the database. Control files are updated periodically (at a checkpoint and at structural changes), but the data written to the control files is a very small quantity compared to other database files. Control files, like redo logs, should be multiplexed at the ORACLE software level onto different drives or arrays. Non-RAIDed drives or or any RAID configuration would be acceptable for control files, although most organizations will typically distribute the multiple copies of the control files with the other database files, given that the read and write requirements are so minimal. For control files, maintaining multiple copies in different locations should be favored over any other concern.

Software and static files
The ORACLE software, configuration files, etc. are very good candidates for RAID5 arrays. This information is not constantly updated, so the RAID5 write penalty is of little concern. Fault-tolerance is advisable: if the database software (or O/S software) becomes unavailable due to a disk failure, then the database instance will abort. Also, recovery will include restore or re-installation of ORACLE software (and possible operating system software) as well as restore and recovery of the database files. RAID5 provides the necessary fault-tolerance to prevent this all-inclusive recovery, and good read performance for dynamic loading and unloading of executable components at the operating system level.

DSS (Decision Support System) workloads
In comparison to OLTP systems, DSS or data warehousing systems are characterized by primarily SELECT activity during normal working hours, and batch INSERT, UPDATE, and DELETE activity run on a periodic basis (nightly, weekly, or monthly). There will typically be a large amount of variability in the number of rows accessed by any particular SELECT, and the queries will tend to be of a more ad-hock nature. The number of users will typically be smaller than their ajoining OLTP systems (where the data originates). The focus is on enabling SELECT activity that completes within an acceptable response time, while insuring that the batch update activity still has capacity to complete in it's allowable time window. Note now that there are two areas of performance over which to be concerned: periodic refreshes and ad-hock read activity. The general level directive in this case should be to configure the database such that read-only performed by end users is as good as it can get without rendering refreshes incapable of completion. As with OLTP systems, each type of database file would ideally have it's own private disk subsystem (disks, arrays, and controller channel), but with less than ideal resources certain grouping tend to work well for DSS systems. (Please see the article on Instance tuning for information on these groupings.)

Redo logs
Redo logs are only written to while update activity is occurring. In a DSS-oriented system, a significant portion of data entered interactively during the day may loaded into the DSS database during only a few hours. Given this characteristic, redo logging may tend to be more of a bottleneck on periodic refresh processes of a DSS database than on it's ajoining OLTP systems. If nightly loads are taking longer than their allowance, then redo logging should be the first place to look. The same RAID/non-RAID suggestions that apply to redo logging in OLTP also apply with DSS systems. As with OLTP systems, redo logs should always be mirrored at the ORACLE software level, even if they are stored on fault-tolerant disk arrays.

Archive logs
Like redo logging, archive logs are only written out during update activity. If the archive log destination appears to be over-loaded with I/O requests, then consider changing the RAID configuration, or simply increase the size of the redo logs. Since there is a large volume of data being entered in a short period of time, it may be very reasonable to make the redo logs for the DSS or data warehouse much larger (10 or more times) than the redo logs used by the OLTP system. A reasonable rule of thumb is to target about one log switch per hour. With this objective met, then the disk configuration and fault-tolerance can be chosen based on the same rules used for OLTP systems.

Rollback Segments
Again like redo logging, rollback segments will be highly utilized during the periodic refreshes, and virtually unused during the normal work hours. Use the same logic for determining RAID or non-RAID configurations on the DSS database that would be used for the OLTP systems.

User tables and indexes
Writes are done to tablespaces containing data and indexes during periodic refreshes, but during the normal work hours read activity on the table and indexes will typically far exceed the update work performed on a refresh. A fault-tolerant RAID configuration is suggested to sustain availability. However, in most cases the business can still operate if the DSS system is unavailable for several hours due to a disk failure. Information for strategic decisions may not be available, but orders can still be entered. If the DSS has high availability requirements, select a fault-tolerant disk configuration. If RAID5 arrays can sustain the periodic refresh updates, then it is typically a reasonably good choice due to it's good read performance. As seen above, the read and write workload capacities can be adjusted by adding physical drives to the array.

Temp segments
In a decision support system or data warehouse, expect temp segment usage to be much greater than what would be found in a transaction system. Recalling that temp segments do not store any permanent data and are not absolutely necessary for recovery, RAID0 may be a good choice. Keep in mind, though, that the loss of a large temp segment due to drive failure may render the DSS unusable (unable to perform sorts to answer large queries) until the failed drives are replaced. If availability requirements are high, then a fault-tolerant solution should be selected, or at least considered. If the percentage of sorts on disk is low, then RAID5 may offer acceptable performance; if this percentage is high, RAID1 or RAID10 may be required.

Control files
As with OLTP systems, control files should always be mirrored at the ORACLE software level regardless of any fault-tolerant disk configurations. Since reads and writes to these files are minimal, any disk configuration should be acceptable. Most organizations will typically disperse control files onto different disk arrays and controller cards, along with other database files.

Software and static files
Like OLTP systems, these files should be placed on fault-tolerant disk configurations. Since very little write activity is present, these are again good candidates for RAID5.

Taking the above information into consideration, can an organization run an entire ORACLE database instance on a single RAID5 array? The answer is "yes". Will the organization get a good level of fault-tolerance? Again, the answer is "yes". Will the organization get acceptable performance? The answer is "it depends". This dependency includes the type of workload, the number of users, the throughput requirements, and a whole host of other variables. If the organization has an extremely limited budget, then it can always start with a single RAID5 array, perform the necessary analysis to see where improvement is needed, and proceed to correct the deficiencies.

Configuration	Number of disks	Available space	Max Reads/Sec	Max Writes/Sec
Single disk	1	4 GB	60	60
RAID0	4	16 GB	240	240
RAID1	4	8 GB	240 (2 arrays)	120 (2 arrays)
RAID5	4	12 GB	180	60