There are at least 5 options to consider:
The upside to this is:
- Increased scalability horizontally. If you need higher capacity, you can add another machine or node. You do not need to necessarily buy bigger machines -- you need more machines.
- Proven technology. An OPS has been around for a long time, we have many people using this -- it is well known and understood by many people. There are design considerations you have to be aware of -- but they are well known (which is half the battle). For example, partitioning will play a key role in designing an application with OPS.
- Fairly transparent. As opposed to replication for example -- replication will impact how the application interacts with data. OPS will not. At the application level, the fact you are using OPS is not as relevant as it would be with replication. OPS will impact the DBA but not necessarily the programmer. Replication will impact both.
Some of the downsides are:
- You have to design to work with OPS. You have to understand OPS. You need to be aware of the potential performance bottlenecks. In 8.1.6 (Oracle8i, release 2) there is a special mode of OPS in a 2 node cluster where 100% of the performance issues are a NON-issue. Both nodes are used to read data -- only one at a time to modify. The second node will take over for the first node in the event of a failure.
- Performing upgrades will most likely cause server downtime. Typically (there are exceptions) all nodes would be offline to upgrade. The time to perform the upgrade will vary but may be as long as 1 hour. These are upgrades to the Oracle software -- not to the OS.
- DBA must be fully aware of an OPS environment. Demands a DBA who
knows OPS, not just an Oracle DBA.
Replication is asynchronous in nature - you have to assume that at any point in time, the various databases can and will contain different data as they are all at different points in time. This fact can make application development somewhat harder. Let's say the site you are building is a site to sell stuff. You have an inventory. You have X widgets in inventory. At 2 different sites, people simultaneously order X widgets. As far as your software is concerned, both orders can be filled. It is only AFTER the transactions at both sites have been performed that you discover in the background that you now have -X widgets in inventory (you had an order for 2X widgets). This is known as an update conflict and it is up to you to 'fix it'. Oracle discovers it for you but will then invoke user-defined code to fix the problem. This process must be part of your application design.
You can decide to avoid conflicts by partitioning data (e.g.: West Coast data and East Coast data, you need to go to the west coast server to affect west coast data and vice versa). Many people do this however if you fail over from the west coast to the east coast -- you must remember that there probably are transactions queued in the failed west coast machine that will be propagated to the east coast when you fix the problem. You still have the potential for update conflicts when using replication for failover -- you still need to design for how you will handle these issues.
To answer your replication questions directly:
Replication vs. Parallel Server - which one would you recommend and for what reasons? What are the pros and cons of each (keep in mind scalability, redundancy, availability and backup issues).
If failover and availability are the key issues -- OPS wins out in my opinion.
If redundancy is the main issue -- replication is the answer (as OPS is not
a redundant solution -- there is only one database).
They are equal in the backup area (same issues, same considerations). Replication
may be better for uptime and backup purposes, but how is response?
Both replication and OPS have certain performance implications. Both will have to be designed for in many respects. With OPS and a 2-node configuration -- we can eliminate the performance considerations. With OPS and a single write node and many (more then 2) read nodes, we can eliminate most of the performance considerations. With replication, a client transaction will generate about 2x the IO for any given WRITE transaction and that will not affect read transactions at all.
In short -- both will have issues. Replication - Long distance (for disaster recovery purposes) - how would the replication work? Would you be able to update both DB.s, then the changes get written to both of them (during the update/replicate process) or is only one database updated then the changes from that database get written to the second/replicated database?
Replication works in a store and forward fashion. The client updates a row in db1, db1 will not only update that row but it will queue a transaction (locally). After the client commits, db1 will pick up the transaction from the queue and send it to db2, db3, ... and so on. You can update the same row in any number of databases -- but you have to be able to handle the resulting update conflicts this causes -- Oracle detect them, you fix them. OPS would allow at most 1 person at a time to modify a single row (just like a single database would since OPS is a single database).
Some of the upsides of replication are:
- Replication works over long distances and WANS.
- Replication is tolerant of short network outages between database servers (short is relative here and is a function of the amount of changes performed in a given period of time. You do not want the servers to get too far out of sync. It could affect the application as well if the data diverges too much).
- Replication imposes a 'fixed' overhead (2X the data modified for inserts/deletes, 3X for updates (before and after images are stored in the queue).
- Replication provides you with a copy of the data elsewhere. With
OPS -- there is one database. An earthquake can take out the only copy.
Some of the downsides are:
- Conflict resolution and its impact must be carefully studied and designed
for.
- You need a DBA familiar with this technology.
- Your developers need to be familiar with the impact of this technology.
(NOTE: you can use OPS and replication together. OPS for rapid failover,
replication for a hot spare. Something else to consider).
Some of the upsides of standby databases are:
- They are extremely easy to setup.
- Used for disaster and/or failure tolerance
- They have no performance impact on the master machine.
- Real-time synchronization of the standby database
- Standby database may be used for read-only transactions (8i only)
- They impose no design considerations on the application.
Some of the downsides are:
- The DBA must be aware of the standby database. If you make structural
changes to the master, they must be coordinate to the standby. This is minimal.
- Once you failover to the standby -- there is *no* failback. The standby
will become the master. You will build a new standby to cover this new master.
- Only one standby database is supported
- Must have sufficient network bandwidth between master and standby databases
to transfer the archived log files.
- Separate license is required
NOTE = Oracle 8I send the archived log files automatically to the destination, also automatically applies the archived redo logs to the standby DB as they arrive.
Yahoo on the other hand has lots of little machines with lots of little databases. You want stocks -- you go to quote.yahoo.com. You want some business news -- you goto that server. Tech news - else where. You want a random site, that is random.yahoo.com. And so on. Yahoo is a whole bunch of databases working together. If part of yahoo is down -- hardly anyone notices since each server is just doing a little part of the whole. Yahoo never goes down but at any point in time, part of it might be. This requires lots more design and a really good / redundant directory service (to find stuff) but is doable.
Whether this can work for you depend on the nature of your site.
For more information, you can read this article: Implementing
an Automated Standby Database
Replication | Standby Database | Geographic Mirroring | Geo-mirrored Standby | |
Overhead on Production System | High, limit to max transactions per second | Low | Medium | Low to medium |
Potential Data and Transaction Loss | Yes (transactions that have not been replicated to remote site when disaster occurs) |
Yes (transactions in active logs of production site that have not been transferred to standby site) |
No | No |
Concurrent Use of Failover System | Yes | Read-only Reporting Only | No (except other applications until failover initiated) |
Read-only reporting only |
Fall Back or Recovery Technique | Instance and media recovery with conflict resolution or rebuild | Re-clone the database | Storage re-synchronization | Storage re-synchronization or re-cloan the database |
Separation Distance | Thousands of miles | Thousands of miles | Large performance impact for distances over 66 km | hundres of miles |
Application support | Application must be replication aware | any application should work | any application should work | any application should work |
System Environment | heterogeneous mix of systems vendors and Oracle versions | Same systems vendor and Oracle version | Same systems vendor and Oracle version | Same systems vendor and Oracle version |
Cost | Medium | Low | High | Medium to High |
Complexity | High | Medium | Low | Medium |