The task of retrieving the top or bottom N rows from a database table (by salary, sales amount, credit, etc.) is often referred to as a "top-N query." This task is fairly common in application development. The most straightforward, but inefficient, way of accomplishing such a query is by retrieving all rows from the database table(s), sorting them by specified criteria, scanning from the top, and then selecting the top N rows. This is definitely not an elegant solution.
PROBLEM
Writing a procedure that retrieves and sorts all rows from a database
table requires lots of code, and it probably won't execute as quickly as
you'd expect
SOLUTION
Use a single SQL statement to perform a top-N query. You can do so
either by using:
1 - The ROWNUM pseudocolumn available
in several versions of Oracle
2 - By utilizing
new analytic functions available in Oracle 8i: RANK() and DENSE_RANK().
In this example, I am going to use table EMP in the queries examples of this article. Here is the table's structure:
Name Null?
Type
----------- -------- ------------
EMPNO NOT NULL
NUMBER(4)
ENAME
VARCHAR2(10)
JOB
VARCHAR2(9)
MGR
NUMBER(4)
HIREDATE
DATE
SAL
NUMBER(7,2)
COMM
NUMBER(7,2)
DEPTNO
NUMBER(2)
Let's start out by updating employee James (Empno = 7900) and setting
his salary to NULL:
UPDATE Emp
SET Sal = NULL
WHERE Empno = 7900;
COMMIT;
Now we'll look at the data in Emp table with 14 rows:
SELECT Empno, Ename, Job, Mgr, Hiredate, Sal FROM Emp;
EMPNO ENAME JOB
MGR HIREDATE SAL
------ ---------- --------- ---------- --------- ----------
7369 SMITH CLERK
7902 17-DEC-80 800
7499 ALLEN SALESMAN
7698 20-FEB-81 1600
7521 WARD SALESMAN
7698 22-FEB-81 1250
7566 JONES MANAGER
7839 02-APR-81 2975
7654 MARTIN SALESMAN
7698 28-SEP-81 1250
7698 BLAKE MANAGER
7839 01-MAY-81 2850
7782 CLARK MANAGER
7839 09-JUN-81 2450
7788 SCOTT ANALYST
7566 19-APR-87 3000
7839 KING PRESIDENT
17-NOV-81 5000
7844 TURNER SALESMAN
7698 08-SEP-81 1500
7876 ADAMS CLERK
7788 23-MAY-87 1100
7900 JAMES CLERK
7698 03-DEC-81
7902 FORD ANALYST
7566 03-DEC-81 3000
7934 MILLER CLERK
7782 23-JAN-82 1300
Using the ROWNUM Pseudocolumn
One way to solve this problem is by using the Oracle pseudocolumn ROWNUM.
For each row returned by a query, the ROWNUM pseudocolumn returns a number
indicating the order in which Oracle selects the row from a table or set
of joined rows. The first row selected has a ROWNUM of 1, the second has
2, and so on. You can use the ROWNUM pseudocolumn to limit the number of
rows returned by a query to 5:
SELECT Empno, Ename, Job, Mgr, Hiredate, Sal
FROM Emp
WHERE ROWNUM < 6;
EMPNO ENAME JOB
MGR HIREDATE SAL
------ ---------- --------- ---------- --------- ----------
7369 SMITH CLERK
7902 17-DEC-80 800
7499 ALLEN SALESMAN
7698 20-FEB-81 1600
7521 WARD SALESMAN
7698 22-FEB-81 1250
7566 JONES MANAGER
7839 02-APR-81 2975
7654 MARTIN SALESMAN
7698 28-SEP-81 1250
Listing 2. ROWNUM pseudocolumn limits the number of rows returned
by the query to five.
If an ORDER BY clause follows ROWNUM in the same query, the rows will
be reordered by the ORDER BY clause.
SELECT Empno, Ename, Job, Mgr, Hiredate, Sal
FROM Emp
WHERE ROWNUM < 6
ORDER BY Sal;
Listing 3 shows the output of the above code.
EMPNO ENAME JOB
MGR HIREDATE SAL
----- ---------- --------- ---------- --------- ----------
7369 SMITH CLERK
7902 17-DEC-80 800
7521 WARD SALESMAN
7698 22-FEB-81 1250
7654 MARTIN SALESMAN
7698 28-SEP-81 1250
7499 ALLEN SALESMAN
7698 20-FEB-81 1600
7566 JONES MANAGER
7839 02-APR-81 2975
Watch Out! Because
the ROWNUM is assigned upon retrieval, it is assigned prior to any sorting!
This is opposite to the result you would get in SQL Server using
the TOP clause. In order to select employees with the highest five salaries,
you have to force sorting and then apply ROWNUM condition. Here is the
syntax for a top-N query where N = 5 (this syntax with the subquery works
only in Oracle 8i):
SELECT Empno, Ename, Job, Mgr, Hiredate, Sal
FROM
(SELECT Empno, Ename, Job, Mgr, Hiredate,
Sal
FROM
Emp
ORDER
BY NVL(Sal, 0) DESC)
WHERE ROWNUM < 6;
Listing 4 shows the output of the above code. I used the NVL() function to sort the expression because sorting just by Emp_Salary would have put all records with NULL salary before those with the highest salaries, and that's not what we wanted to achieve.
EMPNO ENAME JOB
MGR HIREDATE SAL
----- ---------- --------- ---------- --------- ----------
7839 KING PRESIDENT
17-NOV-81 5000
7788 SCOTT ANALYST
7566 19-APR-87 3000
7902 FORD ANALYST
7566 03-DEC-81 3000
7566 JONES MANAGER
7839 02-APR-81 2975
7698 BLAKE MANAGER
7839 01-MAY-81 2850
Listing 4. After forced sorting and applying the ROWNUM condition.
Utilizing
Oracle 8i's Ranking Functions
Another way to perform a top-N query uses the new Oracle 8i feature
called "analytic functions" . The SQL language, while extremely
capable in many areas, has never provided strong support for analytic tasks,
such as computing rankings, cumulative and moving averages, lead/lag comparisons,
and reporting.
Oracle 8i now provides a new wide set of analytic functions that address
this need.
For a top-N query you can use two ranking functions: RANK and DENSE_RANK.
Both allow you to rank items in a group—for example, finding the top-five
employees by salary.
The difference between RANK() and DENSE_RANK() is that RANK() leaves
gaps in the ranking sequence when there are ties. In our case, Scott and
Ford tie for second place with a $3,000 salary; Jones' $2,975 salary brings
him in third place using DENSE_RANK() but only fourth place using RANK():
SELECT Empno, Ename, Job, Mgr, Sal,
RANK() OVER
(ORDER BY SAL Desc NULLS LAST) AS Rank,
DENSE_RANK() OVER
(ORDER BY SAL Desc NULLS LAST) AS Drank
FROM Emp
ORDER BY SAL Desc NULLS LAST;
Listing 5 shows the output of the above code.
EMPNO ENAME JOB
MGR SAL
RANK DRANK
------ ---------- --------- ---------- ---------- ---------- ----------
7839 KING PRESIDENT
5000 1
1
7788 SCOTT
ANALYST 7566
3000 2
2
7902 FORD
ANALYST 7566
3000 2
2
7566 JONES
MANAGER 7839
2975 4
3
7698 BLAKE MANAGER
7839 2850
5 4
7782 CLARK MANAGER
7839 2450
6 5
7499 ALLEN SALESMAN
7698 1600
7 6
7844 TURNER SALESMAN
7698 1500
8 7
7934 MILLER CLERK
7782 1300
9 8
7521 WARD SALESMAN
7698 1250
10 9
7654 MARTIN SALESMAN
7698 1250
10 9
7876 ADAMS CLERK
7788 1100
12 10
7369 SMITH CLERK
7902 800
13 11
7900 JAMES CLERK
7698
14 12
Listing 5. Compare the difference between RANK() and DENSE_RANK()
in the last columns of this output.
The NULLS FIRST | NULLS LAST clause determines the position of rows with NULL values in the ordered query.
If the sequence is in descending order, then NULLS LAST implies that NULL values are smaller than non-NULL ones and rows with NULLs will appear at the bottom of the list. If the NULLS FIRST | NULLS LAST clause is omitted, then NULL values are considered larger than any other values and their ordering position depends on the ASC | DESC arguments.
If the ordering sequence is ascending (ASC), then rows with NULLs will
appear last; if the sequence is descending (DESC), then rows with NULLs
will appear first. NULLs are considered equal to other NULLs and, therefore,
the order in which rows with NULLs are presented is nondeterministic.
Using RANK() to Obtain a Top-N Query
To obtain a top-N query, use RANK() in a subquery and then apply a
filter condition outside the subquery:
SELECT Empno, Ename, Job, Mgr, Hiredate, Sal
FROM
(SELECT Empno, Ename, Job, Mgr, Hiredate, Sal,
RANK() OVER
(ORDER BY SAL Desc
NULLS LAST) AS Emp_Rank
FROM Emp
ORDER BY SAL Desc NULLS LAST)
WHERE Emp_Rank < 6;
Listing 6 shows the output of the above code.
EMPNO ENAME JOB
MGR HIREDATE SAL
------ ---------- --------- ---------- --------- ----------
7839 KING PRESIDENT
17-NOV-81 5000
7788 SCOTT ANALYST
7566 19-APR-87 3000
7902 FORD ANALYST
7566 03-DEC-81 3000
7566 JONES MANAGER
7839 02-APR-81 2975
7698 BLAKE MANAGER
7839 01-MAY-81 2850
Listing 6. Obtaining a top-N query by using RANK() in a subquery
and then applying a filter condition outside the subquery.
Using the same technique, you can retrieve the bottom-five employees
by salary:
SELECT Empno, Ename, Job, Mgr, Hiredate, Sal
FROM
(SELECT Empno, Ename, Job, Mgr, Hiredate, Sal,
RANK() OVER
(ORDER BY SAL ASC
NULLS FIRST) AS Emp_Rank
FROM Emp
ORDER BY SAL ASC NULLS FIRST)
WHERE Emp_Rank < 6;
Listing 7 shows the output of the above code.
EMPNO ENAME JOB
MGR HIREDATE SAL
----- ---------- --------- ---------- --------- ----------
7900 JAMES CLERK
7698 03-DEC-81
7369 SMITH CLERK
7902 17-DEC-80 800
7876 ADAMS CLERK
7788 23-MAY-87 1100
7521 WARD SALESMAN
7698 22-FEB-81 1250
7654 MARTIN SALESMAN
7698 28-SEP-81 1250
Listing 7. Using the same technique shown in Listing 6 to retrieve
the bottom-five employees by salary.
Ranking functions can be used to operate within groups, too—that is,
the rank value gets reset whenever the group changes. This is achieved
with a PARTION BY subclause. Here is the syntax to retrieve the top employee
by salary per manager group:
SELECT Empno, Ename, Job, Mgr, Hiredate, Sal
FROM
(SELECT Empno, Ename, Job, Mgr, Hiredate, Sal,
RANK() OVER
(PARTITION BY MGR
ORDER BY MGR, SAL DESC NULLS LAST) AS Emp_Rank
FROM Emp
ORDER BY MGR, SAL DESC NULLS LAST)
WHERE Emp_Rank = 1;
Listing 8 shows the output of the above code.
EMPNO ENAME JOB
MGR HIREDATE SAL
------ ---------- --------- ---------- --------- ----------
7788 SCOTT ANALYST
7566 19-APR-87 3000
7902 FORD ANALYST
7566 03-DEC-81 3000
7499 ALLEN SALESMAN
7698 20-FEB-81 1600
7934 MILLER CLERK
7782 23-JAN-82 1300
7876 ADAMS CLERK
7788 23-MAY-87 1100
7566 JONES MANAGER
7839 02-APR-81 2975
7369 SMITH CLERK
7902 17-DEC-80 800
7839 KING PRESIDENT
17-NOV-81 5000
Listing 8. Using a ranking function within a group.
As you can see, analytic functions are extremely useful in all types of analysis and computations, and they provide substantial SQL extensions to Oracle 8i.