Monday, August 27, 2012

Useful function to find Percentile in Oracle



CUME_DIST:
In Oracle documentation, Description of this function is somewhat difficult to understand. So I am putting description of this function in my words. Technically If I am wrong and any Oracle expert wants to rectify it please let me know
This function can be used to find percentile of any row value within set of rows. It can take numeric or non-numeric input and returns numeric values.
This function can be used in two ways i.e As Aggregate function or Analytic function.
As a Aggregate function:
Function will take any hypothetical or any row value from a table and It will return hypothetical position of Input value within group of rows.
Example:
I am considering table emp which have records like:


Select * from emp;
EMPNOENAMEJOBMGRHIREDATESALCOMMDEPTNO
7369
SMITHCLERK
7902
17-DEC-80
800
 
20
7499
ALLENSALESMAN
7698
20-FEB-81
1600
300
30
7521
WARDSALESMAN
7698
22-FEB-81
1250
500
30
7566
JONESMANAGER
7839
02-APR-81
2975
 
20
7654
MARTINSALESMAN
7698
28-SEP-81
1250
1400
30
7698
BLAKEMANAGER
7839
01-MAY-81
2850
 
30
7782
CLARKMANAGER
7839
09-JUN-81
2450
 
10
7788
SCOTTANALYST
7566
19-APR-87
3000
 
20
7839
KINGPRESIDENT 17-NOV-81
5000
 
10
7844
TURNERSALESMAN
7698
08-SEP-81
1500
0
30
7876
ADAMSCLERK
7788
23-MAY-87
1100
 
20
7900
JAMESCLERK
7698
03-DEC-81
950
 
30
7902
FORDANALYST
7566
03-DEC-81
3000
 
20
7934
MILLERCLERK
7782
23-JAN-82
1300
 
10


Now If I want to find position for the percentile position for 1500 and 1400 commision salary. I can write query like :


SQL> SELECT CUME_DIST(1500, 1400) WITHIN GROUP
 2    (ORDER BY sal, comm) "Cume-Dist of 15500"
 3    FROM emp;

Cume-Dist of 15500
------------------
       .533333333

This Query will give hypothical percentile of salary-1500 and comm.-1400 within whole table emp according to order of sal and comm.
Here we can see that although we haven’t inserted value 1500 and 1400 values in table , It is giving percentile from table for these values.
Now , If I apply group by clause in above query , It will hypothetical percentile for each job group :


SQL> SELECT job, CUME_DIST(1500, 1400) WITHIN GROUP
 2    (ORDER BY sal, comm) "Cume-Dist of 15500"
 3    FROM emp
 4    group by job;

JOB       Cume-Dist of 15500
--------- ------------------
ANALYST           .333333333
CLERK                      1
MANAGER                  .25
PRESIDENT                 .5
SALESMAN                  .8

As a Analytic function:

As It is analytic function, It will be evaluated at last in Query execution after Order by clause in Main query.

Example :

It is finding relative percentile position for each salary value which are in CLERK job group. With Ordering of salary.

SQL> SELECT job, ename, sal, CUME_DIST()
 2    OVER (PARTITION BY job ORDER BY sal) AS cume_dist
 3    FROM emp
 4    WHERE job LIKE 'CLERK'
 5    ORDER BY job, ename, sal, cume_dist;

JOB       ENAME             SAL  CUME_DIST
--------- ---------- ---------- ----------
CLERK     ADAMS            1100        .75
CLERK     JAMES             950         .5
CLERK     MILLER           1300          1
CLERK     SMITH             800        .25

No comments:

Post a Comment