SQL-related analytical functions are very common in the world of Big Data. For instance, there is always a need to retrieve n-tier levels of data from a database. A good example is top half of a data slice versus bottom half of the same data slice (top n-reporting versus bottom n-reporting). It is even possible to compute inner n-reporting.
Taking Oracle as a database, it is best to illustrate this concept via examples.
First, create a table called myEmployees.
create table myEmployees
( department_id number,
Second, insert data into the table.
insert into myEmployees values ( 10, ‘Dave’, 100);
insert into myEmployees values ( 20, ‘Patel’, 300);
insert into myEmployees values ( 10, ‘Divyesh’, 200);
insert into myEmployees values ( 20, ‘Kassam’, 400);
insert into myEmployees values ( 10, ‘Asher’, 500);
insert into myEmployees values ( 30, ‘Teh’, 600);
Now, let us explain the concept of ROW_NUMBER
In Oracle, ROW_NUMBER is considered to be an analytic function. It assigns unique values for each row returned by a partition. Conceptually, one can think of a “partition by” as a “group of”. So for the query below, ROW_NUMBER assigns a unique value for row returned from a partition of (group of) department ids which have been ordered by employee ids.
Bear in mind, values need not be unique across partitions.
select department_id, last_name, employee_id, row_number()
over (partition by department_id order by employee_id) as emp_id
Its output is shown below.
The query below is a good example of inner n-reporting.
select last_name from
(select last_name, row_number() over (order by last_name) rank from myEmployees)
where rank between 2 and 3;
The nested “select” query in blue text from the SQL statement above returns the data below.
The overall query specifies rank between 2 and 3 inclusive (in red text). For rank=2, we have “Dave” and for rank=3, we have “Divyesh”. This is what we get in the output below.