oracle中row_number和rownum的区别和联系(翻译)

2013/08/06 17:20:57 No Comments

附问题:有以下一个SQL语句:

SELECT  *
FROM    (
        SELECT  t.*, row_number() OVER (ORDER BY ID) rn
        FROM    mytable t
        )
WHERE   rn BETWEEN :start and :end

sql中的order by语句大大降低了处理的速度,如果把order by去掉,相应的执行计划会大大地提高。如果换成下面的sql:

SELECT  t.*, row_number() OVER (ORDER BY ID) rn
FROM    mytable t
WHERE   rownum BETWEEN :start and :end

很明显,这个sql是错的,根本查询不了正确的数据信息。是否有其它的方法可以提高查询速度?
针对以上问题,就必须要了解一下关于row_number和rownum的区别,以及如何来运用这些信息。

首先了解一下rownum是如何进行工作的,根据oracle的官方文档:
如果对rownum进行大于比较,这个比较将直接返回false。如,下列sql语句将不能返回任何数据信息:

SELECT  *
FROM    employees
WHERE   ROWNUM > 1

在查询中,第一条被命中的数据将赋予一个伪列rownum为1,那么这个条件就为false。第二条被命中的数据由于第一条的false将重新成为第一条数据,那么仍然赋值为1,显示这个条件仍然为false。后续所有的数据将重复执行这个逻辑,最后一条数据也没有返回。

这就是为什么之前的第2个查询,应该转换为以下的sql语句:

SELECT  *
FROM    (
        SELECT  t.*, ROWNUM AS rn
        FROM    mytable t
        ORDER BY
                paginator, id
        )
WHERE   rn BETWEEN :start and :end

接下来,需要通过创建一些临时数据表来查看这个sql语句的执行性能,我们将创建临时表,追加索引,然后填充数据,最后分析这个sql语句的查询信息。

CREATE TABLE mytable (
        id NUMBER(10) NOT NULL,
        paginator NUMBER(10) NOT NULL,
        value VARCHAR2(50)
)
/
ALTER TABLE mytable
ADD CONSTRAINT pk_mytable_id PRIMARY KEY (id)
/

CREATE INDEX ix_mytable_paginator_id ON mytable(paginator, id)
/

INSERT
INTO    mytable(id, paginator, value)
SELECT  level, level / 10000, 'Value ' || level
FROM    dual
CONNECT BY
        level <= 1000000
/

COMMIT
/

BEGIN
        DBMS_STATS.gather_schema_stats('"20090506_rownum"');
END;
/

这个Sql语句创建一个包括100万条数据的表,并且创建一个联合索引.
同时,在这个查询中,patinator字段是不是惟一的,是为了在之后展示这样一种现象:
在查询中,某些数据可能在不同的分页查询中出现多次,而某些数据则可能根据不会被查询出
这就是所谓的分页混乱。

然后,分别使用row_numer和rownum分别进行查询,返回从900001到900010之间的10条数据信息。
row_number()

SELECT  *
FROM    (
        SELECT  t.*, ROW_NUMBER() OVER (ORDER BY paginator, id) AS rn
        FROM    mytable t
        )
WHERE   rn BETWEEN 900001 AND 900010
ID PAGINATOR VALUE RN
900001 90 Value 900001 900001
900002 90 Value 900002 900002
900003 90 Value 900003 900003
900004 90 Value 900004 900004
900005 90 Value 900005 900005
900006 90 Value 900006 900006
900007 90 Value 900007 900007
900008 90 Value 900008 900008
900009 90 Value 900009 900009
900010 90 Value 900010 900010
10 rows fetched in 0.0005s (0.8594s)
SELECT STATEMENT 
 VIEW 
  WINDOW NOSORT STOPKEY
   TABLE ACCESS BY INDEX ROWID, 20090506_rownum.MYTABLE
    INDEX FULL SCAN, 20090506_rownum.IX_MYTABLE_PAGINATOR_ID

rownum

SELECT  *
FROM    (
        SELECT  t.*, ROWNUM AS rn
        FROM    (
                SELECT  *
                FROM    mytable
                ORDER BY
                        paginator, id
                ) t
        )
WHERE   rn BETWEEN 900001 AND 900010
ID PAGINATOR VALUE RN
900001 90 Value 900001 900001
900002 90 Value 900002 900002
900003 90 Value 900003 900003
900004 90 Value 900004 900004
900005 90 Value 900005 900005
900006 90 Value 900006 900006
900007 90 Value 900007 900007
900008 90 Value 900008 900008
900009 90 Value 900009 900009
900010 90 Value 900010 900010
10 rows fetched in 0.0005s (0.7058)
SELECT STATEMENT 
 VIEW 
  COUNT 
   VIEW 
    TABLE ACCESS BY INDEX ROWID, 20090506_rownum.MYTABLE
     INDEX FULL SCAN, 20090506_rownum.IX_MYTABLE_PAGINATOR_ID

从上文中,可以看出,使用rownum的查询速度略快于row_number函数。
然后再看一个row_number查询,可以看出oracle足够的智能,它可以通过使用联合索引而避免进行排序操作,然后通过使用stopkey操作,可以直接快速查找到相应的数据信息。
rownum查询也同样使用索引,但并没有利用stopkey条件,只是简单的计数操作。
那么,能否同样让rownum使用stopkey呢。在之前的查询中,oracle并不知道这个rn就是在内层查询rownum的别名,我们可以重写查询,在外层查询中使用rownum,这样就可以在外层利用stopkey条件了。这就是我们常见的oracle3层分页的变形:

SELECT  *
FROM    (
        SELECT  t.*, ROWNUM AS rn
        FROM    (
                SELECT  *
                FROM    mytable
                ORDER BY
                        paginator, id
                ) t
        )
WHERE   rn >= 900001
        AND rownum <= 10
ID PAGINATOR VALUE RN
900001 90 Value 900001 900001
900002 90 Value 900002 900002
900003 90 Value 900003 900003
900004 90 Value 900004 900004
900005 90 Value 900005 900005
900006 90 Value 900006 900006
900007 90 Value 900007 900007
900008 90 Value 900008 900008
900009 90 Value 900009 900009
900010 90 Value 900010 900010
10 rows fetched in 0.0005s (0.4714s)
SELECT STATEMENT 
 COUNT STOPKEY
  VIEW 
   COUNT 
    VIEW 
     TABLE ACCESS BY INDEX ROWID, 20090506_rownum.MYTABLE
      INDEX FULL SCAN, 20090506_rownum.IX_MYTABLE_PAGINATOR_ID

在这个查询中,oracle利用了stopkey,同时速度只有471ms,比原来更快。

如果row_number和rownum使用同样的执行计划,但为什么rownum明显更快呢。
这是因为:oracle的历史实在是太久了,而不同的时间导致相同的特性却有不同的效果。

rownum在oracle6中被引进,发布时间为1988年,在当时什么资源和条件都不满足的情况下,作为一个简单的计数器,被认为是非常简单和高效的。
而随着时代的发展,更多的需求被提及出来,这时,一个相当于但功能比rownum更强大的函数被引入,这就是row_number函数,它从oracle9i开始被引进。这时,效率已经不再是惟一的条件了,所以row_number的实现也不再以效率为惟一的指标了。

当然,如果你有更多的要求,如分组排序等,则需要使用row_number函数,但如果你仅仅是简单的分页查询,建议使用rownum,这也是为什么在现在的时代rownum还是这么流行(据说在oracle12c中有offset分页操作符了,内部同样使用row_number函数,这样rownum可以退休了)

以下是英文原文:http://explainextended.com/2009/05/06/oracle-row_number-vs-rownum/

From Stack Overflow:
I have an SQL query that looks something like this:

SELECT  *
FROM    (
        SELECT  t.*, row_number() OVER (ORDER BY ID) rn
        FROM    mytable t
        )
WHERE   rn BETWEEN :start and :end

Essentially, it’s the ORDER BY part that’s slowing things down. If I were to remove it, the EXPLAIN cost goes down by an order of magnitude (over 1,000 times).
I’ve tried this:

SELECT  t.*, row_number() OVER (ORDER BY ID) rn
FROM    mytable t
WHERE   rownum BETWEEN :start and :end

, but this doesn’t give correct results.
Is there any easy way to speed this up? Or will I have to spend some more time with the EXPLAIN tool?


First, just a quick reminder on how ROWNUM works. From Oracle‘s documentation:
Conditions testing for ROWNUM values greater than a positive integer are always false. For example, this query returns no rows:

SELECT  *
FROM    employees
WHERE   ROWNUM > 1

The first row fetched is assigned a ROWNUM of 1 and makes the condition false. The second row to be fetched is now the first row and is also assigned a ROWNUM of 1 and makes the condition false. All rows subsequently fail to satisfy the condition, so no rows are returned.


That’s why the second query should look like this:

SELECT  *
FROM    (
        SELECT  t.*, ROWNUM AS rn
        FROM    mytable t
        ORDER BY
                paginator, id
        )
WHERE   rn BETWEEN :start and :end

Now, let’s see the performance. To do this, we’ll create a sample table:

CREATE TABLE mytable (
        id NUMBER(10) NOT NULL,
        paginator NUMBER(10) NOT NULL,
        value VARCHAR2(50)
)
/
ALTER TABLE mytable
ADD CONSTRAINT pk_mytable_id PRIMARY KEY (id)
/

CREATE INDEX ix_mytable_paginator_id ON mytable(paginator, id)
/

INSERT
INTO    mytable(id, paginator, value)
SELECT  level, level / 10000, 'Value ' || level
FROM    dual
CONNECT BY
        level <= 1000000
/

COMMIT
/

BEGIN
        DBMS_STATS.gather_schema_stats('"20090506_rownum"');
END;
/

This query has 1,000,000 records and an index on (paginator, id).

I deliberately made the paginator non-UNIQUE to demonstrate that the paging query should always include a unique column set into the sort. Otherwise, a record can be selected twice on two different pages, or not selected at all.

Let’s query the table with both queries, returning 10 values from 900,001 to 900,010, and see which one performs better.

First, the ROW_NUMBER():

SELECT  *
FROM    (
        SELECT  t.*, ROW_NUMBER() OVER (ORDER BY paginator, id) AS rn
        FROM    mytable t
        )
WHERE   rn BETWEEN 900001 AND 900010
ID PAGINATOR VALUE RN
900001 90 Value 900001 900001
900002 90 Value 900002 900002
900003 90 Value 900003 900003
900004 90 Value 900004 900004
900005 90 Value 900005 900005
900006 90 Value 900006 900006
900007 90 Value 900007 900007
900008 90 Value 900008 900008
900009 90 Value 900009 900009
900010 90 Value 900010 900010
10 rows fetched in 0.0005s (0.8594s)
SELECT STATEMENT 
 VIEW 
  WINDOW NOSORT STOPKEY
   TABLE ACCESS BY INDEX ROWID, 20090506_rownum.MYTABLE
    INDEX FULL SCAN, 20090506_rownum.IX_MYTABLE_PAGINATOR_ID

And the ROWNUM:

SELECT  *
FROM    (
        SELECT  t.*, ROWNUM AS rn
        FROM    (
                SELECT  *
                FROM    mytable
                ORDER BY
                        paginator, id
                ) t
        )
WHERE   rn BETWEEN 900001 AND 900010
ID PAGINATOR VALUE RN
900001 90 Value 900001 900001
900002 90 Value 900002 900002
900003 90 Value 900003 900003
900004 90 Value 900004 900004
900005 90 Value 900005 900005
900006 90 Value 900006 900006
900007 90 Value 900007 900007
900008 90 Value 900008 900008
900009 90 Value 900009 900009
900010 90 Value 900010 900010
10 rows fetched in 0.0005s (0.7058)
SELECT STATEMENT 
 VIEW 
  COUNT 
   VIEW 
    TABLE ACCESS BY INDEX ROWID, 20090506_rownum.MYTABLE
     INDEX FULL SCAN, 20090506_rownum.IX_MYTABLE_PAGINATOR_ID

We can see that ROW_NUMBER() works for 850 ms, while the ROWNUM one for only 700 ms. ROWNUM is slightly more efficient.

But let’s look closer into the plan for ROW_NUMBER(). We see that Oracle is smart enough, first, to avoid sorting on (paginator, id) (since an index is available), and, second, to use a STOPKEY condition which ceases scanning as soon as it finds enough values.

The ROWNUM query uses the index too, but it does not employ STOPKEY condition, it just counts.

How can we enable STOPKEY for a ROWNUM query?

Unfortunately, Oracle’s optimizer cannot understand in this case that RN is an alias for ROWNUM in the inner subquery. That’s why we’ll need to rewrite the query a little so that we will have a limiting condition on ROWNUM the outer subquery and STOPKEY will became usable:

SELECT  *
FROM    (
        SELECT  t.*, ROWNUM AS rn
        FROM    (
                SELECT  *
                FROM    mytable
                ORDER BY
                        paginator, id
                ) t
        )
WHERE   rn >= 900001
        AND rownum <= 10
ID PAGINATOR VALUE RN
900001 90 Value 900001 900001
900002 90 Value 900002 900002
900003 90 Value 900003 900003
900004 90 Value 900004 900004
900005 90 Value 900005 900005
900006 90 Value 900006 900006
900007 90 Value 900007 900007
900008 90 Value 900008 900008
900009 90 Value 900009 900009
900010 90 Value 900010 900010
10 rows fetched in 0.0005s (0.4714s)
SELECT STATEMENT 
 COUNT STOPKEY
  VIEW 
   COUNT 
    VIEW 
     TABLE ACCESS BY INDEX ROWID, 20090506_rownum.MYTABLE
      INDEX FULL SCAN, 20090506_rownum.IX_MYTABLE_PAGINATOR_ID

We now make lower filtering on RN (which is an alias for ROWNUM from the inner subquery), and upper filtering on ROWNUM in the outer subquery, which counts not the number of total rows returned, but the number of rows satisfying the first condition.

In this case Oracle will use the STOPKEY, and the query now runs for only 471 ms, twice as fast as the original one.

But if ROW_NUMBER and ROWNUM use essentially the same plan, why the latter one is so much faster?

This is because Oracle is very, very old.

ROWNUM was introduced in Oracle 6 that was released in 1988. You remember 1988? You needed your PC to be IBM compatible to run Oracle.

And by that time it already had ROWNUM, so says its ARJ-compressed manual (you remember ARJ?).

Large HTML online manuals for PostgreSQL and MySQL, by the way, still don’t mention any ROWNUM, it needs to be emulated (read my yesterday’s article PostgreSQL: row numbers to see how to do it).

OK. So ROWNUM is 1988, that means it’s both simple and efficient. Just a pure counter.

Analytic functions, on the other hand, were introduced in Oracle 9i in 2001. Y2K was already a history by that time, there was nothing to be afraid of, and ROW_NUMBER was implemented in not so efficient way.

It is of course powerful and flexible, but slow for simple tasks like pagination.

转载请标明出处:i flym
本文地址:https://www.iflym.com/index.php/english-translate/201308060001.html

相关文章:

留下足迹