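You can gather histograms on columns. A histogram gives an accurate estimate of how the data in a column is distributed. When the column data is skewed, the histogram yields better selectivity estimates, which lets the optimizer choose a good execution plan even though the data is not uniformly distributed.
Oracle Database provides two types of column statistics histograms:
- Height-Balanced Histograms
- Frequency Histograms
The database stores histogram information in the *TAB_COL_STATISTICS views (USER and DBA). The HISTOGRAM column takes one of the values HEIGHT BALANCED, FREQUENCY, or NONE.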
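2. Height-Balanced Histograms
In a height-balanced histogram, the column values are divided into buckets so that each bucket contains approximately the same number of rows. The histogram shows where the bucket endpoints fall in the range of values.
Consider a column my_col with values between 1 and 100 and a histogram with 10 buckets. If the data in my_col is uniformly distributed, the histogram looks similar to Figure 13-1, where the numbers are the endpoint values. For example, the seventh bucket holds the rows with values between 60 and 70.
Figure 13-1 Height-Balanced Histogram with Uniform Distribution
Each bucket holds 10% of the total number of rows. In this uniformly distributed example, 40% of the rows have values between 60 and 100.
If the data is not uniformly distributed, the histogram may look like Figure 13-2. In this case, most of the rows have the value 5 in the column, and only 10% of the rows have values between 60 and 100.
Figure 13-2 Height-Balanced Histogram with Nonuniform Distribution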
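The idea of equal-height buckets can be sketched in a few lines of Python. This is only an illustrative model, not Oracle's implementation: the function name `height_balanced_endpoints` and the skewed sample data are assumptions made for the example. It sorts the values, cuts them into slices of equal size, and records the last value of each slice as that bucket's endpoint.

```python
def height_balanced_endpoints(values, num_buckets):
    """Return num_buckets + 1 endpoint values: the minimum, followed by
    the last value of each equal-sized slice of the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    endpoints = [ordered[0]]  # endpoint 0 is the column minimum
    for i in range(1, num_buckets + 1):
        # Row count covered by the first i buckets: ceil(i * n / num_buckets)
        last_row = (i * n + num_buckets - 1) // num_buckets
        endpoints.append(ordered[last_row - 1])
    return endpoints


# Uniform data: every value from 1 to 100 appears once.
uniform = list(range(1, 101))
# Hypothetical skewed data: 80 of the 100 rows have the value 5.
skewed = [1] * 5 + [5] * 80 + list(range(86, 101))

print(height_balanced_endpoints(uniform, 10))
# [1, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(height_balanced_endpoints(skewed, 10))
# [1, 5, 5, 5, 5, 5, 5, 5, 5, 90, 100]
```

With skewed data the same endpoint value repeats across several buckets, which is how a height-balanced histogram reveals a popular value.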
You can view a height-balanced histogram through the USER_TAB_HISTOGRAMS view, as shown in Example 13-1.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS (
    OWNNAME    => 'OE',
    TABNAME    => 'INVENTORIES',
    METHOD_OPT => 'FOR COLUMNS SIZE 10 quantity_on_hand');
END;
/

SELECT COLUMN_NAME, NUM_DISTINCT, NUM_BUCKETS, HISTOGRAM
FROM   USER_TAB_COL_STATISTICS
WHERE  TABLE_NAME = 'INVENTORIES' AND COLUMN_NAME = 'QUANTITY_ON_HAND';

COLUMN_NAME                    NUM_DISTINCT NUM_BUCKETS HISTOGRAM
------------------------------ ------------ ----------- ---------------
QUANTITY_ON_HAND                        237          10 HEIGHT BALANCED

SELECT ENDPOINT_NUMBER, ENDPOINT_VALUE
FROM   USER_TAB_HISTOGRAMS
WHERE  TABLE_NAME = 'INVENTORIES' AND COLUMN_NAME = 'QUANTITY_ON_HAND'
ORDER BY ENDPOINT_NUMBER;

ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
              0              0
              1             27
              2             42
              3             57
              4             74
              5             98
              6            123
              7            149
              8            175
              9            202
             10            353
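In the query output of Example 13-1, one row (1-10) corresponds to each bucket in the histogram. Oracle Database added a special 0th bucket to this histogram because the value in the 1st bucket (27) is not the minimum value of the quantity_on_hand column. The 0th bucket holds that minimum, which is 0.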
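To see what these endpoints say about the data, the following Python sketch estimates the fraction of rows at or below a given value. It is a simplified model and not the optimizer's actual formula: it assumes each of the 10 buckets holds exactly one tenth of the rows and interpolates linearly inside a bucket. The function name `fraction_at_most` is hypothetical; the endpoint values are copied from the output of Example 13-1.

```python
from bisect import bisect_right

# ENDPOINT_VALUE column from Example 13-1, ordered by ENDPOINT_NUMBER 0..10.
ENDPOINTS = [0, 27, 42, 57, 74, 98, 123, 149, 175, 202, 353]


def fraction_at_most(x, endpoints=ENDPOINTS):
    """Approximate fraction of rows whose value is <= x."""
    buckets = len(endpoints) - 1
    if x < endpoints[0]:
        return 0.0
    if x >= endpoints[-1]:
        return 1.0
    # i is the bucket number such that endpoints[i-1] <= x < endpoints[i].
    i = bisect_right(endpoints, x)
    lo, hi = endpoints[i - 1], endpoints[i]
    # Full buckets below x, plus a linear share of the bucket containing x.
    return ((i - 1) + (x - lo) / (hi - lo)) / buckets


print(round(fraction_at_most(27), 3))         # 0.1
print(round(1 - fraction_at_most(202), 3))    # 0.1
```

Under this model the last bucket, which runs from 202 to 353, covers more than 40% of the value range but only about 10% of the rows, so large quantities are comparatively rare.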
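3. Frequency Histograms
In a frequency histogram, each value of the column corresponds to a single bucket of the histogram. Each bucket contains the number of occurrences of that single value. For example, suppose that 36 rows contain the value 1 in the warehouse_id column. The endpoint value 1 then has the endpoint number 36.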
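The database automatically creates a frequency histogram instead of a height-balanced histogram when both of the following conditions hold:
- The number of distinct values is less than or equal to the specified number of histogram buckets (at most 254).
- It is not the case that every column value occurs only once.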
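The distinct-value condition can be checked against the examples in this section with a small Python sketch. It models only the comparison between the number of distinct values and the requested bucket count as described here; the function name `histogram_type` is hypothetical and the real decision is made inside the database.

```python
MAX_BUCKETS = 254  # upper limit on buckets quoted in the text


def histogram_type(num_distinct, requested_buckets):
    """Return the histogram type the distinct-value rule predicts."""
    buckets = min(requested_buckets, MAX_BUCKETS)
    if num_distinct <= buckets:
        return "FREQUENCY"
    return "HEIGHT BALANCED"


# quantity_on_hand: 237 distinct values, SIZE 10 (Example 13-1)
print(histogram_type(237, 10))   # HEIGHT BALANCED
# warehouse_id: 9 distinct values, SIZE 20 (Example 13-2)
print(histogram_type(9, 20))     # FREQUENCY
```

Both results agree with the HISTOGRAM column reported by USER_TAB_COL_STATISTICS in the two examples.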
You can view a frequency histogram through the USER_TAB_HISTOGRAMS view, as shown in Example 13-2.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS (
    OWNNAME    => 'OE',
    TABNAME    => 'INVENTORIES',
    METHOD_OPT => 'FOR COLUMNS SIZE 20 warehouse_id');
END;
/

SELECT COLUMN_NAME, NUM_DISTINCT, NUM_BUCKETS, HISTOGRAM
FROM   USER_TAB_COL_STATISTICS
WHERE  TABLE_NAME = 'INVENTORIES' AND COLUMN_NAME = 'WAREHOUSE_ID';

COLUMN_NAME                    NUM_DISTINCT NUM_BUCKETS HISTOGRAM
------------------------------ ------------ ----------- ---------------
WAREHOUSE_ID                              9           9 FREQUENCY

SELECT ENDPOINT_NUMBER, ENDPOINT_VALUE
FROM   USER_TAB_HISTOGRAMS
WHERE  TABLE_NAME = 'INVENTORIES' AND COLUMN_NAME = 'WAREHOUSE_ID'
ORDER BY ENDPOINT_NUMBER;

ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
             36              1
            213              2
            261              3
            370              4
            484              5
            692              6
            798              7
            984              8
           1112              9
In Example 13-2, the first bucket is for warehouse_id 1. That value appears 36 times in the table, as the following query confirms:
oe@PROD> SELECT COUNT(*) FROM inventories WHERE warehouse_id = 1;
COUNT(*)
----------
36
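The ENDPOINT_NUMBER values in Example 13-2 are cumulative, so the row count for any single warehouse_id is the difference between its endpoint number and the previous one. The Python sketch below works this out from the query output; the function name `per_value_counts` is hypothetical and the data is copied from the example.

```python
# (ENDPOINT_NUMBER, ENDPOINT_VALUE) pairs from Example 13-2.
HISTOGRAM_ROWS = [
    (36, 1), (213, 2), (261, 3), (370, 4), (484, 5),
    (692, 6), (798, 7), (984, 8), (1112, 9),
]


def per_value_counts(rows):
    """Turn cumulative endpoint numbers into a count per column value."""
    counts = {}
    previous = 0
    for endpoint_number, endpoint_value in sorted(rows):
        counts[endpoint_value] = endpoint_number - previous
        previous = endpoint_number
    return counts


counts = per_value_counts(HISTOGRAM_ROWS)
print(counts[1])             # 36, matching the COUNT(*) query above
print(counts[2])             # 177, that is 213 - 36
print(sum(counts.values()))  # 1112, the last endpoint number
```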
5. Exercise 4: Histogram Optimization Exercise
The following statement counts the users whose accounts have been closed. Optimize it:
select count(1) from ht.c_cons where status='close';
SQL> select status, count(1) from ht.c_cons group by status;

STATUS                                                         COUNT(1)
------------------------------------------------------------ ----------
close                                                                19
open                                                               9519
creating                                                            462

SQL> create index ht.idx_c_cons_status on ht.c_cons(status);

SQL> col owner for a10
SQL> col table_name for a20
SQL> col column_name for a20
SQL> col data_type for a30
SQL> col histogram for a20
SQL> select owner, table_name, column_name, data_type,
  2         column_id, num_distinct, histogram, NUM_NULLS, LAST_ANALYZED from
  3         dba_tab_columns where table_name = 'C_CONS' and owner = 'HT'
  4   order by column_id;

OWNER      TABLE_NAME           COLUMN_NAME          DATA_TYPE                       COLUMN_ID NUM_DISTINCT HISTOGRAM             NUM_NULLS LAST_ANALYZED
---------- -------------------- -------------------- ------------------------------ ---------- ------------ -------------------- ---------- ------------------------------
HT         C_CONS               CONS_NO              NUMBER                                  1        10000 NONE                          0 20-AUG-17
HT         C_CONS               CONS_NAME            VARCHAR2                                2         5057 NONE                          0 20-AUG-17
HT         C_CONS               ORG_NAME             VARCHAR2                                3           12 NONE                          0 20-AUG-17
HT         C_CONS               BUILD_DATE           DATE                                    4        10000 NONE                          0 20-AUG-17
HT         C_CONS               STATUS               VARCHAR2                                5            3 NONE                          0 20-AUG-17

SQL> exec DBMS_STATS.GATHER_TABLE_STATS(ownname=>'HT', tabname=>'C_CONS', estimate_percent=>30, method_opt=>'for columns size 50 status', no_invalidate=>FALSE, degree=>4, cascade=>TRUE);

PL/SQL procedure successfully completed.

SQL> col owner for a10
SQL> col table_name for a20
SQL> col column_name for a20
SQL> col data_type for a30
SQL> col histogram for a20
SQL> select owner, table_name, column_name, data_type,
  2         column_id, num_distinct, histogram, NUM_NULLS, LAST_ANALYZED from
  3         dba_tab_columns where table_name = 'C_CONS' and owner = 'HT'
  4   order by column_id;

OWNER      TABLE_NAME           COLUMN_NAME          DATA_TYPE                       COLUMN_ID NUM_DISTINCT HISTOGRAM             NUM_NULLS LAST_ANALYZED
---------- -------------------- -------------------- ------------------------------ ---------- ------------ -------------------- ---------- ------------------------------
HT         C_CONS               CONS_NO              NUMBER                                  1        10000 NONE                          0 20-AUG-17
HT         C_CONS               CONS_NAME            VARCHAR2                                2         5057 NONE                          0 20-AUG-17
HT         C_CONS               ORG_NAME             VARCHAR2                                3           12 NONE                          0 20-AUG-17
HT         C_CONS               BUILD_DATE           DATE                                    4        10000 NONE                          0 20-AUG-17
HT         C_CONS               STATUS               VARCHAR2                                5            3 FREQUENCY                     0 20-AUG-17

SQL> select count(1) from ht.c_cons where status = 'open';

Execution Plan
----------------------------------------------------------
Plan hash value: 2016425671

-------------------------------------------------------------------------------------------
| Id  | Operation             | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                   |     1 |     6 |     8   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE       |                   |     1 |     6 |            |          |
|*  2 |   INDEX FAST FULL SCAN| IDX_C_CONS_STATUS |  9639 | 57834 |     8   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("STATUS"='open')

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         28  consistent gets
          0  physical reads
          0  redo size
        527  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> select count(1) from ht.c_cons where status = 'close';

Execution Plan
----------------------------------------------------------
Plan hash value: 2292286995

---------------------------------------------------------------------------------------
| Id  | Operation         | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                   |     1 |     6 |     1   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE   |                   |     1 |     6 |            |          |
|*  2 |   INDEX RANGE SCAN| IDX_C_CONS_STATUS |    24 |   144 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("STATUS"='close')

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          2  consistent gets
          0  physical reads
          0  redo size
        526  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
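
The Rows column of the two plans (9639 for 'open', 24 for 'close') is close to the real counts because the optimizer now has a frequency histogram on STATUS. As a rough comparison, the Python sketch below computes what a uniform-distribution assumption would predict for the same table. It relies on the common simplification that, without a histogram and with no NULLs, an equality predicate is estimated at num_rows / num_distinct; the function name `uniform_cardinality` is hypothetical, and the plan estimates differ slightly from the real counts because the statistics were gathered from a 30% sample.

```python
# Real row counts from the GROUP BY query in this exercise.
ACTUAL = {"close": 19, "open": 9519, "creating": 462}
# Rows column of the two execution plans shown above.
PLAN_ESTIMATE = {"open": 9639, "close": 24}


def uniform_cardinality(num_rows, num_distinct):
    """Row estimate for 'column = value' when every value is assumed
    to be equally common (no histogram, no NULLs)."""
    return num_rows / num_distinct


num_rows = sum(ACTUAL.values())   # 10000
num_distinct = len(ACTUAL)        # 3
uniform = uniform_cardinality(num_rows, num_distinct)

for status, estimate in PLAN_ESTIMATE.items():
    print(status, "actual:", ACTUAL[status],
          "histogram estimate:", estimate,
          "uniform estimate:", round(uniform))
```

Under the uniform assumption both predicates would be estimated at roughly 3333 rows, far too high for 'close' and far too low for 'open'. With the histogram, the optimizer can tell the rare value from the common one, which is consistent with the index range scan (2 consistent gets) chosen for 'close' and the index fast full scan (28 consistent gets) chosen for 'open'.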