What the ReadBuffer_common function does in PostgreSQL

This article looks at what the ReadBuffer_common function does in PostgreSQL. It covers the data structures the function relies on, walks through its source code, and traces a cache miss and a cache hit in gdb.

I. Data Structures

BufferDesc
Shared descriptor (state) data for a single shared buffer

/*
 * Flags for buffer descriptors
 *
 * Note: TAG_VALID essentially means that there is a buffer hashtable
 * entry associated with the buffer's tag.
 */
#define BM_LOCKED               (1U << 22)  /* buffer header is locked */
#define BM_DIRTY                (1U << 23)  /* data needs writing */
#define BM_VALID                (1U << 24)  /* data is valid */
#define BM_TAG_VALID            (1U << 25)  /* tag is assigned */
#define BM_IO_IN_PROGRESS       (1U << 26)  /* read or write in progress */
#define BM_IO_ERROR             (1U << 27)  /* previous I/O failed */
#define BM_JUST_DIRTIED         (1U << 28)  /* dirtied since write started */
#define BM_PIN_COUNT_WAITER     (1U << 29)  /* have waiter for sole pin */
#define BM_CHECKPOINT_NEEDED    (1U << 30)  /* must write for checkpoint */
#define BM_PERMANENT            (1U << 31)  /* permanent buffer (not unlogged,
                                             * or init fork) */
/*
 *  BufferDesc -- shared descriptor/state data for a single shared buffer.
 *
 * Note: Buffer header lock (BM_LOCKED flag) must be held to examine or change
 * the tag, state or wait_backend_pid fields.  In general, buffer header lock
 * is a spinlock which is combined with flags, refcount and usagecount into
 * single atomic variable.  This layout allow us to do some operations in a
 * single atomic operation, without actually acquiring and releasing spinlock;
 * for instance, increase or decrease refcount.  buf_id field never changes
 * after initialization, so does not need locking.  freeNext is protected by
 * the buffer_strategy_lock not buffer header lock.  The LWLock can take care
 * of itself.  The buffer header lock is *not* used to control access to the
 * data in the buffer!
 *
 * It's assumed that nobody changes the state field while buffer header lock
 * is held.  Thus buffer header lock holder can do complex updates of the
 * state variable in single write, simultaneously with lock release (cleaning
 * BM_LOCKED flag).  On the other hand, updating of state without holding
 * buffer header lock is restricted to CAS, which insure that BM_LOCKED flag
 * is not set.  Atomic increment/decrement, OR/AND etc. are not allowed.
 *
 * An exception is that if we have the buffer pinned, its tag can't change
 * underneath us, so we can examine the tag without locking the buffer header.
 * Also, in places we do one-time reads of the flags without bothering to
 * lock the buffer header; this is generally for situations where we don't
 * expect the flag bit being tested to be changing.
 *
 * We can't physically remove items from a disk page if another backend has
 * the buffer pinned.  Hence, a backend may need to wait for all other pins
 * to go away.  This is signaled by storing its own PID into
 * wait_backend_pid and setting flag bit BM_PIN_COUNT_WAITER.  At present,
 * there can be only one such waiter per buffer.
 *
 * We use this same struct for local buffer headers, but the locks are not
 * used and not all of the flag bits are useful either. To avoid unnecessary
 * overhead, manipulations of the state field should be done without actual
 * atomic operations (i.e. only pg_atomic_read_u32() and
 * pg_atomic_unlocked_write_u32()).
 *
 * Be careful to avoid increasing the size of the struct when adding or
 * reordering members.  Keeping it below 64 bytes (the most common CPU
 * cache line size) is fairly important for performance.
 */
typedef struct BufferDesc
{
    BufferTag   tag;            /* ID of page contained in buffer */
    int         buf_id;         /* buffer's index number (from 0) */
    /* state of the tag, containing flags, refcount and usagecount */
    pg_atomic_uint32 state;
    int         wait_backend_pid;   /* backend PID of pin-count waiter */
    int         freeNext;       /* link in freelist chain */
    LWLock      content_lock;   /* to lock access to buffer contents */
} BufferDesc;

BufferTag
A buffer tag identifies which disk block the buffer contains.

/*
 * Buffer tag identifies which disk block the buffer contains.
 *
 * Note: the BufferTag data must be sufficient to determine where to write the
 * block, without reference to pg_class or pg_tablespace entries.  It's
 * possible that the backend flushing the buffer doesn't even believe the
 * relation is visible yet (its xact may have started before the xact that
 * created the rel).  The storage manager must be able to cope anyway.
 *
 * Note: if there's any pad bytes in the struct, INIT_BUFFERTAG will have
 * to be fixed to zero them, since this struct is used as a hash key.
 */
typedef struct buftag
{
    RelFileNode rnode;          /* physical relation identifier */
    ForkNumber  forkNum;        /* fork: main, FSM, visibility map or init */
    BlockNumber blockNum;       /* blknum relative to begin of reln */
} BufferTag;

II. Source Code Walkthrough

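ReadBuffer_common holds the logic shared by all ReadBuffer variants. It works as follows:
1. Initialize local variables and make the preliminary checks (extending the relation: isExtend? temporary table: isLocalBuf?)
2. For a temporary table, call LocalBufferAlloc to get the buffer descriptor; otherwise call BufferAlloc.
Either call also reports whether the block was already cached (variable found).
3. If the block was found in the cache:
3.1 If not extending, update statistics, lock the buffer if the mode requires it, and return.
3.2 If extending (a corner case), check the existing block, then continue with step 4:
3.2.1 If PageIsNew returns false, report an error.
3.2.2 For a local buffer (temporary table), just clear BM_VALID in the flags.
3.2.3 For a shared buffer, clear the BM_VALID flag and start buffer I/O.
4. If the block was not found in the cache, fill the buffer:
4.1 If extending, zero-fill the buffer and call smgrextend to extend the relation.
4.2 For an ordinary read:
4.2.1 If the mode is RBM_ZERO_AND_LOCK/RBM_ZERO_AND_CLEANUP_LOCK, zero-fill the buffer.
4.2.2 Otherwise read the block through smgr (the storage manager), tracking the I/O time if required, and check for garbage data.
5. Once the buffer has been extended or the block has been read:
5.1 Lock the buffer if required.
5.2 For a temporary table, adjust the flags; otherwise set BM_VALID, terminate the I/O, and wake up any waiting processes.
5.3 Update statistics.
5.4 Return the buffer.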

/*
 * ReadBuffer_common -- common logic for all ReadBuffer variants
 *
 * *hit is set to true if the request was satisfied from shared buffer cache.
 */
static Buffer
ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
                  BlockNumber blockNum, ReadBufferMode mode,
                  BufferAccessStrategy strategy, bool *hit)
{
    BufferDesc *bufHdr;         // buffer descriptor
    Block       bufBlock;       // the buffer's data block
    bool        found;          // found in the buffer pool?
    bool        isExtend;       // extending the relation?
    bool        isLocalBuf = SmgrIsTemp(smgr);  // local buffer (temporary relation)?
    *hit = false;
    /* Make sure we will have room to remember the buffer pin */
    ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
    // P_NEW means the caller wants a new block appended to the relation
    isExtend = (blockNum == P_NEW);
    // probe point (a no-op unless PostgreSQL is built with --enable-dtrace)
    TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum,
                                       smgr->smgr_rnode.node.spcNode,
                                       smgr->smgr_rnode.node.dbNode,
                                       smgr->smgr_rnode.node.relNode,
                                       smgr->smgr_rnode.backend,
                                       isExtend);
    /* Substitute proper block number if caller asked for P_NEW */
    if (isExtend)
        blockNum = smgrnblocks(smgr, forkNum);
    if (isLocalBuf)
    {
        // local buffer (temporary table)
        bufHdr = LocalBufferAlloc(smgr, forkNum, blockNum, &found);
        if (found)
            pgBufferUsage.local_blks_hit++;
        else if (isExtend)
            pgBufferUsage.local_blks_written++;
        else if (mode == RBM_NORMAL || mode == RBM_NORMAL_NO_LOG ||
                 mode == RBM_ZERO_ON_ERROR)
            pgBufferUsage.local_blks_read++;
    }
    else
    {
        // shared buffer (not a temporary table)
        /*
         * lookup the buffer.  IO_IN_PROGRESS is set if the requested block is
         * not currently in memory.
         */
        // look up the buffer, allocating a descriptor on a miss
        bufHdr = BufferAlloc(smgr, relpersistence, forkNum, blockNum,
                             strategy, &found);
        if (found)
            // hit in shared buffers
            pgBufferUsage.shared_blks_hit++;
        else if (isExtend)
            // extending: a new block will be written
            pgBufferUsage.shared_blks_written++;
        else if (mode == RBM_NORMAL || mode == RBM_NORMAL_NO_LOG ||
                 mode == RBM_ZERO_ON_ERROR)
            // the block will be read from disk
            pgBufferUsage.shared_blks_read++;
    }
    /* At this point we do NOT hold any locks. */
    /* if it was already in the buffer pool, we're done */
    if (found)
    {
        // ---------- the buffer is already in the buffer pool
        if (!isExtend)
        {
            // not extending: an ordinary cache hit
            /* Just need to update stats before we exit */
            *hit = true;
            VacuumPageHit++;
            if (VacuumCostActive)
                VacuumCostBalance += VacuumCostPageHit;
            TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
                                              smgr->smgr_rnode.node.spcNode,
                                              smgr->smgr_rnode.node.dbNode,
                                              smgr->smgr_rnode.node.relNode,
                                              smgr->smgr_rnode.backend,
                                              isExtend,
                                              found);
            /*
             * In RBM_ZERO_AND_LOCK mode the caller expects the page to be
             * locked on return.
             */
            if (!isLocalBuf)
            {
                // shared (non-temporary) buffer
                if (mode == RBM_ZERO_AND_LOCK)
                    LWLockAcquire(BufferDescriptorGetContentLock(bufHdr),
                                  LW_EXCLUSIVE);
                else if (mode == RBM_ZERO_AND_CLEANUP_LOCK)
                    LockBufferForCleanup(BufferDescriptorGetBuffer(bufHdr));
            }
            // convert the descriptor to its Buffer number and return it
            //#define BufferDescriptorGetBuffer(bdesc) ((bdesc)->buf_id + 1)
            return BufferDescriptorGetBuffer(bufHdr);
        }
        /*
         * We get here only in the corner case where we are trying to extend
         * the relation but we found a pre-existing buffer marked BM_VALID.
         * This can happen because mdread doesn't complain about reads beyond
         * EOF (when zero_damaged_pages is ON) and so a previous attempt to
         * read a block beyond EOF could have left a "valid" zero-filled
         * buffer.  Unfortunately, we have also seen this case occurring
         * because of buggy Linux kernels that sometimes return an
         * lseek(SEEK_END) result that doesn't account for a recent write. In
         * that situation, the pre-existing buffer would contain valid data
         * that we don't want to overwrite.  Since the legitimate case should
         * always have left a zero-filled buffer, complain if not PageIsNew.
         */
        // get a pointer to the buffer's data block
        bufBlock = isLocalBuf ? LocalBufHdrGetBlock(bufHdr) : BufHdrGetBlock(bufHdr);
        if (!PageIsNew((Page) bufBlock))
            // not an all-zero new page: refuse to overwrite it
            ereport(ERROR,
                    (errmsg("unexpected data beyond EOF in block %u of relation %s",
                            blockNum, relpath(smgr->smgr_rnode, forkNum)),
                     errhint("This has been seen to occur with buggy kernels; consider updating your system.")));
        /*
         * We *must* do smgrextend before succeeding, else the page will not
         * be reserved by the kernel, and the next P_NEW call will decide to
         * return the same page.  Clear the BM_VALID bit, do the StartBufferIO
         * call that BufferAlloc didn't, and proceed.
         */
        if (isLocalBuf)
        {
            // temporary table (local buffer)
            /* Only need to adjust flags */
            uint32      buf_state = pg_atomic_read_u32(&bufHdr->state);
            Assert(buf_state & BM_VALID);
            buf_state &= ~BM_VALID;
            pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
        }
        else
        {
            // shared buffer (not a temporary table)
            /*
             * Loop to handle the very small possibility that someone re-sets
             * BM_VALID between our clearing it and StartBufferIO inspecting
             * it.
             */
            do
            {
                uint32      buf_state = LockBufHdr(bufHdr);
                Assert(buf_state & BM_VALID);
                // clear the BM_VALID flag
                buf_state &= ~BM_VALID;
                UnlockBufHdr(bufHdr, buf_state);
            } while (!StartBufferIO(bufHdr, true));
        }
    }
    // ---------- cache miss (or the extend corner case above)
    /*
     * if we have gotten to this point, we have allocated a buffer for the
     * page but its contents are not yet valid.  IO_IN_PROGRESS is set for it,
     * if it's a shared buffer.
     *
     * Note: if smgrextend fails, we will end up with a buffer that is
     * allocated but not marked BM_VALID.  P_NEW will still select the same
     * block number (because the relation didn't get any longer on disk) and
     * so future attempts to extend the relation will find the same buffer (if
     * it's not been recycled) but come right back here to try smgrextend
     * again.
     */
    // sanity check: the contents must not be marked valid yet
    Assert(!(pg_atomic_read_u32(&bufHdr->state) & BM_VALID));   /* spinlock not needed */
    // get a pointer to the buffer's data block
    bufBlock = isLocalBuf ? LocalBufHdrGetBlock(bufHdr) : BufHdrGetBlock(bufHdr);
    if (isExtend)
    {
        // ---------- extending the relation
        /* new buffers are zero-filled */
        MemSet((char *) bufBlock, 0, BLCKSZ);
        /* don't set checksum for all-zero page */
        smgrextend(smgr, forkNum, blockNum, (char *) bufBlock, false);
        /*
         * NB: we're *not* doing a ScheduleBufferTagForWriteback here;
         * although we're essentially performing a write. At least on linux
         * doing so defeats the 'delayed allocation' mechanism, leading to
         * increased file fragmentation.
         */
    }
    else
    {
        // ---------- ordinary read (not extending)
        /*
         * Read in the page, unless the caller intends to overwrite it and
         * just wants us to allocate a buffer.
         */
        if (mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK)
            // the caller will overwrite the page: just zero-fill it
            MemSet((char *) bufBlock, 0, BLCKSZ);
        else
        {
            // all other modes: read the block from disk
            instr_time  io_start,   // I/O start time and elapsed time
                        io_time;
            if (track_io_timing)
                INSTR_TIME_SET_CURRENT(io_start);
            // read the block through smgr (the storage manager)
            smgrread(smgr, forkNum, blockNum, (char *) bufBlock);
            if (track_io_timing)
            {
                // track_io_timing is on: record the elapsed read time
                INSTR_TIME_SET_CURRENT(io_time);
                INSTR_TIME_SUBTRACT(io_time, io_start);
                pgstat_count_buffer_read_time(INSTR_TIME_GET_MICROSEC(io_time));
                INSTR_TIME_ADD(pgBufferUsage.blk_read_time, io_time);
            }
            /* check for garbage data */
            if (!PageIsVerified((Page) bufBlock, blockNum))
            {
                // the page failed verification
                if (mode == RBM_ZERO_ON_ERROR || zero_damaged_pages)
                {
                    // zero the page out, with a WARNING
                    ereport(WARNING,
                            (errcode(ERRCODE_DATA_CORRUPTED),
                             errmsg("invalid page in block %u of relation %s; zeroing out page",
                                    blockNum,
                                    relpath(smgr->smgr_rnode, forkNum))));
                    // zero-fill
                    MemSet((char *) bufBlock, 0, BLCKSZ);
                }
                else
                    // otherwise raise an ERROR
                    ereport(ERROR,
                            (errcode(ERRCODE_DATA_CORRUPTED),
                             errmsg("invalid page in block %u of relation %s",
                                    blockNum,
                                    relpath(smgr->smgr_rnode, forkNum))));
            }
        }
    }
    // ---------- the buffer has been extended or the block has been read
    /*
     * In RBM_ZERO_AND_LOCK mode, grab the buffer content lock before marking
     * the page as valid, to make sure that no other backend sees the zeroed
     * page before the caller has had a chance to initialize it.
     *
     * Since no-one else can be looking at the page contents yet, there is no
     * difference between an exclusive lock and a cleanup-strength lock. (Note
     * that we cannot use LockBuffer() or LockBufferForCleanup() here, because
     * they assert that the buffer is already valid.)
     */
    if ((mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK) &&
        !isLocalBuf)
    {
        // take the content lock in exclusive mode
        LWLockAcquire(BufferDescriptorGetContentLock(bufHdr), LW_EXCLUSIVE);
    }
    if (isLocalBuf)
    {
        // temporary table (local buffer)
        /* Only need to adjust flags */
        uint32      buf_state = pg_atomic_read_u32(&bufHdr->state);
        buf_state |= BM_VALID;
        pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
    }
    else
    {
        // shared buffer
        /* Set BM_VALID, terminate IO, and wake up any waiters */
        TerminateBufferIO(bufHdr, false, BM_VALID);
    }
    // update vacuum page/cost statistics
    VacuumPageMiss++;
    if (VacuumCostActive)
        VacuumCostBalance += VacuumCostPageMiss;
    // probe point
    TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
                                      smgr->smgr_rnode.node.spcNode,
                                      smgr->smgr_rnode.node.dbNode,
                                      smgr->smgr_rnode.node.relNode,
                                      smgr->smgr_rnode.backend,
                                      isExtend,
                                      found);
    // return the Buffer number
    //#define BufferDescriptorGetBuffer(bdesc) ((bdesc)->buf_id + 1)
    return BufferDescriptorGetBuffer(bufHdr);
}

III. Trace Analysis

Test scenario 1: the block is not in the buffer pool
Script:

16:42:48 (xdb@[local]:5432)testdb=# select * from t1 limit 10;

Start gdb and set a breakpoint:

(gdb) b ReadBuffer_common
Breakpoint 1 at 0x876e28: file bufmgr.c, line 711.
(gdb) c
Continuing.
Breakpoint 1, ReadBuffer_common (smgr=0x2b7cce0, relpersistence=112 'p', forkNum=MAIN_FORKNUM, blockNum=0, mode=RBM_NORMAL, 
    strategy=0x0, hit=0x7ffc7761dfab) at bufmgr.c:711
711     bool        isLocalBuf = SmgrIsTemp(smgr);
(gdb)

1. Initialize local variables and make the preliminary checks (extending the relation: isExtend? temporary table: isLocalBuf?)

(gdb) n
713     *hit = false;
(gdb) 
716     ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
(gdb) 
718     isExtend = (blockNum == P_NEW);
(gdb) 
720     TRACE_POSTGRESQL_BUFFER_READ_START(forkNum, blockNum,
(gdb) 
728     if (isExtend)
(gdb) 
731     if (isLocalBuf)
(gdb) 
745         bufHdr = BufferAlloc(smgr, relpersistence, forkNum, blockNum,
(gdb)

2. Call BufferAlloc to get the buffer descriptor

(gdb) 
747         if (found)
(gdb) p *bufHdr
$1 = {tag = {rnode = {spcNode = 1663, dbNode = 16402, relNode = 51439}, forkNum = MAIN_FORKNUM, blockNum = 0}, 
  buf_id = 108, state = {value = 2248409089}, wait_backend_pid = 0, freeNext = -2, content_lock = {tranche = 54, state = {
      value = 536870912}, waiters = {head = 2147483647, tail = 2147483647}}}
(gdb) p found
$2 = false
(gdb) 
(gdb) n
750             pgBufferUsage.shared_blks_read++; --> update statistics
(gdb)

4. The block was not found in the cache, so fetch it

756     if (found)
(gdb) 
856     Assert(!(pg_atomic_read_u32(&bufHdr->state) & BM_VALID));   /* spinlock not needed */
(gdb) 
858     bufBlock = isLocalBuf ? LocalBufHdrGetBlock(bufHdr) : BufHdrGetBlock(bufHdr);
(gdb) 
860     if (isExtend)
(gdb) p bufBlock
$4 = (Block) 0x7fe8c240e380

4.2 For an ordinary read:
4.2.1 If the mode is RBM_ZERO_AND_LOCK/RBM_ZERO_AND_CLEANUP_LOCK, zero-fill the buffer.
4.2.2 Otherwise read the block through smgr (the storage manager), tracking the I/O time if required, and check for garbage data.

(gdb) p mode
$5 = RBM_NORMAL
(gdb) 
(gdb) n
880         if (mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK)
(gdb) 
887             if (track_io_timing)
(gdb) 
890             smgrread(smgr, forkNum, blockNum, (char *) bufBlock);
(gdb) 
892             if (track_io_timing)
(gdb) p *smgr
$6 = {smgr_rnode = {node = {spcNode = 1663, dbNode = 16402, relNode = 51439}, backend = -1}, smgr_owner = 0x7fe8ee2bc7b8, 
  smgr_targblock = 4294967295, smgr_fsm_nblocks = 4294967295, smgr_vm_nblocks = 4294967295, smgr_which = 0, 
  md_num_open_segs = {1, 0, 0, 0}, md_seg_fds = {0x2b0dd78, 0x0, 0x0, 0x0}, next_unowned_reln = 0x0}
(gdb) p forkNum
$7 = MAIN_FORKNUM
(gdb) p blockNum
$8 = 0
(gdb) p (char *) bufBlock
$9 = 0x7fe8c240e380 "\001"
(gdb)

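5. The buffer has been extended or the block has been read
5.1 Lock the buffer if required.
5.2 For a temporary table, adjust the flags; otherwise set BM_VALID, terminate the I/O, and wake up any waiting processes.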

(gdb) n
901             if (!PageIsVerified((Page) bufBlock, blockNum))
(gdb) 
932     if ((mode == RBM_ZERO_AND_LOCK || mode == RBM_ZERO_AND_CLEANUP_LOCK) &&
(gdb) n
938     if (isLocalBuf)
(gdb) 
949         TerminateBufferIO(bufHdr, false, BM_VALID);
(gdb)

5.3 Update statistics.
5.4 Return the buffer.

(gdb) 
952     VacuumPageMiss++;
(gdb) 
953     if (VacuumCostActive)
(gdb) 
956     TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
(gdb) 
964     return BufferDescriptorGetBuffer(bufHdr);
(gdb) 
965 }
(gdb)

Back in the caller, buf is 109 (BufferDescriptorGetBuffer returns buf_id 108 + 1):

(gdb) 
ReadBufferExtended (reln=0x7fe8ee2bc7a8, forkNum=MAIN_FORKNUM, blockNum=0, mode=RBM_NORMAL, strategy=0x0) at bufmgr.c:666
666     if (hit)
(gdb) 
668     return buf;
(gdb) p buf
$10 = 109
(gdb)

Test scenario 2: the block is already in the buffer pool
Run the same SQL statement again; this time the block has already been read into shared buffers.

(gdb) del
Delete all breakpoints? (y or n) y
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x00007fe8ec448903 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:81
81  T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) b ReadBuffer_common
Breakpoint 2 at 0x876e28: file bufmgr.c, line 711.
(gdb)

This time the found variable is true:

...
(gdb) 
745         bufHdr = BufferAlloc(smgr, relpersistence, forkNum, blockNum,
(gdb) 
747         if (found)
(gdb) p found
$11 = true
(gdb) 
(gdb) n
748             pgBufferUsage.shared_blks_hit++;
(gdb)

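This takes the corresponding branch:
3. If the block was found in the cache:
3.1 If not extending, update statistics, lock the buffer if the mode requires it, and return.
3.2 If extending (a corner case), check the existing block:
3.2.1 If PageIsNew returns false, report an error.
3.2.2 For a local buffer (temporary table), just clear BM_VALID in the flags.
3.2.3 For a shared buffer, clear the BM_VALID flag and start buffer I/O.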

(gdb) 
756     if (found)
(gdb) 
758         if (!isExtend)
(gdb) 
761             *hit = true;
(gdb) 
762             VacuumPageHit++;
(gdb) 
764             if (VacuumCostActive)
(gdb) 
767             TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
(gdb) 
779             if (!isLocalBuf)
(gdb) 
781                 if (mode == RBM_ZERO_AND_LOCK)
(gdb) 
784                 else if (mode == RBM_ZERO_AND_CLEANUP_LOCK)
(gdb) 
788             return BufferDescriptorGetBuffer(bufHdr);
(gdb) 
965 }
(gdb)
