Thursday 28 November 2013

Glossary - Definitions of RAC Components/Background Processes, 11g Release 2 (11.2)

Glossary:
Automatic Workload Repository (AWR)
A built-in repository that exists in every Oracle database. At regular intervals, Oracle Database makes a snapshot of all of its vital statistics and workload information and stores them in the AWR.

administrator-managed database
A database for which you explicitly specify the servers on which it can run and where services can run within the database.

cache coherency
The synchronization of data in multiple caches so that reading a memory location through any cache will return the most recent data written to that location through any other cache. Sometimes called cache consistency.

Cache Fusion
A diskless cache coherency mechanism in Oracle RAC that provides copies of blocks directly from a holding instance's memory cache to a requesting instance's memory cache.

cardinality
The number of database instances you want running during normal operations.

cluster
Multiple interconnected computers or servers that appear as if they are one server to end users and applications.

cluster file system
A distributed file system that is a cluster of servers that collaborate to provide high performance service to their clients. Cluster file system software deals with distributing requests to storage cluster components.

cluster database
The generic term for an Oracle RAC database.

Cluster Ready Services Daemon (CRSD)
The primary Oracle Clusterware process that performs high availability recovery and management operations, such as maintaining OCR. CRSD also manages application resources, runs as the root user (or as a user in the admin group on Mac OS X-based systems), and restarts automatically upon failure.

Cluster Synchronization Services (CSS)
An Oracle Clusterware component that discovers and tracks the membership state of each node by providing a common view of membership across the cluster. CSS also monitors process health, specifically the health of the database instance. The Global Enqueue Service Monitor (LMON) is a background process that monitors the health of the cluster database environment and registers with and de-registers from CSS. See also OCSSD.

Cluster Time Synchronization Service
A time synchronization mechanism that ensures that all internal clocks of all nodes in a cluster are synchronized.

Cluster Verification Utility (CVU)
A tool that verifies a wide range of Oracle RAC components such as shared storage devices, networking configurations, system requirements, Oracle Clusterware, groups, and users.

Distributed Transaction Processing (DTP)
The paradigm of distributed transactions, including both XA-type externally coordinated transactions, and distributed-SQL-type (database links in Oracle) internally coordinated transactions.

Event Manager (EVM)
The background process that publishes Oracle Clusterware events. EVM scans the designated callout directory and runs all scripts in that directory when an event occurs.

Event Manager Daemon (EVMD)
A Linux or UNIX event manager daemon that starts the racgevt process to manage callouts.

extended distance cluster
A cluster whose nodes are separated by greater distances, ranging from two buildings across the street from each other to sites across a campus or across a city. For availability reasons, the data needs to be located at both sites, so you need to consider alternatives for mirroring the storage.

failure group
A failure group is a subset of the disks in a disk group, which could fail at the same time because they share hardware. Failure groups are used to store mirror copies of data.

Fast Application Notification (FAN)
Applications can use FAN to enable rapid failure detection, balancing of connection pools after failures, and re-balancing of connection pools when failed components are repaired. The FAN notification process uses system events that Oracle Database publishes when cluster servers become unreachable or if network interfaces fail.

Fast Connection Failover
Fast Connection Failover provides high availability to FAN integrated clients, such as clients that use JDBC, OCI, or ODP.NET. If you configure the client to use fast connection failover, then the client automatically subscribes to FAN events and can react to database UP and DOWN events. In response, Oracle Database gives the client a connection to an active instance that provides the requested database service.

forced disk write
In Oracle RAC, a particular data block can only be modified by one instance at a time. If one instance modifies a data block that another instance needs, then whether a forced disk write is required depends on the type of request submitted for the block.

General Parallel File System (GPFS)
General Parallel File System (GPFS) is a shared-disk IBM file system product that provides data access from all of the nodes in a homogeneous or heterogeneous cluster.

Global Cache Service (GCS)
The process that implements Cache Fusion. It maintains the block mode for blocks in the global role and is responsible for block transfers between instances. The Global Cache Service employs various background processes, such as the Global Cache Service Processes (LMSn) and the Global Enqueue Service Daemon (LMD).

Global Cache Service Processes (LMSn)
Processes that manage remote messages. Oracle RAC provides for up to 10 Global Cache Service Processes.

Global Cache Service (GCS) resources
Global resources that coordinate access to data blocks in the buffer caches of multiple Oracle RAC instances to provide cache coherency.

global database name
The full name of the database that uniquely identifies it from any other database. The global database name is of the form database_name.database_domain, for example: OP.US.FOO.COM

global dynamic performance views (GV$)
Dynamic performance views that store information about all open instances in an Oracle RAC cluster, not only the local instance. In contrast, standard dynamic performance views (V$) store information about the local instance only.

Global Enqueue Service (GES)
A service that coordinates enqueues that are shared globally.

Global Enqueue Service Daemon (LMD)
The resource agent process that manages requests for resources to control access to blocks. The LMD process also handles deadlock detection and remote resource requests. Remote resource requests are requests originating from another instance.

Global Enqueue Service Monitor (LMON)
The background LMON process monitors the entire cluster to manage global resources. LMON manages instance deaths and the associated recovery for any failed instance. In particular, LMON handles the part of recovery associated with global resources. LMON-provided services are also known as Cluster Group Services.

Global Services Daemon (GSD)
A component that receives requests from SRVCTL to execute administrative job tasks, such as startup or shutdown. The command is executed locally on each node, and the results are returned to SRVCTL. GSD is installed on the nodes by default.

Oracle Grid Infrastructure
The software that provides the infrastructure for an enterprise grid architecture. In a cluster this software includes Oracle Clusterware and Oracle Automatic Storage Management (Oracle ASM). For a standalone server, this software includes Oracle Restart and Oracle ASM. Oracle Database 11g release 2 (11.2) combines these infrastructure products into one software installation called the Oracle Grid Infrastructure home (Grid_home).

Grid Plug and Play Daemon (GPNPD)
This process provides access to the Grid Plug and Play profile, and coordinates updates to the profile among the nodes of the cluster to ensure that all of the nodes have the most recent profile.

High Availability Cluster Multi-Processing (HACMP)
High Availability Cluster Multi-Processing is an IBM AIX-based high availability cluster software product. HACMP has two major components: high availability (HA) and cluster multi-processing (CMP).

high availability
Systems with redundant components that provide consistent and uninterrupted service, even in the event of hardware or software failures. This involves some degree of redundancy.

instance
For an Oracle RAC database, each node in a cluster usually has one instance of the running Oracle software that references the database. When a database is started, Oracle Database allocates a memory area called the System Global Area (SGA) and starts one or more Oracle Database processes. This combination of the SGA and the Oracle Database processes is called an instance. Each instance has a unique Oracle System Identifier (SID), instance name, rollback segments, and thread ID.

instance membership recovery
The method that Oracle RAC uses to guarantee that all cluster members are functional or active. Instance membership recovery polls and arbitrates the membership. Any members that do not show a heartbeat by way of the control file, or that do not respond to periodic activity inquiry messages, are presumed terminated.

instance name
Represents the name of the instance and is used to uniquely identify a specific instance when clusters share common service names. The instance name is identified by the INSTANCE_NAME parameter in the instance initialization file, initsid.ora. The instance name is the same as the Oracle System Identifier (SID).

instance number
A number that associates extents of data blocks with particular instances. The instance number enables you to start up an instance and ensure that it uses the extents allocated to it for inserts and updates. This will ensure that it does not use space allocated for other instances.

interconnect
The communication link between nodes.

Logical Volume Manager (LVM)
A generic term that describes Linux or UNIX subsystems for online disk storage management.

Interprocess Communication (IPC)
A high-speed operating system-dependent transport component. The IPC transfers messages between instances on different nodes. Also referred to as the interconnect.

Master Boot Record (MBR)
A program that executes when a computer starts. Typically, the MBR resides on the first sector of a local hard disk. The program begins the startup process by examining the partition table to determine which partition to use for starting the system. The MBR program then transfers control to the boot sector of the startup partition, which continues the startup process.

metric
The rate of change in a cumulative statistic.

Network Attached Storage (NAS)
Storage that is attached to a server by way of a network.

Network Time Protocol (NTP)
An Internet standard protocol, built on top of TCP/IP, that ensures the accurate synchronization to the millisecond of the computer clock times in a network of computers.

Network Interface Card (NIC)
A card that you insert into a computer to connect the computer to a network.

node
A node is a computer system on which Oracle RAC and Oracle Clusterware software are installed.

Object Link Manager (OLM)
The Oracle interface that maps symbolic links to logical drives and displays them in the OLM graphical user interface.

OCSSD
A Linux or UNIX process that manages the Cluster Synchronization Services (CSS) daemon. It manages cluster node membership and runs as the oracle user; failure of this process results in a cluster restart.

Oracle Cluster File Systems
Oracle offers two cluster file systems, OCFS for Windows and OCFS2 for Linux. While OCFS for Windows is a proprietary file system, the source for OCFS2 for Linux is available to all under the GNU General Public License (GPL). The two file systems are not compatible.

Oracle Cluster Registry (OCR)
The Oracle RAC configuration information repository that manages information about the cluster node list and instance-to-node mapping information. OCR also manages information about Oracle Clusterware resource profiles for customized applications.

Oracle Enterprise Manager Configuration Assistant (EMCA)
A graphical user interface-based configuration assistant that you can use to configure Oracle Enterprise Manager features.

Oracle Grid Naming Service Daemon (GNSD)
The Oracle Grid Naming Service is a gateway between the cluster mDNS and external DNS servers. The gnsd process performs name resolution within the cluster.

Oracle High Availability Services Daemon (OHASD)
This process anchors the lower part of the Oracle Clusterware stack, which consists of processes that facilitate cluster operations.

Oracle Interface Configuration Tool (OIFCFG)
A command-line tool for both noncluster Oracle databases and Oracle RAC databases that enables you to allocate and de-allocate network interfaces to components, direct components to use specific network interfaces, and retrieve component configuration information. The Oracle Universal Installer also uses OIFCFG to identify and display available interfaces.

Oracle Managed Files
A service that automates naming, location, creation, and deletion of database files such as control files, redo log files, data files and others, based on a few initialization parameters. You can use Oracle Managed Files on top of a traditional file system supported by the host operating system, for example, VxFS or ODM. It can simplify many aspects of the database administration by eliminating the need to devise your own policies for such details.

Oracle Notification Service
A publish and subscribe service for communicating information about all FAN events.

Oracle Clusterware
This is clusterware that is provided by Oracle to manage cluster database processing including node membership, group services, global resource management, and high availability functions.

Oracle Universal Installer
A tool to install Oracle Clusterware, the Oracle relational database software, and the Oracle RAC software. You can also use the Oracle Universal Installer to launch the Database Configuration Assistant (DBCA).

policy-managed database
A database that you define as a cluster resource. Management of the database is defined by how you configure the resource, including on which servers the database can run and how many instances of the database are necessary to support the expected workload.

raw device
A disk drive that does not yet have a file system set up. Raw devices are used for Oracle RAC because they enable the sharing of disks. 

raw partition
A portion of a physical disk that is accessed at the lowest possible level. A raw partition is created when an extended partition is created and logical partitions are assigned to it without any formatting. Once formatting is complete, it is called a cooked partition. See also raw device.

Recovery Manager (RMAN)
An Oracle tool that enables you to back up, copy, restore, and recover data files, control files, and archived redo logs. It is included with the Oracle server and does not require separate installation. You can run RMAN as a command line utility from the operating system (O/S) prompt or use the GUI-based Oracle Enterprise Manager Backup Manager.

result cache
A result cache is an area of memory, either in the SGA or client application memory, that stores the result of a database query or query block for reuse. The cached rows are shared across statements and sessions unless they become stale.

Runtime Connection Load Balancing
Enables Oracle Database to make intelligent service connection decisions based on the connection pool that provides the optimal service for the requested application based on current workloads. The JDBC, ODP.NET, and OCI clients are integrated with the load balancing advisory; you can use any of these client environments to provide runtime connection load balancing.

scalability
The ability to add additional nodes to Oracle RAC applications and achieve markedly improved scale-up and speed-up.

Secure Shell (SSH)
A program for logging into a remote computer over a network. You can use SSH to execute commands on a remote system and to move files from one system to another. SSH uses strong authentication and secure communications over insecure channels.

Server Control Utility (SRVCTL)
A command-line utility that is one of the Server Management (SRVM) components. SRVM comprises the components required to operate Oracle Enterprise Manager in Oracle RAC. The SRVM components, such as the Intelligent Agent, Global Services Daemon, and SRVCTL, enable you to manage cluster databases running in heterogeneous environments through an open client/server architecture using Oracle Enterprise Manager.

server
A computer system that has no Oracle software installed upon it.

server group
A logical partition of nodes in a cluster into a group that hosts applications, databases, or both. Server groups can be members of other server groups.

service level
A measure of the performance of a system.

services
Entities that you can define in Oracle RAC databases that enable you to group database workloads and route work to the optimal instances that are assigned to offer the service.

shared everything
A database architecture in which all instances share access to all of the data.

single client access name (SCAN)
Oracle Database 11g database clients use SCAN to connect to the database. SCAN can resolve to multiple IP addresses, reflecting multiple listeners in the cluster handling public client connections.

singleton services
Services that run on only one instance at any one time. By defining the Distributed Transaction Processing (DTP) property of a service, you can force the service to be a singleton service.

split brain syndrome
A condition in which two or more instances attempt to control a cluster database. In a two-node environment, for example, each instance attempts to manage updates at the same time as the other.

system identifier (SID)
The Oracle system identifier (SID) identifies a specific instance of the running Oracle software. For an Oracle RAC database, each node within the cluster has an instance referencing the database.

transparent application failover (TAF)
A runtime failover for high-availability environments, such as Oracle RAC and Oracle RAC Guard. TAF refers to the failover and re-establishment of application-to-service connections. It enables client applications to automatically reconnect to the database if the connection fails, and optionally resume a SELECT statement that was in progress. This reconnect happens automatically from within the Oracle Call Interface library.

voting disk
A file that manages information about node membership.

Wallet
A wallet is a data structure used to store and manage security credentials for an individual entity.

Monday 25 November 2013

PRVF-5439: NTP daemon does not have slewing option “-x” set on node

When installing Oracle 11gR2 Grid Infrastructure, either the Cluster Verification Utility or the installer's prerequisite checks may report that the NTP daemon does not have the slewing option set.
- PRVF-5439 : NTP daemon does not have slewing option "-x" set
  on node "ordrac1"
  - Cause: NTP daemon on the specified node does not have the
  slewing option set
  - Action: Shutdown and restart the NTP daemon with the
  slewing option set. For more information on the NTP daemon
  slewing option refer the NTP daemon's man pages.
What is slewing?
The NTP daemon periodically updates the system clock with the time from a reference clock. If the time on the reference clock is behind the time on the system clock, the system clock is set backwards in one large decrement. Such abrupt changes in time can lead to Oracle shutting down the node because of inconsistent timers. To avoid this problem, NTP can be configured to slew the clock. When the clock is slewed, the system time is adjusted gradually, in small steps, until the system clock is in sync with the reference clock.
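To get a feel for what slewing means in practice, the arithmetic below estimates how long a gradual correction takes. It is only a sketch: it assumes ntpd's commonly documented maximum slew rate of 500 ppm (0.5 ms of correction per second of real time), a figure that does not come from this post, and slew_seconds is a hypothetical helper name.

```shell
# Estimate how many seconds ntpd needs to slew away a clock offset.
# Assumption: maximum slew rate of 500 ppm, i.e. 500 microseconds of
# correction per second of real time.
slew_seconds() {
    offset_ms=$1
    # offset in microseconds divided by 500 us corrected per second
    echo $(( offset_ms * 1000 / 500 ))
}

slew_seconds 1000   # offset of one second
```

Under that assumption a one-second offset takes about 2000 seconds (roughly 33 minutes) to absorb, so slewing is best suited to small offsets.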
How to set up slewing
Stop the NTP service on the node.
[root@odrac1 ~]# service ntpd stop
Shutting down ntpd:                                        [  OK  ]
[root@odrac1 ~]#
Edit the file /etc/sysconfig/ntpd
# Drop root to id 'ntp:ntp' by default.
OPTIONS="-u ntp:ntp -p /var/run/ntpd.pid"
 
# Set to 'yes' to sync hw clock after successful ntpdate
SYNC_HWCLOCK=no
 
# Additional options for ntpdate
NTPDATE_OPTIONS=""
Change the line
OPTIONS="-u ntp:ntp -p /var/run/ntpd.pid" 
to
OPTIONS="-x -u ntp:ntp -p /var/run/ntpd.pid"
After saving the file, restart the NTP service. Then re-run the cluster verify utility or the prerequisite checks.
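The manual edit above could also be scripted. The following is a minimal sketch, not a tested procedure: add_slew_option is a hypothetical helper name, and the demo runs against a scratch file rather than the real /etc/sysconfig/ntpd so it can be tried safely first.

```shell
# Add -x to the OPTIONS line of an ntpd sysconfig file if it is missing.
# add_slew_option is a hypothetical helper; the file path is an argument.
add_slew_option() {
    file=$1
    # Nothing to do if OPTIONS already carries -x as a separate flag.
    if grep -Eq '^OPTIONS="([^"]* )?-x( [^"]*)?"' "$file"; then
        return 0
    fi
    # Keep a backup, then insert -x right after the opening quote.
    cp "$file" "$file.bak" &&
    sed 's/^OPTIONS="/OPTIONS="-x /' "$file.bak" > "$file"
}

# Demo on a scratch file holding the OPTIONS line shown above.
demo=$(mktemp)
printf '%s\n' 'OPTIONS="-u ntp:ntp -p /var/run/ntpd.pid"' > "$demo"
add_slew_option "$demo"
add_slew_option "$demo"   # second call changes nothing
cat "$demo"
rm -f "$demo" "$demo.bak"
```

Running the helper twice leaves a single -x in place, so it is safe to repeat on each node before restarting the service.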


Slewing in HACMP clusters (AIX):

To keep the system time synchronized with other nodes in an HACMP cluster or across the enterprise, Network Time Protocol (NTP) should be implemented. In its default configuration, NTP periodically updates the system time to match a reference clock by resetting the system time on the node. If the time on the reference clock is behind the time of the system clock, the system clock is set backwards, causing the same time period to elapse twice. This can cause internal timers in HACMP and Oracle databases to wait longer than intended under some circumstances. When this happens, HACMP may stop the node or the Oracle instance may shut itself down.

Oracle will log an ORA-29740 error when it shuts down the instance due to inconsistent timers. The hatsd daemon utilized by HACMP will log a TS_THREAD_STUCK_ER error in the system error log just before HACMP stops a node due to an expired timer. 

To avoid this issue, system managers should configure the NTP daemon to adjust the time on the node gradually until the system clock and the reference clock are in sync (this is called "slewing" the clock) instead of resetting the time in one large step. This behavior is configured with the -x flag for the xntpd daemon.

To check the current running configuration of xntpd for the -x flag:


#ps -aef | grep xntpd | grep -v grep

    root  9306258  3866670   0   Nov 12      -  0:17 /usr/sbin/xntpd 

To enable the slewing option:

[:root:/home/root:] chssys -s xntpd -a "-x"
0513-077 Subsystem has been changed.

To stop the xntpd service/daemon:

[:root:/home/root:] stopsrc -s xntpd
0513-044 The /usr/sbin/xntpd Subsystem was requested to stop.

To start the xntpd service/daemon:

[:root:/home/root:] startsrc -s xntpd
0513-059 The xntpd Subsystem has been started. Subsystem PID is 13041866.

[:root:/home/root:] ps -ef|grep ntpd
root 13041866  3801266   0 09:36:05      -  0:00 /usr/sbin/xntpd -x
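The final check can be turned into a small test that succeeds only when the daemon runs with -x. This is a sketch under assumptions: has_slew_flag is a hypothetical helper name, and it simply pattern-matches ps output like the lines shown above.

```shell
# has_slew_flag reads ps output on stdin and succeeds if an ntpd or xntpd
# command line carries -x as a separate argument.
has_slew_flag() {
    grep -E '(^|/| )x?ntpd( |$)' | grep -Eq ' -x( |$)'
}

# Demo with the two ps lines from this post (before and after the change).
before='    root  9306258  3866670   0   Nov 12      -  0:17 /usr/sbin/xntpd '
after='root 13041866  3801266   0 09:36:05      -  0:00 /usr/sbin/xntpd -x'

printf '%s\n' "$before" | has_slew_flag && echo "before: slewing on" || echo "before: slewing off"
printf '%s\n' "$after"  | has_slew_flag && echo "after: slewing on"  || echo "after: slewing off"
```

On a live node the same helper could be fed from ps directly, for example: ps -ef | grep -v grep | has_slew_flag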