Components of the SQL Server Engine:
Figure shows the general architecture of SQL Server,
which has four major components
(Three of whose subcomponents are listed):
1. Protocols,
2. The relational engine (also called the Query Processor),
3.The storage engine,
2. The relational engine (also called the Query Processor),
3.The storage engine,
4.The SQLOS
Every batch submitted to SQL Server for execution, from
any client application, must interact with these four components. (For
simplicity, I‘ve made some minor omissions and simplifications and ignored
certain ―helper‖
modules among the subcomponents).
Protocols
When an
application communicates with the SQL Server Database Engine, the application
programming interfaces (APIs) exposed by the protocol layer formats the
communication using a
Microsoft-defined
format called a tabular data stream (TDS) packet.
There
are Net-Libraries on both the server and client computers that encapsulate the
TDS packet inside a standard communication protocol, such as TCP/IP or Named
Pipes. On the server side of the communication, the Net-Libraries are part of
the Database Engine, and that protocol layer is illustrated in Figure.
Each
client connects to SQL Server with a single protocol. If the client program
does not know which protocols SQL Server is listening on, you can configure the
client to attempt multiple protocols sequentially. The following protocols are
available:
· Shared
Memory The
simplest protocol to use, with no configurable settings. Clients using the
Shared Memory protocol can connect only to a SQL Server instance running on the
same computer, so this protocol is not useful for most database activity. Use
this protocol for troubleshooting when you suspect that the other protocols are
configured incorrectly. Clients using MDAC 2.8 or earlier cannot use the Shared
Memory protocol. If such a connection is attempted, the client is switched to
the Named Pipes protocol.
· Named
Pipes A
protocol developed for local area networks (LANs). A portion of memory is used
by one process to pass information to another process, so that the output of
one is the input of the other. The second process can be local (on the same
computer as the first) or remote (on a networked computer).
· TCP/IP The most widely used
protocol over the Internet. TCP/IP can communicate across interconnected
networks of computers with diverse hardware architectures and operating
systems. It includes standards for routing network traffic and offers advanced
security features. Enabling SQL Server to use TCP/IP requires the most configuration
effort, but most networked computers are already properly configured.
· Virtual Interface Adapter (VIA) A protocol that works
with VIA hardware. This is a specialized protocol; configuration details are
available from your hardware vendor.
Tabular
Data Stream Endpoints
SQL Server 2005 also introduces a new concept for defining
SQL Server connections: the connection is represented on the server end by a
TDS endpoint. During setup, SQL Server creates an endpoint for each of the four
Net-Library protocols supported by SQL Server, and if the protocol is enabled,
all users have access to it. For disabled protocols, the endpoint still exists
but cannot be used. An additional endpoint is created for the dedicated
administrator connection (DAC), which can be used only by members of the
sysadmin fixed server role.
The
Relational Engine
As
mentioned earlier, the relational engine is also called the query processor. It
includes the components of SQL Server that determine exactly what your query
needs to do and the best way to do it. By far the most complex component of the
query processor, and maybe even of the entire SQL Server product, is the query
optimizer, which determines the best execution plan for the queries in the
batch. The relational engine also manages the execution of queries as it
requests data from the storage engine and processes the results returned.
Communication between the relational engine and the storage engine is generally
in terms of OLE DB row sets. (Row set is
the OLE DB term for a result
set.) The storage engine comprises the components needed to actually access
and modify data on disk.
The
Command Parser
The
command parser handles Transact-SQL language events sent to SQL Server. It
checks for proper syntax and translates Transact-SQL commands into an internal
format that can be operated on. This internal format is known as a query tree.
If the parser doesn‘t recognize the syntax, a syntax error is immediately
raised that identifies where the error occurred. However, non-syntax error
messages cannot be explicit about the exact source line that caused the error. Because
only the command parser can access the source of the statement, the statement
is no longer available in source format when the command is actually executed.
The
Query Optimizer
The
query optimizer takes the query tree from the command parser and prepares it
for execution. Statements that can‘t be optimized, such as flow-of-control and
DDL commands, are compiled into an internal form.
The
statements that are optimizable are marked as such and then passed to the
optimizer. The optimizer is mainly concerned with the DML statement SELECT, INSERT, UPDATE, and DELETE,
which can be processed in more than one way, and it is the optimizer‘s job to
determine which of the many possible ways is the best. It compiles an entire
command batch, optimizes queries that are optimizable, and checks security.
The
query optimization and compilation result in an execution plan. The first step
in producing such a plan is to normalize each query, which potentially
breaks down a single query into multiple, fine-grained queries. After the
optimizer normalizes a query, it optimizes it, which means it determines a
plan for executing that query.
Query
optimization is cost based; the optimizer chooses the plan that it determines
would cost the least based on internal metrics that include estimated memory
requirements, CPU utilization, and number of required I/Os.
The
Relational Engine
As
mentioned earlier, the relational engine is also called the query processor. It
includes the components of SQL Server that determine exactly what your query
needs to do and the best way to do it. By far the most complex component of the
query processor, and maybe even of the entire SQL Server product, is the query
optimizer, which determines the best execution plan for the queries in the
batch. The relational engine also manages the execution of queries as it
requests data from the storage engine and processes the results returned.
Communication between the relational engine and the storage engine is generally
in terms of OLE DB row sets. (Row set is
the OLE DB term for a result
set.) The storage engine comprises the components needed to actually access
and modify data on disk.
The
Command Parser
The
command parser handles Transact-SQL language events sent to SQL Server. It
checks for proper syntax and translates Transact-SQL commands into an internal
format that can be operated on. This internal format is known as a query tree.
If the parser doesn‘t recognize the syntax, a syntax error is immediately
raised that identifies where the error occurred. However, non-syntax error
messages cannot be explicit about the exact source line that caused the error.
Because only the command parser can access the source of the statement, the
statement is no longer available in source format when the command is actually
executed.
The
Query Optimizer
The
query optimizer takes the query tree from the command parser and prepares it
for execution. Statements that can‘t be optimized, such as flow-of-control and
DDL commands, are compiled into an internal form.
The
statements that are optimizable are marked as such and then passed to the
optimizer. The optimizer is mainly concerned with the DML statement SELECT, INSERT, UPDATE, and DELETE,
which can be processed in more than one way, and it is the optimizer‘s job to
determine which of the many possible ways is the best. It compiles an entire
command batch, optimizes queries that are optimizable, and checks security.
The
query optimization and compilation result in an execution plan. The first step
in producing such a plan is to normalize each query, which potentially
breaks down a single query into multiple, fine-grained queries. After the
optimizer normalizes a query, it optimizes it, which means it determines a
plan for executing that query.
Query
optimization is cost based; the optimizer chooses the plan that it determines
would cost the least based on internal metrics that include estimated memory
requirements, CPU utilization, and number of required I/Os.
The
optimizer considers the type of statement requested, checks the amount of data
in the various tables affected, looks at the indexes available for each table,
and then looks at a sampling of the data values kept for each index or column
referenced in the query.
The SQL
Manager
The SQL
manager is responsible for everything related to managing stored procedures and
their plans. It determines when a stored procedure needs recompilation, and it
manages the caching of procedure plans so that other processes can reuse them.
The SQL manager also handles auto parameterization of queries. In SQL Server
2008, certain kinds of ad hoc queries are treated as if they were parameterized
stored procedures, and query plans are generated and saved for them. SQL Server
can save and reuse plans in several other ways, but in some situations using a
saved plan might not be a good idea.
The
Database Manager
The
database manager handles access to the metadata needed for query compilation
and optimization, making it clear that none of these separate modules can be
run completely separately from the others. The metadata is stored as data and
is managed by the storage engine, but metadata elements such as the data types
of columns and the available indexes on a table must be available during the
query compilation and optimization phase, before actual query execution starts.
The
Query Executor
The
query executor runs the execution plan that the optimizer produced, acting as a
dispatcher for all the commands in the execution plan. This module steps
through each command of the execution plan.
until
the batch is complete. Most of the commands require interaction with the
storage engine to modify or retrieve data and to manage transactions and
locking.
The
Storage Engine
The SQL
Server storage engine has traditionally been considered to include all the
components involved with the actual processing of data in your database. SQL
Server 2005 separates out some of these components into a module called the
SQLOS. In fact, the SQL Server storage engine team at Microsoft actually
encompasses three areas: access methods, transaction management, and the SQLOS.
Transaction
Services
A core
feature of SQL Server is its ability to ensure that transactions are atomic–that is, all or nothing.
In addition, transactions must be durable, which means that if a transaction
has been committed, it must be recoverable by SQL Server no matter what–even if
a total system failure occurs 1 millisecond after the commit was acknowledged.
There are actually four properties that transactions must adhere to, called the
ACID properties: atomicity, consistency, isolation, and durability.
Locking
Operations Locking is a
crucial function of a multi-user database system such as SQL Server, even if
you are operating primarily in the snapshot isolation level with optimistic
concurrency. SQL Server lets you manage multiple users simultaneously and
ensures that the transactions observe the properties of the chosen isolation
level. Even though readers will not block writers and writers will not block
readers in snapshot isolation, writers do acquire locks and can still block
other writers, and if two writers try to change the same data concurrently, a
conflict will occur that must be resolved. The locking code acquires and
releases various types of locks, such as share locks for reading, exclusive
locks for writing, intent locks taken at a higher granularity to signal a
potential ―plan‖ to
perform some operation, and extent locks for space allocation. It manages
compatibility between the lock types, resolves deadlocks, and escalates locks
if needed. The locking code controls table, page, and row locks as well as
system data locks.
The SQLOS
Whether
the components of the SQLOS layer are actually part of the storage engine
depends on whom you ask. In addition, trying to figure out exactly which
components are in the SQLOS layer can be rather like herding cats.
I
have seen several technical presentations on the topic at conferences and have
exchanged e-mail and even spoken face to face with members of the product team,
but the answers vary. The manager who said he was responsible for the SQLOS
layer defined the SQLOS as everything he was responsible for, which is a rather
circular definition.
Earlier
versions of SQL Server have a thin layer of interfaces between the storage
engine and the actual operating system through which SQL Server makes calls to
the OS for memory allocation, scheduler resources, thread and worker
management, and synchronization objects. However, the services in SQL Server
that needed to access these interfaces can be in any part of the engine.
No comments:
Post a Comment