SAS provides many features for manipulating and processing data. Much of its functionality is similar to that offered by traditional database products (e.g. Oracle), and many organisations use it primarily in that way. However, SAS has two particular strengths. Firstly, the range of built-in functions is enormous and offers huge potential for fine-grained manipulation of numeric and textual data. Secondly, it has a wide range of statistical procedures covering most common analytical needs. Some of these procedures come with the base package; others require you to purchase additional modules.
If SAS has a weakness, it is its rather idiosyncratic syntax, particularly that of the data step, the core means of manipulating data. There are some tutorials available which will help get you started. Perseverance with the data step is rewarded by its flexibility: the user can iterate over a dataset while simultaneously reading from multiple sources and writing to multiple output datasets.
The SQL programmer may be pleased to learn that the SAS procedure "proc sql" gives access to a largely recognisable dialect of SQL. SAS SQL is enhanced by the fact that most SAS functions can be used inline in SQL code. A brief description of the key parts of the SAS system is given below.
- Base SAS - the core system; provides the data step, proc sql and some statistical procedures
- SAS/STAT - a range of statistical procedures (e.g. regression, linear models, logistic regression, clustering)
- SAS/ETS - procedures for time series analysis (seasonal decomposition, smoothing, interpolation, etc.)
- SAS/ACCESS - allows transparent connection to other data sources (e.g. Oracle, Teradata, SQL Server, ODBC)
- SAS/CONNECT - allows parallel processing across multiple machines
- SAS/IML - matrix manipulation language
SQL is a set-oriented language used to manipulate and manage data stored in relational database systems (e.g. Oracle, Teradata, SQL Server). Its style is very different from that of the procedural languages most programmers encounter first (e.g. Basic). The core query language has no flow control, that is, no loops or branching statements. Instead, the user specifies how a collection of data is to be treated, and the database system (via its optimizer) works out the best method to achieve this. Our SQL tutorials cover the language in more detail. Key components of the SQL language are:
- the select statement
- set operations
- window functions
- other data manipulation statements
- data definition statements
- transaction control statements
- security control statements
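The first three components can be illustrated with a small sketch. It uses Python's standard `sqlite3` module purely as a convenient way to run SQL; the `sales` and `targets` tables and their contents are invented for illustration, and the window function assumes the bundled SQLite library is version 3.25 or later. Syntax details may differ slightly on Oracle, Teradata or SQL Server.

```python
import sqlite3

# In-memory database with two small, hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, month INTEGER, amount INTEGER);
    INSERT INTO sales VALUES
        ('north', 1, 100), ('north', 2, 150), ('north', 3, 120),
        ('south', 1, 80),  ('south', 2, 90);
    CREATE TABLE targets (region TEXT);
    INSERT INTO targets VALUES ('north'), ('south'), ('east');
""")

# Select statement: describe the result wanted (total per region),
# not the loop needed to compute it.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Set operation: regions that have a target but no sales.
missing = conn.execute(
    "SELECT region FROM targets EXCEPT SELECT region FROM sales"
).fetchall()

# Window function: running total of sales within each region.
running = conn.execute("""
    SELECT region, month,
           SUM(amount) OVER (PARTITION BY region ORDER BY month) AS running_total
    FROM sales
    ORDER BY region, month
""").fetchall()

print(totals)
print(missing)
print(running)
```

Note that none of the three queries contains a loop or a branch: each one states what the result set should contain and leaves the access path to the database engine.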
A relatively recent extension to the SQL language is the ability to write "recursive" queries, in which a query refers to its own output. With a little ingenuity these open up the possibility of using SQL to create simulations (i.e. where the output of one period depends on the output of previous periods), something that was not previously possible without resorting to cursors.
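As a minimal sketch of the idea, the query below simulates a stock level over a few periods, where each period's value is computed from the previous one. The model (stock doubles each period, then a fixed 30 units are removed), the starting value of 40 and the names `sim`, `period` and `stock` are all assumptions made for illustration. The SQL is run through Python's standard `sqlite3` module; the `WITH RECURSIVE` syntax shown is the SQLite form, and other databases may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

rows = conn.execute("""
    WITH RECURSIVE sim(period, stock) AS (
        -- Anchor row: the starting state of the simulation.
        SELECT 0, 40
        UNION ALL
        -- Recursive step: each period is derived from the one before it.
        SELECT period + 1, stock * 2 - 30
        FROM sim
        WHERE period < 3   -- stop once period 3 has been produced
    )
    SELECT period, stock FROM sim ORDER BY period
""").fetchall()

for period, stock in rows:
    print(period, stock)
```

The `WHERE` clause in the recursive step is what terminates the query; without a stopping condition of some kind the recursion would run indefinitely.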