C# via RoslynAPI - The Big Picture

This article shows everything a C# program can contain. I mean it, EVERYTHING. After reading this article you'll understand what is possible in ANY C# program. And nobody will tell you that you are wrong. But be aware that the information presented here is based on the Microsoft.CodeAnalysis.CSharp v3.9.0 NuGet package, which is the RoslynAPI for the C# language. I will even dare to say that this NuGet package is the latest stable C# Language Specification. After reading this article to the end, you should understand why.

Before We Go (Prerequisites)

You should already know what a SyntaxTree is in RoslynAPI. For this, I recommend reading the official documentation, which can be found here: Get started with syntax analysis (Roslyn APIs) | Microsoft Docs

Also, my previous article may help you grasp the core ideas of this one. It can be found here: C# RoslynAPI SyntaxTree - Variables Are Not Variables (Gotchas)

All syntax trees in the examples were obtained with the SyntaxTreeVisualizer extension. It's important to know that this extension omits the "Syntax" suffix for all syntax nodes. That is why, when we see "StructDeclaration", we should read it as StructDeclarationSyntax, which is a particular class inside the Microsoft.CodeAnalysis.CSharp.Syntax namespace.

Let's consider the most basic C# language unit in RoslynAPI: CSharpSyntaxNode. Why is it the most basic? Because every C# syntax unit, like a condition or a statement, a class or a method, derives from CSharpSyntaxNode. For example, here is the inheritance chain for a class:

CSharpSyntaxNode
MemberDeclarationSyntax
BaseTypeDeclarationSyntax
TypeDeclarationSyntax
ClassDeclarationSyntax

Now that we know where it all starts, we can dig deeper.

And when I say deeper, I mean we should go down the inheritance chain of CSharpSyntaxNode. Why? Because if every C# construct is represented by it, then its descendants show us all possible constructs in C#. So, our next step is to look at all public syntax nodes that the RoslynAPI exposes.

Going Down the Rabbit Hole

Be aware, the first level of the hierarchy contains quite a lot of information. But don't be discouraged by it: take a brief look and keep reading. So, here it is:

CSharpSyntaxNode - Top-Level Inheritance Hierarchy

Here we can notice that all of those top-level classes are abstract and directly inherited from CSharpSyntaxNode, except for the last node. I grouped all sealed classes that derive directly from CSharpSyntaxNode at the end of the tree and called them, well, "sealed". It is the only "group" in this list that doesn't exist in the Microsoft.CodeAnalysis.CSharp package, and it contains all non-abstract, sealed syntax nodes. We will take a look at them later.
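One way to reproduce this top-level view yourself is plain .NET reflection over the Roslyn assembly. The following is a minimal sketch, assuming a .NET 5+ console app (C# 9 top-level statements) that references the Microsoft.CodeAnalysis.CSharp 3.9.0 package; it prints the inheritance chain of ClassDeclarationSyntax and every public type whose immediate base class is CSharpSyntaxNode.

```csharp
// Sketch: assumes a .NET 5+ console app referencing Microsoft.CodeAnalysis.CSharp 3.9.0.
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Walk up from ClassDeclarationSyntax until we reach CSharpSyntaxNode.
var chain = new List<string>();
for (Type current = typeof(ClassDeclarationSyntax); current != typeof(CSharpSyntaxNode); current = current.BaseType!)
    chain.Add(current.Name);
chain.Add(nameof(CSharpSyntaxNode));
chain.Reverse();
Console.WriteLine(string.Join(" -> ", chain));

// Every public type whose immediate base class is CSharpSyntaxNode:
// abstract "groups" first, then the sealed nodes.
var directChildren = typeof(CSharpSyntaxNode).Assembly
    .GetExportedTypes()
    .Where(t => t.BaseType == typeof(CSharpSyntaxNode))
    .OrderBy(t => t.IsSealed)
    .ThenBy(t => t.Name)
    .ToList();

foreach (var type in directChildren)
    Console.WriteLine($"{(type.IsAbstract ? "abstract" : "sealed  ")} {type.Name}");
```

Reflection only sees what the package actually exports, so re-running this against a newer Roslyn version is a quick way to spot nodes added after 3.9.0.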

ExpressionOrPatternSyntax

Let's start at the top, where we can see the abstract ExpressionOrPatternSyntax, which is the base class of the abstract ExpressionSyntax and PatternSyntax:

ExpressionOrPatternSyntax does not have any non-abstract direct children and can be considered a "type marker", or a shared base, for expressions and patterns.

ExpressionSyntax

What is ExpressionSyntax? Have you ever seen a refactoring suggestion by Visual Studio, ReSharper or Rider saying, "Use expression body for properties"?

"Use expression body for properties" - Visual Studio Suggestion

So, EVERY single construct that you write after => and before ; (semicolon) is an ExpressionSyntax. In this case it is this._nvim. So many types derive from it that it deserves a separate article.

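Notice 1: the => token together with the expression after it is actually an ArrowExpressionClauseSyntax, which is a pure container for the expression. The trailing ; belongs to the enclosing declaration (for a property, it is the SemicolonToken of PropertyDeclarationSyntax).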

Notice 2: a property is not the only place where ExpressionSyntax can be found. For example, it can be used inside ExpressionStatementSyntax too.

ExpressionSyntax has other abstract child types:

Abstract Children of ExpressionSyntax Class

as well as non-abstract ones:

Non-Abstract Children of ExpressionSyntax Class

Once again, don't be overwhelmed. For now, this list is presented only to understand the scale.

To understand the quirks of C# syntax in RoslynAPI, we should look at least at one example of a C# expression. The simplest one is BinaryExpressionSyntax. All it does is combine two other expressions:

and its corresponding syntax tree:

We can notice a couple of important facts:

BinaryExpressionSyntax is ExpressionSyntax
As we will see later, IdentifierNameSyntax is also ExpressionSyntax. It means any ExpressionSyntax may contain other expressions as child nodes.

The second fact has a pretty handy consequence. It means that in C#, we can combine almost any expression with another expression. For example, we can nest one BinaryExpression into another BinaryExpression.

We can also see in the syntax visualizer that this node is "AddExpression". But it is not the name of the class that represents this node. When we select it in the syntax visualizer below the syntax tree, we can see the following:

BinaryExpressionSyntax - Node Properties in the SyntaxVisualizer

It says that the actual type of the node is BinaryExpressionSyntax, but the kind is AddExpression.

From here we could also conclude:

The syntax visualizer is showing us the kind of a node instead of its type for some nodes.
The same types of syntax nodes can have different kinds.

For example, here are some possible kinds of a binary expression:

AddExpression
SubtractExpression
MultiplyExpression
DivideExpression
ModuloExpression
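Both observations above (one node type with many kinds, and nesting) can be checked in code. A minimal sketch, assuming a .NET 5+ console app that references Microsoft.CodeAnalysis.CSharp 3.9.0; the operands a, b and c are arbitrary identifiers, since parsing does not need them to be declared.

```csharp
// Sketch: assumes a .NET 5+ console app referencing Microsoft.CodeAnalysis.CSharp 3.9.0.
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// One node type (BinaryExpressionSyntax) for every operator; only Kind() differs.
foreach (var text in new[] { "a + b", "a - b", "a * b", "a / b", "a % b" })
{
    ExpressionSyntax node = SyntaxFactory.ParseExpression(text);
    Console.WriteLine($"{text}: {node.GetType().Name} / {node.Kind()}");
}

// Expressions nest: in `a + b * c` the right operand of '+' is another binary expression.
var nested = (BinaryExpressionSyntax)SyntaxFactory.ParseExpression("a + b * c");
var right = (BinaryExpressionSyntax)nested.Right;
Console.WriteLine($"{nested.Kind()}: left = {nested.Left.Kind()}, right = {right.Kind()}");
```

Checking Kind() rather than the CLR type is what distinguishes an addition from a subtraction when you analyze code.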

TypeSyntax

This child category of ExpressionSyntax contains any type that can be specified in expressions. It has two main sub-categories:

NameSyntax
Direct children of TypeSyntax

NameSyntax < TypeSyntax < ExpressionSyntax - Inheritance Hierarchy

NameSyntax nodes and their children represent names that can refer to any identifiable construct, like the target of an invocation (e.g. a method name). We already saw an example of IdentifierNameSyntax above, which belongs to this category.
Let's take a look at one more example of a NameSyntax: GenericNameSyntax. It's a part of the following ObjectCreationExpressionSyntax:

and its corresponding syntax tree:

Recap: every type derived from NameSyntax is:

ExpressionSyntax

TypeSyntax

NameSyntax

Direct children of TypeSyntax represent a reference to a specific type used in code. For example, they are used to specify the return type of a method.

Let's see what a direct child of TypeSyntax looks like by analyzing ArrayTypeSyntax:

and here is the syntax tree:

Recap: ArrayTypeSyntax is both an ExpressionSyntax and a TypeSyntax, but it is not a NameSyntax.
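To see both sub-categories side by side, the sketch below parses a generic object creation and an array type and tests which base classes each node has. It assumes a .NET 5+ console app that references Microsoft.CodeAnalysis.CSharp 3.9.0; List<int> and int[] are just sample inputs.

```csharp
// Sketch: assumes a .NET 5+ console app referencing Microsoft.CodeAnalysis.CSharp 3.9.0.
using System;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// `new List<int>()`: the created type is a GenericNameSyntax (a NameSyntax).
var creation = (ObjectCreationExpressionSyntax)SyntaxFactory.ParseExpression("new List<int>()");
TypeSyntax createdType = creation.Type;
var generic = (GenericNameSyntax)createdType;
Console.WriteLine($"{generic.Identifier.Text} with type argument {generic.TypeArgumentList.Arguments[0]}");
Console.WriteLine($"is NameSyntax: {createdType is NameSyntax}, is ExpressionSyntax: {createdType is ExpressionSyntax}");

// `int[]`: ArrayTypeSyntax is a direct child of TypeSyntax, so it is not a NameSyntax.
TypeSyntax arrayType = SyntaxFactory.ParseTypeName("int[]");
var array = (ArrayTypeSyntax)arrayType;
Console.WriteLine($"element type: {array.ElementType}, rank specifiers: {array.RankSpecifiers.Count}");
Console.WriteLine($"is NameSyntax: {arrayType is NameSyntax}, is ExpressionSyntax: {arrayType is ExpressionSyntax}");
```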

InstanceExpressionSyntax

The next abstract child of ExpressionSyntax is easy to understand. It contains only two sealed types:

ThisExpressionSyntax is simply the this keyword used as an expression. For example:

has the following syntax tree:

And BaseExpressionSyntax, as you might guess, is the base keyword used in every expression that accesses a base member. For example:

has the following syntax tree:

To summarize, this category contains nodes to access instance and base members. No more, no less.
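A minimal sketch of where these two nodes sit in a tree, assuming a .NET 5+ console app that references Microsoft.CodeAnalysis.CSharp 3.9.0 (the member name count is hypothetical): in both cases the InstanceExpressionSyntax is the receiver on the left of a MemberAccessExpressionSyntax.

```csharp
// Sketch: assumes a .NET 5+ console app referencing Microsoft.CodeAnalysis.CSharp 3.9.0.
using System;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// `this.count`: a member access whose receiver is ThisExpressionSyntax.
var thisAccess = (MemberAccessExpressionSyntax)SyntaxFactory.ParseExpression("this.count");

// `base.ToString()`: an invocation of a member access whose receiver is BaseExpressionSyntax.
var baseCall = (InvocationExpressionSyntax)SyntaxFactory.ParseExpression("base.ToString()");
var baseAccess = (MemberAccessExpressionSyntax)baseCall.Expression;

foreach (var receiver in new[] { thisAccess.Expression, baseAccess.Expression })
    Console.WriteLine($"{receiver}: {receiver.GetType().Name}, InstanceExpressionSyntax = {receiver is InstanceExpressionSyntax}");
```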

AnonymousFunctionExpressionSyntax

The next abstract child of ExpressionSyntax is AnonymousFunctionExpressionSyntax:

AnonymousFunctionExpressionSyntax < ExpressionSyntax - Inheritance Hierarchy

If you know what anonymous functions and lambda expressions are, you may already be familiar with what members of this category do. In case you would like to see an example, here is a SimpleLambdaExpressionSyntax:

SimpleLambdaExpressionSyntax - Code Snippet

and the corresponding syntax tree:

SimpleLambdaExpressionSyntax - Syntax Tree

It consists of a lambda parameter and an expression body. Not quite simple, huh?
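The sketch below parses a simple lambda, a parenthesized lambda and an old-style anonymous method, and shows that all three share the AnonymousFunctionExpressionSyntax base. It assumes a .NET 5+ console app that references Microsoft.CodeAnalysis.CSharp 3.9.0; the lambda bodies are arbitrary.

```csharp
// Sketch: assumes a .NET 5+ console app referencing Microsoft.CodeAnalysis.CSharp 3.9.0.
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// A simple lambda: one ParameterSyntax plus an expression body.
var simple = (SimpleLambdaExpressionSyntax)SyntaxFactory.ParseExpression("x => x + 1");
Console.WriteLine($"parameter: {simple.Parameter.Identifier.Text}");
Console.WriteLine($"body: {simple.Body.GetType().Name} ({simple.Body.Kind()})");

// Other members of the same family.
var parenthesized = SyntaxFactory.ParseExpression("(x, y) => x * y");
var anonymous = SyntaxFactory.ParseExpression("delegate (int x) { return x; }");

foreach (var e in new[] { simple, parenthesized, anonymous })
    Console.WriteLine($"{e.GetType().Name}: AnonymousFunctionExpressionSyntax = {e is AnonymousFunctionExpressionSyntax}");
```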

BaseObjectCreationExpressionSyntax

The next abstract child of ExpressionSyntax is BaseObjectCreationExpressionSyntax:

ObjectCreationExpressionSyntax - Code Snippet

and the corresponding syntax tree:

ObjectCreationExpressionSyntax - Syntax Tree
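Besides ObjectCreationExpressionSyntax, Roslyn 3.9 also has ImplicitObjectCreationExpressionSyntax for C# 9 target-typed new(). A minimal sketch comparing the two, assuming a .NET 5+ console app that references Microsoft.CodeAnalysis.CSharp 3.9.0 (StringBuilder(16) is just a sample input):

```csharp
// Sketch: assumes a .NET 5+ console app referencing Microsoft.CodeAnalysis.CSharp 3.9.0.
using System;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Explicit type vs. C# 9 target-typed `new()`.
var explicitNew = SyntaxFactory.ParseExpression("new StringBuilder(16)");
var targetTyped = SyntaxFactory.ParseExpression("new()");

foreach (var e in new[] { explicitNew, targetTyped })
    Console.WriteLine($"{e.GetType().Name}: BaseObjectCreationExpressionSyntax = {e is BaseObjectCreationExpressionSyntax}");

// Only the explicit form has a Type; both may have an ArgumentList.
var creation = (ObjectCreationExpressionSyntax)explicitNew;
Console.WriteLine($"type: {creation.Type}, arguments: {creation.ArgumentList!.Arguments.Count}");
```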

PatternSyntax

Now, we should already have a basic understanding of what ExpressionSyntax is in C#. But what about PatternSyntax? PatternSyntax nodes are responsible for, you guessed it, describing pattern matching in C#.

It was introduced in C# 7.0. You can read more about it:
C# 8.0 - Pattern Matching in C# 8.0 | Microsoft Docs

And you can read about changes introduced to pattern matching in C# 9.0:
Pattern matching changes - C# 9.0 specification proposals | Microsoft Docs

Those syntax nodes should also be self-explanatory. The following example demonstrates the concept:

and the corresponding syntax tree:

In other words, a pattern matching "matches" objects that correspond to a specific pattern. You can think of it as regular expressions for C# objects.
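A minimal sketch with one C# 7.0 pattern and one C# 9.0 combined pattern, assuming a .NET 5+ console app that references Microsoft.CodeAnalysis.CSharp 3.9.0 (whose default language version is C# 9); o and x are arbitrary identifiers.

```csharp
// Sketch: assumes a .NET 5+ console app referencing Microsoft.CodeAnalysis.CSharp 3.9.0.
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// C# 7.0: a declaration pattern (`int n`) inside an IsPatternExpressionSyntax.
var isInt = (IsPatternExpressionSyntax)SyntaxFactory.ParseExpression("o is int n");
Console.WriteLine($"pattern: {isInt.Pattern.GetType().Name}");

// C# 9.0: two relational patterns combined with `and`.
var range = (IsPatternExpressionSyntax)SyntaxFactory.ParseExpression("x is > 0 and < 10");
var combined = (BinaryPatternSyntax)range.Pattern;
Console.WriteLine($"{combined.Kind()}: {combined.Left.GetType().Name}, {combined.Right.GetType().Name}");
```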

StatementSyntax

I call this group of nodes "method body members":

Why "method body members"? Because those are the ONLY nodes that can be direct children of the BlockSyntax of a MethodDeclarationSyntax (and of other types inherited from BaseMethodDeclarationSyntax). That's it: anything that can be specified inside a method body is a statement. Of course, unless the method has an expression body, in which case only an expression can be specified there.

Interesting facts:

A method body cannot contain an expression directly. But it may contain an ExpressionStatementSyntax, which can contain any expression.
Many loop and conditional statements consist of syntax nodes scattered across different syntax node categories. For example, an if statement consists of an IfStatementSyntax, which has the following child nodes that can be accessed via properties: (1) a condition (ExpressionSyntax), (2) a statement (StatementSyntax, usually a BlockSyntax), (3, optional) an ElseClauseSyntax (a direct child of CSharpSyntaxNode).
Statements can contain nested statements via BlockSyntax, UsingStatementSyntax, LockStatementSyntax, CheckedStatementSyntax, etc.
BlockSyntax is a pure container for other statements.
Besides ThrowStatementSyntax, there is also ThrowExpressionSyntax (a child of ExpressionSyntax).
Besides SwitchStatementSyntax, there is also SwitchExpressionSyntax.
LabeledStatementSyntax is the parent of the statement to which it applies (not a sibling).
Consecutive semicolons are parsed as separate statements. For example, ;; is two EmptyStatementSyntax nodes.
Else-if is actually an IfStatementSyntax nested inside an ElseClauseSyntax. Strictly speaking, IfStatementSyntax cannot contain an else-if statement, because there is no else-if syntax node in RoslynAPI.
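Three of the facts above (else-if nesting, empty statements and labels) can be verified in a few lines. A minimal sketch, assuming a .NET 5+ console app that references Microsoft.CodeAnalysis.CSharp 3.9.0; the conditions a, b and the label done are arbitrary.

```csharp
// Sketch: assumes a .NET 5+ console app referencing Microsoft.CodeAnalysis.CSharp 3.9.0.
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// else-if: an IfStatementSyntax nested inside an ElseClauseSyntax.
var ifStatement = (IfStatementSyntax)SyntaxFactory.ParseStatement("if (a) { } else if (b) { } else { }");
var elseClause = ifStatement.Else!;
Console.WriteLine($"else contains: {elseClause.Statement.GetType().Name}");
Console.WriteLine($"ElseClauseSyntax derives from StatementSyntax: {typeof(StatementSyntax).IsAssignableFrom(typeof(ElseClauseSyntax))}");

// `;;` inside a block: two EmptyStatementSyntax nodes.
var block = (BlockSyntax)SyntaxFactory.ParseStatement("{ ;; }");
Console.WriteLine(string.Join(", ", block.Statements.Select(s => s.Kind())));

// A label wraps the statement it applies to.
var labeled = (LabeledStatementSyntax)SyntaxFactory.ParseStatement("done: return;");
Console.WriteLine($"{labeled.Identifier.Text} -> {labeled.Statement.GetType().Name}");
```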

Let's take a look at IfStatementSyntax:

and its syntax tree:

So, statements are workhorses of methods. Here is what they can be.

Loop Statements:

for
foreach
while
do ... while
break
continue

Condition Statements:

if
switch

Control-Flow Statements:

return
yield
try
throw
label
goto

Arithmetic:

checked
unchecked

Garbage Collection Statements:

fixed
using

Multithreading:

lock

Unmanaged Statements:

unsafe

Other:

Local Variable Declarations
Local Functions
Expressions (via ExpressionStatementSyntax)

Notice 1: Similarly to ElseClauseSyntax, CatchClauseSyntax and FinallyClauseSyntax are not inherited from StatementSyntax. Instead, they are direct children of CSharpSyntaxNode. Why? Remember the "method body members" alias? Most likely the Roslyn team did so because you can't specify the catch or finally keywords directly in a method body. They are only valid as children of TryStatementSyntax.

Notice 2: There may be some confusion between [statements and keywords in the C# language] and [RoslynAPI syntax classes]. For example, one may think that catch is a statement after reading the official C# documentation:

but in RoslynAPI there is neither a "catch" nor a "try-catch" statement. As mentioned above, in RoslynAPI "catch" is represented by CatchClauseSyntax, which is not a child of the StatementSyntax class. So, someone who uses RoslynAPI may think of CatchClauseSyntax as a separate entity, because it's a direct child of CSharpSyntaxNode. In any case, we can safely call "catch" a keyword. Have you noticed a catch here?
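A minimal sketch showing that only the try part is a statement, assuming a .NET 5+ console app that references Microsoft.CodeAnalysis.CSharp 3.9.0; Work, Cleanup and IOException are just names in the parsed text, which does not need to compile.

```csharp
// Sketch: assumes a .NET 5+ console app referencing Microsoft.CodeAnalysis.CSharp 3.9.0.
using System;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical method names; the parser only needs valid syntax.
var tryStatement = (TryStatementSyntax)SyntaxFactory.ParseStatement(
    "try { Work(); } catch (IOException e) { } finally { Cleanup(); }");

// catch and finally are clauses hanging off the TryStatementSyntax.
CatchClauseSyntax catchClause = tryStatement.Catches[0];
FinallyClauseSyntax finallyClause = tryStatement.Finally!;

Console.WriteLine($"statement: {tryStatement.GetType().Name}");
Console.WriteLine($"catch type: {catchClause.Declaration!.Type}, variable: {catchClause.Declaration.Identifier.Text}");
Console.WriteLine($"finally block statements: {finallyClause.Block.Statements.Count}");
Console.WriteLine($"CatchClauseSyntax base: {typeof(CatchClauseSyntax).BaseType!.Name}");
Console.WriteLine($"FinallyClauseSyntax base: {typeof(FinallyClauseSyntax).BaseType!.Name}");
```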

Notice 3: There is no separate unchecked statement syntax node in RoslynAPI. An unchecked statement is represented by CheckedStatementSyntax with the UncheckedStatement kind.

Notice 4: Many statements have an expression variant. For example, there are both CheckedStatementSyntax and CheckedExpressionSyntax.

Here is another example to understand the difference between statements and expressions of a similar kind:

and its corresponding syntax tree:

Notice: the kind of an unchecked expression is UncheckedExpression.

This separation is subtle and very important to understand, because you can't use an expression (e.g. a checked expression) directly in a method body, but you can use a statement (e.g. a checked statement) there. Similarly, you can't use a statement inside an expression (e.g. a checked statement in a property's expression body), but you can use an expression there (e.g. a checked expression).
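A minimal sketch of the statement/expression split for unchecked, assuming a .NET 5+ console app that references Microsoft.CodeAnalysis.CSharp 3.9.0; total and result are arbitrary identifiers.

```csharp
// Sketch: assumes a .NET 5+ console app referencing Microsoft.CodeAnalysis.CSharp 3.9.0.
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Statement form: a block guarded by `unchecked`.
StatementSyntax statement = SyntaxFactory.ParseStatement("unchecked { total = total + 1; }");
// Expression form: `unchecked(...)` wraps a single expression.
ExpressionSyntax expression = SyntaxFactory.ParseExpression("unchecked(total + 1)");

// One class per form; the keyword only changes the Kind().
Console.WriteLine($"{statement.GetType().Name} / {statement.Kind()}");
Console.WriteLine($"{expression.GetType().Name} / {expression.Kind()}");

// Inside a method body, the expression form only appears wrapped in a statement.
var wrapped = SyntaxFactory.ParseStatement("result = unchecked(total + 1);");
Console.WriteLine(wrapped.GetType().Name);
```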

MemberDeclarationSyntax

This category unites classes, namespaces, enums, interfaces, fields, properties, methods, etc. Basically, it contains everything that can be declared as a member of a namespace or a type. For example, a class can have the following members:

Fields
Properties
Methods
Events
Indexers
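To see which MemberDeclarationSyntax subtype each kind of member becomes, the sketch below parses one class containing all five. It assumes a .NET 5+ console app that references Microsoft.CodeAnalysis.CSharp 3.9.0; the Counter class is hypothetical and only needs to be syntactically valid.

```csharp
// Sketch: assumes a .NET 5+ console app referencing Microsoft.CodeAnalysis.CSharp 3.9.0.
using System;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical class with a field, a property, a method, an event and an indexer.
var source = @"
class Counter
{
    private int _value;
    public int Value => _value;
    public void Increment() => _value++;
    public event System.EventHandler Changed;
    public int this[int index] => _value + index;
}";

var counter = (ClassDeclarationSyntax)SyntaxFactory.ParseCompilationUnit(source).Members[0];
foreach (MemberDeclarationSyntax member in counter.Members)
    Console.WriteLine(member.GetType().Name);
```

Note that a field-like event becomes EventFieldDeclarationSyntax rather than EventDeclarationSyntax, a nuance discussed in the BasePropertyDeclarationSyntax section.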

RoslynAPI defines four main member categories (abstract):

Abstract MemberDeclarationSyntax - Inheritance Hierarchy

as well as non-abstract sealed direct children:

Non-Abstract MemberDeclarationSyntax - Inheritance Hierarchy

As you can see, there are constructs that are unique and don't have any other sub-types. Why? If we consider namespaces, then it's obvious that there is only one type of namespace declaration. And if the C# language design team decides to create another tricky kind of namespace, the Roslyn team may introduce another base type.

We will see examples in the following sections.

TypeDeclarationSyntax

I would say these nodes are the heart of the C# language. They represent the central OOP features of the language, like classes:

Let's see what the most common TypeDeclarationSyntax looks like:

and its syntax tree:

That's it: ClassDeclarationSyntax includes everything, starting from the access modifiers and ending with the CloseBraceToken. Also, you may notice that those green items in the SyntaxVisualizer (like PublicKeyword) are actually syntax tokens and not syntax nodes.
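The token/node distinction is easy to observe through the API: modifiers, the identifier and the braces are SyntaxToken values, while only things like the base list show up in ChildNodes(). A minimal sketch, assuming a .NET 5+ console app that references Microsoft.CodeAnalysis.CSharp 3.9.0 (Foo and Bar are placeholder names):

```csharp
// Sketch: assumes a .NET 5+ console app referencing Microsoft.CodeAnalysis.CSharp 3.9.0.
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

var cls = (ClassDeclarationSyntax)SyntaxFactory.ParseCompilationUnit(
    "public sealed class Foo : Bar { }").Members[0];

// Modifiers, the identifier and the braces are SyntaxTokens, not SyntaxNodes.
foreach (SyntaxToken modifier in cls.Modifiers)
    Console.WriteLine($"modifier token: {modifier.Kind()}");
Console.WriteLine($"identifier token: {cls.Identifier.Text}");
Console.WriteLine($"braces: {cls.OpenBraceToken.Kind()} {cls.CloseBraceToken.Kind()}");

// The base list, by contrast, is a child node.
Console.WriteLine($"child nodes: {string.Join(", ", cls.ChildNodes().Select(n => n.Kind()))}");
```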

BasePropertyDeclarationSyntax

This sub-category of MemberDeclarationSyntax should be self-explanatory:

Though this category is tiny, it hides a couple of tricks up its sleeve. As you'll see later, EventDeclarationSyntax is not the only possible construct related to an event declaration. In RoslynAPI there is also EventFieldDeclarationSyntax. Wait, isn't there only one event keyword in C#? Well, in RoslynAPI a single keyword can be used in several syntax nodes. Here is an example:

and the corresponding syntax tree:

You may have noticed several nuances here:

The difference between EventFieldDeclarationSyntax and EventDeclarationSyntax is that EventDeclarationSyntax is both a property (it derives from BasePropertyDeclarationSyntax) and an event, and it has an accessor list with add and remove accessors. In contrast, EventFieldDeclarationSyntax is both a field (it derives from BaseFieldDeclarationSyntax, as you'll see later) and an event, and it does not have a body.
EventDeclarationSyntax has accessors, which are AccessorDeclarationSyntax nodes with the AddAccessorDeclaration and RemoveAccessorDeclaration kinds.

Here is a kind of AccessorDeclarationSyntax that uses "add" keyword:

AccessorDeclarationSyntax - AddAccessorDeclaration Kind

Notice: AccessorDeclarationSyntax can have a block-body (BlockSyntax) or an expression-body (ArrowExpressionClauseSyntax) but NOT both.
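A minimal sketch contrasting the two event forms, assuming a .NET 5+ console app that references Microsoft.CodeAnalysis.CSharp 3.9.0; the Button class and its events are hypothetical.

```csharp
// Sketch: assumes a .NET 5+ console app referencing Microsoft.CodeAnalysis.CSharp 3.9.0.
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical class: one field-like event and one event with explicit accessors.
var source = @"
class Button
{
    public event EventHandler Clicked;
    public event EventHandler Pressed
    {
        add { }
        remove { }
    }
}";
var button = (ClassDeclarationSyntax)SyntaxFactory.ParseCompilationUnit(source).Members[0];

var fieldLike = (EventFieldDeclarationSyntax)button.Members[0];
var withAccessors = (EventDeclarationSyntax)button.Members[1];

// Field-like events sit in the field branch, accessor events in the property branch.
Console.WriteLine($"{fieldLike.GetType().Name} base: {typeof(EventFieldDeclarationSyntax).BaseType!.Name}");
Console.WriteLine($"{withAccessors.GetType().Name} base: {typeof(EventDeclarationSyntax).BaseType!.Name}");
foreach (var accessor in withAccessors.AccessorList!.Accessors)
    Console.WriteLine($"  {accessor.Keyword.Text}: {accessor.Kind()}");
```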

And here are properties:

PropertyDeclarationSyntax - Code Snippet

and the syntax tree:

Properties are similar to events, but there are a couple of nuances too:

Similarly to EventDeclarationSyntax, PropertyDeclarationSyntax has accessors of type AccessorDeclarationSyntax, but they have the GetAccessorDeclaration and SetAccessorDeclaration kinds.
Similarly to EventDeclarationSyntax, PropertyDeclarationSyntax accessors can have either type of body: a block-body or an expression-body (but not both at the same time).
In case of an expression-bodied read-only property, AccessorListSyntax is omitted (set to null), and the property has an expression-body instead.
In case of an auto-property, AccessorListSyntax is present and contains the accessors, but none of them has a body.

Notice: though a PropertyDeclarationSyntax can have an expression-body, an EventDeclarationSyntax cannot.

Now, let's take a look at indexers:

and the corresponding syntax tree:

Indexers are similar to properties, but there are a couple of nuances too:

Similarly to properties, indexers can be read-only or have block-body or expression-body accessors.
Indexers possess a BracketedParameterListSyntax, while PropertyDeclarationSyntax does not.

So, we've just taken a look at `BasePropertyDeclarationSyntax` nodes. Though properties, events and indexers are slightly different, they have a lot in common from the syntax point of view. Now, you can even brag to someone that events and indexers are actually properties because they share the same base class in RoslynAPI.
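The property and indexer nuances above can be checked with one parse. A minimal sketch, assuming a .NET 5+ console app that references Microsoft.CodeAnalysis.CSharp 3.9.0; the Sample class is hypothetical.

```csharp
// Sketch: assumes a .NET 5+ console app referencing Microsoft.CodeAnalysis.CSharp 3.9.0.
using System;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical class: it only needs to be syntactically valid.
var source = @"
class Sample
{
    public int Auto { get; set; }
    public int Computed => 42;
    public int Block { get { return 42; } }
    public int this[int index] => index;
}";
var members = ((ClassDeclarationSyntax)SyntaxFactory.ParseCompilationUnit(source).Members[0]).Members;

var auto = (PropertyDeclarationSyntax)members[0];
var computed = (PropertyDeclarationSyntax)members[1];
var blockBodied = (PropertyDeclarationSyntax)members[2];
var indexer = (IndexerDeclarationSyntax)members[3];

// Auto-property: an accessor list with two accessors, neither of which has a body.
bool autoHasBodies = auto.AccessorList!.Accessors.Any(a => a.Body != null || a.ExpressionBody != null);
Console.WriteLine($"auto: {auto.AccessorList.Accessors.Count} accessors, any body = {autoHasBodies}");

// Expression-bodied property: no accessor list, an ArrowExpressionClauseSyntax instead.
Console.WriteLine($"computed: accessor list is null = {computed.AccessorList == null}, expression body = {computed.ExpressionBody}");

// Block-bodied getter: the accessor owns a BlockSyntax.
Console.WriteLine($"blockBodied: getter body = {blockBodied.AccessorList!.Accessors[0].Body}");

// Indexer: a bracketed parameter list and the same base class as properties and events.
Console.WriteLine($"indexer: {indexer.ParameterList.Parameters.Count} parameter(s), base = {typeof(IndexerDeclarationSyntax).BaseType!.Name}");
```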

BaseMethodDeclarationSyntax

I will only show a single example for this group because those nodes are similar. They can have a block-body or an expression-body but not both. Here is an example of a method:

and its syntax tree:

Here is what you may notice:

The main parts of a method are (1) `TypeParameterListSyntax`, (2) `IdentifierToken` - I haven't marked it on the screenshot, but it contains the name of the method, (3) a method body (can be `BlockSyntax` or `ArrowExpressionClauseSyntax`), (4) `TypeParameterConstraintClauseSyntax`.
`TypeParameterListSyntax` is a container for `TypeParameterSyntax`
`ParameterListSyntax` is a container for `ParameterSyntax`

OperatorDeclarationSyntax and ConversionOperatorDeclarationSyntax are a bit different: the operator declaration is responsible for operator overloading, and the conversion operator declaration is responsible for implicit and explicit type conversions.
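A minimal sketch of the method parts listed above, plus the two operator node types, assuming a .NET 5+ console app that references Microsoft.CodeAnalysis.CSharp 3.9.0; the Factory class and its members are hypothetical.

```csharp
// Sketch: assumes a .NET 5+ console app referencing Microsoft.CodeAnalysis.CSharp 3.9.0.
using System;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical class: a generic method, an overloaded operator and a conversion operator.
var source = @"
class Factory
{
    public T Create<T>(int count) where T : new() => new T();
    public static Factory operator +(Factory a, Factory b) => a;
    public static implicit operator int(Factory f) => 0;
}";
var members = ((ClassDeclarationSyntax)SyntaxFactory.ParseCompilationUnit(source).Members[0]).Members;

var method = (MethodDeclarationSyntax)members[0];
Console.WriteLine($"name: {method.Identifier.Text}");
Console.WriteLine($"type parameters: {method.TypeParameterList}");
Console.WriteLine($"parameters: {method.ParameterList}");
Console.WriteLine($"constraints: {method.ConstraintClauses[0]}");
Console.WriteLine($"block body: {method.Body?.ToString() ?? "(none)"}, expression body: {method.ExpressionBody}");

// Operator overloading vs. user-defined conversion: two different node types.
Console.WriteLine(members[1].GetType().Name);
Console.WriteLine(members[2].GetType().Name);
```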

InterpolatedStringTextSyntax and InterpolationSyntax

InterpolatedStringContentSyntax - Inheritance Hierarchy

These are the parts of the string interpolation feature in C#. Here is an example:

InterpolatedStringContentSyntax - Code Snippet

and the corresponding syntax tree:

InterpolatedStringContentSyntax - Syntax Tree
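A minimal sketch that splits an interpolated string into its text and interpolation parts, assuming a .NET 5+ console app that references Microsoft.CodeAnalysis.CSharp 3.9.0; name is an arbitrary identifier.

```csharp
// Sketch: assumes a .NET 5+ console app referencing Microsoft.CodeAnalysis.CSharp 3.9.0.
using System;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// The parsed source text is: $"Hello, {name}!"
var interpolated = (InterpolatedStringExpressionSyntax)SyntaxFactory.ParseExpression("$\"Hello, {name}!\"");

foreach (InterpolatedStringContentSyntax part in interpolated.Contents)
{
    switch (part)
    {
        case InterpolatedStringTextSyntax text:
            Console.WriteLine($"text:          \"{text.TextToken.Text}\"");
            break;
        case InterpolationSyntax hole:
            Console.WriteLine($"interpolation: {hole.Expression}");
            break;
    }
}
```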

VariableDesignationSyntax

Basically, it's a lightweight variable declaration: it designates a variable name in places where you don't or can't write a full variable declaration:

You may have already noticed it in the examples above, inside a SimpleLambdaExpressionSyntax:

VariableDesignationSyntax - Code Snippet

and here is the corresponding syntax tree:

Have you noticed that little "u" character? Yep, this is VariableDesignationSyntax.
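Designations also show up outside patterns, for example in out var arguments and deconstructions. A minimal sketch collecting three of them, assuming a .NET 5+ console app that references Microsoft.CodeAnalysis.CSharp 3.9.0; o, text and pair are arbitrary identifiers.

```csharp
// Sketch: assumes a .NET 5+ console app referencing Microsoft.CodeAnalysis.CSharp 3.9.0.
using System;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Pattern designation: `s` in `o is string s`.
var pattern = (IsPatternExpressionSyntax)SyntaxFactory.ParseExpression("o is string s");
var single = ((DeclarationPatternSyntax)pattern.Pattern).Designation;

// `out var value` inside an argument list.
var call = (InvocationExpressionSyntax)SyntaxFactory.ParseExpression("int.TryParse(text, out var value)");
var outDeclaration = (DeclarationExpressionSyntax)call.ArgumentList.Arguments[1].Expression;

// Deconstruction: `var (a, b) = pair;`
var statement = (ExpressionStatementSyntax)SyntaxFactory.ParseStatement("var (a, b) = pair;");
var assignment = (AssignmentExpressionSyntax)statement.Expression;
var deconstruction = (DeclarationExpressionSyntax)assignment.Left;

foreach (VariableDesignationSyntax designation in new[] { single, outDeclaration.Designation, deconstruction.Designation })
    Console.WriteLine($"{designation.GetType().Name}: {designation}");
```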

QueryClauseSyntax

I don't use query clauses in my code at all and prefer LINQ method syntax. But query syntax is there, and the C# team carefully carries it from release to release. I suppose it's there to offer SQL-like queries. I won't describe each group in these categories; feel free to explore them on your own.

You can read more about it here: Query expression basics (LINQ in C#) | Microsoft Docs

Here is an example from the Microsoft Docs (the link above) to continue the flow:

QueryClauseSyntax - Inheritance Hierarchy

SelectOrGroupClauseSyntax - Inheritance Hierarchy
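A minimal sketch of how a query expression breaks down into a from clause, body clauses and a final select, assuming a .NET 5+ console app that references Microsoft.CodeAnalysis.CSharp 3.9.0; numbers is an arbitrary identifier.

```csharp
// Sketch: assumes a .NET 5+ console app referencing Microsoft.CodeAnalysis.CSharp 3.9.0.
using System;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

var query = (QueryExpressionSyntax)SyntaxFactory.ParseExpression(
    "from n in numbers where n > 2 orderby n select n * 2");

// The leading `from` is kept separately from the body clauses.
Console.WriteLine($"from: {query.FromClause.Identifier.Text} in {query.FromClause.Expression}");

// Body clauses are QueryClauseSyntax nodes; the final select/group is a SelectOrGroupClauseSyntax.
foreach (QueryClauseSyntax clause in query.Body.Clauses)
    Console.WriteLine($"clause: {clause.GetType().Name}");
Console.WriteLine($"select/group: {query.Body.SelectOrGroup.GetType().Name}");
```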

ArgumentListSyntax, ArgumentSyntax, ParameterListSyntax and ParameterSyntax

These are closely related, though they do not share a base type. They are also closely related to the following direct children of CSharpSyntaxNode:

Sealed, Parameter-Related:

TypeParameterListSyntax
TypeParameterSyntax
FunctionPointerParameterListSyntax
TypeParameterConstraintClauseSyntax
CrefParameterSyntax

Sealed, Argument-Related:

AttributeArgumentListSyntax
AttributeArgumentSyntax
TypeArgumentListSyntax
ArgumentSyntax
OmittedTypeArgumentSyntax

This group is responsible for passing arguments to invocations. In other words, it answers the question "What should be passed to a function call as an argument?"

Notice: ArgumentSyntax does not have an intermediate base class: it is a direct child of CSharpSyntaxNode. But ParameterSyntax derives from BaseParameterSyntax.

BaseArgumentListSyntax - Inheritance Hierarchy

And very closely related to the argument group is the parameter group. It is responsible for defining the parameters of functions. In other words, it answers the question "What can a particular function accept?" You can also think of it as a "contract" between you and a function: when you agree to a certain contract, you should comply with it when making calls to that function.

BaseParameterListSyntax and BaseParameterSyntax - Inheritance Hierarchy

Here is an example that uses both an argument and a parameter in a single expression:

ParameterSyntax and ArgumentSyntax - Syntax Tree

I hope you can clearly see the difference between arguments and parameters here:

An argument is something that you pass during a method call (or while accessing something). In this example, you can see that we pass a SimpleLambdaExpression as an argument to the Where function invocation.
A parameter is something you define on the callee side (like a variable declaration). In this example, you can see that "x" is a parameter of the SimpleLambdaExpression. Also, in the example above you can see that ArgumentSyntax is contained inside ArgumentListSyntax. Similarly, the parameters of methods and parenthesized lambdas are contained in a ParameterListSyntax (a simple lambda like this one holds its single parameter directly).
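A minimal sketch of the same Where call seen from the API, assuming a .NET 5+ console app that references Microsoft.CodeAnalysis.CSharp 3.9.0; numbers is an arbitrary identifier, and the last check relies on the BaseParameterSyntax base class mentioned above.

```csharp
// Sketch: assumes a .NET 5+ console app referencing Microsoft.CodeAnalysis.CSharp 3.9.0.
using System;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

var call = (InvocationExpressionSyntax)SyntaxFactory.ParseExpression("numbers.Where(x => x > 2)");

// Caller side: the lambda is passed as an ArgumentSyntax inside an ArgumentListSyntax.
ArgumentSyntax argument = call.ArgumentList.Arguments[0];
var lambda = (SimpleLambdaExpressionSyntax)argument.Expression;

// Callee side: `x` is a ParameterSyntax declared by the lambda.
ParameterSyntax parameter = lambda.Parameter;

Console.WriteLine($"argument: {argument} (inside {argument.Parent!.GetType().Name})");
Console.WriteLine($"parameter: {parameter.Identifier.Text}");
Console.WriteLine($"ArgumentSyntax base: {typeof(ArgumentSyntax).BaseType!.Name}");
Console.WriteLine($"ParameterSyntax base: {typeof(ParameterSyntax).BaseType!.Name}");
```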

TypeParameterList and TypeParameter

These are the basic building blocks of generics in C#. They allow you to declare type parameters for a type or a method.

Notice: they are sealed, non-abstract and direct children of CSharpSyntaxNode.

TypeParameterSyntax and TypeParameterListSyntax - Inheritance Hierarchy

Let's see an example:

TypeParameterSyntax and TypeParameterListSyntax - Code Snippet

and the syntax tree for the class:

TypeParameterSyntax and TypeParameterListSyntax - Syntax Tree (1)

and the syntax tree for a generic method parameter:

TypeParameterSyntax and TypeParameterListSyntax - Syntax Tree (2)

You may notice that TypeParameter is nested inside TypeParameterList. We need TypeParameterList because C# allows specifying multiple generic parameters for a type.
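A minimal sketch showing a TypeParameterListSyntax with several entries and a TypeParameterSyntax that carries a variance keyword, assuming a .NET 5+ console app that references Microsoft.CodeAnalysis.CSharp 3.9.0; IProducer and Map are hypothetical types.

```csharp
// Sketch: assumes a .NET 5+ console app referencing Microsoft.CodeAnalysis.CSharp 3.9.0.
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical types: a covariant interface and a class with two type parameters.
var unit = SyntaxFactory.ParseCompilationUnit(@"
interface IProducer<out T> { }
class Map<TKey, TValue> { }");
var producer = (InterfaceDeclarationSyntax)unit.Members[0];
var map = (ClassDeclarationSyntax)unit.Members[1];

// One TypeParameterListSyntax holding several TypeParameterSyntax nodes.
foreach (TypeParameterSyntax p in map.TypeParameterList!.Parameters)
    Console.WriteLine($"Map type parameter: {p.Identifier.Text}");

// Variance (in/out) is a token on the TypeParameterSyntax itself.
var t = producer.TypeParameterList!.Parameters[0];
Console.WriteLine($"IProducer type parameter: {t.Identifier.Text}, variance: {t.VarianceKeyword.Kind()}");
```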

BaseListSyntax, BaseTypeSyntax and PrimaryConstructorBaseTypeSyntax

BaseListSyntax is the list of all base types of a given type. It is also the parent container for BaseTypeSyntax. These are the core of C# inheritance: they allow us to specify base types for the type we define.

Here we can see that the ClassMemberStateBase class is inherited from another LocalContextState class:

and here is how it's defined in the syntax tree:

Now, have you wondered what PrimaryConstructorBaseTypeSyntax is? It is also related to inheritance in C#. It applies only to C# 9.0, where records were introduced:

PrimaryConstructorBaseTypeSyntax - Code Snippet

and its syntax tree:

PrimaryConstructorBaseTypeSyntax - Syntax Tree

But what is even more interesting is that there is no separate syntax node for a "primary constructor" in RoslynAPI, even though you can specify a PrimaryConstructorBaseTypeSyntax. Let's see the syntax tree for the R1 type:

Notice that the record does not have a "PrimaryConstructor" node, though RoslynAPI has PrimaryConstructorBaseTypeSyntax. Even Mads Torgersen mentioned the "primary constructor" in his blog post C# 9.0 on the record | .NET Blog, in the comment:

I suppose, reusing ParameterListSyntax in RecordDeclarationSyntax just saved a couple of story points to the Roslyn team.
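A minimal sketch covering both halves of this section, assuming a .NET 5+ console app that references Microsoft.CodeAnalysis.CSharp 3.9.0 and parsing explicitly as C# 9. The declarations are hypothetical: ClassMemberStateBase and LocalContextState reuse the names from the example above, IDisposable is added only to show a second base type, and R1/R2 are minimal records.

```csharp
// Sketch: assumes a .NET 5+ console app referencing Microsoft.CodeAnalysis.CSharp 3.9.0.
using System;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

var options = new CSharpParseOptions(LanguageVersion.CSharp9);
var unit = SyntaxFactory.ParseCompilationUnit(@"
class ClassMemberStateBase : LocalContextState, IDisposable { }
record R1(int X);
record R2(int X, int Y) : R1(X);", options: options);

// A plain class: every entry of the BaseListSyntax is a SimpleBaseTypeSyntax.
var cls = (ClassDeclarationSyntax)unit.Members[0];
foreach (BaseTypeSyntax baseType in cls.BaseList!.Types)
    Console.WriteLine($"{cls.Identifier.Text} base: {baseType.Type} ({baseType.GetType().Name})");

// The record's "primary constructor" is just a ParameterListSyntax on RecordDeclarationSyntax.
var r1 = (RecordDeclarationSyntax)unit.Members[1];
Console.WriteLine($"R1 parameters: {r1.ParameterList}");

// Passing arguments to the base record uses PrimaryConstructorBaseTypeSyntax.
var r2 = (RecordDeclarationSyntax)unit.Members[2];
var r2Base = (PrimaryConstructorBaseTypeSyntax)r2.BaseList!.Types[0];
Console.WriteLine($"R2 base: {r2Base.Type}{r2Base.ArgumentList}");
```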

SwitchLabelSyntax

These nodes represent the labels of switch sections in C#:

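As you can see, there are three of them, each one specific to the context where it can be applied: CaseSwitchLabelSyntax for constant labels, CasePatternSwitchLabelSyntax for pattern labels (optionally with a when clause) and DefaultSwitchLabelSyntax for the default label.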
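A minimal sketch with one switch statement that uses all three labels, assuming a .NET 5+ console app that references Microsoft.CodeAnalysis.CSharp 3.9.0; o and n are arbitrary identifiers.

```csharp
// Sketch: assumes a .NET 5+ console app referencing Microsoft.CodeAnalysis.CSharp 3.9.0.
using System;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

var switchStatement = (SwitchStatementSyntax)SyntaxFactory.ParseStatement(@"
switch (o)
{
    case 1:
        break;
    case int n when n > 5:
        break;
    default:
        break;
}");

// Each SwitchSectionSyntax owns one or more SwitchLabelSyntax nodes.
foreach (SwitchSectionSyntax section in switchStatement.Sections)
    foreach (SwitchLabelSyntax label in section.Labels)
        Console.WriteLine($"{label.GetType().Name}: {label}");
```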

DirectiveTriviaSyntax

This group represents preprocessor directives in C#:

StructuredTriviaSyntax and BranchingDirectiveTriviaSyntax

These groups represent different structured trivia constructs:

StructuredTriviaSyntax - Inheritance Hierarchy

BranchingDirectiveTriviaSyntax - Inheritance Hierarchy
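Directives are not ordinary child nodes: they live in the trivia attached to tokens, and their structure is exposed through SyntaxTrivia.GetStructure(). A minimal sketch, assuming a .NET 5+ console app that references Microsoft.CodeAnalysis.CSharp 3.9.0; the Logger class and the Helpers region are hypothetical, and DEBUG is left undefined, so the class text becomes disabled-text trivia.

```csharp
// Sketch: assumes a .NET 5+ console app referencing Microsoft.CodeAnalysis.CSharp 3.9.0.
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

var tree = CSharpSyntaxTree.ParseText(@"#region Helpers
#if DEBUG
class Logger { }
#endif
#endregion
");

// Directives are structured trivia: GetStructure() returns their syntax node (null otherwise).
var directives = tree.GetRoot()
    .DescendantTrivia()
    .Select(trivia => trivia.GetStructure())
    .OfType<DirectiveTriviaSyntax>()
    .ToList();

foreach (var d in directives)
    Console.WriteLine($"{d.GetType().Name}: branching = {d is BranchingDirectiveTriviaSyntax}, structured trivia = {d is StructuredTriviaSyntax}");
```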

CrefSyntax, XmlNodeSyntax and XmlAttributeSyntax

These are primarily used in XML documentation comments, and some of them (crefs) can reference actual code elements:

BaseCRefParameterListSyntax - Inheritance Hierarchy

XmlAttributeSyntax - Inheritance Hierarchy
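A minimal sketch of a documentation comment with a cref, assuming a .NET 5+ console app that references Microsoft.CodeAnalysis.CSharp 3.9.0 and the default parse options (which parse documentation comments into structured trivia); Wrapper and Helper are hypothetical names.

```csharp
// Sketch: assumes a .NET 5+ console app referencing Microsoft.CodeAnalysis.CSharp 3.9.0.
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

var tree = CSharpSyntaxTree.ParseText(@"
/// <summary>Wraps <see cref=""Helper""/>.</summary>
class Wrapper { }");

// The doc comment is structured trivia attached to the `class` keyword.
var docComment = tree.GetRoot()
    .DescendantTrivia()
    .Select(trivia => trivia.GetStructure())
    .OfType<DocumentationCommentTriviaSyntax>()
    .Single();

foreach (var node in docComment.DescendantNodes())
    Console.WriteLine($"{node.GetType().Name} ({node.Kind()})");

// The cref attribute holds a CrefSyntax that points at a code element.
var crefAttribute = docComment.DescendantNodes().OfType<XmlCrefAttributeSyntax>().Single();
Console.WriteLine($"cref: {crefAttribute.Cref} ({crefAttribute.Cref.GetType().Name})");
```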

Direct Children of CSharpSyntaxNode

You've already seen a lot of nodes from this group. It contains different nodes that belong to other syntax groups but have no explicit relationship to those groups. Earlier, we saw that IfStatementSyntax has an ElseClauseSyntax from this list. We can also find independent syntax nodes here. One of them is CompilationUnitSyntax. It is a node-container for all C# code that we define in a .cs file. I won't give additional examples for those, because this article has already become pretty long.
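For completeness, a minimal sketch of CompilationUnitSyntax as the root of a file, assuming a .NET 5+ console app that references Microsoft.CodeAnalysis.CSharp 3.9.0; the Demo namespace and Program class are hypothetical file contents.

```csharp
// Sketch: assumes a .NET 5+ console app referencing Microsoft.CodeAnalysis.CSharp 3.9.0.
using System;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical file contents.
var unit = SyntaxFactory.ParseCompilationUnit(@"
using System;

namespace Demo
{
    class Program
    {
        static void Main() => Console.WriteLine(""Hi"");
    }
}");

Console.WriteLine($"root: {unit.GetType().Name}");
Console.WriteLine($"usings: {unit.Usings.Count}, members: {unit.Members.Count}");
Console.WriteLine($"first member: {unit.Members[0].GetType().Name}");

// Both nodes sit directly under CSharpSyntaxNode.
Console.WriteLine($"CompilationUnitSyntax base: {typeof(CompilationUnitSyntax).BaseType!.Name}");
Console.WriteLine($"ElseClauseSyntax base: {typeof(ElseClauseSyntax).BaseType!.Name}");
```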

Conclusion

In this article, we've seen every single public syntax node that is available in RoslynAPI. You may have noticed that there was not a single syntax node related to asynchronous programming (except AwaitExpressionSyntax) or multithreading (except the lock statement). Have you wondered how it is even possible to see everything that is possible in C# without seeing any of those constructs (e.g. async/await)? It's because a programming language is just a way to describe things that a compiler can understand. When you invoke a function, all you have to know is the name of the function, its arguments (a calling convention) and how to interpret the result. And it doesn't matter whether it's your personally crafted method or System.Threading.Thread.Start: it's just a name that a compiler can understand. That is why RoslynAPI doesn't need some kind of "AsyncFunctionSyntaxNode".
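The point is easy to confirm: an async method is an ordinary MethodDeclarationSyntax with an extra modifier token, and the awaited call is an ordinary invocation. A minimal sketch, assuming a .NET 5+ console app that references Microsoft.CodeAnalysis.CSharp 3.9.0; LoadAsync is a hypothetical method.

```csharp
// Sketch: assumes a .NET 5+ console app referencing Microsoft.CodeAnalysis.CSharp 3.9.0.
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical async method.
var method = (MethodDeclarationSyntax)SyntaxFactory.ParseMemberDeclaration(@"
async Task<int> LoadAsync()
{
    await Task.Delay(100);
    return 42;
}")!;

// `async` is only a modifier token on an ordinary MethodDeclarationSyntax.
Console.WriteLine($"modifiers: {string.Join(" ", method.Modifiers.Select(m => m.Kind()))}");

// `await` yields an AwaitExpressionSyntax; what it awaits is a plain InvocationExpressionSyntax.
var awaitExpression = method.DescendantNodes().OfType<AwaitExpressionSyntax>().Single();
Console.WriteLine($"{awaitExpression.GetType().Name} -> {awaitExpression.Expression.GetType().Name}: {awaitExpression.Expression}");
```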

Last but not least, RoslynAPI provides a consistent source of truth for naming C# language constructs. You may miss something while reading a textual language specification, but you can't miss a class like CSharpSyntaxNode. And you may be sure it is correct because, in the case of RoslynAPI, it can parse any valid (and even invalid) C# source code.

I hope this article gave you a fresh perspective on the C# language, as well as a feeling of completeness. The .NET/C#/RoslynAPI teams did a great job of crafting the language until it reached its current state. People who started their journey with C# v1.0 remember countless changes and improvements introduced over all those years. Nowadays, we can easily manipulate any C# source code by using RoslynAPI, but it was not always possible. So, I also hope you'll appreciate RoslynAPI a bit more.

P.S.: If you have any questions or suggestions, feel free to contact me on Twitter.

Pavel Sapehin