C# via RoslynAPI - The Big Picture
This article shows everything a C# program can contain. I mean it, EVERYTHING. After reading this article you'll understand what is possible in ANY C# program. And nobody will tell you that you are wrong. But be aware that the information presented here is based on the `Microsoft.CodeAnalysis.CSharp` v3.9.0 NuGet package, which is the RoslynAPI for the C# language. And I will even dare to say that this NuGet package is the latest stable C# Language Specification. And after reading this article to the end, you should understand why.
Before We Go (Prerequisites)
You should already know what a SyntaxTree is in RoslynAPI. For this, I recommend reading the official documentation that can be found here: Get started with syntax analysis (Roslyn APIs) | Microsoft Docs
Also, my previous article may help to grasp the core ideas of this one. It can be found here: C# RoslynAPI SyntaxTree - Variables Are Not Variables (Gotchas)
All syntax trees in the examples were obtained by using the SyntaxTreeVisualizer extension. It's important to know that this extension omits the "Syntax" postfix for all syntax nodes. That is why when we see "StructDeclaration" we should read it as `StructDeclarationSyntax`, which is a particular class inside the `Microsoft.CodeAnalysis.CSharp.Syntax` namespace.
Let's consider the most basic C# language unit in RoslynAPI: `CSharpSyntaxNode`. Why is it the most basic? Because any C# syntax unit, like a condition or statement, class or method, inherits from `CSharpSyntaxNode`. For example, here is the inheritance chain for a class:
CSharpSyntaxNode
MemberDeclarationSyntax
BaseTypeDeclarationSyntax
TypeDeclarationSyntax
ClassDeclarationSyntax
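The chain above can also be checked with plain reflection. This is only a minimal sketch, assuming a project that references the `Microsoft.CodeAnalysis.CSharp` NuGet package and uses C# 9 top-level statements; the variable names are my own.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Walk up the base types of ClassDeclarationSyntax until CSharpSyntaxNode is reached.
var chain = new List<string>();
for (var type = typeof(ClassDeclarationSyntax); type != null; type = type.BaseType)
{
    chain.Add(type.Name);
    if (type == typeof(CSharpSyntaxNode))
    {
        break;
    }
}

// Reverse so the most basic type comes first, as in the list above.
chain.Reverse();
Console.WriteLine(string.Join(" -> ", chain));
```

The same loop works for any other syntax node type, which is a quick way to find out which category a node belongs to.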
Now that we know where it all starts, we can dig deeper.
And when I say deeper, I mean we should go down the inheritance chain of `CSharpSyntaxNode`. Why? Because if we know that any C# construct is represented by it, we can see all possible constructs in C#. So, our next step is to see all public SyntaxNodes that the RoslynAPI exposes.
Going Down the Rabbit Hole
Be aware, the first level of the hierarchy contains quite a lot of information. But don't be discouraged by it, take a brief look and go on reading. So, here it is:
Here we can notice that all of those top-level classes are abstract and inherit directly from `CSharpSyntaxNode`, except for the last node. I grouped all sealed classes that derive directly from `CSharpSyntaxNode` at the end of the tree and called them, well, "sealed". It is the only "group" in this list that doesn't exist in the `Microsoft.CodeAnalysis.CSharp` package. It contains all non-abstract, sealed syntax nodes. We will take a look at them later.
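If you want to reproduce this first level yourself, reflection is enough. The following is a rough sketch under the assumption that the `Microsoft.CodeAnalysis.CSharp` package is referenced; the exact set of types it prints depends on the package version you have installed.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;

// All public types whose immediate base class is CSharpSyntaxNode.
var directChildren = typeof(CSharpSyntaxNode).Assembly
    .GetTypes()
    .Where(t => t.IsPublic && t.BaseType == typeof(CSharpSyntaxNode))
    .OrderBy(t => t.Name)
    .ToList();

// Abstract types are the categories; sealed types form the "sealed" group.
var abstractNodes = directChildren.Where(t => t.IsAbstract).Select(t => t.Name).ToList();
var sealedNodes = directChildren.Where(t => t.IsSealed).Select(t => t.Name).ToList();

Console.WriteLine("Abstract: " + string.Join(", ", abstractNodes));
Console.WriteLine("Sealed:   " + string.Join(", ", sealedNodes));
```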
ExpressionOrPatternSyntax
Let's start at the top where we can see abstract ExpressionOrPatternSyntax that also contains abstract ExpressionSyntax and PatternSyntax:
ExpressionOrPatternSyntax does not have any direct children other than these two and can be considered a "type marker" or a shared container.
ExpressionSyntax
What is ExpressionSyntax? Have you ever seen a refactoring suggestion by Visual Studio, ReSharper or Rider saying, "Use expression body for properties"?
So, EVERY single construct that you write after `=>` and before `;` (semicolon) is an `ExpressionSyntax`. In this case it is `this._nvim`. There are so many types derived from it that it deserves a separate article.
Notice 1: everything including `=>` and `;` is actually an `ArrowExpressionClauseSyntax`, which is a pure container for expressions.
Notice 2: a property is not the only place where ExpressionSyntax can be found. For example, it can be used inside ExpressionStatementSyntax too.
The expression syntax has other abstract child types:
as well as non-abstract ones:
Once again, don't be overwhelmed. For now, this list is presented only to understand the scale.
To understand all the quirks of C# syntax in RoslynAPI we should see at least one example of a C# expression. The simplest one is `BinaryExpressionSyntax`. All it does is combine two other expressions:
and its corresponding syntax tree:
We can notice a couple of important facts:
- `BinaryExpressionSyntax` is an `ExpressionSyntax`.
- As we will see later, `IdentifierNameSyntax` is also an `ExpressionSyntax`. It means any `ExpressionSyntax` may contain other expressions as child nodes.
The second fact has a pretty handy consequence. It means that in C#, we can combine almost any expression with another expression. For example, we can nest one `BinaryExpression` into another `BinaryExpression`.
We can also see in the syntax visualizer that this node is "AddExpression". But it is not the name of the class that represents this node. When we select it in the syntax visualizer below the syntax tree, we can see the following:
It says that the actual type of the node is BinaryExpressionSyntax, but the kind is AddExpression.
From here we could also conclude:
- The syntax visualizer is showing us the kind of a node instead of its type for some nodes.
- The same types of syntax nodes can have different kinds.
For example, here are some possible kinds of a binary expression:
AddExpression
SubtractExpression
MultiplyExpression
DivideExpression
ModuloExpression
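The difference between the type and the kind of a node can also be seen in code. Here is a small sketch that parses an expression of my own choosing (`a + b * c`, not the one from the screenshots) with `SyntaxFactory.ParseExpression` and prints both for the outer and the nested binary expression.

```csharp
using System;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// "a + b * c" is an illustrative expression: an addition whose right side is a multiplication.
var add = (BinaryExpressionSyntax)SyntaxFactory.ParseExpression("a + b * c");
var nested = (BinaryExpressionSyntax)add.Right;

// Same class, different kinds.
Console.WriteLine($"{add.GetType().Name} / {add.Kind()}");
Console.WriteLine($"{nested.GetType().Name} / {nested.Kind()}");

// The left operand is a name, which is an expression too.
Console.WriteLine(add.Left.GetType().Name);
```

Both nodes are instances of the same class, so code that needs to tell them apart has to check `Kind()` rather than the type.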
TypeSyntax
This child category of `ExpressionSyntax` contains any type that can be specified in expressions. It has two main sub-categories:
- `NameSyntax`
- Direct children of `TypeSyntax`
`NameSyntax` nodes and their child nodes represent names that can refer to any identifiable construct, such as the target of an invocation (e.g. a method name). We already saw an example of `IdentifierNameSyntax` above, which belongs to this category.
Let's take a look at one more example of a `NameSyntax`: `GenericNameSyntax`. It's a part of the following `ObjectCreationExpressionSyntax`:
and its corresponding syntax tree:
Recap: every type derived from `NameSyntax` is:
- `ExpressionSyntax`
- `TypeSyntax`
- `NameSyntax`
Direct children of `TypeSyntax` represent a reference to a specific type used in expressions. For example, one is used to specify the return type of any given method.
Let's see what `TypeSyntax` is by analyzing its direct child, `ArrayTypeSyntax`:
and here is the syntax tree:
Recap: `ArrayTypeSyntax`, like every type derived from `TypeSyntax`, is both an `ExpressionSyntax` and a `TypeSyntax`.
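To make both recaps concrete, here is a hedged sketch that parses an object creation expression of my own choosing and inspects the `GenericNameSyntax` and the `ArrayTypeSyntax` inside it. The `Dictionary<string, int[]>` type is only an illustration.

```csharp
using System;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

var creation = (ObjectCreationExpressionSyntax)SyntaxFactory.ParseExpression(
    "new Dictionary<string, int[]>()");

// The created type is a generic name; its second type argument is an array type.
var genericName = (GenericNameSyntax)creation.Type;
var arrayType = (ArrayTypeSyntax)genericName.TypeArgumentList.Arguments[1];

// A generic name is a NameSyntax, a TypeSyntax and an ExpressionSyntax at once.
bool nameIsName = typeof(NameSyntax).IsAssignableFrom(genericName.GetType());
bool nameIsType = typeof(TypeSyntax).IsAssignableFrom(genericName.GetType());
bool nameIsExpression = typeof(ExpressionSyntax).IsAssignableFrom(genericName.GetType());

// An array type is a TypeSyntax but not a NameSyntax.
bool arrayIsType = typeof(TypeSyntax).IsAssignableFrom(arrayType.GetType());
bool arrayIsName = typeof(NameSyntax).IsAssignableFrom(arrayType.GetType());

Console.WriteLine($"{genericName}: name={nameIsName}, type={nameIsType}, expression={nameIsExpression}");
Console.WriteLine($"{arrayType}: type={arrayIsType}, name={arrayIsName}");
```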
InstanceExpressionSyntax
The following abstract child of `ExpressionSyntax` is easy to understand. It contains only two sealed types:
`ThisExpressionSyntax` is simply any expression that uses the `this` keyword. For example:
has the following syntax tree:
And `BaseExpressionSyntax`, as you might guess, is used in every expression that accesses a base member. For example:
has the following syntax tree:
To summarize, this category contains nodes to access instance and base members. No more, no less.
AnonymousFunctionExpressionSyntax
The following abstract child of `ExpressionSyntax` is `AnonymousFunctionExpressionSyntax`:
If you know what anonymous functions and lambda expressions are, you may already be familiar with what the members of this category do. In case you would like to see an example, here is `SimpleLambdaExpressionSyntax`:
and the corresponding syntax tree:
It consists of a lambda parameter and an expression body. Not quite simple, huh?
BaseObjectCreationExpressionSyntax
The following abstract child of `ExpressionSyntax` is `BaseObjectCreationExpressionSyntax`:
and the corresponding syntax tree:
PatternSyntax
Now, we should already have a basic understanding of what `ExpressionSyntax` is in C#. But what about `PatternSyntax`? `PatternSyntax` nodes are responsible for, you guessed it, describing pattern matching in C#.
It was introduced in C# 7.0. You can read more about it:
C# 8.0 - Pattern Matching in C# 8.0 | Microsoft Docs
And you can read about changes introduced to pattern matching in C# 9.0:
Pattern matching changes - C# 9.0 specification proposals | Microsoft Docs
Those syntax nodes should also be self-explanatory. The following example demonstrates the concept:
and the corresponding syntax tree:
In other words, pattern matching "matches" objects that correspond to a specific pattern. You can think of it as regular expressions for C# objects.
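As a rough illustration, the sketch below parses an `is` expression and lists every `PatternSyntax` node in it. The `shape`, `Circle` and `Radius` names are invented for this example and are not taken from the screenshots.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// A property pattern with a nested relational pattern (C# 9 syntax).
var isPattern = (IsPatternExpressionSyntax)SyntaxFactory.ParseExpression(
    "shape is Circle { Radius: > 0 } circle");

// Collect every pattern node below the "is" expression.
var patterns = isPattern.DescendantNodes().OfType<PatternSyntax>().ToList();

foreach (var pattern in patterns)
{
    Console.WriteLine($"{pattern.GetType().Name}: {pattern}");
}
```

Patterns nest inside each other in the same way expressions do: the relational pattern `> 0` is a child of the outer recursive pattern.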
StatementSyntax
I call this group of nodes "method body members":
Why "method body members"? Because those are the ONLY nodes that can be a direct child of the `BlockSyntax` of a `MethodDeclarationSyntax` (and other types that inherit from `BaseMethodDeclarationSyntax`). That's it, anything that can be specified inside a method body is a statement. Of course, unless it's an expression body, in which case only expressions can be specified there.
Interesting facts:
- A method body cannot directly contain an expression. But it may contain an `ExpressionStatementSyntax`, which can contain any expression.
- Many loop and conditional statements consist of syntax nodes scattered around different syntax node categories. For example, an if statement consists of `IfStatementSyntax`, which has the following child nodes that can be accessed via properties: (1) condition (`ExpressionSyntax`), (2) statement (`StatementSyntax`, which is typically a `BlockSyntax`), (3, optional) `ElseClauseSyntax` (direct child of `CSharpSyntaxNode`).
- Statements can contain nested statements via `BlockSyntax`, `UsingStatementSyntax`, `LockStatementSyntax`, `CheckedStatementSyntax`, etc.
- `BlockSyntax` is a pure container for other statements.
- Besides `ThrowStatementSyntax`, there is also `ThrowExpressionSyntax` (a child of `ExpressionSyntax`).
- Besides `SwitchStatementSyntax`, there is also `SwitchExpressionSyntax`.
- `LabeledStatementSyntax` is a parent of the statement to which it applies (not a sibling).
- Multiple semicolons separated by white space are treated as multiple `EmptyStatementSyntax` nodes. For example, `;;` is two `EmptyStatementSyntax` nodes.
- Else-if is actually a nested `IfStatementSyntax` inside `ElseClauseSyntax`. Strictly speaking, `IfStatementSyntax` cannot contain an else-if statement because there is no else-if syntax node in RoslynAPI.
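Two of these facts are easy to verify with `SyntaxFactory.ParseStatement`. The snippet is a minimal sketch; the statements being parsed are my own illustrations.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// "else if" is an ElseClauseSyntax whose statement is another IfStatementSyntax.
var ifStatement = (IfStatementSyntax)SyntaxFactory.ParseStatement(
    "if (a) { } else if (b) { } else { }");
var elseStatement = ifStatement.Else?.Statement;
Console.WriteLine(elseStatement?.GetType().Name);

// Two semicolons in a row are two separate empty statements.
var block = (BlockSyntax)SyntaxFactory.ParseStatement("{ ;; }");
var emptyCount = block.Statements.OfType<EmptyStatementSyntax>().Count();
Console.WriteLine(emptyCount);
```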
Let's take a look at `IfStatementSyntax`:
and its syntax tree:
So, statements are workhorses of methods. Here is what they can be.
Loop Statements:
- for
- foreach
- while
- do ... while
- break
- continue
Condition Statements:
- if
- switch
Control-Flow Statements:
- return
- yield
- try
- throw
- label
- goto
Arithmetic:
- checked
- unchecked
Garbage Collection Statements:
- fixed
- using
Multithreading:
- lock
Unmanaged Statements:
- unsafe
Other:
- Local Functions
- Expressions (via ExpressionStatementSyntax)
Notice 1: Similarly to `ElseClauseSyntax`, `CatchClauseSyntax` and `FinallyClauseSyntax` do not inherit from `StatementSyntax`. Instead, they are direct children of `CSharpSyntaxNode`. Why? Remember the "method body members" alias? Most likely the Roslyn team did so because you can't specify the `catch` or `finally` keywords directly in a method body. They are only valid as children of `TryStatementSyntax`.
Notice 2: There may be some confusion between statements and keywords in the C# language and RoslynAPI syntax classes. For example, one may think that `catch` is a statement after reading the official C# documentation:
but in RoslynAPI there is neither a "catch" nor a "try-catch" statement. As mentioned above, in RoslynAPI "catch" is represented by `CatchClauseSyntax`, which is not a child of the `StatementSyntax` class. So, someone who uses RoslynAPI may think of `CatchClauseSyntax` as a separate entity because it's a direct child of `CSharpSyntaxNode`. In any case, we can safely call "catch" a keyword. Have you noticed a catch here?
Notice 3: There is no `unchecked` statement syntax node in RoslynAPI. `unchecked` uses `CheckedStatementSyntax`, but it has the `UncheckedStatement` kind.
Notice 4: Many statements have an expression variant. For example, there is CheckedStatementSyntax and CheckedExpressionSyntax.
Here is another example to understand the difference between statements and expressions of a similar kind:
and its corresponding syntax tree:
Notice: The kind of the `unchecked` expression is `UncheckedExpression`.
This separation is subtle and very important to understand, because you can't use expressions (e.g. a checked-expression) directly in a method body but you can use statements (e.g. a checked-statement) there. Similarly, you can't use statements in expressions (e.g. a checked-statement in a property's expression body) but you can use expressions there (e.g. a checked-expression).
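The statement and expression variants can be compared side by side. This is a small sketch with inputs I made up; it parses one `unchecked` statement and one `unchecked` expression and prints the type and kind of each.

```csharp
using System;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Statement form: a block guarded by the unchecked keyword.
var statement = SyntaxFactory.ParseStatement("unchecked { total += 1; }");

// Expression form: a parenthesized operand guarded by the unchecked keyword.
var expression = SyntaxFactory.ParseExpression("unchecked(a + b)");

Console.WriteLine($"{statement.GetType().Name} / {statement.Kind()}");
Console.WriteLine($"{expression.GetType().Name} / {expression.Kind()}");
```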
MemberDeclarationSyntax
This category unites classes, namespaces, enums, interfaces, fields, properties, methods, etc. Basically it contains everything that can be declared as a member of something else. For example, a class can have the following members:
- Fields
- Properties
- Methods
- Events
- Indexers
RoslynAPI defines four main member categories (abstract):
as well as non-abstract sealed direct children:
As you can see, there are constructs that are unique and don't have any other sub-types. Why? If we consider namespaces, then it's obvious that there is only one type of namespace declaration. And if the C# language design team decides to create another tricky kind of namespace, then the Roslyn team may introduce another base type.
We will see examples in the following sections.
TypeDeclarationSyntax
I would say those nodes are the heart of the C# language. They represent central OOP features of the language, like classes:
Let's see what the most common TypeDeclarationSyntax looks like:
and its syntax tree:
That's it, `ClassDeclarationSyntax` includes everything starting from the access modifiers and ending with `CloseBraceToken`. Also, you may notice that those green specifiers in SyntaxVisualizer (like `PublicKeyword`) are actually syntax tokens and not syntax nodes.
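The split between tokens and nodes is visible through the API as well. In this sketch the `Point` class is a made-up example; `ChildTokens()` returns the tokens that belong directly to the class declaration, while `ChildNodes()` returns its child nodes.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// "Point" is an illustrative class, not the one from the screenshot.
var tree = CSharpSyntaxTree.ParseText("public sealed class Point { public int X; }");
var classDeclaration = tree.GetRoot()
    .DescendantNodes()
    .OfType<ClassDeclarationSyntax>()
    .First();

// Tokens owned directly by the class declaration (modifiers, keyword, name, braces).
foreach (var token in classDeclaration.ChildTokens())
{
    Console.WriteLine($"token: {token.Kind()}");
}

// Nodes owned directly by the class declaration (here, the field).
foreach (var node in classDeclaration.ChildNodes())
{
    Console.WriteLine($"node:  {node.GetType().Name}");
}
```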
BasePropertyDeclarationSyntax
This sub-category of `MemberDeclarationSyntax` should be self-explanatory:
Though this category is tiny, it has a couple of tricks up its sleeve. As you'll see later, `EventDeclarationSyntax` is not the only possible construct related to an event declaration. In RoslynAPI there is also `EventFieldDeclarationSyntax`. Wait, isn't there only one event keyword in C#? Well, in RoslynAPI a single keyword can be used in several syntax nodes. Here is an example:
and the corresponding syntax tree:
You may have noticed several nuances here:
- The difference between `EventFieldDeclarationSyntax` and `EventDeclarationSyntax` is that `EventDeclarationSyntax` is both a property and an event, and its accessors have block bodies (`BlockSyntax`). In contrast, `EventFieldDeclarationSyntax` is both a field (as you'll see later) and an event, and it does not have a body.
- `EventDeclarationSyntax` has accessors, which are `AccessorDeclarationSyntax` nodes with the `AddAccessorDeclaration` and `RemoveAccessorDeclaration` syntax kinds.
Here is a kind of `AccessorDeclarationSyntax` that uses the "add" keyword:
Notice: `AccessorDeclarationSyntax` can have a block body (`BlockSyntax`) or an expression body (`ExpressionSyntax`) but NOT both.
And here are properties:
and the syntax tree:
Properties are similar to events, but there are a couple of nuances too:
- Similarly to `EventDeclarationSyntax`, `PropertyDeclarationSyntax` has accessors of type `AccessorDeclarationSyntax`, but they have the `GetAccessorDeclaration` and `SetAccessorDeclaration` kinds.
- Similarly to `EventDeclarationSyntax`, `PropertyDeclarationSyntax` accessors can have either type of body: a block body or an expression body (but not both at the same time).
- In the case of an expression-bodied read-only property, `AccessorListSyntax` is omitted (set to `null`), but the property has an expression body.
- In the case of an auto property, `AccessorListSyntax` is present, but its accessors do not have bodies.
Notice: Though `PropertyDeclarationSyntax` can have an expression body, `EventDeclarationSyntax` cannot.
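The property and event nuances above can be checked with a short sketch. The `Sample` class and its members are hypothetical and only serve to produce one node of each flavor.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// "Sample" and its members are illustrative only.
var tree = CSharpSyntaxTree.ParseText(@"
class Sample
{
    public int Auto { get; set; }
    public int Computed => 42;
    public event System.EventHandler Changed;
    public event System.EventHandler Custom { add { } remove { } }
}");

var members = tree.GetRoot().DescendantNodes().OfType<ClassDeclarationSyntax>().Single().Members;
var auto = (PropertyDeclarationSyntax)members[0];
var computed = (PropertyDeclarationSyntax)members[1];
var eventField = (EventFieldDeclarationSyntax)members[2];
var customEvent = (EventDeclarationSyntax)members[3];

// Auto property: accessors exist but have no bodies.
Console.WriteLine(string.Join(", ", auto.AccessorList.Accessors.Select(a => a.Kind())));
Console.WriteLine(auto.AccessorList.Accessors.All(a => a.Body == null && a.ExpressionBody == null));

// Expression-bodied property: no accessor list at all.
Console.WriteLine(computed.AccessorList == null && computed.ExpressionBody != null);

// Field-like event versus event with explicit accessors.
Console.WriteLine(eventField.GetType().BaseType.Name);
Console.WriteLine(string.Join(", ", customEvent.AccessorList.Accessors.Select(a => a.Kind())));
```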
Now, let's take a look at indexers:
and the corresponding syntax tree:
Indexers are similar to properties but there are a couple of nuances too.
- Similarly to properties, indexers can be read-only or have block-body or expression-body accessors.
- Indexers possess a `BracketedParameterListSyntax`, but properties (`PropertyDeclarationSyntax`) do not.
So, we've just taken a look at `BasePropertyDeclarationSyntax` nodes. Though properties, events and indexers are slightly different, they have a lot in common from the syntax point of view. Now, you can even brag to someone that events and indexers are actually properties because they share the same base class in RoslynAPI.
BaseMethodDeclarationSyntax
I will only show a single example for this group because those nodes are similar. They can have a block-body or an expression-body but not both. Here is an example of a method:
and its syntax tree:
Here is what you may notice:
- The main parts of a method are (1) `TypeParameterListSyntax`, (2) `IdentifierToken` - I haven't marked it on the screenshot but it contains the name of a method, (3) a method body (can be `BlockSyntax` or `ExpressionSyntax`), (4) `TypeParameterConstraintClauseSyntax`.
- `TypeParameterListSyntax` is a container for `TypeParameterSyntax`
- `ParameterListSyntax` is a container for `ParameterSyntax`
`OperatorDeclarationSyntax` and `ConversionOperatorDeclarationSyntax` are a bit different, in that the operator declaration is responsible for operator overloading and the conversion operator declaration is responsible for implicit and explicit type conversion.
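A quick way to see which node handles which construct is to parse one of each. In this sketch `Meters` is an invented struct; the first member is an overloaded `+` operator and the second is an implicit conversion.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// "Meters" is a made-up type used only for illustration.
var tree = CSharpSyntaxTree.ParseText(@"
struct Meters
{
    public static Meters operator +(Meters a, Meters b) => a;
    public static implicit operator double(Meters m) => 0;
}");

var members = tree.GetRoot().DescendantNodes().OfType<StructDeclarationSyntax>().Single().Members;
var overload = (OperatorDeclarationSyntax)members[0];
var conversion = (ConversionOperatorDeclarationSyntax)members[1];

Console.WriteLine($"{overload.GetType().Name}: operator {overload.OperatorToken.Text}");
Console.WriteLine($"{conversion.GetType().Name}: {conversion.ImplicitOrExplicitKeyword.Text} conversion to {conversion.Type}");
```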
InterpolatedStringTextSyntax and InterpolationSyntax
Those are the parts of the string interpolation feature in C#. Here is an example:
and the corresponding syntax tree:
VariableDesignationSyntax
Basically, it's like a variable declaration that is used to temporarily designate variables where you won't or can't declare a full variable:
You may have already noticed it in the examples above, inside SimpleLambdaExpressionSyntax:
and here is the corresponding syntax tree:
Have you noticed that little "u" character? Yep, this is `VariableDesignationSyntax`.
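Designations show up in patterns and in deconstruction. The sketch below uses two inputs I picked for illustration (`value is string text` and `var (x, y) = point;`) and lists the designation nodes found in each.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// A declaration pattern designates a single variable ("text").
var isPattern = (IsPatternExpressionSyntax)SyntaxFactory.ParseExpression("value is string text");
var declarationPattern = (DeclarationPatternSyntax)isPattern.Pattern;
Console.WriteLine($"{declarationPattern.Designation.GetType().Name}: {declarationPattern.Designation}");

// A deconstruction designates several variables inside a parenthesized designation.
var deconstruction = SyntaxFactory.ParseStatement("var (x, y) = point;");
var designations = deconstruction.DescendantNodes().OfType<VariableDesignationSyntax>().ToList();

foreach (var designation in designations)
{
    Console.WriteLine($"{designation.GetType().Name}: {designation}");
}
```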
QueryClauseSyntax
I don't use query clauses at all in my code and prefer LINQ method syntax. But it's there, and the C# team is carefully carrying it from release to release. I suppose it's there to introduce SQL-like queries. I won't describe each group in these categories. Feel free to explore it on your own.
You can read more about it here: Query expression basics (LINQ in C#) | Microsoft Docs
Here is an example from the Microsoft Docs (the link above) to continue the flow:
ArgumentListSyntax, ArgumentSyntax, ParameterListSyntax and ParameterSyntax
These are closely related, though they do not share a base type. They are also closely related to the following direct children of CSharpSyntaxNode:
Sealed, Parameters Related:
- TypeParameterListSyntax
- TypeParameterSyntax
- FunctionPointerParameterListSyntax
- TypeParameterConstraintClauseSyntax
- CrefParameterSyntax
Sealed, Arguments Related:
- AttributeArgumentListSyntax
- AttributeArgumentSyntax
- TypeArgumentListSyntax
- ArgumentSyntax
- OmittedTypeArgumentSyntax
This group is responsible for passing arguments to invocations. In other words, it answers the question "What should be passed to a function call as an argument?"
Notice: `ArgumentSyntax` does not have an intermediate base class; it is a direct child of `CSharpSyntaxNode`. But `ParameterSyntax` has `BaseParameterSyntax`.
Very closely related to the arguments group is the parameters group. It is responsible for defining the parameters of functions. In other words, it answers the question "What can a particular function accept?" You can also think of it as a "contract" between you and a function, meaning that when you agree to a certain contract, you should comply with it when making calls to that function.
Here is an example that uses both an argument and a parameter in a single expression:
I hope you can clearly see the difference between arguments and parameters here:
- An argument is something that you pass during a method call (or while accessing something). In this example, you can see that we pass `SimpleLambdaExpression` as an argument to the `Where` function invocation.
- A parameter is something you define on the callee side (like a variable declaration). In this example, you can see that "x" is a parameter of the `SimpleLambdaExpression`.
Also, in the example above you can see that `ArgumentSyntax` is contained inside `ArgumentListSyntax`. Similarly, parameters are contained in the `ParameterListSyntax`.
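The same argument and parameter relationship can be read from a parsed expression. This is a minimal sketch that assumes an input similar to the one described above (`numbers.Where(x => x > 0)`); the `numbers` identifier is my own placeholder.

```csharp
using System;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

var invocation = (InvocationExpressionSyntax)SyntaxFactory.ParseExpression(
    "numbers.Where(x => x > 0)");

// The caller side: one ArgumentSyntax inside the ArgumentListSyntax.
ArgumentSyntax argument = invocation.ArgumentList.Arguments[0];

// The argument's expression is the lambda; the lambda declares the parameter "x".
var lambda = (SimpleLambdaExpressionSyntax)argument.Expression;
ParameterSyntax parameter = lambda.Parameter;

Console.WriteLine($"argument:  {argument}");
Console.WriteLine($"parameter: {parameter.Identifier.Text}");
```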
TypeParameterList and TypeParameter
Those are the basic building blocks of generic types in C#. They allow you to specify type parameters for any type or method.
Notice: they are sealed, non-abstract and direct children of `CSharpSyntaxNode`.
Let's see an example:
and the syntax tree for the class:
and the syntax tree for a generic method parameter:
You may notice that `TypeParameter` is nested inside `TypeParameterList`. We need `TypeParameterList` because C# allows specifying multiple generic parameters for a type.
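Here is a hedged sketch of how the list and its items look through the API. The `Cache<TKey, TValue>` class and its `Convert<T>` method are hypothetical names chosen for the example.

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// "Cache" and "Convert" are illustrative names.
var tree = CSharpSyntaxTree.ParseText(@"
class Cache<TKey, TValue>
{
    public T Convert<T>(TValue value) => default;
}");

var root = tree.GetRoot();
var classDeclaration = root.DescendantNodes().OfType<ClassDeclarationSyntax>().Single();
var method = root.DescendantNodes().OfType<MethodDeclarationSyntax>().Single();

// Both the class and the method own a TypeParameterListSyntax with TypeParameterSyntax items.
var classParameters = classDeclaration.TypeParameterList.Parameters.Select(p => p.Identifier.Text).ToList();
var methodParameters = method.TypeParameterList.Parameters.Select(p => p.Identifier.Text).ToList();

Console.WriteLine("class:  " + string.Join(", ", classParameters));
Console.WriteLine("method: " + string.Join(", ", methodParameters));
```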
BaseListSyntax, BaseTypeSyntax and PrimaryConstructorBaseTypeSyntax
`BaseListSyntax` is the list of all base types for a specified type. It is also a parent container for `BaseTypeSyntax`. Those are the core of C# inheritance. They allow us to specify base types for a type we define.
Here we can see that the ClassMemberStateBase class is inherited from another `LocalContextState` class:
and here is how it's defined in the syntax tree:
Now, have you wondered what PrimaryConstructorBaseTypeSyntax is? It is also related to inheritance in C#. It applies only from C# 9.0 on, where records were introduced:
and its syntax tree:
But what is even more interesting is that there is no separate syntax for a "primary constructor" in RoslynAPI, even though you can specify `PrimaryConstructorBaseTypeSyntax`. Let's see a syntax tree for the R1 type:
Notice, the record does not have a "PrimaryConstructor" though RoslynAPI has `PrimaryConstructorBaseTypeSyntax`. Even Mads Torgersen mentioned the "primary constructor" in his blog post C# 9.0 on the record | .NET Blog in the comment:
I suppose reusing `ParameterListSyntax` in `RecordDeclarationSyntax` just saved the Roslyn team a couple of story points.
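To see this reuse in code, the sketch below parses two positional records. The record shapes are my own guess at a minimal example and are not the ones from the screenshots; it assumes a Roslyn version that understands C# 9 records (3.9.0 does).

```csharp
using System;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Two illustrative positional records; R2 passes X to the primary constructor of R1.
var tree = CSharpSyntaxTree.ParseText(@"
record R1(int X);
record R2(int X, int Y) : R1(X);");

var records = tree.GetRoot().DescendantNodes().OfType<RecordDeclarationSyntax>().ToList();
var r1 = records[0];
var r2 = records[1];

// The "primary constructor" is just a ParameterListSyntax on the record declaration.
Console.WriteLine($"{r1.Identifier.Text}: {r1.ParameterList.GetType().Name} {r1.ParameterList}");

// The base type with arguments is a PrimaryConstructorBaseTypeSyntax inside the BaseListSyntax.
var baseType = r2.BaseList.Types[0];
Console.WriteLine($"{r2.Identifier.Text}: {baseType.GetType().Name} {baseType}");
```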
SwitchLabelSyntax
Those nodes represent switch labels in the C# language:
As you can see, there are three of them, each one specific to a context where it can be applied.
DirectiveTriviaSyntax
This group represents directives in C#:
StructuredTriviaSyntax and BranchingDirectiveTriviaSyntax
These groups represent different trivia constructs:
CrefSyntax, XmlNodeSyntax and XmlAttributeSyntax
Those are primarily used in XML comments and some of them can reference actual classes:
Direct Children of CSharpSyntaxNode
You've already seen a lot of nodes from this group. It contains different nodes that belong to other syntax groups but have no explicit relationships to those groups. Earlier, we saw that `IfStatementSyntax` has an `ElseClauseSyntax` from this list. Also, we can find independent syntax nodes here. One of them is `CompilationUnitSyntax`. It is a node-container for all C# code that we define in a `.cs` file. I won't give additional examples for those, because this article has already become pretty long.
Conclusion
In this article, we've seen every single public syntax node that is available in RoslynAPI. You may have noticed that there was not a single syntax node related to asynchronous programming (except `AwaitExpressionSyntax`) or multi-threading (except the `lock` statement). Have you wondered how it is even possible to see everything that is possible in C# without seeing any of those constructs (e.g. `async/await`)? It's because a programming language is just a way to describe things that a compiler can understand. When you invoke a function, all you have to know is the name of the function, its arguments (a calling convention) and how to interpret the result. And it doesn't matter whether it's your personally crafted method or System.Threading.Thread.Start: it's just a name that a compiler can understand. That is why RoslynAPI doesn't need some kind of "AsyncFunctionSyntaxNode".
Last but not least, RoslynAPI provides a consistent source of truth for naming C# language constructs. You may miss something while reading a textual language specification, but you can't miss a class like CSharpSyntaxNode. And you may be sure it is correct because (in the case of RoslynAPI) it can parse any valid (and even invalid) C# source code.
I hope this article gave you a fresh perspective on the C# language as well as a feeling of completeness. The DotNet/C#/RoslynAPI teams did a great job of crafting the language until it reached its current state. People who started their journey with C# v1.0 remember the countless changes and improvements that were introduced during all those years. Nowadays, we can easily manipulate any C# source code by using RoslynAPI. But it was not always possible. So, I also hope you'll appreciate RoslynAPI a bit more too.
P.S.: If you have any questions or suggestions feel free to contact me on twitter.