Introduction

Code generation is a fascinating topic: instead of just writing code, you write code that writes code. It can happen at compile time (the newer source generators) or at runtime (expression trees, emitting IL). The idea of creating methods and classes at runtime still sounds like magic to me, yet runtime code generation is used heavily under the hood of DI frameworks, ORMs, object mappers and similar libraries.

Now that I understand the topic well enough, I realize that some tasks I had in the past could have been solved more efficiently and elegantly with code generation. Unfortunately, I knew nothing about it back then. Searching the internet turned up material with a fairly high entry threshold that didn’t give a complete picture of the feature, and most examples in articles are so trivial that it stays unclear how to apply them in practice. So as a first step I want to describe a particular problem that can be solved with metaprogramming, and then give an overview of the different code generation approaches. There will be a lot of code.

Task description

Let’s imagine our application receives data from some source as an array of strings (for simplicity, only string, integer and datetime values are expected in the input array):

["John McClane", "1994-11-05T13:15:30", "4455"]

I need a generic way to parse this input into an instance of a particular class. The interface below creates a parser delegate, i.e. a function that accepts an array of strings as input and returns an instance of T as output:

public interface IParserFactory
{
    Func<string[], T> GetParser<T>() where T : new();
}

I use ParserOutputAttribute to mark classes that can be used as parser output, and ArrayIndexAttribute to specify which array element corresponds to each property:

[ParserOutput]
public class Data
{
    [ArrayIndex(0)] public string Name { get; set; } // will be "John McClane"
    [ArrayIndex(2)] public int Number { get; set; } // will be 4455
    [ArrayIndex(1)] public DateTime Birthday { get; set; } // will be 1994-11-05T13:15:30
}

If an array element can’t be parsed to the target type, it is ignored. The general idea is not to limit the implementation to the Data class: I want to produce a parser delegate for any type with the proper attributes.
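
The two attributes themselves are not shown in this post. A minimal sketch of how they might be declared is given below. The only detail taken from the code later in the post is that ArrayIndexAttribute takes an integer in its constructor and exposes it as an Order property; the AttributeUsage settings are assumptions.

```csharp
using System;

// Marks a class as a valid parser output. Assumed to carry no data of its own.
[AttributeUsage(AttributeTargets.Class, Inherited = true, AllowMultiple = false)]
public sealed class ParserOutputAttribute : Attribute
{
}

// Maps a property to a position in the input array.
[AttributeUsage(AttributeTargets.Property, AllowMultiple = false)]
public sealed class ArrayIndexAttribute : Attribute
{
    public ArrayIndexAttribute(int order)
    {
        Order = order;
    }

    // Zero-based index of the source element in the input array.
    public int Order { get; }
}
```

With these declarations, `[ArrayIndex(2)]` on a property means "take the value from element 2 of the input array".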

Plain C#

First of all, I want to write plain C# code for a known type, without any code generation or reflection:

var data = new Data();
if (0 < inputArray.Length)
{
    data.Name = inputArray[0];
}
if (1 < inputArray.Length && DateTime.TryParse(inputArray[1], out var bd))
{
    data.Birthday = bd;
}
if (2 < inputArray.Length && int.TryParse(inputArray[2], out var n))
{
    data.Number = n;
}
return data;

Quite simple, right? But now I want to generate the same code for an arbitrary type at runtime or compile time. Let’s go!

Reflection

In the first approach, with reflection, I’m not going to generate a parser delegate. Instead I’m going to create an instance of the target type and set its properties using the reflection API.

public class ReflectionParserFactory : IParserFactory
{
	public Func<string[], T> GetParser<T>() where T : new()
	{
		return ArrayIndexParse<T>;
	}

	private static T ArrayIndexParse<T>(string[] data) where T : new()
	{
        // create a new instance of target type
		var instance = new T();
		var props = typeof(T).GetProperties(BindingFlags.Instance | BindingFlags.Public);

        //go through all public and non-static properties
        //read and parse corresponding element in array and if success - set property value
		for (int i = 0; i < props.Length; i++)
		{
			var attrs = props[i].GetCustomAttributes(typeof(ArrayIndexAttribute)).ToArray();
			if (attrs.Length == 0) continue;

			int order = ((ArrayIndexAttribute)attrs[0]).Order;
			if (order < 0 || order >= data.Length) continue;

			if (props[i].PropertyType == typeof(string))
			{
				props[i].SetValue(instance, data[order]);
				continue;
			}

			if (props[i].PropertyType == typeof(int))
			{
				if (int.TryParse(data[order], out var intResult))
				{
					props[i].SetValue(instance, intResult);
				}

				continue;
			}

			if (props[i].PropertyType == typeof(DateTime))
			{
				if (DateTime.TryParse(data[order], out var dtResult))
				{
					props[i].SetValue(instance, dtResult);
				}
			}
		}
		return instance;
	}
}

It works and it’s quite readable, but it’s slow (see the benchmarks section below). If this code is called very often, that could become an issue. I want to implement something more sophisticated using real code generation.

Code generation

Expression trees

From the official documentation:

Expression trees represent code in a tree-like data structure, where each node is an expression, for example, a method call or a binary operation such as x < y. You can compile and run code represented by expression trees.

Expression trees provide primitive building blocks such as Expression.Call to call a method or Expression.Loop to add repeating logic. Using these blocks, we build the parser as a tree of instructions and finally compile it into a delegate at runtime.
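
As a warm-up before the full factory, here is a small sketch of the same building blocks in isolation. It builds a delegate that calls int.TryParse with an out variable and returns the parsed value (0 when parsing fails). The class and method names are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

public static class ExpressionTreeSketch
{
    // Builds and compiles the equivalent of:
    //   input => { int result; int.TryParse(input, out result); return result; }
    public static Func<string, int> BuildParseOrDefault()
    {
        ParameterExpression input = Expression.Parameter(typeof(string), "input");
        ParameterExpression result = Expression.Variable(typeof(int), "result");

        // Pick the TryParse(string, out int) overload explicitly.
        MethodInfo tryParse = typeof(int).GetMethod(
            "TryParse",
            new[] { typeof(string), typeof(int).MakeByRefType() });

        // The value of a block is the value of its last expression.
        BlockExpression body = Expression.Block(
            new[] { result },
            Expression.Call(tryParse, input, result),
            result);

        return Expression.Lambda<Func<string, int>>(body, input).Compile();
    }
}
```

Calling `ExpressionTreeSketch.BuildParseOrDefault()("4455")` should give 4455, while an unparsable string gives 0 because TryParse resets its out argument on failure.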

public class ExpressionTreeParserFactory : IParserFactory
{
	public Func<string[], T> GetParser<T>() where T : new()
	{
		var props = typeof(T).GetProperties(BindingFlags.Instance | BindingFlags.Public);

        //declare an input parameter of the delegate
		ParameterExpression inputArray = Expression.Parameter(typeof(string[]), "inputArray");
        //declare a local variable that holds the resulting instance
		ParameterExpression instance = Expression.Variable(typeof(T), "instance");

        //create a new instance of target type
		var block = new List<Expression>
		{
			//Expression.New(Type) always uses the parameterless constructor guaranteed by the new() constraint
			Expression.Assign(instance, Expression.New(typeof(T)))
		};
		var variables = new List<ParameterExpression> {instance};

        //go through all public and non-static properties
		foreach (var prop in props)
		{
			var attrs = prop.GetCustomAttributes(typeof(ArrayIndexAttribute)).ToArray();
			if (attrs.Length == 0) continue;

			int order = ((ArrayIndexAttribute)attrs[0]).Order;
			if (order < 0) continue;

            //validate an index from ArrayIndexAttribute
			var orderConst = Expression.Constant(order);
			var orderCheck = Expression.LessThan(orderConst, Expression.ArrayLength(inputArray));

			if (prop.PropertyType == typeof(string))
			{
                //set string property
				var stringPropertySet = Expression.Assign(
					Expression.Property(instance, prop),
					Expression.ArrayIndex(inputArray, orderConst));

				block.Add(Expression.IfThen(orderCheck, stringPropertySet));
				continue;
			}

            //get parser method from the list of available parsers (currently we parse only Int and DateTime)
			if (!TypeParsers.Parsers.TryGetValue(prop.PropertyType, out var parser))
			{
				continue;
			}

			var parseResult = Expression.Variable(prop.PropertyType, "parseResult");
			var parserCall = Expression.Call(parser, Expression.ArrayIndex(inputArray, orderConst), parseResult);
			var propertySet = Expression.Assign(
				Expression.Property(instance, prop),
				parseResult);

            //set property if an element of array is successfully parsed
			var ifSet = Expression.IfThen(parserCall, propertySet);

			block.Add(Expression.IfThen(orderCheck, ifSet));
			variables.Add(parseResult);
		}

		block.Add(instance);

        //compile lambda expression into delegate
		return Expression.Lambda<Func<string[], T>>(
			Expression.Block(variables.ToArray(), Expression.Block(block)), 
			inputArray).Compile();
	}
}
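
The factory above refers to TypeParsers.Parsers, which is not listed in this post. From the way it is used (a lookup by property type that yields a method callable as TryParse(string, out T)), it could look roughly like the sketch below. The exact shape is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class TypeParsers
{
    // Maps a property type to the static TryParse(string, out T) method that parses it.
    public static readonly IReadOnlyDictionary<Type, MethodInfo> Parsers =
        new Dictionary<Type, MethodInfo>
        {
            [typeof(int)] = GetTryParse(typeof(int)),
            [typeof(DateTime)] = GetTryParse(typeof(DateTime)),
        };

    // Resolves the exact (string, out T) overload; other TryParse overloads are ignored.
    private static MethodInfo GetTryParse(Type type) =>
        type.GetMethod(
            "TryParse",
            BindingFlags.Public | BindingFlags.Static,
            binder: null,
            types: new[] { typeof(string), type.MakeByRefType() },
            modifiers: null)
        ?? throw new InvalidOperationException($"{type} has no TryParse(string, out {type.Name}) method.");
}
```

Supporting another type would then be a matter of adding one more entry to the dictionary, as long as that type has a matching static TryParse method.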

Emit IL

The .NET compiler transforms your C# code into intermediate language (CIL or just IL), and then the .NET runtime translates IL into machine instructions. For instance, sharplab.io (https://sharplab.io/) lets you easily check what the generated IL looks like for a piece of C# code.

Here we are going to write (“emit”) IL instructions directly and then compile them into a delegate at runtime.
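
Before the full factory, a tiny sketch may help to show the mechanics: a dynamic method that returns the array element at a fixed index, or null if the array is too short. It uses the same bounds check and label pattern as the parser. The class and method names are hypothetical, and the index is assumed to be non-negative.

```csharp
using System;
using System.Reflection.Emit;

public static class EmitSketch
{
    // Emits the equivalent of:
    //   static string ElementAt(string[] input) => index < input.Length ? input[index] : null;
    public static Func<string[], string> BuildElementAt(int index)
    {
        var dm = new DynamicMethod("ElementAt", typeof(string), new[] { typeof(string[]) });
        ILGenerator il = dm.GetILGenerator();
        Label outOfRange = il.DefineLabel();

        il.Emit(OpCodes.Ldc_I4, index);   // push index
        il.Emit(OpCodes.Ldarg_0);         // push input
        il.Emit(OpCodes.Ldlen);           // replace input with its length
        il.Emit(OpCodes.Conv_I4);         // length as int32, as the C# compiler does
        il.Emit(OpCodes.Bge, outOfRange); // if (index >= length) goto outOfRange

        il.Emit(OpCodes.Ldarg_0);         // push input
        il.Emit(OpCodes.Ldc_I4, index);   // push index
        il.Emit(OpCodes.Ldelem_Ref);      // push input[index]
        il.Emit(OpCodes.Ret);

        il.MarkLabel(outOfRange);
        il.Emit(OpCodes.Ldnull);          // push null
        il.Emit(OpCodes.Ret);

        return (Func<string[], string>)dm.CreateDelegate(typeof(Func<string[], string>));
    }
}
```

Every path must leave exactly one value on the evaluation stack before Ret, which is the kind of bookkeeping that makes raw IL harder to write than expression trees.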

public class EmitIlParserFactory : IParserFactory
{
	public Func<string[], T> GetParser<T>() where T : new()
	{
		var props = typeof(T).GetProperties(BindingFlags.Instance | BindingFlags.Public);

		var dm = new DynamicMethod($"from_{typeof(string[]).FullName}_to_{typeof(T).FullName}", 
			typeof(T), new [] { typeof(string[]) }, typeof(EmitIlParserFactory).Module);
		var il = dm.GetILGenerator();

        //create a new instance of target type
		var instance = il.DeclareLocal(typeof(T));
		il.Emit(OpCodes.Newobj, typeof(T).GetConstructor(Type.EmptyTypes));
		il.Emit(OpCodes.Stloc, instance);

        //go through all public and non-static properties
		foreach (var prop in props)
		{
			var attrs = prop.GetCustomAttributes(typeof(ArrayIndexAttribute)).ToArray();
			if (attrs.Length == 0) continue;

			int order = ((ArrayIndexAttribute)attrs[0]).Order;
			if (order < 0) continue;

			var label = il.DefineLabel();

			if (prop.PropertyType == typeof(string))
			{
                //check whether order from ArrayIndexAttribute is a valid index of the input array
				il.Emit(OpCodes.Ldc_I4, order);
				il.Emit(OpCodes.Ldarg_0);
				il.Emit(OpCodes.Ldlen);
				il.Emit(OpCodes.Bge_S, label);

                //set string property
				il.Emit(OpCodes.Ldloc, instance);
				il.Emit(OpCodes.Ldarg_0);
				il.Emit(OpCodes.Ldc_I4, order);
				il.Emit(OpCodes.Ldelem_Ref);
				il.Emit(OpCodes.Callvirt, prop.GetSetMethod());

				il.MarkLabel(label);
				continue;
			}

            //get parser method from the list of available parsers (currently we parse only Int and DateTime)
			if (!TypeParsers.Parsers.TryGetValue(prop.PropertyType, out var parser))
			{
				continue;
			}

            //check whether order from ArrayIndexAttribute is a valid index of the input array
			il.Emit(OpCodes.Ldc_I4, order);
			il.Emit(OpCodes.Ldarg_0);
			il.Emit(OpCodes.Ldlen);
			il.Emit(OpCodes.Bge_S, label);

			var parseResult = il.DeclareLocal(prop.PropertyType);

			il.Emit(OpCodes.Ldarg_0);
			il.Emit(OpCodes.Ldc_I4, order);
			il.Emit(OpCodes.Ldelem_Ref);
			il.Emit(OpCodes.Ldloca, parseResult);
			il.EmitCall(OpCodes.Call, parser, null);
			il.Emit(OpCodes.Brfalse_S, label);
            
            //set property if an element of array is successfully parsed
			il.Emit(OpCodes.Ldloc, instance);
			il.Emit(OpCodes.Ldloc, parseResult);
			il.Emit(OpCodes.Callvirt, prop.GetSetMethod());

			il.MarkLabel(label);
		}

		il.Emit(OpCodes.Ldloc, instance);
		il.Emit(OpCodes.Ret);

        //create delegate from il instructions
		return (Func<string[], T>)dm.CreateDelegate(typeof(Func<string[], T>));
	}
}

Sigil

This approach is quite similar to the previous one, but now we use Sigil, which gives us syntactic sugar and more understandable error messages.

public class SigilParserFactory : IParserFactory
{
	public Func<string[], T> GetParser<T>() where T : new()
	{
		var props = typeof(T).GetProperties(BindingFlags.Instance | BindingFlags.Public);

		var il = Emit<Func<string[], T>>.NewDynamicMethod($"from_{typeof(string[]).FullName}_to_{typeof(T).FullName}");

		var instance = il.DeclareLocal<T>();
		il.NewObject<T>();
		il.StoreLocal(instance);

		foreach (var prop in props)
		{
			var attrs = prop.GetCustomAttributes(typeof(ArrayIndexAttribute)).ToArray();
			if (attrs.Length == 0) continue;

			int order = ((ArrayIndexAttribute)attrs[0]).Order;
			if (order < 0) continue;

			var label = il.DefineLabel();

			if (prop.PropertyType == typeof(string))
			{
				il.LoadConstant(order);
				il.LoadArgument(0);
				il.LoadLength<string>();
				il.BranchIfGreaterOrEqual(label);

				il.LoadLocal(instance);
				il.LoadArgument(0);
				il.LoadConstant(order);
				il.LoadElement<string>();
				il.CallVirtual(prop.GetSetMethod());

				il.MarkLabel(label);
				continue;
			}

			if (!TypeParsers.Parsers.TryGetValue(prop.PropertyType, out var parser))
			{
				continue;
			}

			il.LoadConstant(order);
			il.LoadArgument(0);
			il.LoadLength<string>();
			il.BranchIfGreaterOrEqual(label);

			var parseResult = il.DeclareLocal(prop.PropertyType);
			
			il.LoadArgument(0);
			il.LoadConstant(order);
			il.LoadElement<string>();
			il.LoadLocalAddress(parseResult);
			il.Call(parser);
			il.BranchIfFalse(label);

			il.LoadLocal(instance);
			il.LoadLocal(parseResult);
			il.CallVirtual(prop.GetSetMethod());

			il.MarkLabel(label);
		}

		il.LoadLocal(instance);
		il.Return();

		return il.CreateDelegate();
	}
}

Cache compiled parsers

We have implemented three approaches to creating a parser delegate: expression trees, emitting IL and Sigil. All of them share the same problem: IParserFactory.GetParser does the hard work (building an expression tree or emitting IL and then creating a delegate) every time you call it. The solution is quite simple, just cache the result:

public class CachedParserFactory : IParserFactory
{
	private readonly IParserFactory _realParserFactory;
	private readonly ConcurrentDictionary<string, Lazy<object>> _cache;

	public CachedParserFactory(IParserFactory realParserFactory)
	{
		_realParserFactory = realParserFactory;
		_cache = new ConcurrentDictionary<string, Lazy<object>>();
	}

	public Func<string[], T> GetParser<T>() where T : new()
	{
		//the value factory runs only on a cache miss, so no Lazy is allocated on a cache hit
		return (Func<string[], T>)(_cache.GetOrAdd($"aip_{_realParserFactory.GetType().FullName}_{typeof(T).FullName}", 
			_ => new Lazy<object>(() => _realParserFactory.GetParser<T>(), LazyThreadSafetyMode.ExecutionAndPublication)).Value);
	}
}

Now we reuse the compiled delegates, which is more efficient.
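
The reason for storing Lazy values instead of delegates is that ConcurrentDictionary.GetOrAdd does not guarantee a single evaluation under contention, while Lazy with ExecutionAndPublication does. The sketch below isolates that pattern and counts how often the expensive build step runs. It is keyed by Type rather than by a formatted string, and the names are hypothetical; both are choices of this sketch, not the code above.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class LazyCacheSketch
{
    private static readonly ConcurrentDictionary<Type, Lazy<object>> Cache =
        new ConcurrentDictionary<Type, Lazy<object>>();

    // Counts how many times the expensive build step actually ran.
    public static int BuildCount;

    public static Func<string[], T> GetOrBuild<T>(Func<Func<string[], T>> build)
    {
        Lazy<object> lazy = Cache.GetOrAdd(
            typeof(T),
            _ => new Lazy<object>(
                () =>
                {
                    Interlocked.Increment(ref BuildCount);
                    return build();
                },
                LazyThreadSafetyMode.ExecutionAndPublication));

        // Only the Lazy that won the race into the dictionary is ever evaluated.
        return (Func<string[], T>)lazy.Value;
    }
}
```

Two calls to `LazyCacheSketch.GetOrBuild<int>(...)` should return the same delegate instance and leave BuildCount at 1.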

Roslyn based approaches

Roslyn is the .NET compiler platform, which not only compiles code but also makes it possible to analyze syntax and generate code.

Roslyn runtime code generation

The Roslyn approach is quite interesting because it lets us write plain C# (as a string, though) instead of writing IL instructions or combining expression tree blocks:

public static class RoslynParserInitializer
{
    public static IParserFactory CreateFactory()
    {
        //get all types marked with ParserOutputAttribute
        var targetTypes =
            (from a in AppDomain.CurrentDomain.GetAssemblies()
                from t in a.GetTypes()
                let attributes = t.GetCustomAttributes(typeof(ParserOutputAttribute), true)
                where attributes != null && attributes.Length > 0
                select t).ToArray();

        var typeNames = new List<(string TargetTypeName, string TargetTypeFullName, string TargetTypeParserName)>();
        var builder = new StringBuilder();
        builder.AppendLine(@"
using System;
using Parsers.Common;

public class RoslynGeneratedParserFactory : IParserFactory 
{");
        //go through all types
        foreach (var targetType in targetTypes)
        {
            var targetTypeName = targetType.Name;
            var targetTypeFullName = targetType.FullName;
            var targetTypeParserName = targetTypeName + "Parser";
            typeNames.Add((targetTypeName, targetTypeFullName, targetTypeParserName));

            //generate private parser method for each target type
            builder.AppendLine($"private static T {targetTypeParserName}<T>(string[] input)");

            builder.Append($@"
{{
var {targetTypeName}Instance = new {targetTypeFullName}();");

            var props = targetType.GetProperties(BindingFlags.Instance | BindingFlags.Public);
            
            //go through all properties of the target type
            foreach (var prop in props)
            {
                var attrs = prop.GetCustomAttributes(typeof(ArrayIndexAttribute)).ToArray();
                if (attrs.Length == 0) continue;

                int order = ((ArrayIndexAttribute)attrs[0]).Order;
                if (order < 0) continue;

                if (prop.PropertyType == typeof(string))
                {
                    builder.Append($@"
if({order} < input.Length)
{{
{targetTypeName}Instance.{prop.Name} = input[{order}];
}}
");
                }

                if (prop.PropertyType == typeof(int))
                {
                    builder.Append($@"
if({order} < input.Length && int.TryParse(input[{order}], out var parsed{prop.Name}))
{{
{targetTypeName}Instance.{prop.Name} = parsed{prop.Name};
}}
");
                }

                if (prop.PropertyType == typeof(DateTime))
                {
                    builder.Append($@"
if({order} < input.Length && DateTime.TryParse(input[{order}], out var parsed{prop.Name}))
{{
{targetTypeName}Instance.{prop.Name} = parsed{prop.Name};
}}
");
                }
            }

            builder.Append($@"
object obj = {targetTypeName}Instance;
return (T)obj;
}}");
        }

        builder.AppendLine("public Func<string[], T> GetParser<T>() where T : new() {");
        foreach (var typeName in typeNames)
        {
            builder.Append($@"
if (typeof(T) == typeof({typeName.TargetTypeFullName}))
{{
return {typeName.TargetTypeParserName}<T>;
}}
");
        }
        builder.AppendLine("throw new NotSupportedException();}");

        builder.AppendLine("}");

        var syntaxTree = CSharpSyntaxTree.ParseText(builder.ToString());

        //reference assemblies
        string assemblyName = Path.GetRandomFileName();
        var refPaths = new List<string> {
            typeof(Object).GetTypeInfo().Assembly.Location,
            typeof(Enumerable).GetTypeInfo().Assembly.Location,
            Path.Combine(Path.GetDirectoryName(typeof(GCSettings).GetTypeInfo().Assembly.Location), "System.Runtime.dll"),
            typeof(RoslynParserInitializer).GetTypeInfo().Assembly.Location,
            typeof(IParserFactory).GetTypeInfo().Assembly.Location,
            Path.Combine(Path.GetDirectoryName(typeof(GCSettings).GetTypeInfo().Assembly.Location), "netstandard.dll"),
        };
        refPaths.AddRange(targetTypes.Select(x => x.Assembly.Location));

        var references = refPaths.Select(r => MetadataReference.CreateFromFile(r)).ToArray();

        // compile dynamic code
        var compilation = CSharpCompilation.Create(
            assemblyName,
            syntaxTrees: new[] { syntaxTree },
            references: references,
            options: new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));

        //compile assembly
        using (var ms = new MemoryStream())
        {
            var result = compilation.Emit(ms);

            //collect compilation errors into a readable message
            if (!result.Success)
            {
                throw new Exception(string.Join(",", result.Diagnostics.Where(diagnostic =>
                    diagnostic.IsWarningAsError ||
                    diagnostic.Severity == DiagnosticSeverity.Error).Select(x => x.GetMessage())));
            }
            ms.Seek(0, SeekOrigin.Begin);

            // load assembly from memory
            var assembly = AssemblyLoadContext.Default.LoadFromStream(ms);

            var factoryType = assembly.GetType("RoslynGeneratedParserFactory");
            if (factoryType == null) throw new InvalidOperationException("Roslyn generated parser type not found");

            //create an instance of freshly generated parser factory
            return (IParserFactory)Activator.CreateInstance(factoryType);
        }
    }
}

Source generator

A source generator offers a very interesting ability: building the parser delegate during compilation, i.e. in advance. In that case there is no runtime overhead for building a parser delegate on first use, which is amazing:

[Generator]
public class ParserSourceGenerator : ISourceGenerator
{
	public void Initialize(GeneratorInitializationContext context)
	{
		//uncomment to debug
		//System.Diagnostics.Debugger.Launch();
	}

	public void Execute(GeneratorExecutionContext context)
	{
		var compilation = context.Compilation;
		var parserOutputTypeSymbol = compilation.GetTypeByMetadataName("Parsers.Common.ParserOutputAttribute");
		var attributeIndexTypeSymbol = compilation.GetTypeByMetadataName("Parsers.Common.ArrayIndexAttribute");
		var typesToParse = new List<ITypeSymbol>();

		foreach (var syntaxTree in compilation.SyntaxTrees)
		{
			var semanticModel = compilation.GetSemanticModel(syntaxTree);

            //get all types marked with ParserOutputAttribute
			typesToParse.AddRange(syntaxTree.GetRoot()
				.DescendantNodesAndSelf()
				.OfType<ClassDeclarationSyntax>()
				.Select(x => semanticModel.GetDeclaredSymbol(x))
				.OfType<ITypeSymbol>()
				.Where(x => x.GetAttributes().Select(a => a.AttributeClass)
					.Any(b => SymbolEqualityComparer.Default.Equals(b, parserOutputTypeSymbol))));
		}

		var typeNames = new List<(string TargetTypeName, string TargetTypeFullName, string TargetTypeParserName)>();
		var builder = new StringBuilder();
		builder.AppendLine(@"
using System;
using Parsers.Common;
namespace BySourceGenerator
{
public class Parser : IParserFactory 
{");

        //go through all types
		foreach (var typeSymbol in typesToParse)
		{
			var targetTypeName = typeSymbol.Name;
			var targetTypeFullName = GetFullName(typeSymbol);
			var targetTypeParserName = targetTypeName + "Parser";
			typeNames.Add((targetTypeName, targetTypeFullName, targetTypeParserName));
			builder.AppendLine($"private static T {targetTypeParserName}<T>(string[] input)");

			builder.Append($@"
{{
var {targetTypeName}Instance = new {targetTypeFullName}();");

			var props = typeSymbol.GetMembers().OfType<IPropertySymbol>();

            //go through all properties of the target type
			foreach (var prop in props)
			{
				var attr = prop.GetAttributes().FirstOrDefault(x => SymbolEqualityComparer.Default.Equals(x.AttributeClass, attributeIndexTypeSymbol));
				if (attr == null || !(attr.ConstructorArguments[0].Value is int)) continue;

				int order = (int) attr.ConstructorArguments[0].Value;
				if (order < 0) continue;

				if (GetFullName(prop.Type) == "System.String")
				{
					builder.Append($@"
if({order} < input.Length)
{{
{targetTypeName}Instance.{prop.Name} = input[{order}];
}}
");
				}

				if (GetFullName(prop.Type) == "System.Int32")
				{
					builder.Append($@"
if({order} < input.Length && int.TryParse(input[{order}], out var parsed{prop.Name}))
{{
{targetTypeName}Instance.{prop.Name} = parsed{prop.Name};
}}
");
				}

				if (GetFullName(prop.Type) == "System.DateTime")
				{
					builder.Append($@"
if({order} < input.Length && DateTime.TryParse(input[{order}], out var parsed{prop.Name}))
{{
{targetTypeName}Instance.{prop.Name} = parsed{prop.Name};
}}
");
				}
			}

			builder.Append($@"
object obj = {targetTypeName}Instance;
return (T)obj;
}}");
		}

		builder.AppendLine("public Func<string[], T> GetParser<T>() where T : new() {");
		foreach (var typeName in typeNames)
		{
			builder.Append($@"
if (typeof(T) == typeof({typeName.TargetTypeFullName}))
{{
return {typeName.TargetTypeParserName}<T>;
}}
");
		}

		builder.AppendLine("throw new NotSupportedException();}");

		builder.AppendLine("}}");

		var src = builder.ToString();
		context.AddSource(
			"ParserGeneratedBySourceGenerator.cs",
			SourceText.From(src, Encoding.UTF8)
		);
	}

	private static string GetFullName(ITypeSymbol typeSymbol) =>
		$"{typeSymbol.ContainingNamespace}.{typeSymbol.Name}";
}
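
To make the result more tangible, this is approximately the source the generator above would add to the compilation for the Data class, derived by following the string templates by hand. The Parsers.Models namespace is hypothetical, the attributes are omitted for brevity, and the interface is repeated so that the sketch compiles on its own.

```csharp
using System;
using Parsers.Common;

namespace Parsers.Common
{
    public interface IParserFactory
    {
        Func<string[], T> GetParser<T>() where T : new();
    }
}

// Hypothetical namespace for the Data class; attributes omitted for brevity.
namespace Parsers.Models
{
    public class Data
    {
        public string Name { get; set; }
        public int Number { get; set; }
        public DateTime Birthday { get; set; }
    }
}

// Approximately what the generator emits (formatted for readability).
namespace BySourceGenerator
{
    public class Parser : IParserFactory
    {
        private static T DataParser<T>(string[] input)
        {
            var DataInstance = new Parsers.Models.Data();

            if (0 < input.Length)
            {
                DataInstance.Name = input[0];
            }

            if (2 < input.Length && int.TryParse(input[2], out var parsedNumber))
            {
                DataInstance.Number = parsedNumber;
            }

            if (1 < input.Length && DateTime.TryParse(input[1], out var parsedBirthday))
            {
                DataInstance.Birthday = parsedBirthday;
            }

            object obj = DataInstance;
            return (T)obj;
        }

        public Func<string[], T> GetParser<T>() where T : new()
        {
            if (typeof(T) == typeof(Parsers.Models.Data))
            {
                return DataParser<T>;
            }

            throw new NotSupportedException();
        }
    }
}
```

The generated parser is essentially the plain C# version from the beginning of the post, which fits the benchmark results for SourceGenerator and ManuallyWritten being so close.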

Benchmarks

The post wouldn’t be comprehensive without benchmarks. I would like to compare two things:

  • warm up step, i.e. generation of parser;
  • invocation of already generated parser.

Benchmarks are measured using BenchmarkDotNet. μs - microsecond, ns - nanosecond, 1 μs = 1000 ns.


BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1237 (21H1/May2021Update)
Intel Core i7-8550U CPU 1.80GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
.NET SDK=5.0.401
  [Host]     : .NET 5.0.10 (5.0.1021.41214), X64 RyuJIT
  DefaultJob : .NET 5.0.10 (5.0.1021.41214), X64 RyuJIT

Generation of parser

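| Method         |         Mean |       Error |      StdDev |     Gen 0 |  Gen 1 |  Gen 2 | Allocated |
|--------------- |-------------:|------------:|------------:|----------:|-------:|-------:|----------:|
| EmitIl         |     22.02 μs |    0.495 μs |    1.429 μs |    1.2817 | 0.6409 | 0.0305 |      5 KB |
| ExpressionTree |    683.68 μs |   13.609 μs |   31.268 μs |    2.9297 | 0.9766 |      - |     14 KB |
| Sigil          |    642.63 μs |   12.305 μs |   29.243 μs |  112.3047 |      - |      - |    460 KB |
| Roslyn         | 71,605.64 μs | 2,533.732 μs | 7,350.817 μs | 1000.0000 |      - |      - |  5,826 KB |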

Invocation of parser

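| Method          |        Mean |     Error |    StdDev | Ratio | RatioSD |  Gen 0 | Allocated |
|---------------- |------------:|----------:|----------:|------:|--------:|-------:|----------:|
| EmitIl          |    374.7 ns |   7.75 ns |  22.36 ns |  1.02 |    0.08 | 0.0095 |      40 B |
| ExpressionTree  |    378.1 ns |   7.56 ns |  20.57 ns |  1.03 |    0.08 | 0.0095 |      40 B |
| Reflection      | 13,625.0 ns | 272.60 ns | 750.81 ns | 37.29 |    2.29 | 0.7782 |   3,256 B |
| Sigil           |    378.9 ns |   7.69 ns |  21.06 ns |  1.03 |    0.07 | 0.0095 |      40 B |
| Roslyn          |    404.2 ns |   7.55 ns |  17.80 ns |  1.10 |    0.07 | 0.0095 |      40 B |
| SourceGenerator |    384.4 ns |   7.79 ns |  21.46 ns |  1.05 |    0.08 | 0.0095 |      40 B |
| ManuallyWritten |    367.8 ns |   7.36 ns |  15.68 ns |  1.00 |    0.00 | 0.0095 |      40 B |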

All approaches except direct use of reflection give results almost identical to the manually written C# parser.

Source code

Here is the GitHub repository with parser factories, unit tests and benchmarks.