Caching an IEnumerable<T> instance
An IEnumerable<T> can be costly to enumerate. If you need to enumerate it twice or more, you pay that cost every time. For instance, if you use Directory.EnumerateFiles(), the file system is enumerated again each time you start iterating the results. You can use ToList or ToArray to read all the items into a data structure that is very fast to enumerate. However, you then have to read the full sequence before you can start processing it, which means you lose the ability to stream the items.
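To make the cost concrete, here is a minimal, self-contained sketch (the ReadLines method and the printed message are made up for illustration) showing that a lazy sequence re-runs its iterator body on every enumeration, while ToList avoids the repeated work at the price of buffering everything up front:
using System;
using System.Collections.Generic;
using System.Linq;

static class RepeatedEnumerationDemo
{
    // A lazy sequence: the iterator body runs again on every enumeration
    static IEnumerable<string> ReadLines()
    {
        Console.WriteLine("Reading the source...");
        yield return "a";
        yield return "b";
    }

    static void Main()
    {
        var lines = ReadLines();
        Console.WriteLine(lines.Count()); // "Reading the source..." is printed
        Console.WriteLine(lines.Count()); // printed again: the work is redone

        var buffered = ReadLines().ToList(); // reads everything eagerly
        Console.WriteLine(buffered.Count);   // fast to re-enumerate, but streaming is lost
    }
}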
The idea of the following code is to wrap an IEnumerable<T> instance and store the items in a list as they are enumerated. In my case, the enumerable may be iterated from multiple threads, so the wrapper must be thread-safe. This means the underlying IEnumerable<T> must be enumerated only once, even when multiple threads enumerate the items at the same time.
The code is simple and doesn't need many comments. Here it is:
using System;
using System.Collections;
using System.Collections.Generic;

public static class CachedEnumerable
{
    public static CachedEnumerable<T> Create<T>(IEnumerable<T> enumerable)
    {
        return new CachedEnumerable<T>(enumerable);
    }
}
public sealed class CachedEnumerable<T> : IEnumerable<T>, IDisposable
{
    private readonly List<T> _cache = new List<T>();
    private readonly IEnumerable<T> _enumerable;
    private IEnumerator<T> _enumerator;
    private bool _enumerated = false;

    public CachedEnumerable(IEnumerable<T> enumerable)
    {
        _enumerable = enumerable ?? throw new ArgumentNullException(nameof(enumerable));
    }
    public IEnumerator<T> GetEnumerator()
    {
        // Each enumerator keeps its own index into the shared cache
        var index = 0;
        while (true)
        {
            if (TryGetItem(index, out var result))
            {
                yield return result;
                index++;
            }
            else
            {
                // There are no more items
                yield break;
            }
        }
    }
    private bool TryGetItem(int index, out T result)
    {
        // If the item is already in the cache, use it
        if (index < _cache.Count)
        {
            result = _cache[index];
            return true;
        }

        lock (_cache)
        {
            if (_enumerator == null && !_enumerated)
            {
                _enumerator = _enumerable.GetEnumerator();
            }
            // Another thread may have added the item while we were acquiring the lock
            if (index < _cache.Count)
            {
                result = _cache[index];
                return true;
            }

            // If we have already enumerated the whole stream, there is nothing else to do
            if (_enumerated)
            {
                result = default;
                return false;
            }
            // Get the next item and store it in the cache
            if (_enumerator.MoveNext())
            {
                result = _enumerator.Current;
                _cache.Add(result);
                return true;
            }
            else
            {
                // There are no more items, so we can dispose the underlying enumerator
                _enumerator.Dispose();
                _enumerator = null;
                _enumerated = true;
                result = default;
                return false;
            }
        }
    }
    public void Dispose()
    {
        if (_enumerator != null)
        {
            _enumerator.Dispose();
            _enumerator = null;
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
Here's how to use CachedEnumerable<T>:
static void Main(string[] args)
{
    var enumerable = MyEnumerable();
    using var cachedEnumerable = CachedEnumerable.Create(enumerable);

    Parallel.ForEach(cachedEnumerable, item => Console.WriteLine(item));

    foreach (var item in cachedEnumerable)
    {
        Console.WriteLine(item);
    }
}
static IEnumerable<int> MyEnumerable()
{
    // Should be called only once
    yield return 1;
    yield return 2;
    yield return 3;
}