sloppycode.net
Inside the System.IO namespace
An in-depth look at Streams, Writers and Readers in System.IO


Download the solution file for the examples below

The System.IO namespace can appear (on the surface) to have a lot of ways of performing the same or very similar tasks. This small article and example code should help to demystify it. Most of the classes discussed here can be found in the mscorlib assembly; a few, such as those in System.IO.Compression, live in System.dll.

Microsoft already has a lot of quickstarts on reading and writing files:

Basic File I/O:
http://msdn2.microsoft.com/en-us/library/336wast5.aspx

Common IO tasks:
http://msdn2.microsoft.com/en-us/library/ms404278.aspx

which includes:

- How to write text to a file
- Read text from a file
- Read from a binary file
- Write to a binary file

This article is a more in-depth look at the classes of System.IO, and should hopefully also serve as a reference.

Stream
The MSDN documentation gives you a fair bit of information about the Stream class. A Stream is essentially the place where the data you want to read from and write to is stored, or a backing store in MSDN parlance. This can be in memory (MemoryStream), a file (FileStream), a compressed wrapper around another stream (GZipStream) or a remote server (NetworkStream). The base Stream class implements IDisposable, which allows it to be wrapped in a using statement. The Dispose method simply calls Close, so you don't have to worry about the cleanup (more details on this below).

As Stream is an abstract class, you'll use its derived classes for any functionality, such as a FileStream for dealing with a file.

Stream has the following methods:

- BeginRead
- BeginWrite
- Close
- EndRead
- EndWrite
- Flush
- Read
- ReadByte
- Seek
- SetLength
- Write
- WriteByte

and the following properties:

- CanRead
- CanSeek
- CanTimeout
- CanWrite
- Length
- Position
- ReadTimeout
- WriteTimeout

As the MSDN documentation points out, you can perform random access on a Stream with the Seek() method, but this isn't always possible: the NetworkStream class doesn't allow it, as you never have the whole of the data to deal with, only the current packet or set of packets. The CanSeek property tells you whether a given stream supports it. The classes derived from the Stream class are shown below.

[Figure: the classes derived from Stream]
The classes you will typically deal with on a day to day basis are BufferedStream, FileStream, MemoryStream, NetworkStream. Unless you are dealing with a byte array yourself, using one of the Reader/Writers listed below is the easiest way to use the various Stream classes.
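
As a rough illustration of the random access point made earlier, the sketch below reads the last few bytes of any Stream, but only when CanSeek says it is allowed. The class and method names (SeekSketch, ReadTail) are made up for this example.

```csharp
using System;
using System.IO;

public static class SeekSketch
{
    // Returns the last `count` bytes of a stream, or null when the stream
    // cannot seek (as is the case for NetworkStream).
    public static byte[] ReadTail(Stream stream, int count)
    {
        if (!stream.CanSeek)
            return null;

        // Clamp so we never seek before the start of the stream.
        int toRead = (int)Math.Min(count, stream.Length);
        stream.Seek(-toRead, SeekOrigin.End);

        byte[] tail = new byte[toRead];
        int total = 0;
        while (total < toRead)
        {
            int read = stream.Read(tail, total, toRead - total);
            if (read == 0) break; // end of stream reached early
            total += read;
        }
        return tail;
    }
}
```

Passing a MemoryStream or FileStream returns the tail bytes; a non-seekable stream returns null instead of throwing.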

BufferedStream
This is intended to improve performance of (file) read/write operations by storing the bytes in memory as a cache. The BufferedStream is an example of the Decorator pattern. You wrap a stream inside a BufferedStream in order to benefit from its functionality.

No new methods or properties can be found in BufferedStream; however, it overrides the Stream class' methods and properties to implement its own cache.

Whilst it's recommended you use a BufferedStream for large files on disk, you can also just set a large buffer on your FileStream and the same benefit will be had. To quote a Microsoft developer who worked on the System.IO namespace:

"..there is zero benefit from wrapping a BufferedStream around a FileStream. We copied BufferedStream’s buffering logic into FileStream about 4 years ago to encourage better default performance"

When I was writing the file reading logic in Statmagic (an open source project for parsing web log files), my strategy was to use a large buffer and skip the BufferedStream. I assumed that my application would be running on a server with 4GB+ of RAM, or a desktop machine where the size of RAM far exceeds the size of the log file. Of course running it on a mobile device would require a different approach. Many websites produce log files of 100MB up to several GB per day, but in the tests I ran it tears through even 2GB log files. The default size of the buffer in Statmagic is 16MB, which I arrived at after some experimentation and reading of this discussion. In tests, this runs fine in both the single-threaded reads and the multi-threaded reads where more than 1 log file is read at the same time. The single-threaded reads run slightly faster on a SATA (RAID 0) hard drive, though I haven't tried it on a server setup yet.
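
The large-buffer approach described above might look something like the following. This is only a sketch, not the Statmagic code: LargeBufferSketch and CountLines are names invented here, and the 16MB figure is simply the one quoted in the text.

```csharp
using System.IO;

public static class LargeBufferSketch
{
    // 16 MB, the figure quoted above; tune it for your own hardware.
    public const int BufferSize = 16 * 1024 * 1024;

    // Counts the lines in a file, reading through a FileStream that has been
    // given a large internal buffer instead of a BufferedStream wrapper.
    public static long CountLines(string path)
    {
        using (FileStream file = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, BufferSize))
        using (StreamReader reader = new StreamReader(file))
        {
            long lines = 0;
            while (reader.ReadLine() != null)
                lines++;
            return lines;
        }
    }
}
```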

FileStream
As the name implies, this is for reading and writing files. Some of the FileStream's functionality is available via the File class (which just uses a FileStream under the hood).

NetworkStream
NetworkStream is used for reading and writing binary data over a connected socket, typically TCP. I spent some of my spare time many years ago writing C# libraries to read UDP packets from game servers for Quake 3, Half Life and Unreal. The main project went the way of a lot of open source projects and remained unfinished, however the bulk of the UDP reading logic was complete. Unfortunately there is no NetworkStream implementation for the UdpClient class that I used; instead you are fed the packet data as a byte array.

The code below is an example of doing an HTTP GET with a NetworkStream. There are of course easier ways of doing this, such as the WebClient class, but it demonstrates how you might use the NetworkStream class. Attempting to use the random access methods such as Seek() with the NetworkStream class will throw a NotSupportedException; as I mentioned above, you never have the whole of the data to work with, so seeking makes no sense.
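
A minimal sketch of such an HTTP GET follows. It is not the article's original listing: the names NetworkStreamSketch and HttpGet are invented, and it assumes a server that speaks plain HTTP/1.0 and closes the connection once the response has been sent.

```csharp
using System.IO;
using System.Net.Sockets;
using System.Text;

public static class NetworkStreamSketch
{
    // Sends a bare-bones HTTP/1.0 GET and returns the raw response,
    // headers included, as ASCII text.
    public static string HttpGet(string host, int port, string path)
    {
        using (TcpClient client = new TcpClient(host, port))
        using (NetworkStream stream = client.GetStream())
        {
            string request = "GET " + path + " HTTP/1.0\r\nHost: " + host + "\r\n\r\n";
            byte[] requestBytes = Encoding.ASCII.GetBytes(request);
            stream.Write(requestBytes, 0, requestBytes.Length);
            stream.Flush();

            // There is no Length to rely on, so keep reading until the
            // server closes the connection and Read returns 0.
            using (MemoryStream response = new MemoryStream())
            {
                byte[] buffer = new byte[4096];
                int read;
                while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
                    response.Write(buffer, 0, read);
                return Encoding.ASCII.GetString(response.ToArray());
            }
        }
    }
}
```

Reading in a loop rather than in a single Read call matters here, because a NetworkStream hands back whatever has arrived so far, not the whole response.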

MemoryStream
The MemoryStream class adds only a few members of its own: WriteTo(), which copies the contents of the stream to another stream; ToArray() and GetBuffer(), which hand back the bytes; and a Capacity property, which is the size of the buffer allocated for the stream in memory.

MemoryStream always deals with a byte array, which means if you want to manipulate string data you'll be working with the Encoding class (or possibly the Convert class too). A MemoryStream created with the default constructor grows as you write to it, but one created over an existing byte array can't be resized. One gotcha with the class is the Write() method:
Write(byte[] buffer, int offset, int count);
The offset parameter is the position to start from in your byte array, not the offset in the Stream.
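
To make the offset gotcha concrete, here is a small sketch (MemoryStreamWriteSketch and WriteSlice are illustrative names only) that writes part of an array into an empty stream:

```csharp
using System.IO;
using System.Text;

public static class MemoryStreamWriteSketch
{
    // Copies only "world" out of the source array into a fresh stream.
    public static string WriteSlice()
    {
        byte[] source = Encoding.UTF8.GetBytes("Hello world");

        using (MemoryStream stream = new MemoryStream())
        {
            // offset 6 and count 5 index into `source`, not into the stream:
            // the bytes land at the stream's current Position (0 here).
            stream.Write(source, 6, 5);
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }
}
```

To write at a particular place in the stream itself, set Position (or call Seek) before calling Write.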

An aside about Encodings and Unicode in .NET
One thing that can trip you up when reading character streams in .NET is using the wrong encoding to read byte representations of text. This typically happens when the text contains characters outside ASCII and you read it on a Western computer using the default encoding or ASCII. Below is some example code; some characters might appear as '?' in your browser, so use a Unicode text editor like Metapad to view the code, or open the solution file.

GZipStream/DeflateStream
These were added in .NET 2 in the new System.IO.Compression namespace to provide compression and decompression of streams. There are no helper readers or writers for the 2 classes, so common tasks like zipping a folder are quite cumbersome. The examples below don't stray much from the MSDN documentation; I've chunked the functionality to make it a bit clearer and more concise.

NB The GZipStream doesn't support adding files to an archive, as MSDN states:

"...however, this class does not inherently provide functionality for adding files to or extracting files from .zip archives"

The GZipStream is purely for compressing a stream of bytes; it's not intended to act as a zipping library like SharpZipLib.
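
A hedged sketch of compressing and decompressing a byte array with GZipStream is shown here. GZipSketch and its two methods are hypothetical helpers written for this article, in the spirit of the MSDN samples rather than copied from them.

```csharp
using System.IO;
using System.IO.Compression;

public static class GZipSketch
{
    public static byte[] Compress(byte[] data)
    {
        using (MemoryStream output = new MemoryStream())
        {
            // The GZipStream must be closed before the output is read,
            // otherwise the final block and footer are missing.
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            }
            // ToArray() still works after the MemoryStream has been closed.
            return output.ToArray();
        }
    }

    public static byte[] Decompress(byte[] compressed)
    {
        using (MemoryStream input = new MemoryStream(compressed))
        using (GZipStream gzip = new GZipStream(input, CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = gzip.Read(buffer, 0, buffer.Length)) > 0)
                output.Write(buffer, 0, read);
            return output.ToArray();
        }
    }
}
```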

Flushing your Streams
Flush is available in all Stream and Writer classes. Quite often it's essential that you Flush() before you read back, otherwise the data you have previously written may still be sitting in a buffer and won't have reached the underlying Stream. A good example of this is the NetworkStream example further down.

Flush performs the following actions on the Stream or *Writer classes:

StreamWriter - Calls write if necessary on the stream, and then flush on the stream.
TextWriter - Does nothing
StringWriter - Does nothing
BinaryWriter - Calls flush on the stream

Stream - Abstract method
BufferedStream - Calls write or read, depending on the current write position. Write actually calls Write and then Flush on the underlying stream, while read seeks to the end of the stream.
NetworkStream - Does nothing
FileStream - Does a lot!
MemoryStream - Does nothing
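
The effect of Flush on a StreamWriter can be seen with a small experiment along these lines (FlushSketch is a made-up name; the writer uses its default buffer and AutoFlush setting):

```csharp
using System.IO;

public static class FlushSketch
{
    // Returns the MemoryStream length before and after flushing the writer.
    public static long[] LengthsAroundFlush()
    {
        using (MemoryStream stream = new MemoryStream())
        using (StreamWriter writer = new StreamWriter(stream))
        {
            writer.Write("hello");
            long before = stream.Length; // still held in the writer's buffer

            writer.Flush();
            long after = stream.Length;  // now pushed down to the stream

            return new long[] { before, after };
        }
    }
}
```

Before the Flush the stream is still empty; afterwards it holds the 5 bytes of "hello".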
Manipulating Streams - the Reader/Writer helper classes
At first glance there seem to be a lot of different ways of doing the same thing with reading and writing inside the IO namespace in .NET. Writing and reading from a Stream object can be done with any of the classes in the image below.

[Figure: the Reader and Writer class hierarchy]
The class hierarchy can in fact be simplified into just 2 types of classes: BinaryReader/Writer and TextReader/Writer and their derived classes.

BinaryReader/BinaryWriter
The BinaryReader/Writer classes are intended for writing simple types to files. The classes support bool, float, integers, strings and more, along the same kind of lines as the Java DataInputStream and DataOutputStream classes. They support different encoding types for strings. You might be forgiven for thinking they are intended for reading and writing arbitrary data as bytes; however, text is generally the job of the StreamReader/Writer classes, and raw bytes are handled by manipulating the Stream itself.

BinaryWriter
BinaryWriter has 4 methods: Close, Flush, Seek and Write, and a BaseStream property for the stream it's writing to. It implements IDisposable so it can be wrapped inside a using() clause just like the Stream classes, which it usually will be.

Strings

The BinaryWriter writes a string using the encoding you specify in the constructor, or UTF-8 if you don't specify one. How does the reader know how far to read? The writer prepends the length of the string in bytes (as a 7-bit encoded integer) and then writes the string itself. For example, writing "Hello world" comes out as
0b 48 65 6c 6c 6f 20 77 6f 72 6c 64
So 0b (11) is the length of "Hello world", followed by the string. If you have written a different type such as an integer and then read it back in as a string, the reader will just try to decode the bytes using the encoding you have. For example, writing "Hello world", then (byte) 11, then "Hello world" again, and reading two strings back will give you "Hello world" and "\vHello worl". Changing the (byte) 11 to (byte) 255 gets you an EndOfStreamException.
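
The behaviour described above can be reproduced with a sketch like this one. It is a reconstruction under assumptions, not the article's original listing; BinaryStringSketch and its methods are names chosen for illustration.

```csharp
using System;
using System.IO;

public static class BinaryStringSketch
{
    // Writes one string and returns the raw bytes as lower-case hex.
    public static string HexOfString(string text)
    {
        using (MemoryStream stream = new MemoryStream())
        using (BinaryWriter writer = new BinaryWriter(stream))
        {
            writer.Write(text);
            writer.Flush();
            return BitConverter.ToString(stream.ToArray()).Replace("-", " ").ToLowerInvariant();
        }
    }

    // Writes string, stray byte, string, then reads two strings back.
    // The stray byte is mistaken for the second string's length prefix.
    public static string[] ReadBackMisaligned(byte strayByte)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            BinaryWriter writer = new BinaryWriter(stream);
            writer.Write("Hello world");
            writer.Write(strayByte);
            writer.Write("Hello world");
            writer.Flush();

            stream.Position = 0;
            BinaryReader reader = new BinaryReader(stream);
            return new string[] { reader.ReadString(), reader.ReadString() };
        }
    }
}
```

Calling ReadBackMisaligned(11) returns the two strings mentioned above, while ReadBackMisaligned(255) runs off the end of the stream.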

Ints
Writing the ints 1, 10, 100, 1000, 10000 and 123456789 is displayed as:
01 00 00 00 0a 00 00 00 64 00 00 00 e8 03 00 00 10 27 00 00 15 cd 5b 07
The output is in little-endian format, which puts the least significant byte first, or in other words the bytes are read right to left. As you can see, ints are zero padded and always written as 32-bit integers rather than scaling up and down according to the number (e.g. only using 1 or 2 bytes for the 1, 10, 100, 1000, 10000).
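
A sketch that dumps the bytes BinaryWriter produces for a list of ints is given below (BinaryIntSketch and HexOfInts are hypothetical names for this example):

```csharp
using System;
using System.IO;

public static class BinaryIntSketch
{
    // Writes each int with BinaryWriter and returns the bytes as lower-case hex.
    public static string HexOfInts(params int[] values)
    {
        using (MemoryStream stream = new MemoryStream())
        using (BinaryWriter writer = new BinaryWriter(stream))
        {
            foreach (int value in values)
                writer.Write(value); // always 4 bytes, least significant byte first

            writer.Flush();
            return BitConverter.ToString(stream.ToArray()).Replace("-", " ").ToLowerInvariant();
        }
    }
}
```

HexOfInts(1, 10, 100, 1000, 10000, 123456789) gives the byte sequence listed above.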

If you want to write integers/floats out using Big Endian, Jon Skeet has written a utility for doing so here.

Bools
Bools are written as 1-byte integers of value 1 or 0. This appears to waste space, but makes sense, as the Reader would be unable to tell whether something like 10000000 (8 bits) was true, false, false, false, false, false, false, false (i.e. 8 bool values), or just 1 true value and 7 empty values.

Small ints
A small int (short, or Int16) only uses 2 bytes, so the value 100 is written zero padded as 64 00.
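
The sizes of bools and shorts can be checked with a sketch like the following; BinarySmallTypesSketch is an invented name, and the values written are arbitrary.

```csharp
using System;
using System.IO;

public static class BinarySmallTypesSketch
{
    // Writes true, false and the short 100, returning the bytes as hex.
    public static string HexOfBoolsAndShort()
    {
        using (MemoryStream stream = new MemoryStream())
        using (BinaryWriter writer = new BinaryWriter(stream))
        {
            writer.Write(true);        // 1 byte: 01
            writer.Write(false);       // 1 byte: 00
            writer.Write((short)100);  // 2 bytes: 64 00
            writer.Flush();
            return BitConverter.ToString(stream.ToArray()).Replace("-", " ").ToLowerInvariant();
        }
    }
}
```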

Floats
Looking at the source code of BinaryWriter, floats (Single) are written using unsafe code. It takes the address of value (a float), treats it as a pointer to an unsigned 32-bit integer and dereferences it, which gives the float's raw bit pattern as an integer.

So for 0.01f you get 0x3c23d70a.

The lines that follow split this value into its 4 bytes. The first is the lowest byte (0a; conversion to a byte drops the other 3 bytes). It then extracts d7 by bit shifting to the right by 8 bits, then 23 by shifting 16 bits, and finally 3c by shifting 24 bits.

Writing a double uses a similar technique. If you're curious how 0.01 came to be represented as 0x3c23d70a, take a look at this tool to see how IEEE-754 floating point numbers (single/double value types) are stored in memory in the CLR. 1 bit is used for the sign, 8 bits for the exponent, and the remaining 23 bits (for single precision/float) for the mantissa or significand.
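
The same shift-and-truncate steps can be sketched without unsafe code by going through BitConverter. This is an approximation of the idea, not the BinaryWriter source, and FloatBytesSketch is a name made up here.

```csharp
using System;

public static class FloatBytesSketch
{
    // Returns the 32-bit IEEE-754 pattern of a float without unsafe code.
    public static uint BitsOf(float value)
    {
        return BitConverter.ToUInt32(BitConverter.GetBytes(value), 0);
    }

    // Splits the pattern into 4 bytes, least significant first,
    // mirroring the shift-and-truncate steps described above.
    public static byte[] LittleEndianBytes(float value)
    {
        uint bits = BitsOf(value);
        unchecked
        {
            return new byte[]
            {
                (byte)bits,         // cast keeps only the lowest 8 bits
                (byte)(bits >> 8),
                (byte)(bits >> 16),
                (byte)(bits >> 24)
            };
        }
    }
}
```

For 0.01f, BitsOf returns 0x3c23d70a and LittleEndianBytes returns 0a d7 23 3c, which is the order BinaryWriter puts them in the stream.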

BinaryReader
This is a straightforward case of reading back the values you've written, in the same order and with the same types. If you read Int32 values back as Int16, each small value comes back followed by a 0, because the reader consumes 2 bytes at a time. The same applies to bool values, where the reader takes a single byte and converts from there. It is a forward-only reader. You can optionally just read the entire byte stream back as shown in the comments below, and then process it as you want, but this is better suited to a stream that you didn't write or don't know the format of.
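
As a sketch of reading values back (BinaryReaderSketch and its methods are illustrative names, and the values are arbitrary), the reads must mirror the writes:

```csharp
using System.IO;

public static class BinaryReaderSketch
{
    // Writes an int, a bool and a string, then reads them back in order.
    public static string RoundTrip()
    {
        using (MemoryStream stream = new MemoryStream())
        {
            BinaryWriter writer = new BinaryWriter(stream);
            writer.Write(42);
            writer.Write(true);
            writer.Write("Hello world");
            writer.Flush();

            stream.Position = 0;
            BinaryReader reader = new BinaryReader(stream);
            // Alternatively: byte[] raw = reader.ReadBytes((int)stream.Length);
            int number = reader.ReadInt32();
            bool flag = reader.ReadBoolean();
            string text = reader.ReadString();
            return number + "|" + flag + "|" + text;
        }
    }

    // Writes one Int32 and reads it back as two Int16 values.
    public static short[] ReadInt32AsInt16s(int value)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            BinaryWriter writer = new BinaryWriter(stream);
            writer.Write(value);
            writer.Flush();

            stream.Position = 0;
            BinaryReader reader = new BinaryReader(stream);
            return new short[] { reader.ReadInt16(), reader.ReadInt16() };
        }
    }
}
```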

TextReader/TextWriter
The abstract TextReader and TextWriter are the basis for the StreamReader/Writer and StringReader/Writer classes. These 4 classes are geared towards reading/writing text, as the base classes imply.

TextReader supplies the following methods

- Close
- Peek
- Read
- ReadBlock
- ReadLine
- ReadToEnd

TextWriter provides:

- Close
- Flush
- Write
- WriteLine

Anyone who has written custom server controls will be familiar with the TextWriter, as the HtmlTextWriter is derived from it.

StringWriter
StringWriter writes strings in a very similar way to StringBuilder (it actually uses a StringBuilder behind the scenes), but obviously without being able to read back. A common use is with the XmlTextWriter in the System.Xml namespace, which takes a TextWriter such as a StringWriter in one of its constructors. This is the easiest way of writing XML in memory without worrying about using MemoryStreams.

Under the hood the StringWriter really doesn't do anything more complex than stringBuilder.Append(), using the StringBuilder you provide or one it creates itself. From TextWriter, it inherits a large set of Write() overloads that take various .NET value types, converting them to their string equivalent.
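
Writing XML in memory through a StringWriter might look like this sketch. The element and attribute names are arbitrary, and StringWriterXmlSketch is a name invented for the example.

```csharp
using System.IO;
using System.Xml;

public static class StringWriterXmlSketch
{
    // Builds a tiny XML fragment in memory, with no MemoryStream involved.
    public static string BuildXml()
    {
        using (StringWriter text = new StringWriter())
        {
            XmlTextWriter xml = new XmlTextWriter(text);
            xml.WriteStartElement("greeting");
            xml.WriteAttributeString("lang", "en");
            xml.WriteString("hello");
            xml.WriteEndElement();
            xml.Flush();

            // ToString() returns whatever the underlying StringBuilder holds.
            return text.ToString();
        }
    }
}
```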

StringReader
StringReader takes a string in its constructor and then allows you to read from the string using the methods the TextReader base class hands to it. It doesn't add any new methods to those of the base class. Behind the scenes, ReadToEnd() simply returns the string, or does a substring if you have advanced past position zero. It tracks the position you're at in the string for this. ReadLine() checks (hardcodes in fact) for \r and \n and returns the line it has read up to that point.
The output from the above is:

 line
 of text
ther line
StreamReader
StreamReader, as the name implies, reads streams. However its purpose is to read text-based streams rather than binary ones, which is why it's derived from TextReader. It doesn't add any new methods to those of TextReader, although it does give you 3 new properties: BaseStream, CurrentEncoding and EndOfStream. It defaults to UTF-8 if no encoding is set.

The ReadToEnd() method uses a StringBuilder internally to read through the backing stream. ReadLine() is just like the StringReader class's implementation, but using a StringBuilder rather than a string to build up the line.
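
Because a StreamReader decodes bytes into characters, the encoding you hand it matters. The sketch below (names are hypothetical) writes text as UTF-8 and reads it back with whichever encoding the caller chooses:

```csharp
using System.IO;
using System.Text;

public static class StreamReaderEncodingSketch
{
    // Encodes text as UTF-8 bytes, then reads it back with the given encoding.
    public static string ReadWith(string text, Encoding readEncoding)
    {
        byte[] bytes = new UTF8Encoding(false).GetBytes(text);

        using (MemoryStream stream = new MemoryStream(bytes))
        using (StreamReader reader = new StreamReader(stream, readEncoding, false))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Reading "caf\u00e9" back with Encoding.UTF8 returns the original text, whereas Encoding.ASCII turns the two bytes of the accented character into question marks.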

Below is the example which featured in the NetworkStream section, this time using a StreamReader (and Writer) and also a different URL that returns more (HTML) textual data.

StreamWriter
The StreamWriter does the text equivalent of the BinaryWriter, writing various datatypes to the stream you give it, but as plain text rather than a binary representation like the BinaryWriter does. The example below illustrates how it's used to write plain text, and how that text translates to a byte array with MemoryStream.
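
A sketch of the difference (StreamWriterSketch is an invented name): writing the int 100 with a StreamWriter produces the characters '1', '0', '0' rather than the 4 bytes BinaryWriter would emit.

```csharp
using System;
using System.IO;

public static class StreamWriterSketch
{
    // Writes an int as text and returns the resulting bytes as hex.
    public static string HexOfText(int value)
    {
        using (MemoryStream stream = new MemoryStream())
        using (StreamWriter writer = new StreamWriter(stream))
        {
            writer.Write(value); // converted to its string form first
            writer.Flush();
            return BitConverter.ToString(stream.ToArray()).Replace("-", " ").ToLowerInvariant();
        }
    }
}
```

HexOfText(100) gives 31 30 30, the character codes for "100".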

Closing writers and streams
All of the classes mentioned in this article (except File of course) implement IDisposable. This allows you to wrap them in the using() clause, which ensures Dispose() is called and their resources are released promptly, even if an exception is thrown, rather than waiting for the GC.

When dealing with the streams and the helper classes this way, you don't need to worry about calling Close, as the Dispose() methods do this for you, either directly on the stream or on the underlying stream in the case of the reader/writer helpers.
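
That a writer's Dispose also closes the stream underneath it can be checked with a sketch such as this (DisposeSketch is a made-up name):

```csharp
using System.IO;

public static class DisposeSketch
{
    // Returns true if disposing the writer flushed it and closed the stream.
    public static bool UnderlyingStreamClosed()
    {
        MemoryStream stream = new MemoryStream();
        using (StreamWriter writer = new StreamWriter(stream))
        {
            writer.Write("hello");
        } // Dispose flushes the writer and closes `stream` too

        // A closed MemoryStream reports CanWrite == false,
        // though ToArray() still returns what was written.
        return !stream.CanWrite && stream.ToArray().Length == 5;
    }
}
```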

System.IO and System.Xml
The most common use inside the framework class library for the stream readers/writers is inside the System.Xml namespace. The XmlTextReader and XmlReader classes both take a TextReader (the former in its constructor, the latter in the static Create method).

You can use a StringReader in conjunction with these classes to read in-memory XML strings, as the code below shows
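
One way this can look is sketched here, using XmlReader.Create with a StringReader. The XML shape and the names XmlFromStringSketch and ReadTitle are assumptions made for the example.

```csharp
using System.IO;
using System.Xml;

public static class XmlFromStringSketch
{
    // Returns the text of the first <title> element, or null if there is none.
    public static string ReadTitle(string xml)
    {
        using (StringReader text = new StringReader(xml))
        using (XmlReader reader = XmlReader.Create(text))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "title")
                    return reader.ReadElementContentAsString();
            }
            return null;
        }
    }
}
```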
IO Exception handling
The IO exception hierarchy (below) is fairly straightforward, with specialized classes for catching specific errors like the common error of not finding the file.

[Figure: the IOException class hierarchy]

With several of the IO classes like FileStream you will need to nest your exception handling in order to close the stream gracefully, for example:
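
A rough sketch of the pattern follows, using a finally block to close the stream and catching only the specific exceptions expected. It is one possible shape rather than the article's original listing, and FileReadSketch is a hypothetical name.

```csharp
using System.IO;

public static class FileReadSketch
{
    // Returns the first byte of the file, or -1 if the file is empty
    // or the path does not exist.
    public static int FirstByteOrMinusOne(string path)
    {
        FileStream stream = null;
        try
        {
            stream = new FileStream(path, FileMode.Open, FileAccess.Read);
            return stream.ReadByte(); // -1 at end of stream
        }
        catch (FileNotFoundException)
        {
            return -1;
        }
        catch (DirectoryNotFoundException)
        {
            return -1;
        }
        finally
        {
            // Runs whether or not an exception was thrown.
            if (stream != null)
                stream.Close();
        }
    }
}
```

Anything else, such as a sharing violation, is deliberately left to propagate to the caller.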


Of course you could catch the whole lot in an IOException, but this is widely accepted as bad practice in exception handling; you should only catch what you are expecting to fail, and leave the rest to the caller or to a more generic exception handler such as Application.ThreadException in a Windows Forms app.

Other exceptions you need to watch out for when performing IO operations, that don't inherit from IOException:

System.UnauthorizedAccessException.
One scenario where this can happen is if you try to open a read-only file for writing. File.Open does this when you don't pass FileAccess.Read as a parameter, as it opens the FileStream with FileAccess.ReadWrite.

{"The process cannot access the file 'C:\\xxx' because it is being used by another process."}
These errors occur when something else is opening and using your file. They throw an IOException rather than an UnauthorizedAccessException.

System.NotSupportedException.
This can occur if you try to Write() to a stream that has been opened with read access only.

System.ArgumentException.
This is thrown if you try to read from a stream at a point that doesn't exist in it, the "offset and length were out of bounds" error.

System.OutOfMemoryException.
Reading very large files (e.g. DVD-size) or buggy in-memory manipulation with MemoryStreams can be culprits for this. Test it for yourself using:
byte[] b = new byte[int.MaxValue];


Shared Source Common Language Infrastructure can be viewed online at Koders.com. It estimates the project cost was $14m and apparently 0.21% of it is Perl!


