
2010-08-02

My new favourite tool

During the course of my work, I use a hex editor a lot.

Specifically, I use a hex editor mostly for reverse engineering binary file formats that have no documentation, for fixing corrupted files, and the like. One thing I've always wanted is some way to view the binary contents as structured data... like "starting at this byte offset, consider the next four bytes to be an integer, and show me that integer; then, using that integer, take that many bytes immediately following it, consider them to be string data, and decode them as UTF-8 or EBCDIC..." and so on.

All of that is fairly complex, and generally well beyond the facilities of anything short of a fairly low-level, full-featured programming language.
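For the curious, here's roughly what that dance looks like when you hand-roll it in C#. The file name, offset, layout, and encoding here are all hypothetical, purely to illustrate the idea:


using System;
using System.IO;
using System.Text;

class StructuredPeek
{
    static void Main()
    {
        // Hypothetical layout: at offset 0x10, a 4-byte little-endian
        // length, followed by that many bytes of UTF-8 string data.
        using (var reader = new BinaryReader(File.OpenRead("mystery.bin")))
        {
            reader.BaseStream.Seek(0x10, SeekOrigin.Begin);
            int length = reader.ReadInt32();                 // "the next four bytes are an integer"
            byte[] raw = reader.ReadBytes(length);           // "take that many bytes following it"
            Console.WriteLine(Encoding.UTF8.GetString(raw)); // "...and decode as UTF-8"
        }
    }
}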

Well, it looks like the folks over at SweetScape realized this is a workflow that at least SOME people need... and so they built the perfect tool for it. It's called the 010 Editor. It has a great feature called Binary Templates, plus scripting. It's the bomb, and it's accelerated my reverse engineering work by at least an order of magnitude.

I'm buying a license, and anyone who knows me will realize that's a pretty big deal. I'm a big fan of FOSS and generally try to use as much of it as possible, avoiding commercial apps... but this is a big exception. SweetScape is a small company run by a father and son team. Hardly "The Man". Lowell and Graeme Sweet -- you rock the block and treat the bytes right. ;)



2009-10-22

An Infinite Stream Of Bytes

No, I'm not about to wax poetic about the deep ontological issues raised in The Matrix, or speak meaningfully about how transient the modern world of communication is, and how the artifacts of our lifetime have become so ephemeral that our posterity will not be able to remember us, even if they wanted to.

Instead I'm going to post a code snippet that solves an annoying little scenario that comes up every now and again when writing parsers.

Basically, it goes like this:

You're writing a parser, and you need to check every byte in a stream of bytes coming from a file, the network, etc. You might need to read forward or backward a little, to match a multi-byte pattern, or a value within n bytes of another value. You figure that instead of "peeking and seeking" against the stream (what if it's read-only!?), your parser can just store the state itself and still only look at a single byte at a time. That's great and all, and you do a quick implementation using stream.ReadByte, which seems to work...

Except it's slow. You know from experience that block reads are way faster, so you want to read a block of, say, 1k or 4k from your stream, parse that, fetch another block, parse that, and so on... but what if your pattern straddles two blocks? What if the first byte of a two-byte sequence is the last byte of one block, and the second byte is the first byte of the next? Now your parser needs to stop what it's doing, exit the loop, go grab more data, and then resume its iteration. You could build all that behaviour into your parser (for every parser you write), but it's non-trivial to deal with. In fact, it's a real pain in the butt to refactor a parser to work that way.

Also, you think to yourself, "Man... it would be SOOOOooooo much nicer if I could just write a foreach loop and get every byte in the stream in one big, long iteration... Why doesn't System.IO.Stream implement IEnumerable?!?" It totally makes sense that it should...

Anyhow, story's over. Here's the code to solve it:


public static IEnumerable<byte> GetBytesFromStream(Stream stream)
{
    const int blockSize = 1024;

    byte[] buffer = new byte[blockSize];
    int bytesRead;

    // Read a block at a time until Read() returns 0 (end of stream).
    while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
    {
        // Only yield the bytes actually read; a partial read leaves
        // stale data from the previous pass at the end of the buffer.
        for (int i = 0; i < bytesRead; i++)
        {
            yield return buffer[i];
        }
    }
}


And in case it's not obvious, I'll explain what this little guy does. It does a block read from the stream (adjust the block size to suit, or make it a parameter), iterates over the block, and uses the yield keyword to return bytes via the IEnumerable<T> interface. The while loop checks the return value of stream.Read(); when it returns zero, the stream is done (EOF). If there was a partial read (i.e. fewer bytes than your buffer holds), bytesRead will be the amount that DID successfully read, so the for loop iterating over the block uses bytesRead to ensure we only return valid data. (If we had used buffer.Length or blockSize and had a partial read, everything after the new data would be leftover bytes from the previous read. NOT COOL!)
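To show the payoff, here's a quick sketch of a call site. The file name and the two-byte pattern are made up for illustration; the point is that a match straddling two blocks is a non-issue, because the iterator hides the blocks entirely:


using (var stream = File.OpenRead("capture.bin")) // hypothetical input file
{
    byte previous = 0;
    long offset = 0;

    // Scan for the two-byte sequence 0xCA 0xFE, even across block boundaries.
    foreach (byte current in GetBytesFromStream(stream))
    {
        if (previous == 0xCA && current == 0xFE)
            Console.WriteLine("Pattern ends at offset {0}", offset);

        previous = current;
        offset++;
    }
}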

You could stick this method in your utility class if you'd like, or make a wrapper class that wraps Stream and implements IEnumerable<byte>... whatever you want. Maybe you want to be all modern and cool and make it an extension method for Stream.

Here's an example wrapper class:


using System.Collections;
using System.Collections.Generic;
using System.IO;

public class EnumerableStream : Stream, IEnumerable<byte>
{
    private readonly Stream _baseStream;

    public EnumerableStream(Stream stream)
    {
        _baseStream = stream;
    }

    public IEnumerator<byte> GetEnumerator()
    {
        var bytes = GetBytesFromStream(_baseStream);
        return bytes.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }

    private static IEnumerable<byte> GetBytesFromStream(Stream stream)
    {
        const int blockSize = 1024;

        byte[] buffer = new byte[blockSize];
        int bytesRead;

        while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < bytesRead; i++)
            {
                yield return buffer[i];
            }
        }
    }

    // Everything below simply passes through to the wrapped stream.
    public override bool CanRead
    {
        get { return _baseStream.CanRead; }
    }

    public override bool CanSeek
    {
        get { return _baseStream.CanSeek; }
    }

    public override bool CanWrite
    {
        get { return _baseStream.CanWrite; }
    }

    public override void Flush()
    {
        _baseStream.Flush();
    }

    public override long Length
    {
        get { return _baseStream.Length; }
    }

    public override long Position
    {
        get { return _baseStream.Position; }
        set { _baseStream.Position = value; }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        return _baseStream.Read(buffer, offset, count);
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        return _baseStream.Seek(offset, origin);
    }

    public override void SetLength(long value)
    {
        _baseStream.SetLength(value);
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        _baseStream.Write(buffer, offset, count);
    }
}


And an example of the extension method way...


using System.Collections.Generic;
using System.IO;

public static class StreamExtensions
{
    public static IEnumerable<byte> GetBytes(this Stream stream)
    {
        const int blockSize = 1024;

        byte[] buffer = new byte[blockSize];
        int bytesRead;

        while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < bytesRead; i++)
            {
                yield return buffer[i];
            }
        }
    }
}
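With the extension method in scope, the call site reads about as naturally as it gets (the file name here is just a placeholder):


using (var stream = File.OpenRead("capture.bin"))
{
    foreach (byte b in stream.GetBytes())
    {
        // ... feed b into your parser's state machine ...
    }
}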


Enjoy!

2007-06-08

IComparable and Egocentrism

Today, on the ride home from work on the MAX train (local light-rail here in Portland, OR), I overheard a girl talking to some young Hispanic men. She was babbling on, in a typically "White American" way, about cultural differences, about how "we're really more alike than we are different," and about how popular media tries to force differences down our cultural throats through advertisements and TV (evil incarnate).

While her stance is in many ways similar to my own thinking, I still felt compelled to consider how I would respond if I were having the conversation with her... It would go something like this...

Why do we put such a fine point on our differences? Why do we go to war over skin colours, eating habits, clothing choices, and other such nonsense? Because human beings are intrinsically scared shitless of sameness. Internally, we must compare everything. We are so bound up in the process of comparison logic, that it permeates our every action. Is this bad or good? Better or worse? Bigger or smaller? Subordinate or superordinate? Our base class is IComparable.

These thoughts consume our lowest-level drives... To be a good person... To get ahead in life... To be comfortable (as opposed to NOT comfortable, since any degree of comfortable is better than any degree of uncomfortable). To be powerful... not just powerful, but specifically more powerful than you were before, or more powerful than the other guy.

So we focus on our differences, because through our differences we can find something, ANYTHING to make us special, better, to return 1 on our .CompareTo() call for at least one property.

This got me thinking about the implementation of IComparable in .NET/C#. Isn't it quite egocentric? To presume that the scope of knowledge within a single object type is sufficient to allow it to be compared to any other type? To consider that I know how to compare myself to any other thing, even if I don't know what that thing is? That notion is quite absurd. What I find interesting about the implementation is that .CompareTo() takes an untyped object as a parameter. Doesn't it follow that an object of a given type should only be able to compare itself to something else of the same type? That in order to compare itself to an object of some other type, it must at least be convertible to that type first, so that it can be compared on equal terms?

There's a lot of discussion about that implementation. It could be argued that it's valid, but nonetheless, it's completely egocentric. How do you resolve a scenario where foo.CompareTo(bar) and bar.CompareTo(foo) both return 1? Which one sorts higher in a call to List<T>.Sort()? Or do they simply never change position relative to one another? First come, first served?
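To make the paradox concrete, here's a contrived sketch; the Braggart class is invented for illustration. Both instances claim victory, so any sort that trusts them produces an arbitrary order:


using System;
using System.Collections.Generic;

// Hypothetical type: every instance insists it is greater than any other.
class Braggart : IComparable
{
    public string Name { get; set; }

    public int CompareTo(object obj)
    {
        return 1; // "I am always the greater one."
    }
}

class Program
{
    static void Main()
    {
        var foo = new Braggart { Name = "foo" };
        var bar = new Braggart { Name = "bar" };

        Console.WriteLine(foo.CompareTo(bar)); // 1
        Console.WriteLine(bar.CompareTo(foo)); // 1 -- mutually "greater"

        // The comparison is not antisymmetric, so the resulting order
        // depends on the sort algorithm's internals, not on the data.
        var list = new List<Braggart> { foo, bar };
        list.Sort((a, b) => a.CompareTo(b));
        Console.WriteLine(list[0].Name);
    }
}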

What if IComparable worked differently? I envision it this way... Imagine a static object called System.Judge. System.Judge has a method .Compare which takes any two objects that implement IComparable. The interface requires each object to maintain a property .CompareValues, containing a list of all the values it is willing to offer up during comparison, organized by Type, Name, and Value. The Judge accesses foo.CompareValues.Types to get a list of the types each object is willing to be compared against, querying both objects until it finds a set of compatible types to start the comparison with. For every matching comparable type, a comparison result is obtained, then an average of the comparisons is computed, and the object with the highest average comparison success is declared the victor. The comparisons would naturally be nested calls across the various IComparable types presented, until finally a value type with a fixed, built-in comparison method is reached, which stops the nesting.
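Purely as a thought experiment, here's a rough sketch of what such a Judge might look like. Everything in it (the Judge class, the ICompareOffering interface, all the names) is invented, not part of the framework:


using System;
using System.Collections.Generic;
using System.Linq;

// Invented interface: objects publish the values they'll submit for judgment.
public interface ICompareOffering
{
    // Values on offer, keyed by type, then by property name.
    IDictionary<Type, IDictionary<string, IComparable>> CompareValues { get; }
}

public static class Judge
{
    // Compare two offerings over their mutually comparable types and
    // return the average outcome: > 0 means left "wins" on average.
    public static double Compare(ICompareOffering left, ICompareOffering right)
    {
        var sharedTypes = left.CompareValues.Keys
            .Intersect(right.CompareValues.Keys);

        var results = new List<int>();

        foreach (Type type in sharedTypes)
        {
            var leftValues = left.CompareValues[type];
            var rightValues = right.CompareValues[type];

            // Only judge properties both sides offer under the same name.
            foreach (string name in leftValues.Keys.Intersect(rightValues.Keys))
            {
                results.Add(leftValues[name].CompareTo(rightValues[name]));
            }
        }

        // No common ground at all: the Judge abstains.
        return results.Count == 0 ? 0 : results.Average();
    }
}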

This system would of course be more complicated, and require a lot more processing for each call, resulting in much slower performance... Ah, but the logic would be sound, and that, my friends, is much more valuable than processing time.

Good night.