Sunday, 23 May 2010

Retrospective Learning

This post was inspired by a question I found on Stack Overflow recently:

In the theme of the stackoverflow podcast, here's a fun question: should I learn C? I expect Jeff & Joel will have something to say on this.

Some info on my background:

•Primarily a Java programmer on "enterprisy" systems.
•Favorite languages: python, scheme
•7 years programming experience
•A very small amount of C++ experience, practically no C experience
•No immediate "need" to learn C
So should I learn C? If so, why? If not, why?

My answer was essentially no.

Over the last year or two I have come to the conclusion that technology simply advances too quickly for a single person to learn everything he or she may need ahead of time. But the real question is: do we need to?

Case in point: a little over a year ago I was job hunting and I had to sit a Brainbench* test in C# .net 2.0. I'd just passed the development fundamentals MCP in .net 2.0 and I felt like I had a good handle on the subject, so I fired up the Brainbench website and started the test.

The first question had me reeling; it was so obscure, so specific, and so difficult it nearly blew my mind. I mentally shook myself and decided to skip to the next question. Again, it was so difficult I couldn't begin to answer it. The questions seemed to require the testee to have a very fine-grained knowledge of the .net 2.0 documentation. I'd studied enough to pass an MCP, but there's no way I could remember every esoteric detail of the entire framework...

Now I'm in a bit of a panic; I'm job hunting and I don't want a poor score in this test to destroy my reputation. I mean I've already been developing for a few years with mostly positive feedback from my peers - I like to think I'm pretty good at what I do!

What are the rules for Brainbench tests?
Unlike many standardized tests, individuals are encouraged to use reference materials [...]

The test rules state that references can be used, but the test must be completed by the testee alone. There are 3 minutes per question and the clock is rapidly counting down on impossible question #3. I didn't imagine I'd have to, but I fire up the MSDN website and do a quick search... Success! I select the correct answer from the options and move on to impossible question #4. Again MSDN holds the key to victory.

So I traverse the sequence of 26 remaining impossible questions and manage to find an answer to all of them. In the end I achieve a pretty decent score. My transcript number is 7736779 if you're interested in taking a look (brainbench.com).

Which brings me back to my original question: Is there a difference between knowing something and being able to find out in less than 3 minutes? Or, if I can find the correct answer to a seemingly impossible question in less than 3 minutes is that functionally equivalent to knowing the answer? Is there any point in forcing ourselves to study?

Or am I just lazy?

* In my opinion Brainbench tests are artificially difficult and utterly fail to measure a person's ability to develop software.

Saturday, 22 August 2009

The 7 Essential Visual Studio Tools

Visual Studio 2008 is the best of the development environments that I've had the pleasure of working with so far. For all that people enjoy putting them down, in my opinion the developers at Microsoft have delivered an excellent product that is both fast and easy to use! (Which isn't to say that it can't be improved - argument vs format string validation is sweet!) This week I'm posting about some of the useful goodies that come with using Visual Studio - things that you technically don't need to know to get the job done but are very handy anyway. I'm listing them in the order that I personally use most frequently:
  1. Find all references. Lists everywhere that a method or variable is used, including the definition. This is really useful when you're trying to work out where a variable is being assigned, or cleared. Right click a method or variable and select Find all references.
  2. Extract method. You've written a bunch of code that gets the job done but it's all inline. It looks horrible, and it's not reusable. Highlight all the code you'd like to break out into a new method, right click and select Extract method. Provide a method name and Visual Studio will create a method for you and even specify the parameters and return value!
  3. The Exceptions window. Press alt + ctrl + e to show this window. Not sure where exactly an exception is being thrown from? Tick the Thrown box next to Common Language Runtime Exceptions in this window and then execute the code; the debugger will break on the line that throws.
  4. Call stack window. While debugging, lets you see at a glance the path of method calls that have been made to get you to your current point in the code. Double clicking a method in this window takes you to where the method was called, and lets you see the state of all the local variables. Really handy for debugging. Access through the Debug > Windows menu.
  5. Threads window. While debugging, lets you see how many threads you have executing. Double clicking a thread in this window takes you to its current point of execution in the code. Also accessed through the Debug > Windows menu.
  6. Automatic unit testing. Right click in your code and select Create unit tests... This brings up a dialog that lets you pick which methods to write unit tests for. Visual Studio will then generate a basic unit test for every method you select.
  7. Record/play macro. So you need to perform a repetitive typing operation on your code. Hit shift + ctrl + r to start recording a macro, perform the operation, and hit shift + ctrl + r again. Now every time you press shift + ctrl + p Visual Studio will perform your typing operation again!

That's probably enough to get a fresh-faced youngster up to speed. I'm sure any old hands out there have plenty more to add to this list.

Monday, 10 August 2009

Forays into Linq #2 - Effective use of Extension Methods

Today I'm talking a little bit about extension methods - where they come from, how to implement them and what I feel they are good for.

Extension methods were added to the .net framework to streamline native support for Linq. As you probably already know, Linq methods can be used on any object implementing the IEnumerable or IQueryable interfaces. What you may not have noticed is that the IEnumerable and IQueryable interfaces don't know anything about Linq. If you think about it, the easiest way to make objects that implement these interfaces accessible to Linq would have been to simply add method definitions to them. Then when a person creates a class that implements one or more of these interfaces they would simply create all the methods defined in the interface. Ctrl + '.' makes that a snap, no? Well it would if there weren't more Linq methods than you can shake a stick at. So, the clever people at Microsoft decided it would be best to expose the Linq methods in an entirely new way.

So, now that we have a new way of creating functionality how do we actually do it? Well, the syntax is simple if not entirely obvious in what it is actually doing. (Personally I feel that .net 3.5 has introduced a lot of ugliness into C# all in the name of Linq; one of the reasons I left C++ for C# was that there were fewer symbols cluttering up the code. Referencing and dereferencing pointers, accessing members and methods of an object through a pointer, and all that didn't make C++ particularly accessible. But I digress.) Let's say we want to add a method to the generic List class. I think it would be quite nice if we had an IsNullOrEmpty method like the static String method of the same name:


List<string> myListOfStrings = null;
if (myListOfStrings.IsNullOrEmpty())
    return;

To do this we must first create a public, static class that houses our extension methods - the name of the class is unimportant to the compiler but we should give it a good name so we know what's going on when we come back to it in 6 months' time. Then we add a public, static method where the first parameter uses the keyword 'this':


public static class MyListExtensions
{
    public static bool IsNullOrEmpty<T>(this List<T> thisList)
    {
        if (thisList == null || thisList.Count == 0)
            return true;
        return false;
    }
}

A quick explanation of the method footprint: the first parameter is the instance of the type you are extending. The 'this' keyword tells Visual Studio and the compiler that the method is an extension method for the type List<T>. Next time you call up intellisense for a generic List object it will include this method with the (extension) tag. You can define as many parameters as you want after the 'this' parameter and have any return type. Interestingly, the IsNullOrEmpty method on the String class is static because normally if you try to call a method, or access a property, on a null object you will get a NullReferenceException. Our extension method has no such problem: the compiler turns myListOfStrings.IsNullOrEmpty() into an ordinary static call with the list passed as the first argument, so it runs happily even when the list is null. This demonstrates that although extension methods look a lot like instance methods, they are fundamentally very different.
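To make that difference concrete, here is a minimal sketch. The NullCallDemo class and its method names are made up for illustration; only MyListExtensions comes from the post. Both methods do the same thing, the first just uses the extension syntax:

```csharp
using System.Collections.Generic;

public static class MyListExtensions
{
    public static bool IsNullOrEmpty<T>(this List<T> thisList)
    {
        return thisList == null || thisList.Count == 0;
    }
}

// Hypothetical demo class, not part of the original post.
public static class NullCallDemo
{
    public static bool ViaExtensionSyntax()
    {
        List<string> nothing = null;
        // Looks like an instance call, but it is compiled as the static call
        // used in ViaStaticSyntax, so no NullReferenceException is thrown.
        return nothing.IsNullOrEmpty();
    }

    public static bool ViaStaticSyntax()
    {
        List<string> nothing = null;
        return MyListExtensions.IsNullOrEmpty(nothing);
    }
}
```

Calling either method returns true; a real instance method invoked on the same null reference would throw instead.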

Some things to keep in mind when writing extension methods:

  1. Extension methods with the same footprint as existing instance methods will not get called. Try extending the Object class with a parameterless method called ToString. If you call ToString() on any object your extension method won't be called - the existing method will be.
  2. Create at least one separate namespace for your extension methods. It's probably worth subdividing your extensions namespace by type/functionality. That way you and your colleagues will only have to browse through the extension methods you are explicitly interested in while coding.
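Both points can be sketched in a few lines. The namespace, class, and method names below are hypothetical and chosen only for illustration:

```csharp
using MyCompany.Extensions.Text;   // hypothetical namespace holding only our text-related extensions

namespace MyCompany.Extensions.Text
{
    public static class TextExtensions
    {
        // Same footprint as the instance method ToString(): the compiler always
        // prefers the instance method, so this is never picked via "abc".ToString().
        public static string ToString(this object value)
        {
            return "from the extension method";
        }

        // No instance method called Describe exists, so this one is used.
        public static string Describe(this object value)
        {
            return "from the extension method";
        }
    }
}

// Hypothetical demo class.
public static class PrecedenceDemo
{
    public static string CallToString()
    {
        return "abc".ToString();   // runs String.ToString()
    }

    public static string CallDescribe()
    {
        return "abc".Describe();   // runs TextExtensions.Describe
    }
}
```

Remove the using directive at the top and Describe disappears from intellisense again, which is the whole point of giving extensions their own namespaces.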

Personally I think extension methods are very useful in making code more readable. For instance, when we use events one of the things we always have to do is check for subscribers, i.e.:


if (MyEvent != null)
    MyEvent(this, new EventArgs());

It's a pattern that is used everywhere but it really is ugly and not particularly human-readable. Now though, we can create an extension method on the EventHandler class and then write code that would perhaps look more like this:


if (MyEvent.HasSubscribers())
    MyEvent(this, new EventArgs());

Much better!
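For completeness, a minimal sketch of what such a HasSubscribers method might look like. The EventHandlerExtensions and Publisher classes are assumptions made up for this example; the post only names the method:

```csharp
using System;

public static class EventHandlerExtensions
{
    // Hypothetical helper: an event with no subscribers is simply a null delegate.
    public static bool HasSubscribers(this EventHandler handler)
    {
        return handler != null;
    }
}

// Hypothetical class that raises an event using the helper.
public class Publisher
{
    public event EventHandler MyEvent;

    public void Raise()
    {
        if (MyEvent.HasSubscribers())
            MyEvent(this, new EventArgs());
    }
}
```

This only covers the non-generic EventHandler; events declared as EventHandler<TEventArgs> would need their own overload. It also shares the weakness of the original null check: a subscriber can be removed on another thread between the test and the call.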

Wednesday, 15 July 2009

Working with Windows - Converting local paths to UNC format

This week I'm posting a solution to a problem that I couldn't find an answer to on the internet, which is to say converting a local folder path to its UNC equivalent. The problem was born from a system that used a central platform (server) for exporting data that is controlled remotely. So if a user wants something from the system he or she provides a path that tells it where to put that something. So far so good, but if the user gives a rooted path of the format "C:\..." the requested item gets saved to the server's local file system, which is a high security zone. Not good.



So onto the solution - a way to get a folder path from the user that can only be a fully qualified UNC path. As with most problems we start by looking around to see if there are any technologies that can make our job easier. The chances are somebody out there has done this, or something very similar, before and been kind enough to post it on the internet. A quick inspection of the FolderBrowserDialog reveals it can be initialised in such a way as to give the user a view of his or her My Network Places, thus forcing the user to select a network path. Sounds good, but in practice we find that if the user selects a shared folder from My Network Places that points to a folder on his or her own local system the FolderBrowserDialog will automatically convert it to a local file path. So close, but so tragically flawed (for our purposes)!



Scanning the documentation for the FolderBrowserDialog doesn't reveal a way to stop it from being too damned clever, so we begin the hunt for a way to change the local file path back into a UNC file path. We already know that the path points to an existing network resource so we don't need to do any validation, we just need to run a conversion on it. After a little digging at pinvoke we discover a function in the netapi32 library that will give us information on all of the shared folders on the local system, including the name of the shared resource on the network. The function we're interested in is called NetShareEnum and someone has even been kind enough to post a C# wrapper for it, EnumNetShares. So, now we can build up a picture of the required workflow:

  1. Show the user the My Network Places FolderBrowserDialog to get a UNC folder path.
  2. If the file path provided by the FolderBrowserDialog has the format '\\MachineName\...' finish here.
  3. Use the netapi32 NetShareEnum function (through the EnumNetShares wrapper) to retrieve information on all the shares on the local machine.
  4. Find a share that has a local path that matches the beginning of the path picked by the user.
  5. Build up a UNC path from the local machine name, the name of the share, and the part of the user's path that follows the share's local path.
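Steps 2, 4 and 5 are plain string handling, so they can be sketched without touching the network at all. The UncPathSketch class, its method names, and the sample machine and share names are assumptions for illustration only:

```csharp
using System;

// Hypothetical helper class covering the string-handling steps of the workflow.
public static class UncPathSketch
{
    // Step 2: a path that already starts with \\ is treated as UNC.
    public static bool IsUncPath(string path)
    {
        return path != null && path.StartsWith(@"\\", StringComparison.Ordinal);
    }

    // Steps 4 and 5: if localPath lives under sharePath, swap the share's local
    // path for \\machineName\shareName; otherwise return null.
    public static string ToUncPath(string machineName, string shareName, string sharePath, string localPath)
    {
        if (!localPath.StartsWith(sharePath, StringComparison.OrdinalIgnoreCase))
            return null;

        // Drop the separator left behind by the split so it isn't doubled up.
        string remainder = localPath.Substring(sharePath.Length).TrimStart('\\');
        string unc = @"\\" + machineName + @"\" + shareName;
        return remainder.Length == 0 ? unc : unc + @"\" + remainder;
    }
}
```

For example, ToUncPath("PC01", "Exports", @"C:\Data\Exports", @"C:\Data\Exports\2009\July") gives \\PC01\Exports\2009\July. Note that the simple prefix test would also accept a sibling folder such as C:\Data\ExportsOld, so a stricter check may be wanted in real code.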

Mission complete. For those of you who prefer to read code rather than paragraphs, a code snippet is listed below. I copied the GetNetShares class from the pinvoke site.





using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Runtime.InteropServices;

namespace ConsoleApplication1
{
    class Utilities
    {
        /// <summary>
        /// Takes a local file path and translates it into a UNC file path where possible.
        /// </summary>
        /// <param name="path">Path to convert to UNC.</param>
        /// <returns>If possible UNC path otherwise the original file path.</returns>
        public static string GetUniversalPath(string path)
        {
            try
            {
                // Constructing a DirectoryInfo throws on a malformed path; the catch below then returns it unchanged.
                DirectoryInfo di = new DirectoryInfo(path);
                GetNetShares gns = new GetNetShares();
                foreach (GetNetShares.ShareInfo shareInfo in gns.EnumNetShares("127.0.0.1"))
                {
                    // Shares without a local path can never match.
                    if (!String.IsNullOrEmpty(shareInfo.Path) && path.StartsWith(shareInfo.Path, StringComparison.OrdinalIgnoreCase))
                    {
                        // Trim the separator left at the start of the remainder so BuildPath doesn't double it.
                        string pathRemainder = path.Substring(shareInfo.Path.Length).TrimStart('\\');
                        return BuildPath(String.Concat(@"\\", Environment.MachineName), shareInfo.ShareName, pathRemainder);
                    }
                }
            }
            catch
            {
                // Best effort only: on any failure fall through and return the original path.
            }

            return path;
        }
        public static string BuildPath(string pathPart1, string pathPart2, string pathPart3)
        {
            StringBuilder pathBuilder = new StringBuilder();
            pathBuilder.Append(pathPart1);
            if (pathPart1[pathPart1.Length - 1] != '\\')
                pathBuilder.Append("\\");
            pathBuilder.Append(pathPart2);
            if (pathPart2[pathPart2.Length - 1] != '\\')
                pathBuilder.Append("\\");
            pathBuilder.Append(pathPart3);

            return pathBuilder.ToString();
        }
        public class GetNetShares
        {
            #region External Calls
            [DllImport("Netapi32.dll", SetLastError = true)]
            static extern int NetApiBufferFree(IntPtr Buffer);
            [DllImport("Netapi32.dll", CharSet = CharSet.Unicode)]
            private static extern int NetShareEnum(
                StringBuilder ServerName,
                int level,
                ref IntPtr bufPtr,
                uint prefmaxlen,
                ref int entriesread,
                ref int totalentries,
                ref int resume_handle
                );
            #endregion
            #region External Structures
            [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
            public struct ShareInfo
            {
                public string ShareName;
                public uint ShareType;
                public string Remark;
                public uint Permissions;
                public uint MaxUses;
                public uint CurrentUses;
                public string Path;
                public string Password;

                public ShareInfo(string netname, uint type, string remark, uint permissions, uint max_uses, uint current_uses, string path, string password)
                {
                    ShareName = netname;
                    ShareType = type;
                    Remark = remark;
                    Permissions = permissions;
                    MaxUses = max_uses;
                    CurrentUses = current_uses;
                    Path = path;
                    Password = password;
                }
            }
            #endregion
            const uint MAX_PREFERRED_LENGTH = 0xFFFFFFFF;
            const int SuccessCode = 0;
            private enum NetErrorResults : uint
            {
                Success = 0,
                BASE = 2100,
                UnknownDevDir = (BASE + 16),
                DuplicateShare = (BASE + 18),
                BufTooSmall = (BASE + 23),
            }
            private enum SHARE_TYPE : uint
            {
                DiskTree = 0,
                PrintQ = 1,
                Device = 2,
                IPC = 3,
                Special = 0x80000000,
            }
            public List<ShareInfo> EnumNetShares(string Server)
            {
                List<ShareInfo> ShareInfos = new List<ShareInfo>();
                int entriesread = 0;
                int totalentries = 0;
                int resume_handle = 0;
                int nStructSize = Marshal.SizeOf(typeof(ShareInfo));
                IntPtr bufPtr = IntPtr.Zero;
                StringBuilder server = new StringBuilder(Server);
                // Level 2 asks for the structure that includes each share's local path.
                int ret = NetShareEnum(server, 2, ref bufPtr, MAX_PREFERRED_LENGTH, ref entriesread, ref totalentries, ref resume_handle);
                if (ret == SuccessCode)
                {
                    IntPtr currentPtr = bufPtr;
                    for (int i = 0; i < entriesread; i++)
                    {
                        ShareInfo shi1 = (ShareInfo)Marshal.PtrToStructure(currentPtr, typeof(ShareInfo));
                        ShareInfos.Add(shi1);
                        // ToInt64 keeps the pointer arithmetic valid in a 64-bit process; ToInt32 can overflow there.
                        currentPtr = new IntPtr(currentPtr.ToInt64() + nStructSize);
                    }
                    NetApiBufferFree(bufPtr);
                    return ShareInfos;
                }
                else
                {
                    return new List<ShareInfo>();
                }
            }
        }
    }
}

Thursday, 9 July 2009

Forays into Linq - Querying a text file.

Hi! I'm Phil and this is my first ever programming themed blog post. I am a software developer of around 7 years currently working in .net 3.5. It is my intention to post about programming matters that I find interesting as I encounter them. Hopefully the result will be as enjoyable to you as it is to me!

To start off I'd like to blog about a program I wrote recently. It was interesting to me as I was able to leverage a couple of existing technologies to write something quite concise.

We had a problem with some "corrupted" text files - some MS Office documents had been converted to text files and apparently contained both the body of the text and the accompanying meta data. What was needed was a text file that contained just the body of the text.

A cursory look over some of the files showed that the meta data was always 72 characters wide and consisted of alphanumeric characters and the symbols '/' and '+'. In that case it is a simple matter to write a Regular Expression that will match a line of meta data:

// regex recognises a row of corrupted data.
Regex corruptedText = new Regex(@"^[a-zA-Z0-9+/]{72}$");
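As a quick sanity check, the pattern can be wrapped in a helper and tried on a few made-up lines. The CorruptedLineSketch class and its IsCorrupted method are hypothetical names used only for this sketch:

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the regex from the post.
public static class CorruptedLineSketch
{
    // Exactly 72 characters drawn from letters, digits, '+' and '/'.
    private static readonly Regex corruptedText = new Regex(@"^[a-zA-Z0-9+/]{72}$");

    public static bool IsCorrupted(string line)
    {
        return corruptedText.IsMatch(line);
    }
}
```

A line such as new string('A', 72) is reported as corrupted, while a 71 character run, or an ordinary sentence containing spaces, is not.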
So, we have a method for spotting lines of meta data and now what we need to do is iterate over every line of the file and copy all the lines that don't match to a new text file. Or do we? It would be nice if we could just ask the file to give us all the lines of text that don't match the regular expression, wouldn't it? It turns out that using Linq we can do exactly that!

Linq can be used to query any collection that implements the IEnumerable<> interface. If we had such a collection that had an item for each line of text we could create a Linq expression that used the regular expression to give us all the lines of text that weren't meta data:

IEnumerable<string> goodlines = textFile.Where(dataLine => !corruptedText.IsMatch(dataLine)).Select(dataLine => dataLine);
Here textFile represents a collection implementing IEnumerable<string>. Enumerating goodlines will yield every dataLine that doesn't get matched by the regular expression. Now that we have a complete method for outputting the data we want from a collection, all that's left is to get the text from the file into said collection. It turns out that this is also easy enough to be almost trivial:

/// <summary>
/// File reader that enumerates every line of text in a file.
/// </summary>
class StreamReaderEx : StreamReader, IEnumerable<string>
{
    public StreamReaderEx(string path)
        : base(path)
    { }

    #region IEnumerable<string> Members
    public IEnumerator<string> GetEnumerator()
    {
        string dataLine;
        while ((dataLine = base.ReadLine()) != null)
            yield return dataLine;
    }
    #endregion

    #region IEnumerable Members
    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        // Reuse the generic enumerator rather than duplicating the read loop.
        return GetEnumerator();
    }
    #endregion
}
Let's take a minute to look at what we have here. The StreamReaderEx class inherits from the StreamReader class and implements the IEnumerable<string> interface. Inheriting from the StreamReader class gives us all the functionality we need to read text files from the file system, and implementing the IEnumerable<string> interface lets Linq in on the action too. We want our Linq expression to query each line of data, so our implementation of IEnumerable<string>.GetEnumerator() should do a yield return on each line of text from the text file. Using the yield return keywords means that the compiler will generate the enumerator class for us in the background.
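The same yield return trick works without subclassing anything. The sketch below is an alternative illustration, not the approach used in this post: LineIteratorSketch, ReadLines and LinesContaining are hypothetical names, and a StringReader stands in for a file so the example needs nothing on disk:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper showing an iterator method over any TextReader.
public static class LineIteratorSketch
{
    // The compiler turns this method into an enumerator class; each step of a
    // foreach (or Linq query) runs the loop as far as the next yield return.
    public static IEnumerable<string> ReadLines(TextReader reader)
    {
        string dataLine;
        while ((dataLine = reader.ReadLine()) != null)
            yield return dataLine;
    }

    public static List<string> LinesContaining(string text, string word)
    {
        using (StringReader reader = new StringReader(text))
        {
            // ToList() forces the lazy query to run before the reader is disposed.
            return ReadLines(reader).Where(line => line.Contains(word)).ToList();
        }
    }
}
```

Nothing is read until the query is enumerated, which is why the ToList() call sits inside the using block.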

So there we have it! The result is we have an application that is mostly declarative rather than imperative. This means that rather than write detailed instructions for every step of the process we can just ask for a specific result and the compiler will interpret that request and transform it into detailed instructions for us. The advantages of being declarative rather than imperative are that the code is faster to write, easier to maintain, and more concise, and the compiler can make optimisations as it sees fit. The disadvantage is that we lose control of the fine details of the execution.
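To see the two styles side by side, here is a small sketch that filters short lines both ways. The FilterStyles class and its method names are invented for the comparison:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparison of the two styles.
public static class FilterStyles
{
    // Imperative: spell out the loop, the test, and the bookkeeping.
    public static List<string> ImperativeShortLines(IEnumerable<string> lines, int maxLength)
    {
        List<string> result = new List<string>();
        foreach (string line in lines)
        {
            if (line.Length <= maxLength)
                result.Add(line);
        }
        return result;
    }

    // Declarative: state the condition and let Linq do the iterating.
    public static List<string> DeclarativeShortLines(IEnumerable<string> lines, int maxLength)
    {
        return lines.Where(line => line.Length <= maxLength).ToList();
    }
}
```

Both methods return the same lines in the same order; the second simply leaves the how to the library.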

The full code listing is shown below for the sake of completeness. Unfortunately all the formatting around the code gets messed up by the blog editor; I'm looking into a way of tidying that up.

using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;
using System.Text.RegularExpressions;

namespace CleanTextFiles
{
    class Program
    {
        static void Main(string[] args)
        {
            // regex recognises a row of corrupted data.
            Regex corruptedText = new Regex(@"^[a-zA-Z0-9+/]{72}$");
            foreach (string corruptedFilePath in Directory.GetFiles(@"D:\Data\"))
            {
                string cleanedFilePath = String.Concat(Path.GetDirectoryName(corruptedFilePath),
                    @"\clean\",
                    Path.GetFileName(corruptedFilePath));
                Directory.CreateDirectory(Path.GetDirectoryName(cleanedFilePath));
                using (StreamReaderEx reader = new StreamReaderEx(corruptedFilePath))
                using (StreamWriter writer = new StreamWriter(cleanedFilePath))
                {
                    WriteLines(reader, writer, corruptedText);
                }
            }
        }

        /// <summary>
        /// Write every line of text from reader into writer that doesn't match the corruptedText regex pattern.
        /// </summary>
        static void WriteLines(StreamReaderEx reader, StreamWriter writer, Regex corruptedText)
        {
            IEnumerable<string> goodlines = reader.Where(dataLine => !corruptedText.IsMatch(dataLine)).Select(dataLine => dataLine);
            foreach (string goodLine in goodlines)
                writer.WriteLine(goodLine);
        }
    }
    /// <summary>
    /// File reader that enumerates every line of text in a file.
    /// </summary>
    class StreamReaderEx : StreamReader, IEnumerable<string>
    {
        public StreamReaderEx(string path)
            : base(path)
        { }

        #region IEnumerable<string> Members
        public IEnumerator<string> GetEnumerator()
        {
            string dataLine;
            while ((dataLine = base.ReadLine()) != null)
                yield return dataLine;
        }
        #endregion

        #region IEnumerable Members
        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        {
            // Reuse the generic enumerator rather than duplicating the read loop.
            return GetEnumerator();
        }
        #endregion
    }
}