Wednesday 15 July 2009

Working with Windows - Converting local paths to UNC format

This week I'm posting a solution to a problem I couldn't find an answer to on the internet: converting a local folder path to its UNC equivalent. The problem arose from a system that uses a central, remotely controlled platform (a server) to export data. If a user wants something from the system, he or she provides a path telling it where to put that something. So far so good, but if the user gives a rooted local path of the form "C:\..." the requested item gets saved to the server's own local file system, which is a high-security zone. Not good.



So, on to the solution: a way to get a folder path from the user that can only be a fully qualified UNC path. As with most problems, we start by looking around for technologies that can make our job easier; chances are somebody out there has done this, or something very similar, before and been kind enough to post it on the internet. A quick inspection of the FolderBrowserDialog reveals that it can be initialised so that it shows the user his or her My Network Places, thus forcing the user to select a network path. Sounds good, but in practice we find that if the user selects a shared folder in My Network Places that points to a folder on his or her own local system, the FolderBrowserDialog automatically converts it to a local file path. So close, but so tragically flawed (for our purposes)!



Scanning the documentation for the FolderBrowserDialog doesn't reveal a way to stop it from being too damned clever, so we begin the hunt for a way to change the local file path back into a UNC path. We already know that the path points to an existing network resource, so we don't need to do any validation; we just need to convert it. After a little digging on pinvoke.net we discover a function in the netapi32 library, NetShareEnum, that returns information on all of the shared folders on a machine, including each share's name on the network and the local folder it points to. Someone has even been kind enough to post a C# wrapper for it: a GetNetShares class with an EnumNetShares method. So now we can build up a picture of the required workflow:

  1. Show the user the My Network Places FolderBrowserDialog to get a UNC folder path.
  2. If the path returned by the FolderBrowserDialog already has the format '\\MachineName\...', finish here.
  3. Call NetShareEnum (via the EnumNetShares wrapper) to retrieve information on all the shares on the local machine.
  4. Find a share whose local path matches the beginning of the path picked by the user.
  5. Build the UNC path from the local machine name, the share name, and the remainder of the path picked by the user.
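Steps 2, 4 and 5 are plain string handling, so they can be sketched and tried without calling NetShareEnum at all. The sketch below assumes the share list from step 3 has already been retrieved; `ShareEntry` and `UncPathMapper` are hypothetical names for illustration, not part of any library. It only accepts a match on a folder boundary (so a share on C:\Data does not claim C:\Database) and prefers the most specific matching share.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the share name / local path pair retrieved in step 3.
public class ShareEntry
{
    public string ShareName;
    public string LocalPath;

    public ShareEntry(string shareName, string localPath)
    {
        ShareName = shareName;
        LocalPath = localPath;
    }
}

public static class UncPathMapper
{
    // Step 2: a path that already starts with "\\" is treated as UNC and left alone.
    public static bool IsUnc(string path)
    {
        return path.StartsWith(@"\\", StringComparison.Ordinal);
    }

    // Steps 4 and 5: pick the most specific share containing the path and rebuild it as \\machine\share\rest.
    public static string ToUnc(string path, string machineName, IEnumerable<ShareEntry> shares)
    {
        if (IsUnc(path))
            return path;

        ShareEntry best = null;
        foreach (ShareEntry share in shares)
        {
            if (IsUnderFolder(path, share.LocalPath) &&
                (best == null || share.LocalPath.Length > best.LocalPath.Length))
            {
                best = share;
            }
        }
        if (best == null)
            return path; // No share covers this path, so fall back to the original.

        // Strip the share's local folder and any separators left at either end.
        string rest = path.Substring(best.LocalPath.Length).Trim('\\');
        string unc = @"\\" + machineName + @"\" + best.ShareName;
        return rest.Length == 0 ? unc : unc + @"\" + rest;
    }

    // True if path equals folder or lies inside it; shares with no local path (e.g. IPC$) never match.
    private static bool IsUnderFolder(string path, string folder)
    {
        if (string.IsNullOrEmpty(folder) ||
            !path.StartsWith(folder, StringComparison.OrdinalIgnoreCase))
            return false;
        return path.Length == folder.Length
            || folder[folder.Length - 1] == '\\'
            || path[folder.Length] == '\\';
    }
}
```

Given shares C$ on C:\ and Data on C:\Data, the path C:\Data\Reports maps to \\PC01\Data\Reports on a machine called PC01, while C:\Database falls through to the C$ share.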

Mission complete. For those of you who prefer reading code to reading paragraphs, the code is listed below. I copied the GetNetShares class from pinvoke.net.





using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Runtime.InteropServices;

namespace ConsoleApplication1
{
    class Utilities
    {
        /// <summary>
        /// Takes a local file path and translates it into a UNC file path where possible.
        /// </summary>
        /// <param name="path">Path to convert to UNC.</param>
        /// <returns>If possible UNC path otherwise the original file path.</returns>
        public static string GetUniversalPath(string path)
        {
            string universalPath = path;
            try
            {
                // Normalise the path; DirectoryInfo also throws for malformed paths, which leaves the path unconverted.
                string fullPath = new DirectoryInfo(path).FullName;
                GetNetShares gns = new GetNetShares();
                GetNetShares.ShareInfo bestShare = new GetNetShares.ShareInfo();
                int bestLength = 0;
                foreach (GetNetShares.ShareInfo shareInfo in gns.EnumNetShares("127.0.0.1"))
                {
                    // Prefer the most specific share, so a share on C:\Data wins over the C$ admin share on C:\.
                    if (IsUnderShare(fullPath, shareInfo.Path) && shareInfo.Path.Length > bestLength)
                    {
                        bestShare = shareInfo;
                        bestLength = shareInfo.Path.Length;
                    }
                }
                if (bestLength > 0)
                {
                    string pathRemainder = fullPath.Substring(bestLength);
                    universalPath = BuildPath(String.Concat(@"\\", Environment.MachineName), bestShare.ShareName, pathRemainder);
                }
            }
            catch
            {
                // Any failure means we fall back to the original path.
            }

            return universalPath;
        }

        /// <summary>
        /// True if path is the share's folder or lies inside it (so a share on C:\Data does not claim C:\Database).
        /// </summary>
        private static bool IsUnderShare(string path, string sharePath)
        {
            // Shares such as IPC$ have no local path.
            if (String.IsNullOrEmpty(sharePath) || !path.StartsWith(sharePath, StringComparison.OrdinalIgnoreCase))
                return false;
            return path.Length == sharePath.Length
                || sharePath[sharePath.Length - 1] == '\\'
                || path[sharePath.Length] == '\\';
        }

        /// <summary>
        /// Joins the parts with exactly one backslash between each.
        /// </summary>
        public static string BuildPath(string pathPart1, string pathPart2, string pathPart3)
        {
            StringBuilder pathBuilder = new StringBuilder();
            pathBuilder.Append(pathPart1.TrimEnd('\\'));
            pathBuilder.Append('\\');
            pathBuilder.Append(pathPart2.TrimEnd('\\'));
            // The remainder usually starts with a backslash (e.g. "\Sub" from C:\Shared\Sub), so trim it to avoid "\\".
            string remainder = pathPart3.TrimStart('\\');
            if (remainder.Length > 0)
            {
                pathBuilder.Append('\\');
                pathBuilder.Append(remainder);
            }

            return pathBuilder.ToString();
        }

        public class GetNetShares
        {
            #region External Calls
            [DllImport("Netapi32.dll", SetLastError = true)]
            static extern int NetApiBufferFree(IntPtr Buffer);

            [DllImport("Netapi32.dll", CharSet = CharSet.Unicode)]
            private static extern int NetShareEnum(
                StringBuilder ServerName,
                int level,
                ref IntPtr bufPtr,
                uint prefmaxlen,
                ref int entriesread,
                ref int totalentries,
                ref int resume_handle
            );
            #endregion

            #region External Structures
            // Mirrors the Win32 SHARE_INFO_2 structure returned by NetShareEnum at level 2.
            [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
            public struct ShareInfo
            {
                public string ShareName;
                public uint ShareType;
                public string Remark;
                public uint Permissions;
                public uint MaxUses;
                public uint CurrentUses;
                public string Path;
                public string Password;

                public ShareInfo(string netname, uint type, string remark, uint permissions, uint max_uses, uint current_uses, string path, string password)
                {
                    ShareName = netname;
                    ShareType = type;
                    Remark = remark;
                    Permissions = permissions;
                    MaxUses = max_uses;
                    CurrentUses = current_uses;
                    Path = path;
                    Password = password;
                }
            }
            #endregion

            const uint MAX_PREFERRED_LENGTH = 0xFFFFFFFF;
            const int SuccessCode = 0;

            private enum NetErrorResults : uint
            {
                Success = 0,
                BASE = 2100,
                UnknownDevDir = (BASE + 16),
                DuplicateShare = (BASE + 18),
                BufTooSmall = (BASE + 23),
            }

            private enum SHARE_TYPE : uint
            {
                DiskTree = 0,
                PrintQ = 1,
                Device = 2,
                IPC = 3,
                Special = 0x80000000,
            }

            public List<ShareInfo> EnumNetShares(string Server)
            {
                List<ShareInfo> ShareInfos = new List<ShareInfo>();
                int entriesread = 0;
                int totalentries = 0;
                int resume_handle = 0;
                int nStructSize = Marshal.SizeOf(typeof(ShareInfo));
                IntPtr bufPtr = IntPtr.Zero;
                StringBuilder server = new StringBuilder(Server);
                // Level 2 is used because, unlike levels 0 and 1, it includes each share's local path.
                // It may need administrative rights; if the call fails an empty list is returned.
                int ret = NetShareEnum(server, 2, ref bufPtr, MAX_PREFERRED_LENGTH, ref entriesread, ref totalentries, ref resume_handle);
                try
                {
                    if (ret == SuccessCode)
                    {
                        IntPtr currentPtr = bufPtr;
                        for (int i = 0; i < entriesread; i++)
                        {
                            ShareInfo shi1 = (ShareInfo)Marshal.PtrToStructure(currentPtr, typeof(ShareInfo));
                            ShareInfos.Add(shi1);
                            // ToInt64 keeps the pointer arithmetic correct in a 64-bit process.
                            currentPtr = new IntPtr(currentPtr.ToInt64() + nStructSize);
                        }
                    }
                }
                finally
                {
                    // The API allocates the buffer, so release it whether or not the call succeeded.
                    if (bufPtr != IntPtr.Zero)
                        NetApiBufferFree(bufPtr);
                }
                return ShareInfos;
            }
        }
    }
}

Thursday 9 July 2009

Forays into Linq - Querying a text file.

Hi! I'm Phil and this is my first ever programming-themed blog post. I have been a software developer for around 7 years and am currently working in .NET 3.5. I intend to post about programming matters that I find interesting as I encounter them. Hopefully the result will be as enjoyable for you as it is for me!

To start off, I'd like to blog about a program I wrote recently. It was interesting to me because I was able to leverage a couple of existing technologies to write something quite concise.

We had a problem with some "corrupted" text files - some MS Office documents had been converted to text files and apparently contained both the body of the text and the accompanying meta data. What was needed was a text file that contained just the body of the text.

A cursory look over some of the files showed that the meta data was always 72 characters wide and consisted of alphanumeric characters and the symbols '/' and '+'. That makes it a simple matter to write a regular expression that will match a line of meta data:

// regex recognises a row of corrupted data.
Regex corruptedText = new Regex(@"^[a-zA-Z0-9+/]{72}$");
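To make the pattern's behaviour concrete, here is a small check wrapped in a hypothetical `CorruptedLineCheck` class (the name and sample lines are made up for illustration). Because ^ and $ anchor the whole line and the character class contains no space, ordinary prose never matches; a line must be exactly 72 characters from the allowed set.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CorruptedLineCheck
{
    // Same pattern as above: exactly 72 characters drawn from A-Z, a-z, 0-9, '+' and '/'.
    public static readonly Regex CorruptedText = new Regex(@"^[a-zA-Z0-9+/]{72}$");

    public static bool IsMetaData(string line)
    {
        return CorruptedText.IsMatch(line);
    }
}
```

A 72-character run such as `new string('Q', 70) + "+/"` matches, while 71 or 73 characters, or any line containing a space, does not.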
So we have a method for spotting lines of meta data; now we need to iterate over every line of the file and copy all the lines that don't match to a new text file. Or do we? It would be nice if we could just ask the file to give us all the lines of text that don't match the regular expression, wouldn't it? It turns out that using Linq we can do exactly that!

Linq can be used to query any collection that implements the IEnumerable<T> interface. If we had such a collection, with an item for each line of text, we could write a Linq expression that used the regular expression to give us all the lines of text that weren't meta data:

IEnumerable<string> goodlines = textFile.Where(dataLine => !corruptedText.IsMatch(dataLine));

Here textFile represents a collection implementing IEnumerable<string>. goodlines will then hold every dataLine that isn't matched by the regular expression; strictly speaking it is a lazily evaluated query, so the lines are only read and tested when goodlines is enumerated. Now that we have a complete method for picking out the data we want from a collection, all that's left is to get the text from the file into said collection. It turns out that this is also almost trivially easy:

/// <summary>
/// File reader that enumerates every line of text in a file.
/// </summary>
class StreamReaderEx : StreamReader, IEnumerable<string>
{
    public StreamReaderEx(string path)
        : base(path)
    { }

    #region IEnumerable<string> Members
    public IEnumerator<string> GetEnumerator()
    {
        // Each yield return hands one line to the caller; the compiler generates the enumerator class.
        string dataLine;
        while ((dataLine = base.ReadLine()) != null)
            yield return dataLine;
    }
    #endregion

    #region IEnumerable Members
    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        // The non-generic version simply reuses the generic one.
        return GetEnumerator();
    }
    #endregion
}

Let's take a minute to look at what we have here. The StreamReaderEx class inherits from the StreamReader class and implements the IEnumerable<string> interface. Inheriting from StreamReader gives us all the functionality we need to read text files from the file system, and implementing IEnumerable<string> lets Linq in on the action too. We want our Linq expression to query each line of data, so our implementation of IEnumerable<string>.GetEnumerator() does a yield return on each line of text from the file. Using the yield return keywords means that the compiler generates the enumerator class for us in the background. One thing to bear in mind is that the enumerator consumes the underlying stream, so a StreamReaderEx can only be enumerated once.
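An alternative that avoids subclassing StreamReader is an iterator extension method on TextReader, the base class of both StreamReader and StringReader. This is a swapped-in design sketched for comparison, not the code used in this post; `TextReaderExtensions.Lines` and `LineFilter.GoodLines` are hypothetical names. Working against TextReader also means the filter can be tried on an in-memory StringReader instead of a file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class TextReaderExtensions
{
    // Lazily yields one line at a time; the compiler builds the enumerator from this iterator block.
    public static IEnumerable<string> Lines(this TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
            yield return line;
    }
}

public static class LineFilter
{
    // Returns the lines that do not match the meta data pattern.
    public static List<string> GoodLines(TextReader reader, Regex corruptedText)
    {
        return reader.Lines().Where(line => !corruptedText.IsMatch(line)).ToList();
    }
}
```

The same call works unchanged with a StreamReader opened on a file, since any TextReader will do.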

So there we have it! The result is an application that is mostly declarative rather than imperative. This means that rather than writing detailed instructions for every step of the process, we ask for a specific result and let the compiler and the Linq library turn that request into detailed instructions for us (the compiler generates the enumerator and the lambda, and the Where operator supplies the loop). The advantages of being declarative are that the code is faster to write, easier to maintain and more concise, and the library is free to decide how the work is carried out; here, for example, lines are read and filtered one at a time rather than loaded into memory all at once. The disadvantage is that we lose control of the fine details of the execution.
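For comparison, the filtering step might be written both ways as in the sketch below (the `FilterStyles` class and its method names are illustrative only). Both produce the same lines; the difference is whether we spell out the loop and the list management ourselves or simply describe the result we want.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class FilterStyles
{
    // Imperative: we manage the loop, the test and the output list ourselves.
    public static List<string> Imperative(IEnumerable<string> lines, Regex corruptedText)
    {
        List<string> result = new List<string>();
        foreach (string line in lines)
        {
            if (!corruptedText.IsMatch(line))
                result.Add(line);
        }
        return result;
    }

    // Declarative: we state which lines we want; Where produces them lazily as the result is enumerated.
    public static IEnumerable<string> Declarative(IEnumerable<string> lines, Regex corruptedText)
    {
        return lines.Where(line => !corruptedText.IsMatch(line));
    }
}
```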

The full code listing is shown below for the sake of completeness. Unfortunately all the formatting around the code gets messed up by the blog editor; I'm looking into a way of tidying that up.

using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;
using System.Text.RegularExpressions;

namespace CleanTextFiles
{
    class Program
    {
        static void Main(string[] args)
        {
            // regex recognises a row of corrupted data.
            Regex corruptedText = new Regex(@"^[a-zA-Z0-9+/]{72}$");
            foreach (string corruptedFilePath in Directory.GetFiles(@"D:\Data\"))
            {
                // Write the cleaned copy to a "clean" sub folder next to the original file.
                string cleanedFilePath = String.Concat(Path.GetDirectoryName(corruptedFilePath),
                                                       @"\clean\",
                                                       Path.GetFileName(corruptedFilePath));
                Directory.CreateDirectory(Path.GetDirectoryName(cleanedFilePath));
                using (StreamReaderEx reader = new StreamReaderEx(corruptedFilePath))
                using (StreamWriter writer = new StreamWriter(cleanedFilePath))
                {
                    WriteLines(reader, writer, corruptedText);
                }
            }
        }

        /// <summary>
        /// Write every line of text from reader into writer that doesn't match the corruptedText regex pattern.
        /// </summary>
        static void WriteLines(StreamReaderEx reader, StreamWriter writer, Regex corruptedText)
        {
            IEnumerable<string> goodlines = reader.Where(dataLine => !corruptedText.IsMatch(dataLine));
            foreach (string goodLine in goodlines)
                writer.WriteLine(goodLine);
        }
    }

    /// <summary>
    /// File reader that enumerates every line of text in a file.
    /// </summary>
    class StreamReaderEx : StreamReader, IEnumerable<string>
    {
        public StreamReaderEx(string path)
            : base(path)
        { }

        #region IEnumerable<string> Members
        public IEnumerator<string> GetEnumerator()
        {
            string dataLine;
            while ((dataLine = base.ReadLine()) != null)
                yield return dataLine;
        }
        #endregion

        #region IEnumerable Members
        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
        #endregion
    }
}