julian m bucknall >> What Does Configuresoft Do?

What Does Configuresoft Do?

published: Mon, 12-Dec-2005 | updated: Mon, 12-Dec-2005

I've been asked this before: "what does Configuresoft do and where do you fit in?" Usually this goes along with a comment that they've visited the official website, read about Enterprise Configuration Manager (ECM), but are still flummoxed. Well, it's all very simple really, once you strip the jargon away. In essence, ECM is a very good (as in pretty awesome) PC monitoring system. But, rather than continue in this vein, let's approach what we do from a different angle.

Say you're a sys admin in a medium to large company. You are responsible for your corporate network that has, say, 5000 machines plugged in. How do you know that those machines are configured properly? Let's say that part of the company's security policy is that the IP subnet mask on all those machines must be set to 255.255.255.0. How can you make sure? (And note that the security policy may also have a zillion other little details of configuration options that must be set just so.)

Option 1 is to go to each of those machines and check each one by hand. Heh, rather you than me. Let's say it takes 5 minutes to walk to the next PC, turf the user off, log in as admin, and go to Start | Control Panel | Network Connections to check the value(s). Multiply that by 5000 machines and you get 25,000 minutes, or about 417 hours. Factor in a 40 hour week and you're talking more than ten weeks to do the job.

Option 2 is to get an army of interns to help you do it. Given 50 interns, you'd need a bit over a day. Still not exactly quick, though, and you'd have a big lunch expense report at the end of it, not to mention you'd have to change all those admin passwords after the interns left (which would take another ten weeks).

Option 3 is to use some kind of software to interrogate the machines remotely and sit at your desk and wait for the answers to come back. This is the space where ECM fits.

Now the really nifty thing about ECM is that it doesn't do this in real-time. Nope. If you ask the question "is the subnet mask set to blah?" ECM will not go and ask all 5000 machines. Why? And why is that "nifty"? Well, because it has already collected lots of configuration information from all the machines and stored it in a (fantastically) huge database. So it only has to look into the database for the answer.
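To make the "look in the database, not on the network" idea concrete, here's a minimal sketch. The table layout, the column names, and the machine names are all made up for illustration (this is not ECM's actual schema), and SQLite stands in for the real database.

```python
import sqlite3

# Hypothetical schema: one row per machine per setting, as last collected.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE machine_setting ("
    " machine TEXT, setting TEXT, value TEXT, collected_at TEXT)"
)
conn.executemany(
    "INSERT INTO machine_setting VALUES (?, ?, ?, ?)",
    [
        ("PC-0001", "subnet_mask", "255.255.255.0", "2005-12-12T02:00"),
        ("PC-0002", "subnet_mask", "255.255.0.0", "2005-12-12T02:05"),
        ("LAPTOP-17", "subnet_mask", "255.0.0.0", "2005-12-09T18:30"),
    ],
)


def machines_violating(conn, setting, required):
    """Return (machine, value, collected_at) rows whose stored value differs."""
    return conn.execute(
        "SELECT machine, value, collected_at FROM machine_setting"
        " WHERE setting = ? AND value <> ? ORDER BY machine",
        (setting, required),
    ).fetchall()


# No machine is contacted here: the answer comes from stored data only.
violations = machines_violating(conn, "subnet_mask", "255.255.255.0")
for machine, value, collected_at in violations:
    print(f"{machine}: {value} (as of {collected_at})")
```

The collected_at column is an assumption too, but something like it is what lets you judge how old a given answer is.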

Now, sure, the answer may be several hours old at this point, so you can certainly get ECM to ask the machines for their latest values. But the issue now is one of network latency and bandwidth. The central server must send some kind of well-formed request to each of these machines, and they must do the work to find out the value, and then they must reply to the server with that value. Over 5000 machines, that's a lot of network traffic. Suppose it takes 5 seconds to make the round-trip (server asks, workstation works it out, workstation replies, server stores the answer): then it would take nearly 7 hours for the server to get all the "up-to-date" data.
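The 7 hour figure is just the round-trip time multiplied out, one machine after another. Here's that arithmetic as a small sketch. The second function is purely my own assumption for comparison (a fixed number of requests in flight at once); nothing above says how many machines ECM talks to at a time.

```python
MACHINES = 5000
ROUND_TRIP_SECONDS = 5


def sequential_poll_hours(machines, seconds_each):
    """Hours needed if the server asks one machine at a time."""
    return machines * seconds_each / 3600


def batched_poll_hours(machines, seconds_each, in_flight):
    """Hours needed if `in_flight` requests run at once (an assumption)."""
    batches = -(-machines // in_flight)  # ceiling division
    return batches * seconds_each / 3600


one_at_a_time = sequential_poll_hours(MACHINES, ROUND_TRIP_SECONDS)
ten_at_a_time = batched_poll_hours(MACHINES, ROUND_TRIP_SECONDS, 10)
print(f"one at a time: {one_at_a_time:.2f} hours")
print(f"ten at a time: {ten_at_a_time:.2f} hours")
```

Even with some parallelism the network still carries all 5000 requests and replies, which is why answering from data that has already been collected is the cheaper default.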

But let's be even niftier now. Suppose that 500 of those 5000 machines are laptops and that 10% of them (50, in other words) aren't actually plugged into the network at the moment: the sales guys are out and about. With ECM, you can at least still say that a particular laptop isn't configured correctly, even though it's not currently connected. Just wait until that salesman gets back.

So how did ECM get all the data in its database? Well, easy, really, you just set it up so that it collects configuration data at scheduled times. There are lots of options for this data collection (for example, the types of data to collect, only collecting the data that has changed, and so on) to minimize the disruption to network traffic and to people's use of their workstations.
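The "only collecting the data that has changed" option is easy to picture as a diff between two snapshots. This is a minimal sketch of that idea, not how ECM actually does it: the function name and the dictionary-per-machine representation are hypothetical.

```python
_MISSING = object()  # sentinel so that a stored value of None is not confused with "absent"


def changed_settings(previous, current):
    """Return only the settings that differ between two snapshots.

    New or changed settings map to their new value; settings that have
    disappeared since the previous snapshot map to None.
    """
    delta = {}
    for key, value in current.items():
        if previous.get(key, _MISSING) != value:
            delta[key] = value
    for key in previous.keys() - current.keys():
        delta[key] = None
    return delta


previous = {"subnet_mask": "255.255.255.0", "dns": "10.0.0.1", "wins": "10.0.0.9"}
current = {"subnet_mask": "255.255.0.0", "dns": "10.0.0.1", "gateway": "10.0.0.254"}
delta = changed_settings(previous, current)
print(delta)
```

Here only three of the settings would need to cross the network; the unchanged dns entry stays put.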

But that's not all (as they say). There's an option in ECM to go ahead and change the invalid configuration values on these machines. You don't even have to visit the PC and make the change manually. We call this functionality remediation.

And then? Well, system admins can write their own rules to ensure that their machines are in compliance with their policies. Why have someone go check to see if the subnet mask is set correctly across all 5000 machines? Just write a rule that does so, and then schedule it to run every 12 hours. The rule can also be set up to do the remediation automatically. And those wacky lawmakers in Washington are enacting laws that force companies to know what's happening on their networks (the Sarbanes-Oxley Act is one such), and various regulatory and standards bodies are writing rules that help you comply with the law (the compliance aspect of our system).
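A rule with optional automatic remediation might look something like the sketch below. Everything here is hypothetical: the Rule class, the evaluate function, and the remediate callback are names I've invented for illustration, and ECM's real rule language is not shown anywhere in this post. The 12 hour scheduling is left out; this is just what one run would do.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    setting: str
    required: str
    auto_remediate: bool = False


def evaluate(rule, inventory, remediate=None):
    """Check every machine and return the names of those breaking the rule.

    `inventory` maps machine name -> {setting: value}. If the rule asks for
    it, `remediate(machine, setting, value)` is called for each offender.
    """
    offenders = []
    for machine, settings in sorted(inventory.items()):
        if settings.get(rule.setting) != rule.required:
            offenders.append(machine)
            if rule.auto_remediate and remediate is not None:
                remediate(machine, rule.setting, rule.required)
    return offenders


inventory = {
    "PC-0001": {"subnet_mask": "255.255.255.0"},
    "PC-0002": {"subnet_mask": "255.255.0.0"},
}


def fake_remediate(machine, setting, value):
    # Stand-in for pushing the change to the real machine.
    inventory[machine][setting] = value


rule = Rule("subnet_mask", "255.255.255.0", auto_remediate=True)
first_pass = evaluate(rule, inventory, fake_remediate)
second_pass = evaluate(rule, inventory, fake_remediate)
print(first_pass, second_pass)
```

The first run reports and fixes the offender, so the second run comes back empty.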

Anyway, so where do I fit into all this? Well, I'm in charge of designing the next generation middle tier on the ECM server (known around here as the Collector, because it, well, collects the data). Consider it: ECM collects a bundle of data from each machine. Megabytes and megabytes worth. (We collect file system and registry information so you can even answer questions like "Who has Office 97 installed?" "Who has turned off Norton Antivirus?") The middle tier must take this data, uncompress it (it comes compressed to reduce network bandwidth), write it all out to the database (it uses "bulk load" techniques to get the throughput), and do some data consolidation and transforms.
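The uncompress-then-bulk-load step can be sketched in a few lines. The assumptions are loud ones: the payload format (zlib-compressed JSON), the table, and the function names are all invented here, and a batched insert in a single SQLite transaction is only a stand-in for a real bulk load into a server database.

```python
import json
import sqlite3
import zlib


def pack(rows):
    """Stand-in for whatever a machine would send: compressed JSON rows."""
    return zlib.compress(json.dumps(rows).encode("utf-8"))


def load_payload(conn, machine, payload):
    """Uncompress one machine's payload and write it in a single batch."""
    rows = json.loads(zlib.decompress(payload).decode("utf-8"))
    with conn:  # one transaction for the whole batch
        conn.executemany(
            "INSERT INTO collected (machine, item, value) VALUES (?, ?, ?)",
            [(machine, row["item"], row["value"]) for row in rows],
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE collected (machine TEXT, item TEXT, value TEXT)")

# Made-up sample data, just to exercise the path.
payload = pack([
    {"item": "registry:SubnetMask", "value": "255.255.255.0"},
    {"item": "file:winword.exe", "value": "8.0"},
])
loaded = load_payload(conn, "PC-0001", payload)
total = conn.execute("SELECT COUNT(*) FROM collected").fetchone()[0]
print(loaded, total)
```

The point of batching is throughput: one round of work per payload rather than one per row.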

Not only that, but the middle tier must satisfy the UI part of the system. The user sitting at the UI wants to see data, and wants that data in a nice order and aggregated in nice ways. The middle tier provides all this too. We currently have an ASP front end using AJAX (from well before it was called AJAX) on top of a rudimentary business layer, one that I'm going to improve as a matter of priority.

And then the middle tier is also the bit that provides the scheduling for the collections and other functionality. And the alerting subsystem ("Hey, Joe's machine is no longer scanning for viruses"). And it is the primary conduit for the scalability of the system. And it had better be recoverable if the power goes out (collecting is a very lengthy process and we'd like to make sure that a collection makes it into the database). And…

So, as you can see, it's a pretty amazing and huge system and my job in the whole works can get pretty involved. Worrying about vast amounts of data and making sure the system is efficient despite it is a new experience for me. And, compared to other systems, the issue is not the number of users (our system tends to have some low tens of users) but the amount of data. For example, collecting data from Active Directory is an exercise in mind-boggling data acquisition. Some of our customers have tens of thousands of user objects defined in AD with an aggregate of millions of attributes and we collect it all for analysis and display.

Consider this: in C# I've never worried particularly about the efficiency of the heap allocation process. Yes, it's fast, much faster than a standard C++ (or Delphi) heap allocator. But if you're processing millions of rows of data, suddenly the time taken to allocate those business objects could be significant. Suddenly, object pools or some other technique look like a good way to gain some speed.
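For anyone who hasn't met one, an object pool just hands back previously used objects instead of allocating fresh ones. The sketch below is in Python purely to show the shape of the technique; the Row and RowPool names are hypothetical, and it makes no claim about what would actually be faster in C#, which you'd have to measure.

```python
class Row:
    """A reusable business object with a fixed set of fields."""
    __slots__ = ("machine", "item", "value")

    def __init__(self):
        self.machine = self.item = self.value = None


class RowPool:
    def __init__(self):
        self._free = []
        self.created = 0  # how many Row objects were really allocated

    def acquire(self):
        if self._free:
            return self._free.pop()
        self.created += 1
        return Row()

    def release(self, row):
        # Clear the fields so stale data never leaks into the next user.
        row.machine = row.item = row.value = None
        self._free.append(row)


pool = RowPool()
processed = 0
for i in range(1000):
    row = pool.acquire()
    row.machine, row.item, row.value = "PC-0001", f"item-{i}", str(i)
    processed += 1  # a real system would do its work with `row` here
    pool.release(row)

print(processed, pool.created)
```

A thousand rows go through, but only one Row is ever allocated, because each one is returned before the next is needed.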

Another very interesting topic that's going to become more important is that the data we collect will no longer be fixed in terms of its type, may not even be there, or may just suddenly turn up out of the blue. In other words the schema for the data we collect may change (consider someone adding a new attribute to a user object in AD) and we have to cater for this. We have to know where to put it, how to report on it, how to structure our database schema to cope. Our database organization is going to become very fluid.
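One common way to cope with attributes that turn up out of the blue is to store them as name/value rows rather than as columns, so a new attribute needs no schema change. To be clear, this is only an illustration of the problem, with a made-up table and made-up AD-style names; it is not a decision about our database, and this layout has well-known costs when it comes to typing and reporting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attribute ("
    " object_id TEXT, name TEXT, value TEXT,"
    " PRIMARY KEY (object_id, name))"
)


def store(conn, object_id, attributes):
    """Insert or overwrite whatever attributes this object happens to have."""
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO attribute VALUES (?, ?, ?)",
            [(object_id, name, value) for name, value in attributes.items()],
        )


def known_attributes(conn):
    """List every attribute name seen so far, with no fixed schema needed."""
    rows = conn.execute("SELECT DISTINCT name FROM attribute ORDER BY name")
    return [row[0] for row in rows]


store(conn, "cn=alice", {"mail": "alice@example.com", "department": "Sales"})
before = known_attributes(conn)

# Someone adds a brand new attribute to a user object.
store(conn, "cn=bob", {"mail": "bob@example.com", "badgecolour": "blue"})
after = known_attributes(conn)
print(before, after)
```

The new attribute simply appears in the data; the hard parts mentioned above (how to report on it, how to keep queries fast) are exactly what this sketch does not solve.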

Anyway, expect me to become a big data guru (as in the size of the data, not the size of the Software Architect) as the months go by. Also expect more thoughts about performance and scalability as we work through some intriguing scenarios here. Suddenly SQL Server 2005's clustering functionality looks very attractive.

And most of all, watch me struggle with the dichotomy between TDD (test-driven development) and BDUF (big design up front) and continue to work out where an architect fits into the development scheme.