20090307

Veeam Backup - Linux file-level recovery, a neat hack!

I attended the March meeting of the Greater Cincinnati VMware User Group on Thursday, and saw a presentation on the Veeam Backup product. I enjoyed the presentation, learned a good deal about the product, and got some amusement out of a 'hack' that the product uses to perform file-level restores from Linux VMs.

When performing a file-level restore from a Linux VM, the Veeam Backup client boots a VMware Player-based VM. That VM boots a tweaked Linux kernel and uses a combination of Veeam-proprietary and Linux built-in kernel drivers to read files out of the Veeam backup image. I find this endlessly amusing. It's a great way to leverage the functionality already present in the Linux kernel, and gives their product a leg-up on the competition. As somebody who has used Linux to read "oddball" filesystems in the past (mounting a hard disk drive from a Commodore Amiga on a PC comes to mind... *snicker*), I really appreciate the ingenuity in this approach.

It's not really fair to say that it does file-level restores from "Linux VMs". It can do file-level restores from a variety of filesystems, some of which aren't even used by operating systems that can be virtualized under VMware (MacOS, as one example). The presenter indicated that they had support for over 40 filesystems as he launched into the topic (and rather stole my thunder, since I was primed to ask the question "You say 'Linux VMs', but what filesystems do you actually support?").

The Veeam Backup product looks like a winner. I'm hopeful that I can find some application in my Customer-base for this product to improve backup / restore efficiency.

[ Other blogs' comments... ] [ Category: /software ] [ permalink ] [ Posted at 200903070154 ]

20080701

Microsoft Advisory 954960 - A Pattern of Systematic Carelessness

Back in April of 2008, Microsoft pushed Office Genuine Advantage out to Customers' WSUS servers world-wide, though the tool was only supposed to be distributed in a targeted geographic area. This was acknowledged as a "mistake" by Microsoft. No method of removal was provided.

In November 2007, Microsoft renamed a product category in the head-end WSUS servers and broke the user interface on Customers' WSUS servers. This was rectified in a subsequent update. Quoting a note from the WSUS team: "We are also improving our publishing tools to make sure that issues like this are caught during the publishing process, before they impact customers." (It would seem that this relates only to catching this particular issue-- assuming we don't see it happen again.) If you were one of the unlucky Customers to receive the bad data, you were stuck performing a manual resolution procedure!

In October 2007, Windows Desktop Search was widely deployed to desktop computers, inadvertently, by Customers using WSUS. Microsoft cited the "decision to re-use the same update package" as having "unintended consequences to our WSUS customers". No automated solution was provided to undo the damage done to potentially large numbers of computers. Windows Desktop Search did get a boost in installed base, though.

In September 2007, Microsoft caused the WSUS servers of Customers who opted to synchronize hardware driver updates to see approximately 4,000 new updates for ATI graphics cards. The WSUS team noted: "We are changing the publishing process for the future btw so that multiple HWIDs will be associated to one update in the future." Customers who received metadata for the 3,982 seemingly duplicated updates were given instructions on manually rectifying the situation themselves.

In November 2006, Microsoft released Internet Explorer 7, Spanish locale, to all locales (not just Spanish). The error was confirmed by Microsoft and updated metadata was scheduled to be deployed. At the time, Microsoft's representative stated "We regret the inconvenience and confusion this issue may have caused WSUS customers. Thank you for your reports and enabling us to get this issue headed off so quickly." It is fortunate that so many Microsoft Customers work as unpaid regression and quality-assurance testers.

(I'm not even going into the months-long fiasco about "SVCHOST.EXE" hanging older PCs and the multiplicity of "fixes" that didn't actually resolve the issue proffered by Microsoft. That's probably more a beef with the "Windows Installer" people than with the WSUS people.)

After all of this, we now have a situation where bad data gets synchronized into Customers' WSUS databases causing unhandled errors in the server-side code called by client computers looking for updates. Beautiful.

So far, the only resolution I'm aware of involves a manual procedure performed by the Customers. This is also beautiful. I've already had the issue in at least one Customer site.

Is there any regression testing being done on patches deployed thru WSUS? Is there regression testing of the patch metadata being synchronized into Customers' WSUS databases? It sure doesn't look like it, on either front.

Why can't Microsoft take the time to provide automated fixes for the damage it creates automatically? It's not as if they can't write code to do things automatically.

It has the look and feel that a single disgruntled (or stupid) Microsoft employee could bring down a large portion of the desktop PCs and servers in the world. I won't even think about malicious third-parties gaining access to the server computers that serve updates out to privately owned WSUS servers throughout the world. Seemingly, if some catastrophe like this did happen, Microsoft would release a procedure for their Customers to manually perform on each affected system. Whee!

Yet again, I'm embarrassed to have my Microsoft "certification" and to be associated with them in any way. Way to foster trust in IT, Microsoft!

[ Other blogs' comments... ] [ Category: /software ] [ permalink ] [ Posted at 200807010105 ]

20080504

Strangeness with Trend Micro 32-bit Virus Scan Engine 8.700.1004

I ran into an odd issue with a friend's network yesterday, and have decided that feeding it to the LazyWeb to chew on is a good idea.

For semi-relevant informational background, the subject network in this diatribe is a small private religious K-12 school w/ roughly 120 Windows XP Professional SP2-based PCs. They have a Windows Server 2003 Standard Edition R2 32-bit file server computer acting as an AD domain controller, a Windows Server 2003 Standard Edition R2 32-bit server computer acting as a replica domain controller, and a Windows Server 2003 Standard Edition R2 64-bit server computer running Exchange 2007. Everything was a clean install migrating away from Novell Netware last September, and all the client computers were reinstalled from fresh Windows XP installations at the time of the migration. The network infrastructure is all Cisco-based switched 10/100 Ethernet (w/ gigabit uplinks between switches) with no VLANs or QoS. I did most of the original setup, and things are sanely configured (clients pointed to internal DNS servers running on domain controllers, IP addresses handed out via DHCP, etc). In general, everything has been humming along since I did the initial setup, and walking in to look at the issue I didn't expect that it was anything setup-related. (Because of bogus political reasons, I can't bill for work on this network anymore, but I became friends with the on-site "computer teacher" and I still stay in touch with him. It's frustrating, but I like the people and try to generally be helpful and nice... *smile*)

Okay, okay-- enough blathering on. The issue shakes out like this:

Last week, the client computers (most of them current on Microsoft updates as of April 20th or so) started hanging during common user activities-- mainly opening and closing Microsoft Office and Adobe CS3 applications and using Internet Explorer. Even Windows Explorer would hang, from time to time. If one would let the computers sit in this "frozen" state, they would eventually "free up" and begin to work again. In cases where a hang occurred closing a program (such as WINWORD.EXE), the program might hang around in the process list for a while and eventually disappear. You could open more copies of the program, and as you closed them, you would build up more "hung" copies in the process list.

My friend and I found that we were able, just by fiddling around with Microsoft Word, Adobe Illustrator, etc, to reliably generate failures in about 3 - 5 minutes of work. Strangely, though, we could only get failures to occur when logged-on as a user who did not have local "Administrator" rights.

Of all the users on the network, only one (1) user logs-on with a non-limited "Administrator" account (for frustrating reasons I won't go into). We checked with this user and found that she has seen no issues. This seems to jibe with our inability to reproduce the issue except when logged-on as a limited user.

Watching the hangs with Process Explorer, I was seeing several threads in the hanging programs stuck on calls to kernel32.dll's GetModuleFileNameA+0x1b4 export. I think this is related to the root-cause of the issue, but I don't have the right source code to debug this any further down into the stack. Anyway, I kept banging on Process Explorer for a bit, but then we moved on to think about other things that might've changed.

A major "changed" item that we discovered related to the Trend Micro OfficeScan product. The OfficeScan "32-bit Virus Scan Engine" was updated on 4/22/2008 to version 8.700.1004. My friend recalled that users started reporting the problems last Tuesday, and a quick review of the trouble ticketing system revealed that this was the case-- all the trouble reports started on Tuesday after the Trend Micro update.

I'd already gotten the feeling that the root cause was probably anti-virus related, simply because the issue was happening to such a variety of computers and in a variety of applications. The only commonality between the machines, aside from the operating system, was the anti-virus software. In an earlier test, we removed OfficeScan from a machine on which we had been able to reproduce the issues and tried for 30 minutes to reproduce the issues without success. We allowed Group Policy to reinstall OfficeScan and reproduced the issue again within 5 minutes.

I performed a "rollback" to OfficeScan virus scan engine version 8.550.1001 on our test client computer (via the OfficeScan console). We verified that the client reported the older scan engine, bounced the machine, and spent 30 minutes attempting to reproduce the issue. We could not reproduce the issue with scan engine 8.550.1001. We rolled the engine forward to 8.700.1004 again and were able to reproduce our issue.

For now, we've initiated "rollbacks" on all the client computers that are "online", and my friend will watch tomorrow and rollback any other clients that don't pick up the rollback request automatically. I don't like not being current on updates to things like anti-virus software, but I think it's a necessary evil in this case, and because it's only the scan engine and not the virus definitions, we are probably not opening ourselves up to undue risk.

The only thing I found on the 'net thus far was a vague posting, and it's too vague to really get anything out of.

How about it, Lazyweb? Any similar situations happening out there?

[ Other blogs' comments... ] [ Category: /software ] [ permalink ] [ Posted at 200805041011 ]

20071002

'Disk Full' versus 'Error 0x80070052: The directory or file cannot be created.'

Mark Russinovich posted about a problem copying files to a USB flash drive that a friend of his was seeing in Windows Vista.

Mark quotes from a January 2001 thread on the ntfsd mailing list regarding the cause of the problem:

[the error code] ...indicates that you tried to make a file in the root directory of a FAT12/16 disk, and there were not enough available directory entries. FAT12/16 roots are fixed size, usually formatted to 512 entries (32 bytes per), and files with non-8.3 names take up at least two.

It's been years since I've seen a problem with not having enough root directory entries available. I'll admit, I didn't guess the nature of the problem before Mark explained it, but I feel fairly confident that I'd have diagnosed it properly in the field.

I used to see this fairly frequently when Iomega ZIP drives first became popular, because people were treating them like floppy diskettes, and because the new VFAT extensions in Windows 95 contributed to the use of root directory entries.

Exhausting all of the root directory entries on a 3 1/2" floppy diskette (112-- see http://www.pcguide.com/ref/hdd/file/fatRoot-c.html) wasn't easy with MS-DOS. You'd need an average file size of about 12.71KB or less to do that, and although I'm sure a significant number of users saw root directory entry exhaustion on floppy diskettes, I'd imagine most floppies had full data areas before they had full root directories.

With the advent of VFAT, and the larger storage capacity of ZIP disks, the conditions for root directory exhaustion became much more favorable. Users weren't familiar with making subdirectories on floppies, and they'd store tons of files in the root directory of their ZIP media. On a 100MB ZIP disk, piling on files of 200KB or less would exhaust the 512 available root directory entries. If you used long filenames, additional root directory entries would be populated with long filename data.
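The arithmetic behind those thresholds can be sketched in a few lines of Python. This is only an illustration: the root-entry counts are the ones quoted above, the floppy data-area size (1,457,664 bytes) is an assumption chosen because it reproduces the 12.71KB figure, and the 13-characters-per-entry figure for VFAT long filenames is a detail of the on-disk format that the posts quoted here don't spell out. The function names are made up for this example.

```python
import math

LFN_CHARS_PER_ENTRY = 13  # characters carried by one VFAT long-filename entry


def entries_for_long_name(name_length: int) -> int:
    """Directory entries used by a file with a long (non-8.3) name.

    One ordinary 8.3 entry plus enough long-filename entries to hold the name.
    """
    return 1 + math.ceil(name_length / LFN_CHARS_PER_ENTRY)


def break_even_file_size(data_area_bytes: int, root_entries: int) -> float:
    """Average file size (in bytes) at or below which the root directory
    runs out of entries before the data area runs out of space."""
    return data_area_bytes / root_entries


# Assumed figures: ~1.39MB of floppy data area with 112 root entries,
# and a 100MB ZIP disk with 512 root entries.
floppy = break_even_file_size(1_457_664, 112)
zip100 = break_even_file_size(100 * 1024 * 1024, 512)

print(f"floppy break-even: {floppy / 1024:.2f} KB")
print(f"ZIP 100 break-even: {zip100 / 1024:.0f} KB")

# A 20-character long filename costs three entries, so 512 root entries
# hold far fewer than 512 such files.
per_file = entries_for_long_name(20)
print(f"entries per 20-char name: {per_file}, files that fit: {512 // per_file}")
```

The point of the second function is that long filenames move the break-even point further still: every extra entry a name consumes lowers the number of files the root directory can hold.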

MS-DOS and Windows 9X return a 'Disk Full' error when you run out of root directory entries on a FAT volume. I saw multiple cases where users had purchased additional ZIP media because they erroneously thought their other media were "full", when in reality they had just exhausted all the root directory entries. *smile*

Further down in the ntfsd posting, the author says:

Prior to Whistler, FAT returned STATUS_DISK_FULL but the Win32 ERROR_CANNOT_MAKE had existed since Win9x, and was used there exactly for this case. The new status code was created so we could start using it.

(The 'Whistler' software the poster refers to is none other than Windows XP.)

I find it quite humorous that Windows Vista, the present-day flagship version of the Windows operating system, returns a more arcane and less helpful message for an error condition than Microsoft operating systems that are more than a decade old. The posting from ntfsd shows that the file-system people were plumbing their side of the house to allow for a descriptive error message, but the UI team dropped the ball!

[ Other blogs' comments... ] [ Category: /software ] [ permalink ] [ Posted at 200710020018 ]

20070915

The Kids These Days...

I meet young adults in my work with the American Legion Ohio Buckeye Boys State program, and some of them, by virtue of their proximity to my home, stay in touch with me.

One such young gentleman works at the coffee shop / restaurant that Stephanie and I frequent most Saturday nights. He stops by our table and talks to me about his college CS exploits. Tonight, he mentioned that he was taking a class in Java programming. Being 19 years old, he started programming with languages like C# and VB.net. I asked him what he thought of Java, and how he'd contrast that to languages and API environments he was already familiar with. To slightly paraphrase what he said:

"Programming in Java after using C# is like having a hot chick break up with you, and, in desperation, deciding to date her retarded twin sister."

Not very sensitive or politically correct, but funny as hell.

[ Other blogs' comments... ] [ Category: /software ] [ permalink ] [ Posted at 200709152241 ]

20070224

Microsoft Daylight Saving Time Irritation

Susan Bradley posted contrasting her feelings to my spiteful attitude re: Microsoft's response to the changing DST. Certainly, spite is my initial reaction, but when I have a moment to be more thoughtful about it, my reaction is more calculated. We're seeing the result of an inept response by Microsoft to the change in DST. The response is inept for a few reasons, to my mind, and purposeful in its level of ineptitude.

Microsoft's top talent has been distracted since prior to the passage of the Energy Policy Act of 2005. They've had several major product releases (Windows Vista, Office 2007, and Exchange 2007) to contend with, and no one would argue that, in their business, pushing new products out the door drives revenue.

Microsoft management has to be sure that their people are working toward the interests of increasing shareholder value. Writing manicured patching tools, doing QA on those tools, and developing best-practices documentation on using the tools isn't going to drive revenue. Taking top-flight technical talent off of a new product launch to work on a patch also doesn't drive revenue. Further, one can rationalize, DST changes don't take effect until March, 2007, so the response can wait a bit... (Relative to Outlook and Exchange, my work would have been a lot easier if the underlying Windows patches had come out in, say, mid-2006. I'm guessing that others would agree with me. If nothing else, my users would have had more time to double-check their bookings. I think there would have been far fewer items scheduled in the new DST interval to have to contend with "re-basing" if Windows had been patched for the new DST back in mid-2006. I suppose I should've dug out TZedit.exe back then and just done it... *sigh*)
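To make the "new DST interval" concrete, here is a small Python sketch that computes the date ranges that fall under daylight saving time with the new US rules (second Sunday in March through first Sunday in November) but did not under the old ones (first Sunday in April through last Sunday in October). Appointments booked inside these ranges on an unpatched system are the ones that need "re-basing". The function names are invented for this illustration and have nothing to do with Microsoft's tools.

```python
from datetime import date, timedelta


def nth_sunday(year: int, month: int, n: int) -> date:
    """Return the n-th Sunday of a month (n=1 is the first)."""
    first = date(year, month, 1)
    # date.weekday(): Monday=0 ... Sunday=6
    offset = (6 - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))


def last_sunday(year: int, month: int) -> date:
    """Return the last Sunday of a month."""
    # Step to the first day of the following month, back up one day,
    # then walk back to the nearest Sunday.
    next_month = date(year + (month == 12), month % 12 + 1, 1)
    last_day = next_month - timedelta(days=1)
    return last_day - timedelta(days=(last_day.weekday() + 1) % 7)


def extended_dst_windows(year: int):
    """Ranges that are DST under the new US rules but not under the old ones."""
    spring = (nth_sunday(year, 3, 2), nth_sunday(year, 4, 1))
    autumn = (last_sunday(year, 10), nth_sunday(year, 11, 1))
    return spring, autumn


spring, autumn = extended_dst_windows(2007)
print("spring window:", spring[0], "to", spring[1])
print("autumn window:", autumn[0], "to", autumn[1])
```

For 2007 that works out to roughly three weeks in the spring and one week in the autumn, which is why patching in mid-2006 would have kept so many bookings out of harm's way.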

I've no doubt, based on the hurried, last-minute, and frequently revised response to the DST changes, that working on these patches has been some of the most un-sexy and undesirable work that Microsoft developers have had to do, and I can imagine that it's been put off again and again since the Act was passed.

Customers are going to have to live w/ the quality of the tools they get. Sure-- there's the remote likelihood that some Customers might be so upset at the quality of Microsoft's response as to look toward migrating to other software, but nobody can call the "other Microsoft" and get patches for Windows, Exchange, etc. Microsoft's position in the market is pretty secure, and mediocre response to the changes in DST isn't going to hurt them. Keeping the Customers happy enough to stay around (or, some would say, at a manageable level of dissatisfaction) by making a token effort to release mediocre patches is all Microsoft needs to do. It's cheap, too, since less technical talent has to be engaged in unprofitable patch development work.

The tools we're seeing for Exchange (with the numerous revisions, mistakes, retractions, and conflicting comments) remind me of the quality of Microsoft's "unsupported" Resource Kit tools. These Exchange DST tools have the feel of being raw, unfiltered output from developers, without the benefit of much QA. That's also a great indication that management hasn't engaged teams in a formal process of developing a response to the change in DST.

The character of the repeated revision to the KBase articles and tools related to the DST change has the feel of an in-house development team firing off emails to the IT department: "Hey, guys-- that last tool we sent you has a problem. Don't install it into a directory w/ spaces in the name... sorry!" That's all well and good, except that most Microsoft Customers aren't as skilled as the in-house IT department at Microsoft. When I looked at the v1 Exchange "re-basing" tool, I commented to my business partner that it was highly unlikely most people were going to know how to get the "legacyExchangeDN" name of their server(s) (let alone the fact that the tool doesn't even call it a "legacyExchangeDN", but rather calls it the "Server Domain Name"). Somebody closer to the code for the tool would know what it was looking for. The tool should have just searched the Active Directory, pulled the necessary values out, and presented them in a friendly menu. The intended audience for the Exchange Calendar Update Tool was a technically proficient administrator-- not somebody with passing familiarity with the product. (Yes, yes-- I know that version 2 of the Exchange Calendar Update Tool is friendlier and better... and, apparently, can now handle spaces in the name of the installation directory...)

Microsoft thumbed their nose at all of us in the IT industry when they elected not to release a patch for Windows 2000 for DST. Their response, relative to Windows 2000, was as much a "marketing" effort to further the cause of forced "upgrades" as anything else. (I know that Windows 2000 is in the "Extended Support" phase of its "lifecycle"...) Having written a bit of code to patch the Windows 2000 time zone entries in the registry, it's apparent that it would have taken almost no additional work, on top of the QA work already done for the Windows 2003 / Windows XP patches, to put a patch for Windows 2000 out the door. This is another indication, to me, that the mishandling of this situation was purposeful, on the part of management.
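To give a feel for how small the data involved is, here is a Python sketch (not the code mentioned above) that builds the 44-byte binary blob Windows keeps in each time zone's "TZI" registry value. It assumes the layout Microsoft documents as REG_TZI_FORMAT: three 32-bit biases followed by two SYSTEMTIME structures in "recurring rule" form. The helper names are hypothetical, and the sketch deliberately stops short of writing anything to a registry.

```python
import struct

# Assumed layout (REG_TZI_FORMAT): Bias, StandardBias, DaylightBias as
# little-endian 32-bit signed integers, then StandardDate and DaylightDate
# as SYSTEMTIME structures of eight 16-bit words each.
TZI_LAYOUT = "<3l8H8H"


def systemtime_rule(month, occurrence, hour, day_of_week=0):
    """SYSTEMTIME in recurring-rule form.

    wYear is 0, wDayOfWeek is the weekday (0 = Sunday), and wDay holds the
    occurrence within the month (1 = first ... 5 = last).
    """
    # Field order: wYear, wMonth, wDayOfWeek, wDay, wHour, wMinute,
    # wSecond, wMilliseconds
    return (0, month, day_of_week, occurrence, hour, 0, 0, 0)


def build_tzi(bias_minutes, standard_rule, daylight_rule, daylight_bias=-60):
    """Pack the biases and the two transition rules into a TZI blob."""
    return struct.pack(
        TZI_LAYOUT, bias_minutes, 0, daylight_bias, *standard_rule, *daylight_rule
    )


# US Eastern time under the 2007 rules: UTC-5 (bias of 300 minutes),
# standard time resumes on the first Sunday of November at 02:00,
# daylight time begins on the second Sunday of March at 02:00.
eastern_2007 = build_tzi(
    300,
    standard_rule=systemtime_rule(11, 1, 2),
    daylight_rule=systemtime_rule(3, 2, 2),
)

print(len(eastern_2007), "bytes:", eastern_2007.hex())
```

Swapping two small tuples is the whole difference between the old rule and the new one for a given zone, which is the sense in which a Windows 2000 patch looks like very little additional work.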

<soapbox>

This is what happens when an enterprise uses closed-source software. If an enterprise really cares about being able to get "support" for a software application, it makes the most sense, to me, to use software that (a) has source code availability, and (b) is licensed such that anyone qualified can be hired to work on that source code.

People seem to forget, when they commit to using closed source platforms to support their business, that Microsoft, and other closed-source software development firms, get to dictate monopoly pricing on "support" for their products. Customers are at the mercy of the software manufacturer's decisions about the quality and timeliness of the response to issues and incidents. I'd much rather be able to send out an RFP for changes, and choose the best bidder, than be stuck with a single choice to fulfill my needs.

There are almost no "consumer rights" in the world of software. Realistically, commercial, closed-source software should be approached with the attitude of "no warranty, as-is, if it breaks you get to keep the pieces", unless you've got a contract with the manufacturer that says otherwise.

</soapbox>

Keeping track of time is hard to get right, especially when time zones and daylight saving time come into play. People have difficulty understanding the difference between items that are scheduled relatively or absolutely, with respect to a standard time. Perhaps it's not fair of me to pick on Microsoft, since everyone has to deal with these changes, and other software manufacturers haven't gotten it right very quickly, either.

Microsoft bears the brunt of my ire for not making proper engineering decisions during the design of their products (Windows, Exchange, Outlook) to make these kinds of changes easier. It's not as though changes to daylight saving time haven't occurred before, and it's not as if they're a fringe element in the personal computer operating system market. (Perhaps a web browser is an integral operating system feature, whereas keeping track of daylight saving time should be left up to the ISVs...)

Many of my Customers have committed (by way of giving over lots of money) to using Microsoft products to form the basis for their IT infrastructures. It would have been nice if Microsoft had recognized that commitment (and the lots and lots of money), and developed quality patches and documentation much earlier in the game, instead of forcing us all to play "catch up". (I'm also glad to see that I'm not the only one who is complaining about this.)

[ Other blogs' comments... ] [ Category: /software ] [ permalink ] [ Posted at 200702241720 ]

20060703

ActiveX Installer Service

Bruce Schneier posted about an eWeek article that details the proposed new ActiveX Installer Service. I was a little disappointed to see that Bruce thought it was a bad idea. I think this may be because Bruce doesn't know how bad the current situation is.

This is really an improvement over prior Windows versions, not a "poking holes" in security development. It has very little to do with User Account Control at all. It's not bad, per se. It's better than things were before. I wish, in fact, that it was being backported into the current Windows XP environment.

In a presently-deployed network with Windows 2000 and Windows XP client computers and users who do not have local Administrator rights, users cannot install ActiveX controls. This ends up being a huge pain, a sink of manual labor, and clearly isn't a situation that was very well thought-out by Microsoft at all.

Most of my Customers, for example, have some web-based application that requires certain ActiveX controls to be installed to function properly. In most cases, I can deploy Windows Installer-based (MSI) packages for the controls they need (mainly Adobe Reader and Macromedia Flash), and the headache is taken care of.

For my Customers that use more "boutique" ActiveX-based applications (an outsourced payroll management system that is ActiveX-based, an "ASP" document control repository interface, etc) that are not distributed as MSI files, I have two (2) remotely viable choices in getting the controls deployed onto their PC's, and neither is very good:

(1) Capture all the registry settings and files installed during the browser-based installation with a "packaging tool" (or "by hand") and build an MSI package. Hope that I got everything right (since I don't have source code to their control) and do damage control when I screw up some nuance of the installation on a subset of the, potentially, hundreds of different PC configurations that the control may deploy onto. Update this package each time the manufacturer "updates" the control.

(2) Logon to the PC manually w/ an Administrator credential and install the control. Attempt to verify and hope that the control doesn't inappropriately store anything in the user-specific portion of the registry such that the control won't function properly when the user attempts to use it. Do this to each PC that uses the control each time the manufacturer "updates" the control.

The third choice, of course, is just to weaken the security, typically by giving the user local Administrator (or, sometimes, Power User) rights. That's totally unacceptable to me, so most of the time I end up doing a combination of the first and second above, depending on whether the control needs to be on three (3) or three-hundred (300) PC's, and depending on the frequency of "updates" by the control manufacturer. (Don't even get me started about these idiotic software firms that think that "release early, release often" is a good idea for this type of application...)
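The "by hand" half of option (1) boils down to taking a snapshot before the browser-based install, another one after, and comparing the two. A minimal Python sketch of the file-system side of that idea follows. It is an illustration rather than a packaging tool: the function names are made up, it ignores the registry entirely, and a real capture would also need to filter out unrelated background churn.

```python
import os


def snapshot(root):
    """Map each file path under root to its (size, modification time)."""
    state = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            state[path] = (info.st_size, info.st_mtime_ns)
    return state


def diff_snapshots(before, after):
    """Return (added, changed) file paths between two snapshots."""
    added = sorted(set(after) - set(before))
    changed = sorted(
        path for path in after if path in before and after[path] != before[path]
    )
    return added, changed
```

Usage would be along the lines of calling snapshot() on the directories of interest, installing the control, calling snapshot() again, and feeding both results to diff_snapshots(). Everything in the "added" and "changed" lists is a candidate for the MSI package, and everything the control touches outside the directories you thought to watch is where the damage control comes in.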

The "solution" that Bruce posted about alleviates these problems above and creates an interface that I can use to grant access for "normal" users to install ActiveX controls that I've approved. Let's be clear: I don't _like_ the fact that we need this at all-- i.e. I think Microsoft shouldn't have browser-based control installation and _ALL_ installations of software should be managed by a service like the Windows Installer. Since nobody at Microsoft values my input on this issue (and, apparently, the input of every other clueful corporate network admin), I'm stuck favoring this solution only because it's leaps and bounds better than the alternative and will end up saving my Customers money.

It seems pretty clear to me that Microsoft doesn't do a lot of thinking about how large corporate Customers are going to integrate "new innovations" into their deployments. For everything that Microsoft does to add management interfaces, automation systems, and centralized administration tools, it seems like most cool new "innovations" get marketed heavily to ISV's who end up deploying them into systems that my Customers interact with. These "innovations" usually end up being the equivalent of prototypes that got shipped, and they aren't engineered heavily enough for real world deployment, use, and management.

In the case of ActiveX controls, the majority of the ISV's I've dealt with don't have any coherent strategy for deploying the control onto large numbers of PC's in a clean, manageable way, primarily because Microsoft didn't drive the point home clearly enough to the ISVs' developers that deployment is an important issue. This feature, at least, gives us some mechanism for controlling the insanity.

[ Other blogs' comments... ] [ Category: /software ] [ permalink ] [ Posted at 200607031256 ]

Windows Genuine Pain in the Ass

I'm a couple of days behind in the blogs, but I saw that Bruce Schneier had a posting about Windows Genuine Advantage and the rumored "Windows kill switch" that Microsoft might be planning to install.

For my part, I've already seen two (2) false positives where WGA began displaying pop-ups indicating that a copy of Windows was not genuine. Both computers were laptop computers acquired a couple of years ago from a large OEM by one of my government Customers. I know, with complete certainty, that they were "genuinely" provided by this OEM. They were still running their factory installs of Windows XP Professional (they're light-duty machines, and are used by task-oriented users w/o local Administrator rights, so they stay stable and have pretty clean software), and both of them started indicating that they were not "genuine" at almost exactly the same time of the day.

We called the OEM's technical support, and the level 1 drone let slip that he'd heard from other agents in his call center that a large number of calls related to this problem were coming in. He indicated that, at that time, the only known "fix" was to reinstall the operating system using the product key affixed to the PC. A quick review of both PC's revealed that they appeared to be using the same product key in their "factory" installs. This seems pretty common w/ OEMs-- I've seen the same product key in several different OEM machines in %SystemRoot%\System32\OOBE\OOBEInfo.ini. I've seen this w/ Dell, HP, and Gateway systems.

I'm not going to debate the philosophical points re: copyright and Microsoft's control of their "intellectual property". From a security perspective, attacking this system and disabling copies of Windows sounds like a great way to disable the IT infrastructure of a competitor. I'm not the only one who is thinking of this, so I'm assuming that the malicious attackers are, too.

[ Other blogs' comments... ] [ Category: /software ] [ permalink ] [ Posted at 200607030252 ]

20060627

Automated Update Behaviour

Over on the flow|state blog, Jan Miksovsky is talking about automated update behaviour in his post Oh app, for crying out loud, go update yourself. This is, of course, a subject near and dear to my heart.

Frequent updates, to me, are a sign of probable poor software engineering practices on the part of a software manufacturer. This is cemented in my head when I contact a manufacturer's technical support and am immediately asked "have you downloaded all the current updates?" w/o so much as being allowed to describe my issue. This tells me, in no uncertain terms, that the manufacturer is deploying code with the underlying strategy of "if we ship bad code we'll just push out a patch later-- why bother testing?"

Most of the discussion in Jan's blog is about end-user interaction with updates. As a contract IT administrator for several companies, my biggest beef with update systems is the lack of attention paid to centralized control and deployment of updates. My end users shouldn't be bothered by update notifications because (a) it's not their job to perform updates, (b) they aren't qualified to select what updates they should/should not receive, and (c) they don't have the necessary rights on their PC to perform updates (and, if they do, there's something mal-configured). I don't need 2,000 PC's downloading the same update from the Internet independently, either.

In the Windows world, it would've been helpful if Microsoft had plumbed the "Microsoft Update" infrastructure on clients to be extensible for third parties. I'd love to have been able to deploy updates for a large portion of the software that I administer on my WSUS servers and manage those updates the same way I manage operating system updates. A manageable patch deployment system seems, to me, to be as much an "operating system" feature as, say, an integrated web browser.

I am continually peeved at software infrastructures being designed for end-users then marketed toward business environments. Almost every time I deploy a new product, I undertake a search to determine how to neuter the "automated update" behaviour in the product, and end up writing scripts, Group Policy administrative templates, or hacking DNS to prevent the product from rolling out updates w/o my express consent.
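As a flavor of the "hacking DNS" variety, here's a minimal Python sketch that generates hosts-file lines pointing a product's update hostnames at the loopback address. The hostnames are hypothetical placeholders, and a hosts file is just a stand-in here for whatever you'd actually do on your DNS servers. Whether blocking name resolution really stops a given product from updating is an assumption you'd have to verify per product.

```python
LOOPBACK_IP = "127.0.0.1"


def hosts_entries(hostnames, target_ip=LOOPBACK_IP):
    """Return hosts-file lines that resolve each update hostname to target_ip."""
    lines = []
    seen = set()
    for name in hostnames:
        # Hostnames are case-insensitive; normalize so duplicates collapse.
        name = name.strip().lower()
        if not name or name in seen:
            continue
        seen.add(name)
        lines.append(f"{target_ip}\t{name}")
    return lines


if __name__ == "__main__":
    # Hypothetical update hosts -- substitute whatever the product really uses.
    update_hosts = ["updates.vendor.example", "autoupdate.vendor.example"]
    for line in hosts_entries(update_hosts):
        print(line)
```

The output is only text; getting it into each PC's hosts file (or the equivalent records onto a DNS server) is left to whatever deployment mechanism you already trust.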

If I weren't this much of a hard-ass, my Customers' networks would be unmanageable swamps of crap. Fortunately, we can "sell" this pretty well, and show a clear ROI on our hard-assedness.

Guys in the software industry: We've been doing this _how_ many years now? Surely you could be learning something... *sigh* I know I'm not the only one who thinks this way.

[ Other blogs' comments... ] [ Category: /software ] [ permalink ] [ Posted at 200606272043 ]

20051222

"We update our software frequently, over the Internet, ..."

"...so that you always have the most current version!"

If you see this assertion with respect to a piece of software you're considering, my advice to you is to RUN!

Yes, there are market segments for which this is not an unreasonable practice. I've worked with software applications used to populate forms (which are then printed, signed, and filed... *sigh* A subject for another post...) that are based on legally mandated designs (think tax forms, real estate appraisal forms, disclosure forms, etc) which change somewhat frequently. In this market, it's totally reasonable to think that patches are going to have to be produced regularly. Ideally, the design of the application should be such that this involves the smallest code footprint possible and provides the least opportunity to introduce regression errors.

On the other hand, if your software vendor is touting frequent patches that add new features or fix defects as a benefit, then I'd argue you're seeing the output of shoddy software "engineering" practices and bad product management. This "patch early, patch often" attitude seems to be driven by a misguided attempt to be "support oriented" or "Customer driven", but actually results in a worse experience for the Customer and increased expense for the vendor.

Defects and Quality Assurance

With respect to defects, I agree with using bug tracking systems, responding to Customer-reported issues, and ultimately releasing patches as necessary. I'm not in favor, however, of leaning on this strategy as an excuse to use your production users as unwitting beta testers, which is where it seems to lead.

This practice has a negative impact on quality assurance. QA is often neglected in the software industry anyway, and the idea of issuing patches when issues arise, in lieu of having comprehensive testing and QA procedures, is seductive to management because it can further decrease the perceived "expense" of QA. Why spend time looking for defects internally when the Customers can report them? As you can well imagine, this also creates patches that are of equal or lesser "quality" to the original software and often introduces regression errors.

Coupling the scrimping on QA with the automatic delivery of patches creates the most terrifying scenario. I've been in the situation, all too frequently, of having to explain to one of my consulting Customers that a software update that was automatically applied (without their express consent) actually created an issue due to a vendor-induced regression error.

IT Support Hell

Even when a patch isn't automatically applied, though, applying patches from a vendor that has a history of "poison pill" patches is a game of Russian roulette for your IT staff.

Suppose you spend the money for a lab environment to test the patch before deployment, and even more money on humans to do the testing; unless you're sure that your lab accurately represents all aspects of the production environment, and the usage patterns of the production users, you're still playing with fire.

How many of us have WAN simulators in our labs, or employ testers who are as familiar with all the features of the application as the users who use the software every day? Most test labs I've seen involve a secondary installation of the application on an unrelated production server, the desktop PC's of the helpdesk or IT support team, and the time necessary to install the patch and see if the application opens afterwards.

With the plethora of applications that even a small company employs combined with frequent operating system patches and the potential for unwanted "interaction" between disparate applications installed on the same PC, it's all but infeasible for small organizations to effectively test patches prior to deployment. The sheer number of hardware, operating system, and application software configurations is too great. Small companies aren't ever going to be able to fully test all possible scenarios, so it's better for them to choose software applications that require less frequent patching and that come from vendors who ship software with fewer defects.

New Features

Implementing new features on the whim of a Customer, without undertaking any kind of formal requirements research and planning, is a setup for yet more patches to be issued when other Customers discover the new feature and find that it doesn't quite meet their needs. It would be better to put new feature requests into a requirements specification for a future version, rather than writing code based on ill-conceived requirements. It seems like vendors see this as a benefit to the Customer, but I'd much rather have fewer patches to support at the risk of having to wait for features.

Technical Support

The "update frequently" attitude leads to substandard technical support, as well. Invariably, the support technicians are taught to ask the Customer for the version number of the application they're using, and to instruct them to download the latest build before giving the Customer a chance to give a description of the issue.

This is, I suppose, because the technical support management at these companies believes that "most" issues are solved already in the new build. Of course, without actually determining what issue is being reported and at least following-up to see if the new build resolved the issue, blindly directing the Customer to the most recent build doesn't do anything to improve the quality of your support metrics! (A feedback loop that appears to be missing the "loop" part, eh?)

It sounds silly if you actually think about it, but I've been directed on many occasions to "download the current version and call back if you're still having a problem". Coupled with the problem of low quality patches that contain regression errors, there is the serious potential to induce new issues when instructing a Customer to update their installation. It would seem to me that inducing additional issues when the Customer is already experiencing an issue would be a bad Customer service policy.

Summary

In the end, when I see a company touting frequent patches as a feature, my "gut" tells me that they have succumbed to the stuporous routine of patching problems as they arise, and not taking the time to develop the necessary internal practices and controls to stop poor quality software from ever shipping. I incorporate frequency of patches as a criterion when I evaluate software for my Customers, and an application that is patched very frequently receives low marks.
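If you want to put a number on "patched very frequently", something like the following Python sketch would do. It's just one way to quantify the idea, not a description of any formal scoring I use; the release dates are invented for illustration, and where you draw the line for "too frequent" is your call.

```python
from datetime import date


def patches_per_year(release_dates):
    """Average number of patches per year over the observed span of dates."""
    if len(release_dates) < 2:
        raise ValueError("need at least two release dates")
    ordered = sorted(release_dates)
    span_days = (ordered[-1] - ordered[0]).days
    if span_days == 0:
        raise ValueError("release dates must span more than one day")
    # The intervals between releases, not the release count, define the rate.
    return (len(ordered) - 1) / (span_days / 365.25)


if __name__ == "__main__":
    # Hypothetical patch history for a product under evaluation.
    history = [date(2005, 1, 10), date(2005, 2, 3), date(2005, 2, 28),
               date(2005, 4, 15), date(2005, 6, 1)]
    print(f"{patches_per_year(history):.1f} patches per year")
```

A vendor's release notes or download archive is usually enough to build the list of dates.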

I encourage you to hold software vendors who treat you this way responsible. I would advise you to evaluate competing products that have less history of patching. Even if the competition lacks a few features or isn't as "nice", you'll more than make up for that in fewer patch-related headaches, and you'll be sending a clear message to the vendor that this kind of behavior won't be tolerated.

[ Other blogs' comments... ] [ Category: /software ] [ permalink ] [ Posted at 200512221930 ]