Short form: NO USER DATA WAS HARMED, but lorien needs to be rebuilt, because I did a stupid thing. No ETA as yet.
The mail reception problem on lorien was due to /var filling up with HTTP logs, particularly when Windows worms probe around for exploited hosts.
So I wrote a nice little script to deal with that -- actually, I stuck a line in one of my cronjobs to deal with that, and it said:
find . -mtime +30 | xargs rm
Yeah. Um. And once that ran, I no longer had anything in /usr/bin or /usr/local/bin? And that means I lost a lot of things that a girl finds useful.
NO ACTUAL USER DATA WAS HARMED!
...but a lot of the commands we use to manipulate your data? Definitely gone. This includes pretty much the entire mail system, by the way.
This will be back up ASAP, but as it requires a rebuild, no ETA on that, and wow do I feel stupid. This wasn't the stupidest sysadmin trick I've ever pulled, but it's definitely in the top 5.
-- Lorrie (duh)
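For anyone tempted by a similar one-liner, here is a minimal sketch of a more defensive cleanup. It is not what ran on lorien: the function name `prune_old_logs`, the `*.log*` filename pattern, and the example path `/var/log/apache` are all assumptions for illustration. The idea is to hand find an explicit absolute directory instead of `.`, limit it to regular files that look like logs, and keep it on one filesystem.

```shell
#!/bin/sh
# Hypothetical helper (not the original cron line): delete log-like files
# older than N days under ONE explicitly named directory.
prune_old_logs() {
    dir=$1
    days=${2:-30}

    # Refuse relative paths, so the result never depends on cron's
    # working directory the way "find ." does.
    case $dir in
        /*) ;;
        *) echo "prune_old_logs: need an absolute path, got '$dir'" >&2
           return 1 ;;
    esac
    [ -d "$dir" ] || {
        echo "prune_old_logs: no such directory: $dir" >&2
        return 1
    }

    # -xdev   : do not cross onto other filesystems
    # -type f : regular files only, never directories
    # -name   : assumed naming pattern for log files
    # -exec   : hands names to rm directly, avoiding xargs word splitting
    find "$dir" -xdev -type f -name '*.log*' -mtime +"$days" -exec rm -f -- {} +
}

# Example call (the path is an assumption):
# prune_old_logs /var/log/apache 30
```

Running the same find with `-print` in place of the `-exec` clause first shows what would be deleted before anything actually is.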
no subject
Date: 2006-01-10 06:37 pm (UTC)
no subject
Date: 2006-01-10 07:49 pm (UTC)
no subject
Date: 2006-01-10 07:57 pm (UTC)
no subject
Date: 2006-01-10 07:12 pm (UTC)
no subject
Date: 2006-01-10 07:49 pm (UTC)
-- Lorrie
no subject
Date: 2006-01-10 07:15 pm (UTC)
no subject
Date: 2006-01-10 07:49 pm (UTC)
Note that I'm not revealing what my Number One Stupid Admin Trick was. 8-P
-- Lorrie
no subject
Date: 2006-01-10 08:46 pm (UTC)
(You may want to look into logrotate or similar for future log management)
no subject
Date: 2006-01-10 09:41 pm (UTC)
Pushing a zero-length password file to every large file server at Hotmail. The machines then had to be rebooted from their original OS CD's. There was a script that did it; I typoed the source file, and the script that actually hashed and pushed the password and shadow files didn't have any lint checking.
There were several dozen of these. Wouldn't affect the user experience at all, but the admins couldn't do anything with them...
The whole team -- with the notable exception of myself -- spent most of the night fixing it. I was the notable exception because they were all too pissed to look at me, with definite justification.
Now, I didn't write the script with no lint checking, so that guy got a share of the blame, but...
-- Lorrie
no subject
Date: 2006-01-10 07:56 pm (UTC)
Personal note for future: when running find(1) in any non-interactive situation, always explicitly specify the directory. Mmmm-hmmm, lesson learned. (I'm big on learning from other people's mistakes, as well as my own.)
no subject
Date: 2006-01-10 09:42 pm (UTC)
8-)
-- Lorrie
no subject
Date: 2006-01-10 08:13 pm (UTC)
no subject
Date: 2006-01-10 09:42 pm (UTC)
-- Lorrie
no subject
Date: 2006-01-10 09:57 pm (UTC)
no subject
Date: 2006-01-10 08:20 pm (UTC)
hey, are you going to be at Greyhaven Sat for the Greek ritual? I'm tagging along with Zoe.
no subject
Date: 2006-01-10 09:42 pm (UTC)
-- Lorrie
no subject
Date: 2006-01-10 08:34 pm (UTC)
Could you get in and change my email over t' the Comcast account? We have a Rede meeting and I *need* access...
no subject
Date: 2006-01-10 09:43 pm (UTC)
-- Lorrie
no subject
Date: 2006-01-11 04:12 pm (UTC)
Ouch
Date: 2006-01-10 09:48 pm (UTC)
Heh - I've done stupid sysadmin tricks like this before. The worst one I ever did: In my first position as a professional sysadmin, we were using Novell 3.1 as the OS on the accounting server. I was applying security patches and updates to the system, and wanted to make a copy of the bindery.
Only I didn't make a copy. I moved it. Out from under the server. While it was running. With no backups (the updates were in part to solve a problem we were having with the backups). I killed the move midway, but suddenly, no one could log into the server.
Novell support told me I had to reinstall - not doing that. I actually had to use tools from a cracker site to hack into the system and restore the bindery. It took all night, but I was finally able to recover.
*whew*
Experiences like that make you properly paranoid. That's how I tell a candidate has good sysadmin experience when I'm interviewing - the level of paranoia and immediate resistance to change before everything is mapped out.
Re: Ouch
Date: 2006-01-10 09:59 pm (UTC)
-- Lorrie
Re: Ouch
Date: 2006-01-10 10:16 pm (UTC)
This experience actually got me working with command line utilities, which ultimately got me into various and sundry Unix OSes, where I have happily lived ever since. So in a way, it was a Good Thing (tm).
Re: Ouch
Date: 2006-01-10 10:13 pm (UTC)
I've saved off a tarball with the contents of /etc (unharmed) and put it in an fs that (shouldn't) be affected by the rebuild, which will preserve my postfix config and uids/passwds. The apache config was in /home and wasn't affected. What else can you think of?
Ooh, the specifics of the network config, that's important. *writes that down*
The dicey thing about a reboot is that I'll lose ssh at that point; the memory-resident binary survived the disk falling away beneath it.
-- Lorrie
Re: Ouch
Date: 2006-01-10 10:30 pm (UTC)
Let's see..
* check /var for possible things you might need (/var/backups in particular)
* check /root for anything that might be hanging about that you forgot
* check /opt for any 3rd party installations (not likely on a Debian machine)
That's all I can think of off hand.
I have seen ext3 filesystem versioning problems with what you are attempting to do - there are some upgrades to ext3 that aren't backwards compatible, so leaving things on local filesystems you hope to mount later can be dicey. 90% of the time, it will work, but it might not. I recommend tar-ing everything up, and copying everything to a remote location or burning to CD before rebooting (given that scp is in /usr/bin, this might be a problem. Do you have a local ftp client? I can provide disk space for you if you need on a server I have).
I just added your Yahoo IM to my f-list. Approve the add, and message me for my cell number if you need either an scp or ftp based account to stuff things to and from. I would back up everything that wasn't killed, attempt to recover, and if you can't - wipe and full reinstall.
Re: Ouch
Date: 2006-01-10 10:50 pm (UTC)
/var tarred and moved to high ground. Copied/pasted /etc/[passwd|shadow] to text files on my Mac.
* check /root for anything that might be hanging about that you forgot
/root doesn't have anything helpful in it, nor /boot.
* check /opt for any 3rd party installations (not likely on a Debian machine)
That's /usr/local here, but /usr/local was part of the carnage. Happily I had Apache in /home/apache, as /home is my free space hog, or all the websites would have been blown away (they have offsite backups as well as a select number of the ~'s)
I have seen ext3 filesystem versioning problems with what you are attempting to do - there are some upgrades to ext3 that aren't backwards compatible, so leaving things on local filesystems you hope to mount later can be dicey. 90% of the time, it will work, but it might not.
There was a kablooie of that scale when we went from Debian 3.0 to 3.1 -- when it updated everything, it changed things around successfully, but didn't rerun lilo after fiddling with fiddly boot bits. Duh. I'll be putting 3.1 on top of 3.1, so there "shouldn't" be any weird fs issues, but I'll definitely be running lilo before booting from hdd. 8-P
I recommend tar-ing everything up, and copying everything to a remote location or burning to CD before rebooting (given that scp is in /usr/bin, this might be a problem. Do you have a local ftp client? I can provide disk space for you if you need on a server I have).
/usr/bin, /usr/local, et al are gone, and with them ftp, scp, etc etc. I can copy and paste text files around, I could conceivably burn CD's and walk them across the room, but that's the only way I have of getting non-text files in or out just now.
I just added your Yahoo IM to my f-list. Approve the add, and message me for my cell number if you need either an scp or ftp based account to stuff things to and from. I would back up everything that wasn't killed, attempt to recover, and if you can't - wipe and full reinstall.
I didn't get your add req. Odd...
-- Lorrie
no subject
Date: 2006-01-10 10:02 pm (UTC)
I still recall taking Vento down by cutpasting a bash forkbomb into a terminal window instead of IRC, as intended... part of me was impressed that I could do that from a userland account, the other was pondering what level of deniability I needed.
no subject
Date: 2006-01-10 10:24 pm (UTC)
no subject
Date: 2006-01-10 10:04 pm (UTC)
worst sysadmin screwups
Date: 2006-01-10 11:06 pm (UTC)