...oops.

Jan. 10th, 2006 10:31 am
lwood: (wizpod)
[personal profile] lwood
Short form: NO USER DATA WAS HARMED, but lorien needs to be rebuilt, because I did a stupid thing. No ETA as yet.

The mail reception problem on lorien was due to /var filling up with HTTP logs, particularly when Windows worms probe around for exploited hosts.

So I wrote a nice little script to deal with that -- actually, I stuck a line in one of my cronjobs to deal with that, and it said:

find . -mtime +30 | xargs rm

Yeah. Um. And once that ran, I no longer had anything in /usr/bin or /usr/local/bin? And that means I lost a lot of things that a girl finds useful.

NO ACTUAL USER DATA WAS HARMED!

...but a lot of the commands we use to manipulate your data? Definitely gone. This includes pretty much the entire mail system, by the way.

This will be back up ASAP, but as it requires a rebuild, no ETA on that, and wow do I feel stupid. This wasn't the stupidest sysadmin trick I pulled, but it's definitely in the top 5.

-- Lorrie (duh)

Date: 2006-01-10 06:37 pm (UTC)
jamie: bitter panda saying not quite zen (geek)
From: [personal profile] jamie
so no ftp either, right? I can't get in :(

Date: 2006-01-10 07:49 pm (UTC)
From: [identity profile] lwood.livejournal.com
Right. No ftp, no scp... no way to move files off of the main machine that I know of. If you gave me a receiving server, I could conceivably give you a copy from one of the backup archives.

Date: 2006-01-10 07:57 pm (UTC)
jamie: bitter panda saying not quite zen (geek)
From: [personal profile] jamie
s'okay. It tends to be a common thing with the people I host so I wanted to confirm that since what had been discussed at that point had been email.

Date: 2006-01-10 07:12 pm (UTC)
From: [identity profile] arian1.livejournal.com
That's it, turn in your geek card, go to the penalty box and feel shame.

Date: 2006-01-10 07:49 pm (UTC)
From: [identity profile] lwood.livejournal.com
Shame takes a back seat to fixing!

-- Lorrie

Date: 2006-01-10 07:15 pm (UTC)
From: [identity profile] snowwy.livejournal.com
My sympathies. Kinda reminds me of the time I accidentally marked an entire inventory as "non-stock" because I forgot I was on the production instance rather than test.

Date: 2006-01-10 07:49 pm (UTC)
From: [identity profile] lwood.livejournal.com
whewps!

Note that I'm not revealing what my Number One Stupid Admin Trick was. 8-P

-- Lorrie

Date: 2006-01-10 08:46 pm (UTC)
From: [identity profile] princeofjeru.livejournal.com
Drat, and I was going to ask too. :)

(You may want to look into logrotate or similar for future log management)

Date: 2006-01-10 09:41 pm (UTC)
From: [identity profile] lwood.livejournal.com
Number One?

Pushing a zero-length password file to every large file server at Hotmail. The machines then had to be rebooted from their original OS CDs. There was a script that did it; I typoed the source file, and the script that actually hashed and pushed the password and shadow files didn't have any lint checking.

There were several dozen of these. Wouldn't affect the user experience at all, but the admins couldn't do anything with them...

The whole team -- with the notable exception of myself -- spent most of the night fixing it. I was the notable exception because they were all too pissed to look at me, with definite justification.

Now, I didn't write the script with no lint checking, so that guy got a share of the blame, but...

-- Lorrie

Date: 2006-01-10 07:56 pm (UTC)
From: [identity profile] digitalsidhe.livejournal.com
Man, now I'm even more glad that you're talking about your Lorien and not mine.

Personal note for future: when running find(1) in any non-interactive situation, always explicitly specify the directory. Mmmm-hmmm, lesson learned. (I'm big on learning from other people's mistakes, as well as my own.)
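For illustration, a minimal sketch of what "explicitly specify the directory" might look like in a cron-safe form. The function name `prune_old_logs` and the `*.log*` filename pattern are assumptions made for this example, not anything from the original cronjob; the idea is only that the job refuses to run unless it is handed an absolute path to a real directory, and that it touches nothing but regular files matching a log-like name.

```shell
#!/bin/sh
# Hypothetical helper (not from the original cronjob): prune_old_logs DIR DAYS
# Deletes regular files matching *.log* under DIR that are older than DAYS days.
prune_old_logs() {
    dir=$1
    days=$2

    # Refuse anything that is not an absolute path, so the result never
    # depends on whatever directory cron happened to start in.
    case "$dir" in
        /*) ;;
        *) echo "refusing: '$dir' is not an absolute path" >&2; return 1 ;;
    esac

    # Refuse if the directory does not exist.
    [ -d "$dir" ] || { echo "refusing: '$dir' is not a directory" >&2; return 1; }

    # -type f limits the match to regular files; -name narrows it to log-like
    # names (an assumed pattern); -exec ... {} + avoids the whitespace
    # splitting that a plain "| xargs rm" would do.
    find "$dir" -type f -name '*.log*' -mtime +"$days" -exec rm -f -- {} +
}
```

A crontab entry would then call something like `prune_old_logs /var/log/apache 30`, where the path is again only a placeholder for wherever the HTTP logs actually live.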

Date: 2006-01-10 09:42 pm (UTC)
From: [identity profile] lwood.livejournal.com
s/find/any cronjob/g

8-)

-- Lorrie


Date: 2006-01-10 08:13 pm (UTC)
From: [identity profile] bergtagen.livejournal.com
Poor monkey. So, erm... is there seidhr at BFUU tonight? Seeing as my post to the list came wandering home all lonesome.

Date: 2006-01-10 09:42 pm (UTC)
From: [identity profile] lwood.livejournal.com
Yep, there sure is! I have hats and mittens for you two to try on, too...

-- Lorrie

Date: 2006-01-10 09:57 pm (UTC)
From: [identity profile] bergtagen.livejournal.com
Wheee! See you there!

Date: 2006-01-10 08:20 pm (UTC)
From: [identity profile] bellacrow.livejournal.com
*hugs*

hey, are you going to be at Greyhaven Sat for the Greek ritual? I'm tagging along with Zoe.

Date: 2006-01-10 09:42 pm (UTC)
From: [identity profile] lwood.livejournal.com
Mmmmmmaybe...

-- Lorrie

Date: 2006-01-10 08:34 pm (UTC)
From: [identity profile] walkyrja.livejournal.com
Sweetling?

Could you get in and change my email over t' the comcast account? We have a Rede meeting and I *need* access...

Date: 2006-01-10 09:43 pm (UTC)
From: [identity profile] lwood.livejournal.com
Turned mail receipt ON for your comcast addy.

-- Lorrie

Date: 2006-01-11 04:12 pm (UTC)
From: [identity profile] walkyrja.livejournal.com
Thankee kindly!

Ouch

Date: 2006-01-10 09:48 pm (UTC)
From: [identity profile] trogula.livejournal.com
Good luck with the rebuild! Feel free to call/email me if you want/need to bounce rebuild ideas off of me - would be a good time to implement things that weren't implemented before.

Heh - I've done stupid sysadmin tricks like this before. The worst one I ever did: In my first position as a professional sysadmin, we were using Novell 3.1 as the OS on the accounting server. I was applying security patches and updates to the system, and wanted to make a copy of the bindery.

Only I didn't make a copy. I moved it. Out from under the server. While it was running. With no backups (the updates were in part to solve a problem we were having with the backups). I killed the move midway, but suddenly, no one could log into the server.

Novell support told me I had to reinstall - not doing that. I actually had to use tools from a cracker site to hack into the system and restore the bindery. It took all night, but I was finally able to recover.

*whew*

Experiences like that make you properly paranoid. That's how I tell someone has good sysadmin experience when I'm interviewing someone - the level of paranoia and immediate resistance to change before everything is mapped out.

Re: Ouch

Date: 2006-01-10 09:59 pm (UTC)
From: [identity profile] lwood.livejournal.com
I'm usually smarter than that, which is why the Number One fuckup was seven years ago.

-- Lorrie

Re: Ouch

Date: 2006-01-10 10:16 pm (UTC)
From: [identity profile] trogula.livejournal.com
heheheh - me too. I remember that my fatal mistake when I killed the server was fat fingering the mouse. The copy job was using Windows Explorer, and the 'cut' and 'copy' options were next to each other.

This experience actually got me working with command line utilities, which ultimately got me into various and sundry Unix OSes, where I have happily lived ever since. So in a way, it was a Good Thing (tm).

Re: Ouch

Date: 2006-01-10 10:13 pm (UTC)
From: [identity profile] lwood.livejournal.com
Got IM? Mine's in my userinfo. /var/lib/dpkg, where the Debian package info is, is intact. I'm thinking that I can boot from the Debian net installer to bootstrap just enough stuff on there for it to read the package list, which would be most of an effective recovery in one fell swoop.
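As a rough sketch of the "read the package list" step: the dpkg database keeps one stanza per package in `/var/lib/dpkg/status`, and the installed ones carry the line `Status: install ok installed`. The helper name `list_installed` is made up for this example, and it assumes the `Package:` line comes before the `Status:` line in each stanza. Since awk was probably among the casualties on the damaged system, this assumes it runs from the installer or a rescue environment with the old disk mounted.

```shell
#!/bin/sh
# Hypothetical helper: list_installed [STATUS_FILE]
# Prints the name of every package whose dpkg status is "install ok installed".
list_installed() {
    # Default to the usual location; pass another path when the old
    # filesystem is mounted somewhere else (e.g. under a rescue mount point).
    status_file=${1:-/var/lib/dpkg/status}

    awk '
        # Remember the most recent package name seen in this stanza.
        /^Package: / { pkg = $2 }
        # Print it only if the stanza marks the package as fully installed.
        /^Status: install ok installed$/ { print pkg }
    ' "$status_file"
}
```

On a system where dpkg itself still works, `dpkg --get-selections` reports much the same information, so the function only matters while the tools are missing.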

I've saved off a tarball with the contents of /etc (unharmed) and put it in an fs that (shouldn't) be affected by the rebuild, which will preserve my postfix config and uids/passwds. The apache config was in /home and wasn't affected. What else can you think of?

Ooh, the specifics of the network config, that's important. *writes that down*

The dicey thing about a reboot is that I'll lose ssh at that point; the memory-resident binary survived the disk falling away beneath it.

-- Lorrie

Re: Ouch

Date: 2006-01-10 10:30 pm (UTC)
From: [identity profile] trogula.livejournal.com
Debian system, huh? Let me get onto one of those bad boys so I can look and see how the fs is laid out (I've been using RHEL/WBEL/CentOS pretty much exclusively for the past two years).

Let's see..
* check /var for possible things you might need (/var/backups in particular)

* check /root for anything that might be hanging about that you forgot

* check /opt for any 3rd party installations (not likely on a Debian machine)

That's all I can think of off hand.



I have seen ext3 filesystem versioning problems with what you are attempting to do - there are some upgrades to ext3 that aren't backwards compatible, so leaving things on local filesystems you hope to mount later can be dicey. 90% of the time, it will work, but it might not. I recommend tar-ing everything up, and copying everything to a remote location or burning to CD before rebooting (given that scp is in /usr/bin, this might be a problem. Do you have a local ftp client? I can provide disk space for you if you need on a server I have).

I just added your Yahoo IM to my f-list. Approve the add, and message me for my cell number if you need either an scp or ftp based account to stuff things to and from. I would back up everything that wasn't killed, attempt to recover, and if you can't - wipe and full reinstall.

Re: Ouch

Date: 2006-01-10 10:50 pm (UTC)
From: [identity profile] lwood.livejournal.com
* check /var for possible things you might need (/var/backups in particular)

/var tarred and moved to high ground. Copied/pasted /etc/[passwd|shadow] to text files on my Mac.

* check /root for anything that might be hanging about that you forgot

/root doesn't have anything helpful in it, nor /boot.

* check /opt for any 3rd party installations (not likely on a Debian machine)

That's /usr/local here, but /usr/local was part of the carnage. Happily I had Apache in /home/apache, as /home is my free space hog, or all the websites would have been blown away (they have offsite backups as well as a select number of the ~'s)

I have seen ext3 filesystem versioning problems with what you are attempting to do - there are some upgrades to ext3 that aren't backwards compatible, so leaving things on local filesystems you hope to mount later can be dicey. 90% of the time, it will work, but it might not.

There was a kablooie of that scale when we went from Debian 3.0 to 3.1 -- when it updated everything, it changed things around successfully, but didn't rerun lilo after fiddling with fiddly boot bits. Duh. I'll be putting 3.1 on top of 3.1, so there "shouldn't" be any weird fs issues, but I'll definitely be running lilo before booting from hdd. 8-P

I recommend tar-ing everything up, and copying everything to a remote location or burning to CD before rebooting (given that scp is in /usr/bin, this might be a problem. Do you have a local ftp client? I can provide disk space for you if you need on a server I have).

/usr/bin, /usr/local, et al are gone, and with them ftp, scp, etc etc. I can copy and paste text files around, I could conceivably burn CDs and walk them across the room, but that's the only way I have of getting non-text files in or out just now.

I just added your Yahoo IM to my f-list. Approve the add, and message me for my cell number if you need either an scp or ftp based account to stuff things to and from. I would back up everything that wasn't killed, attempt to recover, and if you can't - wipe and full reinstall.

I didn't get your add req. Odd...

-- Lorrie

Date: 2006-01-10 10:02 pm (UTC)
From: [identity profile] ibm.livejournal.com
doh! Glad to hear it's a Learning Experience, not a "Backups? What backups?" event.

I still recall taking Vento down by cutpasting a bash forkbomb into a terminal window instead of IRC, as intended... part of me was impressed that I could do that from a userland account, the other was pondering what level of deniability I needed.

Date: 2006-01-10 10:24 pm (UTC)
ardaniel: photo of Ard in her green hat (Default)
From: [personal profile] ardaniel
I was more amused than annoyed, since I just had to reboot it to clear the process table. :)

Date: 2006-01-10 10:04 pm (UTC)
From: [identity profile] ibm.livejournal.com
Hah! I just realized I'm still tunneling traffic via SOCKS4 over SSH through lorien at this moment. impressive, that... I should probably undo that for a bit, eh.

worst sysadmin screwups

Date: 2006-01-10 11:06 pm (UTC)
From: [identity profile] gnowun.livejournal.com
My worst one was tarring the client's international financial database onto ITSELF. BOOM, implosion, data gone. The INTENT was to make a backup of the database to move over onto a new test system with the updated versions of both the database software and the new (then) HPUX 10. The timing was also bad as it was just before quarter end. Backup from tape was restored in just about an hour, but they still lost about 8 hours of data entry. I developed a healthy paranoia of command line parameters thereafter.
