Wednesday, March 7, 2012

"Mysql server has gone away" or "Lost connection to mysql server" when using Amazon RDS for for backup or whatev's

Hey,
OMFG what a PITA.
You run long MySQL jobs, especially on an Amazon RDS instance and especially backups, and they randomly fail; particularly as the db gets bigger, and particularly after several hours.
Well FML!
The answer is all over Google; it's either

a) [most common] the "net_write_timeout" variable on the source server (i.e. the one you're dumping from). The AWS default is around 60s, which is sensibly cautious if your clients are some crashy bullshit (hanging result sets/cursors are obviously expensive to keep lying around). 60s is fine for a web app, but if your client occasionally/randomly stalls for a really long time (think EBS hiccups, or an InnoDB reindex on the target when you're streaming direct via a pipe from mysqldump | mysql to do server->server backups, which can be a fine idea depending on your needs), the source server hits this timeout and drops the connection.

OR

b) [less common] the destination server that you're unpacking/writing to has timed out for similar reasons; you were in the middle of feeding it [usually a metric fuck-ton of] data in an INSERT or UPDATE, the side pushing the data got hung up on something once in a blue moon (e.g. a system backup), and boom - the mysql box you're writing to tells you to f-off.
The typical AWS timeout here (net_read_timeout) is around 30s.

Ok so there you go; the AWS defaults are sensible for a high-volume web db trying to protect itself from bad clients. Backups occasionally qualify as 'bad clients'. FML again.
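
If you want to see what you're actually up against, just ask the server (hostname/credentials here are placeholders):

mysql -h mydb.example.com -u admin -p -e "SHOW VARIABLES LIKE 'net_%_timeout';"

..which prints net_read_timeout and net_write_timeout, in seconds.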

------THE FIX:

--EITHER-- (easy, global)
a) Change your RDS instance settings (Amazon control panel) to set 'net_read_timeout' (and 'net_write_timeout') to something bigger globally, across all connections. You might pick e.g. 20 mins. If you have a lot of crappy/dropping DB connections this might be an issue, since dead connections now get to hang around much longer.
--OR-- (usually easy for client code, not for mysqldump)
b) If you're running client SQL code (e.g. PHP, Python), simply do "SET net_write_timeout=3600" or whatever (or net_read_timeout, depending on whether your problem is in SELECTs or INSERT/UPDATEs) on each connection after you open it, and bingo.
The variable is set per-connection. This works perfectly for client code but not for mysqldump backups, which I'll get to below.
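
A minimal sketch of the same idea from the command line (host/table names are made up; the SET only lives for that one connection):

mysql -h mydb.example.com -u backup -p -D mydb \
  -e "SET SESSION net_write_timeout=3600; SELECT * FROM some_huge_table;" > some_huge_table.tsv
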
--OR-- (middling, fixes mysqldump)
If your problem is with mysqldump piped dumps, e.g.

mysqldump yadda | ... | mysql yadda

(I do this using 'tee' to pipe a copy through gzip into a backup file at the same time as copying the db direct from one host to another - a compressed copy made on the fly, with no limit on db size. ..actually that is why I'm writing this..)
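
A minimal sketch of that pipeline, assuming bash (hostnames, passwords and db names are placeholders):

mysqldump -h source-host -u backup -pSECRET mydb \
  | tee >(gzip -c > /backups/mydb-`date +%F`.sql.gz) \
  | mysql -h dest-host -u backup -pSECRET mydb

The >( ) bit is bash process substitution; tee streams the dump straight into the destination server and into gzip at the same time, so the compressed backup file costs no extra pass over the data.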

The issue you see is that when the target DB stalls (which can happen for e.g. 1-10 min at bad times), the source DB times out (quickly; 60s default) on the write socket. This is not, AFAIK, fixable with stock mysqldump without changing the server's global timeouts for everyone.

The obvious fix would be for mysqldump to send "SET net_write_timeout=blah" after opening the connection it uses for dumping. Weirdly I cannot make it do that regardless of options, so I hit up the source code.

Basically I patched mysqldump so it sends "SET net_write_timeout=somebigtime" to the source server AND prefixes "SET net_read_timeout=blah" into the SQL dump it outputs.

This solves both ends of the problem, especially when piping from one server to another with mysqldump | mysql - both source and destination servers set the timeout _temporarily, on that connection only_ to a nice long value, e.g. 1hr.


Fixing mysqldump so it sends "net_write_timeout"

So I basically grabbed the MySQL source and compiled it (Google it) on an AWS box.
(actually "yum install mysql-devel ncurses-devel" may be handy, prolly a few more packages too)
You don't need all the mysql stuff, just mysqldump,
..but I compiled everything (./configure {google for opts} and make).
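
Very roughly, from memory (the version number below is a placeholder; source trees of this vintage still used ./configure, newer ones use cmake):

yum install -y gcc gcc-c++ make ncurses-devel mysql-devel
tar xzf mysql-5.1.xx.tar.gz && cd mysql-5.1.xx
./configure && make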

mysqldump lives in client/ and all the guts are in mysqldump.c
The hack is utterly trivial;

at the end of "connect_to_db" before teh DBUG_RETURN I added


#ifdef MUNKY_HACK
  /* bump the write timeout on the dump connection only, so a slow/stalled
     consumer of the dump doesn't get us kicked off by the server */
  my_snprintf(buff, sizeof(buff), "SET net_write_timeout=3600");
  if (mysql_query_with_error_report(mysql, 0, buff))
    DBUG_RETURN(1);
#endif


..which is the important bit; it tells the source server to use the longer timeout just for the dump connection.

If you're being fancy and piping from one db to another, you may also want the output dump to include a prefix telling the target db to use a longer timeout.
I added this near the top of 'dump_table' as:

/* prefix the dump so the *receiving* server relaxes its read timeout too */
fprintf(md_result_file, "/* Munky Hack */\nSET net_read_timeout=3600;\n");


It's that easy (it seems) to make this stuff run reliably.

Monday, January 23, 2012

Tedious issues escaping/quoting strings for MySQL? Use hex! It's awesome!

Hiya,
Ok sure yes, you 'escape' all your strings for SQL (i.e. replace `'` with \' and so on for the other special/nonprintable characters).
Sometimes this is unsuitable.

Did you know you can avoid all that palaver and just pass your strings as hex?

Check it out!

For a varchar (or blob, or whatev) column, instead of

INSERT INTO mytable SET col='blah';

just try

INSERT INTO mytable SET col=0x626c6168;

How awesome is that? Who knew?
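
If you're wondering how to get the hex in the first place, anything that hex-encodes bytes will do; a couple of throwaway ways (table/column names here are made up):

printf 'blah' | xxd -p                # prints 626c6168
mysql -e "SELECT HEX('blah');"        # prints 626C6168
mysql -D mydb -e "INSERT INTO mytable SET col=0x626c6168;"

(MySQL doesn't care about the case of the hex digits, so either works.)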

Also, btw, you can get mysqldump to output blobs as hex too (the --hex-blob option), which makes for easier parsing - see the help output.

Wednesday, January 18, 2012

See Wikipedia during the January 18th blackout

Yes, yes wikipedia, we know you're all radical and cool.
Anyway, just append

?banner=none

to the end of your URL, for example:

http://en.wikipedia.org/wiki/Main_Page?banner=none

kthxbye

Friday, January 6, 2012

Obscure packetization bug in Verizon cellular HTTP proxy 'Harmony'! Does your app fail on VZW cell but work on wifi?

Wow this is a fucking bug and a half;

(On iPhone, but applies to everything)
Verizon currently appears to proxy any HTTP request over any port regardless of whether you asked for it. It adds

"X-Via: Harmony proxy"

to show how much it 'helped'. Thanks, if I wanted a proxy I'd ask for one.

It does it even on non-port 80!! FFS!
It appears that anything that says HTTP/1.1 after opening a socket on any port is fair game for fuckwittery.

And, wonderfully, Harmony Proxy has a packet reassembly bug!

The client app was (inadvertently) writing the HTTP request split into two TCP packets;
The first packet was only 22 bytes long, containing the URL
The second packet was 500 or so bytes and was the rest of the header continuing from " HTTP/1.1\r\n...etc"

Hence the very first line of the HTTP req was split into two TCP packets. This is obviously rather unusual.
The proxy freaks out at this, throws away all the HTTP headers, and just generally screws the request up (although it does send it)..

So, when using socket send(), make sure you've buffered at least the request line and the first few header lines into a single write.
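
Command-line illustration of the principle (host/path are placeholders): build the whole request line plus headers in one buffer and hand it over in one go, rather than dribbling it out.

printf 'GET /some/path HTTP/1.1\r\nHost: api.example.com\r\nConnection: close\r\n\r\n' \
  | nc api.example.com 80

One printf, one write - the request line and headers leave the app together. The network stack can still segment it however it likes, but at least you're not forcing a split 22 bytes in.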

If your stuff works over wifi and other cell networks but not VZW, and you're using HTTP...
watch out for this. It's a bug in their proxy.

Tuesday, November 9, 2010

Rewriting (proxying) a TCP stream in one line

Damn unix is great...

Step 1: In a shell, set up netcat to listen on TCP 5500:
nc -l -p 5500

Step 2: In another shell/on an intermediate machine..
nc -l -p 50000 -c 'sed -u -e s/billy/freddy/g | nc localhost 5500'

Step 3: In a third shell,
nc localhost 50000

..and start typing stuff, including the word "billy"

Ok so the third shell will talk to the first shell and replace "billy" with "freddy".
Note this is bidirectional - type in first shell and it goes back to third one (without rewrite)

How handy is that? VERY!

If you need a TCP stream (e.g. gnu debugger) intercepted and rewritten slightly.. that's yer man.

Thank you beardy unix gods from 1972!

Monday, October 18, 2010

Setting Tomcat JVM heap size as a percentage of total ram

Put at start of "bin/startup.sh"

# 75% of MemTotal (which /proc/meminfo reports in kB); int() keeps awk from emitting scientific notation on big-memory boxes
export CATALINA_OPTS="-Xmx`cat /proc/meminfo | grep MemTotal | awk '{ print int($2*0.75) }'`k"


..to use 75% of your total system memory for JVM.
Add to /bin/tomcat.sh or whatev's
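
To sanity-check what it'll pick on a given box, just echo the expansion first:

echo "-Xmx`cat /proc/meminfo | grep MemTotal | awk '{ print int($2*0.75) }'`k"

..which prints the exact -Xmx value Tomcat would get.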

Tuesday, October 12, 2010

SSL performance of Tomcat and Apache Commons HTTP

Using the latest Tomcat 6 on Java 1.6 on CentOS.

1. Tomcat's JSSE (standard) SSL option is much slower than the Apache Portable Runtime (APR) libraries.

SSL performance (using curl to upload large binaries via HTTPS to localhost on a multicore box): about 5.3 MBytes/sec using Tomcat's default SSL (100% Java on one CPU), versus >30 MBytes/sec using Tomcat with APR (basically OpenSSL), at lower CPU usage.

Conclusion: well worth the hassle of installing APR for incoming (and probably outgoing) HTTPS - roughly 5x faster SSL and less Java CPU.
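
Something along these lines reproduces the test - a big PUT against the local connector, with curl reporting the rate (URL/port and the test file are placeholders):

dd if=/dev/urandom of=big.bin bs=1M count=200
curl -k -T big.bin -o /dev/null -w '%{speed_upload} bytes/sec\n' https://localhost:8443/upload

-k because it's a self-signed localhost cert, -T makes curl do a PUT, and -w prints curl's own measured upload speed.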

2. Beware using standard libraries to upload to Rackspace Files and other HTTPS-accessed key:value stores. The Apache Commons HTTP uploader tops out at about 2 MBytes/sec on a machine where curl or wget from the command line manages 5 MBytes/sec. This was very repeatable.

3. Speed writing from a Rackspace Cloud Server to Rackspace Files using curl (i.e. running flat-out)
(Note that the different Cloud Server sizes have different bandwidth clamps as well as significantly faster CPUs.)
Uploading a 176MB object to Rackspace Files, using curl from the command line to do a PUT:
"512MB machine" : 5.02 MBytes/sec (very reliably - this is clamped by network)
"1GB machine" : 7.65 MBytes/sec (again network bound, more expensive server)
"16GB machine" : 15 MBytes/sec
CPU in these cases (for curl) is typically <10%.
In all cases, running two threads (these are multicore machines) gets exactly half the performance per connection, indicating a network throttle rather than a CPU limit.

HOWEVER if you don't use OpenSSL for your HTTPS....

Using Apache HTTP Uploader (and most likely any 'pure'-Java SSL options that are using JSSE) only gets
"512MB machine" : 2.1MBytes/sec (this is 100% CPU on that java thread)

Conclusion:
...creating outbound HTTPS connections using Java SSL gets CPU-bound really quickly - watch for this when using JSSE-based Java SSL clients.