Satya's blog - Moving things in AWS S3
Recently I had several hundred small files in an AWS S3 bucket, organized in folders by date: something like s3://bucket/2016-02-08/ (and a couple of layers deeper), with a few "directories" under each dated "directory". The sub-directories were named for the first character of the things they held, so I had up to 62 of them (A-Z, a-z, 0-9). This is a naive hash function.

I wanted to move everything into a year-based "directory": s3://bucket/2016/2016-02-08/. (Why am I quoting the word "directory"? Because they're not really directories, they're "prefixes". This is relevant if you use the AWS S3 SDK libraries.)

Moving the files via the S3 web console's cut/paste interface is slow. REALLY slow. Like, multiple-days slow. So, after trying a few other things, I pulled out the `aws` command-line tool (the AWS CLI). Since the sub-directories were single letters and digits, I could do this:

`for x in a b c d;do echo aws s3 mv --recursive s3://bucket/2016-02-08/$x s3://bucket/2016/2016-02-08/$x \&;done > scr.sh`

The `for` loop runs through a, b, c, d (case-sensitive, so different from A, B, C, D) and sets up a recursive move operation for each prefix. The move is much faster through the AWS CLI. Additionally, I background each move (using the `\&`) so the 'b's can start while the 'a's are still going, and so forth.

But I don't run the commands right away. Notice that they're being `echo`ed: I capture the output in a file, scr.sh, and then run scr.sh. Why? Partly because I can now set up a second file with e f g h to go right after the first, or even in parallel, so I have up to 4 or 8 move operations going at once. (Watch the whole thing with `watch "ps axww|grep scr"` in a separate terminal, of course.) But mainly because the `&` backgrounding interacts weirdly with the `for` loop when run directly.

With this, I was done in, well, a couple of hours. A lot of that was waiting for the last copy/paste I had started in the web console to finish.
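Putting the steps above together, here is a minimal sketch of the generate-then-run approach. The bucket name and date are placeholders for your own bucket and prefix, and the generated script is inspected before being run:

```shell
# Generate (but do not yet run) one backgrounded move command per
# single-character prefix. "bucket" and the date below are placeholders.
bucket=bucket
prefix=2016-02-08
for x in a b c d; do
  echo "aws s3 mv --recursive s3://$bucket/$prefix/$x s3://$bucket/2016/$prefix/$x &"
done > scr.sh

# Review the generated commands before running them:
cat scr.sh
# Then kick off all four moves in parallel with: sh scr.sh
```

Because the `&` is echoed into the file rather than executed inside the loop, each `aws s3 mv` ends up backgrounded only when scr.sh itself runs, which sidesteps the odd interaction between `&` and the `for` loop.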