How to Securely Provide a Zip Download of a S3 File Bundle

Teamwork Engineering
Teamwork Engine Room
7 min read · Aug 3, 2015

Back in 2012, we added a “Download Multiple Files” option to Teamwork Projects. However, this option depended on browser support and dumped all the files to the browser’s “downloads” folder without keeping the categories’ directory structure.

For years, we have meant to find the time to add a better ZIP download option that would download all the files in one bundle while still maintaining the defined categories’ directory structure.

Here, I outline how we built an elegant file zipper in just one night, thanks to the power of Go. Even if you don’t currently use Go (aka “Golang,” a language from Google that we are massive fans of), the mechanism we present here works with your server-side language of choice: you just run the file zipper as a microservice.

Impatient? Go grab github.com/Teamwork/s3zipper

A Streaming Solution

The standard way to provide a backup of S3 files would be to download all the files to a temp folder, zip them, and then serve up the zipped file. However, that method is slow to start for the user, takes a lot of server disk space, and requires cleanup. That’s just slow, inelegant, and messy.

What if we could stream the files to the user while zipping them on the fly? With this approach, known as “piping,” we wouldn’t have to store files, perform cleanup, or keep the user waiting for the download to start.

Well, that’s exactly what we did, in just a few hours thanks to the power of Go.

Just Show me The Bleedin’ Code

Enough of me ranting. If you’re reading this and you are like me, you want working code to try. But first, let me briefly outline how the download process and its security work:

  • Our main platform takes an API request for a zip file with a number of fileIds passed. E.g. download/zip?fileIds=83748,379473,93894
  • The platform then authenticates the user as normal and pulls the details about the files from our database.
  • It then creates a unique download reference string and puts an array with descriptions of the files into Redis with the reference string as a key, e.g. “zip:gdi63783hdhA73”. The file descriptions include the file name, folder path, and s3 file path. The key is set to timeout after five minutes.
  • We simply redirect the user to the s3 file zipper passing along the reference string. E.g., zipper.teamwork.com?ref=gdi63783hdhA73

The s3 file zipper itself doesn’t have to perform security. If a key exists, it is happy to proceed. It just receives a request with a reference string, asks Redis for the corresponding files, and starts pulling them from S3 while simultaneously zipping blocks and sending them to the client. It’s just a dumb beautiful machine. Here’s the code:

package main

import (
    "archive/zip"
    "encoding/json"
    "errors"
    "fmt"
    "io"
    "log"
    "net/http"
    "os"
    "regexp"
    "strconv"
    "strings"
    "time"

    "github.com/AdRoll/goamz/aws"
    "github.com/AdRoll/goamz/s3"
    redigo "github.com/garyburd/redigo/redis"
)
type Configuration struct {
    AccessKey          string
    SecretKey          string
    Bucket             string
    Region             string
    RedisServerAndPort string
    Port               int
}

var config = Configuration{}
var aws_bucket *s3.Bucket
var redisPool *redigo.Pool
func main() {
    configFile, err := os.Open("conf.json")
    if err != nil {
        panic("Error opening conf.json")
    }
    decoder := json.NewDecoder(configFile)
    err = decoder.Decode(&config)
    if err != nil {
        panic("Error reading conf")
    }
    initAwsBucket()
    InitRedis()
    fmt.Println("Running on port", config.Port)
    http.HandleFunc("/", handler)
    http.ListenAndServe(":"+strconv.Itoa(config.Port), nil)
}
func initAwsBucket() {
    now := time.Now()
    var dur time.Duration = time.Hour * 1
    expiration := now.Add(dur)
    auth, err := aws.GetAuth(config.AccessKey, config.SecretKey, "", expiration) // "" = token, which isn't needed
    if err != nil {
        panic(err)
    }
    aws_bucket = s3.New(auth, aws.GetRegion(config.Region)).Bucket(config.Bucket)
}
func InitRedis() {
    redisPool = &redigo.Pool{
        MaxIdle:     10,
        IdleTimeout: 1 * time.Second,
        Dial: func() (redigo.Conn, error) {
            return redigo.Dial("tcp", config.RedisServerAndPort)
        },
        TestOnBorrow: func(c redigo.Conn, t time.Time) (err error) {
            _, err = c.Do("PING")
            if err != nil {
                panic("Error connecting to redis")
            }
            return
        },
    }
}
// Strip characters that are unsafe in file names
var makeSafeFileName = regexp.MustCompile(`[#<>:"/\|?*\\]`)

type RedisFile struct {
    FileName string
    Folder   string
    S3Path   string
    // Optional - we use these at Teamwork.com, but feel free to remove
    FileId      int64
    ProjectId   int64
    ProjectName string
}
func getFilesFromRedis(ref string) (files []*RedisFile, err error) {
    // Testing - enable to test. Remove later.
    if 1 == 0 && ref == "test" {
        files = append(files, &RedisFile{FileName: "test.zip", Folder: "", S3Path: "test/test.zip"}) // Edit and duplicate this line to test
        return
    }
    redis := redisPool.Get()
    defer redis.Close()
    // Get the value from Redis
    result, err := redis.Do("GET", "zip:"+ref)
    if err != nil || result == nil {
        err = errors.New("Reference not found")
        return
    }
    // Decode the JSON
    var resultByte []byte
    var ok bool
    if resultByte, ok = result.([]byte); !ok {
        err = errors.New("Error reading from redis")
        return
    }
    err = json.Unmarshal(resultByte, &files)
    if err != nil {
        err = errors.New("Error decoding files redis data")
    }
    return
}
func handler(w http.ResponseWriter, r *http.Request) {
    start := time.Now()
    // Get "ref" URL param
    refs, ok := r.URL.Query()["ref"]
    if !ok || len(refs) < 1 {
        http.Error(w, "S3 File Zipper. Pass ?ref= to use.", 500)
        return
    }
    ref := refs[0]
    // Get "downloadas" URL param
    downloadas, ok := r.URL.Query()["downloadas"]
    if ok && len(downloadas) > 0 {
        downloadas[0] = makeSafeFileName.ReplaceAllString(downloadas[0], "")
        if downloadas[0] == "" {
            downloadas[0] = "download.zip"
        }
    } else {
        downloadas = append(downloadas, "download.zip")
    }
    files, err := getFilesFromRedis(ref)
    if err != nil {
        http.Error(w, "Access Denied (Link has probably timed out)", 403)
        log.Printf("Link timed out. %s\t%s", r.Method, r.RequestURI)
        return
    }
    // Start processing the response
    w.Header().Add("Content-Disposition", "attachment; filename=\""+downloadas[0]+"\"")
    w.Header().Add("Content-Type", "application/zip")
    // Loop over files, adding each to the zip
    zipWriter := zip.NewWriter(w)
    for _, file := range files {
        // Build a safe file name
        safeFileName := makeSafeFileName.ReplaceAllString(file.FileName, "")
        if safeFileName == "" { // Unlikely but just in case
            safeFileName = "file"
        }
        // Read file from S3, log any errors
        rdr, err := aws_bucket.GetReader(file.S3Path)
        if err != nil {
            switch t := err.(type) {
            case *s3.Error:
                if t.StatusCode == 404 {
                    log.Printf("File not found. %s", file.S3Path)
                }
            default:
                log.Printf("Error downloading \"%s\" - %s", file.S3Path, err.Error())
            }
            continue
        }
        // Build a good path for the file within the zip
        zipPath := ""
        // Prefix project Id and name, if any (remove if you don't need)
        if file.ProjectId > 0 {
            zipPath += strconv.FormatInt(file.ProjectId, 10) + "."
            // Build a safe project name
            file.ProjectName = makeSafeFileName.ReplaceAllString(file.ProjectName, "")
            if file.ProjectName == "" { // Unlikely but just in case
                file.ProjectName = "Project"
            }
            zipPath += file.ProjectName + "/"
        }
        // Prefix folder name, if any
        if file.Folder != "" {
            zipPath += file.Folder
            if !strings.HasSuffix(zipPath, "/") {
                zipPath += "/"
            }
        }
        // Prefix file Id, if any
        if file.FileId > 0 {
            zipPath += strconv.FormatInt(file.FileId, 10) + "."
        }
        zipPath += safeFileName
        // We have to set a special flag so zip readers recognise UTF-8 file names
        // See http://stackoverflow.com/questions/30026083/creating-a-zip-archive-with-unicode-filenames-using-gos-archive-zip
        h := &zip.FileHeader{Name: zipPath, Method: zip.Deflate, Flags: 0x800}
        f, _ := zipWriter.CreateHeader(h)
        io.Copy(f, rdr)
        rdr.Close()
    }
    zipWriter.Close()
    log.Printf("%s\t%s\t%s", r.Method, r.RequestURI, time.Since(start))
}
View S3Zipper on GitHub

It’s extremely fast, uses little memory, and can handle thousands of simultaneous requests. It’s also secure (auth is done elsewhere and keys time out) and very simple. After years of wanting to get this feature done, it was just one long night’s work thanks to the power of Go and some of its fantastic open source and internal libraries.

You’ll see some voodoo where the zip file header is created (the Flags: 0x800 bit); this was added to provide UTF character support for our many international customers.

Testing

If you want to quickly test this:

  • Enable the test block at the top of getFilesFromRedis (and edit the test entry to point at a file that exists in your bucket)
  • Run the service and open /?ref=test in your browser

Your files should download as a Zip file instantly.

Moving to Production

Now, you just need to get this running on a server, have your server-side language put the file definitions into Redis, and then redirect the user to the microservice.

Setting up the S3Zipper Microservice

  • Fire up a new EC2 Ubuntu server (I went with [S3Type and Ubuntu image]).
  • Install Go. Do not install Go via apt-get; at the time of writing, that is an outdated version of Go. Install Go from source (tutorial)
  • Create a new user to run the service under.
  • Clone our repo onto the server
  • Create your config file
  • Test the script with go run s3zipper.go (run “go get” first to get libraries)
  • Run as a service using the upstart script below
Upstart Config

Copy this upstart script to /etc/init to run s3zipper as a service:

s3zipper.conf
description "S3 File Zipping Server"
author "[You]"
start on started mountall
stop on shutdown
respawn
respawn limit 99 5
script
export HOME="/home/USERX"
export GOPATH="/home/USERX/go"
export GOROOT="/home/USERX/.gvm/gos/go1.4.2"
export PATH=$PATH:$GOPATH/bin:$GOROOT/bin
ulimit -n 50000
cd /full/path/to/s3zipper
exec setuidgid USERX go run s3zipper.go >> /var/log/s3zipper.log 2>&1
end script
Replace USERX with your new user, set GOPATH and GOROOT correctly, and fix up /full/path/to. If this is new to you, see Upstart - Getting Started.

Serving up the Zip Download

You’ll need to make a server-side call in your language of choice that will:

  • Authenticate the user (as normal)
  • Generate a unique random reference code
  • Put the details about the files to download into Redis with the key “zip:[ref]” (timeout 5 mins). Note that files must be in the format:

```
[{"S3Path": "path", "FileName": "sample.txt", "Folder": "folder"}, …]
```

  • Redirect the user to the microservice

Sample:

```javascript
function downloadZip(fileIds) {
  Test_logged_in()
  files = Lookup_file_details_or_panic()
  // Encode files and save to redis
  json = JSON.encode(files)
  ref = Generate_Random_Ref()
  redis.set(key="zip:"+ref, value=json, expiry=300)
  // Redirect the user to the S3Zipper
  RedirectUser("https://zipperURL/?ref=" + ref)
}
```

I Hope that Works for You

If you have any questions, just let us know in the comments below. I hope somebody somewhere finds this code useful; if you do, please say hello (or come work with us). Enjoy!
