Logging HTTP requests in Go

Krzysztof Kowalczyk
Aug 30 · 7 min read

If you're running HTTP servers you probably want to log HTTP requests.

They are helpful for diagnosing problems (which requests fail? how many requests per day do we serve? which requests are slow?).

They are necessary for analytics (which pages are popular? where do website visitors come from?).

This article describes everything you need to know to log HTTP requests in your own Go web servers.

This is not about a re-usable library but about the things you need to know to implement your own solution; it's a description of my logging choices.

You can see how it all comes together in a sample app: https://github.com/essentialbooks/books/tree/master/code/go/logging_http_requests

I use this logging system in production in the OnePage web service.

What information to log

To show what information is typically logged, here's a sample log entry in Apache's Extended Log File Format:

111.222.333.123 HOME - [01/Feb/1998:01:08:39 -0800] "GET /bannerad/ad.htm HTTP/1.0" 200 198 "http://www.referrer.com/bannerad/ba_intro.htm" "Mozilla/4.01 (Macintosh; I; PPC)"

We have:

  • 111.222.333.123 : IP address of the client making the HTTP request
  • HOME : domain (for the case of serving multiple domains from the same web server)
  • - : user authentication (empty in this case)
  • [01/Feb/1998:01:08:39 -0800] : time when the request was recorded
  • "GET /bannerad/ad.htm HTTP/1.0" : HTTP method, URL and protocol type
  • 200 : HTTP status code. 200 means the request was handled successfully
  • 198 : size of the response in bytes
  • "http://www.referrer.com/bannerad/ba_intro.htm" : referer
  • "Mozilla/4.01 (Macintosh; I; PPC)" : the so-called user agent, identifying the HTTP client (most likely a web browser)

We can log more information or choose to not log some of the above information.

Personally:

  • I also log how long the server spent generating the response, in milliseconds (that's enough resolution for me; recording microseconds is possible but overkill)
  • I don't log the protocol (like HTTP/1.0)
  • a server usually serves a single purpose, so there's no need to log the domain
  • if the service has user authentication, I also log the user id

Getting information to log

The signature of Go's standard HTTP serving function is:

func(w http.ResponseWriter, r *http.Request)

We'll add logging as so-called middleware, which is a way to add re-usable pieces of functionality into the HTTP serving pipeline.

We have a logRequestHandler function that takes an http.Handler interface as an argument and returns another http.Handler that wraps the original handler and adds logging:

func logRequestHandler(h http.Handler) http.Handler {
	fn := func(w http.ResponseWriter, r *http.Request) {

		// call the original http.Handler we're wrapping
		h.ServeHTTP(w, r)

		// gather information about request and log it
		uri := r.URL.String()
		method := r.Method
		// ... more information
		logHTTPReq(uri, method, ....)
	}

	// http.HandlerFunc wraps a function so that it
	// implements http.Handler interface
	return http.HandlerFunc(fn)
}

We can nest middleware handlers on top of each other, each adding some functionality.

Here's how we would use it to add logging to all request handlers:

func makeHTTPServer() *http.Server {
	mux := &http.ServeMux{}
	mux.HandleFunc("/", handleIndex)
	// ... potentially add more handlers

	var handler http.Handler = mux
	// wrap mux with our logger middleware
	handler = logRequestHandler(handler)
	// ... potentially add more middleware handlers

	srv := &http.Server{
		ReadTimeout:  120 * time.Second,
		WriteTimeout: 120 * time.Second,
		IdleTimeout:  120 * time.Second, // introduced in Go 1.8
		Handler:      handler,
	}
	return srv
}

First, let's define a struct to package all the logged information:

// HTTPReqInfo describes info about an HTTP request
type HTTPReqInfo struct {
	// GET etc.
	method string
	uri string
	referer string
	ipaddr string
	// response code, like 200, 404
	code int
	// number of bytes of the response sent
	size int64
	// how long it took to serve the request
	duration time.Duration
	userAgent string
}

Here's the full implementation of logRequestHandler:

func logRequestHandler(h http.Handler) http.Handler {
	fn := func(w http.ResponseWriter, r *http.Request) {
		ri := &HTTPReqInfo{
			method: r.Method,
			uri: r.URL.String(),
			referer: r.Header.Get("Referer"),
			userAgent: r.Header.Get("User-Agent"),
		}

		ri.ipaddr = requestGetRemoteAddress(r)

		// this runs handler h and captures information about
		// HTTP request
		m := httpsnoop.CaptureMetrics(h, w, r)

		ri.code = m.Code
		ri.size = m.BytesWritten
		ri.duration = m.Duration
		logHTTPReq(ri)
	}
	return http.HandlerFunc(fn)
}

Let's go over the easy cases:

  • r.Method returns the HTTP method, like "GET", "POST" etc.
  • r.URL is the parsed URL, like /getname?id=5, and String() gives us its string representation
  • r.Header is a struct representing HTTP headers. Headers contain, among other things, Referer and User-Agent
  • to log how long it takes the server to serve the request, we'd remember timeStart at the beginning, call the handler and get the duration with time.Since(timeStart); httpsnoop does this for us and returns it as m.Duration

Other information is harder to get.

The problem with getting the IP address of the client is that there could be an HTTP proxy involved. The client makes a request to the proxy, and the proxy makes a request to us. As a result, we get the IP address of the proxy, not the client.

Because of that, proxies usually augment the request to us with information about the client's real IP address, using the X-Real-Ip or X-Forwarded-For HTTP headers.

Here's how to extract this info:

// Request.RemoteAddr contains the port, which we want to remove i.e.:
// "[::1]:58292" => "[::1]"
func ipAddrFromRemoteAddr(s string) string {
	idx := strings.LastIndex(s, ":")
	if idx == -1 {
		return s
	}
	return s[:idx]
}

// requestGetRemoteAddress returns ip address of the client making the request,
// taking into account http proxies
func requestGetRemoteAddress(r *http.Request) string {
	hdr := r.Header
	hdrRealIP := hdr.Get("X-Real-Ip")
	hdrForwardedFor := hdr.Get("X-Forwarded-For")
	if hdrRealIP == "" && hdrForwardedFor == "" {
		return ipAddrFromRemoteAddr(r.RemoteAddr)
	}
	if hdrForwardedFor != "" {
		// X-Forwarded-For is potentially a list of addresses separated with ","
		parts := strings.Split(hdrForwardedFor, ",")
		for i, p := range parts {
			parts[i] = strings.TrimSpace(p)
		}
		// TODO: should return first non-local address
		return parts[0]
	}
	return hdrRealIP
}

Capturing the ResponseWriter status code and the size of the response is the trickiest part.

http.ResponseWriter doesn't give us this information. Luckily, it's a simple interface:

type ResponseWriter interface {
    Header() Header
    Write([]byte) (int, error)
    WriteHeader(statusCode int)
}

It's possible to write an implementation of this interface that wraps the original and records the information we want. Lucky for us, someone has already done it in the httpsnoop package.

Format of log file

The Apache log format is compact and human-readable, but rather hard to parse.

At some point we also need to read the logs for analysis and I don't relish the idea of implementing a parser for this format.

From the point of view of implementation, one easy way would be to log each record in JSON, separated by newline.

What I don't like about this approach is that JSON is not easy for humans to read.

As a middle-ground, I've created siser library which implements a serialization format that is extensible, easy to implement and human readable. It's well suited for logging structured information and I use it in multiple projects.

Here's how a single request is serialized:

171 1567185903788 httplog
method: GET
uri: /favicon.ico
ipaddr: 204.14.239.58
code: 404
size: 758
duration: 0
ua: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:68.0) Gecko/20100101 Firefox/68.0

The first line of each record contains this information:

  • 171 is the size of the record data that follows. Knowing the size in advance allows for a safe and efficient implementation
  • 1567185903788 is a timestamp in Unix format (number of milliseconds since the Epoch). It allows us to skip recording a redundant timestamp in the data
  • httplog is the type of the record. This allows us to write different kinds of records to the same file. In our case, all records are the same type.

What follows is data in a key: value format.

Here's how we serialize a record and write it to log file:

var (
	muLogHTTP sync.Mutex
)

func logHTTPReq(ri *HTTPReqInfo) {
	var rec siser.Record
	rec.Name = "httplog"
	rec.Append("method", ri.method)
	rec.Append("uri", ri.uri)
	if ri.referer != "" {
		rec.Append("referer", ri.referer)
	}
	rec.Append("ipaddr", ri.ipaddr)
	rec.Append("code", strconv.Itoa(ri.code))
	rec.Append("size", strconv.FormatInt(ri.size, 10))
	durMs := ri.duration / time.Millisecond
	rec.Append("duration", strconv.FormatInt(int64(durMs), 10))
	rec.Append("ua", ri.userAgent)

	muLogHTTP.Lock()
	defer muLogHTTP.Unlock()
	_, _ = httpLogSiser.WriteRecord(&rec)
}

Rotating logs daily

I usually deploy to Ubuntu servers and place logs in the /data/<service-name>/log directory.

We can't write to the same log file forever. Eventually we would run out of space.

For long term logs I usually have a log file per day, named by date e.g. 2019-09-23.txt, 2019-09-24.txt etc.

This is sometimes called log rotation.

To avoid re-implementing this functionality multiple times, I wrote the dailyrotate library.

It implements Write, Close and Flush methods, so it's easy to plug into existing code that uses io.Writer.

You specify a directory to use and a format for the name of the log file. The format is given to Go's time formatting function. I usually use 2006-01-02.txt, as this generates a unique name for each day, sorts by day, and .txt is a hint to tools that this is a text, not binary, file.

You then write to it as you would to a regular file, and the code takes care of creating a new file each day.

You can also provide a callback to be notified when rotation happens so that you can, for example, upload the just closed file to online storage for archival or run analytics on it.

Here's the code:

func onClose(path string, didRotate bool) {
  fmt.Printf("we just closed a file '%s', didRotate: %v\n", path, didRotate)
  if !didRotate {
    return
  }
  // process the just-closed file e.g. upload to backblaze storage for backup
  go func() {
    // if processing takes a long time, do it in a background goroutine
  }()
}

pathFormat := filepath.Join("dir", "2006-01-02.txt")
w, err := dailyrotate.NewFile(pathFormat, onClose)
panicIfErr(err)

_, err = io.WriteString(w, "hello\n")
panicIfErr(err)

err = w.Close()
panicIfErr(err)

Long term storage and analytics

For long-term storage I compress the log files with gzip and upload them to online storage. There are plenty of options: S3, Google Storage, Digital Ocean Spaces, BackBlaze.

I tend to use Digital Ocean Spaces or BackBlaze, as they are significantly cheaper (both in storage costs and bandwidth costs).

They both support the S3 protocol, so I use the minio-go library.

For analytics I do daily processing using my own code to generate basic summaries of the most useful information.

Another option would be to import the data into a system like BigQuery.

More Go resources

  • Essential Go is a free, comprehensive book about Go that I maintain
  • siser is my library implementing simple serialization format
  • dailyrotate is my library implementing a file that rotates daily

Go programmer for hire

If you're looking for a Go programmer for contract work, let's talk.
