Engineering lab notes - what I'm working on

Krzysztof Kowalczyk
Sep 16 · 6 min read · 167 views
This is an experiment in transparency. A log of things I work on. Freshest entries at the top.
Chances are it won't last, so get it while it's fresh.
For background: I'm Chris and I work on software projects.

I really like Svelte. In presstige.io I wrote 2 interactive web pages in plain JavaScript.
I spent one day rewriting them in Svelte and the result is so much better.
Generating HTML in JavaScript is awkward, and keeping the display synchronized with state is tedious and error-prone.
Svelte (like other frameworks such as React or Vue) is magical: the code is much simpler, and the framework does optimizations I wouldn't bother implementing, so the result is also faster.

Recent improvements in presstige.io:
  • added a fairly big feature: collections. You can group pages into collections. The idea behind collections is to help readers find interesting articles
  • rewrote the /dashboard and /u/ pages, which were mostly generated with Go templates and plain JavaScript, as a SPA using Svelte. The more interaction a page has, the more sense it makes to use Svelte
As a side note: Svelte is great. It's an improvement over React (which is also great).
I'm giddy when I implement things quickly with Svelte.

To program faster, I try to increase iteration speed.
An example of that from the notionapi project: I wrote a script that saves all HTTP requests and responses during a browsing session.
It uses the puppeteer library, which drives an instance of Chrome and lets you hook into network requests.
Working on an API wrapper is mostly about reverse engineering the HTTP API.
One way to do it is to use Dev Tools in Chrome and inspect requests and responses in the UI. It works but it's not ideal: the UI isn't focused on that specific scenario.
By writing to a file, I have a permanent record of the whole interaction.
I can look at it in any text editor.
To make things easier for myself, I pretty-print JSON data for better readability.
I can even add code to analyze the structure of the responses.

A fair number of improvements to the https://github.com/kjk/notionapi library.
I found 2 forks that did some work but didn't submit PRs, so I looked at their changes and merged most of them.
I also did some work on HTML conversion: support for colors in tags, support for collection_view_page as the root page, and correct formatting of dollar values in table cells.

Recent improvements in presstige.io:
  • improvements to /dashboard page
    • per-page pop-up menu with basic actions
    • show page view count
  • implemented unpublishing of pages
  • implemented rendering of breadcrumbs Notion block
And bugfixes.

Project: presstige.io
Behind-the-scenes changes to how we store information about users.
On one hand, it feels like I did a decent amount of work.
On the other hand, none of that is visible to users.

Project: presstige.io
Added an option to unpublish a page. Unpublishing deletes all data for the page.

Improved the publishing flow. Added a /publish/ page which guides the user through publishing and provides feedback.
[Screenshot: feedback when the Notion page ID is not correct]
[Screenshot: feedback for a correct page]
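Giving useful feedback starts with checking whether the input even looks like a Notion page ID. Notion page IDs are UUIDs: 32 hex characters, written with or without dashes (for example `8c7286a684ae4c32a1b20142b5576748`). A minimal Go sketch, assuming a hypothetical `normalizePageID` helper that only checks the format, not whether the page exists or is public:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// normalizePageID is a hypothetical helper: it accepts a Notion page ID either as
// 32 hex characters or in dashed UUID form (8-4-4-4-12) and returns the
// 32-character lowercase form, or an error message the UI can show to the user.
func normalizePageID(s string) (string, error) {
	s = strings.TrimSpace(s)
	if len(s) == 36 {
		if s[8] != '-' || s[13] != '-' || s[18] != '-' || s[23] != '-' {
			return "", errors.New("malformed dashed page id")
		}
		s = strings.ReplaceAll(s, "-", "")
	}
	if len(s) != 32 {
		return "", fmt.Errorf("page id must have 32 hex characters, got %d", len(s))
	}
	s = strings.ToLower(s)
	for _, c := range s {
		if !(c >= '0' && c <= '9' || c >= 'a' && c <= 'f') {
			return "", fmt.Errorf("invalid character %q in page id", c)
		}
	}
	return s, nil
}

func main() {
	inputs := []string{
		"8c7286a684ae4c32a1b20142b5576748",
		"8c7286a6-84ae-4c32-a1b2-0142b5576748",
		"not a page id",
	}
	for _, in := range inputs {
		id, err := normalizePageID(in)
		if err != nil {
			fmt.Printf("%q: error: %v\n", in, err)
			continue
		}
		fmt.Printf("%q: ok, %s\n", in, id)
	}
}
```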

Fixed a small issue in notionapi: the title property was shown twice in tables.
Notion tables are slightly weird in that they always have a title property, which is a link to a page with the data.
This is fine, except it doesn't seem possible to have a view that hides the title column, which makes it awkward to use as a regular table.

One of the features I plan to add to presstige.io is fully custom templates, so that people can completely customize how their published content looks.
To make it possible for people to test their templates, I'm starting work on presstige_preview, a command-line app.
The idea is that you can cd to the directory with theme files and run presstige_preview ${pageID} to see how the page will be rendered in that theme.

Who's monitoring Twitter?
I'm working on presstige.io, a way to easily publish articles on the web.
By looking at HTTP logs I can learn some things. One of them is: who's monitoring Twitter?
My methodology is simple: I post a tweet with a link to an article and see who accesses the article shortly after.
Requests without a referer field are most likely bots spidering new content.
The bots I noticed are:
  • TrendsmapResolver (user agent Mozilla/5.0 (compatible; TrendsmapResolver/0.1))
  • Twitterbot/1.0
  • Applebot (user agent Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Applebot/0.1; +http://www.apple.com/go/applebot))
  • Nuzzel
  • alexa.com (user agent Mozilla/5.0 (compatible; ia_archiver/1.0; +http://www.alexa.com/help/webmasters; crawler@alexa.com))
  • some entity with user agent Mozilla/5.0 (X11; U; Linux i686; hu; rv:1.8.1.2) Gecko/20070220 Firefox/52.0.1
  • paper.li (user agent Mozilla/5.0 (compatible; PaperLiBot/2.1; https://support.paper.li/entries/20023257-what-is-paper-li))
  • some nondescript bot with user agent python-requests/2.18.4

The joys of reverse engineering.
I wrote notionapi, a library that allows accessing Notion content from Go and, among other things, converting pages to HTML.
It's all based on reverse engineering and works quite well, except when it encounters pages with corner cases that I haven't yet implemented. Page 8c7286a684ae4c32a1b20142b5576748 is one such page.
And I'm fixing it.

A minor server emergency. Quite by accident, I visited my web app http://www.apptranslator.org/. It wasn't responding.
I tried to ssh in. That didn't work either.
I looked at the Digital Ocean dashboard and it showed CPU at 100%. Was it my code? Was it the system? No idea, and since I couldn't log in, I didn't find out.
Luckily, a reboot fixed the problem.
I should have had monitoring, which I have now added (I use https://uptimerobot.com/ and it's OK).
AppTranslator is a small app. I want to rewrite it to run on Google Cloud Run with a Firestore backend, as it's a perfect fit for that. I haven't found time for it yet.

I like to think I'm pretty good at not prematurely optimizing my code. Simplicity first.
And yet I just replaced over-optimized code with a simpler version.
In Presstige I have a https://presstige.io/populartoday page which lists the pages with the most views today.
The stats are kept in a map[string]pageViewStats and the code in question picks the N (say, 25) pages with the largest view counts.
I figured that once I have lots of pages, this might take a while, so I opted for a somewhat complicated approach: a working array of up to N entries that keeps track of the lowest view count in the array. That way, when I traverse the map, I can skip pages with a lower count.
The implementation worked and was deployed to production.
After a bit of thought I replaced it with a much simpler implementation: construct an array of pageViewStats for all pages, sort it by count and take the top N.
The server I'm using has 4 GB of RAM and the whole app currently uses under 300 MB.
Even if I have millions of pages (which is a long way off at best and never at worst), it would require only a few megabytes of temporary memory, and sorting would probably be plenty fast.
Currently the simple version takes only a few milliseconds, so there's plenty of room to grow before I'll need a more sophisticated solution.
