Embedly http://blog.embed.ly Making embedding easy. posterous.com Tue, 15 May 2012 14:01:00 -0700 The Rise of Open Graph http://blog.embed.ly/open-graph http://blog.embed.ly/open-graph

42% of all URLs that Embedly processes have one or more Open Graph tags.

If you aren't familiar with Open Graph, it's the semantic metadata that Facebook introduced in 2010. Initially, it could only provide the title, image, and description for links and a few other objects, but it's been extended to power pretty much every third-party application in the stream. Yes, the special sauce that allowed Viddy and SocialCam to amass millions of users in days is Open Graph.

Recently on Quora, we were asked how Open Graph had affected us. So we added a few variables to our Statsd/Graphite setup. In this graph, the purple region represents links we've crawled that provide Open Graph metadata as a percentage of all links:

Open_graph_percent
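For anyone curious what "a few variables" in Statsd could look like, here is a minimal sketch, not our actual instrumentation. It uses the standard Statsd counter wire format (`bucket:value|c` over UDP). The bucket names `crawl.urls` and `crawl.urls.open_graph` are made up for illustration. Graphite can then plot one counter as a percentage of the other.

```python
import socket


def counter_packet(bucket, value=1):
    """Format a StatsD counter increment such as bucket:1|c."""
    return ("%s:%d|c" % (bucket, value)).encode("ascii")


def packets_for(og_tag_count):
    """Build the counter packets for one crawled URL."""
    # Bucket names are hypothetical, chosen for this sketch only.
    packets = [counter_packet("crawl.urls")]
    if og_tag_count > 0:
        packets.append(counter_packet("crawl.urls.open_graph"))
    return packets


def send(packets, host="127.0.0.1", port=8125):
    """Fire-and-forget over UDP; 8125 is StatsD's default port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for packet in packets:
            sock.sendto(packet, (host, port))
    finally:
        sock.close()
```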

Embedly's crawler doesn't go out looking for URLs. We only process URLs that have been shared through our API. You can then postulate that our Open Graph average is actually higher as the sites that are shared more are optimized to be shared in Facebook. All these graphs were generated over the last 36 hours, which is a sample size of 12 Million URLs.

Open_graph_types

By far the most popular tags are title, description, image, type and site_name. Article and Blog are the most prevalent types out there.
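If you want to check a page for these tags yourself, here is a minimal sketch using only Python's standard library. It is an illustration, not Embedly's crawler. The sample page is hypothetical, and the sketch keeps only the first value seen for each property.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..."> tags from an HTML document."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property") or ""
        if prop.startswith("og:") and "content" in attrs:
            # Keep the first value seen for each property.
            self.tags.setdefault(prop, attrs["content"])


# Hypothetical page, used only for illustration.
SAMPLE_HTML = """
<html><head>
  <meta property="og:title" content="The Rise of Open Graph">
  <meta property="og:type" content="article">
  <meta property="og:site_name" content="Embedly">
  <meta name="description" content="Not an Open Graph tag">
</head><body></body></html>
"""

parser = OpenGraphParser()
parser.feed(SAMPLE_HTML)
parser.close()
og_tags = parser.tags
has_open_graph = bool(og_tags)
```

A URL would count toward the 42% above whenever `has_open_graph` is true, that is, when at least one `og:` tag is present.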

Open_graph_video

Video is the most popular rich type. People are fairly good about setting width, height and type, but fewer have a secure_url.

Open_graph_audio

Audio is barely used. We do have far fewer audio providers than video, so this isn't a shock.

Open_graph_location

Location is used a bit more often than Audio. This one is surprising, because large providers like Foursquare don't use Open Graph location tags; instead, they have a special Facebook app syntax. We are going to look into this one a little more.

It's astonishing how quickly Facebook was able to affect metadata. Open Graph is trending up, so I assume this percentage will increase greatly over the next few years.

Open_graph_trends

The next post in this series will compare how different formats have been adopted. 

Sean

If you are interested in working with this data, check out our jobs page.

 

 

 

http://files.posterous.com/user_profile_pics/1242620/12444_538807393694_19300375_32185718_3725877_n.jpg http://posterous.com/users/4bhqr2CIva81 Sean Creeley screeley Sean Creeley
Mon, 30 Apr 2012 16:10:00 -0700 Scaling http://blog.embed.ly/scaling http://blog.embed.ly/scaling

We've had a bad couple of weeks and we are the first to admit it. For April we had 99.62% uptime, which means that we were down for 2h 45m 5s. This is by far the worst month in a while: latency has increased and we have a bunch of frustrated clients. I wanted to give you all the inside view of what's going on and what we are doing to fix it.

Defcon_3
While no one was looking Embedly has grown. Here are a few stats.

  • 1,200 URLs per second average with a peak of 4,600.
  • We will serve requests for ~2,500,000,000 URLs over the next month.
  • Team of 4 Engineers.

We on-boarded a few large clients that have quadrupled our unique traffic over the last 2 weeks and brought the pain train. Here is the nerdy, technical story of how Embedly scaled to handle the load, the tools that we used and why we went down. If you are a "Social Media Expert" or care about your Klout score, you should stop reading now.

At Embedly we measure uncached URLs per second (UUPS), as they are the bottleneck of the system. About a month ago we were doing about 25 UUPS; today that number stands between 100 and 125, with spikes up to 800 UUPS. Yes, I just made up a new measurement, but if Groupon can do it, so can I.

Before on-boarding these clients we did some initial load testing and it was very clear that our current system was not going to work. We had about 30 1 Gig boxes on Rackspace with 4 instances of a Tornado app running on each. This was excessive for traffic at the time, but we kept them up just in case. Here is what happened over the next 2 weeks.

4/11: Load Testing

Our first load test was a complete failure, but it was short, so we only fell over for a few minutes. Like any good startup, we threw more boxes at the problem. 60 1 Gig boxes here we come. To the cloud!

4/16: Load Testing

This one was sustained for a few hours and it cost Embedly about a half hour of down time. This was when we knew we were in trouble.

The first bottleneck was in the app servers that actually make requests and do the parsing. Tornado, like any async framework, is only as good as the time that you spend not blocking. Parsing large HTML documents and images means that each app blocks the IOLoop for a substantial amount of time. Because of this we were always memory bound, rather than CPU bound. Each thread could only do so much work and we couldn't push any more work through them.
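To make the blocking problem concrete, here is a minimal sketch. It uses asyncio as a stand-in for Tornado's IOLoop and a thread pool as a stand-in for separate workers; `parse_document` is a made-up placeholder for the real parsing. It is not the code we run. Threads keep the loop responsive, but they still share one interpreter, so pushing more CPU-heavy parsing through a box takes separate processes.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def parse_document(html):
    """Stand-in for slow, synchronous parsing work."""
    return len(html.split())


async def handle_request(loop, pool, html):
    # Calling parse_document(html) directly here would block the event
    # loop for the whole parse; run_in_executor keeps the loop free to
    # service other requests in the meantime.
    return await loop.run_in_executor(pool, parse_document, html)


async def main():
    loop = asyncio.get_running_loop()
    docs = ["<p>one two</p>", "<p>three four five</p>", "<p>six</p>"]
    with ThreadPoolExecutor(max_workers=4) as pool:
        return await asyncio.gather(
            *(handle_request(loop, pool, doc) for doc in docs)
        )


word_counts = asyncio.run(main())
```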

Enter ZMQ. The only way we could push more work is to create more instances of Tornado onto each box. To do this we set up frontends and workers using the PUSH/PULL pattern in ZMQ.

No one likes queues because they create single points of failure. ZMQ is a little better, but the trade off is in configuration. If you have 30 workers across 30 boxes, everyone has to know about each other. In Embedly's case, that's about 1800 ports that need to stay semi static. We drop, create and get migrated by Rackspace so often that this wasn't feasible.

Instead we opted for larger boxes that contained 8 frontends and 30 workers on each. The frontends PUSH/PULL down to the workers and the workers PUB/SUB back to the frontends. This allows us to scale quicker without worrying about notifying existing frontends that new workers are available.

pyzmq comes with built in Tornado support. A quick ioloop.install() and then ZMQ can run off the same IOLoop that Tornado is running on.

4/19: Jimbo

Once we deployed this fix we were able to keep up with the load testing traffic, but then it became a game of Whac-A-Mole. All the supporting systems we had in place couldn't handle the load.

The first to go was Analytics. Our real time reporting process (Jimbo) is based on LogStash dumping logs into a Redis queue; workers pull from that queue and tell us how we are doing. That queue got backed up to about a million items, then died. We rely on Jimbo pretty heavily for health checks, so we were flying blind.

More workers helped, but we have now abandoned Jimbo completely in favor of about 88 lines of node.js, Statsd and Graphite. Jimbo gave us more insights, but maintaining it took time away from keeping the site up.

4/25: Cassandra

Next to die was Cassandra. I believe that this is mostly our fault, rather than the tool's, but after about a terabyte of data we got a ton of unavailable exceptions from PyCassa. Each one of these errors cost us about 3 seconds of blocking time. Lowering the timeout helped, but in reality we had too much data and not enough boxes. TTL also isn't working properly for us, which is why we have a terabyte of data in Cassandra.

Luckily Embedly's storage library (Coffer) is configurable so we can shut off writes and reads via config files. We took Cassandra out, life goes back to normal. We will eventually add Cassandra back in, as it gives us more permanent storage for things like RSS feeds and API payloads. We just won't be putting all our data in there forever anymore. At this point we are feeling pretty good.

4/27: Couchbase

This weekend Couchbase took a dive. We had a pretty good run with it, but when Couchbase got 60% full it died hard. We were simply saving too much data. At 60% Couchbase starts writing to disk and everything falls over.

We can't save the cluster at that time and need to bring up a new one. Saturday and Sunday we had 2 different Couchbase clusters, rotating traffic around them after one died. This might have been a new low. Literally the worst possible way to handle traffic that I know of. I hope you don't judge us for this one.

4/29: Fixed?

Sunday we finally fixed the issue by creating a 180 GB Couchbase cluster without replication. We also lowered cache time to 3 hours instead of 5 days. Our working set now fits into about 15% of capacity which seems to be a sweet spot. In Couchbase's defense it does handle 14,000 ops per second for us.
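A rough back-of-the-envelope calculation shows why the shorter cache time matters so much. It assumes every uncached URL is written to the cache exactly once and uses the 100-125 UUPS range quoted earlier. Entry sizes are not modeled, so treat it as an upper bound on entry counts and nothing more.

```python
# Rough upper bound on cache entries written within one TTL window,
# assuming every uncached URL is written to the cache exactly once.
UUPS_RANGE = (100, 125)          # uncached URLs per second, from the post
OLD_TTL_SECONDS = 5 * 24 * 3600  # 5 days
NEW_TTL_SECONDS = 3 * 3600       # 3 hours


def entries_in_window(uups, ttl_seconds):
    """Entries written during one TTL window at a steady rate."""
    return uups * ttl_seconds


old_entries = [entries_in_window(u, OLD_TTL_SECONDS) for u in UUPS_RANGE]
new_entries = [entries_in_window(u, NEW_TTL_SECONDS) for u in UUPS_RANGE]
reduction_factor = OLD_TTL_SECONDS // NEW_TTL_SECONDS
```

Under those assumptions the live set drops from roughly 43-54 million entries to roughly 1.1-1.4 million, a factor of 40.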

And that brings us up to today. Defcon 3.

We obviously know that this isn't the solution. We could buy everyone in the company a car each month with what we spend on hosting. We do however need to make smarter choices about technology, caching and persistent storage.

We apologize for the issues. We are working on bettering the service every day.

Going forward there are a ton of optimizations we plan on making: async DNS, analytics, long term storage, multiple availability zones, faster image processing, a URL fetching service, etc. If any of the above interests you, we are hiring!

BTW, If you find yourself in this situation, strip everything down to the bare bones and get a big cache.

Thanks to Ben Darnell for helping us with blocking in Tornado and more importantly the team here that made it happen.

Sean

 

http://files.posterous.com/user_profile_pics/1242620/12444_538807393694_19300375_32185718_3725877_n.jpg http://posterous.com/users/4bhqr2CIva81 Sean Creeley screeley Sean Creeley
Fri, 27 Apr 2012 14:45:00 -0700 Yesterday http://blog.embed.ly/yesterday http://blog.embed.ly/yesterday

We had a bad day yesterday and we are still waiting on some permission to publish a postmortem about what exactly happened. In the mean time, here are the basics.

  1. We got a spike.
  2. Things crashed.
  3. We hit our Rackspace API and RAM limits at the same time.
  4. Membase is corrupted.
  5. Traffic dies down.
  6. Rackspace ups our limit
  7. Things get better.

It was less than ideal for everyone involved. So while we wait, we have set up http://status.embed.ly to notify everyone of what's going on behind the scenes.

There is already a post on there (Analytics is down). 

We are also going to use it to announce smaller updates to the API like bug fixes and small enhancements as well.

It should be interesting, so follow along!

Sean 

 

http://files.posterous.com/user_profile_pics/1242620/12444_538807393694_19300375_32185718_3725877_n.jpg http://posterous.com/users/4bhqr2CIva81 Sean Creeley screeley Sean Creeley
Fri, 20 Apr 2012 14:45:00 -0700 Welcome Andy Pellett http://blog.embed.ly/welcome-andy-pellett http://blog.embed.ly/welcome-andy-pellett

Andy joined Embedly a few weeks ago. We told him we wouldn't announce it till he pushed his first major change to Embedly. That happened today.

Andy pushed a new more efficient way of pulling, parsing and saving images to obtain the correct meta data. This dramatically reduces the number of HTTP calls Embedly has to make. 

Andy grew up in Alaska and received his Bachelors and Masters from the University of Maine. He hates condiments and is a decent fisherman.

Anrope

If you notice that Embedly is a bit faster today, thank Andy.

 

http://files.posterous.com/user_profile_pics/1242620/12444_538807393694_19300375_32185718_3725877_n.jpg http://posterous.com/users/4bhqr2CIva81 Sean Creeley screeley Sean Creeley
Thu, 12 Apr 2012 13:48:00 -0700 oEmbed for Spotify http://blog.embed.ly/oembed-for-spotify http://blog.embed.ly/oembed-for-spotify

Spotify is the default music service around the office, so when they added embeds, we jumped all over it. They launched the "Spotify Play Button" with Tumblr, but it should be on every service. The branding is interesting, "embed" is only mentioned once in that post, where "Play Button" is mentioned 6 times.

Here is a Storify with example Spotify embeds. 

I have to say, it's pretty awesome. Press play on any track above, everything is in sync and it just works. 

Here is how to use it yourself:

oEmbed API call:

http://api.embed.ly/1/oembed?url=http%3A%2F%2Fopen.spotify.com%2Ftrack%2F6ol4...

Explorer View:

http://embed.ly/docs/explore/oembed?url=http%3A%2F%2Fopen.spotify.com%2Ftrack...
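The only trick in the API call is percent-encoding the target URL. Here is a minimal sketch that builds the request URL with Python's standard library. `TRACK_ID` is a placeholder because the track ID above is truncated, and the sketch only reproduces the `url` parameter visible in this post.

```python
from urllib.parse import quote

OEMBED_ENDPOINT = "http://api.embed.ly/1/oembed"


def oembed_url(target_url):
    """Build an oEmbed request URL with the target percent-encoded."""
    # safe="" forces ":" and "/" to be encoded as %3A and %2F.
    return OEMBED_ENDPOINT + "?url=" + quote(target_url, safe="")


# "TRACK_ID" is a placeholder; the track ID in the post is truncated.
request_url = oembed_url("http://open.spotify.com/track/TRACK_ID")
```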

You can also use Embedly's Parrotfish plugin to see Spotify in Twitter or the Embedly Wordpress plugin for easy blogging.

Enjoy!

Sean

 

http://files.posterous.com/user_profile_pics/1242620/12444_538807393694_19300375_32185718_3725877_n.jpg http://posterous.com/users/4bhqr2CIva81 Sean Creeley screeley Sean Creeley
Thu, 05 Apr 2012 18:21:00 -0700 Why no internetz, Boston? http://blog.embed.ly/why-no-internetz-boston http://blog.embed.ly/why-no-internetz-boston

We finally moved into our new Boston West End/North Station office space (blog post coming) on Portland St, March 2nd. After spending the first 6 months of Embedly in San Francisco, the next year in Boston’s Innovation District on the Waterfront and then 10 months in the Cambridge Innovation Center (directly across from MIT), we decided to round out our tour of Boston and select office space in the heart of the City – right across the street from the legendary Boston Garden: home of the Celtics and Bruins. The area is currently being revived, not only because Embedly has moved in, but also as we are 2 blocks over from the new luxury 12-15 floor Archstone Apt buildings and about 3-4 blocks away from Government Center/Boston City Hall, Faneuil Hall Marketplace, Suffolk County Courthouse, and the Financial District. We plan on being here for the foreseeable future, and the energy of new space and a new hire has been extremely productive for Embedly.

Art (me) was in charge of setting up internet. This is pretty important for a web-based company to have and in this day and age, every business should have a solid internet connection. Our research led to Comcast and Verizon, which informed us that we were eligible for Cable and DSL, respectively. This was music to our ears and it was great to know that we would be avoiding the misery of a DSL connection. We immediately followed up with the New England Comcast business rep and ended up meeting with an onsite engineer to proceed with our setup. Despite our original conversations, the engineer came back to say that there was no way we were getting cable in our building. Our initial excitement was quickly lost when he mentioned there was actually a draft construction plan in front of Boston officials. The plan carries an estimated cost of $60,000 that no one seems to want to agree on, with neither the City of Boston nor Comcast assuming any responsibility for this project's completion. It seems that our office building containing 10+ businesses is not worthy of their consideration.

With options quickly disappearing, we were forced to take Verizon DSL.  Verizon and Comcast must be working together in this City splitting referrals because Verizon quickly fell into the ‘Over Promise, Under Deliver’ bucket. After starting off with a paltry 5Mbps, Verizon has made us jump through hoops to try to upgrade to the 10-15 Mbps "Fastest" plan. According to their phone sales reps we are about 1400 feet from their Verizon central office, which qualifies us for getting the upgraded speed. Unfortunately, this did not go as planned (see embedded pdf of our email conversation).  After the Verizon Boston central office offered us the upgrade, a week passed with no response from the Verizon side and it may not even be available!?!  We just don't get it. Bob has a 4G connection of 14Mbps down/4Mbps up on his cell phone. Why no bandwidth for businesses?

We find this whole issue ironic when you look at Boston’s push to attract businesses. The City of Boston wants us to innovate and to keep technology businesses in the City, but the services to allow us to do so are severely lacking. Let's make "good" internet available everywhere in the city, and let's figure out a solution that allows companies to grow and "stay" in Boston. I am pretty sure San Francisco has about 10 different internet offerings for businesses at competitive prices.

One more departing note - our floor mates have propositioned us with a 100Mbps fiber line, at a pricey $2000/month. Really, that's my alternative? Our rent is barely that high. Get it together Boston.

 

Post sources:

* Email w/ Verizon

verizon_email_log.pdf Download this file

* Andy (the new guy):

Moved into his Boston West End Apt  and within 1 day had a 25 Mbps RCN connection.

 

http://files.posterous.com/user_profile_pics/1170988/645166728903.jpg http://posterous.com/users/4xg5qnL2rXcB Arthur Gibson agibson Arthur Gibson
Thu, 05 Apr 2012 10:04:00 -0700 We've Missed Provider Updates http://blog.embed.ly/weve-missed-provider-updates http://blog.embed.ly/weve-missed-provider-updates

We have not done a providers blog post in over 6 months, and really do miss finding some shiny new videos or images to present to you guys. Our provider queue is heavy with budding video startups who are even sending us links to videos hosted on localhost, but being early to the embedding game is a good thing.

We have a unique bunch to show you today: a Napster for photos, a professional social network, real-time video casting, and an E-Learning site.

Let's jump to it with a few examples:

* Tipi Trampoline from Pinterest.

Pinterest

Linkedin

* Spreecast with Embedly airing on Spreecast.

Spreecast

* Lesson on Circumference and Area from ShowMe.

Showme

Check out our Spreecast w/ Spreecast. Enjoy!

 

http://files.posterous.com/user_profile_pics/1170988/645166728903.jpg http://posterous.com/users/4xg5qnL2rXcB Arthur Gibson agibson Arthur Gibson
Fri, 30 Mar 2012 11:18:00 -0700 Why blindly following meta tags is a bad idea. http://blog.embed.ly/expectation-why-embedly-does-not-always-use-m http://blog.embed.ly/expectation-why-embedly-does-not-always-use-m

We do something that is completely radical when it comes to a description of a page. We try to pick the best one. GASP!

Here is a hypothetical situation. For this html, what would the user expect the description to be?

<html>
  <head>
    <meta property="og:description"
          content="WIN A FREE IPAD: http://fake.net">
  </head>
  <body>
    <article>
      <p>
        This is a funny and insightful article that somehow got
        on this evil site that I would like to share with my
        friends.
      </p>
      <p>
        I would expect when I share this link that the first
        sentence of the article is the description.
      </p>
    </article>
  </body>
</html>

Embedly will pick the following excerpt:

"This is a funny and insightful article that somehow got on this evil site that I would like to share with my friends. I would expect when I share this link that the first sentence of the article is the description."

Facebook will pick:

"WIN A FREE IPAD: http://fake.net"

Google will pick:

"WIN A FREE IPAD: http://fake.net"

Though interestingly enough, Google will use: "This is a funny and insightful article that somehow got on this ..." as the title.

If you ever wondered why Embedly doesn't blindly follow meta tags, this is why.
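To show the idea in code, here is a deliberately naive sketch that prefers the text inside `<article>` and only falls back to `og:description` when there is none. That rule is an assumption made for this example. It is not how Embedly actually scores candidate descriptions.

```python
from html.parser import HTMLParser


class DescriptionParser(HTMLParser):
    """Collect og:description and the paragraphs inside <article>."""

    def __init__(self):
        super().__init__()
        self.og_description = None
        self.paragraphs = []
        self._in_article = False
        self._in_paragraph = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("property") == "og:description":
            self.og_description = attrs.get("content")
        elif tag == "article":
            self._in_article = True
        elif tag == "p" and self._in_article:
            self._in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "article":
            self._in_article = False
        elif tag == "p":
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph:
            self.paragraphs[-1] += data


def pick_description(html):
    parser = DescriptionParser()
    parser.feed(html)
    parser.close()
    # Collapse whitespace so wrapped source lines read as one sentence.
    body = " ".join(" ".join(p.split()) for p in parser.paragraphs)
    # Assumed rule: prefer article text, fall back to the meta tag.
    return body or parser.og_description


PAGE = """
<html>
  <head>
    <meta property="og:description"
          content="WIN A FREE IPAD: http://fake.net">
  </head>
  <body>
    <article>
      <p>
        This is a funny and insightful article that somehow got
        on this evil site that I would like to share with my
        friends.
      </p>
      <p>
        I would expect when I share this link that the first
        sentence of the article is the description.
      </p>
    </article>
  </body>
</html>
"""

description = pick_description(PAGE)
```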

Screen_shot_2012-03-30_at_10
Screen_shot_2012-03-30_at_10

http://files.posterous.com/user_profile_pics/1242620/12444_538807393694_19300375_32185718_3725877_n.jpg http://posterous.com/users/4bhqr2CIva81 Sean Creeley screeley Sean Creeley
Mon, 13 Feb 2012 16:50:00 -0800 Embedly Challenge Results http://blog.embed.ly/embedly-challenge-results http://blog.embed.ly/embedly-challenge-results

On Friday, Embedly offered Hacker News a coding challenge. Apply.embed.ly asked developers to solve 3 different problems and submit their solutions. We didn't force people to apply for 1 of the 3 positions that we have open, just nerd out on some problems. We are going to talk about the results and the answers.

Here is a quick funnel of users to apply.embed.ly

1. Sum of Digits

Based on a question from Project Euler given the formula:

R(n) is the sum of the digits for n!.
For example, 10! = 3628800
R(10) = 3 + 6 + 2 + 8 + 8 + 0 + 0 = 27.

We loved this question for the golf aspect. In Python, R(n) can be written:

import math
R = lambda x: sum(map(int, str(math.factorial(x))))

To actually solve it, almost everyone used a brute force algorithm. Like so:

min([i for i in range(1000) if R(i) == 8001])

We got a total of 1008 distinct answers for this question. 758 were seen fewer than 2 times (some people tried to brute force the value).

The top three answers:

  1. 787 (992)
  2. 0 (384)
  3. 802 (105)

2. Standard Deviation of P tags

This one was a mess. When the problem was first put up we had a very large, invalid, randomly generated HTML file. If you used the Chrome console, lxml, nokogiri or ran the html through Tidy you got the 'correct' answer. If you used a sax parser, the answers were much different.

After a few confused tweets, we allowed any answer between 0.5 and 2.0. We then simplified the html greatly. This allowed people to manually count the depths of each p tag or use the white space to determine the depth. This may have defeated the purpose, but ok internet, you win.
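For reference, here is a minimal sketch of one way to attack this kind of problem. It makes two assumptions that the text above does not spell out: "depth" means the number of open ancestor elements, and the population standard deviation is wanted. The sample document is made up. Note that this is a strict tag counter, so on invalid HTML it would land in the "much different" camp along with the sax parsers.

```python
import statistics
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not add depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ParagraphDepths(HTMLParser):
    """Record the nesting depth of every <p> tag."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.depths = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if tag == "p":
            self.depths.append(self.depth)
        self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth -= 1


# Hypothetical, well-formed sample document.
SAMPLE = """
<html><body>
  <p>a</p>
  <div><p>b</p><div><p>c</p></div></div>
  <section><p>d</p></section>
</body></html>
"""

parser = ParagraphDepths()
parser.feed(SAMPLE)
parser.close()
depths = parser.depths
spread = statistics.pstdev(depths)
```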

We got a total of 242 distinct answers for this question. 117 were seen fewer than 2 times.

The top 3 answers:

  1. 1.4 (335)
  2. 0.767 (164)
  3. 1.253 (101)

3. Zipf's Law.

We simplified Zipf's law to:

Z(x) = [x, x/2, x/3, x/4...]

This described the frequency distribution for words in a random body of text. Given that x = 2520 and a text of 900 unique words, how many words make up half the text?

This one got a little confusing too.

We can get the word count by using:

words = [2520/float(i) for i in range(1, 901)]
word_count = sum(words)

We can then iterate over the words till they are greater than 50% of the total word count.

min([i for i in range(30) if sum(words[:i]) > sum(words)/2.0])

It got a bit hairy when it came to rounding. We were in the wrong here by using float instead of integer because it doesn't make sense to have fractional word counts. We should have accepted 21 instead of 22.

The top 3 answers:

  1. 22 (450)
  2. 21 (204)
  3. 20 (120)

Hacking

We intentionally made it easy to hack apply.embed.ly. The url paths were /1 /2 /3 and every time you got a question right, we just added a cookie 'au_embedly_1=true' for the problem you solved. Only Will Pearson used this to his advantage and skipped a problem.

Standing Out.

Some notable examples of different ways people solved this.

  1. A couple people solved it in the Chrome console, no text editor needed.
  2. 6 minutes. The total time it took one college sophomore to solve it.
  3. Ruby one liners for all: https://gist.github.com/1792968/cbb3f5c22ff2e7d174734c780df87e8b9e85153e
  4. All in Mathematica: https://gist.github.com/1797321
  5. You can solve the first problem in J in 27 chars: "(+/ "1 f"0 !i.1000x) i.8001"
  6. A number of people used excel to solve a majority of the problems. There seem to be a lot of finance nerds lurking on HN.

 

Gists:

If you are interested in seeing the solutions everyone posted here you go. I embedded a gist of gists because I cannot for the life of me figure out how to get Posterous not to embed them.

https://gist.github.com/1820286

 

http://files.posterous.com/user_profile_pics/1242620/12444_538807393694_19300375_32185718_3725877_n.jpg http://posterous.com/users/4bhqr2CIva81 Sean Creeley screeley Sean Creeley
Fri, 09 Dec 2011 13:31:00 -0800 #New#New Parrotfish - Twitter Plugin Released http://blog.embed.ly/newnew-parrotfish-twitter-plugin-released http://blog.embed.ly/newnew-parrotfish-twitter-plugin-released

We woke up yesterday with smiles on our faces and using our favorite chrome plugin (Parrotfish) . Then we got the news that the #new#new Twitter was released. Jaws dropped, tweets flew in, and profanities flew out. Our users expected results. @hotdogsladies tweeted that lovemaking was just not the same without Embedly. We concur.

For a brief moment we considered retiring Parrotfish. Surely in this latest release Twitter would have implemented embeds the way they should have from the beginning. Lucky for us, it appears to be the same crippled system that caused us to create Parrotfish in the first place.

So, off to the Batcave Sean and Bob went. After all, we do our best work with bats circling. Who doesn't? They (Sean and Bob, not the bats) woke up this morning, ready again to tread through the depths of a Twitter re-design, this time armed with some new toys that we have created over the last few months.

We now present to you the latest and greatest Parrotfish ready to conquer your timeline (the Twitter one, not the Facebook one):

New_new_parrotfish

  •  Enabled with SSL support for embeds and images. (Secure)
  •  Better favicons and logos.
  • Available in Chrome and Safari. (FF you're next)

Get it right away at Embedly Labs.

http://files.posterous.com/user_profile_pics/1170988/645166728903.jpg http://posterous.com/users/4xg5qnL2rXcB Arthur Gibson agibson Arthur Gibson
Thu, 08 Dec 2011 10:40:36 -0800 Embedly Hack Week http://blog.embed.ly/embedly-hack-week http://blog.embed.ly/embedly-hack-week

Last week we took a break from bug fixing, redesigning, and development. We held our own internal Hack Week: 4 developers, 4 completely different projects, all using or enhancing the Embedly service.

Embedlyflip
Tom spent the week developing a Flipboard clone, using Embedly. The iPad app connects to Facebook, pulls a user's news feed, sends that through the Embedly API using our iOS library, and displays the results. Tom really lucked out by finding the FlipView project on Github. That made it almost too easy to lay out the resulting embeds in a Flipboard-like experience.

Arthur spent the week adding more social features to Embedly. We want to be able to answer the question: "what's the most popular content on my site?" Arthur developed a Reddit-like voting system for embeds; the votes get tallied by us and displayed with the rest of our Analytics.

Rate_mate_demo

Bob created a web socket proxy for the Embedly API, developed using node.js, because Bob loves node.js. The proxy allows for truly asynchronous requests to the Embedly API, returning embeds as they finish instead of all at once. If anyone is interested, Bob will add documentation when he has some free time, probably during the next Embedly Hack Week.
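Since Bob's documentation is still pending, here is a small sketch of the core idea only: hand results back in the order they finish rather than the order they were requested. It uses Python's asyncio instead of node.js, and `fetch_embed` with its fake delays is a stand-in for real API calls. It is not Bob's proxy.

```python
import asyncio


async def fetch_embed(url, delay):
    """Stand-in for an API call; `delay` simulates response time."""
    await asyncio.sleep(delay)
    return url


async def stream_embeds(requests):
    """Collect results in completion order rather than request order."""
    tasks = [
        asyncio.create_task(fetch_embed(url, delay))
        for url, delay in requests
    ]
    finished = []
    for next_done in asyncio.as_completed(tasks):
        # A real proxy would push each result down the socket here.
        finished.append(await next_done)
    return finished


# Hypothetical URLs and delays, for illustration only.
REQUESTS = [
    ("http://example.com/slow", 0.3),
    ("http://example.com/fast", 0.0),
    ("http://example.com/medium", 0.15),
]

arrival_order = asyncio.run(stream_embeds(REQUESTS))
```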

Sean, the master of Chrome plugins, developed a super top secret Chrome plugin. We could tell you about it, but then we'd have to kill you. Or we could make you sign an NDA, but paperwork is messy.

We sometimes get too focused on business development, answering support tickets, and making sure the servers stay up and running. It's nice every once in a while to take a step back and reap what we've sown. We're constantly surprised with what we've managed to accomplish over the last two years.

We love to hear how others are using the Embedly API. Let us know in comments, and as always, we're available at support@embed.ly with any questions.

 

http://files.posterous.com/user_profile_pics/698961/biopic.jpg http://posterous.com/users/4wubefeoogpj Tom Boetig tboetig Tom Boetig
Fri, 18 Nov 2011 14:05:19 -0800 Support, Right. http://blog.embed.ly/support-right http://blog.embed.ly/support-right

Support tickets are the bane of our existence and so are blog posts,  we do say that. In reality, support tickets have been a healthy way for us to grow. I cannot count how many features we've added just from listening to our users.

Generally, most of our requests come from developers, which can be resolved with code samples, a promise to fix, or a link to something in our docs.  We also receive non-developer requests that usually require us to ask lots of questions or blame it on Wordpress.

We are a heavy-engineering team. We believe in doing support the right way, "the way we want it":

  • Don't make someone wait for a 2 second fix, just do it (thanks, Nike).
  • Know your audience. Google them and respond appropriately.
  • Have a developer answer tickets.
  • Don't be an a-hole. It's hard, I know.

Send us tickets or requests to support@embed.ly, we will make you happy.

 

http://files.posterous.com/user_profile_pics/1170988/645166728903.jpg http://posterous.com/users/4xg5qnL2rXcB Arthur Gibson agibson Arthur Gibson
Wed, 16 Nov 2011 09:30:00 -0800 Bootleggers http://blog.embed.ly/bootleggers http://blog.embed.ly/bootleggers

In our opinion censorship on the web is never a good thing. While we are not here to get on the soapbox or try to instill values, SOPA is just bad for business. 

Embedly is a company that deals with millions of links a day. If you start censoring what people can link to, inevitably that hurts us. 

When a user embeds an infringing video, who is at risk? The site that embedded the video, the video hosting service or the delivery mechanism? I certainly don't want to find out. Maybe it will be like prohibition and Embedly will become the bootleggers of our time, transporting content from provider to publisher.

If you visit embed.ly today you will see a tiny band over our logo in solidarity for the cause.

Screen_shot_2011-11-16_at_8

We encourage you to read the bill, watch the video and visit americancensorship.org to learn more.

 

Sean

http://files.posterous.com/user_profile_pics/1242620/12444_538807393694_19300375_32185718_3725877_n.jpg http://posterous.com/users/4bhqr2CIva81 Sean Creeley screeley Sean Creeley
Mon, 07 Nov 2011 10:33:00 -0800 Technical Implementation of the Embedly Usage Policy http://blog.embed.ly/implementing-embedly-usage-policy http://blog.embed.ly/implementing-embedly-usage-policy

In this post we will take an in depth look at how Embedly tracks API usage. Warning: this post is not for the faint of heart.

Overview

Our API consists of a cluster of tornado boxes behind two nginx load balancers. API usage is based on the nginx access log at the load balancer. This allows us to separate tracking from the API.

Nginx logs are transported from the load balancer to the log queue. The log queue is simply a Redis list. Logs are transported to the queue with logstash. Logstash also parses the access logs into structured data, making it easier to process them later. We could have written all of our log processing as a logstash filter, but we have a lot of existing code that we wanted to use during the processing. Our existing code is written in python, so we have chosen to process logs in two steps, logstash being the first.
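To give a feel for what "structured data" means here, the sketch below parses one access log line in Python. It is an illustration of the first step, not our logstash configuration. It assumes nginx's default "combined" log format, a made-up log line, and a `key` query parameter for the API key.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Pattern for nginx's default "combined" log format.
COMBINED = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes_sent>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_access_log(line):
    """Turn one access log line into a dict, or None if it is malformed."""
    match = COMBINED.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    query = parse_qs(urlsplit(record["path"]).query)
    # "key" is an assumed parameter name for this sketch.
    record["api_key"] = query.get("key", [None])[0]
    record["urls"] = query.get("url", [])
    return record


# Hypothetical log line, for illustration only.
LINE = (
    '203.0.113.7 - - [07/Nov/2011:10:33:00 -0800] '
    '"GET /1/oembed?key=abc123&url=http%3A%2F%2Fexample.com%2Fa HTTP/1.1" '
    '200 512 "-" "curl/7.21"'
)
record = parse_access_log(LINE)
```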

Drawing1

Processing

As you may imagine, the log processor pulls the logs from the queue to process them. Our log processor is a modular system that does a number of things with the logs, such as analytics and performance monitoring. This post covers API usage tracking. Below is some pseudocode to illustrate the process.

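def process(log):
    api_key, urls, status = parse(log)

    # Only 200 and 404 responses are counted.
    if status not in (200, 404):
        return

    rounded_timestamp = round(log.timestamp)  # rounded to the hour
    url_ids = hash_urls(urls)
    customer_id = get_customer_id(api_key)

    zkey = "hourly::%s::%s" % (customer_id, rounded_timestamp)
    # Queue the commands in one MULTI/EXEC transaction. The replies only
    # become available once the transaction executes.
    pipe = redis.pipeline(transaction=True)
    # count unique URLs for hour before adding new set
    pipe.zcard(zkey)
    for url_id in url_ids:
        pipe.zincrby(zkey, 1, url_id)  # redis-py 3.x argument order
    # count unique URLs for the hour after adding new set
    pipe.zcard(zkey)
    replies = pipe.execute()
    start, end = replies[0], replies[-1]

    new_url_count = end - start

    # period_end is the end of the customer's current usage period
    pkey = "usage::%s::%s" % (customer_id, period_end)
    new_period_count = redis.incrby(pkey, new_url_count)
    old_period_count = new_period_count - new_url_count
    for threshold in thresholds:
        # Fire only when this batch of URLs crosses the threshold.
        if old_period_count < threshold <= new_period_count:
            threshold.action(customer_id)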

It is worth explaining how the URL parameters are parsed. Each URL parameter needs to be canonicalized so that we don't count the same resource multiple times in the same hour. Not only do we have to deal with all sorts of URL ugliness, but we also must support redirected URLs, including link shortening services. Since our API has already processed the URL, the redirecting work is already complete; we just need to fetch the result from our cache. First we canonicalize the URL, then we look in the cache to see if it redirects to a final URL, then we hash it.
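
The three steps (normalize, resolve the redirect from cache, hash) can be sketched as follows. This is an illustration under assumptions, not our production code: the normalization rules shown are only a few common ones, `REDIRECT_CACHE` is a hypothetical dictionary standing in for the API's cache, and SHA-1 is an arbitrary choice of hash.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit

# Hypothetical stand-in for the API's cache of already-resolved redirects.
REDIRECT_CACHE = {
    "http://bit.ly/abc123": "http://example.com/article",
}


def canonicalize(url):
    """Normalize the parts of a URL that commonly vary between equivalent links."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Keep the port only when it is not the default for the scheme.
    default_port = {"http": 80, "https": 443}.get(scheme)
    if parts.port and parts.port != default_port:
        host = "%s:%d" % (host, parts.port)
    # An empty path and "/" refer to the same resource.
    path = parts.path or "/"
    # Drop the fragment: it is never sent to the server.
    return urlunsplit((scheme, host, path, parts.query, ""))


def url_id(url):
    """Canonicalize, follow a cached redirect if there is one, then hash."""
    canonical = canonicalize(url)
    final = canonicalize(REDIRECT_CACHE.get(canonical, canonical))
    return hashlib.sha1(final.encode("utf-8")).hexdigest()


def hash_urls(urls):
    """Return the unique URL ids for one request's URL parameters."""
    return sorted({url_id(u) for u in urls})
```

With this in place, a shortened link and the article it points to produce the same id, so they count once within the hour.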

Now that we have a customer ID, a rounded timestamp and a list of unique URL parameter hashes, it's finally time to track usage. The usage tracker is a simple integer that we increment as requests come in. The counter is only incremented when a URL is unique for the hour, so another temporary data structure is needed to track hourly uniques. A Redis sorted set is used to track the hourly URLs. This allows us to keep track of not only how many unique URLs come in per hour, but also how many times each URL was requested. The sorted set's score is used to keep track of frequency. We can roll up this data once the hour is over, and reuse it for analytics.
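
To see the bookkeeping without a Redis server, here is a small in-memory sketch. Plain dictionaries stand in for the sorted set and the usage counter, and the comments name the Redis command each step corresponds to. The class and function names are invented for this example, and rounding down to the start of the hour is an assumption about how the timestamp is rounded.

```python
from collections import defaultdict


def round_to_hour(timestamp):
    """Round a Unix timestamp down to the start of its hour (an assumption)."""
    return timestamp - timestamp % 3600


class HourlyUniques:
    """In-memory stand-in for the hourly sorted sets and the usage counters."""

    def __init__(self):
        # zkey -> {url_id: times requested}; mirrors member -> score.
        self._sets = defaultdict(dict)
        # customer_id -> unique URLs counted so far in the period.
        self.usage = defaultdict(int)

    def record(self, customer_id, hour, url_ids):
        """Record one request and return how many URLs were new for the hour."""
        zkey = "hourly::%s::%s" % (customer_id, hour)
        members = self._sets[zkey]
        before = len(members)                             # ZCARD
        for url_id in url_ids:
            members[url_id] = members.get(url_id, 0) + 1  # ZINCRBY
        new_urls = len(members) - before                  # ZCARD again, then diff
        self.usage[customer_id] += new_urls               # INCRBY
        return new_urls

    def frequency(self, customer_id, hour, url_id):
        """How many times a URL was requested in the hour (ZSCORE)."""
        zkey = "hourly::%s::%s" % (customer_id, hour)
        return self._sets[zkey].get(url_id, 0)
```

A repeated URL bumps its score but leaves the cardinality alone, which is exactly why the difference in cardinality is the number of billable URLs.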

Triggers

There is a check to see if any interesting thresholds have been reached and if any actions need to be triggered as a result. When a customer reaches 80% and 100% of their monthly usage allowance, we send an email notification letting them know. If the account is a free account and the 100% threshold is reached, the key is blocked until the end of the period.
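
The "was it just crossed?" test is easy to get subtly wrong, so here is a small sketch of it. The function names, the action strings and the allowance of 10,000 URLs used in the usage note are all hypothetical; integer arithmetic is used so that the 80% boundary does not depend on floating point rounding.

```python
def crossed_thresholds(old_count, new_count, allowance, percents=(80, 100)):
    """Return the percentage thresholds crossed by moving from old to new count.

    A threshold counts as crossed only when the old count was below it and
    the new count is at or above it, so each one fires once per period.
    """
    return [
        percent
        for percent in percents
        if old_count * 100 < percent * allowance <= new_count * 100
    ]


def actions_for(old_count, new_count, allowance, is_free):
    """Map crossed thresholds to actions (action names are illustrative)."""
    actions = []
    for percent in crossed_thresholds(old_count, new_count, allowance):
        actions.append("email:%d%%" % percent)
        # Only free accounts are blocked when the allowance is used up.
        if percent == 100 and is_free:
            actions.append("block_key")
    return actions
```

For example, with an allowance of 10,000, going from 7,990 to 8,005 crosses the 80% mark, while going from 8,005 to 8,010 crosses nothing, so the customer is not emailed twice.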

Conclusion

If you made it this far, I'm proud of you. I hope this has shed some light on exactly how we track API usage. Thanks for reading!

http://files.posterous.com/user_profile_pics/316649/opaopa.png http://posterous.com/users/37lsKWTjMVZD doki_pen doki_pen doki_pen
Mon, 31 Oct 2011 16:07:00 -0700 The Battle against Chargify Product Versions http://blog.embed.ly/the-battle-against-chargify-product-versions http://blog.embed.ly/the-battle-against-chargify-product-versions

Let's write about engineering, shall we? I mean, we do a lot of product work at Embedly and rarely post about the backend. Today I'm going to write about our battle against product versions within Chargify.

The-battle

We have used Chargify since January to power our recurring payments. This was before Recurly and Stripe came out with their JavaScript libraries that make us very jealous. PCI compliance and Authorize.net are nightmares.

Last week we updated our pricing structures to reflect a $1 price decrease. One late night of QVC and an analysis of everyone else's pricing plans convinced us that $19 had a better conversion rate than $20 (So far this has held true for us as well).

Making this update in Chargify was a simple matter of changing the product pricing within Chargify, and like magic people were signing up for $19 plans. The problem came when we wanted everyone to pay the same price.

What Chargify doesn't explain well is that every time you change a product you create a new version of it. This makes sense. If a customer signs up for a $10 plan and then you up that plan to $20 you usually want to grandfather that customer in. However, if you want all your customers on the most recent version of the product, there is no easy way to do that.

There is a thread on the Chargify support forum that explains what you must do to migrate the subscriptions to the new plan:

Unfortunately, there's no direct way to change existing subscriptions to the latest version - but you can create a temp product, switch them to that and then switch them back. That will make sure they are on the latest version.

So for every single user, we would have to switch them manually over to a temp plan and switch them back via the Chargify admin. For the hundreds of paying customers we have this would take hours. No. Thank. You.

Instead, I opted to use the Chargify API. What's incredible is that for a given subscription, Chargify does not pass back the version of the product that the user is on; only the most current one. I'll say that again. There is no way to know what version of a product the user is subscribed to via the API. Holler.

I created a small script that does all the heavy lifting. You have to first create a testing plan that is $0 with the handle 'testing'. After that you can run the following:

"""
A simple script to update Chargify subscriptions to the most recent version.
"""
import base64
import json

import requests

API_KEY = 'Chargify API Key'
HOST = 'https://<Chargify Host Name>.chargify.com'

# HTTP Basic auth: the API key is the username and 'x' is the password.
credentials = base64.b64encode(('%s:%s' % (API_KEY, 'x')).encode('utf-8')).decode('ascii')

headers = {
    "Authorization": "Basic %s" % credentials,
    "User-Agent": "Migrate-to-Latest-Version",
    "Content-Type": "application/json"
}


def update_plan(sub_id, handle):
    """Move subscription `sub_id` onto the product with the given handle."""
    print('Updating %s to %s' % (sub_id, handle))

    d = {
        "subscription": {
            'product_handle': handle
        }
    }
    data = json.dumps(d)
    url = '%s/subscriptions/%s.json' % (HOST, sub_id)

    r = requests.put(url, data=data, headers=headers)

    # r.ok is False for 4xx and 5xx responses.
    if not r.ok:
        print('Error: %s ID: %s HANDLE: %s' % (r.status_code, sub_id, handle))


def update(pages=10):
    """Switch every active subscription to 'testing' and back again."""
    for p in range(1, pages + 1):
        r = requests.get('%s/subscriptions.json?per_page=200&page=%s' % (HOST, p), headers=headers)
        data = r.json()

        # No more subs:
        if not data:
            break

        # We only update the plans that were not canceled.
        subs = [s['subscription'] for s in data if s['subscription']['canceled_at'] is None]

        for s in subs:
            handle = s['product']['handle']
            sub_id = s['id']

            # update to testing account.
            update_plan(sub_id, 'testing')

            # update back to the right plan
            update_plan(sub_id, handle)


# Spot checking and testing.
def update_one(sub_id):
    sub = get_id(sub_id)
    handle = sub['product']['handle']

    # update to testing account.
    update_plan(sub_id, 'testing')

    # update back to the right plan
    update_plan(sub_id, handle)


# Utils that help
def get_id(sub_id):
    url = '%s/subscriptions/%s.json' % (HOST, sub_id)
    r = requests.get(url, headers=headers)
    return r.json()['subscription']


def get_all(pages=10):
    subs = []
    for p in range(1, pages + 1):
        r = requests.get('%s/subscriptions.json?per_page=200&page=%s' % (HOST, p), headers=headers)
        data = r.json()

        # No more subs:
        if not data:
            break

        subs.extend([s['subscription'] for s in data])

    return subs


if __name__ == '__main__':
    update()

Done! You have now updated everyone to the correct plan. The first time I ran this script a few customers did not get updated. I don't know why, but we ended up having to spot check every subscription to make sure that it had the correct version. Annoying, but it only took us about 15 minutes with two people on board.
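
One way to narrow the spot check is to record which subscriptions reported an error during the switch, so those can be retried first. The sketch below is an illustration, not part of the script above: `migrate` and its `switch` argument are hypothetical, and `switch` is assumed to be a function that performs one plan change and returns True on success. It cannot catch a change that the API accepted but silently did not apply, so it complements the manual check rather than replacing it.

```python
def migrate(subscriptions, switch, temp_handle="testing"):
    """Bounce each subscription through a temporary plan and back.

    `subscriptions` is an iterable of (subscription_id, product_handle) pairs.
    `switch(subscription_id, handle)` performs one plan change and returns
    True on success. Returns the ids where either step reported a failure.
    """
    failed = []
    for sub_id, handle in subscriptions:
        moved = switch(sub_id, temp_handle)
        # Always try to restore the real plan, even if the first step failed,
        # so nobody is left on the temporary plan because of an early exit.
        restored = switch(sub_id, handle)
        if not (moved and restored):
            failed.append(sub_id)
    return failed
```

The returned list can be fed straight back into `migrate` for a second pass before anyone starts checking subscriptions by hand.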

Hopefully this will save someone else some time when trying to migrate subscriptions to the most current plan.

 

http://files.posterous.com/user_profile_pics/1242620/12444_538807393694_19300375_32185718_3725877_n.jpg http://posterous.com/users/4bhqr2CIva81 Sean Creeley screeley Sean Creeley
Wed, 26 Oct 2011 10:06:00 -0700 Pricing Nip/Tuck http://blog.embed.ly/pricing-niptuck http://blog.embed.ly/pricing-niptuck

It's time for your quarterly dose of design news with me, Tom, He Who Makes Things Pretty.

It's been a few months since we last redesigned the website, and I'm getting antsy. I would probably get throttled if I decided to completely redo the site for a 3rd time, so I have to settle for updating individual pages.

First in my queue is our pricing page. Our old pricing page was a simple, utilitarian table of information. It got the job done, but lacked pizazz. Feast your eyes on our new pricing page in all its soft-gradient glory. For those of you who love tables, we still have it here.

When coming to our new and improved pricing page, the first thing you might say to yourself is "hey, these plans are all a dollar cheaper than I'm paying now." Good news! Your plan just got a dollar cheaper. You're welcome.

The second thing you might notice is the addition of a $99 plan. You talked, we listened. We know that making the jump from a $20 to a $200 (or $19 to $199) plan is a large pill to swallow. Our new Basic plan sits happily in the middle, ready to take your calls. The features are the same as the $19 Starter plan, you just get more URL requests per month.

In addition to the pricing changes, we have pushed our Enterprise-level features to the front. We understand that Enterprise customers have specific needs. Contact us today and we can discuss how Embedly can help your business succeed.

Thanks to Assistly for the design inspiration. Pricing pages aren't the sexiest things to design, and theirs was leaps and bounds better than all the others we sourced.

http://files.posterous.com/user_profile_pics/698961/biopic.jpg http://posterous.com/users/4wubefeoogpj Tom Boetig tboetig Tom Boetig
Thu, 20 Oct 2011 12:20:00 -0700 jQuery Preview http://blog.embed.ly/jquery-preview http://blog.embed.ly/jquery-preview

"This will save you a ton of time." - Everyone

jQuery Preview is an easy-to-use plugin that allows developers to create a URL submission tool using jQuery and Embedly. It looks something like this:

Jquery-preview

It's super easy to set up. All you need is a form and a little bit of JavaScript, and you can allow users to choose a thumbnail, edit metadata and embed content from 200+ sources. It's f-ing amazing, and to prove it, here is all the code you need to implement it client side.

<html>
  <head>
    <script src="http://ajax.googleapis.com/ajax/libs/jquery/1.6.2/jquery.min.js" type="text/javascript"></script>
    <script src="http://scripts.embed.ly/p/0.1/jquery.preview.full.min.js" type="text/javascript"></script>
    <link rel="stylesheet" href="http://scripts.embed.ly/p/0.1/css/preview.css" />
  </head>
  <body>
    <form action="/update" method="POST">
        <input id="url" type="text" name="url"/>
    </form>
    <script>
        $('#url').preview({key:'your_embedly_key'})
    </script>
  </body>
</html>

We put together a number of examples, so you should go check out the demo site. It has everything you need to get started.

Dive in, use it, it's awesome. For those interested, here is a bit more backstory.

Back in August we posted an EPIC, and I mean EPIC, post on how to create a Facebook-like URL submission tool with Embedly and EXT. It was a monster: 4,000+ words that were the authoritative masterpiece on creating an easy-to-use URL tool. It was so long that the average time on that page is almost 6 minutes, but guess what, no one used it.

I blame the fact that it wasn't packaged into a nice library and, well, it was written using EXT core. 

We used EXT because it was an example for a client, but this time around we decided to go with the most popular JavaScript framework. We also mixed in a few tools that helped the process along. All the HTML is generated by the Mustache.js templating library and we use Underscore.js to add a bunch of utility functions.

Please let us know what you think. Here are a couple of helpful links:

Code: https://github.com/embedly/jquery-preview

Documentation: http://embedly.github.com/jquery-preview/demo/

Issues: https://github.com/embedly/jquery-preview/issues

 

http://files.posterous.com/user_profile_pics/1242620/12444_538807393694_19300375_32185718_3725877_n.jpg http://posterous.com/users/4bhqr2CIva81 Sean Creeley screeley Sean Creeley
Fri, 14 Oct 2011 06:22:00 -0700 Vimeo works with Embedly and Wordpress http://blog.embed.ly/vimeo-videos-work-with-embedly-and-wordpress http://blog.embed.ly/vimeo-videos-work-with-embedly-and-wordpress

We have heard that users are having trouble embedding Vimeo.com videos in their WordPress blogs. The issue has been identified as a redirect from "www.vimeo.com" to "vimeo.com" not happening, as that is hardcoded in WordPress's core code. The nice thing about our Embedly WordPress plugin is that all the providers and endpoints are accessible through an API call, so they can be updated whenever a provider goes down or makes changes to its endpoint.

Update: WordPress has updated the core code, but we are not sure when that propagates to local WordPress instances. If you still want to use Embedly, you will get about 150 more embed providers out of the plugin.

We know Vimeo embeds work in our Embedly WordPress plugin and encourage you to try it out. You can search "embedly" in your plugins section to download it. Also, here is a quick video of using the plugin with a Vimeo video:

As usual you can contact us at support@embed.ly with any issues. Enjoy!

http://files.posterous.com/user_profile_pics/1170988/645166728903.jpg http://posterous.com/users/4xg5qnL2rXcB Arthur Gibson agibson Arthur Gibson
Tue, 13 Sep 2011 14:21:00 -0700 Cooking up Something New http://blog.embed.ly/cooking-up-something-new http://blog.embed.ly/cooking-up-something-new

Comingsoon

In the coming weeks we're launching something brand new. If you've been on the fence about trying Embedly, because you don't have the time or the resources to spend on integration, then this is your lucky day.

We're introducing Embedly Anywhere, the easiest and fastest way to get up and running on Embedly. Anywhere is a drop-in solution. Just include our one line of JavaScript in your site, and we take care of the rest. It's fully customizable, and themeable.

We are looking for a few people to beta test this new offering for us. Sign up here, and we'll send you a message when we have Anywhere up and running.

http://files.posterous.com/user_profile_pics/698961/biopic.jpg http://posterous.com/users/4wubefeoogpj Tom Boetig tboetig Tom Boetig
Tue, 30 Aug 2011 04:51:00 -0700 Whitelisting IP Addresses http://blog.embed.ly/whitelisting-ip-addresses http://blog.embed.ly/whitelisting-ip-addresses

We wanted to share a quick feature update with developers. Embedly now lets you whitelist IP addresses through your Dashboard!

About a month ago we launched a feature that allowed developers to whitelist certain referrers to protect their API key from being used maliciously. Today we are going to let server-side implementations of Embedly do the same thing. Just head to the Dashboard and you will see a new box on your homepage, `Manage your IPs`.

Ip_dash

If you click `Manage` you will be taken to the IP Address management view that allows developers to input any IP Address they want to whitelist. These changes are instantaneous, so be careful! We don't want you bringing down production implementations, now do we?

Ip_manage

As with Referrers, we also allow you to test out your patterns through a handy pattern checker tool. It makes it simple to add patterns and then verify that your IP Addresses will work.

Ip_test
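
If you want to sanity check a list of patterns locally before saving it, something like the sketch below can help. It is not the Dashboard's checker and makes no claim about the exact pattern syntax Embedly accepts: it assumes patterns are either a single address, a CIDR block, or an address with `*` wildcards, and the function names are made up.

```python
import ipaddress
from fnmatch import fnmatchcase


def matches(pattern, address):
    """Check one address against one pattern.

    Supported pattern forms (an assumption for this sketch): a single
    address, a CIDR block such as "10.0.0.0/8", or a shell-style wildcard
    such as "192.168.1.*".
    """
    # Raises ValueError for malformed addresses, which is what we want here.
    ip = ipaddress.ip_address(address)
    if "*" in pattern:
        return fnmatchcase(str(ip), pattern)
    # strict=False tolerates host bits being set, e.g. "10.0.0.5/8".
    return ip in ipaddress.ip_network(pattern, strict=False)


def is_whitelisted(address, patterns):
    """True if the address matches at least one pattern."""
    return any(matches(pattern, address) for pattern in patterns)
```

Running your production server addresses through `is_whitelisted` before saving is a cheap way to avoid locking yourself out, since changes take effect immediately.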

That's it! Now get whitelisting. If you have any questions please contact us at support@embed.ly.

http://files.posterous.com/user_profile_pics/1242620/12444_538807393694_19300375_32185718_3725877_n.jpg http://posterous.com/users/4bhqr2CIva81 Sean Creeley screeley Sean Creeley