Author Archives: Andres

DARPA Shredder Challenge

The DARPA Shredder Challenge from the point of view of Goldsong – 10th place. Would you have been able to solve the “easy” portion of the challenge? Find out answering the questions in red.

The DARPA Shredder challenge

The Defense Advanced Research Project Agency – DARPA – was interested in understanding how people solve complex puzzles. They issued the DARPA Shredder Challenge, a contest to put together in 4 weeks, from Oct. 27 to Dec 2, 2011, five shredded documents, each one progressively more difficult than the previous one.

The pieces obtained after shredding the documents were arranged and photographed. The contestants got these images and had to extract the pieces from the images, orient them correctly and then rearranged them until the reconstructed portion of the original document was sufficient to answer a question (or questions) associated with the document. The contestants would get points only for answering the questions, not for assembling the document.

Initial page of puzzle 1

Initial page of puzzle 1

There were 9000 entries, 69 of which scored at least 1 of 50 possible points. I participated using the name Goldsong and finished in the 10-th place. This is a summary of my experience in the contest.


[columns count=”2″ gap=”1.0em”]

puzzle1_solution_optPuzzle 1

Like many other contestants, I found about the challenge in slashdot.com, after it had already started and when many people had already solved this first puzzle. This puzzle had about 200 chads, i.e., it was a warm-up.

With a late start and a simple puzzle, I had no qualms about doing it by hand, in the most unsophisticated way possible: I cut the chads from the tiff images in which they came and pasted them in Keynote, the Apple version of Power Point.

In retrospect, my selection of tool was poor. Keynote is a good presentation tool but it is no drawing program and thus, using it as a canvas to assemble the chads was inefficient. It took me about 20 hrs to assemble enough chads to be able to read the message of the page. This task would not have taken more than 8 hrs with a suitable tool.

The puzzles only needed to be completed to the point at which one could answer a question associated with it, and to explain how you figured out the answer. Personally, I found the questions to have a difficulty similar to that of solving the puzzles. For this puzzle, the question was What is the appropriate title being referenced? Well.. we can see the solved puzzle by clicking on the image.. What would have been your answer? We’ll discuss the results at the end of the article.
[/columns]


[columns count=”2″ gap=”1.0em”]

Challenge 2 - solutionPuzzle 2

The second puzzle had about 350 chads but unlike the first one, it had been written in lined-paper, with two ink colors and had a stain from a coffee mug, i.e., a lot of hints. Hence, it looked also pretty easy to do by hand. The time will come when I would need to put together a computational tool to help me assemble the puzzle but this was not it.

I continued using Keynote. Clearly it was not the best tool for this task: I could not zoom in and out of the canvas and, since the canvas was limited in size, I had to have multiple partially redundant canvases. Still, I had gotten better at using it so I spent less time pulling pieces into Keynote and more time pondering how to connect them.

There were many ways to approach this puzzle: assemble the coffee mug stain, assemble the borders, assemble the chads with red ink, etc. The beautiful part is that all of these approaches worked equally well. In the end, it took me 20 hrs to put it together. Indeed, a computational tool was not needed for this puzzle either.

The question for this puzzle was What is the deciphered message? Not so easy, right? Give it a try. Remember that assembling the puzzle gave you no points. Only answering the question gave you points.
[/columns]


[columns count=”2″ gap=”1.0em”]

Challenge 3 - solutionPuzzle 3

Puzzles 1 and 2 did not need a computational tool to solve them. Puzzles 4 and 5 definitely were going to need one. Puzzle 3 was an unknown. Should I solve it by hand or should I start to write the tool? Well.. there was a pattern that appeared useful.

Puzzle 1 had 150 useful chads and it me took about 10 hrs to solve it (or at least it should have). Puzzle 2 had 300 useful chads and took about 20 hrs. Puzzle 3 had about 600 useful chads. It appeared that the puzzles had been selected so that each one was twice as hard to solve as the previous one. If that was the case, puzzle 3 should take 40 hrs to solve by hand, pretty doable. Thus, I had my answer.. and Boy, was I wrong!

In the beginning it looked as if assembling the puzzle by hand had been the right choice. After 40 hrs I had assembled about 80% of the puzzle. However, I had no idea how to answer the questions associated with the puzzle. And each time it was more difficult to put pieces together.

Unlike puzzles 1 and 2, the message in this puzzle was a drawing, not a text. We can narrow posible characters that follow a partial word but we cannot do this with a drawing. Likewise, the drawing used only one ink color, not two, and was drawn on white paper, not lined one. Still, the worst part is that for longest time I could not figured out what it was a drawing of.

In the end it took me about 100 hrs to assemble enough pieces to answer the two questions of this puzzle: What is the indicated country? (easy) and What is the indicated city/town? (hard). Can you answer them out from the partially assembled puzzle?
[/columns]


[columns count=”2″ gap=”1.0em”]

Challenge 4 - solutionPuzzle 4

It did not look sensible to attack puzzle 4 by hand: too many chads, about 2300, most of them useful, and very few constraints. However, there were a few chads written in a distictive ink color, hand writing or orientation that stood out. Unbeknownst to the contestants, these few chads contained the answers to the question associated with the puzzle: What are the name and/or initials of the collaborators?

Presumably, the “collaborators” were the people that had written the message of the puzzle. We did not know how many they were. I put together a few of these distinctive chads and got ‘Dave’ and ‘George’ in a matter of a few hours. I also found a 3rd collaborator but I did not realize it, so I did not submit it as an answer and did not get credit for it. Can you spot it?

Likewise, I did not realize that all the answers stood out from the general content of the puzzle. Hence, I did not try to assemble the words with distinctive hand writing or orientation. Instead I started to work on the computational tool that seemed the only viable direction to solve puzzle 4. Big mistake! Although it was not possible to assemble the puzzle without the tool, it was completely feasible to answer the questions without assembling all the puzzle, using a manual method.

At this point in time I was 5th in the leaderboard. Unfortunatelly, life caught up with me, friends from San Francisco arrived to spend the Thanksgiving weekend with us and the hours that I had available to put in the challenge dwindled to nothing marking the end of my participation in the contest. I never got to give a try to puzzle 5. However, given that I did not have a computational tool in place, it was clear to me that I would not have been able to solve puzzle 5 on time.
[/columns]


[columns count=”2″ gap=”1.0em”]

Results

I got a total of 17 pts. I could have gotten 19 pts. if I had realized that I had an additional answer in puzzle 4. Thus, the highest that I could have ranked is 7th, very distant from the score of 50 pts. earned by the winning team.

Puzzle 1

2 pts: What is the appropriate title being referenced?

You needed to solve the whole puzzle because both “Litvak” and “1937” were needed references embeded in the text of the message. Anatole Litvak directed two movies in 1937: “Tovarich” and “The woman I loved”. The greeting in the message, “Comrade”, was the clue that “Tovarich” was the correct answer.

Puzzle 2

4 pts:What is the deciphered message?

I overthought this question. Initially, because of the message, I thought that the cipher was a simple substitution Caesar cipher. However, in spite that the Caesar cipher is a very simple one, the enciphering in the message was even simpler.

There are 6 numbers underlined in the text: 10, 23, 8, 24, 18 and 21. At the bottom of the message, they show you what to do with the first three. Continue the pattern with the last three. The first 3 numbers correspond to the characters “JGO”. The last 3 will correspond to “MEZ” Hence, the deciphered message is “JGOMEZ”

Puzzle 3

2 pts: What is the indicated country?
6 pts: What is the indicated city/town?

The country is easy enough to spot: Cuba. However, the city/town is very difficult to figure out. For a long time I thought that the coordinates in the message, which I had not been able to put together, were the coordinates of the city. It turns out that they were the coordinates of La Havana and Nassau, meant to clue you that the map was that of Cuba.

The city has to be deduced exclusively from the drawing. Initially, because of the dome, I thought that the drawing referred to a church. Eventually, I assembled enough chads to recognize the structure in the middle as a pedestal. From then on it was smooth sailing.

A pedestal implies a statue. An image search for statues in Cuba gave me one in which I could see, on the back, a band stand. The location was the Jose Marti park in Cienfuegos. A specific search for the park showed that the dome was that of the Cienfuegos City Hall. After the contest I found out that the drawing was based on the image of the Jose Marti park from Wikipedia.

Puzzle 4

12 pts:What are the name and/or initials of the collaborators?

I got 3 pts for ‘George’ and ‘Dave’. I would have gotten 2 more for ‘PK’. There were a total of 5 names. You would get 1, 3, 5, 8 or 12 if you found 1, 2, 3, 4 or 5 of the names, respectively.

[/columns]


[columns count=”2″ gap=”1.0em”]

Departing thoughts

The contest was a lot of fun and very challenging. I would have liked to find out about it before it started but I do not know how much that would have really benefited me. Also, in spite that I like to work alone, it was obvious that it would have been great to have been part of a team.

During the contest I was in touch with another team, the “Herded Rappers”, made of people from JPL. I know four of its six members and think very highly of them: Mike Burl, Aaron Kiely, Matt Klimesh and Sam Dolinar. They entered the contest even later than me and thus, when they asked me to join them, I was already in 5th place and declined the offer. However, eventually they did catch up with me and finished in 13th place. In retrospective, it would have made sense to join forces but even in that case it is unlikely that we would have won.

The “Rappers”, like the winning and many of the high ranking teams, had put together a computational tool that helped them select pieces. You could ask the tool things like “Show me all the pieces with the top 20% dark and the bottom 40% white”. This looked very useful but the level of automation is very low, i.e., all the decisions about how well a chad fits with another one still require a person.

[/columns]


[columns count=”2″ gap=”1.0em”]

The right approach: Constraint Satisfaction

Useful as the Rapper’s tool was, I was surprised that it was so different from what I had in mind, i.e., an automated tool based on constraint satisfaction. I was not expecting such tool to solve the problem but I was expecting it to find an approximate solution that could be useful. Thus, it could give you the most likely configurations of chads that are adjecent to a particular one. In this case, each chad constrains all the others and hopefully settles on a reasonable configuration. Although, I did not finish my tool, there was a team that wrote one in exactly this venue.

Andrew Gallagher and Aaron Deever participated in the DARPA Shredder Challenge – team EK55 – and placed 17th. They published a paper, Semi-automatic Assembly of Real Cross-cut Shredded Documents, in 2012 ICIP (IEEE Intl. Conf. Image Processing) in which they describe their approach. In a first step, they segment the pieces out of the image and rotate them to their correct orientation. This step, completely automatic, was also done by many other teams. However, they followed it by a second step based on constraint satisfaction: they paired chads whose boundaries shared ink strokes automatically! As expected, the algorithm not always got it right but they could verify the selection and modified it. With this tool they wre able to solve puzzles 1 and 2 and parts of puzzles 3 and 4. If there had been a 6th puzzle and a bit more time, I would have put my money on them to win the whole thing.

A few months after the DARPA Shredder Challenge finished, Gallagher published a second paper that extended the tool to solve puzzles in which the orientation of the piece was not known: Jigsaw Puzzles with Pieces of Unknown Orientation. The paper was presented in 2012 CVPR ( IEEE Intl. Conf. on Computer Vision and Pattern Recognition, pp. 382-389) and Gallagher kindly made the code available. His algorithm was used to solve puzzles with squared pieces, removing all hints given by the piece’s shape.
[/columns]


[columns count=”2″ gap=”1.0em”]

A critique to Crowd Sourcing

In my mind, there is no doubt that constraint satisfaction, ala Gallagher and Deever tool, is the only efective way to solve this type of puzzles on a consistent basis. My own approach of doing them by hand is cost-effective only as an alternative to writing the tool when there is a time constraint. However, if the tool already exists, it will always win against a manual approach and those who only automate chad extraction and orientation.

Crowdsourcing, much talked about during the contest as the ‘obvious’ approach to tackle this problem, has many problems: besides being subject to sabotage (as it became evident during the contest), it requires making sensible material public, or at least, widening the number of people that under other circumstances would have had access to it.

In addition, crowdsourcing only works if the members of the crowd are already good at solving the given problem, i.e., we just cannot tell 1000 people selected at random to work on the problem; we need 1000 people that are already good at problem solving and this is far from the average Joe. Just look at the distribution of people that were both attracted to the challenge and did well: university professors and researchers with a solid background in computer science and/or mathematics and, more often than not, familiar with computer vision and/or pattern recognition. This is not the archetype of the typical military personnel that might be available to ‘crowdsource’ the problem to.

Finally, by its very nature, the demographics that are good at solving problems of this type is a terrible crowd because they will lose interest, they will want to move on to new problems. Consider how many of the members of the top 100 teams will be willing to work in a crowd sourcing problem one time: many. How many of those will be willing to work on it two times, or three, or 100 times? Very few.
[/columns]


The Darpa Shredder Challenge

was an open contest that run from Oct. 27, 2011 to Dec 2, 2011. Although it finished, you can still download the puzzles from its original site at http://archive.darpa.mil/shredderchallenge/.

3-sum nzp algorithm

We introduce 3-sum nzp, an algorithm to solve 3-sum in quadratic time but about 300% faster than the fastest presently known algorithm found in the literature (as of September 2012). This post could also be used as a tutorial to explain how algorithm complexity affects performance.

You can download the totality of the code discussed in this post here, which includes all the algorithms and the running time tests (Python 2.7, 8kb)

Introduction

I first learned about this problem recently, while auditioning Coursera’s class on algorithms by Sedgewick and Wayne (sep. 2012). Sedgewick and Wayne use different versions of 3-sum to illustrate that the complexity of an algorithm is the more important factor to take into account when attacking a problem (versus the very common practice of using a slow algorithm and just throw money at it via using expensive hardware).

The 3-sum problem is so basic that I kept thinking that there had to be a better way to solve it than the presently accepted ‘best solution’ and, indeed, there was: the 3-sum-nzp algorithm presented here, about 300-400% faster than the current ‘fastest’ solution. On Sep. 9, 2012, I posted in the class’s forum a subset of the timings presented here.

We now go over the problem definition and over pretty much every solution of the 3-sum problem, starting with the brute force approach and ending with the new nzp algorithm. Finally, we present timing results for all the routines for entries of various sizes and present links to the related online references.

The 3-sum problem

The 3-sum problem is defined as follows:

“Given $%N$% distinct integers, what triples $%(A,B,C)$% are such that

$%A+B+C=0$%?”

Some of the variations of this problem are:

  • Is there at least one triple $%(A,B,C)$% such that $%A+B+C=0$%?
  • Is there a pair $%(A,B)$% that adds up to a third integer $%C$%, i.e., $%A+B=C$%?

In addition, there are problems that are 3-sum-hard, i.e., problems that can be linearly mapped to 3-sum so algorithms that solve it can be modified to solve the others. One such problem is 3-collinear, which is defined as:

“Given $%N$% distinct points on a plane, are there 3 points that lie in the same line?”

There are a number of algorithms that solve 3-sum in $%O( N^2\cdot \log N)$% or in $%O( N^2)$%. Whether there is an algorithm that can solve 3-sum in sub-quadratic time is an open problem; nobody has found one for the general case (no parallelism, no specific statistical distribution of the data) and it is widely believed that there is none.

Update (7/2015): In 2014, the original 3SUM conjecture was refuted by Allan Grønlund and Seth Pettie who showed a sub-quadratic algorithm that solves it (wikipedia).

In this post we present a discussion of some 3-sum algorithms and introduce a new one, 3-sum-nzp, that although also quadratic, has a small constant so it performs about 3.0-3.5x faster than the fastest known 3-sum algorithm. This is nothing to snub at because, other than finding a sub-quadratic algorithm, which would be a life achievement, the only thing that we can do is to improve on the constant of the running time, e.g., an $%\sim N/2$% algorithm is not going to allow us to solve orders-of-magnitude larger problems than an $%\sim N$% one would but it definitely is a better choice.

Brute force 3-sum $%- O(N^3)$%

This is the naive implementation of 3-sum; it simply loops over the array 3 times. Its order is cubic so it quickly loses its usefulness for anything other than benchmarking.

def sum3_brute_force(a):
  res = []
  N = len(a)

    # Python ranges exclude the last term, i.e., range(0,3) = [0,1,2]
  for i in range(0, N-2):           # for i = 0 to N-3
    for j in range(i+1, N-1):       # for j = i+1 to N-2
      for k in range (j+1, N):      # for k = j+1 to N-1
        if a[i] + a[j] + a[k] == 0:
          res.append( [a[i], a[j], a[k]] )
  return res
									


3-sum with binary sort $%- O(N^2 \cdot \log N)$% average

This is a slight variation of the brute force version. First, we sort the array in $%O( N\cdot \log N)$% time with Merge sort or Quick sort. Then we iterate over the array twice to find the first two elements of the triple, $%a[i]$% and $%a[j]$%, in the same way done by the brute force algorithm; this process is $%O( N^2)$%. However, instead of searching for the third element of the triple using a third loop, like in the brute force approach, we look for it using a binary search over the already sorted array. Since the complexity of the binary search is $%O(\log N)$% we have a total complexity of $%O(N^2\cdot \log N)$%

def binary_search(a, key):
  lo, hi = 0, len(a)-1
  while (lo <= hi):
    mid = lo + (hi-lo)/2
    if key < a[mid]:    hi = mid - 1
    elif key > a[mid]:  lo = mid + 1
    else:               return mid
  return None

def sum3_with_sort(a):
  res = []
  N = len(a)
  a.sort()                          #  O(N log N)

  # Python ranges exclude the last term, i.e., range(0,3) = [0,1,2]
  for i in range(0, N-2):           # for i = 0 to N-3
    for j in range(i+1, N-1):       # for j = i+1 to N-2
      val = -(a[i] + a[j])
      if a[i] < a[j] < val and binary_search( a, val ):
        res.append( [a[i], a[j], val] )
  return res
									


3-sum with hash table $%- O(N^2)$% average

We sped up the brute force version by realizing that if we already have two elements of the triple we could search for the third one in $%O(\log N)$% time using binary search. The hash table approach extends this: we can load the elements of the array in a hash table and then verify the existence of a third element $%a[k]$% such that $%a[i] + a[j] + a[k]=0$% in constant time. Again, the first two loops remain as they were in the brute force approach but the third loop is replaced, not with a $%O(\log N)$% binary search but with a $%O(1)$% hash table lookup:

def sum3_with_hash_table(a):
  res = []
  N = len(a)
  d = {}
  for i in range(N):
    d[a[i]] = 1
  a.sort()                          # O(N log N)

        # Python ranges exclude the last term, i.e., range(0,3) = [0,1,2]
  for i in range(0, N-2):           # for i = 0 to N-3
    for j in range(i+1, N-1):       # for j = i+1 to N-2
      val = -(a[i] + a[j])
      if a[i] < a[j] < val and val in d:
        res.append( [a[i], a[j], val] )
  return res
									

Python, Java and other modern languages include hash tables as standard data structures. In the sample code above, we used dictionaries, the hash tables of Python. We create a hash table $%d$% using the elements of the array as keys. We do not care about the values of the hash table, only about the hash table’s ability to verify membership in constant time. Thus, all the keys of the dictionary have a dummy value of 1. Alternatively, in Python, we could use sets that are also implemented using hash tables and thus, also have a membership checking of $%O(1)$%.

3-sum quadratic time $%- O(N^2)$% for average and worst cases

This was the fastest known 3-sum algorithm. It keeps two pointers that move from the extremes of the array toward the center, bracketing possible triples. The following algorithm is that shown in Wikipedia’s article about 3-sum, modified so that it will return all the triples that add up to zero (Wikipedia has the existence version of 3-sum, i.e., it returns immediately after it finds a triple that adds up to zero):

def sum3_quad(S):
  res = []
  n = len(S)
  S.sort()
  for i in xrange(0,n-2):  # for i=0 to n-3 do:
    a = S[i]
    k = i+1
    l = n-1
    while k<l:
      b = S[k]
      c = S[l]
      if a+b+c == 0:
        res.append([a,b,c])
        l = l-1
        k = k+1
      elif a+b+c > 0:
        l = l-1
      else:
        k = k+1
  return res
									


3-sum quadratic nzp $%- O(N^2)$% for average and worst cases

Presently it is widely believed that 3-sum’s lower bound for the general case is quadratic and that, in particular, the algorithm just shown is the fastest known. This does not mean that we cannot find a faster algorithm; only that this faster algorithm would also have a quadratic performance. Here we present such an algorithm that, in general, is 350% faster than the one described in Wikipedia.

We are interested in triples $%(A, B, C)$% such that $$A+B+C=0$$

Like in the previous algorithms, we start by sorting the array and adding its elements to a hash table. We then divide the problem in 2 parts: finding the triples that contain a zero as one of its elements and then finding those that don’t. These two cases cover all the possible cases because, since all the elements of the array are distinct, we cannot have a triple that has more than one zero.

The triple has a 0

We first address the case in which one of the elements of the triple $%(A,B,C)$% is zero. By force, this element lies between the other two, i.e., if, say, $%B=0$%, then $%A = -C$%. Thus, to find triples of the form $%(A, 0, C)$% we iterate over the positive values of the array, in linear time, and find, in constant time, whether their negatives are in the array by checking their membership in the hash table. If there are more positive numbers than negative ones, we iterate over the negative ones instead. The running time of this process is

  • $%O(1)$% in the best case, when there is a single positive (or negative value)
  • $%\sim N/2$% in the worst case, when there are the same number of positive and negative entries in the array.

Let’s see this with an example. We start with the following set with $%N=8$% entries:

$$a = [30, -40, -20, -10, 40, 0, 10, 5]$$

First we sort the array and we mark 3 locations: that of the last negative value, $%n$%, of zero, $%z$%, and of the first positive value, $%p$%. Thus, if $%0$% is an element of the array then $%z = n+1$% and $%p = z+1 = n+2$%; otherwise, $%z$% is undefined and $%p = n+1$%.

a = [-40, -20, -10, 0, 5, 10, 30, 40]
      ^         ^   ^  ^          ^
      0         n   z  p          N-1
									

In our example we have $%n=2$%, $%z=3$% and $%p=4$%.

For purposes of both the implementation and performance, we rearrange the negative numbers so they are sorted in descending order, i.e., now both positive and negative numbers are sorted in the ascending value of their magnitudes. This change does not alter the values of $%n$%, $%z$% or $%p$%:

a = [-10, -20, -40, 0, 5, 10, 30, 40]
      ^         ^   ^  ^          ^
      0         n   z  p          N-1
									

Now we execute the first step, namely the discovery of the triples that contain a zero. Since there are less negative numbers than positive ones, we iterate over the negative values of the array, from $%0$% to $%n$%, where $%n$% is largest index of a negative value. For each such negative number $%A$% we check for the membership of $%-A$% in the hash table. If this is the case then $%(A, 0, -A)$% is a triple that adds up to zero and we can add it to the result. The partial result at the end of this step is:

$$res = [ [-10, 0, 10], [-40, 0, 40] ]$$

The triple does not have a 0

Now we consider the case in which the triple does not have a zero as one of its elements. We are going to split this problem in two: first, finding the triples whose largest element in absolute terms is positive and then, finding those triples in which it is negative.

Hence, first consider the case in which the largest value of the triple in absolute terms is positive, i.e., $$|A|>|B|>|C|$$ and $%A>0$%. Since $%0$% is not in the triple, then both $%B$% and $%C$% have to be negatives. Now consider the possible values of $%C$%. $%C$% cannot be smaller than $%-A/2$% because if it were, $%B$% would have to be larger than $%-A/2$%. However, this contradicts that $%|B|>|C|$%. In addition, $%B$% cannot be equal to $%C$% because all the integers of the array are distinct. Hence, both $%B$% and $%C$% are negative numbers such that

$$-A \lt B \lt -A/2 \lt C \lt 0$$.

To give a numerical example, if $%A=5$%, $%C$% has to be larger than $%-A/2= -2.5$%, i.e., $%C \in \{-1,-2\}$%, and $%B$% has to be smaller than $%-A/2$% but larger than $%-A$%, i.e., $%B \in \{-3,-4\}$%.

Since the negative numbers are sorted in ascending order of magnitudes, then we know that if there is a triple $%(A,B,C)$% that has an $%A>0$% then $%C$% is to be found at the beginning of the array and it will have a magnitude larger than $%-A/2$%. Thus, we iterate over such possible values of $%C$% and, in constant time, verify the membership of the remaining member of the triple, i.e., $%B=-(A+C)$%.

Notice that it is here, in this second iteration, where the quadratic factor creeps in. Going over all the positive numbers has a cost of $%\sim N/2$% in the worst case, when there are the same number of positive and negative numbers in the array. Going over the negative numbers, although quadratic, is inexpensive because we only iterate over the negative values larger than $%-A/2$%. This makes the search efficient.

Continuing with our example, we first check for triples in which $%A = 5$%. Now we iterate over the negative numbers, starting with $%C = -10$% but $$-10 < -A/2 = -2.5$$ and thus no possible triple can have both $%5$% and $%-10$% under the assumption that $%5$% is the element of the triple with the largest magnitude. Thus, instead of iterating over all the negative values of the array, we iterated over a single one.

The same thing happens when we consider the next positive value, i.e., $%A = 10$%. The first possible negative value is, again, $%C = -10$% but $$-10 < -A/2 = -5$$ and thus no possible triple that adds to zero can have both $%-10$% and $%10$% (unless of course, one of the elements of the triple is $%0$% in which case, the triple would had already been found in the first step).

Now we consider the next positive value, i.e., $%A=30$%. The first possible negative match is $%C=-10$% which is larger that $%-A /2 = -15$%. Thus we look up, in constant time, whether the hash table has a key equal to $$B = -(A+C) = -(30-10) = -20$$ and, indeed, it does so $%(A, B, C) = (30, -10, -20)$% is added to the result.

After iterating over the positive values and adding the results to those obtained when checking for triples with zeroes, we have:

$$res = [ [-10, 0, 10], [-40, 0, 40], [30, -10, -20]]$$

The third (and last) step is the dual of the second step: the case in which the number with the largest absolute value of the triple is negative. This step is exactly the same as the second step, simply exchanging the roles of the positive and the negative numbers, i.e., we iterate over the negative numbers and for each of them, iterate over the positive numbers as long as they are smaller than $%-A/2$%, and search for the remaining value using the keys of the hash table. The result after this last step is:

$$res = [ [-10, 0, 10], [-40, 0, 40], [30, -10, -20], [-40, 30, 10]]$$

This concludes the search.

Sample code

The following code, easy to port to languages like C that do not have iterators, has an average speed up over the other quadratic algorithms of about 3x. Hash tables are not a standard data type in C but they are not difficult to write.

def signum(x):
    return (x > 0) - (x < 0)

def binary_search_2(a, key):
  """ searches an array 'a' for a value in log(n) time
  returns whether it found the item and the last array pos used"""
  lo, hi = 0, len(a)-1
  while (lo <= hi):
    mid = lo + (hi-lo)/2
    if key < a[mid]:    hi = mid - 1
    elif key > a[mid]:  lo = mid + 1
    else:               return True, mid
  return False, mid

def sum3_nzp(a):
  """ nzp without iterators (easy to port to, say, C)"""
  res = []
  a.sort()

  # fast rejection
  if ( a[0]*a[-1] == 0 ) or ( signum(a[0]) == signum(a[-1]) ):
    return res

  d = {}
  for i in xrange(N):
    d[a[i]] = 1

  zero_found, i = binary_search_2(a, 0)
  if zero_found:  n, p = i-1, i+1
  elif a[i] < 0:  n, p = i, i+1
  else:           n, p = i-1, i

  # reverse negatives
  for i in xrange((n+1)/2):
    a[i], a[n-i] = a[n-i], a[i]

  # first deal with zero case
  if zero_found:
    num_neg = n+1
    num_pos = len(a) - num_neg
    if zero_found:
      num_pos -= 1

    if num_neg <= num_pos:
      l, h = 0, n+1
    else:
      l, h = p, N
    for i in xrange(l, h):
      if -a[i] in d:
        res.append([a[i], 0, -a[i]])

  # now assume |A| > |B| > |C| and A > 0.. then B < 0 and C < 0
  for i in xrange(p,N):
    A = a[i]
    bound = A/2.
    for j in xrange(0, n+1):
      C = a[j]
      if -C < bound:
        B = -(A + C)
        if B in d:
          res.append([A, B, C])
      else:
        break

  # now assume that |A| > |B| > |C| and A < 0.. then B > 0 and C > 0
  for i in xrange(0, n+1):
    A = a[i]
    bound = -A/2.
    for j in xrange(p,N):
      C = a[j]
      if C < bound:
        B = -(A + C)
        if B in d:
          res.append([A, B, C])
      else:
        break

  return res
									

The auxiliary signum() and binary_search_2() functions used in this version are also used in the following version that uses iterators. Languages like Java and Python that support them can do away with the indexing of the arrays and marginally improve the performance to about 3.5x.

def sum3_nzp(a):
  res = []
  a.sort()

  # fast rejection
  if ( a[0]*a[-1] == 0 ) or ( signum(a[0]) == signum(a[-1]) ):
    return res

  # using set to check for membership (same as hash table)
  d =set(a)

  zero_found, i = binary_search_2(a, 0)
  if zero_found:  n, p = i-1, i+1
  elif a[i] < 0:  n, p = i, i+1
  else:           n, p = i-1, i

  neg = a[:n+1]
  neg.reverse()    # reverse negatives
  pos = a[p:]

  # first deal with zero case
  if zero_found:
    if len(neg)  <= len(pos):  v = neg
    else:            v = pos
    for A in v:
      if -A in d:
        res.append([A, 0, -A])

  # now assume |A| > |B| > |C| and A > 0.. then B < 0 and C < 0
  for A in pos:
    bound = A/2.
    for C in neg:
      if -C < bound:
        if -(A+C) in d:
          res.append([A, -(A+C), C])
      else:
        break

  # now assume that |A| > |B| > |C| and A < 0.. then B > 0 and C > 0
  for A in neg:
    bound = -A/2.
    for C in pos:
      if C < bound:
        if -(A+C) in d:
          res.append([A, -(A+C), C])
      else:
        break

  return res
									


Experimental results

The following are timings of runs with arrays with up to 320 elements; both the brute force and the binary sort versions take several minutes to complete with arrays not much bigger than those. These timings were obtained in a 2.4 GHz dual core MacPro with 4 Gb of RAM and 4 Mb of L2 cache, averaged over 100 runs using arrays with uniformly-distributed random data.

Below the timings we have the corresponding estimate of the running time of each algorithm: that of the brute force 3-sum is cubic, i.e., ∼N∼2.8, and that of the 3-sum based on the binary sort is slightly above quadratic, i.e., ∼N∼2.1, which is what is expected because its order is $%O(N\cdot \log N)$%. The solutions of all the algorithms are checked against those of the brute force algorithm. The running time of the quadratic algorithms are presented as reference but the size of the arrays is too small to generate confidence; we’ll improve them in the following section when we use large arrays.

        brute f. w/ binsort w/ hash t.   quadrat.  quad. nzp speedup
   10  1.332e-04  1.240e-04  6.053e-05  4.687e-05  4.069e-05  1.2x
   20  7.839e-04  5.628e-04  1.902e-04  1.696e-04  8.370e-05  2.0x
   40  5.221e-03  2.383e-03  6.823e-04  6.679e-04  2.336e-04  2.9x
   80  3.753e-02  1.031e-02  2.551e-03  2.618e-03  7.524e-04  3.5x
  160  2.804e-01  4.581e-02  9.890e-03  1.045e-02  2.703e-03  3.9x
  320  2.231e+00  2.003e-01  4.054e-02  4.281e-02  1.121e-02  3.8x

    brute f.:  T(N)=1.798e-07 x N ^ 2.813
  w/ binsort:  T(N)=9.374e-07 x N ^ 2.127
  w/ hash t.:  T(N)=7.062e-07 x N ^ 1.884
    quadrat.:  T(N)=4.766e-07 x N ^ 1.971
   quad. nzp:  T(N)=6.961e-07 x N ^ 1.636

									

Next we have timings with large arrays, with thousands and tens of thousands of entries. We are no longer computing the timings for the brute force and the binary-sort-based versions; they take too long. However, now that we are using large arrays we can get reliable running times for the quadratic algorithms. The following timings were averaged over 20 runs.

      w/ hash t.   quadrat.  quad. nzp speedup
   10  6.165e-05  4.745e-05  3.960e-05  1.2x
   20  1.884e-04  1.718e-04  8.515e-05  2.0x
   40  6.581e-04  6.553e-04  2.169e-04  3.0x
   80  2.555e-03  2.584e-03  7.543e-04  3.4x
  160  9.699e-03  1.032e-02  2.591e-03  4.0x
  320  3.938e-02  4.261e-02  1.105e-02  3.9x
  640  1.592e-01  1.732e-01  4.283e-02  4.0x
 1280  6.710e-01  7.234e-01  1.888e-01  3.8x
 2560  2.861e+00  3.018e+00  8.962e-01  3.4x
 5120  1.154e+01  1.266e+01  3.941e+00  3.2x
10240  4.897e+01  5.216e+01  1.648e+01  3.2x
20480  2.363e+02  2.329e+02  8.252e+01  2.8x

  w/ hash t.:  T(N)=4.409e-07 x N ^ 2.000
    quadrat.:  T(N)=3.819e-07 x N ^ 2.026
   quad. nzp:  T(N)=1.933e-07 x N ^ 1.957
									

As expected, both the version of the 3-sum based on the hash table and the quadratic version from Wikipedia show a quadratic running time, i.e., ∼N∼2.0. The 3-sum nzp algorithm is also quadratic but its coefficient, i.e., ∼2.0e-7, is half of that of the other two, i.e., ∼4.0e-7. This translates into a consistent 3x speedup over the other two quadratic versions, i.e., a 300% speedup.

In practice the situation is even better than this. Consider a usual algorithm problem, e.g., sorting: the running time of the algorithm is primarily a function of the size of the input data. We can have 10, 1000 or a trillion numbers to sort, i.e., the size of the input array is not constrained. However, 3-sum is an unusual problem because its definition constrains the size of the array. If the data type that we are using is 8-bit numbers (i.e., type char), then the maximum size of the input array is 256 (i.e., from -127 to 128) because of the constraint that all the numbers of the array have to be distinct, i.e., it is just not possible to have arrays with more than 256 chars where all the values are distinct. Likewise, if the data corresponds to pixels on a screen, with a typical linear resolution of, say, about 1500 pixels, then the maximum size of the array is about 1500 elements. Moreover, these are maximums for which every element of the domain is in the array; in the average case, if we are dealing with chars we are likely to have fewer than 256 elements in the array.

With this in mind, we could consider typical cases as having arrays with hundreds to thousands of elements due to the requirement on the uniqueness of each data point. As we can see from the timings, 3-sum-nzp beats the other two quadratic algorithms for any array size, large or small. However, it is around the range of hundreds to thousands that 3-sum nzp has its best relative performance of close to 4x. In this range, the time spent searching for triples is an excellent tradeoff of both algorithm overhead and limitations on resources, e.g., due to the creation of the hash table and sorting, which lowers performance for very small arrays, and to cache misses incurred when comparing widely different locations of an array, which lowers performance for very large ones.

Related links

Reductions, Robert Sedgewick and Kevin Wayne

Algorithms, Robert Sedgewick and Kevin Wayne, chapter 1 – Fundamentals

Visibility graphs and 3-sum, Michael Hoffman

A survey of 3-sum-hard problems, James King

3Sum, wikipedia


NZP – a new quadratic 3-sum algorithm

This post is an extended version of the following original post:

Forum of Algorithms I, by Robert Sedgewick and Kevin Wayne offered under Coursera
First publication date: Sun 9 Sep 2012 11:56:17 PM PDT
Link: https://class.coursera.org/algs4partI-2012-001/forum/thread?thread_id=2043





Latex Symbols

This is primer and reference sheet for LaTeX focused on the symbols and commands used in online forums, like those of Coursera or Udacity, and when writing technical posts in content management systems, like WordPress, Joomla or Drupal.

LaTeX for web pages and forums

LaTeX is a set of commands to write beautiful pages developed by Leslie Lamport on top TeX developed by Donald Knuth. It is the default submission format for many conferences and journal papers. LaTeX is a complete layout and typographical system: it can format pages, manage graphs and pictures, write bibliographies.

However, there is already plenty of control of style in web pages via HTML, CSS and pre-formatted templates so LaTeX has a discrete role in their overall appearance; presently, its main role is to provide the commands to write beautiful math symbols, which are in much demand in technical forums and posts.

 

In-line vs. Own-line

The LaTex commands are written within the text, surrounded by a set of characters. These characters vary from place to place but usually are something like $, $%, $$ or similar.

There are two ways to add LaTeX to our writing:

1. inline, as part of the text as in:

..and that is why $%E = m\cdot c^2$% defines..

which we write as

..and that is why $%E = m\cdot c^2$% defines..

2. In its own line, as in:

$$ E=m\cdot c^2 $$

which we write as

 $$ E=m\cdot c^2 $$

In this writing we will use $% for inline commands and $$ for own-line ones. To post in:

  • Udacity: use $% for inlines and $$ for own-lines.
  • Coursera: use $$ for inlines and $$ for own-lines

 

Printing $ in inline text

A character much used for inline latex is a single $. This works well unless there happens to be a dollar sign in the text that LaTeX will then mistake for the beginning of an inline expression, e.g.,

..$%30 dollars or even $%40..

when we really were writing normal text:

..$30 dollars or even $40..

If we are having this problem is because the site that we are using set the default inline characters to $, a not recommended but unfortunately very common practice. If the site or forum is out of our control, the best we can do is figure out how to print the $ the best we can. The following are partial solutions:

  1. write the text of the dollar sign in a pre-formatted text, as shown in the example above. The problem is that pre-formatted text appears in a box.
  2. write the dollar sign in Latex, i.e., $\$$ which results in $%\\$$%.
  3. write the dollar sign in emphasized text, i.e., <em>$</em> which results in $

If the site is our own, we can do better by not setting a single $ as the default. Two possibilities that work well are:

  1. use the default used by Udacity, i.e., $%
  2. use another unusual combination, e.g., \$

For example, in this site we use the Simple Mathjax plugin which offers the option to change the sequence for inlined LaTeX. To set this sequence to that of Udacity, i.e., $%, set the custom Mathjax config option to

MathJax.Hub.Config({tex2jax: {inlineMath: [[\'$%\',\'$%\'], [\'\\\\(\',\'\\\\)\']]}});

 

Color and highlighting

..the $%\color{red}{Republican}$% and the $%\color{blue}{Democrat}$% parties who..

..the $%\color{red}{Republican}$% and the $%\color{blue}{Democrat}$% parties who..

..and Einstein, who in $%\bbox[yellow]{\mbox{1905}}$% published four papers..

..and Einstein, who in $%\bbox[yellow]{\mbox{1905}}$% published four papers..

 

The \mbox command

It is used to add normal text to Latex expressions:

$%\mbox{Energy} = m\cdot c^2$%

$%\mbox{Energy} = m\cdot c^2$%

 

Predefined function names

These are functions that LaTeX writes in normal text, without needing the \mbox command.
$$
\begin{array}{lllllllllllllll}
\mbox{\arccos} &\hspace{.01in} &
\mbox{\cos} &\hspace{.01in} &
\mbox{\csc} &\hspace{.01in} &
\mbox{\exp} &\hspace{.01in} &
\mbox{\ker} &\hspace{.01in} &
\mbox{\limsup} &\hspace{.01in} &
\mbox{\min} &\hspace{.01in} &
\mbox{\sinh}\\
\mbox{\arcsin} &\hspace{.01in} &
\mbox{\cosh} &\hspace{.01in} &
\mbox{\deg} &\hspace{.01in} &
\mbox{\gdc} &\hspace{.01in} &
\mbox{\lg} &\hspace{.01in} &
\mbox{\ln} &\hspace{.01in} &
\mbox{\Pr} &\hspace{.01in} &
\mbox{\sup}\\
\mbox{\arctan} &\hspace{.01in} &
\mbox{\cot} &\hspace{.01in} &
\mbox{\det} &\hspace{.01in} &
\mbox{\hom} &\hspace{.01in} &
\mbox{\lim} &\hspace{.01in} &
\mbox{\log} &\hspace{.01in} &
\mbox{\sec} &\hspace{.01in} &
\mbox{\tan}\\
\mbox{\arg} &\hspace{.01in} &
\mbox{\coth} &\hspace{.01in} &
\mbox{\dim} &\hspace{.01in} &
\mbox{\inf} &\hspace{.01in} &
\mbox{\liminf} &\hspace{.01in} &
\mbox{\max} &\hspace{.01in} &
\mbox{\sin} &\hspace{.01in} &
\mbox{\tanh}
\end{array}
$$

In addition, we have mod that has two forms: as an operator and as a function, producing parenthesis around itself and its arguments:

$$
a \bmod b \hspace{1.0in} \mbox{and} \hspace{1.0in} \pmod{a+b}
$$$

a \bmod b \hspace{1.0in} \mbox{and} \hspace{1.0in} \pmod{a+b}

 

Tables

We can write a table using the array command followed by the format that each column, e.g., \begin{array}{l,c,r} will start a table with three columns, in which the first, second and third values will be printed to the left, center, and to the right of their respective columns. Columns are separated by &; lines are separated by \\.

$$
\begin{array}{lcr}
\mbox{Johnathan Doe} & \mbox{Washington} & \$1023.95\\
\mbox{Jane Doe} & \mbox{Denver} & \$248.96 \\
\mbox{Kitty Doe} & \mbox{Lake Tahoe} & \$8.96
\end{array}
$$$

$$
\begin{array}{lcr}
\mbox{Johnathan Doe} & \mbox{Washington} & \$1023.95\\
\mbox{Jane Doe} & \mbox{Denver} & \$248.96 \\
\mbox{Kitty Doe} & \mbox{Lake Tahoe} & \$8.96
\end{array}
$$

Usually lines are separated by \\ but in some forums, lines are separated by \\\ or even \\\\.

 

Spacing and dots

Spacing: We can insert small negative spaces with the \! command and thin, medium or thick spaces with \, \: and \;, respectively. Spaces of any size are inserted with the \hspace{} command:

$$ T\!\!\!\!\!X\hspace{0.5in}|\,|\hspace{0.5in}|\:|\hspace{0.5in}|\;| $$$

$$ T\!\!\!\!\!X\hspace{0.5in}|\,|\hspace{0.5in}|\:|\hspace{0.5in}|\;| $$

Single dot, commonly used to indicate a product, is written with \cdot:
$$ \vec{a} = \vec{b}\cdot\vec{c} $$$

$$ \vec{a} = \vec{b}\cdot\vec{c} $$

Horizontal dots: both at the bottom of the line, \ldots, or centered about it, \cdots:

$$ x = \{1, 2, \ldots, n\} \hspace{1.0in} x! = 1 \times 2 \times \cdots \times n $$$

$$ x = \{1, 2, \ldots, n\} \hspace{1.0in} x! = 1 \times 2 \times \cdots \times n $$

Diagonal and vertical dots: written with \ddots and \vdots, are useful for writing matrices:

$$
I = \left|
\begin{array}{ccc}
1 & \cdots & 0\\
\vdots & \ddots & \vdots\\
0 & \cdots & 1
\end{array}
\right|
$$$

$$
I = \left|
\begin{array}{ccc}
1 & \cdots & 0\\
\vdots & \ddots & \vdots\\
0 & \cdots & 1
\end{array}
\right|
$$

Usually lines are separated by \\ but in some forums, lines are separated by \\\ or even \\\\.

 

Superscripts and subcripts

superscripts:
$$a^n \hspace{0.5in} a^n+1 \hspace{0.5in} a^{n+1} \hspace{0.5in} a^{{n+1}^2}$$$

$$a^n \hspace{0.5in} a^n+1 \hspace{0.5in} a^{n+1} \hspace{0.5in} a^{{n+1}^2}$$

subscripts:
$$a_k \hspace{0.5in} a_k+1 \hspace{0.5in} a_{k+1} \hspace{0.5in} a^{{k+1}^2}$$$

$$a_k \hspace{0.5in} a_k+1 \hspace{0.5in} a_{k+1} \hspace{0.5in} a^{{k+1}^2}$$

both:
$$a^n_k \hspace{0.5in} ^2R_3$$$

$$a^n_k \hspace{0.5in} ^2R_3$$

as inline bounds:
$%\sum_{i=0}^{n}i \times \prod_{j=1}^{k}{j} \approx \int_{0}^{n}x dx$%

$%\sum_{i=0}^{n}i \times \prod_{j=1}^{k}{j} \approx \int_{0}^{n}x dx$%

as own-line bounds:
$$\sum_{i=0}^{n}i \times \prod_{j=1}^{k}{j} \approx \int_{0}^{n}x dx$$$

$$\sum_{i=0}^{n}i \times \prod_{j=1}^{k}{j} \approx \int_{0}^{n}x dx$$

 

Fractions, binomials and roots

Fractions:
$$ f(x) = \frac{1}{2} \frac{1}{1 + \frac{1}{n}} $$$

$$ f(x) = \frac{1}{2} \frac{1}{1 + \frac{1}{n}} $$

Binomial coefficient:
$$\binom{n}{k} = \frac{n!}{k!(n-k)!}$$$

$$\binom{n}{k} =\frac{n!}{k!(n-k)!}$$

Roots:
$$ \sqrt{4} = 2 \hspace{0.5in} \sqrt[3]{8} = 2 \hspace{0.5in} $$$

$$ \sqrt{4} = 2 \hspace{0.5in} \sqrt[3]{8} = 2 \hspace{0.5in} $$

 

Limits, integrals and derivatives

limits:
$$ \lim_{n \rightarrow \infty} \frac{1}{n} = 0 $$$

$$ \lim_{n \rightarrow \infty} \frac{1}{n} = 0 $$

Integrals
$$ f(x) = \int_{a}^{b} x^2 dx \hspace{0.5in} \oint_{L} f(x) dx$$$

$$ f(x) = \int_{a}^{b} x^2 dx \hspace{0.5in} \oint_{L} f(x) dx$$

Derivatives and partial derivatives
$$ \frac{dx}{dt} = \dot{x} \hspace{0.5in} \frac{\partial^2 f}{\partial x \partial y} = g(x)$$$

$$ \frac{dx}{dt} = \dot{x} \hspace{0.5in}
\frac{\partial^2 f}{\partial x \partial y} = g(x)$$

 

Multi-line equations

We can think of them as two-column tables: we align the equal sign using an ampersand and we separate the lines with \\:

$$
\begin{align}
F(x) & = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots \\
& = \exp(x)
\end{align}
$$$

$$
\begin{align}
F(x) & = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots \\
     & = \exp(x)
\end{align}
$$

Usually lines are separated by \\ but in some forums, lines are separated by \\\ or even \\\\.

 

Delimiters and Function definitions

Delimiters
These are characters that, like parenthesis or brackets, surround a block of text. They always come in pairs: an opening delimiter and a closing one:

$$
\begin{array}{lllllllllll}
( & \mbox{(} & \hspace{.4in} &
) & \mbox{)} & \hspace{.8in} &
[ & \mbox{]} & \hspace{.4in} &
] & \mbox{]}\\
\{ & \mbox{\{} & \hspace{.4in} &
\} & \mbox{\}} & \hspace{.8in} &
\lfloor & \mbox{\lfloor} & \hspace{.4in} &
\rfloor & \mbox{\rfloor}\\
\lceil & \mbox{\lceil} & \hspace{.4in} &
\rceil & \mbox{\rceil} & \hspace{.8in} &
\langle & \mbox{\langle} & \hspace{.4in} &
\rangle & \mbox{\rangle}
\end{array}
$$

Notice that the curly bracket is a special character in LaTeX (it is used to indicate the arguments of a command) and thus we have to escape it with a backslash, i.e., \{ and \}.

Function definitions
In the following example, we use delimiters – curly braces – to write the Ackermann function. Since we only need the left curly brace we use an invisible delimiter, the period, as the closing delimiter. The size of the delimiter adapts to the size of the block.

$$
A(m,n,1) = \left\{
\begin{array}{ll}
n+1 & \mbox{if } m = 0 \\
A(m-1,1) & \mbox{if } m > n \mbox{ and } n = 0\\
A(m-1, A(m,n-1)) & \mbox{if } m > 0 \mbox{ and } n > 0
\end{array}
\right.
$$$

$$
A(m,n,1) = \left\{
\begin{array}{ll}
n+1 & \mbox{if } m = 0 \\
A(m-1,1) & \mbox{if } m > n \mbox{ and } n = 0\\
A(m-1, A(m,n-1)) & \mbox{if } m > 0 \mbox{ and } n > 0
\end{array}
\right.
$$

Usually lines are separated by \\ but in some forums, lines are separated by \\\ or even \\\\.

 

Vectors and matrices

Vectors

$$^i\vec{P}_j(a) = \left| \begin{array}{cccc}a & b & c & d \end{array} \right|^T$$$

$$^i\vec{P}_j(a) = \left| \begin{array}{cccc}a & b & c & d \end{array} \right|^T$$

matrices
$$
I = \left|
\begin{array}{ccc}
1 & 0 & 0\\
0 & 1 & 0\\
0 & 0 & 1
\end{array}
\right|
$$$

$$ I = \left|
\begin{array}{ccc}
1 & 0 & 0\\
0 & 1 & 0\\
0 & 0 & 1
\end{array}
\right|
$$

Usually lines are separated by \\ but in some forums, lines are separated by \\\ or even \\\\.

 

Calligraphic type

LaTeX provides a calligraphic type only for the uppercase letters:
$$
\begin{array}{llllllllllllll}
\cal A & \mbox{\cal A} & \hspace{.15in} &
\cal B & \mbox{\cal B} & \hspace{.15in} &
\cal C & \mbox{\cal C} & \hspace{.15in} &
\cal D & \mbox{\cal D} & \hspace{.15in} &
\cal E & \mbox{\cal E}\\
\cal F & \mbox{\cal F} & \hspace{.15in} &
\cal G & \mbox{\cal G} & \hspace{.15in} &
\cal H & \mbox{\cal H} & \hspace{.15in} &
\cal I & \mbox{\cal I} & \hspace{.15in} &
\cal J & \mbox{\cal J}\\
\cal K & \mbox{\cal K} & \hspace{.15in} &
\cal L & \mbox{\cal L} & \hspace{.15in} &
\cal M & \mbox{\cal M} & \hspace{.15in} &
\cal N & \mbox{\cal N} & \hspace{.15in} &
\cal O & \mbox{\cal O}\\
\cal P & \mbox{\cal P} & \hspace{.15in} &
\cal Q & \mbox{\cal Q} & \hspace{.15in} &
\cal R & \mbox{\cal R} & \hspace{.15in} &
\cal S & \mbox{\cal S} & \hspace{.15in} &
\cal T & \mbox{\cal T}\\
\cal U & \mbox{\cal U} & \hspace{.15in} &
\cal V & \mbox{\cal V} & \hspace{.15in} &
\cal W & \mbox{\cal W} & \hspace{.15in} &
\cal X & \mbox{\cal X} & \hspace{.15in} &
\cal Y & \mbox{\cal Y}\\
\cal Z & \mbox{\cal Z} & \hspace{.15in} &
& & \hspace{.15in} &
& & \hspace{.15in} &
& & \hspace{.15in} &
&
\end{array}
$$

For example:

..and let $%{\cal X} = {\cal A} \cup {\cal B}$% be the space..

..and let $%{\cal X} = {\cal A} \cup {\cal B}$% be the space..

 

Math accents

$$
\begin{array}{lllllllllll}
\hat{a} & \mbox{\hat{a}} & \hspace{.4in} &
\acute{a} & \mbox{\acute{a}} & \hspace{.4in} &
\bar{a} & \mbox{\bar{a}} & \hspace{.4in} &
\dot{a} & \mbox{\dot{a}}\\
\check{a} & \mbox{\check{a}} & \hspace{.4in} &
\grave{a} & \mbox{\grave{a}} & \hspace{.4in} &
\vec{a} & \mbox{\vec{a}} & \hspace{.4in} &
\ddot{a} & \mbox{\ddot{a}}\\
\breve{a} & \mbox{\breve{a}} & \hspace{.4in} &
\tilde{a} & \mbox{\tilde{a}} & \hspace{.4in} &
& & \hspace{.4in} &
&
\end{array}
$$

For example:

$%Andr\acute{e}\!s\;Casta\tilde{n}o \hspace{0.5in} \ddot{x}=\frac{d \dot{x}}{dt}$%

$%Andr\acute{e}\!s\;Casta\tilde{n}o \hspace{0.5in} \ddot{x}=\frac{d \dot{x}}{dt}$%

although, if the accent is not going to be applied to a math symbol, it is much easier to write it using the standard character modifier and the \mbox command:

$%\mbox{Andrés Castaño}$%

$%\mbox{Andrés Castaño}$%

 

Over and underlining

The \hat and the \tilde commands have wide versions – \widehat and \widetilde – that try to accommodate large arguments:

$$\widetilde{a + b} \hspace{0.5in} \widehat{a + b}$$$

$$\widetilde{a + b} \hspace{0.5in} \widehat{a + b}$$

A more common type of over-lining uses the \overline command,

$$ \overline{A \cdot B} \; = \overline{A} \; + \overline{B} $$$

$$ \overline{A \cdot B} \; = \overline{A} \; + \overline{B} $$

or an over- or under-brace,

$$ \overbrace{1, 2, \underbrace{3, \ldots, 99}_{97}, 100}^{100}$$$

$$ \overbrace{1, 2, \underbrace{3, \ldots, 99}_{97}, 100}^{100}$$

 

Greek alphabet

$$
\begin{array}{lllllllllll}
\alpha & \mbox{\alpha} & \hspace{.2in} &
\theta & \mbox{\theta} & \hspace{.2in} &
o & \mbox{o} & \hspace{.2in} &
\tau & \mbox{\tau}\\
\beta & \mbox{\beta} & \hspace{.2in} &
\vartheta & \mbox{\vartheta} & \hspace{.2in} &
\pi & \mbox{\pi} & \hspace{.2in} &
\upsilon & \mbox{\upsilon}\\
\gamma & \mbox{\gamma} & \hspace{.2in} &
\iota & \mbox{\iota} & \hspace{.2in} &
\varpi & \mbox{\varpi} & \hspace{.2in} &
\phi & \mbox{\phi}\\
\delta & \mbox{\delta} & \hspace{.2in} &
\kappa & \mbox{\kappa} & \hspace{.2in} &
\rho & \mbox{\rho} & \hspace{.2in} &
\varphi & \mbox{\varphi}\\
\epsilon & \mbox{\epsilon} & \hspace{.2in} &
\lambda & \mbox{\lambda} & \hspace{.2in} &
\varrho & \mbox{\varrho} & \hspace{.2in} &
\chi & \mbox{\chi}\\
\varepsilon & \mbox{\varepsilon} & \hspace{.2in} &
\mu & \mbox{\mu} & \hspace{.2in} &
\sigma & \mbox{\sigma} & \hspace{.2in} &
\psi & \mbox{\psi}\\
\zeta & \mbox{\zeta} & \hspace{.2in} &
\nu & \mbox{\nu} & \hspace{.2in} &
\varsigma & \mbox{\varsigma} & \hspace{.2in} &
\omega & \mbox{\omega}\\
\eta & \mbox{\eta} & \hspace{.2in} &
\xi & \mbox{\xi} & \hspace{.2in} &
& & \hspace{.2in} &
& \\
\Gamma & \mbox{\Gamma} & \hspace{.2in} &
\Lambda & \mbox{\Lambda} & \hspace{.2in} &
\Sigma & \mbox{\Sigma} & \hspace{.2in} &
\Psi & \mbox{\Psi}\\
\Delta & \mbox{\Delta} & \hspace{.2in} &
\Xi & \mbox{\Xi} & \hspace{.2in} &
\Upsilon & \mbox{\Upsilon} & \hspace{.2in} &
\Omega & \mbox{\Omega}\\
\Theta & \mbox{\Theta} & \hspace{.2in} &
\Pi & \mbox{\Pi} & \hspace{.2in} &
\Phi & \mbox{\Phi} & \hspace{.2in} &
&
\end{array}
$$

 

Operation symbols

$$
\begin{array}{lllllllllll}
\pm & \mbox{\pm} & \hspace{.2in} &
\cap & \mbox{\cap} & \hspace{.2in} &
\diamond & \mbox{\diamond} & \hspace{.2in} &
\oplus & \mbox{\oplus}\\
\mp & \mbox{\mp} & \hspace{.2in} &
\cup & \mbox{\cup} & \hspace{.2in} &
\bigtriangleup & \mbox{\bigtriangleup} & \hspace{.2in} &
\ominus & \mbox{\ominus}\\
\times & \mbox{\times} & \hspace{.2in} &
\uplus & \mbox{\uplus} & \hspace{.2in} &
\bigtriangledown & \mbox{\bigtriangleudown} & \hspace{.2in} &
\otimes & \mbox{\otimes}\\
\div & \mbox{\div} & \hspace{.2in} &
\sqcap & \mbox{\sqcap} & \hspace{.2in} &
\triangleleft & \mbox{\triangleleft} & \hspace{.2in} &
\oslash & \mbox{\\oslash}\\
\ast & \mbox{\ast} & \hspace{.2in} &
\sqcup & \mbox{\sqcup} & \hspace{.2in} &
\triangleright & \mbox{\triangleright} & \hspace{.2in} &
\odot & \mbox{\\odot}\\
\star & \mbox{\star} & \hspace{.2in} &
\vee & \mbox{\vee} & \hspace{.2in} &
\lhd & \mbox{\lhd} & \hspace{.2in} &
\bigcirc & \mbox{\\bigcirc}\\
\circ & \mbox{\circ} & \hspace{.2in} &
\wedge & \mbox{\wedge} & \hspace{.2in} &
\rhd & \mbox{\rhd} & \hspace{.2in} &
\dagger & \mbox{\dagger}\\
\bullet & \mbox{\bullet} & \hspace{.2in} &
\setminus & \mbox{\setminus} & \hspace{.2in} &
\unlhd & \mbox{\unlhd} & \hspace{.2in} &
\ddagger & \mbox{\ddagger}\\
\cdot & \mbox{\cdot} & \hspace{.2in} &
\wr & \mbox{\wr} & \hspace{.2in} &
\unrhd & \mbox{\unrhd} & \hspace{.2in} &
\amalg & \mbox{\amalg}
\end{array}
$$

 

Relation symbols

To cross out any of these symbols, we use the command \not:
$%x \not\lt y$%

$%x \not\lt y$%

$$
\begin{array}{lllllllllll}
\leq & \mbox{\leq} & \hspace{.2in} &
\geq & \mbox{\geq} & \hspace{.2in} &
\equiv & \mbox{\equiv} & \hspace{.2in} &
\models & \mbox{\models}\\
\prec & \mbox{\prec} & \hspace{.2in} &
\succ & \mbox{\succ} & \hspace{.2in} &
\sim & \mbox{\sim} & \hspace{.2in} &
\perp & \mbox{\perp}\\
\preceq & \mbox{\preceq} & \hspace{.2in} &
\succeq & \mbox{\succeq} & \hspace{.2in} &
\simeq & \mbox{\simeq} & \hspace{.2in} &
\mid & \mbox{\mid}\\
\ll & \mbox{\ll} & \hspace{.2in} &
\gg & \mbox{\gg} & \hspace{.2in} &
\asymp & \mbox{\asymp} & \hspace{.2in} &
\parallel & \mbox{\parallel}\\
\subset & \mbox{\subset} & \hspace{.2in} &
\supset & \mbox{\supset} & \hspace{.2in} &
\approx & \mbox{\approx} & \hspace{.2in} &
\bowtie & \mbox{\bowtie}\\
\subseteq & \mbox{\subseteq} & \hspace{.2in} &
\supseteq & \mbox{\supseteq} & \hspace{.2in} &
\cong & \mbox{\cong} & \hspace{.2in} &
\Join & \mbox{\Join}\\
\sqsubset & \mbox{\sqsubset} & \hspace{.2in} &
\sqsupset & \mbox{\sqsupset} & \hspace{.2in} &
\neq & \mbox{\neq} & \hspace{.2in} &
\smile & \mbox{\smile}\\
\sqsubseteq & \mbox{\sqsubseteq} & \hspace{.2in} &
\sqsupseteq & \mbox{\sqsupseteq} & \hspace{.2in} &
\doteq & \mbox{\doteq} & \hspace{.2in} &
\frown & \mbox{\frown}\\
\in & \mbox{\in} & \hspace{.2in} &
\ni & \mbox{\ni} & \hspace{.2in} &
\propto & \mbox{\propto} & \hspace{.2in} &
& \\
\vdash & \mbox{\vdash} & \hspace{.2in} &
\dashv & \mbox{\dashv} & \hspace{.2in} &
& & &
& \\
\end{array}
$$

 

Arrows

$$
\begin{array}{llllllll}
\leftarrow & \mbox{\leftarrow} & \hspace{.01in} &
\longleftarrow & \mbox{\longleftarrow} & \hspace{.01in} &
\uparrow & \mbox{\uparrow}\\
\Leftarrow & \mbox{\Leftarrow} & \hspace{.01in} &
\Longleftarrow & \mbox{\Longleftarrow} & \hspace{.01in} &
\Uparrow & \mbox{\Uparrow}\\
\rightarrow & \mbox{\rightarrow} & \hspace{.01in} &
\longrightarrow & \mbox{\longrightarrow} & \hspace{.01in} &
\downarrow & \mbox{\downarrow}\\
\Rightarrow & \mbox{\Rightarrow} & \hspace{.01in} &
\Longrightarrow & \mbox{\Longrightarrow} & \hspace{.01in} &
\Downarrow & \mbox{\Downarrow}\\
\leftrightarrow & \mbox{\leftrightarrow} & \hspace{.01in} &
\longleftrightarrow & \mbox{\longleftrightarrow} & \hspace{.01in} &
\updownarrow & \mbox{\updownarrow}\\
\Leftrightarrow & \mbox{\Leftrightarrow} & \hspace{.01in} &
\Longleftrightarrow & \mbox{\Longleftrightarrow} & \hspace{.01in} &
\Updownarrow & \mbox{\Updownarrow}\\
\mapsto & \mbox{\mapsto} & \hspace{.01in} &
\longmapsto & \mbox{\longmapsto} & \hspace{.01in} &
\nearrow & \mbox{\nearrow}\\
\hookleftarrow & \mbox{\hookleftarrow} & \hspace{.01in} &
\hookrightarrow & \mbox{\hookrightarrow} & \hspace{.01in} &
\searrow & \mbox{\searrow}\\
\leftharpoonup & \mbox{\leftharpoonup} & \hspace{.01in} &
\rightharpoonup & \mbox{\rightharpoonup} & \hspace{.01in} &
\swarrow & \mbox{\swarrow}\\
\leftharpoondown & \mbox{\leftharpoondown} & \hspace{.01in} &
\rightharpoondown & \mbox{\rightharpoondown} & \hspace{.01in} &
\nwarrow & \mbox{\nwarrow}\\
\rightleftharpoons & \mbox{\rightleftharpoons} & \hspace{.01in} &
\leadsto & \mbox{\leadsto} & \hspace{.01in} &
&
\end{array}
$$

 

Miscellaneous Symbols

$$
\begin{array}{lllllllllll}
\aleph & \mbox{\aleph} & \hspace{.2in} &
\prime & \mbox{\prime} & \hspace{.2in} &
\forall & \mbox{\forall} & \hspace{.2in} &
\infty & \mbox{\infty}\\
\hbar & \mbox{\hbar} & \hspace{.2in} &
\emptyset & \mbox{\emptyset} & \hspace{.2in} &
\exists & \mbox{\exists} & \hspace{.2in} &
\Box & \mbox{\Box}\\
\imath & \mbox{\imath} & \hspace{.2in} &
\nabla & \mbox{\nabla} & \hspace{.2in} &
\neg & \mbox{\neg} & \hspace{.2in} &
\Diamond & \mbox{\Diamond}\\
\jmath & \mbox{\jmath} & \hspace{.2in} &
\surd & \mbox{\surd} & \hspace{.2in} &
\flat & \mbox{\flat} & \hspace{.2in} &
\triangle & \mbox{\triangle}\\
\ell & \mbox{\ell} & \hspace{.2in} &
\top & \mbox{\top} & \hspace{.2in} &
\natural & \mbox{\natural} & \hspace{.2in} &
\clubsuit & \mbox{\clubsuit}\\
\wp & \mbox{\wp} & \hspace{.2in} &
\bot & \mbox{\bot} & \hspace{.2in} &
\sharp & \mbox{\sharp} & \hspace{.2in} &
\diamondsuit & \mbox{\diamondsuit}\\
\Re & \mbox{\Re} & \hspace{.2in} &
\| & \mbox{\|} & \hspace{.2in} &
\backslash & \mbox{\backslash} & \hspace{.2in} &
\heartsuit & \mbox{\heartsuit}\\
\Im & \mbox{\Im} & \hspace{.2in} &
\angle & \mbox{\angle} & \hspace{.2in} &
\partial & \mbox{\partial} & \hspace{.2in} &
\spadesuit & \mbox{\spadesuit}\\
\mho & \mbox{\mho} & \hspace{.2in} &
& & \hspace{.2in} &
& & \hspace{.2in} &
&
\end{array}
$$

 

Variable-size symbols

$$
\begin{array}{llllllll}
\sum & \mbox{\sum} & \hspace{.3in} &
\bigcap & \mbox{\bigcap} & \hspace{.3in} &
\bigodot & \mbox{\bigodot}\\
\prod & \mbox{\prod} & \hspace{.3in} &
\bigcup & \mbox{\bigcup} & \hspace{.3in} &
\bigotimes & \mbox{\bigotimes}\\
\coprod & \mbox{\coprod} & \hspace{.3in} &
\bigsqcup & \mbox{\bigsqcup} & \hspace{.3in} &
\bigoplus & \mbox{\bigoplus}\\
\int & \mbox{\int} & \hspace{.3in} &
\bigvee & \mbox{\bigvee} & \hspace{.3in} &
\biguplus & \mbox{\biguplus}\\
\oint & \mbox{\oint} & \hspace{.3in} &
\bigwedge & \mbox{\bigwedge} & \hspace{.3in} &
&
\end{array}
$$

Good luck


LaTeX Symbols

This post is an extended version of the following original post:

Title: Adding $%\LaTeX$% (and highlighting, color and math) to your posts
Forum of Introduction to Computer Science, by David Evans offered under Udacity
First publication date: 22 Jul, 2012, 23:05 PST
Link: http://forums.udacity.com/cs101/questions/17565/adding-latex-and-highlighting-colorbluecolor-and-malphatau-h-to-your-posts