Sunday, October 4, 2015

The Force Awakens ... on Twitter


TL;DR

Last week I wrote about Markov chains in a post titled, "Using Python to Sound Like a Wine Snob". Yesterday morning, while lying in bed, I thought it would be a fun exercise to take the idea further: how about a Python script that automatically generates, and posts, humorous tweets in the voice of Yoda, combined with the wisdom of Zen masters? I wrote and tested the code, and thus was born the Yoda parody Twitter handle, @YodaUncut.

A day later I felt it would be valuable to write a blog post sharing the code. This code is like Burger King: "have it your way, but don't go crazy". Please don't abuse Twitter with it (see Twitter's Terms of Service). I'm sharing this as educational material that will hopefully inspire some people to start, or continue, learning to code ... or at least provide some answers for anyone who's stuck with their existing code and searching the web ... or, failing that, for your amusement?!?

Be original

In my example I blend quotes from Yoda and various Zen masters. If you go through this exercise, pick something different. For example, how about combining quotes from Warren Buffett and Darth Vader, or 50 Shades of Grey and Mother Teresa?

Criteria

Here are the initial criteria I came up with:

  1. Purpose: Generate amusing Yoda-esque tweets using Markov chains
  2. The tweets would blend the wisdom of Yoda with the koans of various Zen masters
  3. The generated tweets would need to be 140 characters or less in length
  4. Create a random latitude and longitude in North America for Yoda's geolocation
  5. Post the tweet on Twitter with Yoda's geolocation

Requirements

In order for the Python script to work, I used the following third-party libraries:

  • oauthlib==1.0.3
  • requests==2.7.0
  • requests-oauthlib==0.5.0

The above libraries were installed with:

-----
$ easy_install pip
Searching for pip
Best match: pip 6.0.8
Adding pip 6.0.8 to easy-install.pth file
Installing pip script to /tmp/test/bin
Installing pip3.4 script to /tmp/test/bin
Installing pip3 script to /tmp/test/bin
Using /tmp/test/lib/python2.7/site-packages
Processing dependencies for pip
Finished processing dependencies for pip

$ pip install requests oauthlib requests-oauthlib
Collecting requests
Using cached requests-2.7.0-py2.py3-none-any.whl
Collecting oauthlib
Using cached oauthlib-1.0.3.tar.gz
Collecting requests-oauthlib
Using cached requests_oauthlib-0.5.0-py2.py3-none-any.whl
Installing collected packages: requests-oauthlib, oauthlib, requests
Running setup.py install for oauthlib
Successfully installed oauthlib-1.0.3 requests-2.7.0 requests-oauthlib-0.5.0
-----

The code was written for, and tested with, Python version 2.7.9. It hasn't been tested with Python 3.

-----
$ python --version
Python 2.7.9
-----

Generate Twitter API and token keys/secrets

You will need to get your Twitter API and token keys/secrets. If you haven't already:

  1. Go to https://apps.twitter.com/
  2. Create a new app
  3. Once created, scroll down to "Application Settings" and click on "manage keys and access tokens"
  4. Copy your "Consumer Key (API Key)" and "Consumer Secret (API Secret)"
  5. Scroll down to "Token Actions" and click on "Create my access token"
  6. Copy your "Access Token" and "Access Token Secret"

Save your keys/secrets. Don't share them. Don't upload them to your git repository.

For the sake of this example, put them in a file called "keys.txt". The syntax of keys.txt will look like the following:

-----
$ cat keys.txt
client_key:
client_secret:
token:
token_secret:
-----

Imports

Starting our Python code with the imports:

  • We want "urllib" for proper encoding of our tweets
  • We import "choice" from "random" so that we can randomly choose text as part of the Markov chain
  • We also want to import "randint" from "random" for generating some random numbers for geolocation (latitude and longitude)

The two remaining imports require third party libraries to first be installed:

  • To post our tweet over HTTPS to Twitter's API, we'll need "requests"
  • We also need "OAuth1" from "requests_oauthlib" for authentication with Twitter

Let's start with the Python code...

# Local libraries
import urllib
from random import choice, randint
# Third party libraries
import requests
from requests_oauthlib import OAuth1

Talking about geolocation...

In order for the latitude and longitude geolocation to show up in the tweets, geolocation needs to be enabled for the Twitter account. To enable geolocation sharing:

  1. Go to your Twitter account settings
  2. On the left side, click on "Security and privacy"
  3. Scroll down to "Privacy" > "Tweet location" and check "Add a location to my Tweets"
  4. Click on "Save changes"

Preparing the environment

Adjust the following filenames to whatever you're using. In my example, I have a file that has Yoda quotes, and a second file that has various Zen master quotes.

# Quotes filenames
quote_filename1 = 'yoda_quotes.txt'
quote_filename2 = 'zen_quotes.txt'

The format of both quote files looks something like the following. Each quote is on a new line. For example:

-----
$ tail -n 25 yoda_quotes.txt | head -n 5
size matters not. look at me. judge me by my size do you?
so certain are you. always with you it cannot be done. hear you nothing that I say?
strong am I with the Force, but not that strong. twilight is upon me, and soon night must fall. that is the way of things... the way of the Force.
that is why you fail.
the boy you trained, gone he is, consumed by Darth Vader.

$ tail -n 25 zen_quotes.txt | head -n 5
no yesterday, no tomorrow, and no today.
nothing is exactly as it seems, nor is it otherwise.
one falling leaf is not just one leaf; it means the whole autumn.
the Force is not some kind of excitement, but merely concentration on our usual everyday routine.
the Force is selling water by the river.
-----

Twitter's API base URL

Here we define the base URL for Twitter's API endpoints.

# Base URL for Twitter calls
base_twitter_url = "https://api.twitter.com/1.1/"

Create a function to get your Twitter API and token keys/secrets from keys.txt

def get_twitter_credentials():
    """
    This function reads in the Twitter credentials and organizes them in a dictionary
    :return:  Return the Twitter credentials as a dict
    """
    f = open('keys.txt', 'r')
    raw_credentials = f.readlines()
    f.close()
    credentials = {}
    for credential in raw_credentials:
        # Split on the first colon only, so values stay intact
        key, value = credential.split(':', 1)
        credentials[key.strip()] = value.strip()
    return credentials
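
As a quick check of that parsing, `str.partition` does the same first-colon split in one step. Here's a minimal, self-contained sketch; the credential values below are fake placeholders, not real keys:

```python
# Self-contained sketch of the keys.txt parsing using partition(),
# which splits on the first colon only. The values are fake
# placeholders, not real credentials.
def parse_credentials(lines):
    credentials = {}
    for line in lines:
        key, _, value = line.partition(':')
        credentials[key.strip()] = value.strip()
    return credentials

lines = [
    'client_key: AAAA1111\n',
    'client_secret: BBBB2222\n',
    'token: CCCC3333\n',
    'token_secret: DDDD4444\n',
]
creds = parse_credentials(lines)
print(creds['token'])  # CCCC3333
```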

Reading in the quotes

We also need a function that will read in the quotes from our quotes files.

def get_quotes(filename):
    """
    This function reads the quotes in from a file
    :param filename:  The filename of the file that contains the quotes
    :return:  A string with the quotes concatenated together. Newlines are removed.
    """
    f = open(filename, 'rb')
    quotes = f.readlines()
    f.close()
    quotes_list = []
    for quote in quotes:
        quotes_list.append(quote.strip('\n'))
    return " ".join(quotes_list)

Create a dictionary for use with the Markov chain

This is the function used last week in "Using Python to Sound Like a Wine Snob".

def create_markcov_dict(original_text):
    """
    This function takes the quotes and creates a dictionary with the words
    in Markov chunks
    :param original_text:  The plaintext to chunk up
    :return:  Return the dictionary with Markov chunks
    """

    split_text = original_text.split()
    markcov_dict = {}
    for i in xrange(len(split_text) - 2):
        key_name = (split_text[i], split_text[i+1])
        key_value = split_text[i+2]
        if key_name in markcov_dict:
            markcov_dict[key_name].append(key_value)
        else:
            markcov_dict[key_name] = [key_value]
    return markcov_dict
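
To make the chunking concrete, here's the same bi-gram logic as a self-contained check on a tiny made-up corpus (condensed, and using range() instead of xrange() so it also runs under Python 3):

```python
# Same bi-gram dictionary logic as above, condensed for a quick check.
# range() is used instead of xrange() so this also runs under Python 3.
def build_markcov_dict(original_text):
    split_text = original_text.split()
    markcov_dict = {}
    for i in range(len(split_text) - 2):
        key_name = (split_text[i], split_text[i + 1])
        markcov_dict.setdefault(key_name, []).append(split_text[i + 2])
    return markcov_dict

d = build_markcov_dict("fear is the path. fear is the dark side.")
print(d[('fear', 'is')])  # ['the', 'the'] -- every word that followed "fear is"
print(d[('is', 'the')])   # ['path.', 'dark']
```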

Generating the tweet, 140 characters or less, using the Markov chain

This is a modified version of the function from last week's wine snob example.

def create_markcov_tweet(markcov_dict):
    """
    This function creates the Tweet using the Markov Chain
    :param markcov_dict:  The dictionary with the text in Markov chunks
    :return:  Return the tweet
    """
    # Pick a random starting point
    selected_words_tuple = choice(list(markcov_dict.keys()))
    markcov_tweet = [selected_words_tuple[0].capitalize(), selected_words_tuple[1]]
    # Generate the Markov text, stopping at the 140 character limit or
    # when we create a "key" that doesn't exist
    while selected_words_tuple in markcov_dict:
        next_word = choice(markcov_dict[selected_words_tuple])
        if len(" ".join(markcov_tweet) + " " + next_word) >= 140:
            break
        # Capitalize the next word when the previous one ended a sentence
        if markcov_tweet[-1].endswith('.'):
            markcov_tweet.append(next_word.capitalize())
        else:
            markcov_tweet.append(next_word)
        selected_words_tuple = (selected_words_tuple[1], next_word)
    # Tidy up the ending punctuation, whether we hit the length limit
    # or the chain ended on its own
    tweet = " ".join(markcov_tweet).strip()
    if tweet.endswith(","):
        tweet = tweet.rstrip(',') + '.'
    elif not tweet.endswith(('.', '!', '?', ';', ':')):
        tweet = tweet + '.'
    return tweet

Create a random latitude and longitude in North America

This is dirty, but it works fine for our purpose.

def get_geolocation():
    """
    This function generates a random latitude and longitude, somewhere
    in North America.
    :return:  Returns a dict with the lat and long
    """

    latitude = "{}.{:07d}".format(randint(28, 69), randint(0, 999999))
    longitude = "{}.{:07d}".format(randint(-128, -64), randint(0, 999999))
    return {'lat':latitude, 'long':longitude}
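
A quick sanity check on those ranges (self-contained copy of the logic above). One quirk worth noting: "{:07d}" zero-pads the six-digit random number to seven digits, so the fractional part never exceeds .0999999 — harmless for our purpose:

```python
# Self-contained copy of the geolocation logic above, plus a check
# that the generated strings parse back into the expected ranges.
# Note: "{:07d}" zero-pads a six-digit number to seven digits, so the
# fractional part never exceeds .0999999 -- fine for our purpose.
from random import randint

def get_geolocation():
    latitude = "{}.{:07d}".format(randint(28, 69), randint(0, 999999))
    longitude = "{}.{:07d}".format(randint(-128, -64), randint(0, 999999))
    return {'lat': latitude, 'long': longitude}

for _ in range(1000):
    coords = get_geolocation()
    assert 28.0 <= float(coords['lat']) < 70.0
    assert -129.0 < float(coords['long']) <= -64.0
```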

Create a function to post our tweet via HTTPS using Twitter's API

This is how we'll upload our tweet to Twitter.

def post_tweet(markcov_tweet):
    """
    This function posts the tweet to Twitter
    :param markcov_tweet:  The text to tweet
    :return:  Return the response from requests.post()
    """

    # Setup authentication
    credentials = get_twitter_credentials()
    client_key = credentials['client_key']
    client_secret = credentials['client_secret']
    token = credentials['token']
    token_secret = credentials['token_secret']
    oauth = OAuth1(client_key, client_secret, token, token_secret)
    # Get latitude and longitude for the tweet
    coordinates = get_geolocation()
    # Make the URL
    api_url = "{}statuses/update.json".format(base_twitter_url)
    api_url += "?status={}".format(urllib.quote(markcov_tweet))
    api_url += "&lat={}&long={}".format(coordinates['lat'], coordinates['long'])
    api_url += "&display_coordinates=true"
    # tweet
    response = requests.post(api_url, auth=oauth)

    return response
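
As an aside, requests can also build and percent-encode the query string for us via its params keyword, so urllib.quote isn't strictly needed. A sketch (the status text and coordinates here are made up for illustration, and no request is actually sent — we only prepare the URL and inspect it):

```python
# Sketch: letting requests build the query string via params=.
# The status text and coordinates are made up for illustration; no
# request is sent -- we only prepare the request and inspect its URL.
import requests

params = {
    'status': 'Do or do not. There is no try.',
    'lat': '41.0000042',
    'long': '-87.0000042',
    'display_coordinates': 'true',
}
prepared = requests.Request(
    'POST',
    'https://api.twitter.com/1.1/statuses/update.json',
    params=params,
).prepare()
print(prepared.url)
```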

Tying it all together

Now that we have everything in place, let's take this out for a test drive. We're going to generate a tweet and post it to Twitter!

We begin by reading in, and concatenating, all of the quotes from both quote files.

quotes = get_quotes(quote_filename1) + " " + get_quotes(quote_filename2)

For the sake of this example, let's do a quick test to see the first 1000 characters loaded:

print quotes[0:1000]
I cannot teach him. the boy has no patience. I hear a new apprentice you have, Emperor. or should I call you Darth Sidious? Master Obi-Wan, not victory. the shroud of the dark side has fallen. begun, the Clone War has! Qui-Gon's defiance I sense in you. need that you do not. agree with you, the Council does. your apprentice young Skywalker will be. Yoda I am, fight I will. a labyrinth of evil, this war has become. a trial of being old is this: remembering which thing one has said into which young ears. and well you should not! for my ally is the Force. and a powerful ally it is. life creates it, makes it grow. its energy surrounds us... and binds us. luminous beings are we, not this... crude matter! you must feel the Force around you. here, between you, me, the tree, the rock... everywhere! even between the land and the ship. around the survivors, a perimeter create! at an end your rule is... and not short enough it was. awww, cannot get your ship out...eh-heheheh! careful you must be

We'll feed the text into our Markov dictionary creation function:

markcov_dict = create_markcov_dict(quotes)

Let's also see how many chunks of text we have in our newly created dictionary as a quick test.

print len(markcov_dict)
1595

Generate the tweet

We'll generate the tweet with our Markov function and quotes:

markcov_tweet = create_markcov_tweet(markcov_dict)
# Let's see what we'll be tweeting and the number of characters of the tweet
print("[+] Tweet ({}): \"{}\"".format(len(markcov_tweet), markcov_tweet))
[+] Tweet (140): "Of evil, this war has become. A trial of being old is this: remembering which thing one has said into which young ears. And well you should."

Ready ... aim ... FIRE!

Finally, let's post the tweet and get a status of whether or not the tweet was posted successfully!

response = post_tweet(markcov_tweet)
if response.status_code == 200:
    print("[+] Tweet posted successfully")
else:
    print("[-] Tweet post failed")
[+] Tweet posted successfully

Back to this world...

The tweet has now been posted on Twitter. Good job!

I hope you found this useful. Please share suggestions, improvements, and any additional features you've created to enhance the code.

Afterthoughts

  • If you want to run this via cron, you'll want to use absolute paths for the filenames (both quotes files and the keys.txt file).
  • If you get an "InsecurePlatformWarning" when running the script, no worries. The tweets will still post. If you want to get rid of the warning:

    $ sudo apt-get install python-dev libffi-dev libssl-dev
    $ pip install requests[security]
  • Sample Code: I wrote some sample code and put it in a GitHub repository called Twoda. I added the ability to tweet animated GIFs from Giphy that pertain to the general theme of your tweets.

Thursday, October 1, 2015

Using Python to Sound Like a Wine Snob


Web Scraping with Beautiful Soup & Using a Markov Chain to Create Wine Snob Gibberish

Some Background

The term, Markov chain, is named after the Russian mathematician Andrey Markov (1856 - 1922). In mathematics, a Markov chain is a discrete random process with the Markov property. According to Wikipedia, "A stochastic process has the Markov property if the conditional probability distribution of future states of the process (conditional on both past and present states) depends only upon the present state, not on the sequence of events that preceded it." The process changes randomly at each iteration, in discrete steps. Saaaay what!? Jason Young has a video on YouTube that better explains what a Markov chain is.

"Markon, Markov, Markon, Markov" - Mr. Miyagi [not really]

Our lives have all been graced by Markov chains. Think about those websites you've come across with nonsensical text. Such pages are often generated with a Markov chain. Why do they exist? They're used to game search engine rankings (the darker side of SEO). Bummer.

Fun with the Markov Chain

While researching Markov chains, I came across Tony's (@tonyfischetti) blog post. It inspired me to create a Python script that essentially emulates what his example does, but uses the BeautifulSoup library to scrape the initial website content.

Requirements

  • beautifulsoup4 # This is for extracting the data we want from the downloaded web content
  • requests # This is for downloading the web content
  • Note: I used Python 2.7.9 when I created this. I haven't tested the code with Python 3.

Steps

  • Download initial webpage from winespectator.com and determine last "page" number

The idea behind web scraping is to get raw content from a website and extract usable data from it. This is where the Python library, BeautifulSoup, comes in handy.

Let's Begin

We'll start by importing the modules we'll need to download a website and extract the data we want.

import requests
from bs4 import BeautifulSoup
from random import choice

We want to pick a random webpage from the website to feed into our Markov chain. First, let's find out how many pages this site has at http://www.winespectator.com/dailypicks/category/catid/1/page/???.

Now we'll download the HTML source from a wine website and generate a Beautiful Soup object using the BeautifulSoup function.

base_url = "http://www.winespectator.com/dailypicks/category/catid/1/page"
r = requests.get(base_url)
soup = BeautifulSoup(r.text)

Next, we'll take a look at a section of the website's HTML to figure out what "element" we want to extract in order to get the last page number.

print soup.prettify()[44750:45750]
items -->
      <center>
       <div class="pagination">
        <strong>
         1
        </strong>
        <a href="/dailypicks/category/catid/1/page/2" title="Goto page 2">
         2
        </a>
        <a href="/dailypicks/category/catid/1/page/3" title="Goto page 3">
         3
        </a>
        <a href="/dailypicks/category/catid/1/page/4" title="Goto page 4">
         4
        </a>
        <a href="/dailypicks/category/catid/1/page/5" title="Goto page 5">
         5
        </a>
        <a href="/dailypicks/category/catid/1/page/6" title="Goto page 6">
         6
        </a>
        <a href="/dailypicks/category/catid/1/page/7" title="Goto page 7">
         7
        </a>
        <a href="/dailypicks/category/catid/1/page/2" title="Goto Next Page">
         &gt;&gt;
        </a>
        <a href="/dailypicks/category/catid/1/page/814" title="Goto Last Page">
         Last (814)
        </a>
       </div>
      </center>
     </div>
     <!-- /.mod-container -->
     <div

There's a lot of junk we need to sift through. Looking at the source code from above, we now know we're looking for an element called "div" with class "pagination". Beautiful Soup makes it easy to find and extract this.

As of this writing, we see that the last page is "814". That means the range of possible pages we can download is from 1 - 814. Right on!

Let's extract the last page number (i.e., "814") from the HTML using BeautifulSoup with this knowledge.

html_chunk = soup.find_all("div", class_="pagination")
last_page = (str(html_chunk[0]).split('Last (')[1]).split(')')[0]
print last_page
814
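
The "Last (...)" extraction is plain string surgery, so it can be checked on a snippet of the pagination HTML without downloading anything:

```python
# Self-contained check of the string splitting used above, run on a
# snippet of the pagination HTML instead of a live download.
snippet = ('<a href="/dailypicks/category/catid/1/page/814" '
           'title="Goto Last Page">Last (814)</a>')
last_page = snippet.split('Last (')[1].split(')')[0]
print(last_page)  # 814
```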

Now let's pick a random page and generate our URL.

random_page_number = choice(xrange(1, int(last_page) + 1))
url = "{}/{}".format(base_url, random_page_number)
print url
http://www.winespectator.com/dailypicks/category/catid/1/page/288

We'll download this randomly selected page.

r = requests.get(url)
soup = BeautifulSoup(r.text)

Similar to how we found the last page number, let's use BeautifulSoup to look at the HTML we downloaded to figure out what element we're looking for.

print soup.prettify()[41800:42800]
BODEGAS CAMPO VIEJO Rioja Crianza 2007
        </a>
        <h6>
         86 points, $12
        </h6>
        <div class="paragraph">
         Light, firm tannins support a pleasingly plump texture in this fresh red, which offers black cherry, leaf and tobacco notes, with a smoky finish. Drink now through 2013. 50,000 cases imported.
         <em>
          —Thomas Matthews
         </em>
        </div>
        <!-- /.paragraph -->
       </h5>
      </div>
      <!-- /.list-items -->
      <div class="daily-wine-items">
       <span>
        Jan. 11, 2011
       </span>
       <h5>
        <a href="/wine/detail?note_id=288244">
         STANDING STONE Chardonnay Finger Lakes 2009
        </a>
        <h6>
         85 points, $11
        </h6>
        <div class="paragraph">
         Up front, with green apple, melon and butter hints. Just tangy enough on the finish to keep it all honest. Drink now. 1,184 cases made.
         <em>
          —James Molesworth
         </em>
        </d

Analyzing the above HTML, we learn that we're looking for an element called "div" with class "paragraph". Let's extract away...

verbiage = soup.find_all("div", class_="paragraph")
print type(verbiage)
<class 'bs4.element.ResultSet'>

We can now begin looking at the text that Beautiful Soup extracted. Each element can be accessed by its index.

verbiage[0]
<div class="paragraph">4070 Daily Wine Picks found in this category.</div>

We'll probably want to ignore verbiage[0] later.

verbiage[1]
<div class="paragraph">
   Light, firm tannins support a pleasingly plump texture in this fresh red, which offers black cherry, leaf and tobacco notes, with a smoky finish. Drink now through 2013. 50,000 cases imported.      <em>—Thomas Matthews </em>
</div>

Nice - this looks like some content we want to extract.

Ok, we know we don't want verbiage[0], so we'll iterate through the entries starting at index 1 (i.e., "[1:]"). We'll encode the text to UTF-8, remove any newlines and tabs, strip leading/trailing spaces, and then split each line so that we ignore the "em" element; we don't care about who wrote the comment on the website. We'll combine all of the sanitized text into a string called scraped_text.

scraped_text = ""
for entry in verbiage[1:]:
    entry = entry.get_text().encode('utf-8')
    entry = entry.replace('\t', '')
    entry = entry.replace('\n', '')
    entry = entry.strip()
    entry = entry.split('—')[0]
    scraped_text += "{} ".format(entry.replace('Back to Top', ''))
scraped_text = str(scraped_text.split('Featured:')[0])
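
The cleanup chain can be seen in isolation on a fabricated entry (written here in Python 3 spelling, where the text is already str and no .encode('utf-8') step is needed):

```python
# The sanitizing chain from above, applied to one fabricated entry so
# each step's effect is visible. In Python 3 the text is already str,
# so the encode step is unnecessary.
entry = '\n\tLight, firm tannins support a plump texture. —Thomas Matthews \n'
entry = entry.replace('\t', '')
entry = entry.replace('\n', '')
entry = entry.strip()
entry = entry.split('—')[0].strip()
print(entry)  # Light, firm tannins support a plump texture.
```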

Let's see what we got by printing scraped_text.

print(scraped_text)
Light, firm tannins support a pleasingly plump texture in this fresh red, which offers black cherry, leaf and tobacco notes, with a smoky finish. Drink now through 2013. 50,000 cases imported. Up front, with green apple, melon and butter hints. Just tangy enough on the finish to keep it all honest. Drink now. 1,184 cases made. Tasty, showing citrus, pear and apple flavors that have a pleasant ripeness and a floral quality. Balanced and juicy. Drink now. 40,000 cases made. Vibrant and mouthwatering, with a laser beam of lemon, lime, grapefruit and apricot flavors. Hints of fresh herbs and flowers add to the complexity. Drink now. 250,000 cases imported. Syrah-like, with layers of plum, spice and violet flavors framed by a fine layer of tannins, followed by a focused, tar-tinged finish. Drink now. 60,000 cases made. Browse our exclusive lists of the world's top wine values, top value producers and easy-to-find wines.

Yawn. I like wine. Well, I like to drink wine.

At this point, we have the text we want to work with. Let's create the Markov chain and generate some new text.

We'll define a function that splits the text passed to it into a dictionary of Markov chain chunks, returning the new dict once it's done. For example, take the sentence, "I love walking cats in New York City". The sentence is first chunked into bi-grams:

  • I love
  • love walking
  • walking cats
  • cats in
  • in New
  • New York
  • York City

With Python, we'll make these immutable keys in a dictionary (dict):

  • {('I', 'love'): '', ('love', 'walking'): '', ... }

We'll then need to add values to each of the keys. The values will consist of the word that comes after each instance of the bi-grams. So, in the case of "I love", the third word is "walking".

If we feed more data into our function, there may be multiple instances of "I love". For example, "I love walking cats in New York City. I love eating pizza." The words "walking" and "eating" both come after "I love" (there are two instances of "I love"). The value we assign to the ('I', 'love') dictionary key is a list: ['walking', 'eating'].

Our dictionary begins to look like:

  • {('I', 'love'): ['walking', 'eating'], ... }

Once completed, we return the dict.

def create_markcov_dict(original_text):
    split_text = original_text.split()
    markcov_dict = {}
    for i in xrange(len(split_text) - 2):
        key_name = (split_text[i], split_text[i+1])
        key_value = split_text[i+2]
        if key_name in markcov_dict:
            markcov_dict[key_name].append(key_value)
        else:
            markcov_dict[key_name] = [key_value]
    return markcov_dict

Let's send the above function our scraped text from the website.

markcov_dict = create_markcov_dict(scraped_text)
print(markcov_dict)
{('top', 'wine'): ['values,'], ('lime,', 'grapefruit'): ['and'], ('wine', 'values,'): ['top'], ('Hints', 'of'): ['fresh'], ('laser', 'beam'): ['of'], ('add', 'to'): ['the'], ('and', 'mouthwatering,'): ['with'], ('which', 'offers'): ['black'], ('green', 'apple,'): ['melon'], ('violet', 'flavors'): ['framed'], ('notes,', 'with'): ['a'], ('value', 'producers'): ['and'], ('tobacco', 'notes,'): ['with'], ('imported.', 'Up'): ['front,'], ('made.', 'Vibrant'): ['and'], ('and', 'easy-to-find'): ['wines.'], ('mouthwatering,', 'with'): ['a'], ('tannins,', 'followed'): ['by'], ('of', 'tannins,'): ['followed'], ('Tasty,', 'showing'): ['citrus,'], ('flavors', 'framed'): ['by'], ('of', 'plum,'): ['spice'], ('of', 'lemon,'): ['lime,'], ('a', 'pleasingly'): ['plump'], ('40,000', 'cases'): ['made.'], ('and', 'apple'): ['flavors'], ('250,000', 'cases'): ['imported.'], ('values,', 'top'): ['value'], ('2013.', '50,000'): ['cases'], ('flavors', 'that'): ['have'], ('butter', 'hints.'): ['Just'], ('ripeness', 'and'): ['a'], ('lists', 'of'): ['the'], ('and', 'butter'): ['hints.'], ('of', 'the'): ["world's"], ('finish.', 'Drink'): ['now', 'now.'], ('now.', '60,000'): ['cases'], ('Drink', 'now.'): ['1,184', '40,000', '250,000', '60,000'], ('and', 'apricot'): ['flavors.'], ('Syrah-like,', 'with'): ['layers'], ('honest.', 'Drink'): ['now.'], ('that', 'have'): ['a'], ('front,', 'with'): ['green'], ('fine', 'layer'): ['of'], ('top', 'value'): ['producers'], ('1,184', 'cases'): ['made.'], ('and', 'flowers'): ['add'], ('all', 'honest.'): ['Drink'], ('cases', 'imported.'): ['Up', 'Syrah-like,'], ('apple,', 'melon'): ['and'], ('Up', 'front,'): ['with'], ('floral', 'quality.'): ['Balanced'], ('texture', 'in'): ['this'], ('the', 'complexity.'): ['Drink'], ('plum,', 'spice'): ['and'], ('to', 'the'): ['complexity.'], ('now.', '40,000'): ['cases'], ('a', 'fine'): ['layer'], ('flavors.', 'Hints'): ['of'], ('juicy.', 'Drink'): ['now.'], ('fresh', 'herbs'): ['and'], ('tar-tinged', 'finish.'): ['Drink'], 
('hints.', 'Just'): ['tangy'], ('and', 'tobacco'): ['notes,'], ('pleasingly', 'plump'): ['texture'], ('framed', 'by'): ['a'], ('Light,', 'firm'): ['tannins'], ('now.', '1,184'): ['cases'], ('of', 'fresh'): ['herbs'], ('with', 'green'): ['apple,'], ('grapefruit', 'and'): ['apricot'], ('melon', 'and'): ['butter'], ('have', 'a'): ['pleasant'], ('leaf', 'and'): ['tobacco'], ('cherry,', 'leaf'): ['and'], ('beam', 'of'): ['lemon,'], ('smoky', 'finish.'): ['Drink'], ('red,', 'which'): ['offers'], ('keep', 'it'): ['all'], ('showing', 'citrus,'): ['pear'], ('the', "world's"): ['top'], ('offers', 'black'): ['cherry,'], ('now', 'through'): ['2013.'], ('in', 'this'): ['fresh'], ('now.', '250,000'): ['cases'], ('complexity.', 'Drink'): ['now.'], ('a', 'laser'): ['beam'], ('made.', 'Tasty,'): ['showing'], ('Balanced', 'and'): ['juicy.'], ('60,000', 'cases'): ['made.'], ('our', 'exclusive'): ['lists'], ('this', 'fresh'): ['red,'], ('firm', 'tannins'): ['support'], ('Drink', 'now'): ['through'], ('flowers', 'add'): ['to'], ('pleasant', 'ripeness'): ['and'], ('imported.', 'Syrah-like,'): ['with'], ('producers', 'and'): ['easy-to-find'], ('Just', 'tangy'): ['enough'], ('apple', 'flavors'): ['that'], ('with', 'layers'): ['of'], ('cases', 'made.'): ['Tasty,', 'Vibrant', 'Browse'], ('focused,', 'tar-tinged'): ['finish.'], ('enough', 'on'): ['the'], ('to', 'keep'): ['it'], ('followed', 'by'): ['a'], ('pear', 'and'): ['apple'], ('quality.', 'Balanced'): ['and'], ('plump', 'texture'): ['in'], ('a', 'pleasant'): ['ripeness'], ('black', 'cherry,'): ['leaf'], ('finish', 'to'): ['keep'], ('Browse', 'our'): ['exclusive'], ('it', 'all'): ['honest.'], ('layer', 'of'): ['tannins,'], ('on', 'the'): ['finish'], ('exclusive', 'lists'): ['of'], ('a', 'floral'): ['quality.'], ('the', 'finish'): ['to'], ('made.', 'Browse'): ['our'], ('a', 'smoky'): ['finish.'], ('with', 'a'): ['smoky', 'laser'], ('through', '2013.'): ['50,000'], ('lemon,', 'lime,'): ['grapefruit'], ('apricot', 'flavors.'): ['Hints'], 
("world's", 'top'): ['wine'], ('and', 'violet'): ['flavors'], ('Vibrant', 'and'): ['mouthwatering,'], ('and', 'a'): ['floral'], ('tangy', 'enough'): ['on'], ('citrus,', 'pear'): ['and'], ('fresh', 'red,'): ['which'], ('50,000', 'cases'): ['imported.'], ('by', 'a'): ['fine', 'focused,'], ('a', 'focused,'): ['tar-tinged'], ('and', 'juicy.'): ['Drink'], ('tannins', 'support'): ['a'], ('layers', 'of'): ['plum,'], ('support', 'a'): ['pleasingly'], ('spice', 'and'): ['violet'], ('herbs', 'and'): ['flowers']}

We'll create a new function that we can feed this Markov'ian dictionary to and have the newly generated text we want returned.

def create_markcov_text(markcov_dict):
    # Pick a random starting point, capitalizing the first word
    selected_words_tuple = choice(markcov_dict.keys())
    markcov_text = [selected_words_tuple[0].capitalize(), selected_words_tuple[1]]

    # Generate the Markov text, ending when we create a "key" that doesn't exist
    while selected_words_tuple in markcov_dict:
        next_word = choice(markcov_dict[selected_words_tuple])
        markcov_text.append(next_word)
        selected_words_tuple = (selected_words_tuple[1], next_word)

    # Return our newly generated Markov poem/story/text
    return " ".join(markcov_text)
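
Before feeding it the real scraped text, these two functions can be smoke-tested end to end on a tiny made-up corpus. This is a Python 3-friendly condensation (range() for xrange(), and list() around dict.keys() before calling choice()):

```python
# End-to-end smoke test of the dictionary builder and text generator
# on a tiny made-up corpus. Python 3-friendly condensation: range()
# for xrange(), and list() around dict.keys() before choice().
from random import choice

def build_dict(text):
    words = text.split()
    d = {}
    for i in range(len(words) - 2):
        d.setdefault((words[i], words[i + 1]), []).append(words[i + 2])
    return d

def build_text(d):
    key = choice(list(d.keys()))
    out = [key[0].capitalize(), key[1]]
    while key in d:
        next_word = choice(d[key])
        out.append(next_word)
        key = (key[1], next_word)
    return " ".join(out)

corpus = "drink now. drink later. drink now and then."
generated = build_text(build_dict(corpus))
print(generated)
```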

We'll pass markcov_dict to the above function.

markcov_text = create_markcov_text(markcov_dict)

Drumroll ... let's finally print our newly generated wine-snobbery text.

print(markcov_text)
Tobacco notes, with a laser beam of lemon, lime, grapefruit and apricot flavors. Hints of fresh herbs and flowers add to the complexity. Drink now. 250,000 cases imported. Up front, with green apple, melon and butter hints. Just tangy enough on the finish to keep it all honest. Drink now. 1,184 cases made. Tasty, showing citrus, pear and apple flavors that have a pleasant ripeness and a floral quality. Balanced and juicy. Drink now. 40,000 cases made. Vibrant and mouthwatering, with a smoky finish. Drink now. 250,000 cases imported. Up front, with green apple, melon and butter hints. Just tangy enough on the finish to keep it all honest. Drink now. 1,184 cases made. Tasty, showing citrus, pear and apple flavors that have a pleasant ripeness and a floral quality. Balanced and juicy. Drink now. 250,000 cases imported. Up front, with green apple, melon and butter hints. Just tangy enough on the finish to keep it all honest. Drink now. 1,184 cases made. Tasty, showing citrus, pear and apple flavors that have a pleasant ripeness and a floral quality. Balanced and juicy. Drink now. 250,000 cases imported. Up front, with green apple, melon and butter hints. Just tangy enough on the finish to keep it all honest. Drink now. 250,000 cases imported. Up front, with green apple, melon and butter hints. Just tangy enough on the finish to keep it all honest. Drink now. 1,184 cases made. Browse our exclusive lists of the world's top wine values, top value producers and easy-to-find wines.

Cheers!

Monday, February 6, 2012

Retrieving a Stolen iPhone in Under 72 Hours

Image via streamingmedia.com

Within 53 hours I was able to get a stolen iPhone safely into police custody. Here's a rough timeline of the steps I took to get the phone back to its rightful owner:


Saturday, 2/4/2012 @ 8:45 AM -- iPhone was "lost" (i.e., stolen).
  • Called the stolen iPhone; it rang four times before going to voicemail, suggesting that it was powered on and had reception. Used the "Find iPhone" app with the Apple ID credentials of the stolen iPhone, but the app was unable to locate the phone.
  • Using the "Find iPhone" app, sent a lock code to the stolen iPhone to ensure that it was locked and required an unlock code to access the phone.
  • Using the "Find iPhone" app, sent messages with sound to the stolen iPhone stating that the phone was lost and to call ###-###-#### (my Google Voice number). No response.
  • Shortly thereafter the iPhone was powered down by the "someone" who had possession of the phone.
  • I had the owner of the stolen iPhone change passwords to accounts accessed by the iPhone (e.g., Gmail, Dropbox, etc).
  • Set up the email account used as the Apple ID of the stolen iPhone to forward a copy of all mail from "noreply@me.com" to an account I set up at Boxcar. This way, push notifications would be sent to my phone moments after the stolen iPhone was powered on and received the commands I had sent from the "Find iPhone" app.
    • There's a Boxcar iOS app that I installed on the device that I was doing the tracking from.
  • Opted not to report the phone as stolen to AT&T yet since I wanted to be able to continue tracking the phone.
  • Also opted not to remotely wipe the iPhone via the "Find iPhone" app for the same reason.
  • The "Erase all data on iPhone after 10 failed passcode attempts" option was turned off on the iPhone. This was a good thing since it prevented the stolen iPhone from being wiped by 10 failed passcode entries and becoming un-trackable. 

Sunday, 2/5/2012 @ 10:00 AM -- the iPhone was powered on by "someone" and the location of the phone was identified.
  • I received a push notification from Boxcar showing that an email from noreply@me.com was received. That meant that the stolen iPhone was powered on and was now locatable.
  • Used both the "Find iPhone" and "Find Friends" iPhone apps by Apple to track the location of the phone.
    • Another option was logging into iCloud with the Apple ID and password associated with the stolen iPhone ... which I did.
  • Location of the phone tracked to a residential address.
  • Used Google maps and street view to look at the house.
  • Identified the owner of the house using PropertyShark.
  • Gathered information about the owner using Intelius.
  • Again, sent messages with sound to the stolen iPhone stating that the phone was lost and to call ###-###-#### (my Google Voice number). No response.
  • The "someone" who had possession of the phone powered it down roughly five minutes after it was powered on.
  • Checked AT&T for any unauthorized calls. There were no unauthorized calls.
  • A police report was submitted online to the police department where the phone was stolen. 
    • The police department where the phone was currently located (different city than where the phone was stolen) would not accept a report directly since the theft occurred in a different city.

Monday, 2/6/2012 @ 10:46 AM -- the iPhone was powered on and left on.
  • Using both the "Find iPhone" and "Find Friends" apps, the GPS location of the stolen iPhone was the same address as the address that was identified on Sunday.
  • A police report was submitted online to the police department where the phone was currently located. The location of the theft was intentionally left vague, implying that the theft occurred in the city the phone was currently being tracked to. That police department was willing to accept the incident report.

Monday, 2/6/2012 @ 1:04 PM -- Called the records and dispatch departments of the PD from the city where the stolen iPhone was currently located.
  • Gave the incident report tracking number to dispatch.
  • After a lengthy conversation, dispatch agreed to send an officer to the house, with the understanding that the officer would call me back if he needed me to make the stolen iPhone play a sound.

Monday, 2/6/2012 @ 1:36 PM -- Received a call from the responding officer.
  • The police officer stated that he went to the residential address.
  • The officer stated that the owners of the house were at the residence.
  • The police officer gained possession of the phone.
  • The police officer asked me for the unlock code and some contact data that was on the phone to verify ownership.
  • The officer relayed the convoluted story that the individual who had stolen the iPhone told him.
  • We agreed to check the phone into the police department's chain of custody so the stolen iPhone could be picked up by the rightful owner soon.
  • Called the police department from where the phone was stolen, stated that the iPhone was retrieved by another police department, and the case was closed.
... and that's a happy ending.

Apple has more information about locating a lost or stolen iPhone here.

Tuesday, January 17, 2012

Koobface Analysis

Today Facebook announced that it will share the data it has collected about the group of people behind the Koobface virus. Facebook didn't provide any details about the "Koobface gang". However, in a separate blog post, independent researchers Jan Drömer and Dirk Kollberg of SophosLabs did provide details of their analysis. I found the SophosLabs article a very interesting read: it details the painstakingly slow process investigators must endure to piece security incidents together, and it shows that, given enough time and resources, "cybercrimes" can be solved.

"Up until now, Drömer and Kollberg's research has been a closely-guarded secret, known only to a select few in the computer security community and shared with various law enforcement agencies around the globe" ... "At the police's request we have kept the information confidential, but last week news began to leak onto the internet about Anton 'Krotreal' Korotchenko - meaning the cat was well and truly out of the bag." -- Graham Cluley, Sophos analyst
Link to Analysis: http://nakedsecurity.sophos.com/koobface/

Monday, December 19, 2011

DHS Cybersecurity Strategy and New California eCrime Unit

A couple of interesting items within the information security world...

I. The Department of Homeland Security has released a new cybersecurity strategy document with a two-pronged approach:
  1. Protecting critical infrastructure today
  2. Building a more secure cybersecurity ecosystem for the future
Download the Blueprint for a Secure Cyber Future document (PDF).

II. California Attorney General Kamala D. Harris has announced the creation of a new eCrime Unit to investigate and prosecute technology crime.

"The primary mission of the eCrime Unit is to investigate and prosecute multi-jurisdictional criminal organizations, networks, and groups that perpetrate identity theft crimes, use an electronic device or network to facilitate a crime, or commit a crime targeting an electronic device, network or intellectual property." READ MORE