Altcoin Text Mining with Python Pt.1

Introduction

I’ve been always interested in cryptocurrency and have been wondering is it ever possible to know which altcoin is going to be the next hit. Bitcoin and Etherium has been promising this whole period and the size is already too huge to collapse (as I am aware). Also, I’m very much believing that Bitcoin is going to be the next steady financial asset just like gold. bitcoin vs gold 1970s However, I do not see why not we aren’t investing on some small but potential coins like WEMIX and SAND. It would be dumb to invest all your bullets into these, but these coins may become the game changer when you are investing in adequate amount. So, I would like to see if I can make a list of candidates that would leap in the short future.

Planning

The main technique I’m going to implement is text mining with Python. The reason I’m using Python is because it is a lot easy to use when it comes to data-mining and data-manipulation. I chose Twitter as the mining source because it has vast amount of informations related to crypto. Also, it has some rich developer environment which would save my time greatly. You might be familiar with questions such as below. influencer's tweet With questions like this, the replied comments are likely to have some hints about which coin the mass is focused on. So basically, I am going to create an index from scraped retweets/replies and compare it to the real price data. However, there are few things to consider during the process.

  1. Some user might be mentioning the same coin over and over.
  2. Liked retweets/replies could mean something more than none-liked ones.
  3. Some influencers might have some coins that they are trying to promote.
  4. The frequency of mentions could be different throughout certain time.

I am going to adjust to these factors as I proceed.

Progress

1 Text Mine Twitter

In order to have the full control over Twitter to text mine with certain conditions, I had to go through the Twitter API documentation. Also, it would be much more time-efficient than libraries like BeautifulSoup because it is afterall one of the biggest medias. So, first of all, I need to be able to see what a specific influencer has been posting for certain period of time because I need similar question like above. Also, it would return me an endpoint that can be used to track retweets/replies.

1.1 Tweet Lookup

Certain parameters and output that would I must need.

  1. Parameter
    • influencer username
    • from/to time (tweets during certain period)
  2. Output
    • full text
    • useful metrics (likes, retweets/replies counts, etc.)
    • retweets/replies endpoint
    • media attachments (just in case)
    • author (just in case)

To achieve this, I coded the functions below.

import requests
from dotenv import load_dotenv

import os, json
from datetime import datetime, timedelta

import requests

load_dotenv()

API_URL = 'https://api.twitter.com/2'

__header__ = {
        "Authorization": f"Bearer {os.getenv('BEARER_TOKEN')}"
    }

def user_id_lookup(username):
    url = f'{API_URL}/users/by/username/{username}'
    res = requests.get(url, headers=__header__).json()
    return res['data']['id']

def timeline_lookup(id, **kwargs):
    url = f'{API_URL}/users/{id}/tweets'
    payload = {
        'max_results': 100,
        'tweet.fields': 'author_id,attachments,created_at,public_metrics,conversation_id'
    }
    if kwargs:
        payload.update(kwargs)
    res = requests.get(url, params=payload, headers=__header__).json()
    return res

Everything can be downloaded in my github repository.

Written on December 11, 2021