I had the opportunity to attend the NASSCOM Technology and Leadership Forum in Mumbai from 20th to 22nd Feb 2019. It was an amazing experience to listen to the best in the industry and gain knowledge and insights from them. The CxO keynote speeches focused on technologies like AI, ML, and blockchain.
My personal favorite among the keynotes was the one by Vala Afshar, Chief Digital Evangelist @ Salesforce. It was a privilege to be there and watch him explain AI, ML, and the importance of data. It was truly inspiring, and I was mesmerized by his depth of knowledge of the IT industry.
As mentioned by Vala Afshar, “Data is the oil of 21st century but oil is just useless thick goop until you refine it into fuel. AI is your refinery”. In those 3 days, a lot of valuable information was shared and scattered across social media. I wrote this blog to share how I saved those gems of information, so that I can revisit them time and again to get inspired and motivated.
Twitter, one of the most popular social media platforms, is one of my favorites, and I wanted to save all the tweets that carried the hashtag #NASSCOM_TLF (the official hashtag of the event). After some quick research using ‘DuckDuckGo’, I decided to use Tweepy, a Python module that uses the Twitter API to connect, read, write, retweet, and send direct messages right from Python.
Tweepy requires a Twitter app to be created so that it can use Twitter’s API to exchange information between Twitter and Python. I followed this link to set up and run my Twitter app.
Here is the code I wrote to download all the tweets with the hashtag #NASSCOM_TLF and save them to an Excel file!
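Setting up the app gives you four values: a consumer key, a consumer secret, an access token, and an access token secret. As a minimal sketch, one way to keep them out of the script itself is to read them from environment variables. The variable names and the load_credentials helper below are my own assumptions for illustration, not something Tweepy or Twitter prescribes.

```python
import os

# Hypothetical environment variable names; pick whatever suits your setup.
REQUIRED_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=None):
    """Return the four Twitter credentials as a dict keyed by variable name."""
    # Default to the real process environment; a plain dict can be passed for testing.
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        # Fail early with the names of the variables that are not set.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

The returned values can then be passed to tweepy.OAuthHandler and set_access_token in place of hard-coded strings.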
# Install Tweepy using: pip install tweepy
import tweepy
# Pandas dataframe used to hold tweets in tabular format and export them to Excel
import pandas as pd
# Replace consumer key, consumer secret
consumer_key = 'REPLACE'
consumer_secret = 'REPLACE'
# Replace access token key and secret
access_token = 'REPLACE-REPLACE'
access_token_secret = 'REPLACE'
# Authenticate to Twitter API
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
# Initialize a list to store one row per tweet
msgs = []
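# Search Twitter for the hashtag #NASSCOM_TLF and exclude retweets.
# count=100 asks for the largest page size the standard search API allows
# (the rpp parameter used originally belongs to the retired v1 search API).
# Note: Tweepy 4.x renamed api.search to api.search_tweets.
for tweet in tweepy.Cursor(api.search, q='#NASSCOM_TLF -filter:retweets', tweet_mode='extended', count=100).items():
    # One row per tweet: full text, creation time, display name, screen name
    msg = (tweet.full_text, tweet.created_at, tweet.user.name, tweet.user.screen_name)
    msgs.append(msg)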
# Build the dataframe from the collected rows, with column headers for
# the full tweet text, time, user name and user ID (screen name)
df = pd.DataFrame(msgs, columns=['Tweet Text', 'Tweet Date Time (GMT)', 'Username', 'User ID'])
# Check the first 5 tweets to see any errors
print(df.head())
# Name of the output file
output = "tweets_ntlf.xlsx"
# Export tweets from the dataframe to Excel
try:
    df.to_excel(output, index=False)
except Exception as error:
    print("Unable to export NASSCOM tweets to Excel:", error)

NOTE: I’ve excluded retweets to avoid duplication of information.
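The -filter:retweets operator in the search query does this filtering on Twitter's side. If you already have a mixed list of tweets, a rough client-side equivalent could look like the sketch below. It relies on retweets from the v1.1 API carrying a retweeted_status attribute; the "RT @" text check is only a fallback heuristic of mine, and the SimpleNamespace objects are stand-ins for real tweet objects.

```python
from types import SimpleNamespace


def is_retweet(tweet):
    """Return True if the tweet object looks like a retweet."""
    # Retweets returned by the v1.1 API carry a retweeted_status attribute.
    if getattr(tweet, "retweeted_status", None) is not None:
        return True
    # Fallback heuristic (an assumption): retweet text starts with "RT @".
    text = getattr(tweet, "full_text", None) or getattr(tweet, "text", None) or ""
    return text.startswith("RT @")


def drop_retweets(tweets):
    """Keep only the tweets that are not retweets."""
    return [tweet for tweet in tweets if not is_retweet(tweet)]


# Stand-in objects for illustration only; real code would pass tweet objects.
sample = [
    SimpleNamespace(full_text="Great keynote at #NASSCOM_TLF"),
    SimpleNamespace(full_text="RT @someone: Great keynote at #NASSCOM_TLF"),
    SimpleNamespace(
        full_text="Sharing this",
        retweeted_status=SimpleNamespace(full_text="Original tweet"),
    ),
]
originals = drop_retweets(sample)
print([tweet.full_text for tweet in originals])
```

Filtering in the query is still preferable when possible, since retweets are then never downloaded in the first place.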
Please continue to stop by this blog and share your comments below.
Up Next – Manage Alerts of VMAX array using REST API