Welcome to Odyssey’s documentation!

GithubPython.py

The module for querying GitHub data with Google BigQuery.

class odyssey.core.bigquery.GithubPython.GithubPython(package='', exclude_forks='auto', limit=None)

Provides functionality to build SQL queries, connect to BigQuery, and so on.

__init__(package='', exclude_forks='auto', limit=None)

Initialize the GithubPython object.

Parameters:
  • package (string) – Name of the Python package you want to analyze with Odyssey.
  • exclude_forks (string, list or tuple, optional (default="auto")) – The SQL query excludes every entry whose path or repo_name contains exclude_forks. If exclude_forks is "auto", it is set to a list containing the package name.
  • limit (int or None) – Maximum number of results to analyze. Usually set to cap billing or to improve performance.
Returns:

returns an initialized GithubPython object.

Return type:

object
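The handling of exclude_forks described above could look roughly like the sketch below. This is not Odyssey's actual code: the helper names resolve_exclude_forks and exclusion_conditions, and the NOT LIKE form of the generated conditions, are assumptions made for illustration.

```python
def resolve_exclude_forks(package, exclude_forks="auto"):
    """Hypothetical helper: normalize exclude_forks to a list of strings."""
    if exclude_forks is None:
        return []
    if isinstance(exclude_forks, str):
        if exclude_forks == "auto":
            # "auto" becomes a list containing the package name.
            return [package] if package else []
        return [exclude_forks]
    return list(exclude_forks)


def exclusion_conditions(terms):
    """Assumed SQL shape: one NOT LIKE condition per column and term."""
    conditions = []
    for term in terms:
        conditions.append("path NOT LIKE '%{}%'".format(term))
        conditions.append("repo_name NOT LIKE '%{}%'".format(term))
    return conditions


terms = resolve_exclude_forks("sklearn")
conditions = exclusion_conditions(terms)
```

With package="sklearn" and the default "auto", every entry whose path or repo_name contains "sklearn" would be excluded, which keeps copies of the package itself out of the results.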

__weakref__

list of weak references to the object (if defined)

get_all(_filter=None)

Get all data (id, code, repo_name and path) subject to filter.

Parameters:_filter (Filter object or None, optional (default=None)) – Filter the result as defined in the filter object.
Returns:Returns a list of BigQueryGithubEntry objects.
Return type:list
get_context(class_name)

Get context for class usage.

Parameters:class_name (string) – Class whose usage context should be examined.
Returns:Returns a list of tuples (context_string, path, repo_name, count).
Return type:list
get_count(_filter=None)

Get count of files subject to filter.

Parameters:_filter (Filter object or None, optional (default=None)) – Filter the result as defined in the filter object.
Returns:Returns the count as an integer.
Return type:int
get_import_source(val)

Returns a list of BigQueryGithubEntry that imported val.

Parameters:val (string) – The class, submodule, or function whose importing source files should be returned.
Returns:Returns a list of BigQueryGithubEntry
Return type:list
get_instantiation(class_name)

Get instantiation information for class usage.

Parameters:class_name (string) – Class whose instantiations should be examined.
Returns:Returns a nested dict: dict(key=arg, value=dict(key=value_that_arg_sets_to, value=count))
Return type:dict
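To make the nested return shape concrete, the sketch below walks a dict of the documented form {arg: {value: count}} and picks the most frequent value for each argument. The sample data is made up for illustration, and representing argument values as strings is an assumption; most_common_values is a hypothetical helper, not part of Odyssey.

```python
def most_common_values(instantiation):
    """For each argument, return the (value, count) pair with the highest count."""
    return {
        arg: max(values.items(), key=lambda item: item[1])
        for arg, values in instantiation.items()
    }


# Made-up sample in the documented shape: {arg: {value_that_arg_sets_to: count}}.
sample = {
    "n_estimators": {"100": 7, "10": 2},
    "max_depth": {"None": 4, "3": 1},
}
summary = most_common_values(sample)
```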
get_least_imported_class(n=None, use_count_less_than=None, use_count_more_than=None, _filter=None)

Get the n least imported classes within a certain use count range, subject to filter.

Parameters:
  • n (int or None, optional (default=None)) – the top n least imported classes to be returned. If set to None, all results will be returned.
  • use_count_less_than (int or None, optional (default=None)) – only include classes that have use count less than this amount. If none, there will be no restriction.
  • use_count_more_than (int or None, optional (default=None)) – only include classes that have use count more than this amount. If none, there will be no restriction.
  • _filter (Filter object or None (default=None)) – Filter the result as defined in the filter object.
Returns:

Returns a list of tuple (name, count)

Return type:

list
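The selection logic shared by the get_least_imported_* and get_most_imported_* methods can be sketched as follows. This is a minimal illustration under assumptions: select_by_use_count is a hypothetical helper, the bounds are treated as strict inequalities (following the wording "less than" and "more than"), and ties are broken by name.

```python
def select_by_use_count(counts, n=None, use_count_less_than=None,
                        use_count_more_than=None, least=True):
    """Return up to n (name, count) tuples whose count lies in the given range."""
    items = [
        (name, count)
        for name, count in counts.items()
        if (use_count_less_than is None or count < use_count_less_than)
        and (use_count_more_than is None or count > use_count_more_than)
    ]
    # Ascending by count for "least", descending for "most"; name breaks ties.
    items.sort(key=lambda item: (item[1], item[0]), reverse=not least)
    return items if n is None else items[:n]


counts = {"A": 5, "B": 1, "C": 3, "D": 0}
least_used = select_by_use_count(counts, n=2, use_count_more_than=0)
most_used = select_by_use_count(counts, n=1, least=False)
```

Setting use_count_more_than=0 is a convenient way to ignore names that are never imported when looking for the least imported ones.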

get_least_imported_function(n=None, use_count_less_than=None, use_count_more_than=None, _filter=None)

Get the n least imported functions within a certain use count range, subject to filter.

Parameters:
  • n (int or None, optional (default=None)) – the top n least imported functions to be returned. If set to None, all results will be returned.
  • use_count_less_than (int or None, optional (default=None)) – only include functions that have use count less than this amount. If none, there will be no restriction.
  • use_count_more_than (int or None, optional (default=None)) – only include functions that have use count more than this amount. If none, there will be no restriction.
  • _filter (Filter object or None (default=None)) – Filter the result as defined in the filter object.
Returns:

Returns a list of tuple (name, count)

Return type:

list

get_least_imported_submodule(n=None, use_count_less_than=None, use_count_more_than=None, _filter=None)

Get the n least imported submodules within a certain use count range, subject to filter.

Parameters:
  • n (int or None, optional (default=None)) – the top n least imported submodules to be returned. If set to None, all results will be returned.
  • use_count_less_than (int or None, optional (default=None)) – only include submodules that have use count less than this amount. If none, there will be no restriction.
  • use_count_more_than (int or None, optional (default=None)) – only include submodules that have use count more than this amount. If none, there will be no restriction.
  • _filter (Filter object or None (default=None)) – Filter the result as defined in the filter object.
Returns:

Returns a list of tuple (name, count)

Return type:

list

get_most_imported_class(n=None, use_count_less_than=None, use_count_more_than=None, _filter=None)

Get n most imported classes within a certain use count range, subject to filter.

Parameters:
  • n (int or None, optional (default=None)) – the top n most imported classes to be returned. If set to None, all results will be returned.
  • use_count_less_than (int or None, optional (default=None)) – only include classes that have use count less than this amount. If none, there will be no restriction.
  • use_count_more_than (int or None, optional (default=None)) – only include classes that have use count more than this amount. If none, there will be no restriction.
  • _filter (Filter object or None (default=None)) – Filter the result as defined in the filter object.
Returns:

Returns a list of tuple (name, count)

Return type:

list

get_most_imported_function(n=None, use_count_less_than=None, use_count_more_than=None, _filter=None)

Get the n most imported functions within a certain use count range, subject to filter.

Parameters:
  • n (int or None, optional (default=None)) – the top n most imported functions to be returned. If set to None, all results will be returned.
  • use_count_less_than (int or None, optional (default=None)) – only include functions that have use count less than this amount. If none, there will be no restriction.
  • use_count_more_than (int or None, optional (default=None)) – only include functions that have use count more than this amount. If none, there will be no restriction.
  • _filter (Filter object or None (default=None)) – Filter the result as defined in the filter object.
Returns:

Returns a list of tuple (name, count)

Return type:

list

get_most_imported_submodule(n=None, use_count_less_than=None, use_count_more_than=None, _filter=None)

Get the n most imported submodules within a certain use count range, subject to filter.

Parameters:
  • n (int or None, optional (default=None)) – the top n most imported submodules to be returned. If set to None, all results will be returned.
  • use_count_less_than (int or None, optional (default=None)) – only include submodules that have use count less than this amount. If none, there will be no restriction.
  • use_count_more_than (int or None, optional (default=None)) – only include submodules that have use count more than this amount. If none, there will be no restriction.
  • _filter (Filter object or None (default=None)) – Filter the result as defined in the filter object.
Returns:

Returns a list of tuple (name, count)

Return type:

list

get_top_import_repo(n=None, _filter=None)

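Get the repos that import the package most often. See RepoImportCounter for details.

Parameters:
  • n (int or None, optional (default=None)) – the top n repo names to be returned. If set to None, all results will be returned.
  • _filter (Filter object or None, optional (default=None)) – Filter the result as defined in the filter object.
Returns:

Returns a list of repo names.

Return type:

list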
run(query, project='stellar-arcadia-173703')

Run a SQL query with Google BigQuery. Large results are allowed, and the timeout is set to 99999999.

Parameters:
  • query (string) – SQL query to be executed.
  • project (string, optional (default="stellar-arcadia-173703")) – Project to run the query on (used for billing, logging, and similar purposes).
Returns:

Returns the result as a Python list.

Return type:

list

set_class_list(L)

Set the class list that ImportAnalyzer uses to classify imports.

Parameters:L (list of string) – class list that should be identified by ImportAnalyzer.
set_exclude_forks(exclude_forks)

Reset exclude_forks.

Parameters:exclude_forks (list or None) – See doc of __init__ for definition of exclude_forks attribute.
set_function_list(L)

Set the function list that ImportAnalyzer uses to classify imports.

Parameters:L (list of string) – function list that should be identified by ImportAnalyzer.
set_limit(limit)

Reset limit.

Parameters:limit (int) – See doc of __init__ for definition of limit attribute.
set_package(package)

Reset package name.

Parameters:package (string) – See doc of __init__ for definition of package attribute.
set_submodule_list(L)

Set the submodule list that ImportAnalyzer uses to classify imports.

Parameters:L (list of string) – submodule list that should be identified by ImportAnalyzer.

BigQueryGithubEntry.py

The module that defines BigQueryGithubEntry class.

class odyssey.core.bigquery.BigQueryGithubEntry.BigQueryGithubEntry(_id, code, repo_name, path)

A struct that holds the relevant information about an entry in the BigQuery GitHub table.

__init__(_id, code, repo_name, path)

Initialize the BigQueryGithubEntry object.

Parameters:
  • _id (string) – a hash value identifying a file entry, as provided by the Google BigQuery GitHub table.
  • code (string) – the content of the file as a string.
  • repo_name (string) – name of the repo (e.g. scikit-learn/scikit-learn).
  • path (string) – path of the file (e.g. doc/HOWTO_DOCUMENT.rst).
Returns:

returns an initialized BigQueryGithubEntry object.

Return type:

object

__str__()

Return the code string encoded in UTF-8, for printing.

__weakref__

list of weak references to the object (if defined)

get_url()

Returns a GitHub URL linking to the file. The link may be invalid if the file has since been removed.
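A URL of this kind can be assembled from repo_name and path alone. The sketch below is an assumption about the URL layout, not Odyssey's actual implementation: build_github_url is a hypothetical helper, and the default branch name "master" is a guess that will not hold for every repo.

```python
from urllib.parse import quote


def build_github_url(repo_name, path, branch="master"):
    """Assumed layout: https://github.com/<repo_name>/blob/<branch>/<path>."""
    # quote() leaves "/" intact and escapes characters such as spaces.
    return "https://github.com/{}/blob/{}/{}".format(repo_name, branch, quote(path))


url = build_github_url("scikit-learn/scikit-learn", "doc/HOWTO_DOCUMENT.rst")
```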

filter.py

The module that defines Filters.

class odyssey.core.bigquery.filter.And(f1, f2)

And filter takes in two filters and requires both to be true.

__init__(f1, f2)

Initialize the And filter.

Parameters:
  • f1 (Filter) – first filter to combine.
  • f2 (Filter) – second filter to combine.
Returns:

returns an initialized And filter.

Return type:

object

__str__()

String representation of the filter. This is also the string that appears in the SQL query.

class odyssey.core.bigquery.filter.Contains(s)

Require code content to contain a specific string.

__init__(s)

Initialize the Contains filter.

Parameters:s (string) – String that the code content must contain.
Returns:returns an initialized Contains filter.
Return type:object
__str__()

String representation of the filter. This is also the string that appears in the SQL query.

class odyssey.core.bigquery.filter.Filter

Base class for other filters to inherit.

__weakref__

list of weak references to the object (if defined)

class odyssey.core.bigquery.filter.Or(f1, f2)

Or filter takes in two filters and requires at least one of them to be true.

__init__(f1, f2)

Initialize the Or filter.

Parameters:
  • f1 (Filter) – first filter to combine.
  • f2 (Filter) – second filter to combine.
Returns:

returns an initialized Or filter.

Return type:

object

__str__()

String representation of the filter. This is also the string that appears in the SQL query.
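Because each filter's string form is what ends up in the SQL query, composing filters amounts to composing SQL fragments. The classes below are a simplified stand-in for the ones documented above, written to show the idea. The column name content and the LIKE-based matching are assumptions, and this sketch does not escape quotes in the search string.

```python
class Filter:
    """Base class; subclasses render themselves as SQL through __str__."""


class Contains(Filter):
    def __init__(self, s):
        self.s = s

    def __str__(self):
        # Assumed column name and matching operator.
        return "content LIKE '%{}%'".format(self.s)


class And(Filter):
    def __init__(self, f1, f2):
        self.f1, self.f2 = f1, f2

    def __str__(self):
        return "({} AND {})".format(self.f1, self.f2)


class Or(Filter):
    def __init__(self, f1, f2):
        self.f1, self.f2 = f1, f2

    def __str__(self):
        return "({} OR {})".format(self.f1, self.f2)


combined = And(Contains("fit("), Or(Contains("predict("), Contains("score(")))
clause = str(combined)
```

Wrapping every And and Or in parentheses keeps operator precedence explicit when filters are nested.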

ImportAnalyzer.py

The module that defines ImportAnalyzer.

class odyssey.core.analyzer.ImportAnalyzer.ImportAnalyzer(package, accepted_list)

ImportAnalyzer analyzes how classes, submodules and functions are imported.

__init__(package, accepted_list)

Initialize the ImportAnalyzer.

Parameters:
  • package (string) – Python package to be counted.
  • accepted_list (list of string) – A list of tokens that will be extracted and counted.
Returns:

returns an initialized ImportAnalyzer object.

Return type:

object

__weakref__

list of weak references to the object (if defined)

get_by_filter(f)

Get imported values, filtered by f.

Parameters:f (function) – used as filter(f, get_most_common())
Returns:return a list of tuples containing (value, count)
Return type:list
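Since f is applied as filter(f, get_most_common()), it receives one (value, count) tuple at a time. The sketch below shows a predicate of that shape on made-up data. Note that in Python 3 the built-in filter returns an iterator, so the result is wrapped in list() here.

```python
# Made-up (value, count) tuples standing in for get_most_common() output.
ranked = [("train_test_split", 30), ("RandomForestClassifier", 12), ("SVC", 9)]


def is_frequent(pair):
    """Keep only values imported at least 10 times."""
    value, count = pair
    return count >= 10


frequent = list(filter(is_frequent, ranked))
```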
get_common(n=None, _reverse=True)

Get common imported values.

Parameters:
  • n (int or None, optional (default=None)) – the top n most/least imported values to be returned. If set to None, all results will be returned.
  • _reverse (bool, optional (default=True)) – if True, return values in descending order of count.
Returns:

return a list of tuples containing (value, count)

Return type:

list
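One way to get this behavior is to sort the counted items and slice off the first n. The function below is a standalone sketch that assumes the counts are held in a collections.Counter. It is not the class method itself.

```python
from collections import Counter


def get_common(counter, n=None, _reverse=True):
    """Return (value, count) tuples sorted by count; descending if _reverse."""
    ranked = sorted(counter.items(), key=lambda item: item[1], reverse=_reverse)
    return ranked if n is None else ranked[:n]


counts = Counter({"a": 3, "b": 1, "c": 2})
top_two = get_common(counts, 2)
bottom_one = get_common(counts, 1, _reverse=False)
```

Under this reading, get_most_common(n) and get_least_common(n) would be thin wrappers that call get_common with _reverse set to True and False respectively.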

get_least_common(n=None)

Get least common n imported values.

Parameters:n (int or None, optional (default=None)) – the top n least imported values to be returned. If set to None, all results will be returned.
Returns:return a list of tuples containing (value, count)
Return type:list
get_most_common(n=None)

Get most common n imported values.

Parameters:n (int or None, optional (default=None)) – the top n most imported values to be returned. If set to None, all results will be returned.
Returns:return a list of tuples containing (value, count)
Return type:list
get_source(s)

Get the source entries for a specific value.

Parameters:s (string) – Value to get source for. Should be in accepted_list.
Returns:return a list of BigQueryGithubEntry.
Return type:list
parse(entry)

Parse a BigQueryGithubEntry for import analysis.

Parameters:entry (BigQueryGithubEntry) – A BigQueryGithubEntry to be parsed
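The kind of import counting described in this section can be approximated with Python's ast module. The sketch below is an illustration under assumptions, not Odyssey's parser: count_imports is a hypothetical function, it only looks at "from package... import name" statements, and it requires code that parses under the running Python version.

```python
import ast
from collections import Counter


def count_imports(code, package, accepted_list):
    """Count names from accepted_list imported via 'from <package>... import'."""
    accepted = set(accepted_list)
    counts = Counter()
    for node in ast.walk(ast.parse(code)):
        # node.module is None for relative imports such as "from . import x".
        if isinstance(node, ast.ImportFrom) and node.module:
            if node.module == package or node.module.startswith(package + "."):
                for alias in node.names:
                    if alias.name in accepted:
                        counts[alias.name] += 1
    return dict(counts)


code = (
    "from sklearn.ensemble import RandomForestClassifier\n"
    "from sklearn import svm\n"
    "import os\n"
)
found = count_imports(code, "sklearn", ["RandomForestClassifier", "svm"])
```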

InstantiationAnalyzer.py

The module that defines InstantiationAnalyzer.

class odyssey.core.analyzer.InstantiationAnalyzer.InstantiationAnalyzer(class_name)

InstantiationAnalyzer parses the code to get the instantiation of classes.

__init__(class_name)

Initialize the InstantiationAnalyzer.

Parameters:class_name (string) – class to be analyzed for instantiation
Returns:returns an initialized InstantiationAnalyzer object.
Return type:object
__weakref__

list of weak references to the object (if defined)

parse(code)

Parse code and analyze for instantiation.

Parameters:code (string) – code string to be parsed.
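A parse step of this kind could be sketched with the ast module as shown below, producing the nested {arg: {value: count}} shape that get_instantiation documents. This is an assumption-laden illustration: count_instantiation_kwargs is a hypothetical function, only keyword arguments are recorded, values are stored as source strings, and ast.unparse requires Python 3.9 or later.

```python
import ast
from collections import defaultdict


def count_instantiation_kwargs(code, class_name):
    """Map each keyword argument to the values it is set to, with counts."""
    result = defaultdict(lambda: defaultdict(int))
    for node in ast.walk(ast.parse(code)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handle both ClassName(...) and module.ClassName(...).
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name != class_name:
            continue
        for kw in node.keywords:
            if kw.arg is None:  # skip **kwargs expansion
                continue
            result[kw.arg][ast.unparse(kw.value)] += 1
    return {arg: dict(values) for arg, values in result.items()}


code = (
    "clf = RandomForestClassifier(n_estimators=100)\n"
    "other = RandomForestClassifier(n_estimators=100, max_depth=3)\n"
    "m = ensemble.RandomForestClassifier(n_estimators=10)\n"
)
usage = count_instantiation_kwargs(code, "RandomForestClassifier")
```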

RepoImportCounter.py

The module that defines RepoImportCounter.

class odyssey.core.analyzer.RepoImportCounter.RepoImportCounter(package)

RepoImportCounter counts how many times other repos import the analyzed package.

__init__(package)

Initialize the RepoImportCounter.

Parameters:package (string) – Python package to be counted
Returns:returns an initialized RepoImportCounter object.
Return type:object
__weakref__

list of weak references to the object (if defined)

get_most_common(n=None)

Get most common n repos.

Parameters:n (int or None, optional (default=None)) – the number of top repos importing the package to be returned. If set to None, all results will be returned.
Returns:list of tuples containing repo name and count.
Return type:list
parse(entry)

Parse a BigQueryGithubEntry for repo import count.

Parameters:entry (BigQueryGithubEntry) – A BigQueryGithubEntry to be parsed
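Counting imports per repo can be sketched with a regular expression and a Counter. The code below is a simplified stand-in, not Odyssey's implementation: count_repo_imports is a hypothetical function, entries are plain (repo_name, code) tuples rather than BigQueryGithubEntry objects, and counting every import statement (rather than every file) is an assumption.

```python
import re
from collections import Counter


def count_repo_imports(entries, package):
    """Count import statements of package per repo over (repo_name, code) pairs."""
    pattern = re.compile(
        r"^[ \t]*(?:from|import)[ \t]+{}(?=[.,\s]|$)".format(re.escape(package)),
        re.MULTILINE,
    )
    counts = Counter()
    for repo_name, code in entries:
        hits = len(pattern.findall(code))
        if hits:
            counts[repo_name] += hits
    return counts


entries = [
    ("a/b", "import sklearn\nfrom sklearn import svm\n"),
    ("c/d", "import sklearnx\n"),
    ("a/b", "from sklearn.svm import SVC\n"),
]
repo_counts = count_repo_imports(entries, "sklearn")
```

The lookahead after the package name prevents a package such as sklearnx from being counted as sklearn.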

query_builder.py

This module contains helper functions for building SQL queries. For example, when a WHERE clause has multiple constraints, they can be joined using connect_with_and / connect_with_or, both defined in this file.

odyssey.utils.query_builder.connect(connector, *args)

Connect the strings in args with connector. Make sure there’s no additional connector.

odyssey.utils.query_builder.connect_with_and(*args)

Connect the strings in args with AND. Make sure there’s no additional connector.

odyssey.utils.query_builder.connect_with_or(*args)

Connect the strings in args with OR. Make sure there’s no additional connector.
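One plausible reading of "no additional connector" is that empty arguments are skipped, so the result never starts, ends, or doubles up on AND / OR. The sketch below implements that reading. It is an interpretation for illustration and may differ from the actual helpers.

```python
def connect(connector, *args):
    """Join non-empty strings with the connector, avoiding stray connectors."""
    parts = [arg.strip() for arg in args if arg and arg.strip()]
    return " {} ".format(connector).join(parts)


def connect_with_and(*args):
    return connect("AND", *args)


def connect_with_or(*args):
    return connect("OR", *args)


where = connect_with_and("path LIKE '%.py'", "", "repo_name != 'a/b'")
```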

sklearn_meta_data.py

This module contains metadata for sklearn, to be used by ImportAnalyzer.

odyssey.utils.sklearn_meta_data.get_all_functions()

A list of all functions. Generated from classes.rst in the scikit-learn package.

odyssey.utils.sklearn_meta_data.get_all_models()

A list of all models. Generated by a walk using all_estimators.

odyssey.utils.sklearn_meta_data.get_all_submodules()

A list of all submodules. Generated from classes.rst in the scikit-learn package.
