Tracking logs in Tutor?

I understand that the event logs created by many containers are contained in places like $TUTOR_ROOT/data/lms/logs/ for example, but they do not seem to follow the format used for the files generated in /edx/var/log/tracking under the Native installation.

How does someone use these logs with Insights for example? How do you collect them or centralize them? Not running under k8s either.

I know some people will suggest Overhang.IO | Tutor Plugins - cairn but we are not ready to go that way right now because of our previous and current investment in Insights. And as long as it is not “officially” dead, we would prefer to continue to use Insights.

So, how have some of you resolved that issue? It would be a rather large roadblock for us right now if we were to move to Production in the coming weeks and not being able to get the tracking logs like we had them under Native.

Any suggestions are welcome. Thank you.

1 Like

Tracking logs look something like this (data from one of our test environments)

{"name": "/courses/course-v1:HEC+MAN-101.2+A2017/about", "context": {"course_id": "course-v1:HEC+MAN-101.2+A2017", "course_user_tags": {}, "user_id": null, "path": "/courses/course-v1:HEC+MAN-101.2+A2017/about", "org_id": "HEC"}, "username": "", "session": "", "ip": "185.191.171.7", "agent": "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)", "host": "test", "referer": "", "accept_language": "", "event": "{\"GET\": {}, \"POST\": {}}", "time": "2022-02-20T16:09:15.420906+00:00", "event_type": "/courses/course-v1:HEC+MAN-101.2+A2017/about", "event_source": "server", "page": null}*
*{"name": "/courses/course-v1:PolyMtl+DDGE101.3+A2019/4c2cdc37264f45a88b9e927ef0588769/", "context": {"course_id": "course-v1:PolyMtl+DDGE101.3+A2019", "course_user_tags": {}, "user_id": null, "path": "/courses/course-v1:PolyMtl+DDGE101.3+A2019/4c2cdc37264f45a88b9e927ef0588769/", "org_id": "PolyMtl"}, "username": "", "session": "", "ip": "40.77.167.65", "agent": "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)", "host": "test", "referer": "", "accept_language": "", "event": "{\"GET\": {}, \"POST\": {}}", "time": "2022-02-20T16:41:36.624878+00:00", "event_type": "/courses/course-v1:PolyMtl+DDGE101.3+A2019/4c2cdc37264f45a88b9e927ef0588769/", "event_source": "server", "page": null}

Logs in $TUTOR_ROOT/data/lms/logs/ look something like this (from one of our Tutor test environments)

2022-03-04 12:15:24,231 INFO 1 [celery.worker.consumer.mingle] [user None] [ip None] mingle.py:43 - mingle: searching for neighbors*
*2022-03-04 12:15:25,269 INFO 1 [celery.worker.consumer.mingle] [user None] [ip None] mingle.py:46 - mingle: sync with 1 nodes*
*2022-03-04 12:15:25,270 INFO 1 [celery.worker.consumer.mingle] [user None] [ip None] mingle.py:50 - mingle: sync complete*
*2022-03-04 12:15:25,309 INFO 1 [celery.apps.worker] [user None] [ip None] worker.py:161 - celery@edx.lms.core.default.%bb9991792fbc ready.*

But they do also include some tracking logs

2022-03-04 15:11:14,545 INFO 30 [tracking] [user None] [ip 66.249.70.44] logger.py:41 - {"name": "/robots.txt", "context": {"user_id": null, "path": "/robots.txt", "course_id": "", "org_id": "", "enterprise_uuid": ""}, "username": "", "session": "", "ip": "66.249.70.44", "agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)", "host": "test", "referer": "", "accept_language": "", "event": "{\"GET\": {}, \"POST\": {}}", "time": "2022-03-04T20:11:14.544912+00:00", "event_type": "/robots.txt", "event_source": "server", "page": null}*
*2022-03-04 15:11:16,908 INFO 13 [tracking] [user None] [ip 66.249.70.46] logger.py:41 - {"name": "/faq", "context": {"user_id": null, "path": "/faq", "course_id": "", "org_id": "", "enterprise_uuid": ""}, "username": "", "session": "", "ip": "66.249.70.46", "agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)", "host": "test", "referer": "", "accept_language": "", "event": "{\"GET\": {}, \"POST\": {}}", "time": "2022-03-04T20:11:16.907851+00:00", "event_type": "/faq", "event_source": "server", "page": null}*

Does it mean that we would need to process the current Tutor logs in order to extract the tracking info and create the tracking.log files we had under Native?

Inquiring minds want to know.

@regis what is your take on this?

There is a log file that contains only events $(tutor config printroot)/data/lms/logs/tracking.log

Yes, but it is not in the tracking log format expected by Insights and produced by the Native installation. I would still need to reformat them in order to remove the contents prefixing the event itself.

It should not be difficult to go for example from the format we get

2022-03-08 12:14:29,300 INFO 24 [tracking] [user None] [ip 66.249.66.12] logger.py:41 - {"name": "/", "context": {"user_id": null, "path": "/", "course_id": "", "org_id": "", "enterprise_uuid": ""}, "username": "", "session": "", "ip": "66.249.66.12", "agent": "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)", "host": "test-host.domain.com", "referer": "", "accept_language": "", "event": "{\"GET\": {}, \"POST\": {}}", "time": "2022-03-08T17:14:29.300223+00:00", "event_type": "/", "event_source": "server", "page": null}

to the format expected

{"name": "/", "context": {"user_id": null, "path": "/", "course_id": "", "org_id": "", "enterprise_uuid": ""}, "username": "", "session": "", "ip": "66.249.66.12", "agent": "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)", "host": "test-host.domain.com", "referer": "", "accept_language": "", "event": "{\"GET\": {}, \"POST\": {}}", "time": "2022-03-08T17:14:29.300223+00:00", "event_type": "/", "event_source": "server", "page": null}

In Django, logs are formatted following the configuration stored in the LOGGING setting. In the native installation, the formatter that is associated to the “tracking” handler is “raw”: edx-platform/logsettings.py at 7686014144930bb9fda4ae8beb66f0392855e55a · openedx/edx-platform · GitHub

In Tutor, that formatter is “standard”: tutor/common_all.py at 38a67e6c64feae4d315008af6d2646308d3acde4 · overhangio/tutor · GitHub

All you have to do to emit tracking logs in the raw format is to create a plugin with the following “openedx-common-settings” patch:

LOGGING["handlers"]["tracking"]["formatter"] = "raw"

Beware that logs emitted in the raw format will not be correctly parsed by Cairn – though we could certainly fix that in the future.

2 Likes

Thanks @regis that explains a lot. I will give it a try on my migration test system. And then I will have to decide if we give Cairn a try in the next few weeks. I’ll contact you for a license in the next few weeks.

One of my colleagues just brought up a very good point while I was discussing this solution with him.

And you even brought it up.

Our previous and current tracking logs that are currently in S3 and used by Insights are in the “raw” format and would not be correctly parsed by Cairn even if we were to bring them back on the local server.

We will need to think this through a little bit more and decide how we want to handle previous analytics vs current analytics vs future analytics. This might get more political than technical with our partners and course creators.

I understand that analytics may be a sensitive topic. Regarding the format of tracking logs, I’m fairly confident that we can get the raw format to work with Cairn.