still don't understand the size difference, though
-
jartigag (jartigag@mastodon.social)'s status on Wednesday, 17-Nov-2021 20:45:36 UTC jartigag
-
jartigag (jartigag@mastodon.social)'s status on Wednesday, 17-Nov-2021 20:45:37 UTC jartigag
`copy from program '*csv'` is an even better trick
-
jartigag (jartigag@mastodon.social)'s status on Wednesday, 17-Nov-2021 20:45:39 UTC jartigag
`set default` is a good trick to add columns
-
jartigag (jartigag@mastodon.social)'s status on Wednesday, 17-Nov-2021 20:45:41 UTC jartigag
let's move it to #postgresql
-
jartigag (jartigag@mastodon.social)'s status on Wednesday, 17-Nov-2021 20:45:42 UTC jartigag
just want to paste here the original raw data, for the record.
i've only processed 2020-2021 data
-
jartigag (jartigag@mastodon.social)'s status on Wednesday, 17-Nov-2021 20:45:45 UTC jartigag
comparing my machines. i don't know very much about cpus, but to me the left side seems better than the right side, doesn't it? (except for the number of cores and the highlighted line, but that should be irrelevant for this processing, since it isn't multithreaded).
the right side processes faster, as you can see 🤷
-
jartigag (jartigag@mastodon.social)'s status on Wednesday, 17-Nov-2021 20:45:45 UTC jartigag
well, this could be the reason:
left-side, hdd
right-side, ssd 🤔
-
jartigag (jartigag@mastodon.social)'s status on Wednesday, 17-Nov-2021 20:45:46 UTC jartigag
it's gonna be a night of hard work
-
jartigag (jartigag@mastodon.social)'s status on Wednesday, 17-Nov-2021 20:45:48 UTC jartigag
let's make some sense of those dataframes. these are some of the most common errors among <1500 elo #chess players:
-
jartigag (jartigag@mastodon.social)'s status on Wednesday, 17-Nov-2021 20:45:49 UTC jartigag
now i begin to see something 🎉
apparently good players make fewer mistakes than not-so-good ones.. 🤔😜
-
jartigag (jartigag@mastodon.social)'s status on Wednesday, 17-Nov-2021 20:45:50 UTC jartigag
to this (4x2% every 5 mins).
i know, obviously parallelization is key. but in this case the important decision was to split the raw data and review it manually, instead of wasting more time trying to automate everything.
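
A minimal sketch of the splitting idea, assuming one record per line (multi-line records, such as PGN games, would need to be split on record boundaries instead). The function name `split_by_lines` and the part-file naming are hypothetical, not taken from this thread.

```python
from pathlib import Path


def split_by_lines(src: Path, out_dir: Path, lines_per_part: int) -> list[Path]:
    """Split a text file into numbered parts of at most lines_per_part lines."""
    if lines_per_part < 1:
        raise ValueError("lines_per_part must be >= 1")
    out_dir.mkdir(parents=True, exist_ok=True)
    parts: list[Path] = []
    out = None
    with src.open("r", encoding="utf-8") as fh:
        for i, line in enumerate(fh):
            # Start a new part file every lines_per_part lines.
            if i % lines_per_part == 0:
                if out is not None:
                    out.close()
                part = out_dir / f"{src.stem}.part{len(parts):03d}{src.suffix}"
                parts.append(part)
                out = part.open("w", encoding="utf-8")
            out.write(line)
    if out is not None:
        out.close()
    return parts
```

Each part can then be handed to its own run of the processing script (for example four at a time), which also makes it easy to inspect a single part by hand.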
-
jartigag (jartigag@mastodon.social)'s status on Wednesday, 17-Nov-2021 20:45:51 UTC jartigag
such a simple "tweak" (much simpler than multiprocessing.Pool and anything else i've tried these days) makes a decisive improvement in performance ✌️
-
jartigag (jartigag@mastodon.social)'s status on Wednesday, 17-Nov-2021 20:45:51 UTC jartigag
from this (1% every ~3h):
-
jartigag (jartigag@mastodon.social)'s status on Wednesday, 17-Nov-2021 20:45:52 UTC jartigag
6h 😶 fortunately it was done in one night
-
jartigag (jartigag@mastodon.social)'s status on Wednesday, 17-Nov-2021 20:45:53 UTC jartigag
recently i started an analysis project, using #lichess data. these screenshots are just from downloading this year's data so far and uncompressing one month 🥵
now i have to preprocess such BIG data 😅
-
jartigag (jartigag@mastodon.social)'s status on Wednesday, 17-Nov-2021 20:45:53 UTC jartigag
found a solution. #grep to the rescue! 🦸
https://github.com/jartigag/chess-blunders/blob/master/data/raw/pre_preprocess.sh
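
For readers who prefer Python, a rough sketch of the same kind of pre-filtering, in the spirit of a fixed-string `grep -F`. This is not a reproduction of the linked script: the helper names and the `[%eval` needle in the usage note are assumptions chosen for illustration.

```python
from pathlib import Path
from typing import Iterator


def grep_lines(path: Path, needle: str) -> Iterator[str]:
    """Yield lines containing needle, reading one line at a time."""
    with path.open("r", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if needle in line:
                yield line


def prefilter(src: Path, dst: Path, needle: str) -> int:
    """Write matching lines to dst and return how many were kept."""
    kept = 0
    with dst.open("w", encoding="utf-8") as out:
        for line in grep_lines(src, needle):
            out.write(line)
            kept += 1
    return kept
```

Something like `prefilter(Path("raw.pgn"), Path("filtered.pgn"), "[%eval")` would keep only the lines that contain the needle. Because the file is streamed line by line, memory use stays flat regardless of file size.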
-
jartigag (jartigag@mastodon.social)'s status on Sunday, 21-Nov-2021 23:44:33 UTC jartigag
i knew #pandas wasn't a good idea with these #csv files..
why do i keep doing this to myself? 🤦‍♂️
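
One possible way around loading a huge csv into a single dataframe, sketched with only the standard library: stream the rows and keep a running aggregate. The function name and the column names in the usage note are hypothetical, and this is not necessarily what the project ended up doing.

```python
import csv
from collections import Counter
from pathlib import Path


def count_by_column(path: Path, column: str) -> Counter:
    """Count the values of one column, holding one row in memory at a time."""
    counts: Counter = Counter()
    with path.open("r", encoding="utf-8", newline="") as fh:
        for row in csv.DictReader(fh):
            counts[row[column]] += 1
    return counts
```

For example, `count_by_column(Path("games.csv"), "mistake")` would tally a `mistake` column without building a dataframe. If staying in pandas is preferred, `read_csv` also accepts a `chunksize` argument that yields the file in pieces.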
-
jartigag (jartigag@mastodon.social)'s status on Monday, 22-Nov-2021 23:30:46 UTC jartigag
right now, it looks like this (hopefully tomorrow i will have all the data)
-
jartigag (jartigag@mastodon.social)'s status on Tuesday, 23-Nov-2021 21:27:16 UTC jartigag
covid19-lockdown effect you say? 😁
-