don't understand the size difference yet, though
-
jartigag (jartigag@mastodon.social)'s status on Wednesday, 17-Nov-2021 20:45:36 UTC jartigag -
jartigag (jartigag@mastodon.social)'s status on Wednesday, 17-Nov-2021 20:45:37 UTC jartigag `copy from program '*csv'` is an even better trick
-
jartigag (jartigag@mastodon.social)'s status on Wednesday, 17-Nov-2021 20:45:39 UTC jartigag `set default` is a good trick to add columns
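
A minimal sketch of how the two PostgreSQL tricks mentioned in this thread might be combined. It only builds the SQL text and does not run anything. The table name `games`, the column names and the `cat *csv` command are assumptions for illustration, not taken from the project. In PostgreSQL, columns left out of the `COPY` column list are filled with their default values, which is why `SET DEFAULT` can act as a way to add a constant column per load. Note that `COPY ... FROM PROGRAM` runs the command on the database server and needs elevated privileges.

```python
from typing import List


def set_default_sql(table: str, column: str, literal: str) -> str:
    """Build an ALTER TABLE statement that changes a column's default.

    `literal` must already be a valid SQL literal (e.g. "'2021-01'").
    Identifiers are not quoted here, so only pass trusted names.
    """
    return f"ALTER TABLE {table} ALTER COLUMN {column} SET DEFAULT {literal};"


def copy_from_program_sql(table: str, columns: List[str], command: str) -> str:
    """Build a COPY ... FROM PROGRAM statement for csv input.

    Columns missing from `columns` get their default value on load.
    Single quotes in `command` are doubled, as SQL string literals require.
    """
    cols = ", ".join(columns)
    escaped = command.replace("'", "''")
    return f"COPY {table} ({cols}) FROM PROGRAM '{escaped}' WITH (FORMAT csv);"


if __name__ == "__main__":
    # Hypothetical names: tag every row of this load with the same month.
    print(set_default_sql("games", "month", "'2021-01'"))
    print(copy_from_program_sql("games", ["white_elo", "black_elo"], "cat *csv"))
```

The generated statements could then be pasted into `psql` or sent through any database driver.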
-
jartigag (jartigag@mastodon.social)'s status on Wednesday, 17-Nov-2021 20:45:41 UTC jartigag let's move it to #postgresql
-
jartigag (jartigag@mastodon.social)'s status on Wednesday, 17-Nov-2021 20:45:42 UTC jartigag just want to paste here the original raw data, for the record.
i've only processed 2020-2021 data
-
jartigag (jartigag@mastodon.social)'s status on Wednesday, 17-Nov-2021 20:45:45 UTC jartigag comparing my machines. i don't know much about cpus, but to me the left side looks better than the right side, doesn't it? (except for the number of cores and the highlighted line, but that should be irrelevant for this processing, since it isn't multithreaded).
yet the right side processes faster, as you can see 🤷
-
jartigag (jartigag@mastodon.social)'s status on Wednesday, 17-Nov-2021 20:45:45 UTC jartigag well, this could be the reason:
left side: hdd
right side: ssd 🤔
-
jartigag (jartigag@mastodon.social)'s status on Wednesday, 17-Nov-2021 20:45:46 UTC jartigag it's gonna be a hard night of work
-
jartigag (jartigag@mastodon.social)'s status on Wednesday, 17-Nov-2021 20:45:48 UTC jartigag let's make some sense of those dataframes. these are some of the most common errors among <1500 elo #chess players:
-
jartigag (jartigag@mastodon.social)'s status on Wednesday, 17-Nov-2021 20:45:49 UTC jartigag now i begin to see something 🎉
apparently good players make fewer mistakes than not-so-good ones.. 🤔😜
-
jartigag (jartigag@mastodon.social)'s status on Wednesday, 17-Nov-2021 20:45:50 UTC jartigag to this (4x2% every 5 mins).
i know, obviously parallelization is key. but in this case the important decision was to split the raw data and review it manually, instead of wasting more time trying to automate everything.
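
A rough sketch of the "split the raw data" idea, assuming one record per line (csv rows, for instance). The function names and the `.partNNN` suffix are made up for illustration and are not from the project. Each part file can then be fed to a separate run of the processing script, by hand, with no `multiprocessing.Pool` involved.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunks(lines: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` lines."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(lines)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def split_file(src_path: str, size: int) -> List[str]:
    """Write src_path into numbered part files and return their paths."""
    paths = []
    with open(src_path, encoding="utf-8") as src:
        for i, chunk in enumerate(chunks(src, size)):
            part = f"{src_path}.part{i:03d}"  # hypothetical naming scheme
            with open(part, "w", encoding="utf-8") as dst:
                dst.writelines(chunk)
            paths.append(part)
    return paths
```

Splitting by line count would cut multi-line records (such as PGN games) in the middle, so this only fits data that has already been flattened to one record per line.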
-
jartigag (jartigag@mastodon.social)'s status on Wednesday, 17-Nov-2021 20:45:51 UTC jartigag such a simple "tweak" (much simpler than multiprocessing.Pool and anything else i've tried these days) makes a decisive improvement in performance ✌️
-
jartigag (jartigag@mastodon.social)'s status on Wednesday, 17-Nov-2021 20:45:51 UTC jartigag from this (1% every ~3h):
-
jartigag (jartigag@mastodon.social)'s status on Wednesday, 17-Nov-2021 20:45:52 UTC jartigag 6h 😶 fortunately it was done in one night
-
jartigag (jartigag@mastodon.social)'s status on Wednesday, 17-Nov-2021 20:45:53 UTC jartigag recently i started an analysis project, using #lichess data. these screenshots are just from downloading this year's data so far and uncompressing one month 🥵
now i have to preprocess such BIG data 😅
-
jartigag (jartigag@mastodon.social)'s status on Wednesday, 17-Nov-2021 20:45:53 UTC jartigag found a solution. #grep to the rescue! 🦸
https://github.com/jartigag/chess-blunders/blob/master/data/raw/pre_preprocess.sh
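
The linked script is the real thing. As a rough Python equivalent of the grep idea, the sketch below streams a file line by line and keeps only lines containing a fixed string, so the whole file never sits in memory. The `[%eval` needle in the sample is an assumption about what is worth keeping, not something taken from the script.

```python
from typing import Iterable, Iterator


def keep_matching(lines: Iterable[str], needle: str) -> Iterator[str]:
    """Yield only the lines that contain `needle` (similar to `grep -F`)."""
    for line in lines:
        if needle in line:
            yield line


def prefilter(src_path: str, dst_path: str, needle: str) -> int:
    """Stream src_path into dst_path, keeping matching lines.

    Returns the number of lines kept.
    """
    kept = 0
    with open(src_path, encoding="utf-8", errors="replace") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in keep_matching(src, needle):
            dst.write(line)
            kept += 1
    return kept
```

On files this size plain `grep` is likely to be faster. The sketch is only meant to show the streaming filter step.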
-
jartigag (jartigag@mastodon.social)'s status on Sunday, 21-Nov-2021 23:44:33 UTC jartigag i knew #pandas wasn't a good idea with these #csv files..
why do i keep doing this to myself? 🤦‍♂️
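
One possible alternative when a csv is too big for a single dataframe is to stream rows with the standard `csv` module and keep only running totals. This is a sketch under assumptions: the column name `elo` and the sample rows are invented, and it is not what the project actually does. Within pandas itself, `pd.read_csv(path, chunksize=n)` gives a similar piece-by-piece iteration.

```python
import csv
from collections import Counter
from typing import Dict, Iterable


def count_by_column(lines: Iterable[str], column: str) -> Dict[str, int]:
    """Count rows per value of `column`, reading one row at a time."""
    counts: Counter = Counter()
    for row in csv.DictReader(lines):
        counts[row[column]] += 1
    return dict(counts)


if __name__ == "__main__":
    # Tiny in-memory sample; an open file object works the same way.
    sample = ["elo,blunder\n", "1400,yes\n", "1400,no\n", "2000,no\n"]
    print(count_by_column(sample, "elo"))
```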
-
jartigag (jartigag@mastodon.social)'s status on Monday, 22-Nov-2021 23:30:46 UTC jartigag right now, it looks like this (hopefully tomorrow i will have all the data)
-
jartigag (jartigag@mastodon.social)'s status on Tuesday, 23-Nov-2021 21:27:16 UTC jartigag covid19-lockdown effect you say? 😁
-